Building a multiple linear regression model with TensorFlow

Mr.GGLS 633 2022-04-17

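Environment setup

If you can, use Google's Colab: nothing to install, ready to use right away (bliss).

Alternatively, download Anaconda, install Jupyter Notebook in conda, and install the following packages: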

  • tensorflow 2.8
  • numpy
  • pandas
  • matplotlib
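
To confirm the packages above are present before starting, a small check like the following can help. This is only a sketch using the standard library; the helper name `installed_versions` is made up for illustration, and it assumes the packages were installed under exactly these distribution names (a variant such as a CPU-only TensorFlow build would be reported as not installed).

```python
from importlib import metadata

# Distribution names taken from the package list above.
REQUIRED = ["tensorflow", "numpy", "pandas", "matplotlib"]


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```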

Goal

Train a multiple linear regression (MLR) model to predict California housing prices.
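
For reference, the model and the loss used in this post can be written as follows, assuming the standard formulation of multiple linear regression with n input features and N samples. The symbols w, b, n, and N are notation introduced here, not taken from the original post.

```latex
\hat{y} = \sum_{j=1}^{n} w_j x_j + b,
\qquad
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \bigl( y_i - \hat{y}_i \bigr)^2
```

Training looks for the weights w_1, ..., w_n and the bias b that make the MSE as small as possible.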

Imports

Open Jupyter Notebook and import the following:

import tensorflow as tf
from tensorflow import keras
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

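Data preparation

# Use the path to your own copy of the dataset; the dataset location is given at the end of the post.
# Note: this loads the test split (3000 rows); Colab's sample_data folder also ships california_housing_train.csv.
data = pd.read_csv('sample_data/california_housing_test.csv')
data.head(10)
# output
	longitude	latitude	housing_median_age	total_rooms	total_bedrooms	population	households	median_income	median_house_value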
0	-122.05	37.37	27.0	3885.0	661.0	1537.0	606.0	6.6085	344700.0
1	-118.30	34.26	43.0	1510.0	310.0	809.0	277.0	3.5990	176500.0
2	-117.81	33.78	27.0	3589.0	507.0	1484.0	495.0	5.7934	270500.0
3	-118.36	33.82	28.0	67.0	15.0	49.0	11.0	6.1359	330000.0
4	-119.67	36.33	19.0	1241.0	244.0	850.0	237.0	2.9375	81700.0
5	-119.56	36.51	37.0	1018.0	213.0	663.0	204.0	1.6635	67000.0
6	-121.43	38.63	43.0	1009.0	225.0	604.0	218.0	1.6641	67000.0
7	-120.65	35.48	19.0	2310.0	471.0	1341.0	441.0	3.2250	166900.0
8	-122.84	38.40	15.0	3080.0	617.0	1446.0	599.0	3.6696	194400.0
9	-118.02	34.08	31.0	2402.0	632.0	2830.0	603.0	2.3333	164200.0

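Split the dataset into x and y

# Caution: columns 1 through 8 include the target median_house_value and leave out longitude
# (see the output below), so the label leaks into the inputs.
# data.iloc[:, 0:8] selects the eight feature columns instead.
x = data.iloc[0:, 1:9]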
x
# output
	latitude	housing_median_age	total_rooms	total_bedrooms	population	households	median_income	median_house_value
0	37.37	27.0	3885.0	661.0	1537.0	606.0	6.6085	344700.0
1	34.26	43.0	1510.0	310.0	809.0	277.0	3.5990	176500.0
2	33.78	27.0	3589.0	507.0	1484.0	495.0	5.7934	270500.0
3	33.82	28.0	67.0	15.0	49.0	11.0	6.1359	330000.0
4	36.33	19.0	1241.0	244.0	850.0	237.0	2.9375	81700.0
...	...	...	...	...	...	...	...	...
2995	34.42	23.0	1450.0	642.0	1258.0	607.0	1.1790	225000.0
2996	34.06	27.0	5257.0	1082.0	3496.0	1036.0	3.3906	237200.0
2997	36.30	10.0	956.0	201.0	693.0	220.0	2.2895	62000.0
2998	34.10	40.0	96.0	14.0	46.0	14.0	3.2708	162500.0
2999	34.42	42.0	1765.0	263.0	753.0	260.0	8.5608	500001.0

y

y = data.median_house_value
y
# output
0       344700.0
1       176500.0
2       270500.0
3       330000.0
4        81700.0
          ...   
2995    225000.0
2996    237200.0
2997     62000.0
2998    162500.0
2999    500001.0
Name: median_house_value, Length: 3000, dtype: float64
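
Selecting columns by position makes it easy to pull the target into the inputs by accident. One possible alternative, sketched here, is to drop the target column by name. The helper `split_features_target` and the three-row `sample` frame are illustrative only; the rows are copied from the `data.head(10)` output above.

```python
import pandas as pd

TARGET = "median_house_value"


def split_features_target(frame, target=TARGET):
    """Return (features, labels), with the target column removed from the features."""
    features = frame.drop(columns=[target])
    labels = frame[target]
    return features, labels


# First three rows of the dataset, as printed by data.head(10) above.
sample = pd.DataFrame(
    {
        "longitude": [-122.05, -118.30, -117.81],
        "latitude": [37.37, 34.26, 33.78],
        "housing_median_age": [27.0, 43.0, 27.0],
        "total_rooms": [3885.0, 1510.0, 3589.0],
        "total_bedrooms": [661.0, 310.0, 507.0],
        "population": [1537.0, 809.0, 1484.0],
        "households": [606.0, 277.0, 495.0],
        "median_income": [6.6085, 3.5990, 5.7934],
        "median_house_value": [344700.0, 176500.0, 270500.0],
    }
)

features, labels = split_features_target(sample)
print(list(features.columns))  # the eight inputs, without the target
print(labels.tolist())
```

With the full dataset this would be called as `split_features_target(data)`, giving eight input columns, so the model would still have 9 parameters.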

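Building the model

# Create a Sequential model
model = keras.Sequential()
# Dense adds one fully connected layer; the first argument, 1, is the number of output units.
# input_shape is the shape of a single sample, (number of columns in x,); the batch dimension is not included.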
model.add(keras.layers.Dense(1, input_shape=(x.shape[1], )))

Inspect the model

model.summary()
# output
Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_4 (Dense)             (None, 1)                 9         
                                                                 
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
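
The 9 parameters in the summary are 8 weights (one per input column) plus 1 bias. The NumPy sketch below mimics what a `Dense(1)` layer with no activation computes, to show where that count comes from. It is an illustration with random values, not Keras internals; `dense_forward` is a made-up helper name.

```python
import numpy as np


def dense_forward(inputs, weights, bias):
    """Linear layer without activation: inputs @ weights + bias."""
    return inputs @ weights + bias


n_features = 8  # number of columns in x
rng = np.random.default_rng(0)

weights = rng.normal(size=(n_features, 1))  # one weight per input column
bias = np.zeros(1)                          # a single bias for the single output unit
batch = rng.normal(size=(4, n_features))    # 4 made-up samples

outputs = dense_forward(batch, weights, bias)
n_params = weights.size + bias.size

print(outputs.shape, n_params)  # (4, 1) 9
```

The leading `None` in the summary's output shape corresponds to the batch size, which is 4 in this sketch.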

Add the optimizer and the loss function

model.compile(optimizer='adam',
              loss='mse')
# Train the model; history keeps the per-epoch training record
history = model.fit(x, y, epochs=400)
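
To make the `'mse'` loss and the idea of fitting more concrete, here is a minimal NumPy sketch that minimizes the mean squared error of a linear model. Note that it uses plain gradient descent on synthetic data, not the Adam optimizer that Keras uses above; `fit_linear_gd`, the learning rate, and the demo data are all assumptions made for illustration.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between two 1-D arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


def fit_linear_gd(x, y, lr=0.1, epochs=500):
    """Fit y ~ x @ w + b by full-batch gradient descent on the MSE."""
    n_samples, n_features = x.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        error = x @ w + b - y                   # residuals, shape (n_samples,)
        grad_w = 2.0 * x.T @ error / n_samples  # d(MSE)/dw
        grad_b = 2.0 * error.mean()             # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic, noise-free data with known coefficients (not the housing data).
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(200, 3))
true_w = np.array([2.0, -1.0, 0.5])
y_demo = x_demo @ true_w + 4.0

w, b = fit_linear_gd(x_demo, y_demo)
print(w, b, mse(y_demo, x_demo @ w + b))
```

The demo features are on a unit scale, which is why a fixed learning rate converges quickly here. The raw housing columns span very different ranges, so a plain loop like this would need rescaled inputs or a much smaller step.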

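Test the model

Load the test set. Note that this is the same file the model was trained on above, so the result is not a held-out evaluation.

test = pd.read_csv('/content/sample_data/california_housing_test.csv')
test.head(10)
# output
	longitude	latitude	housing_median_age	total_rooms	total_bedrooms	population	households	median_income	median_house_value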
0	-122.05	37.37	27.0	3885.0	661.0	1537.0	606.0	6.6085	344700.0
1	-118.30	34.26	43.0	1510.0	310.0	809.0	277.0	3.5990	176500.0
2	-117.81	33.78	27.0	3589.0	507.0	1484.0	495.0	5.7934	270500.0
3	-118.36	33.82	28.0	67.0	15.0	49.0	11.0	6.1359	330000.0
4	-119.67	36.33	19.0	1241.0	244.0	850.0	237.0	2.9375	81700.0
5	-119.56	36.51	37.0	1018.0	213.0	663.0	204.0	1.6635	67000.0
6	-121.43	38.63	43.0	1009.0	225.0	604.0	218.0	1.6641	67000.0
7	-120.65	35.48	19.0	2310.0	471.0	1341.0	441.0	3.2250	166900.0
8	-122.84	38.40	15.0	3080.0	617.0	1446.0	599.0	3.6696	194400.0
9	-118.02	34.08	31.0	2402.0	632.0	2830.0	603.0	2.3333	164200.0
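
If only a single CSV file is available, one way to keep some rows aside for testing is to split the frame before training. The following is a sketch using pandas only; `train_test_split_frame`, the 20% fraction, and the toy `demo` frame are assumptions for illustration, and the function relies on the frame having a unique index.

```python
import pandas as pd


def train_test_split_frame(frame, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the rows; return (train_part, test_part)."""
    test_part = frame.sample(frac=test_fraction, random_state=seed)
    train_part = frame.drop(test_part.index)  # everything that was not sampled
    return train_part, test_part


# Toy frame standing in for the housing data.
demo = pd.DataFrame({"feature": range(10), "target": range(10, 20)})
train_part, test_part = train_test_split_frame(demo)

print(len(train_part), len(test_part))  # 8 2
```

The model would then be fitted on `train_part` only and evaluated on `test_part`.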

Split the test set into x and y, then predict with the model

test_x = test.iloc[0:, 1:9]
test_y = test.median_house_value
predict_y = model.predict(test_x)

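Plot the predictions

# Compare the first 100 test targets with the corresponding predictions.
# model.evaluate(test_x, test_y) can also be used to measure the error (it returns the MSE loss here).
idx = np.arange(100)  # separate name, so the feature DataFrame x is not overwritten
plt.plot(idx, test_y[:100], "x-", label="real")
plt.plot(idx, predict_y[:100], "+-", label="predict")
plt.legend()  # show the "real" / "predict" labels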
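
Besides the plot, the error can be summarized numerically. One detail to watch: `model.predict` returns an array of shape (N, 1) while the targets have shape (N,), so subtracting them directly would broadcast to an (N, N) matrix. The sketch below flattens both first; `regression_errors` is a hypothetical helper and the demo numbers are made up.

```python
import numpy as np


def regression_errors(y_true, y_pred):
    """Return MSE, RMSE, and MAE after flattening both inputs to 1-D."""
    y_true = np.asarray(y_true, dtype=float).reshape(-1)
    y_pred = np.asarray(y_pred, dtype=float).reshape(-1)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of values")
    diff = y_pred - y_true
    mse_value = float(np.mean(diff ** 2))
    return {
        "mse": mse_value,
        "rmse": float(np.sqrt(mse_value)),
        "mae": float(np.mean(np.abs(diff))),
    }


# Made-up values; the predictions have shape (3, 1), like the output of model.predict.
y_true_demo = np.array([100.0, 200.0, 300.0])
y_pred_demo = np.array([[110.0], [190.0], [330.0]])

errors = regression_errors(y_true_demo, y_pred_demo)
print(errors)
```

In the notebook this would be called as `regression_errors(test_y, predict_y)`; the RMSE is in the same unit as the house values, which makes it easier to read than the raw MSE.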

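Nice, not bad at all ( ‵▽′)ψ

Bear in mind that x includes median_house_value itself and that the model is tested on the file it was trained on, so the match in the plot is optimistic.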

Dataset location

MrGGLS/Some_Codes

