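Building a multiple linear regression model with TensorFlow
Environment setup
If you can, use Colab from Google: nothing to install, it just works (bliss).
Otherwise, download Anaconda, install Jupyter Notebook inside conda, and install the following packages: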
- tensorflow 2.8
- numpy
- pandas
- matplotlib
Goal
Train an MLR (multiple linear regression) model to predict California house prices.
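For reference, a multiple linear regression model predicts the target as a weighted sum of the input features plus a bias term. The sketch below fits such a model in closed form with NumPy on synthetic data. The feature count, true weights, and noise level are made up for illustration, and it uses np.linalg.lstsq rather than the gradient-based training that Keras performs later in this post.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration, not the housing data):
# 200 samples, 3 features, known weights and bias.
true_w = np.array([2.0, -1.0, 0.5])
true_b = 4.0
X = rng.normal(size=(200, 3))
y = X @ true_w + true_b + rng.normal(scale=0.01, size=200)

# Append a column of ones so the bias is learned as one more weight.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Least-squares solution of X_aug @ params ≈ y.
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
w_hat, b_hat = params[:3], params[3]

print("weights:", w_hat)
print("bias:", b_hat)
```

The recovered weights and bias should land close to true_w and true_b. The Keras model built below has the same form, only it is fitted iteratively.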
Imports
Open Jupyter Notebook and import the following:
import tensorflow as tf
from tensorflow import keras
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
Preparing the data
# Use the path to your own copy of the dataset; where to get it is given at the end of the post
data = pd.read_csv('sample_data/california_housing_test.csv')
data.head(10)
# output
longitude latitude housing_median_age total_rooms total_bedrooms population households median_income median_house_value
0 -122.05 37.37 27.0 3885.0 661.0 1537.0 606.0 6.6085 344700.0
1 -118.30 34.26 43.0 1510.0 310.0 809.0 277.0 3.5990 176500.0
2 -117.81 33.78 27.0 3589.0 507.0 1484.0 495.0 5.7934 270500.0
3 -118.36 33.82 28.0 67.0 15.0 49.0 11.0 6.1359 330000.0
4 -119.67 36.33 19.0 1241.0 244.0 850.0 237.0 2.9375 81700.0
5 -119.56 36.51 37.0 1018.0 213.0 663.0 204.0 1.6635 67000.0
6 -121.43 38.63 43.0 1009.0 225.0 604.0 218.0 1.6641 67000.0
7 -120.65 35.48 19.0 2310.0 471.0 1341.0 441.0 3.2250 166900.0
8 -122.84 38.40 15.0 3080.0 617.0 1446.0 599.0 3.6696 194400.0
9 -118.02 34.08 31.0 2402.0 632.0 2830.0 603.0 2.3333 164200.0
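Splitting the dataset into x and y
# Caution: columns 1:9 drop longitude and include median_house_value, the target itself,
# so the label leaks into the features. data.iloc[:, 0:8] would select only the eight real features.
x = data.iloc[:, 1:9]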
x
# output
latitude housing_median_age total_rooms total_bedrooms population households median_income median_house_value
0 37.37 27.0 3885.0 661.0 1537.0 606.0 6.6085 344700.0
1 34.26 43.0 1510.0 310.0 809.0 277.0 3.5990 176500.0
2 33.78 27.0 3589.0 507.0 1484.0 495.0 5.7934 270500.0
3 33.82 28.0 67.0 15.0 49.0 11.0 6.1359 330000.0
4 36.33 19.0 1241.0 244.0 850.0 237.0 2.9375 81700.0
... ... ... ... ... ... ... ... ...
2995 34.42 23.0 1450.0 642.0 1258.0 607.0 1.1790 225000.0
2996 34.06 27.0 5257.0 1082.0 3496.0 1036.0 3.3906 237200.0
2997 36.30 10.0 956.0 201.0 693.0 220.0 2.2895 62000.0
2998 34.10 40.0 96.0 14.0 46.0 14.0 3.2708 162500.0
2999 34.42 42.0 1765.0 263.0 753.0 260.0 8.5608 500001.0
y is the target column:
y = data.median_house_value
y
# output
0 344700.0
1 176500.0
2 270500.0
3 330000.0
4 81700.0
...
2995 225000.0
2996 237200.0
2997 62000.0
2998 162500.0
2999 500001.0
Name: median_house_value, Length: 3000, dtype: float64
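Building the model
# Create a Sequential model
model = keras.Sequential()
# Dense adds one fully connected layer; the first argument, 1, is the number of outputs.
# input_shape=(number of columns in x,); the batch dimension is left out.
model.add(keras.layers.Dense(1, input_shape=(x.shape[1], )))
Inspecting the model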
model.summary()
# output
Model: "sequential_5"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense_4 (Dense) (None, 1) 9
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
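The 9 in the Param # column is the eight input weights plus one bias. As a rough sketch of what a single Dense(1) layer without an activation computes, here is the same arithmetic in NumPy. The random inputs and weights are placeholders, not values from the trained model.

```python
import numpy as np

n_features = 8  # number of columns in x in this post
rng = np.random.default_rng(0)

# A Dense(1) layer with no activation holds a kernel of shape
# (n_features, 1) and a bias of shape (1,).
kernel = rng.normal(size=(n_features, 1))
bias = np.zeros(1)

n_params = kernel.size + bias.size
print(n_params)  # 9

# Forward pass for a batch of 5 placeholder rows: output has shape (batch, 1).
batch = rng.normal(size=(5, n_features))
output = batch @ kernel + bias
print(output.shape)  # (5, 1)
```

The None in the Output Shape column of the summary is the batch dimension, which is only known when data is passed in.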
Adding the optimizer and loss function
model.compile(optimizer='adam',
              loss='mse')
# Train the model; history keeps the metrics recorded during training
history = model.fit(x, y, epochs=400)
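The 'mse' loss passed to compile is mean squared error: the average of the squared differences between predictions and targets. A small NumPy sketch with made-up numbers follows; the helper name mean_squared_error is only for this example.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up house values and predictions, for illustration only.
y_true = [200000.0, 150000.0, 300000.0]
y_pred = [210000.0, 140000.0, 330000.0]

loss = mean_squared_error(y_true, y_pred)
print(loss)
```

Because house values are in the hundreds of thousands, loss values in the hundreds of millions or billions are normal here; taking the square root gives an error in the same units as the price. The per-epoch loss that fit records is available afterwards in history.history['loss'].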
Testing the model
Load the test set
test = pd.read_csv('/content/sample_data/california_housing_test.csv')
test.head(10)
# output
longitude latitude housing_median_age total_rooms total_bedrooms population households median_income median_house_value
0 -122.05 37.37 27.0 3885.0 661.0 1537.0 606.0 6.6085 344700.0
1 -118.30 34.26 43.0 1510.0 310.0 809.0 277.0 3.5990 176500.0
2 -117.81 33.78 27.0 3589.0 507.0 1484.0 495.0 5.7934 270500.0
3 -118.36 33.82 28.0 67.0 15.0 49.0 11.0 6.1359 330000.0
4 -119.67 36.33 19.0 1241.0 244.0 850.0 237.0 2.9375 81700.0
5 -119.56 36.51 37.0 1018.0 213.0 663.0 204.0 1.6635 67000.0
6 -121.43 38.63 43.0 1009.0 225.0 604.0 218.0 1.6641 67000.0
7 -120.65 35.48 19.0 2310.0 471.0 1341.0 441.0 3.2250 166900.0
8 -122.84 38.40 15.0 3080.0 617.0 1446.0 599.0 3.6696 194400.0
9 -118.02 34.08 31.0 2402.0 632.0 2830.0 603.0 2.3333 164200.0
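Split the test set into x and y, then predict with the model
# The column slice must match the one used to build x for training
test_x = test.iloc[:, 1:9]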
test_y = test.median_house_value
predict_y = model.predict(test_x)
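Note that the file read here is the same one used for training above, so these predictions are made on rows the model has already seen. One way to get a more honest estimate is to hold out part of the data before training. The sketch below does this with NumPy on placeholder arrays. The 80/20 ratio, the seed, and the helper name train_test_split_indices are assumptions for illustration, not part of the original workflow.

```python
import numpy as np

def train_test_split_indices(n_rows, test_fraction=0.2, seed=0):
    """Return shuffled, disjoint index arrays for a train/test split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_rows)
    n_test = int(round(n_rows * test_fraction))
    return order[n_test:], order[:n_test]

# Placeholder feature matrix and target with 3000 rows, like the CSV above.
features = np.arange(3000 * 8, dtype=float).reshape(3000, 8)
target = np.arange(3000, dtype=float)

train_idx, test_idx = train_test_split_indices(len(features))
train_x, train_y = features[train_idx], target[train_idx]
held_x, held_y = features[test_idx], target[test_idx]

print(train_x.shape, held_x.shape)  # (2400, 8) (600, 8)
```

With a pandas DataFrame the same index arrays can be applied through .iloc, for example data.iloc[train_idx].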
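Plotting the predictions
idx = np.arange(100)
# Compare the first 100 test targets with the model's predictions
# model.evaluate(test_x, test_y) would also give a number: it returns the loss (MSE here)
plt.plot(idx, test_y[:100], "x-", label="real")
plt.plot(idx, predict_y[:100], "+-", label="predict")
plt.legend()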
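To put a number on the fit instead of reading it off the plot, the predictions can be compared with the targets directly. One thing to watch for: model.predict on this model returns an array of shape (n, 1), while test_y has shape (n,), and subtracting the two without flattening broadcasts to an (n, n) matrix. The sketch below uses placeholder arrays in place of the real test_y and predict_y and computes RMSE and R² by hand.

```python
import numpy as np

# Placeholders standing in for test_y.values and model.predict(test_x).
y_true = np.array([100.0, 200.0, 300.0, 400.0])          # shape (4,)
y_pred = np.array([[110.0], [190.0], [310.0], [390.0]])  # shape (4, 1)

# Without flattening, broadcasting silently produces a (4, 4) matrix.
print((y_true - y_pred).shape)  # (4, 4)

# Flatten the predictions first so the arrays line up element by element.
residual = y_true - y_pred.ravel()
rmse = float(np.sqrt(np.mean(residual ** 2)))
ss_total = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - float(np.sum(residual ** 2) / ss_total)

print(rmse)  # 10.0
print(r2)
```

RMSE is in the same units as the house value, and an R² close to 1 means the predictions explain most of the variation in the targets.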
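Not bad ( ‵▽′)ψ Keep in mind, though, that x includes the median_house_value column and the test file is the one the model was trained on, so this plot says little about how the model would do on data it has not seen or where the price is unknown.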