XGBoost Parameter Tuning: A Complete Guide with Prediction and Full Code

噜噜啦啦咯

2024-04-20

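The model is built mainly with Tianqi Chen's XGBoost library. The hyperparameters are tuned mainly with the grid search method from the Python sklearn library, which selects the best combination of values.

Importing the required libraries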

import xgboost as xgb  # needed later for xgb.DMatrix and xgb.train
from xgboost import XGBRegressor as XGBR
from sklearn.model_selection import KFold, cross_val_score as CVS, train_test_split as TTS
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import GridSearchCV
import pandas as pd
from numpy import nan as NA
import pickle

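Loading the dataset and splitting it into training and test sets

data = pd.read_excel(r'C:\Users\HUAWEI\Desktop\pollution.xlsx')
# Columns 1 to 6 are the features; column 0 is the target
X = data.iloc[:,1:7]
Y = data.iloc[:,0]
# Hold out 10% of the rows as the test set
Xtrain,Xtest,Ytrain,Ytest = TTS(X,Y,test_size=0.1,random_state=420)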

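Tuning steps

Maximum tree depth and minimum child weight

First, tune max_depth (the maximum tree depth) together with min_child_weight (the minimum sum of sample weights in a leaf node). max_depth controls the structure of the trees, while min_child_weight is used to avoid overfitting. A larger min_child_weight keeps the model from learning patterns that only hold for small, local groups of samples, but a value that is too high leads to underfitting.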

param_test1 = {'max_depth':range(3,10,2),'min_child_weight':range(2,7,2)}
# 'reg:squarederror' replaces the deprecated alias 'reg:linear' used in the original run
gsearch1 = GridSearchCV(estimator =XGBR( learning_rate =0.1, n_estimators=140, max_depth=5,
min_child_weight=1, gamma=0, subsample=0.8, colsample_bytree=0.8, objective= 'reg:squarederror',
nthread=4, scale_pos_weight=1, seed=27),
param_grid = param_test1, scoring='r2',n_jobs=4, cv=5)
gsearch1.fit(Xtrain,Ytrain)
# Best parameter combination and its mean cross-validated R^2
print(gsearch1.best_params_, gsearch1.best_score_)
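
To make the role of min_child_weight more concrete, here is a minimal sketch in plain Python. It is not XGBoost's actual implementation. It assumes squared-error loss, where every sample contributes a Hessian of 1, so the Hessian sum in a child equals its sample count. The helper name `split_allowed` is hypothetical.

```python
def split_allowed(left_hessians, right_hessians, min_child_weight):
    """Return True if both children carry at least min_child_weight of Hessian."""
    return (sum(left_hessians) >= min_child_weight
            and sum(right_hessians) >= min_child_weight)

# Squared-error loss: each sample has a Hessian of 1.0 (assumption of this sketch).
left = [1.0] * 5    # 5 samples would go to the left child
right = [1.0] * 1   # 1 sample would go to the right child

for mcw in (1, 2, 4, 6):
    # A larger min_child_weight rejects splits that isolate very few samples.
    print(mcw, split_allowed(left, right, mcw))
```

With these toy values the split is accepted only while min_child_weight does not exceed the smaller child, which is why large values push the model toward underfitting.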

gamma

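Next, tune gamma. When XGBoost considers splitting a node, it splits only if the split reduces the value of the loss function. gamma specifies the minimum loss reduction required to make a split, so it determines how conservative the model is: the higher the value, the more conservative the model.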

param_test3 = {'gamma':[i/100.0 for i in range(0,100)]}
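
The effect of gamma can be sketched with the split gain formula from the XGBoost paper. This is an illustration in plain Python, not the library's code, and the library's internal scaling may differ. `G` and `H` stand for the sums of gradients and Hessians in a node; the function names are hypothetical.

```python
def leaf_score(G, H, reg_lambda):
    """Structure score of a leaf: G^2 / (H + lambda)."""
    return G * G / (H + reg_lambda)

def split_gain(G_left, H_left, G_right, H_right, reg_lambda=1.0, gamma=0.0):
    """Loss reduction of a candidate split minus the gamma penalty."""
    parent = leaf_score(G_left + G_right, H_left + H_right, reg_lambda)
    children = (leaf_score(G_left, H_left, reg_lambda)
                + leaf_score(G_right, H_right, reg_lambda))
    return 0.5 * (children - parent) - gamma

# Toy gradient statistics (made up for illustration).
for gamma in (0.0, 1.0, 5.0):
    gain = split_gain(-4.0, 4.0, 4.0, 4.0, gamma=gamma)
    # The split is kept only when the penalized gain is positive.
    print(gamma, round(gain, 3), gain > 0)
```

The same candidate split is accepted for small gamma and rejected once gamma exceeds the raw loss reduction, which is the sense in which a larger gamma makes the model more conservative.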

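subsample and colsample_bytree

Next, tune subsample and colsample_bytree. subsample controls the fraction of training rows randomly sampled for each tree. Lowering it makes the algorithm more conservative and helps avoid overfitting, but setting it too low may cause underfitting. colsample_bytree controls the fraction of columns (each column is a feature) randomly sampled for each tree.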

param_test4 = {'subsample':[i/10.0 for i in range(1,10)],'colsample_bytree':[i/10.0 for i in range(1,10)]}
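
What the two ratios mean can be shown with a small sketch using only the standard library. It illustrates the idea of drawing a row subset and a column subset per tree; XGBoost's internal sampling differs in detail. The function name `sample_rows_and_columns` is hypothetical.

```python
import random

def sample_rows_and_columns(n_rows, n_cols, subsample, colsample_bytree, seed=0):
    """Pick the row and column indices one tree would be trained on."""
    rng = random.Random(seed)
    n_row_pick = max(1, int(n_rows * subsample))
    n_col_pick = max(1, int(n_cols * colsample_bytree))
    rows = sorted(rng.sample(range(n_rows), n_row_pick))
    cols = sorted(rng.sample(range(n_cols), n_col_pick))
    return rows, cols

# 10 rows and 6 feature columns; each tree sees half of each (toy values).
rows, cols = sample_rows_and_columns(n_rows=10, n_cols=6,
                                     subsample=0.5, colsample_bytree=0.5)
print("rows used:", rows)
print("columns used:", cols)
```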

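Tuning the regularization term

Next, tune reg_alpha, the L1 regularization term on the weights, to control the strength of the model's regularization and prevent overfitting.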

param_test5 = {'reg_alpha':[0, 0.001, 0.005, 0.01, 0.05]}
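
How an L1 term shrinks leaf weights can be sketched as follows. The sketch assumes the soft-thresholding form commonly given for L1-regularized leaf weights, with `G` and `H` as the gradient and Hessian sums of a leaf; it is an illustration, not the library's code, and `leaf_weight` is a hypothetical name.

```python
def leaf_weight(G, H, reg_alpha=0.0, reg_lambda=1.0):
    """Leaf weight with L1 (reg_alpha) and L2 (reg_lambda) regularization."""
    if G > reg_alpha:
        shrunk = G - reg_alpha
    elif G < -reg_alpha:
        shrunk = G + reg_alpha
    else:
        shrunk = 0.0  # gradient sum too small: the leaf weight is set to zero
    return -shrunk / (H + reg_lambda)

# Toy statistics (made up): gradient sum -3, Hessian sum 5.
for alpha in (0, 0.001, 0.005, 0.01, 0.05, 1.0, 3.0):
    print(alpha, leaf_weight(-3.0, 5.0, reg_alpha=alpha))
```

For these toy values the small reg_alpha candidates in the grid barely move the weight, while a value as large as the gradient sum drives it to zero.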

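Learning rate

Finally, tune the learning rate and choose the best value to settle on the final model.

# learning_rate=0 applies no update from any tree, so it can only act as a baseline
param_test6 = {'learning_rate':[0, 0.001, 0.005, 0.01, 0.05,0.1,0.5,1]}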
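
The effect of the learning rate can be illustrated with a toy boosting loop in plain Python. As a simplifying assumption, each weak learner here is just the mean of the current residuals rather than a tree, so this is a sketch of shrinkage only. The function name `boost_constant` is hypothetical.

```python
def boost_constant(y, learning_rate, n_rounds):
    """Boost constant learners: each round adds learning_rate * mean residual."""
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        residual_mean = sum(t - p for t, p in zip(y, pred)) / len(y)
        pred = [p + learning_rate * residual_mean for p in pred]
    return pred

y = [10.0, 10.0, 10.0]
for lr in (0, 0.01, 0.1, 0.5, 1):
    # Prediction for the first sample after 10 boosting rounds.
    print(lr, round(boost_constant(y, lr, n_rounds=10)[0], 4))
```

A rate of 0 never moves the prediction, a small rate needs many rounds to approach the target, and a rate of 1 takes the full step at once, which is why the learning rate is usually tuned together with the number of boosting rounds.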

Printing the best model

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=0.8, enable_categorical=False,
gamma=0, gpu_id=-1, importance_type=None,
interaction_constraints='', learning_rate=0.1, max_delta_step=0,
max_depth=3, min_child_weight=2, missing=nan,
monotone_constraints='()', n_estimators=140, n_jobs=4, nthread=4,
num_parallel_tree=1, objective='reg:linear', predictor='auto',
random_state=27, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
seed=27, subsample=0.6, tree_method='exact', validate_parameters=1, ...)

Plotting the XGBoost model's feature_importance

importance [0.19813064 0.13919306 0.09813585 0.04205703 0.02167364 0.50080985]

[Figure: feature importance plot of the XGBoost model]
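
The plotting code is not included in the article. A minimal sketch with matplotlib that draws the importance values printed above as a horizontal bar chart follows. The feature names are placeholders, since the article does not give the column names. With a fitted sklearn-style model, such values are typically read from `feature_importances_`, and `xgboost.plot_importance` is an alternative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Importance values as printed in the article.
importance = [0.19813064, 0.13919306, 0.09813585, 0.04205703, 0.02167364, 0.50080985]
# Placeholder names (assumption): the real column names are not given.
feature_names = [f"feature_{i}" for i in range(len(importance))]

# Sort ascending so the most important feature ends up at the top of the chart.
order = sorted(range(len(importance)), key=lambda i: importance[i])
fig, ax = plt.subplots()
bars = ax.barh([feature_names[i] for i in order], [importance[i] for i in order])
ax.set_xlabel("importance")
ax.set_title("XGBoost feature importance")
fig.tight_layout()
```

In an interactive session, `plt.show()` would display the figure after switching back to a GUI backend.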

Predicting with this model

Training the model

import xgboost as xgb

# 'reg:squarederror' replaces the deprecated alias 'reg:linear' used in the original run
params = { 'base_score':0.5, 'booster':'gbtree', 'colsample_bylevel':1,
'colsample_bynode':1, 'colsample_bytree':0.8, 'enable_categorical':False,
'gamma':0, 'gpu_id':-1, 'importance_type':None,
'interaction_constraints':'','learning_rate':0.1, 'max_delta_step':0,
'max_depth':3, 'min_child_weight':2,
'monotone_constraints':'()', 'n_estimators':140,'n_jobs':4, 'nthread':4,
'num_parallel_tree': 1, 'objective':'reg:squarederror', 'predictor':'auto',
'random_state':27, 'reg_alpha':0,'reg_lambda':1,'scale_pos_weight':1,
'seed':27, 'subsample':0.6, 'tree_method':'exact','validate_parameters':1}
# Build the training matrix from the split made earlier (Xtrain, Ytrain)
dtrain = xgb.DMatrix(Xtrain, label=Ytrain)
# n_estimators is not used by xgb.train; num_rounds sets the number of boosting rounds
num_rounds = 300
plst = list(params.items())
model = xgb.train(plst, dtrain, num_rounds)

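Plotting the predictions and printing the RMSE

[Figure: plot of the model's predictions]

The resulting RMSE is 9.473, which indicates a reasonably good fit.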

This article is reposted from: 学新通技术网
