Cross-validation predictions from caret assigned to different folds
Question
I am wondering why predictions from 'Fold1' are actually predictions from the second fold in my predefined folds. I attach an example of what I mean.
```r
# load the library
library(caret)
# load the cars dataset shipped with caret (not the base R `cars` data)
data(cars, package = "caret")
# define folds; returnTrain = TRUE returns the *training* rows of each fold
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
# define training control
train_control <- trainControl(method = "cv", index = cv_folds, savePredictions = "final")
# train the model
model <- caret::train(Price ~ ., data = cars, trControl = train_control, method = "gbm", verbose = FALSE)
model$pred$rowIndex[model$pred$Resample == "Fold1"] %in% cv_folds[[2]]
```
Answer
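The `Resample` rows labelled 'Fold1' are the records that are not in `cv_folds[[1]]`. Because you called `createFolds()` with `returnTrain = TRUE`, each element of `cv_folds` holds the training rows of a resample, and caret predicts the remaining rows. Every row is held out exactly once, so the rows held out in Fold1 are contained in the training sets `cv_folds[[2]]` to `cv_folds[[5]]`; that is why your check against `cv_folds[[2]]` returns TRUE. This is the expected behaviour of 5-fold cross-validation: the Fold1 model is trained on the rows of folds 2-5 and predicts fold 1, the Fold2 model is trained on folds 1 and 3-5 and predicts fold 2, and so on.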
In summary: the predictions in Fold1 are the held-out predictions of a model trained on the rows of folds 2-5 (that is, on `cv_folds[[1]]`).
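A minimal base-R sketch of the indexing logic, assuming a hypothetical 10-row data set split by hand into 5 partitions (standing in for `createFolds()`), shows why the Fold1 hold-out rows show up inside the second training set:

```r
# Hypothetical example: 10 rows split into 5 test partitions of 2 rows each
n <- 10
test_parts <- split(seq_len(n), rep(1:5, each = 2))
names(test_parts) <- paste0("Fold", 1:5)

# Equivalent of returnTrain = TRUE: each element holds the TRAINING rows,
# i.e. every row except those of the matching test partition
train_idx <- lapply(test_parts, function(test) setdiff(seq_len(n), test))

# Resample i predicts the rows that are NOT in index[[i]]
holdout_fold1 <- setdiff(seq_len(n), train_idx[["Fold1"]])
print(holdout_fold1)
# [1] 1 2

# Those rows belong to every other training set, including the second one
print(holdout_fold1 %in% train_idx[["Fold2"]])
# [1] TRUE TRUE
```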
Update based on comments
All the needed info is in the model$pred table. I added a bit of code for clarification:
```r
library(dplyr)

model$pred %>%
  select(rowIndex, pred, Resample) %>%
  rename(prediction = pred, holdout = Resample) %>%
  mutate(trained_on = case_when(holdout == "Fold1" ~ "Folds 2, 3, 4, 5",
                                holdout == "Fold2" ~ "Folds 1, 3, 4, 5",
                                holdout == "Fold3" ~ "Folds 1, 2, 4, 5",
                                holdout == "Fold4" ~ "Folds 1, 2, 3, 5",
                                holdout == "Fold5" ~ "Folds 1, 2, 3, 4"))
```

First rows of the output:

```
  rowIndex prediction holdout       trained_on
1      610   13922.60   Fold2 Folds 1, 3, 4, 5
2      623   38418.83   Fold2 Folds 1, 3, 4, 5
3      604   12383.55   Fold2 Folds 1, 3, 4, 5
4      607   15040.07   Fold2 Folds 1, 3, 4, 5
5       95   33549.40   Fold2 Folds 1, 3, 4, 5
6      624   40357.35   Fold2 Folds 1, 3, 4, 5
```
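If the number of folds changes, the hard-coded `case_when()` has to be rewritten. A small base-R sketch that derives the `trained_on` label from the hold-out fold name instead, assuming the labels follow caret's `Fold1` ... `Fold5` naming (the `holdout` vector below is a hypothetical sample):

```r
# Hypothetical fold labels as they appear in model$pred$Resample
holdout <- c("Fold2", "Fold1", "Fold5")
all_folds <- paste0("Fold", 1:5)

# For each held-out fold, list the remaining folds the model was trained on
trained_on <- vapply(holdout, function(h) {
  others <- setdiff(all_folds, h)
  paste("Folds", paste(sub("Fold", "", others), collapse = ", "))
}, character(1), USE.NAMES = FALSE)

print(trained_on)
# [1] "Folds 1, 3, 4, 5" "Folds 2, 3, 4, 5" "Folds 1, 2, 3, 4"
```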
Basically, what you need for further stacking are the `pred` and `rowIndex` columns of the `model$pred` table.
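As a sketch of that stacking step, the out-of-fold predictions can be put back into the original row order by indexing with `rowIndex`. The `pred_df` table below is a hypothetical four-row stand-in for `model$pred`:

```r
# Hypothetical stand-in for model$pred (savePredictions = "final")
pred_df <- data.frame(
  rowIndex = c(3L, 1L, 4L, 2L),
  pred     = c(30, 10, 40, 20),
  Resample = c("Fold2", "Fold1", "Fold2", "Fold1")
)
n <- 4  # number of rows in the original data

# Place each out-of-fold prediction at its original row position
oof <- rep(NA_real_, n)
oof[pred_df$rowIndex] <- pred_df$pred
print(oof)
# [1] 10 20 30 40
```

Direct assignment works here because in plain k-fold CV each row is held out exactly once. With repeated CV (`method = "repeatedcv"`) a row appears once per repeat and would need to be aggregated first, for example with `tapply(pred_df$pred, pred_df$rowIndex, mean)`.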
`rowIndex` refers to the row in the original data, so `rowIndex` 610 refers to record 610 of the `cars` dataset. You can verify this against the `obs` column, which holds the value of the `Price` column of `cars` for that row.
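A minimal sketch of that check, using a hypothetical miniature data set in place of `cars` and a hypothetical stand-in for the `rowIndex` and `obs` columns of `model$pred`:

```r
# Hypothetical miniature data set standing in for cars
data_df <- data.frame(Price = c(15000, 22000, 18000, 31000))

# Hypothetical stand-in for the rowIndex and obs columns of model$pred
pred_df <- data.frame(
  rowIndex = c(3L, 1L, 4L, 2L),
  obs      = c(18000, 15000, 31000, 22000)
)

# obs should equal the Price value found at each rowIndex of the original data
print(all.equal(pred_df$obs, data_df$Price[pred_df$rowIndex]))
# [1] TRUE
```

With the objects from the question, the same check would read `all.equal(model$pred$obs, cars$Price[model$pred$rowIndex])`.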