Cross-validation predictions from caret are assigned to different folds

Question

I am wondering why the predictions labelled 'Fold1' are actually predictions from the second fold in my predefined folds. Here is an example of what I mean.

# load the library
library(caret)
# load the cars dataset
data(cars)
# define folds
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
# define training control
train_control <- trainControl(method="cv", index = cv_folds, savePredictions = 'final')
# fix the parameters of the algorithm
# train the model
model <- caret::train(Price ~ ., data = cars, trControl = train_control, method = "gbm", verbose = FALSE)

model$pred$rowIndex[model$pred$Resample == 'Fold1'] %in% cv_folds[[2]]

Answer 1

Accepted answer

The Resample rows for 'Fold1' are the records that are not in cv_folds[[1]]. Because the folds were created with returnTrain = TRUE, each element of cv_folds holds training row indices, so these held-out records are contained in cv_folds[[2]] through cv_folds[[5]]. This is correct, as you are running a 5-fold cross-validation: resample Fold1 is predicted by a model trained on folds 2-5, resample Fold2 by a model trained on folds 1 and 3-5, and so on.

In summary: the predictions in Fold1 are the test predictions from a model trained on the other four folds (2-5).
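The relationship between the training index lists and the held-out rows can be checked without fitting any model. The following is a minimal sketch, assuming a synthetic outcome vector `y` (hypothetical, standing in for cars$Price) and the names `train_idx` and `holdout`, which are introduced here only for illustration:

```r
library(caret)

set.seed(42)
y <- rnorm(100)  # hypothetical stand-in for the outcome column

# With returnTrain = TRUE each list element holds TRAINING row indices
train_idx <- createFolds(y, k = 5, list = TRUE, returnTrain = TRUE)

all_rows <- seq_along(y)

# The held-out rows of a resample are the rows absent from its training index
holdout <- lapply(train_idx, function(idx) setdiff(all_rows, idx))

# Fold1's held-out rows never appear in Fold1's training index ...
any(holdout$Fold1 %in% train_idx$Fold1)

# ... but all of them appear in the training index of every other fold
sapply(train_idx[-1], function(idx) all(holdout$Fold1 %in% idx))
```

The first check should return FALSE and the second TRUE for Fold2 through Fold5, which is why testing the Fold1 rows against cv_folds[[2]] with %in% succeeds in the question.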

Update based on the comments

All the needed information is in the model$pred table. Here is a bit of code for clarification:

library(dplyr)

# label each held-out prediction with the folds its model was trained on
model$pred %>% 
  select(rowIndex, pred, Resample) %>%
  rename(prediction = pred, holdout = Resample) %>% 
  mutate(trained_on = case_when(holdout == "Fold1" ~ "Folds 2, 3, 4, 5",
                                holdout == "Fold2" ~ "Folds 1, 3, 4, 5", 
                                holdout == "Fold3" ~ "Folds 1, 2, 4, 5", 
                                holdout == "Fold4" ~ "Folds 1, 2, 3, 5", 
                                holdout == "Fold5" ~ "Folds 1, 2, 3, 4")) %>%
  head()

  rowIndex prediction holdout       trained_on
1      610   13922.60   Fold2 Folds 1, 3, 4, 5
2      623   38418.83   Fold2 Folds 1, 3, 4, 5
3      604   12383.55   Fold2 Folds 1, 3, 4, 5
4      607   15040.07   Fold2 Folds 1, 3, 4, 5
5       95   33549.40   Fold2 Folds 1, 3, 4, 5
6      624   40357.35   Fold2 Folds 1, 3, 4, 5

Basically, what you need for further stacking with the predictions are the pred and rowIndex columns from the model$pred table.

The rowIndex refers to the row in the original data, so rowIndex 610 refers to record 610 in the cars dataset. You can compare each prediction with obs, which is the value of the Price column from the cars dataset for that row.
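As a rough illustration of using rowIndex for stacking, the sketch below orders the held-out predictions by rowIndex and confirms that obs matches the original outcome column. It is not the setup from the question: to stay small and fast it assumes the built-in mtcars data and method = "lm" in place of cars and gbm, and the names `oof` and `stack_input` are hypothetical.

```r
library(caret)

set.seed(1)
# Training indices for 5-fold CV (mtcars and lm are stand-ins, not the original setup)
folds <- createFolds(mtcars$mpg, k = 5, list = TRUE, returnTrain = TRUE)
ctrl  <- trainControl(method = "cv", index = folds, savePredictions = "final")
fit   <- train(mpg ~ wt + hp, data = mtcars, method = "lm", trControl = ctrl)

# One held-out prediction per row; sort them back into the original row order
oof <- fit$pred[order(fit$pred$rowIndex), ]

# obs is the outcome of the original row that rowIndex points to
all.equal(oof$obs, mtcars$mpg[oof$rowIndex])

# Out-of-fold predictions aligned to the original rows, usable as a stacking feature
stack_input <- data.frame(rowIndex = oof$rowIndex, lm_pred = oof$pred)
head(stack_input)
```

Sorting by rowIndex rather than relying on the order of model$pred keeps the predictions aligned with the original rows regardless of how the resamples were stored.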
