
Cross-validation predictions from caret are assigned to different folds


Question

I am wondering why the predictions labelled 'Fold1' are actually predictions for rows from the second of my predefined folds. The example below shows what I mean.

# load the library
library(caret)
# load the cars dataset that ships with caret (it has a Price column)
data(cars)
# define folds; returnTrain = TRUE makes each list element hold TRAINING row indices
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
# define training control and keep the hold-out predictions of the final model
train_control <- trainControl(method = "cv", index = cv_folds, savePredictions = "final")
# train the model
model <- caret::train(Price ~ ., data = cars, trControl = train_control, method = "gbm", verbose = FALSE)

# check whether the rows predicted under 'Fold1' appear in the second predefined fold
model$pred$rowIndex[model$pred$Resample == "Fold1"] %in% cv_folds[[2]]

Accepted answer

#1

The folds were created with returnTrain = TRUE, so cv_folds[[1]] holds the training rows for 'Fold1'. The Resample data of 'Fold1' are therefore the records which are not in cv_folds[[1]]. These records are contained in cv_folds 2-5. This is correct, as you are running a 5-fold cross-validation: the rows held out in resample Fold1 are predicted by a model trained on the remaining rows, the rows held out in resample Fold2 are predicted by a model trained without them, and so on.

In summary: the predictions in Fold1 are test predictions for the rows left out of cv_folds[[1]], and those rows all appear in cv_folds 2-5.
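The relationship described above can be checked directly on the fold indices, without training anything. The following is a minimal sketch, assuming caret is installed; the seed value and the helper names (all_rows, holdout, in_own_training, in_other_training) are illustrative and not part of the original question.

```r
library(caret)

# caret ships its own `cars` data (the one with a Price column)
data(cars, package = "caret")

set.seed(42)  # arbitrary seed, only so the folds are reproducible
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)

all_rows <- seq_len(nrow(cars))

# Hold-out rows of each resample = every row NOT in that resample's training index
holdout <- lapply(cv_folds, function(train_idx) setdiff(all_rows, train_idx))

# Fold1's hold-out rows never appear in its own training index ...
in_own_training <- any(holdout[[1]] %in% cv_folds[[1]])

# ... but all of them appear in each of the other four training indices
in_other_training <- sapply(cv_folds[-1],
                            function(train_idx) all(holdout[[1]] %in% train_idx))

print(in_own_training)
print(in_other_training)
```

This is why the `%in% cv_folds[[2]]` check in the question returns TRUE: with returnTrain = TRUE, cv_folds[[2]] is a training index, and it contains every row that was held out for Fold1.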

Edit based on the comments

All the needed info is in the model$pred table. I added a bit of code for clarification:

library(dplyr)

# label each hold-out prediction with the folds its model was trained on
model$pred %>%
  select(rowIndex, pred, Resample) %>%
  rename(prediction = pred, holdout = Resample) %>%
  mutate(trained_on = case_when(holdout == "Fold1" ~ "Folds 2, 3, 4, 5",
                                holdout == "Fold2" ~ "Folds 1, 3, 4, 5",
                                holdout == "Fold3" ~ "Folds 1, 2, 4, 5",
                                holdout == "Fold4" ~ "Folds 1, 2, 3, 5",
                                holdout == "Fold5" ~ "Folds 1, 2, 3, 4"))

The first rows of the result look like this:

  rowIndex prediction holdout       trained_on
1      610   13922.60   Fold2 Folds 1, 3, 4, 5
2      623   38418.83   Fold2 Folds 1, 3, 4, 5
3      604   12383.55   Fold2 Folds 1, 3, 4, 5
4      607   15040.07   Fold2 Folds 1, 3, 4, 5
5       95   33549.40   Fold2 Folds 1, 3, 4, 5
6      624   40357.35   Fold2 Folds 1, 3, 4, 5

Basically, what you need for further stacking with the predictions are the pred and rowIndex columns from the model$pred table.

The rowIndex refers to the row in the original data, so rowIndex 610 refers to record 610 in the cars dataset. You can compare each prediction with obs, which holds the value of the Price column from the cars dataset for that row.
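To make the rowIndex and obs relationship concrete, here is a rough sketch of turning model$pred into an input table for stacking. It rests on several assumptions that are not in the original answer: a plain "lm" model on two predictors (Mileage and Cylinder) is swapped in for gbm purely to keep the run fast, the seed is arbitrary, and the names oof, obs_matches_price and stack_input are hypothetical.

```r
library(caret)

data(cars, package = "caret")

set.seed(42)  # arbitrary seed, only for reproducible folds
cv_folds <- createFolds(cars$Price, k = 5, list = TRUE, returnTrain = TRUE)
train_control <- trainControl(method = "cv", index = cv_folds,
                              savePredictions = "final")

# Assumption: "lm" on two predictors stands in for the gbm model of the question
model <- train(Price ~ Mileage + Cylinder, data = cars,
               method = "lm", trControl = train_control)

# One hold-out prediction per original row, put back into original row order
oof <- model$pred[order(model$pred$rowIndex), ]

# obs should equal the Price of the row that rowIndex points to
obs_matches_price <- all(abs(oof$obs - cars$Price[oof$rowIndex]) < 1e-8)

# Hypothetical stacking input: the base model's hold-out prediction next to the target
stack_input <- data.frame(rowIndex  = oof$rowIndex,
                          base_pred = oof$pred,
                          Price     = cars$Price[oof$rowIndex])

print(obs_matches_price)
print(head(stack_input))
```

Because every row is held out exactly once in k-fold cross-validation, stack_input has one row per record in cars, and each base_pred comes from a model that never saw that record during training.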
