nan.xiao Thank you for the explanation! I ran LASSO feature selection with set.seed() values from 1 to 5000; the code is as follows:
library(glmnet)
# Number of selected variables for each seed
number_metabo <- data.frame(ID = 1:5000, number = NA)
# Names of the selected variables for each seed
T_LASSO_metabo <- vector("list", 5000)
for (i in 1:5000) {
  set.seed(i)  # the seed changes the random assignment of CV folds
  cvfit_trainT <- cv.glmnet(
    as.matrix(train_Tdif_data[, 3:28]),
    as.matrix(train_Tdif_data$group),
    family = "binomial", type.measure = "auc",
    nlambda = 50, alpha = 1, nfolds = 5
  )
  coef_trainT <- coef(cvfit_trainT, s = cvfit_trainT$lambda.1se)
  # Names of the nonzero coefficients; [-1] drops the first one (the intercept, assuming it is nonzero)
  T_LASSO_metabo[[i]] <- rownames(coef_trainT)[which(coef_trainT != 0)][-1]
  number_metabo[i, 2] <- length(T_LASSO_metabo[[i]])
}
plot(number_metabo)
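This gives roughly 10 different results across the seeds, but stability selection retains only one variable. A model containing only that one variable performs poorly on the independent test set, judging by the area under the curve (AUC). May I instead pick the variable set that gives a better AUC on the independent test set and report it in my paper as the markers that distinguish the two groups?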
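To make "roughly 10 different results" concrete, the per-seed variable sets can be summarised by how often each variable is selected and how often each distinct set occurs. The snippet below is only a sketch in base R: `selected_sets` is a toy stand-in for `T_LASSO_metabo`, and the names `m1` to `m4` are hypothetical placeholders, not real metabolites or results.

```r
# Toy stand-in for T_LASSO_metabo: one character vector of selected
# variable names per seed (hypothetical names, not real data).
selected_sets <- list(
  c("m1", "m2"),
  c("m1"),
  c("m1", "m2", "m3"),
  c("m1", "m4"),
  character(0)
)

# Fraction of runs in which each variable was selected.
selection_freq <- sort(
  table(unlist(selected_sets)) / length(selected_sets),
  decreasing = TRUE
)

# Collapse each set into a single key so identical sets can be counted.
set_keys <- vapply(
  selected_sets,
  function(v) paste(sort(v), collapse = "+"),
  character(1)
)
set_counts <- table(set_keys)

print(selection_freq)
print(set_counts)
```

With the real list, replacing `selected_sets` by `T_LASSO_metabo` should give the same two summaries for the 5000 runs.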
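I also have a second question. I used Boruta for feature selection; after fitting with Boruta(), calling predict() to compute predicted values gives the following error: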
library(Boruta)
set.seed(111)
boruta_trainT <- Boruta(group~.,data = train_Tdif_data[,2:22], doTrace = 2)
library(pROC)
Tboruta_predict <- predict(boruta_trainT,test_Tboruta_dvc)
Error in UseMethod("predict") :
no applicable method for 'predict' applied to an object of class "Boruta"
How should I use the variables selected by Boruta to compute the predictor?
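For context, the error suggests that a Boruta object only records which variables are relevant and is not itself a predictive model. One workflow that might fit is sketched below, under several assumptions: the data are simulated (`dat`, `x1` to `x6`, and the `case`/`control` labels are all made up), the final classifier is a `ranger` random forest (any other classifier could be substituted), and the selected variables are taken from Boruta's `getSelectedAttributes()`. It is an illustration, not a verified solution for my data.

```r
library(Boruta)
library(ranger)

set.seed(111)

# Simulated stand-in data (assumption): x1 and x2 carry signal, x3..x6 are noise.
n <- 200
dat <- as.data.frame(
  matrix(rnorm(n * 6), ncol = 6, dimnames = list(NULL, paste0("x", 1:6)))
)
dat$group <- factor(
  ifelse(dat$x1 + dat$x2 + rnorm(n, sd = 0.5) > 0, "case", "control")
)
train <- dat[1:150, ]
test <- dat[151:200, ]

# Step 1: Boruta decides which variables are relevant.
boruta_fit <- Boruta(group ~ ., data = train, doTrace = 0)
selected <- getSelectedAttributes(boruta_fit, withTentative = FALSE)

# Step 2: fit an ordinary classifier on the confirmed variables only.
rf_fit <- ranger(
  reformulate(selected, response = "group"),
  data = train,
  probability = TRUE
)

# Step 3: predicted probability of the "case" class on the test set.
prob_case <- predict(rf_fit, data = test)$predictions[, "case"]
```

The vector `prob_case` could then be passed to `pROC::roc(test$group, prob_case)` to obtain the test-set AUC.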
Thank you!