xgboost: A Fast and Accurate Boosting Model

hetong_007 · May 13, 2015

[Unknown user] Fixed, thanks!

阿文 · May 15, 2015

devtools::install_github('dmlc/xgboost',subdir='R-package')
Downloading github repo dmlc/xgboost@master
Installing xgboost
"C:/Program Files/R/R-31~1.3/bin/x64/R" --no-site-file --no-environ --no-save \
--no-restore CMD INSTALL \
"C:/Users/Administrator/AppData/Local/Temp/RtmpOGz8Fz/devtools153c65a7363d/dmlc-xgboost-3a7808d/R-package" \
--library="C:/Program Files/R/R-3.1.3/library" --install-tests

'C:\Program' is not recognized as an internal or external command,
operable program or batch file.
Error: Command failed (1)

64-bit Windows 7
R 3.1.3 (in RStudio)
Please help.

hetong_007 · May 15, 2015

[Unknown user] Please install Rtools: http://cran.r-project.org/bin/windows/Rtools/

丁星光 · May 27, 2015

When I run install.packages('xgboost'), I get the message "package ‘xgboost’ is not available (for R version 3.2.0)". Any pointers? Rtools is already installed.

hetong_007 · May 27, 2015

[Unknown user] If you read the article carefully, you will see that the installation command is actually

devtools::install_github('dmlc/xgboost',subdir='R-package')

丁星光 · May 27, 2015

[Unknown user] It works now, thank you!

aaliwei91 · November 18, 2015

Does xgboost have a parameter similar to gbm's shrinkage?

hetong_007 · November 18, 2015

[Unknown user] xgboost's eta is the equivalent of shrinkage. For more parameters, see https://xgboost.readthedocs.org/en/latest/parameter.html

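aaliwei91 · November 18, 2015

I found the answer to my question above: it is eta. Another question: once the model is trained, how do I get at the model itself? For example, I would like to print out the trees and inspect the structure of each one.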

hetong_007 · November 18, 2015

[Unknown user] See the xgb.dump() function. To plot the trees, use xgb.plot.tree().

aaliwei91 · November 18, 2015

Thanks! I will study those first.

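aaliwei91 · December 11, 2015

Suppose training used 500 trees and, at prediction time, I want to see how the model does with the first 10, 20, 30, ..., 500 trees. Is there a parameter similar to n.trees in predict.gbm()?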

hetong_007 · December 11, 2015

[Unknown user] Set the ntreelimit parameter.
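
To make the idea behind limiting the number of trees concrete: a boosted model's raw score is the sum of its trees' outputs, so predicting with only the first k trees amounts to truncating that sum. The sketch below illustrates this in plain Python with made-up per-tree scores for a single row. It does not call xgboost, and the names `staged_probabilities` and `at_checkpoints` are hypothetical helpers introduced only for this illustration.

```python
import math


def staged_probabilities(per_tree_scores, base_margin=0.0):
    """Return the predicted probability after 1, 2, ..., n trees.

    per_tree_scores holds the raw (log-odds) contribution of each tree
    for ONE row, in training order. The numbers are illustrative only;
    they are not produced by xgboost.
    """
    probabilities = []
    margin = base_margin
    for score in per_tree_scores:
        margin += score  # add the next tree's contribution
        probabilities.append(1.0 / (1.0 + math.exp(-margin)))  # logistic link
    return probabilities


def at_checkpoints(stages, step):
    """Keep only the predictions after step, 2*step, ... trees."""
    return {k: stages[k - 1] for k in range(step, len(stages) + 1, step)}


# Toy example: 30 trees that each add 0.4 to the log-odds.
toy_scores = [0.4] * 30
for n_trees, prob in at_checkpoints(staged_probabilities(toy_scores), 10).items():
    print(n_trees, round(prob, 4))
```

With a real model, the same comparison would be made by predicting repeatedly with different tree limits and scoring each set of predictions against held-out labels.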

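kent_2006 · January 6, 2016

[Unknown user] A question about the sentence "it currently supports reading libsvm-format files directly from disk": is there a script for converting a CSV or TXT file to libsvm format?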
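
For reference, the libsvm text format puts one row per line as `label index:value index:value ...`, so a conversion is short to write by hand. Below is a minimal sketch in plain Python, under several assumptions that are not from the article: every field is numeric with no missing values, the label sits in column `label_col`, zero-valued entries are omitted (the usual sparse convention), and feature indices start at `first_index`. The function names `csv_to_libsvm` and `convert_file` are hypothetical.

```python
import csv


def csv_to_libsvm(rows, label_col=0, first_index=1):
    """Yield one libsvm-format line per CSV row.

    rows: iterable of lists of strings (e.g. a csv.reader).
    Assumes every field parses as a number; a non-numeric or empty
    field raises ValueError rather than being silently dropped.
    """
    for row in rows:
        if not row:
            continue  # skip blank lines
        label = row[label_col].strip()
        features = [field for i, field in enumerate(row) if i != label_col]
        pairs = []
        for offset, raw in enumerate(features):
            if float(raw) != 0.0:  # sparse format: leave zeros out
                pairs.append(f"{first_index + offset}:{raw.strip()}")
        yield " ".join([label] + pairs)


def convert_file(csv_path, out_path, has_header=True, label_col=0):
    """Convert a CSV file on disk to a libsvm-format text file."""
    with open(csv_path, newline="") as src, open(out_path, "w") as dst:
        reader = csv.reader(src)
        if has_header:
            next(reader, None)  # drop the header row
        for line in csv_to_libsvm(reader, label_col=label_col):
            dst.write(line + "\n")
```

For a tab-separated TXT file, the same sketch should work if the reader is created with `csv.reader(src, delimiter="\t")`. Whether indices should start at 0 or 1 depends on the consumer, so it is worth checking the converted file against the feature numbering the training code expects.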

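kent_2006 · January 6, 2016

One more question. With an SVM, taking binary classification as an example, accuracy drops when the training data is imbalanced (say, class 0 is only 1% the size of class 1). Does xgboost also need the class imbalance handled before modeling?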

hetong_007 · January 14, 2016

[Unknown user] You can follow this example, which sets weights for the data points of different classes: https://github.com/dmlc/xgboost/blob/master/demo/kaggle-higgs/higgs-train.R#L17
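
One common way to weight the classes is to give the positive class a weight equal to the ratio of total negative weight to total positive weight, and to pass that ratio as xgboost's scale_pos_weight parameter. The sketch below only computes the ratio in plain Python. It is an illustration under the assumption of 0/1 labels, not a reproduction of the linked script, and `positive_class_weight` is a hypothetical helper name.

```python
def positive_class_weight(labels, weights=None):
    """Return sum(negative weights) / sum(positive weights).

    labels: sequence of 0/1 class labels.
    weights: optional per-row weights; defaults to 1.0 for every row.
    """
    if weights is None:
        weights = [1.0] * len(labels)
    if len(weights) != len(labels):
        raise ValueError("labels and weights must have the same length")
    pos = sum(w for y, w in zip(labels, weights) if y == 1)
    neg = sum(w for y, w in zip(labels, weights) if y == 0)
    if pos == 0:
        raise ValueError("no positive examples; the ratio is undefined")
    return neg / pos


# Toy data: 1 positive row for every 99 negative rows.
toy_labels = [1] + [0] * 99
print(positive_class_weight(toy_labels))
```

The resulting number would then be supplied as the value of scale_pos_weight when training.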

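kent_2006 · January 25, 2016

I ran into two problems while using xgboost:
1. During installation I got the error "The procedure entry point _chsize_s could not be located in the dynamic link library msvcrt.dll". The command I used was devtools::install_github('dmlc/xgboost',subdir='R-package').
2. I installed the xgboost package from CRAN and then trained a model with
bst <- xgboost(data = traindata, label = trainlebel, max.depth = 2, eta = 1, nround = 2, objective = "binary:logistic")
where traindata is of type matrix,
but I got the error: Error in xgb.DMatrix(data, label = label) : can not open file " 34"
What could be causing this?
Many thanks!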

kent_2006 · February 3, 2016

The xgboost parameter tuning document "Notes on Parameter Tuning" contains the following passage, which I do not fully understand:
"Handle Imbalanced Dataset
For common cases such as ads clickthrough log, the dataset is extremely imbalanced. This can affect the training of xgboost model, and there are two ways to improve it.
If you care only about the ranking order (AUC) of your prediction: Balance the positive and negative weights, via scale_pos_weight. Use AUC for evaluation.
If you care about predicting the right probability: In such a case, you cannot re-balance the dataset. In such a case, set parameter max_delta_step to a finite number (say 1) will help convergence."

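My question is: what do "care about predicting the right probability" and "care only about the ranking order (AUC) of your prediction" each mean, and how do they differ? Take a concrete case: predicting mobile subscriber churn, where churned users are few and retained users are far more numerous, so the dataset is imbalanced. When I build an xgboost model for this case, should I tune max_delta_step or scale_pos_weight? My English is not very good, so I would appreciate an explanation. Thanks!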
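
One way to read the quoted passage: up-weighting the positive class changes the probabilities a model outputs but not the order in which it ranks the rows. For the weighted log loss, if the true probability of a positive at some point is p and positives carry weight w, the loss-minimizing prediction is w·p / (w·p + 1 − p), which is strictly increasing in p. The sketch below checks this numerically in plain Python with made-up probabilities. It does not use xgboost, and `reweighted_probability` is a hypothetical helper name.

```python
def reweighted_probability(p, pos_weight):
    """Minimizer of the weighted log loss when the true positive rate is p
    and positive examples are weighted by pos_weight (negatives by 1)."""
    return pos_weight * p / (pos_weight * p + (1.0 - p))


def ranking(values):
    """Indices of the values, ordered from highest to lowest."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)


# Made-up true churn probabilities for four users.
true_probs = [0.01, 0.20, 0.05, 0.02]
shifted = [reweighted_probability(p, pos_weight=50.0) for p in true_probs]

print("same ranking:", ranking(true_probs) == ranking(shifted))
print("true:   ", true_probs)
print("shifted:", [round(q, 3) for q in shifted])
```

Under this reading, for the churn case: if the goal is only to rank users by risk (for instance, to contact the riskiest ones first), rebalancing with scale_pos_weight and evaluating with AUC fits; if the predicted number itself must be a usable churn probability, the passage advises against rebalancing and suggests max_delta_step instead.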

Fisher. · May 3, 2016

Hello, I get the following error when using xgboost: Error in xgb.DMatrix(data, label = label) : can not open file "1"
What causes this, and how can I fix it?