There is an impressive figure in it: a comparison of citation counts for the big papers:
[attachment=217306,1014]
Fun to read.
He thinks that spike-and-slab could achieve sparseness (in posterior mean or median?), which is not intuitive to me.
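For what it's worth, here is one way to see where the sparsity can come from, sketched in the standard spike-and-slab notation (my notation, not necessarily what is used in the discussion). Put a point mass at zero in the prior for each coefficient:

$$\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\delta_0 \;+\; \gamma_j\,\mathcal N(0,\tau^2), \qquad \gamma_j \sim \mathrm{Bernoulli}(\pi).$$

Then the posterior also puts positive mass on $\beta_j = 0$ exactly, and

$$\operatorname{median}(\beta_j \mid y) = 0 \quad \text{whenever} \quad \Pr(\gamma_j = 1 \mid y) < \tfrac12.$$

So the posterior median can be exactly sparse even though the posterior mean almost never is, which may be what the "mean or median?" question is getting at.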
Thanks a lot!!
Can anyone explain roughly what the second comment (Holmes's) is about? I didn't get it...
Also, what is the difference between shrinkage and variable selection?
Reply to #6 谢益辉: I think it refers to the improvement of the group lasso over the lasso.
Bayesian methods (graphical models, to be precise) do have this advantage: they can incorporate the dependency among variables.
Because they introduce more bias, they may perform better on small datasets.
Reply to #7 windwail: the group lasso is of limited practical use; you have to know the grouping strategy in advance.
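For reference, the group lasso penalty in its standard form (notation mine), which makes clear why a grouping of the coefficients has to be specified beforehand:

$$\hat\beta \;=\; \arg\min_{\beta}\;\tfrac12\,\|y - X\beta\|_2^2 \;+\; \lambda \sum_{g=1}^{m} \sqrt{|G_g|}\,\|\beta_{G_g}\|_2,$$

where $G_1,\dots,G_m$ is a pre-specified partition of the predictors. The unsquared $\ell_2$ norm on each block zeroes out whole groups at once; with all groups of size one it reduces to the ordinary lasso.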
Reply to #1 cloud_wei: wasn't this mainly because old T wasn't convinced by the Dantzig selector that Tao and company did? The discussion of the Dantzig selector paper was really heated; I still remember it vividly.
Reply to #7 windwail: dude, what field are you actually in~
Reply to #10 bootstrap: just browsing around... broadening my horizons a bit, not digging too deep...
Reply to #6 谢益辉: I think he argues that when there are highly correlated predictors, a small change in the data would lead to a different set of selected predictors, as the lasso tends to randomly pick one predictor from a group of highly correlated ones. Therefore, the frequentist way of inference, i.e., statements like "conditional on the selected predictors, blah, blah", does not take into account the high uncertainty in the selected subset of predictors. In the Bayesian approach, we report a whole distribution instead of a single subset of selected variables, and the uncertainty is taken into account by marginalization. Moreover, the Bayesian lasso can provide estimates of the joint probability of predictors being selected into the model, which is useful in itself because it tells us how the predictors are related to each other.
If we like MCMC, then Bayesian penalized estimators are easy. But for extremely high-dimensional data (like the millions of predictors seen in genomics), their speed is still not quite practical, although many people are still using them. Many Bayesian estimators lose the nice sparsity property, which might be fine if the goal is only prediction, but is not desirable if we also need some real-world interpretation.
I think his view of constrained total prior variance [eqn (3)] is quite interesting and deserves more thought.
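A minimal numerical illustration of the instability point above (a sketch on made-up simulated data, using scikit-learn's ordinary Lasso rather than the Bayesian lasso itself): refit the lasso on bootstrap resamples of data with two highly correlated predictors and track how often each predictor is selected. The resampling frequencies play a role loosely analogous to posterior inclusion probabilities.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 100

# Simulated data: x0 and x1 are nearly the same variable; only their common
# signal z drives y, so the lasso can "pick" either one of them.
z = rng.normal(size=n)
x0 = z + 0.05 * rng.normal(size=n)
x1 = z + 0.05 * rng.normal(size=n)
noise_cols = rng.normal(size=(n, 3))
X = np.column_stack([x0, x1, noise_cols])
y = 2.0 * z + rng.normal(size=n)

# Refit the lasso on bootstrap resamples and count how often each
# predictor enters the model (nonzero coefficient).
B = 200
selected_counts = np.zeros(X.shape[1])
for _ in range(B):
    idx = rng.integers(0, n, size=n)            # bootstrap resample
    fit = Lasso(alpha=0.1).fit(X[idx], y[idx])
    selected_counts += (fit.coef_ != 0)

print("selection frequency per predictor:", selected_counts / B)
# Typically the lasso keeps only one of the correlated pair (x0, x1) on a
# given resample, and which one it keeps can change from resample to
# resample, while the pure-noise predictors are rarely selected.
```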
Thanks a lot!
Reply to #8 bootstrap: Agree. Also, our prior knowledge does not always produce discrete groups.
Good material to study carefully. Thanks, OP.
I would recommend another line of research:
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96, 1348–1360.
This paper points out that the lasso does not have the oracle property.
Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association 101, 1418–1429.
This paper proposes the adaptive lasso, which does have the oracle property.
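A minimal sketch of the adaptive lasso idea from Zou (2006), implemented via the standard reweighting trick (the simulated data and the use of the sklearn API are my own choices, not from the paper): fit an initial estimator, rescale each column by a weight built from the initial coefficient, run the ordinary lasso on the rescaled design, then transform back.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(1)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

# Step 1: initial root-n-consistent estimate (OLS here), exponent gamma = 1.
beta_init = LinearRegression().fit(X, y).coef_
w = 1.0 / (np.abs(beta_init) + 1e-8)        # adaptive weights

# Step 2: ordinary lasso on the reweighted design X_j / w_j, which is
# equivalent to penalizing each |beta_j| by lambda * w_j.
X_w = X / w
fit = Lasso(alpha=0.1).fit(X_w, y)

# Step 3: transform the coefficients back to the original scale.
beta_adaptive = fit.coef_ / w
print(np.round(beta_adaptive, 3))
# Coefficients with large initial estimates receive small penalties (and
# vice versa), which is what gives the adaptive lasso its oracle property.
```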
Reply to #16 colinisstudent: what is the oracle property? Is it things like unbiasedness?
Reply to #17 lolo: yes, roughly. The word "oracle" (神谕) means a divine revelation, i.e., the cryptic or enigmatic pronouncements of a god delivered through a medium (priests, priestesses, or sacred objects). In the penalized-likelihood literature (e.g., the LASSO), the oracle property refers to the following asymptotic properties:
1. Parameters whose true value is 0 are estimated as exactly 0.
2. Estimates of the parameters whose true values are nonzero converge consistently to the true values, and their covariance matrix is unaffected by the estimation of the zero-valued parameters.
In short: the penalized estimator behaves as if it had received a divine revelation in advance telling it which parameters are truly zero. See
Donoho, D. L. and Johnstone, I. M. (1994a). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–455.
But Professor Fan's paper already explains this very clearly.
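For concreteness, a rough formal version of the two points above (notation mine; Fan and Li (2001) give the precise regularity conditions): let $A=\{j:\beta_j^0\neq 0\}$ be the true active set. An estimator $\hat\beta$ has the oracle property if

$$\Pr\bigl(\hat\beta_{A^c}=0\bigr)\;\to\;1 \qquad\text{and}\qquad \sqrt{n}\,\bigl(\hat\beta_A-\beta_A^0\bigr)\;\xrightarrow{\;d\;}\;\mathcal N\bigl(0,\Sigma_A\bigr),$$

where $\Sigma_A$ is the asymptotic covariance matrix one would obtain from fitting only the true submodel, i.e., as if an oracle had revealed $A$ in advance.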
Reply to #8 bootstrap:
I beg to differ. In many practical settings the grouping is known, for example in additive models. In many cases, group selection is a requirement, not an option.
Here to learn; this topic runs deeper than I thought.
Reply to #9 bootstrap: could you share a link to that heated discussion?