New paper in JRSSB: a LASSO retrospective

cloud_wei

A review article that just came out in JRSSB, written by Tibshirani. It's pretty much the definitive overview of the LASSO's history and current state~~

Regression shrinkage and selection via the lasso: a retrospective

Robert Tibshirani

Keywords:

l1-penalty; penalization; regularization

Summary:

In the paper I give a brief review of the basic idea and some history and then discuss some developments since the original paper on regression shrinkage and selection via the lasso.

bigknife

I happen to be reading papers on the lasso and LARS right now, so this post comes at exactly the right time. Many thanks!

cloud_wei

A great figure from the paper, comparing the citations of landmark papers:

[Figure: citation comparison of landmark papers]

rtist

Fun to read.

He thinks that spike-and-slab priors can achieve sparsity (in the posterior mean or the posterior median?), which is not intuitive to me.

linkim

Thanks a lot!!

yihui

Could someone explain roughly what the second discussion (Holmes's) is saying? I didn't get it...

Also, what's the difference between shrinkage and variable selection?
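A minimal sketch of the standard comparison, assuming an orthonormal design (X^T X = I) so that each method acts coordinate by coordinate on the least-squares estimates z = X^T y. The objective is taken as (1/2)||y - Xb||^2 plus lam*||b||_1 (lasso), (lam/2)*||b||_2^2 (ridge), or (lam^2/2)*(number of nonzero b_j) (best subset); the function names are hypothetical:

```python
import math


def lasso_orthonormal(z, lam):
    """Lasso = soft thresholding: every estimate moves toward 0 by lam,
    and those with |z_j| <= lam become exactly 0 (shrinkage + selection)."""
    return [math.copysign(max(abs(zj) - lam, 0.0), zj) for zj in z]


def ridge_orthonormal(z, lam):
    """Ridge = proportional shrinkage: every estimate is scaled by 1/(1 + lam),
    none becomes exactly 0 (shrinkage only)."""
    return [zj / (1.0 + lam) for zj in z]


def best_subset_orthonormal(z, lam):
    """Best subset = hard thresholding: keep z_j unchanged if |z_j| > lam,
    otherwise drop it (selection only, no shrinkage)."""
    return [zj if abs(zj) > lam else 0.0 for zj in z]


if __name__ == "__main__":
    z = [3.0, 0.5, -2.0]  # least-squares estimates under the assumed orthonormal design
    for name, rule in [("lasso", lasso_orthonormal),
                       ("ridge", ridge_orthonormal),
                       ("best subset", best_subset_orthonormal)]:
        print(f"{name:12s}", rule(z, lam=1.0))
```

With lam = 1, the lasso turns [3.0, 0.5, -2.0] into [2.0, 0.0, -1.0], ridge into [1.5, 0.25, -1.0], and best subset into [3.0, 0.0, -2.0]: the lasso is the only one of the three that both shrinks and selects.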

windwail

Reply to 谢益辉 (post #6): I think it's about how the group lasso improves on the lasso.

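Bayesian approaches (graphical models, to be precise) do have this advantage: they can incorporate the dependencies between variables.

Because they introduce more bias, they may also perform better on small datasets.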

bootstrap

Reply to windwail (post #7): the group lasso isn't very useful in practice, since you need to know the grouping in advance.

bootstrap

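Reply to cloud_wei (post #1): Isn't this mainly because Old T (Tibshirani) wasn't convinced by the Dantzig selector from Tao and his coauthors? The discussion of the Dantzig selector paper was full of gunpowder; I still remember it vividly.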

bootstrap

Reply to windwail (post #7): Bro, what do you actually study?~

windwail

Reply to bootstrap (post #10): I just read around at random... to broaden my horizons, without digging too deep...

rtist

Reply to 谢益辉 (post #6): I think he argues that when there are highly correlated predictors, a small change in the data can lead to a different set of selected predictors, because the lasso tends to pick one predictor at random from a group of highly correlated ones. Therefore, the frequentist style of inference, i.e., statements like "conditional on the selected predictors, blah, blah", does not account for the high uncertainty in the selected subset of predictors. In the Bayesian approach, we report a whole distribution instead of a single selected subset, and that uncertainty is taken into account by marginalization. Moreover, the Bayesian lasso can provide estimates of the joint probability of predictors being selected into the model, which is useful in itself because it tells us how the predictors are related to each other.
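A minimal sketch of the mechanism, assuming the most extreme case: two predictors that are exact copies of each other. Only b1 + b2 enters the fit, and the penalty |b1| + |b2| is the same for every same-sign split, so the lasso optimum is not unique. Plain cyclic coordinate descent (a standard lasso solver, written here in pure Python with hypothetical names) simply keeps whichever copy it updates first:

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=1000, tol=1e-12):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.
    X is a list of rows; there is no intercept, so X and y are assumed centered."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # inner product of column j with the partial residual that leaves out predictor j
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            new_bj = soft_threshold(rho, lam) / col_sq
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:
            break
    return beta


if __name__ == "__main__":
    x = [1.0, 2.0, -1.0, -2.0]
    y = [2.0, 4.0, -2.0, -4.0]
    X = [[xi, xi] for xi in x]  # the second predictor is an exact copy of the first
    print(lasso_cd(X, y, lam=0.5))  # approximately [1.8, 0.0]
```

The total coefficient (1.8) matches a lasso fit on x alone, but the second copy ends up (numerically) at 0 purely because of update order. With highly but not perfectly correlated columns the solution becomes unique, yet which predictor survives can still flip under a small perturbation of the data, which is the instability described above.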

If we like MCMC, then Bayesian penalized estimators are easy. But for extremely high-dimensional data (like the millions of predictors seen in genomics), their speed is still not quite practical, although many people still use them. Many Bayesian estimators lose the nice sparsity property, which may be fine if the goal is only prediction, but is undesirable if we also need a real-world interpretation.

I think his view of constrained total prior variance [eqn (3)] is quite interesting and deserves more thinking.

ruc_sir

Thanks a lot!

rtist

Reply to bootstrap (post #8): Agreed. Also, our prior knowledge does not always produce discrete groups.

jiao_taishan

Time to study this properly. Thanks to the OP.

colinisstudent

I'd recommend another line of research:

Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348–1360.

This paper points out that the lasso does not have the oracle property.

Zou, H. (2006). The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101, 1418–1429.

This paper proposes the adaptive lasso, and the new method does have the oracle property.
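A minimal sketch of the two-stage idea behind the adaptive lasso: fit an initial estimate, set weights w_j = 1/|b_init_j|^gamma, then solve a lasso whose l1 penalty is weighted by w_j. It assumes OLS as the initial estimate (one of the choices Zou discusses; obtained here as the lam = 0 case of the same solver, so X must have full column rank) and solves the weighted problem by coordinate descent rather than by the column-rescaling trick; all names and data are hypothetical:

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0); an infinite (or nan) t gives 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def weighted_lasso_cd(X, y, lam, weights, n_sweeps=1000, tol=1e-12):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_j weights[j] * |b_j|.
    X is a list of rows; no intercept (X and y assumed centered)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            new_bj = soft_threshold(rho, lam * weights[j]) / col_sq
            max_change = max(max_change, abs(new_bj - beta[j]))
            beta[j] = new_bj
        if max_change < tol:
            break
    return beta


def adaptive_lasso(X, y, lam, gamma=1.0):
    p = len(X[0])
    # Step 1: initial estimate, here OLS computed as the unpenalized (lam = 0) fit
    beta_init = weighted_lasso_cd(X, y, 0.0, [0.0] * p)
    # Step 2: data-dependent weights; an exactly-zero initial estimate gets an infinite weight
    weights = [float("inf") if b == 0 else 1.0 / abs(b) ** gamma for b in beta_init]
    # Step 3: weighted l1 fit: large initial estimates are penalized lightly, small ones heavily
    return weighted_lasso_cd(X, y, lam, weights)


if __name__ == "__main__":
    X = [[1.0, 1.0, 1.0], [1.0, -1.0, -1.0], [-1.0, 1.0, -1.0], [-1.0, -1.0, 1.0]]
    y = [4.2, 1.8, -2.2, -3.8]  # = 3*x1 + 1*x2 + 0.2*x3, orthogonal columns
    print("lasso         ", weighted_lasso_cd(X, y, 0.5, [1.0, 1.0, 1.0]))
    print("adaptive lasso", adaptive_lasso(X, y, 0.5))
```

On this toy design the plain lasso gives about [2.5, 0.5, 0.0] and the adaptive lasso about [2.83, 0.5, 0.0]: both drop the small third coefficient, but the large first coefficient is shrunk far less, which is the intuition behind the adaptive weights.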

lolo

Reply to colinisstudent (post #16): What is the oracle property? Is it something like unbiasedness?

colinisstudent

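Reply to lolo (post #17): Yes, roughly. The Chinese translation of "oracle" is 神谕, a divine revelation: an elusive or enigmatic message or utterance from a god, conveyed through a medium (a priest or priestess, or an object). In the research on penalty functions (such as the LASSO), "oracle" refers to the following asymptotic properties: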

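1. Parameters whose true value is 0 are also estimated as 0 (with probability tending to 1).

2. Estimates of the parameters whose true value is nonzero converge consistently to the true values, and their covariance matrix is not affected by the estimation of the parameters whose true value is 0.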
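The two properties above can be written compactly. The notation below (beta^* for the true coefficients, A for the true support, Sigma^* for the oracle covariance) follows the formulation in Zou (2006):

```latex
\begin{aligned}
&\mathcal{A} = \{ j : \beta_j^* \neq 0 \}, \qquad \hat{\mathcal{A}}_n = \{ j : \hat\beta_j \neq 0 \},\\
&\text{(1) consistent selection: } P\big(\hat{\mathcal{A}}_n = \mathcal{A}\big) \to 1 \text{ as } n \to \infty,\\
&\text{(2) asymptotic normality: } \sqrt{n}\,\big(\hat\beta_{\mathcal{A}} - \beta^*_{\mathcal{A}}\big) \xrightarrow{d} N(0, \Sigma^*),
\end{aligned}
```

where Sigma^* is the covariance matrix of the estimator fitted with the true subset A known in advance, so (2) says the penalized estimator is asymptotically as efficient as the "oracle" fit.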

In short, the penalized estimator behaves as if it had received a divine revelation beforehand telling it which parameters are truly 0. See also

Donoho, D. L. and Johnstone, I. M. (1994a). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81, 425–455.

That said, Prof. Fan's paper already explains all of this very clearly.

temp_110

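Reply to bootstrap (post #8):

I beg to differ. In many practical settings the grouping is known, for example in additive models. In many cases, selecting variables in groups is a requirement, not an option.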
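A minimal sketch of why a known grouping changes the selection, assuming the common group lasso penalty lam * sum_g sqrt(p_g) * ||b_g||_2 (p_g = group size) and an orthonormal design (X^T X = I), so that each group is handled separately by block soft thresholding. In an additive model, a group could be the basis-function coefficients of one covariate's smooth term; the names and numbers here are hypothetical:

```python
import math


def group_soft_threshold(z_g, t):
    """Block soft thresholding: shrink the whole vector z_g toward 0 by t in
    Euclidean norm, or set the entire block to 0 if ||z_g||_2 <= t."""
    norm = math.sqrt(sum(v * v for v in z_g))
    if norm <= t:
        return [0.0] * len(z_g)
    factor = 1.0 - t / norm
    return [factor * v for v in z_g]


def group_lasso_orthonormal(z, groups, lam):
    """Group lasso solution when X^T X = I: z holds the least-squares estimates,
    groups is a list of index lists, and group g is thresholded at lam * sqrt(p_g)."""
    beta = [0.0] * len(z)
    for idx in groups:
        block = group_soft_threshold([z[j] for j in idx], lam * math.sqrt(len(idx)))
        for j, b in zip(idx, block):
            beta[j] = b
    return beta


if __name__ == "__main__":
    # hypothetical additive model: 2 covariates, 4 basis coefficients each
    z = [2.0, 2.0, 2.0, 2.0, 0.3, -0.4, 0.0, 0.0]
    groups = [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(group_lasso_orthonormal(z, groups, lam=0.5))
    # -> [1.5, 1.5, 1.5, 1.5, 0.0, 0.0, 0.0, 0.0]
```

A whole covariate is either kept (all its coefficients shrunk together) or dropped (all set to 0), whereas the plain lasso thresholds coordinates one at a time and can keep some basis functions of a covariate while dropping others.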

twinsken

Learning a lot here. This field runs deep.