Reply to #6 (谢益辉): I think he argues that when there are highly correlated predictors, a small change in the data can lead to a different set of selected predictors, because the lasso tends to pick, more or less at random, one predictor out of a group of highly correlated ones. Frequentist inference, i.e., statements like "conditional on the selected predictors, blah, blah", therefore fails to account for the high uncertainty in the selected subset of predictors. In the Bayesian approach, we report a whole posterior distribution instead of a single subset of selected variables, so the uncertainty is accounted for by marginalization. Moreover, the Bayesian lasso can provide estimates of the joint probability of predictors being selected into the model, which is useful in itself because it gives us knowledge about how the predictors are related to each other.
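A quick sketch of that instability point, in case it helps: the simulated data, the penalty level alpha=0.5, and the use of scikit-learn's Lasso are all my own choices for illustration, not anything from the paper. With two nearly identical columns carrying the signal, re-drawing the noise can flip which of the two enters the selected set.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 100
z = rng.normal(size=n)
# x1 and x2 are nearly identical copies of the same latent signal
x1 = z + 0.1 * rng.normal(size=n)
x2 = z + 0.1 * rng.normal(size=n)
X = np.column_stack([x1, x2, rng.normal(size=(n, 3))])
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0])

for seed in range(5):
    # a "small change in the data": a fresh noise realization
    y = X @ beta + np.random.default_rng(seed).normal(size=n)
    fit = Lasso(alpha=0.5).fit(X, y)
    print(f"perturbation {seed}: selected columns {np.flatnonzero(fit.coef_)}")
```

Across the perturbations, sometimes column 0 dominates and sometimes column 1 does, so any inference "conditional on the selected set" is conditioning on something quite unstable.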
If we like MCMC, then Bayesian penalized estimators are easy to fit. But for extremely high-dimensional data (like the millions of predictors seen in genomics), their speed is still not quite practical, although many people use them anyway. Many Bayesian estimators also lose the nice sparsity property, which may be fine if the goal is only prediction, but is not desirable if we also need some real-world interpretation.
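To make the non-sparsity point concrete, here is a bare-bones Gibbs sampler in the style of Park and Casella's (2008) Bayesian lasso, using numpy only; the fixed penalty lam, the flat 1/sigma^2 prior, and the toy data in the usage example are my simplifying assumptions, and this is a sketch rather than a production implementation. The posterior mean shrinks the noise coefficients toward zero but never sets any of them exactly to zero.

```python
import numpy as np

def bayesian_lasso_gibbs(X, y, lam=1.0, n_iter=5000, burn=1000, seed=0):
    """Gibbs sampler for a Bayesian lasso (after Park & Casella, 2008).

    lam is the lasso penalty, treated as fixed here for simplicity.
    Returns posterior draws of beta after burn-in.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    beta, sig2 = np.zeros(p), 1.0
    inv_tau2 = np.ones(p)                      # 1 / tau_j^2
    XtX, Xty = X.T @ X, X.T @ y
    draws = np.empty((n_iter - burn, p))
    for t in range(n_iter):
        # beta | rest ~ N(A^{-1} X'y, sig2 * A^{-1}), A = X'X + diag(1/tau^2)
        A = XtX + np.diag(inv_tau2)
        L = np.linalg.cholesky(A)
        mean = np.linalg.solve(A, Xty)
        beta = mean + np.sqrt(sig2) * np.linalg.solve(L.T, rng.standard_normal(p))
        # 1/tau_j^2 | rest ~ InverseGaussian(sqrt(lam^2 sig2 / beta_j^2), lam^2)
        mu = np.sqrt(lam**2 * sig2 / np.maximum(beta**2, 1e-12))
        inv_tau2 = rng.wald(mu, lam**2)
        # sig2 | rest is inverse-gamma under a flat 1/sig2 prior
        resid = y - X @ beta
        rate = 0.5 * (resid @ resid + np.sum(inv_tau2 * beta**2))
        sig2 = 1.0 / rng.gamma((n + p) / 2.0, 1.0 / rate)
        if t >= burn:
            draws[t - burn] = beta
    return draws

# Toy usage: only the first of 10 predictors matters
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
y = 2.0 * X[:, 0] + rng.normal(size=100)
post_mean = bayesian_lasso_gibbs(X, y).mean(axis=0)
print(post_mean)                    # small but nonzero noise coefficients
print(np.count_nonzero(post_mean))  # 10: no exact zeros, unlike the lasso mode
```

The posterior mean (or median) under a continuous shrinkage prior is generically nonzero everywhere, so exact sparsity requires either a point-mass (spike-and-slab) prior or some post-hoc thresholding, which is where the interpretation issue comes from.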
I think his view of constraining the total prior variance [eqn (3)] is quite interesting and deserves more thought.