How to implement constrained linear regression in R?

lightman.guo

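Hello everyone,

I have one dependent variable and two independent variables.

The coefficients beta1, beta2 from the linear regression need to satisfy the following constraint:

lambda*(beta1^2+beta2^2)^0.5+(1-lambda)*(|beta1|+|beta2|)<C

where C is a suitable constant.

How should C be chosen? And how can this be implemented in R?

I'd appreciate any hints or help!

Thanks in advance!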
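On choosing C: if C is at least the penalty evaluated at the OLS estimate, OLS already satisfies the constraint and nothing is shrunk, so the useful range is 0 < C < P(beta_OLS). Within that range C is a tuning parameter, typically picked by cross-validation. A minimal sketch of computing that range, assuming simulated data and lambda = 0.5 (both hypothetical), with standardized predictors as is usual for penalized regression; `penalty`, `C_max` and `C_grid` are illustrative names:

```r
# Penalty from the question: lambda * ||beta||_2 + (1 - lambda) * ||beta||_1
penalty <- function(beta, lambda) {
  lambda * sqrt(sum(beta^2)) + (1 - lambda) * sum(abs(beta))
}

# Hypothetical simulated data: 1 response, 2 predictors, true beta = (2, -1)
set.seed(1)
n <- 100
X <- cbind(x1 = rnorm(n), x2 = rnorm(n))
y <- drop(X %*% c(2, -1)) + rnorm(n)

# Standardize predictors and center the response (usual for penalized regression)
Xs <- scale(X)
yc <- y - mean(y)

# OLS fit on the standardized data
beta_ols <- drop(solve(crossprod(Xs), crossprod(Xs, yc)))

# If C >= C_max the OLS estimate already satisfies the constraint,
# so only values in (0, C_max) actually shrink the coefficients
C_max <- penalty(beta_ols, lambda = 0.5)
C_grid <- C_max * seq(0.1, 1, by = 0.1)   # candidate values to compare, e.g. by CV
```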

Ihavenothing

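Reply to lightman.guo (post #1):

Isn't this the elastic net? There is a ready-made package that solves this problem (the enet function, in the elasticnet package).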

lolo

Reply to Ihavenothing (post #2):

Shouldn't the elastic net be <bblatex>(1-\alpha)\|\beta\|_1+\alpha\|\beta\|_2^2\leq t</bblatex>?
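A quick numerical comparison makes the difference concrete. The function names below are illustrative. The question's penalty is a norm, so scaling beta by s scales the penalty by exactly s (its constraint region keeps the same shape for every C), while the elastic-net function is not homogeneous because of the squared term:

```r
# Penalty in the question (post #1): lambda*||b||_2 + (1-lambda)*||b||_1
pen_question <- function(b, lambda) lambda * sqrt(sum(b^2)) + (1 - lambda) * sum(abs(b))
# Elastic-net constraint function (post #3): (1-alpha)*||b||_1 + alpha*||b||_2^2
pen_enet <- function(b, alpha) (1 - alpha) * sum(abs(b)) + alpha * sum(b^2)

b <- c(3, 4)
pen_question(b, 0.5)   # 0.5*5 + 0.5*7 = 6
pen_enet(b, 0.5)       # 0.5*7 + 0.5*25 = 16

# Doubling b doubles the question's penalty, but not the elastic-net one
pen_question(2 * b, 0.5) / pen_question(b, 0.5)   # 2
pen_enet(2 * b, 0.5) / pen_enet(b, 0.5)           # 57 / 16 = 3.5625
```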

rtist

Reply to lolo (post #3): Oh, good catch! I'd thought it was the elastic net too, but now it doesn't seem to be...

dengyishuo

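At the moment there doesn't seem to be a ready-made method.

However, you can write down the likelihood function and then turn the problem into a constrained optimization problem.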
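With Gaussian errors, maximizing the likelihood is the same as minimizing the residual sum of squares, and a constraint P(beta) < C can be handled through its Lagrangian (penalized) form RSS(beta) + mu * P(beta), where each multiplier mu >= 0 corresponds to some C. A minimal sketch using base R's `optim` (Nelder-Mead, which does not need derivatives of the non-smooth penalty); `fit_penalized`, the simulated data and the values of lambda and mu are hypothetical:

```r
# Penalized least squares: RSS(beta) + mu * P(beta), P as in the question.
# mu >= 0 acts as the Lagrange multiplier of the constraint P(beta) < C.
fit_penalized <- function(X, y, lambda, mu) {
  obj <- function(beta) {
    r <- y - X %*% beta
    sum(r^2) + mu * (lambda * sqrt(sum(beta^2)) + (1 - lambda) * sum(abs(beta)))
  }
  start <- qr.solve(X, y)   # OLS estimate as starting value
  res <- optim(start, obj, method = "Nelder-Mead",
               control = list(reltol = 1e-12, maxit = 5000))
  res$par
}

# Hypothetical simulated data
set.seed(1)
n <- 100
X <- matrix(rnorm(n * 2), n, 2)
y <- drop(X %*% c(2, -1)) + rnorm(n)

beta_hat <- fit_penalized(X, y, lambda = 0.5, mu = 50)
```

Nelder-Mead is fine for two coefficients; with many predictors a dedicated algorithm would be preferable.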

lightman.guo

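Reply to dengyishuo (post #5):

Moderator, I think the second term, (1-lambda)(|beta1|+|beta2|), is a LASSO penalty, right?

Could we first transform the model so that the lambda penalty is absorbed into new (transformed) data?

Then the whole procedure would be equivalent to running the lasso on the transformed variables, i.e., with only an L1 penalty, and we could handle it with standard lasso methods?

But could you show me how to absorb lambda*(beta1^2+beta2^2)^0.5 into the original regression?

Many thanks!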
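The transformation described here is the standard data-augmentation trick, but it works for the squared L2 penalty: appending sqrt(lambda2) * I to X and zeros to y makes the augmented RSS equal to RSS + lambda2 * ||beta||^2, so a lasso on the augmented data gives the (naive) elastic net. A minimal sketch with the L1 part left out, checking that OLS on the augmented data reproduces the ridge closed form; the data and lambda2 are hypothetical:

```r
set.seed(2)
n <- 50; p <- 2
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(1, -0.5)) + rnorm(n)
lambda2 <- 3

# Augmented data: p extra rows sqrt(lambda2) * I in X, p zeros in y,
# so ||y_aug - X_aug b||^2 = ||y - X b||^2 + lambda2 * ||b||^2
X_aug <- rbind(X, sqrt(lambda2) * diag(p))
y_aug <- c(y, rep(0, p))

beta_aug <- qr.solve(X_aug, y_aug)                                          # OLS on augmented data
beta_ridge <- drop(solve(crossprod(X) + lambda2 * diag(p), crossprod(X, y)))  # ridge closed form
all.equal(beta_aug, beta_ridge)   # TRUE
```

The unsquared norm sqrt(beta1^2+beta2^2) is not a sum of squares in beta, so no fixed augmentation reproduces it.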

rtist

This is interesting (at least to me) and seems to deserve some research (unless it has already been studied; I don't know).

If we use the L2 norm alone as a penalty, it is equivalent to using the squared L2 norm (the constraint ||beta||_2 <= t is the same set as ||beta||_2^2 <= t^2), and hence to ridge regression. The L1 penalty produces the lasso as usual. However, the convex combination of the two, which seems to follow naturally from the idea of the elastic net, produces a criterion different from the elastic net. Is it better or worse than the elastic net, and in what sense?
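The L2-alone equivalence also holds in penalized form: the gradient of RSS + mu*||beta||_2 is -2X'(y - X beta) + mu*beta/||beta||, which vanishes at the ridge solution with parameter k when mu = 2k*||beta_k||. A minimal sketch checking this numerically; since the criterion is convex and differentiable away from 0, a zero gradient there means a global minimizer. Data and k are hypothetical:

```r
set.seed(3)
n <- 60
X <- matrix(rnorm(n * 2), n, 2)
y <- drop(X %*% c(1.5, 0.5)) + rnorm(n)

# Ridge solution (X'X + k I)^{-1} X'y
ridge <- function(k) drop(solve(crossprod(X) + k * diag(2), crossprod(X, y)))

k <- 5
b <- ridge(k)
mu <- 2 * k * sqrt(sum(b^2))   # matching multiplier for the unsquared L2 penalty

# Gradient of RSS(beta) + mu * ||beta||_2 evaluated at the ridge solution
grad <- drop(-2 * crossprod(X, y - X %*% b) + mu * b / sqrt(sum(b^2)))
max(abs(grad))   # numerically zero
```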

More surprisingly, in the univariate case this penalty reduces to the lasso rather than giving a new penalized estimator, since lambda*|beta| + (1-lambda)*|beta| = |beta|. This suggests that the usual approach of studying the simple case of orthogonal designs is no longer helpful.
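For a single coefficient the penalized criterion is RSS + mu*|beta| whatever lambda is, so the minimizer is the lasso's soft-thresholding rule sign(z)*max(|z| - mu/2, 0)/s with z = x'y and s = x'x. A minimal sketch comparing that closed form with a numerical minimizer (`optimize`); data, lambda and mu are hypothetical:

```r
set.seed(4)
n <- 40
x <- rnorm(n)
y <- 0.8 * x + rnorm(n)
lambda <- 0.3; mu <- 10

# Univariate criterion: lambda*sqrt(b^2) + (1-lambda)*|b| is just |b|
obj <- function(b) sum((y - x * b)^2) + mu * (lambda * sqrt(b^2) + (1 - lambda) * abs(b))

# Closed-form lasso solution by soft-thresholding
z <- sum(x * y); s <- sum(x^2)
b_soft <- sign(z) * max(abs(z) - mu / 2, 0) / s

# Numerical minimizer of the same (convex) criterion
b_num <- optimize(obj, interval = c(-10, 10), tol = 1e-10)$minimum
```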

rtist

Reply to lightman.guo (post #6):

If you absorb the L2 penalty into the least-squares part, the criterion is no longer quadratic in <bblatex>\beta</bblatex>. It's doable, but I don't know whether there is any advantage in doing that.

lightman.guo

Reply to Rtist (post #8): Yeah, I see... but I'm really new to R. Do you have any ideas on how to do it with convex optimization in R? Thanks a lot!

rtist

Reply to lightman.guo (post #9): The gradient and Hessian can be found, but they turn out to be a little messy (at least not very informative to me as to how this penalty should behave). Compared with the usual squared L2 penalty, this penalty affects both the quadratic term and the linear term simultaneously. In theory, you can still find the minimizer: replace the L2 penalty with a quadratic approximation and solve the resulting lasso problem; then replace the L2 penalty with a new approximation around the new estimate, and iterate until convergence. If you like, you can use a linear approximation instead of a quadratic one. That "might" have some computational advantage, but I am concerned about its performance in high-dimensional settings, where such methods are usually used.
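The iterate-the-quadratic-approximation idea can be sketched as follows. Two choices here are assumptions, not necessarily what is meant above: the quadratic used is the tangent upper bound ||b|| <= ||b0||/2 + ||b||^2/(2||b0||) (a majorize-minimize step, which makes the true objective decrease at every iteration), and each subproblem RSS + a*||b||^2 + l1*||b||_1 is solved by coordinate descent instead of lars. All function names, data and tuning values are hypothetical:

```r
# True objective: RSS(beta) + mu * (lambda*||beta||_2 + (1-lambda)*||beta||_1)
objective <- function(X, y, beta, lambda, mu) {
  sum((y - X %*% beta)^2) + mu * (lambda * sqrt(sum(beta^2)) + (1 - lambda) * sum(abs(beta)))
}

soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)   # soft-thresholding

# Subproblem RSS + a*||beta||_2^2 + l1*||beta||_1 by cyclic coordinate descent
cd_enet <- function(X, y, a, l1, beta, n_iter = 200) {
  s <- colSums(X^2)
  for (it in seq_len(n_iter)) {
    for (j in seq_along(beta)) {
      r_j <- y - X[, -j, drop = FALSE] %*% beta[-j]   # partial residual without x_j
      z_j <- sum(X[, j] * r_j)
      beta[j] <- soft(z_j, l1 / 2) / (s[j] + a)       # exact coordinate minimizer
    }
  }
  beta
}

# Outer loop: replace ||beta||_2 by its quadratic upper bound at the current beta
fit_mm <- function(X, y, lambda, mu, n_outer = 100, eps = 1e-8) {
  beta <- qr.solve(X, y)                  # OLS starting value
  for (k in seq_len(n_outer)) {
    nrm <- max(sqrt(sum(beta^2)), eps)    # guard against division by zero
    a <- mu * lambda / (2 * nrm)          # weight of the quadratic term
    l1 <- mu * (1 - lambda)               # weight of the L1 term
    beta <- cd_enet(X, y, a, l1, beta)
  }
  beta
}

# Hypothetical simulated data
set.seed(5)
n <- 100
X <- matrix(rnorm(n * 2), n, 2)
y <- drop(X %*% c(2, -1)) + rnorm(n)
b_mm <- fit_mm(X, y, lambda = 0.5, mu = 40)
```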

rtist

I made some contour plots on the constrained region, compared with the elastic net.

They are indeed very similar, but not equivalent. For some choices of the constraint, they are nearly the same. For larger constraints, this one penalizes less than the elastic net, i.e., it corresponds to a prior with heavier tails than the elastic-net prior. For smaller constraints, it penalizes more than the elastic net, i.e., it corresponds to a prior with a sharper peak at the origin. This is all reasonable behavior for the square-root function.

However, the bad news is that overall they seem too similar to justify routine use over the elastic net, which is much cheaper computationally. So I'm not convinced this is a better alternative that deserves extensive study.

Could you please tell me why you want to use this penalty over the elastic net? Do you have particular applications where this penalty is more interpretable? Or do you believe this penalty has better theoretical properties? Thanks.

[Figure: contour plots of the constraint region, compared with the elastic net]
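The code behind the plots above was not posted; the following is only a minimal sketch of how such a comparison could be drawn in base R. The choice lambda = alpha = 0.5 and the normalization that makes both boundaries pass through (1, 0) are assumptions:

```r
lambda <- 0.5; alpha <- 0.5
grid <- seq(-1.5, 1.5, length.out = 201)

# Constraint functions on the (beta1, beta2) plane (vectorized for outer())
pen_q <- function(b1, b2) lambda * sqrt(b1^2 + b2^2) + (1 - lambda) * (abs(b1) + abs(b2))
pen_e <- function(b1, b2) (1 - alpha) * (abs(b1) + abs(b2)) + alpha * (b1^2 + b2^2)

Zq <- outer(grid, grid, pen_q)
Ze <- outer(grid, grid, pen_e)

# Choose the levels so that both boundaries pass through (1, 0)
Cq <- pen_q(1, 0)
te <- pen_e(1, 0)
contour(grid, grid, Zq, levels = Cq, drawlabels = FALSE, asp = 1,
        xlab = "beta1", ylab = "beta2")
contour(grid, grid, Ze, levels = te, drawlabels = FALSE, add = TRUE, lty = 2)
legend("topright", legend = c("question's penalty", "elastic net"), lty = 1:2)
```

With these settings both levels equal 1, and the boundaries cross the diagonal beta1 = beta2 at 2 - sqrt(2) (about 0.586) and (sqrt(5) - 1)/2 (about 0.618), respectively: similar, but not identical.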

lightman.guo

Reply to Rtist (post #11): Actually, I don't know why. I'm studying management now and want to pursue a PhD later, so I chose a statistics course. By the time I realized it was too hard for me, since I have no programming background, it was too late to drop it. I've struggled with this course for a really long time. Anyway, I think I've learned a lot from it...

I was thinking of doing it this way.

For example, for lambda = 1 (only the L2 term remains), setting the derivative of the penalized criterion sum((Y - X %*% beta)^2) + mu * sqrt(sum(beta^2)) to zero, where mu >= 0 is the Lagrange multiplier of the constraint, gives

2*t(X) %*% X %*% beta - 2*t(X) %*% Y + mu * beta / sqrt(sum(beta^2)) = 0

Could you give me some pointers on how to do this convex optimization in R?

...

Thanks a lot for your help. I'm really a beginner in R...
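For the lambda = 1 case the constrained problem ||beta||_2 <= C can be solved exactly: the solution of the stationarity condition -2X'Y + 2X'X beta + mu*beta/||beta|| = 0 is a ridge estimate, and the ridge norm decreases monotonically in the ridge parameter k, so a one-dimensional root search finds the k whose solution lies on the boundary. A minimal sketch using base R's `uniroot`; the data and the target C (half the OLS norm) are hypothetical:

```r
set.seed(6)
n <- 80
X <- matrix(rnorm(n * 2), n, 2)
y <- drop(X %*% c(1, 2)) + rnorm(n)

ridge <- function(k) drop(solve(crossprod(X) + k * diag(2), crossprod(X, y)))
norm2 <- function(b) sqrt(sum(b^2))

C <- 0.5 * norm2(ridge(0))   # a binding constraint: half the OLS norm

# ||ridge(k)|| falls from ||OLS|| (k = 0) towards 0, so solve ||ridge(k)|| = C
k_hat <- uniroot(function(k) norm2(ridge(k)) - C,
                 lower = 0, upper = 1e6, tol = 1e-12)$root
beta_hat <- ridge(k_hat)
mu_hat <- 2 * k_hat * norm2(beta_hat)   # multiplier in the stationarity condition
```

For 0 < lambda < 1 the same outer root search over mu can be wrapped around any solver of the penalized problem, as long as that solver is accurate enough for the penalty value to be monotone in mu.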

rtist

Reply to lightman.guo (post #12): I don't think there are any built-in functions that do this directly. You will need some (maybe quite a lot of) programming. Some useful functions/packages are optim, constrOptim, lars, etc.

I'm still surprised that you are asked to do this in a course...

lightman.guo

Reply to Rtist (post #13): Yeah... I'm surprised too. An Indian professor... what he does is write the matrices on the board and give us programming tasks, but he never teaches any of it...

And would you mind posting the code you used for the contour plots here?

If you'd rather not, that's OK too.

Anyway, thanks so much for your help!