I made some contour plots of the constraint region and compared them with those of the elastic net.
The two are indeed very similar, but not equivalent. For some choices of the constraint, they are nearly the same. For larger constraints, this penalty penalizes less than the elastic net, i.e., it corresponds to a prior with heavier tails than the elastic net prior. For smaller constraints, it penalizes more than the elastic net, i.e., it corresponds to a prior with a sharper peak at the origin. This is all reasonable behavior for the square root function.
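For anyone who wants to reproduce this kind of comparison, here is a minimal sketch in Python. Note the assumptions: `candidate_penalty` is only a hypothetical stand-in (the square root of the elastic net penalty), not necessarily the penalty discussed in this thread, and the mixing weight `alpha`, the grid limits, and all function names are made up for illustration. Swap in the actual penalty before drawing any conclusions.

```python
import numpy as np


def elastic_net_penalty(b1, b2, alpha=0.5):
    """Elastic net penalty in two dimensions:
    alpha * L1 norm + (1 - alpha) * squared L2 norm."""
    return alpha * (np.abs(b1) + np.abs(b2)) + (1 - alpha) * (b1**2 + b2**2)


def candidate_penalty(b1, b2, alpha=0.5):
    """HYPOTHETICAL stand-in for the penalty under discussion:
    the square root of the elastic net penalty. Replace with the real one."""
    return np.sqrt(elastic_net_penalty(b1, b2, alpha))


def penalty_gap(limit=2.0, n=201, alpha=0.5):
    """Evaluate (candidate - elastic net) on an n-by-n grid over
    [-limit, limit]^2. Positive values mean the candidate penalizes more."""
    axis = np.linspace(-limit, limit, n)
    b1, b2 = np.meshgrid(axis, axis)
    gap = candidate_penalty(b1, b2, alpha) - elastic_net_penalty(b1, b2, alpha)
    return b1, b2, gap


b1, b2, gap = penalty_gap()
# Fraction of grid points where the stand-in penalizes more than the elastic net
print((gap > 0).mean())
```

The arrays `b1`, `b2`, and `gap` can be passed to any contour plotting routine. With this particular stand-in, the candidate penalizes more wherever the elastic net penalty is below 1 and less wherever it is above 1, which is the qualitative pattern described above.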
However, the bad news is that, overall, the two seem too similar to justify routine use of this penalty over the elastic net, which is much cheaper computationally. So I'm not convinced that this is a better alternative that deserves extensive study.
Could you please tell me why you want to use this penalty over the elastic net? Do you have particular applications where this penalty is more interpretable? Or do you believe this penalty has better theoretical properties? Thanks.
[attachment=217220,1008]