adjusted R-square

土豆泥

In the section on model selection in regression, there is this passage:

R²=1-SSE/SST. In the limit, if you have n observations and if the model contains n parameters, SSE=0 and R²=1.

For this reason, we consider the adjusted R², which incorporates a penalty for each estimated coefficient: R²adj,p = 1 − [SSE_p/(n − p − 1)] / [SST/(n − 1)].
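A minimal Python/NumPy sketch of the two formulas quoted above, assuming a linear model with an intercept and p predictors. The function names are hypothetical, chosen only for this illustration. The example uses the least-squares line ŷ = 2.2 + 0.6x fitted to the five points x = 1..5, y = 2, 4, 5, 4, 5.

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = 1 - SSE/SST."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    sse = np.sum((y - y_hat) ** 2)      # residual sum of squares
    sst = np.sum((y - y.mean()) ** 2)   # total sum of squares
    return 1.0 - sse / sst

def adjusted_r_squared(y, y_hat, p):
    """R^2_adj = 1 - [SSE/(n - p - 1)] / [SST/(n - 1)], p = number of predictors."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    n = y.size
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    sse = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    # each sum of squares is divided by its own degrees of freedom
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))

# Least-squares line y_hat = 2.2 + 0.6 x for x = 1..5 (one predictor, p = 1)
x = np.arange(1, 6, dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)
y_hat = 2.2 + 0.6 * x

print(r_squared(y, y_hat))              # 0.6 up to floating-point rounding
print(adjusted_r_squared(y, y_hat, 1))  # about 0.467
```

Here SSE = 2.4 and SST = 6, so R² = 0.6, while the adjusted value is 1 − (2.4/3)/(6/4) = 7/15.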

But I'm not really clear on why the adjustment is needed. What does the adjustment mean? Could someone explain? Thanks.

yuanxn

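When the sample size is large, there is little difference between the adjusted and unadjusted R² (the squared multiple correlation coefficient). Strictly speaking, the adjusted R² should be used whatever the sample size, but when the sample is large or there are many explanatory variables, both SSE and SST must be divided by their respective degrees of freedom for R² to be accurate.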
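Rewriting the formula as R²adj = 1 − (1 − R²)(n − 1)/(n − p − 1) shows that the gap R² − R²adj equals (1 − R²)·p/(n − p − 1), which shrinks as n grows with p fixed. A small sketch; the values R² = 0.6 and p = 3 are arbitrary choices for illustration, and the helper name is hypothetical.

```python
def adjusted_from_r2(r2, n, p):
    """R^2_adj = 1 - (1 - R^2)(n - 1)/(n - p - 1), p = number of predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r2, p = 0.6, 3  # illustrative values, not from any real data set
for n in (10, 100, 1000):
    adj = adjusted_from_r2(r2, n, p)
    gap = r2 - adj
    print(f"n={n:5d}  adjusted={adj:.4f}  gap={gap:.4f}")
```

The gap is 0.2 at n = 10, 0.0125 at n = 100 and about 0.0012 at n = 1000.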

neige

.......

Didn't you already explain it clearly yourself? R² is a non-decreasing function of p. When n = p, R² = 1: you overfit the data, and your model has no predictive power. Does such a model make sense to you?
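neige's point can be checked numerically. The sketch below (NumPy, simulated data; the seed, sample size and the helper fit_r2 are arbitrary choices for illustration) fits nested OLS models with an intercept, adding pure-noise predictors one at a time. R² never goes down, and once the model has n coefficients (p = n − 1 predictors plus the intercept) it reaches 1, at which point the adjusted R² is no longer defined.

```python
import numpy as np

def fit_r2(X, y):
    """OLS with an intercept via least squares; returns R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sse = resid @ resid
    sst = np.sum((y - y.mean()) ** 2)
    return 1.0 - sse / sst

rng = np.random.default_rng(1)
n = 8
x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)   # only x1 is actually related to y
noise = rng.normal(size=(n, n - 2))       # irrelevant predictors

r2_values = []
for k in range(n - 1):  # p = 1 + k predictors, up to p = n - 1
    X = np.column_stack([x1, noise[:, :k]])
    p = X.shape[1]
    r2 = fit_r2(X, y)
    r2_values.append(r2)
    if n - p - 1 > 0:
        adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
        print(f"p={p}: R2={r2:.4f}  adj R2={adj:.4f}")
    else:
        print(f"p={p}: R2={r2:.4f}  adj R2 undefined (n - p - 1 = 0)")
```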

土豆泥

Thanks

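[quote]Quoting post #1 by yuanxn, posted 2008-04-29 01:10:

When the sample size is large, there is little difference between the adjusted and unadjusted R² (the squared multiple correlation coefficient). Strictly speaking, the adjusted R² should be used whatever the sample size, but when the sample is large or there are many explanatory variables, both SSE and SST must be divided by their respective degrees of freedom for R² to be accurate.[/quote]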

土豆泥

Thanks, I think I understand it a bit better now.

[quote]Quoting post #2 by neige, posted 2008-04-29 09:26:

.......

Didn't you already explain it clearly yourself? R² is a non-decreasing function of p. When n = p, R² = 1: you overfit the data, and your model has no predictive power. Does such a model make sense to you?[/quote]

nlinyn

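The adjustment matters mostly when there is more than one independent variable, because every independent variable you add, significant or not, increases (or at least does not decrease) the regression equation's explanatory power on the sample data. The adjusted R² (adjusted squared correlation coefficient) is used to correct for the effect of adding variables.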
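One way to see how the adjustment "corrects" for an added variable: the adjusted R² rises only when the partial F statistic for the new variable (the square of its t statistic) exceeds 1, whereas the plain R² cannot fall. A NumPy sketch with simulated data; z is a hypothetical candidate variable unrelated to y, and the helper names are illustrative only.

```python
import numpy as np

def sse_ols(X, y):
    """Residual sum of squares of an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return float(r @ r)

def adj_r2(sse, sst, n, p):
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))

rng = np.random.default_rng(2)
n = 20
x1 = rng.normal(size=n)
y = 0.5 + 1.5 * x1 + rng.normal(size=n)
z = rng.normal(size=n)  # candidate variable with no real relation to y

sst = float(np.sum((y - y.mean()) ** 2))
sse_small = sse_ols(x1, y)                      # p = 1
sse_big = sse_ols(np.column_stack([x1, z]), y)  # p = 2

r2_small, r2_big = 1 - sse_small / sst, 1 - sse_big / sst
adj_small = adj_r2(sse_small, sst, n, 1)
adj_big = adj_r2(sse_big, sst, n, 2)
# Partial F statistic for adding z to the model with x1 only
F = (sse_small - sse_big) / (sse_big / (n - 2 - 1))

print(f"R2:     {r2_small:.4f} -> {r2_big:.4f}")
print(f"adj R2: {adj_small:.4f} -> {adj_big:.4f}   partial F = {F:.3f}")
```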

So the poster in post #2 has misunderstood: p in the formula is the number of independent variables, not the sample size.
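To make the role of p concrete: in a model with an intercept, p counts the predictor columns, so the model estimates p + 1 coefficients and the residual degrees of freedom are n − p − 1, the divisor of SSE in the formula. A small NumPy check on a random design matrix (sizes and seed chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 30, 4                           # n observations (rows), p independent variables (columns)
X = rng.normal(size=(n, p))
A = np.column_stack([np.ones(n), X])   # add the intercept column

num_coefficients = np.linalg.matrix_rank(A)  # p + 1 estimated coefficients
residual_df = n - num_coefficients           # n - p - 1
print(X.shape, num_coefficients, residual_df)  # (30, 4) 5 25
```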

gucasluo

Does this mean that when there is only one independent variable, either the adjusted R-square or the R-square of a linear regression can be used?
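With a single independent variable (p = 1) the formula becomes R²adj = 1 − (1 − R²)(n − 1)/(n − 2), so the adjustment does not vanish; it only becomes small as n grows. A quick sketch, with R² = 0.5 picked arbitrarily and a hypothetical helper name:

```python
def adjusted_r2_one_predictor(r2, n):
    """Adjusted R^2 when the model has an intercept and exactly one predictor."""
    if n <= 2:
        raise ValueError("need n > 2 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - 2)

for n in (5, 10, 30, 1000):
    print(n, round(adjusted_r2_one_predictor(0.5, n), 4))
```

This gives about 0.3333, 0.4375, 0.4821 and 0.4995. For fixed n and p the adjusted R² is an increasing function of R², so two models with the same n and p are ranked the same way by either measure, even though the values themselves differ.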