huangxiaoyi
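Most of my data are concentrated in the high-value range, and very few observations fall in the low-value range. Because of this, the parameter estimates from a linear regression come out too large. What should I do? I'd like to look for references, but I don't know where to start. Please help!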
yihui
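Split the data into two parts and fit a model to each, or temporarily drop the low-value observations and see how much the resulting model differs from the one fitted on all observations.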
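A minimal sketch of the second suggestion, assuming a single predictor `x` and a hypothetical cutoff `threshold` that separates the sparse low-value observations (neither is specified in the thread). It fits ordinary least squares once on all observations and once with the low-value observations dropped, so the two sets of coefficients can be compared side by side. The data are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def compare_with_and_without_low(xs, ys, threshold):
    """Fit on all data and on the subset with x >= threshold (hypothetical cutoff)."""
    full = fit_line(xs, ys)
    kept = [(x, y) for x, y in zip(xs, ys) if x >= threshold]
    high_x = [x for x, _ in kept]
    high_y = [y for _, y in kept]
    return full, fit_line(high_x, high_y)


# Made-up data: a dense high-value region lying exactly on y = 2 + 3x,
# plus two sparse low-value points that do not follow that line.
xs = [1, 2] + list(range(10, 20))
ys = [20, 25] + [2 + 3 * x for x in range(10, 20)]
full, high_only = compare_with_and_without_low(xs, ys, threshold=10)
print("all observations:", full)
print("x >= 10 only:   ", high_only)
```

Here the slope is 3 on the high-value subset but about 2.05 once the two low points are included; a gap of that size is the kind of difference worth investigating.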
foreseer201
In a situation like this, shouldn't you be considering nonlinear regression?
outsider
Since your data have some kind of natural clustering, why not introduce a dummy variable and see whether it helps your parameter estimation? Be sure to check whether the dummy and the Xs have any interactions.
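The dummy-variable idea can be sketched as the model y = b0 + b1·x + b2·D + b3·D·x, where D = 1 for observations in the sparse low-value cluster and 0 otherwise; b2 shifts the intercept for that cluster and b3 (the dummy-by-X interaction) lets its slope differ. The cutoff defining D and all helper names are hypothetical. The sketch solves the normal equations in plain Python to stay dependency-free, which is fine for a toy example but less numerically robust than a QR-based solver; it also omits the standard errors you would need to test whether b2 and b3 differ from zero.

```python
def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def ols(rows, ys):
    """Least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


def dummy_interaction_fit(xs, ys, threshold):
    """Fit y = b0 + b1*x + b2*D + b3*D*x, with D = 1 when x < threshold (hypothetical cutoff)."""
    rows = []
    for x in xs:
        d = 1.0 if x < threshold else 0.0
        rows.append([1.0, x, d, d * x])  # intercept, x, dummy, dummy-by-x interaction
    return ols(rows, ys)


# Made-up data: high-value region on y = 2 + 3x, two low points off that line.
xs = [1, 2] + list(range(10, 20))
ys = [20, 25] + [2 + 3 * x for x in range(10, 20)]
b0, b1, b2, b3 = dummy_interaction_fit(xs, ys, threshold=10)
print(b0, b1, b2, b3)
```

On these made-up data the estimates come out at about b0 = 2, b1 = 3, b2 = 13, b3 = 2: the high-value region keeps its own intercept and slope, and the dummy terms absorb the different level and slope of the two low points instead of distorting b1.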
huangxiaoyi
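Many thanks for the suggestions, everyone.
to foreseer201: The raw data are assumed by default to follow a Pareto distribution (similar to an exponential distribution), so after taking ln the relationship should be linear.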
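One concrete reading of the log-linear claim: if X follows a Pareto distribution with scale x_m and shape α, its survival function is S(x) = (x_m / x)^α, so ln S(x) = α·ln x_m − α·ln x is a straight line in ln x. (Equivalently, ln(X / x_m) is exponentially distributed with rate α, which is where the link to the exponential distribution comes from.) The sketch below checks this with a log-log regression on the empirical survival function. The function names, the i/(n+1) plotting position, and the use of exact Pareto quantiles in place of a random sample are assumptions for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def pareto_loglog_fit(sample):
    """Regress ln(empirical survival) on ln(x); for Pareto data the slope is about -alpha."""
    xs = sorted(sample)
    n = len(xs)
    # Empirical P(X > x_(i)) using the i/(n+1) plotting position, so it never reaches 0.
    surv = [1 - i / (n + 1) for i in range(1, n + 1)]
    return fit_line([math.log(x) for x in xs], [math.log(s) for s in surv])


# Deterministic stand-in for a sample: exact Pareto quantiles with x_m = 1, alpha = 2.
alpha = 2.0
n = 50
sample = [(1 - i / (n + 1)) ** (-1 / alpha) for i in range(1, n + 1)]
intercept, slope = pareto_loglog_fit(sample)
print(intercept, slope)  # intercept ≈ alpha * ln(x_m) = 0, slope ≈ -alpha = -2
```

With real data the points will scatter around the line, and a clearly curved log-log plot would suggest the Pareto assumption does not hold.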
I'm going to try outsider's method. Also, Boss Xie (yihui): after comparing the two data sets, what should I do next?
rtist
weighting
huangxiaoyi
Is the moderator above suggesting weighting?
rtist
[quote]Quoting post #6 by huangxiaoyi on 2007-10-07 14:38:
Is the moderator above suggesting weighting?[/quote]
That's just one possibility to try. There are certainly others, such as conditioning, as outsider suggested.
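Weighting means minimising Σ w_i·(y_i − a − b·x_i)² instead of the unweighted sum of squares. The sketch below implements weighted least squares for one predictor. The weight scheme shown, giving the sparse low-value region and the dense high-value region equal total weight, is a hypothetical choice for illustration only; the textbook WLS choice is w_i proportional to 1/Var(ε_i) when the error variance changes across the range. Whether the sparse region should be up- or down-weighted depends on why it is sparse.

```python
def weighted_fit_line(xs, ys, ws):
    """Weighted least squares for y = a + b*x, minimising sum(w * (y - a - b*x)**2)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted means
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def equal_region_weights(xs, threshold):
    """Hypothetical scheme: each region (x < threshold vs. x >= threshold) gets total weight 1."""
    n_low = sum(1 for x in xs if x < threshold)
    n_high = len(xs) - n_low
    return [1 / n_low if x < threshold else 1 / n_high for x in xs]


# Made-up data: high-value region on y = 2 + 3x, two low points off that line.
xs = [1, 2] + list(range(10, 20))
ys = [20, 25] + [2 + 3 * x for x in range(10, 20)]
unweighted = weighted_fit_line(xs, ys, [1.0] * len(xs))  # equal weights = ordinary least squares
reweighted = weighted_fit_line(xs, ys, equal_region_weights(xs, threshold=10))
print(unweighted, reweighted)
```

On these data the slope moves from about 2.05 to about 1.89, because the two low points now carry half of the total weight. This shows that weighting changes which region drives the estimate; it does not by itself make the estimate correct.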
yihui
[quote]Quoting post #4 by huangxiaoyi on 2007-10-07 18:02:
I'm going to try outsider's method. Also, Boss Xie (yihui): after comparing the two data sets, what should I do next? [/quote]
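If the difference is small, fine. If it is large, you need to look into why that small subset of data looks the way it does: is it genuinely like that, or are these abnormal observations caused by some problem?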
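One standard way to follow up is to measure how influential each observation is. The sketch below computes Cook's distance for a one-predictor OLS fit and flags observations above 4/n, a common rule of thumb; both the diagnostic and the cutoff are swapped-in choices, not something proposed in this thread. A flagged point is a prompt to check where that observation came from, not proof that it is wrong.

```python
def cooks_distance(xs, ys):
    """Cook's distance for each observation in the OLS fit y = a + b*x."""
    n = len(xs)
    p = 2  # number of fitted parameters (intercept and slope)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance estimate
    lev = [1 / n + (x - mx) ** 2 / sxx for x in xs]  # leverage h_ii for simple regression
    return [e * e / (p * s2) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]


def flag_influential(xs, ys):
    """Indices whose Cook's distance exceeds the 4/n rule of thumb."""
    d = cooks_distance(xs, ys)
    cutoff = 4 / len(xs)
    return [i for i, di in enumerate(d) if di > cutoff]


# Made-up data: the high-value region lies on y = 2 + 3x, the low point at x = 1
# also fits that line, but the low point at x = 2 is far off it.
xs = [1, 2] + list(range(10, 20))
ys = [5, 30] + [2 + 3 * x for x in range(10, 20)]
print(flag_influential(xs, ys))
```

Here both low points are flagged: the one at x = 2 because it is far off the line, and the one at x = 1 because the x = 2 point drags the fit away from it. Diagnostics like this point to a region worth inspecting rather than to a single culprit.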