Algorithm for bootstrap critical values

心灵之园

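A paper proposes a test statistic for checking whether a regression model is correctly specified. It is based on the Kolmogorov-Smirnov test, and the main idea is to measure the gap between the empirical distribution function and the theoretical distribution function. The paper shows that the asymptotic distribution of this test statistic is case-dependent, so bootstrap critical values are used instead.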
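For concreteness, here is one common KS-type form such a statistic can take. This is an assumption for illustration only: the post does not give the paper's exact formula. F(t | x, \theta) denotes the conditional distribution function of y given x under H_0, and the theoretical CDF is taken to be F averaged over the observed x_i.

```latex
T = \sqrt{n}\,\sup_{t}\left| F_n(t) - \frac{1}{n}\sum_{i=1}^{n} F(t \mid x_i, \hat{\theta}) \right|,
\qquad
F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{ y_i \le t \}
```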

H_0 specifies the distribution function of y conditional on x, but the parameter \theta is unknown. H_0 is tested as follows:

(1) First estimate \theta under the H_0 model to obtain \hat{\theta}, then use \hat{\theta} to compute the test statistic T_0.

(2) Then, keeping the same x, simulate y B times from the H_0 model with the \hat{\theta} estimated in (1). For each bootstrap sample, compute the test statistic T_b using the same formula as in (1).

(3) Sort all the T_b and take their 95% quantile, T_cv, as the critical value at the 5% significance level. If T_0 > T_cv, reject H_0. Note: all T > 0.
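A minimal Python sketch of steps (1) to (3), under assumptions that are not from the paper: a hypothetical H_0 of a Gaussian linear model y | x ~ N(a + b x, sigma^2) fitted by maximum likelihood, and the averaged-CDF KS distance as the statistic. The names `fit_h0`, `ks_stat`, `bootstrap_cv` and the `reestimate` flag are illustrative. With `reestimate=False` every T_b reuses the \hat{\theta} from step (1); with `reestimate=True` a fresh \hat{\theta}_b is fitted on each bootstrap sample. Only NumPy is needed, with the normal CDF built from `math.erf`.

```python
import math
import numpy as np

# Vectorised standard normal CDF built from math.erf (avoids a SciPy dependency).
_erf = np.vectorize(math.erf)


def norm_cdf(z):
    return 0.5 * (1.0 + _erf(np.asarray(z) / math.sqrt(2.0)))


def fit_h0(x, y):
    """ML estimate of theta = (a, b, sigma) under the hypothetical H_0: y | x ~ N(a + b x, sigma^2)."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma = math.sqrt(np.mean(resid ** 2))  # ML (not unbiased) variance estimate
    return coef[0], coef[1], sigma


def ks_stat(x, y, theta):
    """sqrt(n) * sup_t |F_n(t) - Fbar(t; theta)|, Fbar = model CDF averaged over the observed x_i."""
    a, b, sigma = theta
    n = len(y)
    t = np.sort(y)
    mu = a + b * x
    fbar = norm_cdf((t[:, None] - mu[None, :]) / sigma).mean(axis=1)
    j = np.arange(1, n + 1)
    # Fbar is continuous, so the sup is attained at a jump of F_n: check both sides of each jump.
    d = max(np.max(j / n - fbar), np.max(fbar - (j - 1) / n))
    return math.sqrt(n) * d


def bootstrap_cv(x, y, B=199, alpha=0.05, reestimate=True, seed=0):
    rng = np.random.default_rng(seed)
    theta_hat = fit_h0(x, y)                # step (1): estimate theta under H_0
    t0 = ks_stat(x, y, theta_hat)           # step (1): observed statistic T_0
    a, b, sigma = theta_hat
    t_b = np.empty(B)
    for k in range(B):                      # step (2): B parametric bootstrap samples
        # same x, new y drawn from f(y | x, theta_hat)
        y_b = a + b * x + sigma * rng.standard_normal(len(x))
        theta_b = fit_h0(x, y_b) if reestimate else theta_hat
        t_b[k] = ks_stat(x, y_b, theta_b)
    cv = np.quantile(t_b, 1.0 - alpha)      # step (3): (1 - alpha) quantile of the T_b
    return t0, cv, t0 > cv


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 80)
    y = 1.0 + 2.0 * x + 0.5 * rng.standard_normal(80)  # data generated under H_0
    for flag in (False, True):
        t0, cv, reject = bootstrap_cv(x, y, reestimate=flag)
        print(f"reestimate={flag}: T_0={t0:.3f}, T_cv={cv:.3f}, reject={reject}")
```

Running both settings on data generated under H_0, and repeating over many simulated datasets, is one way to check the rejection rates discussed in this thread; no particular output is claimed here.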

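My first question concerns step (2). When computing T_b, the author does not re-estimate \hat{\theta}_b for each bootstrap sample but directly uses the \hat{\theta} from step (1), i.e. the same \hat{\theta} used to generate the bootstrap samples. As a result, the T_b are clearly larger than T_0. Re-estimating \hat{\theta}_b for each sample would make the gap between the empirical distribution function and the theoretical distribution function as small as possible, whereas reusing the \hat{\theta} from the original sample does not minimize this gap, which inflates T_b.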

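The author proves theoretically that his method is correct; the key point is that \hat{\theta} converges to the true \theta. But my simulations show that almost all T_b are clearly larger than T_0, so if the original sample comes from H_0, H_0 is never rejected. I think that when computing each T_b, one should re-estimate \hat{\theta}_b from that bootstrap sample; then the T_b would no longer all exceed T_0. Is my reasoning correct?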
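A short argument for this intuition, under an extra assumption that is not from the paper: suppose \hat{\theta} is a minimum-distance estimator, i.e. it minimizes the same distance D_n(y, \theta) between the empirical distribution function of a sample y and the model distribution function at \theta that defines the statistic. Write y^{*b} for the b-th bootstrap sample.

```latex
\begin{aligned}
T_0 &= D_n(y, \hat{\theta}), \qquad \hat{\theta} = \arg\min_{\theta} D_n(y, \theta),\\
T_b^{\text{fixed}} &= D_n(y^{*b}, \hat{\theta}), \qquad
T_b^{\text{re}} = D_n(y^{*b}, \hat{\theta}_b) = \min_{\theta} D_n(y^{*b}, \theta),\\
\Rightarrow\quad T_b^{\text{re}} &\le T_b^{\text{fixed}} \quad \text{for every } b .
\end{aligned}
```

Under this assumption, reusing \hat{\theta} can only raise each T_b, and therefore the 95% quantile T_cv, while T_0 is computed with a fitted \hat{\theta} just like T_b^{re}. With ML or least-squares estimates the inequality need not hold sample by sample; this only illustrates the direction of the effect described above.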

yihui

Then wouldn't the paper's test turn into a test of whether y = f(\hat{\theta}, x) is correctly specified?

心灵之园

Thanks for the reply!

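But even if \hat{\theta}_b is re-estimated for each bootstrap sample, each bootstrap sample itself still has to be drawn from f(y|\hat{\theta},x), because H_0 specifies only the form of the distribution function, the parameters are unknown, and bootstrap critical values must be built on the original sample. So, judging by where the bootstrap samples come from, one could say we are testing f(y|\hat{\theta},x) either way. I therefore don't think the risk of degenerating into a test of f(y|\hat{\theta},x) by itself shows that \hat{\theta}_b must be re-estimated. The author proved theoretically that the method is asymptotically correct (and the author is a world-leading scholar in this field), but I think that in practice, not re-estimating \hat{\theta}_b leads to T_b > T_0.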

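I suspect the question of whether to re-estimate the parameters has come up in other bootstrap methods too, but I'm not very familiar with the bootstrap literature. Could you point me to a similar example? I'd like to see how others have handled it.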

Thanks again!

longoR

I only read the first sentence, and it is quite surprising. The KS test requires iid data, but regression is about the conditional mean, so I can hardly see how KS would be useful in regression. Anyway, I'll read the rest of the post when I have time. But the bootstrap should work in regression under some conditions.

Is the OP a theory person?

rtist

In general, we need to re-estimate the parameters. But I'm not sure about this specific case.