If

<bblatex>\ln Y\sim\mathcal{N}\left(\mu,\,\sigma^{2}\right),\quad\text{i.e. }Y\sim\mathcal{LN}\left(\mu,\,\sigma^{2}\right)</bblatex>

then its mean is

<bblatex>\mathbb{E}\left(Y\right)=\exp\left(\mu+\frac{1}{2}\sigma^{2}\right)</bblatex>

and its variance is

<bblatex>\text{var}\left(Y\right)=\left[\exp\left(\sigma^{2}\right)-1\right]\exp\left(2\mu+\sigma^{2}\right)</bblatex>
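As a quick sanity check on the two moment formulas, here is a minimal sketch that compares them with a simulation. The values of `mu` and `sigma`, the number of draws, and the function names are illustrative assumptions and do not come from the question itself.

```python
import math
import random
import statistics


def lognormal_moments(mu, sigma):
    """Closed-form mean and variance of Y when ln(Y) ~ N(mu, sigma^2)."""
    mean = math.exp(mu + 0.5 * sigma ** 2)
    var = (math.exp(sigma ** 2) - 1.0) * math.exp(2.0 * mu + sigma ** 2)
    return mean, var


def simulate_lognormal_moments(mu, sigma, n_draws=200_000, seed=0):
    """Sample mean and variance of simulated log-normal draws."""
    rng = random.Random(seed)
    draws = [rng.lognormvariate(mu, sigma) for _ in range(n_draws)]
    return statistics.fmean(draws), statistics.pvariance(draws)


# Illustrative parameter values (assumed, not taken from the post).
mu, sigma = 0.5, 0.4
theory_mean, theory_var = lognormal_moments(mu, sigma)
sim_mean, sim_var = simulate_lognormal_moments(mu, sigma)

print("mean:     theory", theory_mean, "simulated", sim_mean)
print("variance: theory", theory_var, "simulated", sim_var)
```

The simulated values should land close to the closed-form ones, up to Monte Carlo noise.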

For the simple log-linear regression

<bblatex>\ln Y_{i}=\alpha+\beta X_{i}+\varepsilon_{i}</bblatex>

with errors satisfying

<bblatex>\varepsilon_{i}\overset{\text{i.i.d.}}{\sim}\mathcal{N}\left(0,\,\sigma^{2}\right)</bblatex>

and hence

<bblatex>\text{cov}\left(\varepsilon_{i},\,\varepsilon_{j}\right)=0,\quad i\neq j</bblatex>

Then, for any given $X$, the fitted value satisfies

<bblatex>\widehat{\ln Y}\sim\mathcal{N}\left(\alpha+\beta X,\,\sigma^{2}\left(\frac{1}{n}+\frac{\left(X-\bar{X}\right)^{2}}{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}\right)\right)</bblatex>

so, taking $\hat{Y}=\exp\left(\widehat{\ln Y}\right)$, the result would seem to be

<bblatex>\mathbb{E}\left(\hat{Y}\right)=\exp\left[\alpha+\beta X+\frac{1}{2}\sigma^{2}\left(\frac{1}{n}+\frac{\left(X-\bar{X}\right)^{2}}{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}\right)\right]</bblatex>
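One way to probe this expression numerically is a small Monte Carlo experiment. The sketch below assumes that $\hat{Y}$ means the back-transformed least-squares fit $\exp(\hat{\alpha}+\hat{\beta}X)$, that the variance term uses the usual squared deviation $(X-\bar{X})^{2}$, and that the true parameters are known so the formula can be evaluated. The design points, parameter values, and function names are all hypothetical choices for illustration.

```python
import math
import random
import statistics


def leverage(xs, x0):
    """1/n + (x0 - x_bar)^2 / sum((x_i - x_bar)^2) for a simple regression."""
    x_bar = statistics.fmean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return 1.0 / len(xs) + (x0 - x_bar) ** 2 / sxx


def simulate_mean_back_transformed_fit(xs, x0, alpha, beta, sigma,
                                       n_reps=40_000, seed=1):
    """Average of exp(alpha_hat + beta_hat * x0) over repeated simulated samples."""
    rng = random.Random(seed)
    x_bar = statistics.fmean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    total = 0.0
    for _ in range(n_reps):
        # Simulate ln(Y_i) = alpha + beta * X_i + eps_i with normal errors.
        log_ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        y_bar = statistics.fmean(log_ys)
        # Ordinary least-squares slope; the fit at x0 is y_bar + beta_hat * (x0 - x_bar).
        beta_hat = sum((x - x_bar) * (ly - y_bar)
                       for x, ly in zip(xs, log_ys)) / sxx
        total += math.exp(y_bar + beta_hat * (x0 - x_bar))
    return total / n_reps


# Hypothetical design and parameters (assumed for illustration only).
xs = [float(i) for i in range(10)]
x0, alpha, beta, sigma = 12.0, 1.0, 0.3, 0.5

h = leverage(xs, x0)
with_leverage = math.exp(alpha + beta * x0 + 0.5 * sigma ** 2 * h)
with_full_sigma2 = math.exp(alpha + beta * x0 + 0.5 * sigma ** 2)
simulated = simulate_mean_back_transformed_fit(xs, x0, alpha, beta, sigma)

print("leverage term h:            ", h)
print("exp(a + b*x0 + s^2*h/2):    ", with_leverage)
print("exp(a + b*x0 + s^2/2):      ", with_full_sigma2)
print("simulated mean of fitted Y: ", simulated)
```

Under these assumptions the simulated mean can be compared with both candidate expressions. It says nothing about the case where the parameters are replaced by their estimates inside the formula.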

But the result usually quoted seems to be

<bblatex>\mathbb{E}\left(\hat{Y}\right)=\exp\left[\alpha+\beta X+\frac{1}{2}\sigma^{2}\right]</bblatex>

What I cannot understand is that the paper "Reducing Transformation Bias in Curve Fitting" gives

<bblatex>\mathbb{E}\left(\hat{Y}\right)=\exp\left[\left(\alpha+\beta X+\frac{1}{2}\sigma^{2}\right)\left(\frac{1}{n}+\frac{\left(X-\bar{X}\right)^{2}}{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}}\right)\right]</bblatex>

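Moreover, the true $\alpha$, $\beta$ and $\sigma^{2}$ are unknown. When their least-squares estimates $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\sigma}^{2}$ are plugged in instead, all of the expressions above seem questionable to me.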

Any guidance from the experts here would be much appreciated!