Metropolis算法的接受率太低怎么办？

longoR · 2007年4月4日

做mcmc的时候，刚开始的几千个iteration还能接受很多，可是后来就停在那儿不动了，过了好几万个iteration还在原地一点没动。这是proposal分不选得不好的原因么？

lindammay · 2007年4月5日

我用gibbs仿真的时候,如果接受率太低,就 thinning一下.Metropolis算法的我不清楚,不知道是否类似...

lindammay · 2007年4月5日

刚才又查了下，好像是proposal的问题，但是书上只是说：such a chain will frequently not move, again resulting in slow mixing。。。ideally, the proposal distribution should be scaled to avoid it....我是在markov chain monte carlo in practice （Gilks, Richardson, Spiegelhalter）这本书上的第9页看到一点，我想还应该有其它专门解决这个问题的文献的,希望能对楼主有所帮助。

longoR · 2007年4月5日

[quote]引用第1楼lindammay于2007-04-05 18:03发表的“”:

我用gibbs仿真的时候,如果接受率太低,就 thinning一下.Metropolis算法的我不清楚,不知道是否类似... [/quote]

gibbs好像应该没有接不接受的问题啊？你说thinning一下应该是serial correlation太高的问题吧？

longoR · 2007年4月5日

[quote]引用第2楼lindammay于2007-04-05 19:36发表的“”:

刚才又查了下，好像是proposal的问题，但是书上只是说：such a chain will frequently not move, again resulting in slow mixing。。。ideally, the proposal distribution should be scaled to avoid it....我是在markov chain monte carlo in practice （Gilks, Richardson, Spiegelhalter）这本书上的第9页看到一点，我想还应该有其它专门解决这个问题的文献的,希望能对楼主有所帮助。 [/quote]

谢谢，我想我的问题可能是维数太高，proposal很难覆盖R^5000的空间，郁闷。

我不太清楚scale指什么，是说scale到the spaces with high posterior density么？

lindammay · 2007年4月5日

[quote]引用第3楼longoR++于2007-04-06 00:31发表的“”:

gibbs好像应该没有接不接受的问题啊？你说thinning一下应该是serial correlation太高的问题吧？[/quote]

我看到楼主的问题的时候,脑子里直接反应的就是trace plot如果在接受率低的时候可能呈现的样子,然后就想到如果是我用gibbs出现类似的图象时怎么解决;-P不好意思,丢人啦.我对M-H懂得不多,具体指什么我也不确定啦~~~不过貌似见过相关主题的讨论文献....

cran · 2007年4月6日

how many dimension?

longoR · 2007年4月6日

[quote]引用第6楼cran于2007-04-06 18:31发表的“”:

how many dimension?[/quote]

I only tried 5000 dimensions, but the actual data have 24000 dimensions.

hangover · 2007年4月6日

啥模型？24000 dim?

hangover · 2007年4月6日

而且你用的是啥MH? independence MH or Random-Walk MH or AR-MH?

For random-walk MH and Langevin Metropolis, you can control the acceptance rate by scaling (or tuning) the variance of the error term of the random walk model.

longoR · 2007年4月6日

我决定放弃了，感觉高维数据好像mcmc根本没用，远不如frequentist的empirical bayes实际。

24000已经不算高了，再高十倍、百倍的数据都很常见了。

比如很多互联网数据、基因组数据、天体物理数据等等等等。

hangover · 2007年4月6日

模型拿来看看，我做过10万*3维的模型，收敛速度很快，关键看你的算法。

longoR · 2007年4月7日

模型是：

<br />
y[i]为数据，i=1, ... , 24123。<br />
<br />
y[i] | s[i] ~ scaled chi-square with known df<br />
s[i] | b ~ mixture of K inverse gamma (with known parameters), with mixing proportion b<br />
b ~ dirichlet (1)<br />
<br />
K is known, and not extremely large (<50). <br />
b is K-dimensional. <br />
<br />
所关心的是b和s[i]的posterior。<br />

30万维居然还能收敛速度快？太强了！怎么做的？

你说的是十万个数据点但是只有三个随机的参数，还是随机的参数就有三十万个？

cran · 2007年4月7日

应该是bioinfo的东西吧？成百上千的parameter,十几个obs

longoR · 2007年4月7日

[quote]引用第13楼cran于2007-04-07 15:22发表的“”:

应该是bioinfo的东西吧？成百上千的parameter,十几个obs[/quote]

<br />
其实放到frequentist下来看，参数很少，只有K个；<br />
但是放在mcmc下来看，中间的s[i]也要当作参数来模拟，所以维数剧增。<br />
或者在简单的情况下直接把s[i] integrate掉，但是这种情况很少见，而且就算能积掉，通常也顶多积掉一维的，而很难积掉joint的。

hangover · 2007年4月7日

有两种可能性造成 slow convergence或者不收敛:

1。模型根本不合适

2。算法不合适

我猜你的模型可能是这样的:

 <br />
Model 1:<br />
y[i] | s[i] ~ Gamma(df/2, s[i]/2)<br />
s[i] | b ~ \sum_{j=1}^K b[j] Inv-Gamma(somegivenparameter_j,somegivenparameter_j)<br />

如果我必须要去评估这个模型，我会 augment data space by introducing "mixture indicator" 就是说把"参数"再增加24123

  <br />
Let x=(x_1,...x_{number of observations})<br />
<br />
Model 2:<br />
y[i] | s[i] ~ Gamma(df/2, s[i]/2)<br />
s[i] | x[i] = j ~ Inv-Gamma(para_j,para_j), x[i] denotes the mixture indicator, j = 1,...,K. Given x[i], s[i] has an inverse gamma distribution.<br />
Prob(x[i]=j)=b[j], j = 1,...,K <br />
<br />
Sampling s[i]:<br />
the consitional posterior of s[i] is p(s[i]|...) \propto p(y[i]|s[i],...) prior(s[i]|x[i]=j,...).<br />
<br />
Since s[i]s are not correlated, this algoritm should not affect the convergence of MCMC algorithm. However, Gibbs sampler can not apply to sampling s[i] as Inv-Gamma prior may not conjugate to Gamma likelihood, not 100% sure. You dont have this concen if you are using WINBUGs.<br />
<br />
Sampling x[i]:<br />
The conditional posterior of x[i] given observations, y, and mixture components, b, has a multinominal-distribution(1, b), so we can easily sample x. <br />
<br />
Prob(x[i]=j | y ,...) \propto b[j] * p(y|...)<br />
<br />
Sampling b[j]?<br />
The next question turns to how to sampling b, if it is necessary? generally, we use b as a prior. Now, we assume b~Dirichlet(a_1,...,a_K) and let a=(a_1,...,a_k).<br />
we need to specify hyperparameters for a. However, to be honest, I have no idea about how to sampling b :( Maye we should discuss it latter.<br />

哦，你确实是使用b作为prior

 b ~ dirichlet (1)<br />

试一下data-augmentation，这个是非常基本而且非常好用的办法在mixture distribution.

你的问题应该可以解决。

如果有问题我们可以再讨论。

:)

longoR · 2007年4月7日

good idea, I'll try it. Thanks.

longoR · 2007年4月8日

Thanks. Both inverse gamma and dirichlet are conjugate :)

hangover · 2007年4月8日

[quote]引用第17楼longoR++于2007-04-08 16:10发表的“”:

Thanks. Both inverse gamma and dirichlet are conjugate :)[/quote]

normal likelihood吧？

longoR · 2007年4月8日

是从normal data中事先summarize出来的：）