I've decided to give up. For high-dimensional data MCMC seems basically useless, far less practical than frequentist empirical Bayes.



24,000 doesn't count as high anymore; data ten or a hundred times larger are common.

For example, lots of internet data, genomic data, astrophysical data, and so on.
Post the model and let's take a look. I've fit a 100,000 × 3-dimensional model that converged quickly; the key is your algorithm.
The model is:

y[i] is the data, i = 1, ..., 24123.

y[i] | s[i] ~ scaled chi-square with known df
s[i] | b ~ mixture of K inverse gammas (with known parameters), with mixing proportions b
b ~ Dirichlet(1)

K is known, and not extremely large (< 50).
b is K-dimensional.

The posteriors of interest are those of b and the s[i].
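To fix ideas, here is a minimal simulation sketch of this generative model. The chi-square scaling convention (y[i] = s[i] * chi-square(df) / df) and the inverse-gamma parameter values are my assumptions, since the post leaves them unspecified.

[code]
# Simulation sketch of the model above; the scaling convention and the
# Inv-Gamma parameters (alpha, beta) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, K, df = 24123, 5, 10
alpha = np.linspace(2.0, 6.0, K)   # hypothetical known Inv-Gamma shapes
beta = np.linspace(1.0, 5.0, K)    # hypothetical known Inv-Gamma scales

b = rng.dirichlet(np.ones(K))                 # b ~ Dirichlet(1)
x = rng.choice(K, size=n, p=b)                # latent mixture component per i
s = 1.0 / rng.gamma(alpha[x], 1.0 / beta[x])  # s[i] | x[i]=j ~ Inv-Gamma(alpha_j, beta_j)
y = s * rng.chisquare(df, size=n) / df        # y[i] | s[i] ~ scaled chi-square(df)
[/code]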




300,000 dimensions and it still converges fast? Impressive! How did you do it?

Do you mean 100,000 data points with only three random parameters, or 300,000 random parameters?
This must be a bioinfo problem, right? Hundreds or thousands of parameters, a dozen or so observations.
[quote]Quoting post #13 by cran, 2007-04-07 15:22:

This must be a bioinfo problem, right? Hundreds or thousands of parameters, a dozen or so observations.[/quote]

Viewed as a frequentist problem, there are actually very few parameters, only K;
but viewed from the MCMC side, the intermediate s[i] must also be simulated as parameters, so the dimension explodes.
Alternatively, in simple cases you can integrate the s[i] out directly, but such cases are rare; and even when the integration is possible, you can usually only integrate out one dimension at a time, while the joint is very hard to integrate out.
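To make the one-dimensional integration concrete (writing \alpha_j, \beta_j as placeholders for the known inverse-gamma parameters, which the thread does not spell out), integrating out a single s[i] means computing

p(y[i] | b) = \sum_{j=1}^K b[j] \int_0^\infty p(y[i] | s[i]) Inv-Gamma(s[i] | \alpha_j, \beta_j) ds[i]

Each term is a one-dimensional integral with a closed form under conjugacy; the hard case is when the s[i] are dependent, so the joint integral no longer factors into such terms.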
There are two possibilities that can cause slow convergence or non-convergence:

1. The model itself is inappropriate.

2. The algorithm is inappropriate.



My guess is that your model is something like this:

Model 1:
y[i] | s[i] ~ Gamma(df/2, s[i]/2)
s[i] | b ~ \sum_{j=1}^K b[j] Inv-Gamma(somegivenparameter_j, somegivenparameter_j)




If I had to fit this model, I would augment the data space by introducing a "mixture indicator", i.e., add another 24,123 "parameters".

Let x = (x_1, ..., x_{number of observations}).

Model 2:
y[i] | s[i] ~ Gamma(df/2, s[i]/2)
s[i] | x[i] = j ~ Inv-Gamma(para_j, para_j), where x[i] denotes the mixture indicator, j = 1, ..., K. Given x[i], s[i] has an inverse gamma distribution.
Prob(x[i] = j) = b[j], j = 1, ..., K

Sampling s[i]:
The conditional posterior of s[i] is p(s[i] | ...) \propto p(y[i] | s[i], ...) prior(s[i] | x[i] = j, ...).

Since the s[i] are not correlated with one another, this augmentation should not hurt the convergence of the MCMC algorithm. However, a Gibbs step may not be available for s[i], since the Inv-Gamma prior may not be conjugate to the Gamma likelihood (not 100% sure). You don't have this concern if you are using WinBUGS.
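For what it's worth, whether that Gibbs step exists seems to depend on the parameterization. If Gamma(df/2, s[i]/2) means s[i]/2 is the scale, so that p(y[i] | s[i]) \propto s[i]^{-df/2} exp(-2 y[i] / s[i]), then the Inv-Gamma prior is conjugate and the full conditional is itself inverse gamma:

s[i] | y[i], x[i] = j ~ Inv-Gamma(para_j + df/2, para_j + 2 y[i])

If s[i]/2 were instead the rate, the conjugate prior would be a Gamma rather than an Inv-Gamma; a different scaling of the chi-square only changes the constants above.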

Sampling x[i]:
The conditional posterior of x[i], given the observations y and the mixing proportions b, is a multinomial(1, ·) distribution, so we can easily sample x:

Prob(x[i] = j | y, ...) \propto b[j] * p(y[i] | ...)
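One clarification of the "..." there: inside the Gibbs sampler we condition on the current s[i], and given s[i] the indicator no longer depends on y[i], so the weights are simply

Prob(x[i] = j | s[i], b) = b[j] Inv-Gamma(s[i] | para_j, para_j) / \sum_{k=1}^K b[k] Inv-Gamma(s[i] | para_k, para_k)

i.e., each x[i] is one multinomial draw over the K components with these normalized weights.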

Sampling b?
The next question is how to sample b, if that is even necessary; generally we just use b as a prior. Now assume b ~ Dirichlet(a_1, ..., a_K) and let a = (a_1, ..., a_K).
We need to specify the hyperparameters a. To be honest, though, I have no idea how to sample b :( Maybe we can discuss it later.
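For what it's worth, the Dirichlet prior is conjugate to the multinomial indicators, so b also has a standard Gibbs step: with n_j = #{i : x[i] = j},

b | x ~ Dirichlet(a_1 + n_1, ..., a_K + n_K)

which under the flat Dirichlet(1) prior of the original post is just Dirichlet(1 + n_1, ..., 1 + n_K).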




Oh, you do indeed use b as a prior:
b ~ Dirichlet(1)




Try data augmentation; it is a very basic and very effective technique for mixture distributions.

It should solve your problem.
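To make the suggestion concrete, here is a minimal sketch of the augmented Gibbs sampler, assuming y[i] | s[i] ~ Gamma(df/2, scale = s[i]/2) so that both conjugate updates discussed above apply; the function and parameter names (alpha, beta for the known Inv-Gamma parameters) are illustrative, not from the thread.

[code]
# Hypothetical Gibbs sampler for Model 2. Assumes:
#   y[i] | s[i]   ~ Gamma(shape=df/2, scale=s[i]/2)
#   s[i] | x[i]=j ~ Inv-Gamma(alpha[j], beta[j])   (alpha, beta known)
#   b             ~ Dirichlet(1, ..., 1)
import numpy as np
from scipy.stats import invgamma

def gibbs(y, df, alpha, beta, n_iter=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, K = len(y), len(alpha)
    x = rng.integers(K, size=n)      # mixture indicators
    b = np.full(K, 1.0 / K)          # mixing proportions
    b_draws = np.empty((n_iter, K))
    for it in range(n_iter):
        # 1) s[i] | y[i], x[i]=j ~ Inv-Gamma(alpha_j + df/2, beta_j + 2*y[i])
        s = invgamma.rvs(alpha[x] + df / 2.0,
                         scale=beta[x] + 2.0 * y, random_state=rng)
        # 2) x[i] | s[i], b: weights propto b[j] * Inv-Gamma(s[i]; alpha_j, beta_j),
        #    computed in log space for numerical stability
        logw = np.log(b)[None, :] + invgamma.logpdf(
            s[:, None], alpha[None, :], scale=beta[None, :])
        w = np.exp(logw - logw.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)
        x = (w.cumsum(axis=1) < rng.random(n)[:, None]).sum(axis=1)
        # 3) b | x ~ Dirichlet(1 + n_1, ..., 1 + n_K)   (flat prior)
        b = rng.dirichlet(1.0 + np.bincount(x, minlength=K))
        b_draws[it] = b
    return b_draws
[/code]

Every step vectorizes over i, so one sweep is O(nK); even a few hundred thousand latent s[i] per iteration should be cheap, and with these conjugate blocks the chain has a chance to mix well despite the nominal dimension.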



If you run into problems, we can discuss further.



:)

Good idea, I'll try it. Thanks.
Thanks. Both inverse gamma and dirichlet are conjugate :)
[quote]Quoting post #17 by longoR++, 2007-04-08 16:10:

Thanks. Both inverse gamma and dirichlet are conjugate :)[/quote]



With a normal likelihood, right?
Yes, they were summarized in advance from normal data :)
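Presumably that means each y[i] is a sample variance: if y[i] is computed from normal observations with df degrees of freedom and true variance s[i], then

df * y[i] / s[i] ~ chi-square(df), i.e. y[i] | s[i] ~ s[i] * chi-square(df) / df

which is exactly the scaled chi-square likelihood of the original model.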