[quote]
Quoting post #29 by hexm26, 2007-08-31 16:06:
There is actually nothing hard to understand here. If different data sets were used to estimate different models, the likelihood would of course be unsuitable. But the OP's premise is that "the two models have exactly the same structure, only the parameter estimates differ" — in other words, the two samples differ only in the values of the relevant indicators — whereas in your example "the fixed effect structure is different", so it does not match the specific setting the OP has in mind.
Think of the likelihood from another angle: it is a conditional probability, p(x|parameter)=p(parameter|x); just swap the roles of the data and the estimated parameters, provided the model structure is exactly the same. That said, I must admit that even using the likelihood to compare data sets is quite a stretch. My own opinion is that in this situation there is no unified "quantitative" comparison; better to think in terms of the indicator values the data themselves represent.[/quote]
My example was only meant to demonstrate that this principle is clearly wrong; I'm not saying the mixed-model example is exactly the situation here.
To see why likelihoods cannot be compared even when the model is the same, consider two further examples:
1) Y1,Y2,...,Y10 ~ i.i.d. N(0,1); X1,X2,...,X1000 ~ i.i.d. N(0,1). That is, the only difference is the sample size. Then the likelihood of the X's is much smaller than that of the Y's, since here the more data we have, the smaller the likelihood. But clearly the data set X is no worse than the data set Y.
> n1=rnorm(10)
> n2=rnorm(1000)
> sum(dnorm(n1,log=T))
[1] -13.90256
> sum(dnorm(n2,log=T))
[1] -1392.248
2) Y1,Y2,...,Y100 ~ i.i.d. N(0,1); X1,X2,...,X100 ~ i.i.d. N(0,0.01), i.e. variance 0.01 (sd = 0.1). Assume the same, correct model N(mu,sigma^2) is fitted to each; then the likelihood of the X's is a lot larger than that of the Y's, but clearly the two data sets are of the same quality, neither being better than the other. One can plot the fitted densities to see why.
> y1=rnorm(100)
> y2=rnorm(100,,.1)
> sum(dnorm(y1,mean(y1),sqrt(var(y1)*99/100),log=T))
[1] -130.2282
> sum(dnorm(y2,mean(y2),sqrt(var(y2)*99/100),log=T))
[1] 86.12534
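A minimal sketch of the density plot suggested above (variable names are my own, not from the original post). The N(0,0.01) sample is tightly concentrated near 0, so its fitted density is tall and narrow; every observation sits where the density value is large, which inflates the likelihood even though the data are no "better":

```r
# Sketch: compare the two fitted normal densities (assumed setup, not from the post)
y1 <- rnorm(100)           # sample from N(0,1)
y2 <- rnorm(100, sd = 0.1) # sample from N(0,0.01), sd = 0.1
xs <- seq(-3, 3, length.out = 200)
# Tall, narrow density fitted to y2 vs. the flatter density fitted to y1;
# density values for y2 at its own points can far exceed 1, so the
# log-likelihood can even be positive.
plot(xs, dnorm(xs, mean(y2), sd(y2)), type = "l", ylab = "fitted density")
lines(xs, dnorm(xs, mean(y1), sd(y1)), lty = 2)
legend("topright", c("fit to N(0,0.01) data", "fit to N(0,1) data"), lty = 1:2)
```

The key point visible in the plot: a likelihood is a product of density values, not probabilities, so a narrower fitted density mechanically yields a larger likelihood for equally "good" data.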