wshoper
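Goal: to determine whether the expression differences among certain genes within the same tissue are significant.
Data: five normal tissues and one control tissue. All normal tissues are labeled with Cy5 and the control with Cy3. Three replicates. Each slide is Lowess-normalized. 30,000 genes, with no replicates within a slide.
Because the comparison is within a single tissue, it amounts to comparing the signals of different spots on one array, with three replicates. However, the amount of sample loaded may differ from slide to slide and introduce error, so I first ran a two-factor ANOVA (SAS v8, ANOVA procedure, no interaction).
Results: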
Source               DF   Sum of Squares   Mean Square   F Value   Pr > F
Model             28494      334096.3720       11.7251      3.89   <.0001
Error             56984      171676.4366        3.0127
Corrected Total   85478      505772.8086

R-Square   Coeff Var   Root MSE   exp Mean
0.660566    27.97072   1.735717   6.205479

Source      DF       Anova SS   Mean Square   F Value   Pr > F
Array        2     19473.6699     9736.8350   3231.92   <.0001
Gene     28492    314622.7021       11.0425      3.67   <.0001
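
For readers without SAS, the sums of squares in an ANOVA table like the one above can be reproduced with a short sketch. This is a minimal pure-Python illustration of an additive two-factor ANOVA (array + gene, one observation per cell, no interaction term). It is not the code used in this thread; the function name `two_way_anova` and the toy numbers are made up for illustration.

```python
from statistics import mean


def two_way_anova(table):
    """Additive two-factor ANOVA with one observation per cell.

    table[g][a] is the expression value of gene g on array a
    (hypothetical layout; every gene must have a value on every array).
    Returns {source: (df, sum_of_squares, F)}; F is None for the error row.
    """
    n_genes, n_arrays = len(table), len(table[0])
    grand = mean(v for row in table for v in row)
    gene_means = [mean(row) for row in table]
    array_means = [mean(row[a] for row in table) for a in range(n_arrays)]

    # Main-effect sums of squares for a balanced design.
    ss_gene = n_arrays * sum((m - grand) ** 2 for m in gene_means)
    ss_array = n_genes * sum((m - grand) ** 2 for m in array_means)
    ss_total = sum((v - grand) ** 2 for row in table for v in row)
    # Without an interaction term, everything left over is error.
    ss_error = ss_total - ss_gene - ss_array

    df_gene, df_array = n_genes - 1, n_arrays - 1
    df_error = df_gene * df_array
    ms_error = ss_error / df_error
    return {
        "Array": (df_array, ss_array, (ss_array / df_array) / ms_error),
        "Gene": (df_gene, ss_gene, (ss_gene / df_gene) / ms_error),
        "Error": (df_error, ss_error, None),
    }


# Toy data: 3 genes (rows) measured on 2 arrays (columns).
toy = [[1, 3], [2, 4], [6, 5]]
for source, (df, ss, f) in two_way_anova(toy).items():
    print(source, df, round(ss, 4), f)
```

With 28,493 genes and 3 arrays the same formulas give 2, 28,492 and 56,984 degrees of freedom, which matches the DF column in the SAS output.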
wshoper
The variation between arrays was fairly large, so I applied quantile normalization and then reran the ANOVA:
Source               DF   Sum of Squares    Mean Square   F Value   Pr > F
Model             28494     194565755265   6828306.1439      5.08   <.0001
Error             56984      76651973863   1345149.057
Corrected Total   85478     271217729128

R-Square   Coeff Var   Root MSE   exp Mean
0.717378    341.9285   1159.806   339.1953

Source      DF        Anova SS    Mean Square   F Value   Pr > F
Array        2    13004614.341   6502307.1703      4.83   0.0080
Gene     28492    194552750651   6828329.0275      5.08   <.0001
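
The quantile step mentioned in this post can be sketched as follows. This is the textbook form of quantile normalization, in which each array's k-th smallest value is replaced by the mean of the k-th smallest values across all arrays. Whether it matches exactly what was run here is an assumption, and `quantile_normalize` and the sample intensities are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of arrays.

    columns[a] is the list of intensities on array a; all arrays must have
    the same length and no missing values (an assumption of this sketch).
    Ties are broken by position rather than averaged.
    """
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]

    normalized = []
    for col in columns:
        # Indices of this array's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Two hypothetical arrays with three spots each.
arrays = [[5, 2, 3], [4, 1, 9]]
print(quantile_normalize(arrays))
```

In this idealized version every array ends up with the same set of values, only in a different order, so the rank of each spot within its array is preserved.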
wshoper
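Now the variation between arrays is much smaller, and the multiple comparisons across genes can proceed.
Questions: 1. Is this procedure acceptable? Are there any problems with it?
2. The array effect is still significant. Should it be removed? If so, how?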
wshoper
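Question 3: If I compare the expression of one pair of genes within each tissue, the multiple-comparison problem should exist only across the 5 tissues, so dividing by 5 when correcting should be enough, right?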
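
The "divide by 5" idea reads like a Bonferroni correction over 5 tests: comparing each p-value with alpha/5 is the same as multiplying each p-value by 5 and comparing with alpha. A minimal sketch, assuming one test per tissue; the function name `bonferroni` and the p-values are invented for illustration and are not results from this thread.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted_p, reject) pairs using the Bonferroni correction."""
    m = len(p_values)  # number of tests, e.g. 5 tissues
    adjusted = [min(1.0, p * m) for p in p_values]
    return [(p_adj, p_adj <= alpha) for p_adj in adjusted]


# Hypothetical p-values, one per tissue.
tissue_p = [0.004, 0.02, 0.3, 0.011, 0.049]
for raw, (adj, reject) in zip(tissue_p, bonferroni(tissue_p)):
    print(f"raw={raw:.3f} adjusted={adj:.3f} reject={reject}")
```

The correction only covers the tests that are counted in `m`. If many gene pairs are tested in each tissue, the number of tests is larger than 5.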
rtist
Microarrays are not designed for such purposes. If you consider the associated statistics, it gets even worse.
drewlee
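[quote]Quoting post #3 by wshoper, 2007-01-24 18:07:
Question 3: If I compare the expression of one pair of genes within each tissue, the multiple-comparison problem should exist only across the 5 tissues, so dividing by 5 when correcting should be enough, right?[/quote]
Correct what? The p-value?
In this case, if you are using Tukey's method, you probably don't need a correction.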
wshoper
[quote]Quoting post #4 by rtist, 2007-01-24 18:46:
Microarrays are not designed for such purposes. If you consider the associated statistics, it gets even worse.[/quote]
It should be possible to use arrays to compare the expression levels of different genes within one tissue.
rtist
[quote]Quoting post #6 by wshoper, 2007-01-24 06:35:
It should be possible to use arrays to compare the expression levels of different genes within one tissue.[/quote]
It is desirable, but hardly useful in practice, due to the complexity of the physico-chemical properties of the sequences and/or slide surfaces, with or without coating agents.
A high signal does not necessarily correspond to a higher concentration of nucleotides.
There are some very special cases in which such arrays will work, but in general, such an approach will almost always end in failure.
Also, this is not an interesting biological problem to most biologists, although a very small number of people would be interested.
rtist
I recommend using other techniques, e.g. MPSS, for such purposes. With MPSS, the values for different genes are comparable.
wshoper
[quote]Quoting post #7 by rtist, 2007-01-25 11:52:
It is desirable, but hardly useful in practice, due to the complexity of the physico-chemical properties of the sequences and/or slide surfaces, with or without coating agents.
A high signal does not necessarily correspond to a higher concentration of nucleotides.
There are some very special cases in which such arrays will work, but in general, such an approach will almost always end in failure.
Also, this is not an interesting biological problem to most biologists, although a very small number of people would be interested.[/quote]
Perhaps the variance can be estimated from technical replicates. If more than one probe is used to detect each gene, we can get information about the effect of the probe or the slide surface. What do you think?
rtist
It depends. As I said, there are special cases where such approaches will work, but it takes a very careful design to get that edge. Multiple different probes per gene are definitely desirable, as in Affy. Randomization is critically important.