How can I judge the agreement between the two? (a newbie's first question)

way_liu

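I used two sensors (one a standard reference sensor, the other a homemade one) to measure the same environmental factor, automatically recording the current value once a minute. This gives the factor's day-by-day variation over several consecutive days. Each day the factor follows a single-peaked curve, similar to a daily air temperature curve.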

I now want to demonstrate that the two sensors agree. How should I go about it?

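Drawing trend lines in Excel shows qualitatively that the curves measured by the two sensors almost completely overlap. But can I use a t-test or an F-test to demonstrate their agreement quantitatively?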

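As a newbie, I did not pay attention in class, for one thing, and am naturally slow, for another. I would appreciate any advice.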

losttemple

An ANOVA with two repeated-measures factors might work.

rtist

Strictly speaking, you don't have any replicates here.

Although you can still come up with a probability model, that is not any simple t-test/F-test.

way_liu

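To [post 1]: could you explain in more detail? How would I do this in Excel or SPSS?

To [post 2]: could you explain in Chinese? I don't quite follow what you mean.

Thanks, everyone!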

outsider

This is a pretty odd situation. Can you treat it like a repeated measure? Aggregate the readings down to hourly values or something similar. Excel does not cope well with a large number of observations. At one reading per minute, there are going to be 1440 a day. Say you do 10 days; that will be 14,400 points. Excel can't display them well, so the matching curves might be misleading.
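The hourly aggregation mentioned above could be sketched roughly as follows. This is only an illustration: the function name `hourly_means` and the choice of the mean as the summary are assumptions, not something specified in the thread, and an incomplete trailing hour is simply dropped.

```python
from statistics import mean


def hourly_means(readings, per_hour=60):
    """Average consecutive blocks of `per_hour` one-minute readings.

    A trailing block with fewer than `per_hour` readings is dropped.
    """
    last_start = len(readings) - per_hour
    return [mean(readings[i:i + per_hour])
            for i in range(0, last_start + 1, per_hour)]


# Hypothetical data: one flat hour at 1.0 followed by one flat hour at 3.0.
example = [1.0] * 60 + [3.0] * 60
print(hourly_means(example))  # two hourly values
```

Applied to one day of 1440 readings per sensor, this would leave 24 paired values per day, which are easier to plot and inspect.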

One extreme way is to go non-parametric. Say you code each pair of observations as 0 if the two match and 1 otherwise. Then use a binomial test to decide whether having xx unmatched pairs is significant. Just a wild guess; any more thoughts?
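As a rough sketch of the binomial idea, the snippet below computes the upper-tail probability of seeing at least `k` unmatched pairs out of `n`. Two things here are assumptions rather than part of the suggestion: the null mismatch rate `p0` has to be chosen by the analyst, and the pairs are treated as independent, which may not hold for minute-by-minute readings. The log-space form is used only to avoid overflow for large `n`.

```python
from math import exp, lgamma, log


def binomial_upper_tail(n, k, p0):
    """Return P(X >= k) for X ~ Binomial(n, p0), with 0 < p0 < 1."""
    def log_pmf(i):
        # log of C(n, i) * p0**i * (1 - p0)**(n - i)
        log_comb = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return log_comb + i * log(p0) + (n - i) * log(1 - p0)

    return sum(exp(log_pmf(i)) for i in range(k, n + 1))


# Hypothetical numbers: 1440 pairs, 100 unmatched, assumed null rate of 5%.
print(binomial_upper_tail(1440, 100, 0.05))
```

A small tail probability would say that the number of unmatched pairs is larger than the assumed rate `p0` can explain.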

rtist

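[quote]Quoting post 3 by way_liu, posted 2007-10-07 21:42:

To [post 1]: could you explain in more detail? How would I do this in Excel or SPSS?

To [post 2]: could you explain in Chinese? I don't quite follow what you mean.

Thanks, everyone![/quote]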

Right now, deciding what to do is more important than how to do it.

rtist

[quote]Quoting post 4 by outsider, posted 2007-10-08 20:16:

One extreme way is to go to non-parametric. Say you decide if two obs match then it is 0; otherwise then it is 1. Use Binomial to test if it is significant if you have xx un-matched pairs. [/quote]

It is hard to justify independence.

outsider

rtist:

I think his problem is more about pattern matching than about comparing particular parameters. As long as he can justify that the two shapes are similar, he could conclude that the two meters follow each other well. There is no assumption about population distributions here. Would that simplify his problem?

Let's rephrase the question: if there are two curves, how do we know whether they look similar, and by how much?

rtist

I think there should be two groups of curves, instead of just two curves, for them to be strictly comparable. Two curves will almost always be different. One can think of a curve as a high dimensional data point. With only two data points, we can hardly say whether they are close or far away from each other. We can calculate the distance between the two points, but we don't have any variability estimates to compare with.

That's why I say you can still come up with a probability model. But a model is just a model, not a solution.

What you suggested is a kind of empirical Bayes thinking: you treated the different dimensions as a simple random sample from some distribution. It's a good idea, but I don't think it works well right now, nor is everyone willing to accept that dimensions can be treated as samples. If we have to do something with only two curves, at the very least the dimensions should be modeled as a correlated sample.

But, the best solution is still to get replicates - two groups of curves, instead of two curves.

outsider

I am not really familiar with this type of comparison. I think the question leads to two distinct sets of topics, both of which have real business instances:

1. If cost is a constraint and we can only get two devices and measure two time series, is there any way to compare them? The replicates might be embedded inside the time series, say if you repeated the experiment by season.

2. Suppose I have a certain stock's price as a time series. I like the stock and would like to have stocks with similar price variation in my portfolio. How could I match stocks with similar price variation using their time series?

Those are a few things that popped into my head while I was reading the post. I find it interesting. Could you share some of your experience on this?

rtist

I've no idea about econ/finance. But the 2nd question seems easier: we are only trying to find out "which", among "many". (BTW: what do you mean by portfolio here?) Any distance-based method will do the job.
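One possible reading of "distance-based" is sketched below: standardize each series, then pick the candidate with the smallest root-mean-square distance to the target. The z-score step, the RMS distance, and the names `shape_distance` and `most_similar` are all illustrative assumptions; other distances would fit the same pattern.

```python
from math import sqrt
from statistics import mean, pstdev


def zscore(series):
    """Standardize a series; assumes it is not constant."""
    m, s = mean(series), pstdev(series)
    return [(x - m) / s for x in series]


def shape_distance(a, b):
    """RMS distance between two equal-length series after standardizing."""
    za, zb = zscore(a), zscore(b)
    return sqrt(mean([(x - y) ** 2 for x, y in zip(za, zb)]))


def most_similar(target, candidates):
    """Return the name of the candidate series closest to `target`."""
    return min(candidates,
               key=lambda name: shape_distance(target, candidates[name]))


# Hypothetical series: "a" has the same shape as the target, "b" is reversed.
target = [1, 2, 3, 4]
candidates = {"a": [10, 20, 30, 40], "b": [4, 3, 2, 1]}
print(most_similar(target, candidates))
```

Standardizing first means only the shape is compared, not the level or scale, which seems closer to "similar price variation" than a distance on raw values would be.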

For the 1st question, my naive answer would be to use other information that is not contained in the data. I think your empirical Bayes solution is also nice and promising, as long as the dependence structure can be properly incorporated.

outsider

Thanks for your suggestions. I think in some industries 'shape' similarity is very important. For instance, in telecommunications, if two customers have very similar usage patterns over time, then to a certain degree they will trigger the same cost structure and could be grouped together for cost allocation. It is an interesting topic to follow.

way_liu

Thanks to outsider and rtist for your help.

One extreme way is to go to non-parametric. Say you decide if two obs match then it is 0; otherwise then it is 1. Use Binomial to test if it is significant if you have xx un-matched pairs.

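In practice, what I want is to compare the two sensors' readings at the same moment and to say that, when the error between them is less than **%, the two match each other very well and the pair is coded 0; if not, it is coded 1. I then repeat this for the pairs of values at every other moment. With 1440 pairs of data over a 24-hour day, the comparison yields 1440 zeros or ones. The agreement between the two sensors is then described by the probability of a 0 occurring.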
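The 0/1 coding described above might look like the following sketch. The threshold is left unspecified in the post, so `rel_tol` is a placeholder to be chosen by the user, and measuring the error relative to the standard sensor's reading is also an assumption.

```python
def match_flags(standard, homemade, rel_tol):
    """Code each pair as 0 (match) or 1 (mismatch).

    A pair matches when |homemade - standard| < rel_tol * |standard|.
    Note that a standard reading of exactly 0 can never match under this rule.
    """
    return [0 if abs(h - s) < rel_tol * abs(s) else 1
            for s, h in zip(standard, homemade)]


# Hypothetical readings and a hypothetical 5% tolerance.
standard = [100.0, 100.0, 100.0]
homemade = [101.0, 110.0, 100.0]
flags = match_flags(standard, homemade, rel_tol=0.05)
share_of_zeros = flags.count(0) / len(flags)
print(flags, share_of_zeros)
```

The share of zeros is only a descriptive summary; it does not by itself say whether that share is large or small in a statistical sense.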

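I think outsider's method is a good one. Thanks!

rtist is right: there are no replicates in this experiment. Although there are many data points each day, they were not obtained under the same background conditions, so a simple t-test or F-test cannot be used.

Although the question is still unresolved, thank you both for your kind help!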

rtist

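[quote]Quoting post 12 by way_liu, posted 2007-10-11 10:32:

The agreement between the two sensors is then described by the probability of a 0 occurring.

I think outsider's method is a good one. Thanks!

.......[/quote]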

The method is fine, but the question is how to work out whether the probability of a 0 is large or small.

hexm26

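Whether you use repeated measures or time series, both study the effect of time on the growth or change of a curve, whereas the original poster's question is about comparing two curves for similarities and differences, so I would not recommend those modeling approaches.

I am embarrassed to say that I often come across this kind of curve-comparison problem, but since it is not my field I have not paid much attention. Most methods compare location, slope, curvature and the like. However, since in the original poster's data the curves from the two sensors almost completely overlap, I would suggest trying a third-curve approach: assume curve A follows some distribution (this can be done piecewise) and estimate its parameter values. Then look at curve B using those parameter values and see whether B is consistent with the distribution under those parameters. Of course, the original poster will get a pile of t and F values, which should all be insignificant, and that would show that the two sensors agree. (Just an immature personal opinion, for reference only.)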

rtist

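[quote]Quoting post 14 by hexm26, posted 2007-10-11 11:37:

Whether you use repeated measures or time series, both study the effect of time on the growth or change of a curve, whereas the original poster's question is about comparing two curves for similarities and differences, so I would not recommend those modeling approaches.

I am embarrassed to say that I often come across this kind of curve-comparison problem, but since it is not my field I have not paid much attention. Most methods compare location, slope, curvature and the like. However, since in the original poster's data the curves from the two sensors almost completely overlap, I would suggest trying a third-curve approach: assume curve A follows some distribution (this can be done piecewise) and estimate its parameter values. Then look at curve B using those parameter values and see whether B is consistent with the distribution under those parameters. Of course, the original poster will get a pile of t and F values, which should all be insignificant, and that would show that the two sensors agree. (Just an immature personal opinion, for reference only.)[/quote]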

Long time no see, hexm26.

As I see it, isn't this modeling too? In fact it is very similar to outsider's idea, even though it takes a somewhat different form.

digestive

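I think you could draw a random sample to form a training set, estimate a Bayes factor, and then use that factor to make 0-1 decisions on the remaining data and analyze the probability.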

rltang2007

One possible solution:

Suppose the underlying model is X(t)=u(t)+e(t), where u(t) is the true temperature curve, e(t) is some noise and X(t) is the observed curve.

In this case, {X(t), t \in T} is not fully observed; only {X(ti), i=1,...,n} is observed.

If the two instruments are the same, the data {X1(ti), i=1,...,n} and {X2(ti), i=1,...,n} can be seen as i.i.d. draws from {X(ti), i=1,...,n}.

Under the further assumption that e(t) is a Gaussian process with Ee(t)=0 and cov(e(s),e(t))=0 (s!=t) or sigma^2 (s=t), each difference X1(ti)-X2(ti) is N(0, 2 sigma^2), independently across i, so \sum [X1(ti)-X2(ti)]^2/(2 sigma^2) should follow a chi-square distribution with df = n.

So you need to choose sigma, then do the chi-square test.

Note: the assumption that e(t) is a Gaussian process with Ee(t)=0 and cov(e(s),e(t))=0 (s!=t) or sigma^2 (s=t) essentially simplifies the problem. I think it should work for your purposes.
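A minimal sketch of this chi-square check, using only the Python standard library, might look like this. Because the difference of two independent readings with noise variance sigma^2 has variance 2 sigma^2, the sketch divides the sum of squares by 2 sigma^2. The upper-tail probability uses the Wilson-Hilferty normal approximation, which is a substitution made here for convenience and is reasonable for large n; `scipy.stats.chi2.sf` would give the exact value if SciPy is available. The function name and the example numbers are hypothetical, and sigma must come from outside the data (for example, from the sensor's specification).

```python
from math import sqrt
from statistics import NormalDist


def chi_square_agreement(x1, x2, sigma):
    """Return (statistic, approximate upper-tail p-value).

    statistic = sum((x1_i - x2_i)^2) / (2 * sigma^2), referred to a
    chi-square distribution with n = len(x1) degrees of freedom.
    The p-value uses the Wilson-Hilferty approximation, not the exact CDF.
    """
    n = len(x1)
    stat = sum((a - b) ** 2 for a, b in zip(x1, x2)) / (2 * sigma ** 2)
    # Wilson-Hilferty: (stat / n)^(1/3) is approximately normal with
    # mean 1 - 2/(9n) and variance 2/(9n).
    z = ((stat / n) ** (1 / 3) - (1 - 2 / (9 * n))) / sqrt(2 / (9 * n))
    p_upper = 1 - NormalDist().cdf(z)
    return stat, p_upper


# Hypothetical data: 100 pairs that each differ by 1.0, with assumed sigma = 1.0.
stat, p = chi_square_agreement([0.0] * 100, [1.0] * 100, sigma=1.0)
print(stat, p)
```

A small p-value would indicate that the two sensors differ by more than noise of size sigma can explain; a large one is consistent with agreement under the stated assumptions, including independence of the noise across time points.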