Compute it from the definition
A question for the experts: how do I compute the correlation coefficient?
Sorry, let me ask a rather basic question first: what does the Chinese sentence "a是b的方程" ("a is an equation of b") mean?
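It means a is a function of b: a=f(b). Could the original poster give the expression? Also, what does "truncated normal distribution on (b,c)" ("(b,c)上的切断正态分布") mean?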
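Sorry, I don't know the correct Chinese name. The density of a truncated normal distribution has the form of a normal density, but it starts from some point: for example, the interval (a, +∞) has probability 1, instead of the support running from -∞ to +∞ as for an ordinary normal distribution.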
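The integral from -∞ to +∞ should be larger than the integral over a sub-interval, so how can a sub-interval have probability 1? Is there a mistake somewhere?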
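I can't give a precise description. The variable takes values in (a, +∞); values below a are outside the domain, so the probability of (a, +∞) is 1. The English name of this distribution is truncated normal distribution. What is the Chinese name?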
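But the integral from a to +∞ isn't 1 either. Has the function itself changed, so that it is no longer a normal distribution? Which book or article did you see this in?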
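ypchen, you may have confused the distribution function with the density function. The truncated distribution no1cooler mentions is perhaps more commonly translated as "截尾分布": an ordinary distribution is "cut off", so that P{X<b}=0 and P{X<a}=1 (or F(b)=0 and F(a)=1).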
In terms of the density function, the integral from b to a is 1 and f(x)=0 outside (b,a); I think that is what is meant.
z=f((1-r)x, y)
w=g(rx)
The forms of f and g are both unknown, so how can corr(z, w) be computed? ...
I think either this problem was not stated clearly or I have not understood it. Could the original poster explain it again in English?
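My reason for doubting is this: on 谢's reading, the function g is completely unrestricted in this problem, which would mean that every function gives the same answer. That is clearly impossible. As a counterexample, take g(u) to be u times a Cauchy density divided by the density of u=rx; then the expectation of g(rx) is the mean of a Cauchy distribution, which does not exist. If w does not even have an expectation, what sense does a correlation coefficient make?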
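I think things are easy once we have the density function: with it we can tell whether the expectation exists, and then check whether the variances and the covariance exist.
The correlation coefficient equals the covariance divided by the product of the two random variables' standard deviations.
A distribution is uniquely determined by its density function (or, in the discrete case, its probability mass function).
Maybe I've oversimplified it, haha!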
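The definition quoted above (covariance divided by the product of the standard deviations) can be sketched for sample data as follows. This is only an illustration: the function name pearson_corr and the data are made up, and it uses the population (divide by n) convention, which cancels in the ratio anyway.

```python
import math

def pearson_corr(xs, ys):
    """Correlation from the definition: cov(x, y) / (sd(x) * sd(y))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Population covariance and variances (the 1/n factors cancel in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined when a variance is zero")
    return cov / math.sqrt(var_x * var_y)

# Made-up data, roughly linear in x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
print(pearson_corr(xs, ys))
```

For random variables rather than samples, the sums become expectations, which is why the existence of the means and variances has to be checked first.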
A truncated distribution is a type of conditional distribution: truncated from below, the density is f(x|x>a)=f(x)/P(x>a); truncated from above, it is f(x|x<a)=f(x)/P(x<a). Here f(x) can be any distribution, for example a uniform, normal, or Poisson distribution.
In our problem, z has a density of the following form:
f(x|b<x<c)=f(x)/P(b<x<c)=f(x)/(G(c)-G(b))
where f(x) and G(x) denote the density function and the distribution function of the normal distribution (or take it to be the standard normal distribution).
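As a quick check of the density above, here is a minimal sketch that assumes the standard normal case and hypothetical cut points b=-0.5, c=1.5 (neither value comes from the thread). It shows that dividing by G(c)-G(b) is exactly what makes the truncated density integrate to 1 over (b,c).

```python
import math

def norm_pdf(x):
    """Standard normal density f(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    """Standard normal distribution function G(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def truncated_pdf(x, b, c):
    """Density of a standard normal conditioned on b < x < c."""
    if not (b < x < c):
        return 0.0  # no mass outside the truncation interval
    return norm_pdf(x) / (norm_cdf(c) - norm_cdf(b))

def integrate(func, lo, hi, n=20000):
    """Plain midpoint rule; accurate enough for a smooth integrand."""
    h = (hi - lo) / n
    return h * sum(func(lo + (i + 0.5) * h) for i in range(n))

b, c = -0.5, 1.5  # hypothetical cut points
total = integrate(lambda x: truncated_pdf(x, b, c), b, c)
print(total)  # should be very close to 1
```

The untruncated density integrated over the same interval gives only G(c)-G(b), which is less than 1; the rescaling is the whole difference between the two.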
z correlates with w only because both depend on x. Assume that t (z before truncation) and w are linear functions of x and y:
t=a(1-r)x+dy
w=erx
where a, d, e are coefficients. We obtain z by truncating t from both below and above (!!!!),
that is,
z=t|b<t<c
For simplicity, we assume x and y are independently distributed as N(0,1).
Then t is distributed as N(0, (a(1-r))^2+d^2).
First, we compute the covariance:
Cov(z,w)=Cov(t|b<t<c, erx)=Cov(a(1-r)x+dy|b<t<c, erx)
To compute the correlation coefficient we still need the standard deviations of z and w:
Var(w)=(er)^2 Var(x)
Var(z)=Var(t|b<t<c)=E(z^2)-(E(z))^2, where
1) E(z^2)=E(t^2|b<t<c)=(integral over b<t<c) t^2 f(t)/P(b<t<c) dt
2) E(z)=E(t|b<t<c)=(integral over b<t<c) t f(t)/P(b<t<c) dt
You may compute these yourself; it is a little complicated, but it is only Probability I material. Hopefully it's right.
Note that the truncated mean and variance differ from those of the original normal distribution: the truncated variance is reduced, and the mean depends on b and c. See, for example, Greene, Econometric Analysis, Ch. 22, pp. 757-761, which is available on the COS website: cos.name -> pinned posts (置顶文章)
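The two integrals above have standard closed forms for a N(0, sigma^2) variable truncated to (b,c). The sketch below compares those closed forms with direct numerical integration. The values of a, d, r, b, c are hypothetical (the thread gives none), and the function names are made up for this illustration.

```python
import math

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def truncated_moments(sigma, b, c):
    """Closed-form mean and variance of t ~ N(0, sigma^2) given b < t < c."""
    alpha, beta = b / sigma, c / sigma
    mass = Phi(beta) - Phi(alpha)            # P(b < t < c)
    shift = (phi(alpha) - phi(beta)) / mass
    mean = sigma * shift
    var = sigma ** 2 * (1.0 + (alpha * phi(alpha) - beta * phi(beta)) / mass
                        - shift ** 2)
    return mean, var

def numeric_moments(sigma, b, c, n=20000):
    """Same moments by midpoint-rule integration of t^k f(t) over (b, c)."""
    h = (c - b) / n
    m0 = m1 = m2 = 0.0
    for i in range(n):
        t = b + (i + 0.5) * h
        weight = phi(t / sigma) / sigma * h  # f(t) dt
        m0 += weight
        m1 += t * weight
        m2 += t * t * weight
    mean = m1 / m0
    return mean, m2 / m0 - mean ** 2

# Hypothetical coefficients and cut points, chosen only for illustration.
a, d, r = 1.0, 1.0, 0.4
b, c = -0.5, 1.5
sigma = math.sqrt((a * (1.0 - r)) ** 2 + d ** 2)  # sd of t = a(1-r)x + dy

closed = truncated_moments(sigma, b, c)
numeric = numeric_moments(sigma, b, c)
print(closed, numeric)
```

With these numbers the truncated variance comes out well below sigma^2, and the mean moves away from 0 because the interval (b,c) is not symmetric about 0.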
I think 谢益辉's point is critical: the functional forms of z and w determine how to obtain the correlation, and whether it exists at all.
rtist has found one possible function for which E(w) does not exist, and hence no correlation exists. So the problem is surely poorly stated.
What ypchen needs, i.e. the pdf, also follows directly from what the functions z and w look like.
meactohn2003 tries to "construct" sensible and simple forms of z and w, and finds a particular solution for this specific construction. But the problem is unidentifiability: even if we arbitrarily set d=0 and e=1/r, the coefficient a is still unidentifiable. Again, this shows the problem is poorly specified.
No one will succeed if a problem has no solution (as rtist demonstrated) or an infinite number of possible solutions (as meactohn2003 demonstrated).
Are we talking about a truncated regression model? I think there is no identifiability problem here: we are not estimating model coefficients from data!! r is known. And here the functional form is of course linear; otherwise z cannot follow a truncated normal distribution (because x and y are normally distributed). Perhaps the question should be specified more clearly.
Right. I shouldn't have said unidentifiable, but rather that there is an infinite number of solutions.
z doesn't need to be linear. You can construct non-linear functions and still end up with a normal variable, then truncate it.
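One concrete instance of this claim, offered as an illustration and not taken from the thread: for independent x, y ~ N(0,1), the variable t = sign(y)*|x| is non-linear in (x,y) yet is exactly N(0,1), because |x| is half-normal and the independent random sign restores symmetry. The simulation below checks the mean, the variance, and one value of the distribution function.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 100_000

ts = []
for _ in range(n):
    x = rng.gauss(0.0, 1.0)
    y = rng.gauss(0.0, 1.0)
    sign = 1.0 if y >= 0.0 else -1.0
    ts.append(sign * abs(x))  # non-linear in (x, y), still standard normal

mean_t = sum(ts) / n
var_t = sum((t - mean_t) ** 2 for t in ts) / n
frac_below_1 = sum(t < 1.0 for t in ts) / n
normal_cdf_1 = 0.5 * (1.0 + math.erf(1.0 / math.sqrt(2.0)))  # P(N(0,1) < 1)

print(mean_t, var_t, frac_below_1, normal_cdf_1)
```

Truncating such a t to (b,c) gives a truncated normal z again, so normality of z alone does not pin down a linear form.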
9 days later
To meactohn2003:
Thank you very much.
You gave a detailed explanation here. I do not think you can get the covariance from the definition Cov(z,w)=Cov(t|b<t<c,erx)=Cov(a(1-r)x+dy|b<t<c, erx), because you do not know the joint density function of (z,w); so we cannot get E(zw), and hence cannot get the covariance or the correlation coefficient.
I am not sure whether it is right to calculate the correlation coefficient between z and w using the relation between the regression coefficient and the correlation coefficient, since I think the two coefficients are related. But maybe I am wrong, because z is not simply a linear function of w: they also have different ranges.
Who can tell me if I am right?
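One way to probe this question is by simulation, under meactohn2003's linear construction plus an assumption the thread does not state: that the pair (z,w) is observed only when b<t<c, so w is also conditioned on that event. In that case (t,x) is bivariate normal, x can be written as k*t plus independent noise with k=a(1-r)/Var(t), and the regression-coefficient route gives a closed form to compare against. All numeric values are hypothetical, and the closed form as written assumes e*r>0.

```python
import math
import random

def phi(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def Phi(u):
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def truncated_var(sigma, b, c):
    """Variance of t ~ N(0, sigma^2) given b < t < c (standard closed form)."""
    alpha, beta = b / sigma, c / sigma
    mass = Phi(beta) - Phi(alpha)
    shift = (phi(alpha) - phi(beta)) / mass
    return sigma ** 2 * (1.0 + (alpha * phi(alpha) - beta * phi(beta)) / mass
                         - shift ** 2)

def corr(xs, ys):
    """Sample correlation from the definition cov / (sd * sd)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(xs, ys))
    vx = sum((p - mx) ** 2 for p in xs)
    vy = sum((q - my) ** 2 for q in ys)
    return cov / math.sqrt(vx * vy)

# Hypothetical coefficients and cut points (none are given in the thread).
a, d, e, r = 1.0, 1.0, 2.0, 0.4
b, c = -0.5, 1.5

rng = random.Random(0)
zs, ws = [], []
for _ in range(200_000):
    x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    t = a * (1.0 - r) * x + d * y
    if b < t < c:            # keep the pair only when t survives truncation
        zs.append(t)         # z = t | b < t < c
        ws.append(e * r * x)

simulated = corr(zs, ws)

# Closed form under the same assumptions (requires e * r > 0).
sigma2 = (a * (1.0 - r)) ** 2 + d ** 2
k = a * (1.0 - r) / sigma2   # regression coefficient of x on t
v = truncated_var(math.sqrt(sigma2), b, c)
closed_form = k * math.sqrt(v) / math.sqrt(1.0 - k * k * sigma2 + k * k * v)

print(simulated, closed_form)
```

If w is instead meant to keep its unconditional variance (er)^2 Var(x), as in the earlier post, the denominator changes, so the answer depends on which reading of the problem is intended.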