Which test should be used for the variance of a single sample when the population distribution is unknown?

Cloud2016 · June 5, 2023

Let $x_1,x_2,\cdots,x_n$ be a sample from some population. I want to test

$H_0: \sigma^2 - \sigma^2_0 \leq 0 \quad \text{vs.} \quad H_1: \sigma^2 - \sigma^2_0 > 0$

where $\sigma^2_0$ is a known constant. Which test can be used for this?

Cloud2016 · June 5, 2023

Could forum members who have access to ChatGPT also ask it the following for me?

how to test variance in one sample when the population is unknown?

yuanfan · June 5, 2023

Google's Bard answered like this:

To test variance in one sample when the population is unknown, you can use a t-test. A t-test is a statistical test that is used to compare the means of two groups. In this case, the two groups are the sample data and the hypothesized population variance.

To conduct a t-test, you will need to know the following:

The sample mean: This is the average of the values in your sample data.
The sample standard deviation: This is a measure of how spread out the values in your sample data are.
The degrees of freedom: This is a measure of how much data you have. It is calculated as $n-1$, where $n$ is the number of values in your sample data.
Once you have this information, you can use a t-table to find the critical value for your test. The critical value is the value that the t-statistic must exceed in order for you to reject the null hypothesis.

The null hypothesis is the hypothesis that the population variance is equal to a certain value. The alternative hypothesis is the hypothesis that the population variance is not equal to that value.

If the t-statistic is greater than the critical value, then you can reject the null hypothesis and conclude that the population variance is not equal to the hypothesized value.

Here is an example of how to conduct a t-test to test variance in one sample when the population is unknown:

Suppose you have a sample of 10 values that has a mean of 50 and a standard deviation of 10. You want to test the hypothesis that the population variance is 100.

The degrees of freedom for this test are $10-1 = 9$.

The critical value for a two-tailed t-test with 9 degrees of freedom and a significance level of 0.05 is 2.262.

The t-statistic for this test is $(50-100)/(10/\sqrt{10}) = -5.26$.

Since the t-statistic is greater than the critical value, we can reject the null hypothesis and conclude that the population variance is not equal to 100.

It is important to note that a t-test is only valid if the sample data is normally distributed. If the sample data is not normally distributed, then you may want to use a nonparametric test, such as the Welch-Satterthwaite test.
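As a quick arithmetic check on the worked example quoted above (this is not part of Bard's reply), the sketch below evaluates the expression exactly as written and, for comparison, the chi-square statistic $(n-1)S^2/\sigma_0^2$ that is suggested elsewhere in this thread. The variable names are illustrative assumptions.

```python
import math

# Numbers taken from the quoted example
n = 10             # sample size
mean = 50.0        # sample mean
sd = 10.0          # sample standard deviation
sigma0_sq = 100.0  # hypothesized population variance

# The expression as written in the quoted reply: it subtracts a variance
# from a mean, so it is not a meaningful statistic for a variance test.
t_as_written = (mean - sigma0_sq) / (sd / math.sqrt(n))

# Chi-square statistic for a single variance (valid under normality)
chi2_stat = (n - 1) * sd ** 2 / sigma0_sq

print(f"expression as written: {t_as_written:.2f}")  # -15.81
print(f"chi-square statistic:  {chi2_stat:.1f}")     # 9.0
```

The expression evaluates to about -15.81 rather than the quoted -5.26, and a negative value cannot exceed a positive critical value of 2.262 in any case.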

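Liechi · June 6, 2023

Cloud2016 If the data are i.i.d. normal (or can reasonably be assumed not to deviate from that too wildly) and the sample is large enough (typically n > 20), the chi-square test will do.

If those assumptions do not hold, look for a suitable nonparametric method. I searched briefly and did not see a nonparametric test that obviously fits your purpose. Levene's test, which does not rely much on a normal population, can be used to compare whether several samples have equal variances, but it does not seem to apply to the one-sample case.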

nan.xiao · June 6, 2023

GPT-4:

In this case, we are interested in testing whether the population variance is significantly different from a known value. Here, you're specifically interested in whether the population variance is greater than a given value. The most appropriate statistical test to use in this case is the Chi-Square test for a single variance.

In this context, the test statistic can be defined as follows:

$X^2 = \frac{(n-1)S^2}{\sigma_0^2}$,

where:

$n$ is the sample size,

$S^2$ is the sample variance, defined as $\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, with $\bar{x}$ being the sample mean,

$\sigma_0^2$ is the known population variance.

The test statistic $X^2$ follows a chi-square distribution with $n-1$ degrees of freedom under the null hypothesis.

Your hypothesis testing problem can be re-stated as follows:

$H_0: \sigma^2 \leq \sigma^2_0$

$H_1: \sigma^2 > \sigma^2_0$

If you obtain a $X^2$ statistic value that is on the right tail of the chi-square distribution beyond a certain critical value (determined by your chosen significance level, say 0.05), then you would reject the null hypothesis in favor of the alternative hypothesis, which states that the population variance is greater than the known value.

Remember that this test assumes that the population from which the samples are drawn is normally distributed. If you cannot assume normality, you might need to use a different test or a non-parametric method.
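The chi-square test described above can be sketched in a few lines. This is a minimal illustration that assumes normally distributed data and uses only the Python standard library; the function names `chi2_sf` and `chi2_variance_test` and the sample values are hypothetical and not from the thread.

```python
import math
import statistics

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    half_x = x / 2.0
    if df % 2 == 0:
        q, s = math.exp(-half_x), 1.0                # Q(1, h) = exp(-h)
    else:
        q, s = math.erfc(math.sqrt(half_x)), 0.5     # Q(1/2, h) = erfc(sqrt(h))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(s + 1, h) = Q(s, h) + h**s * exp(-h) / Gamma(s + 1)
    while s < df / 2.0:
        q += half_x ** s * math.exp(-half_x) / math.gamma(s + 1.0)
        s += 1.0
    return q

def chi2_variance_test(xs, sigma0_sq):
    """Upper-tailed test of H0: sigma^2 <= sigma0_sq against H1: sigma^2 > sigma0_sq."""
    n = len(xs)
    stat = (n - 1) * statistics.variance(xs) / sigma0_sq
    return stat, chi2_sf(stat, n - 1)

# Made-up sample, for illustration only
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
stat, p_value = chi2_variance_test(sample, sigma0_sq=4.0)
print(f"X^2 = {stat:.3f}, p-value = {p_value:.4f}")
```

Reject $H_0$ when the p-value falls below the chosen significance level. If SciPy is available, `scipy.stats.chi2.sf` can replace the hand-written tail probability.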

A follow-up question asking which method to use if normality cannot be assumed:

If you can't assume normality, the chi-square test may not be appropriate. Here are some alternatives:

Levene's Test: This is a test for equality of variances in different samples which does not require the assumption of normality.

Bartlett’s Test: This is another test for homogeneity of variances. It's more sensitive to departures from normality than Levene's test.

Brown–Forsythe Test: This is a variant of Levene's test that is less sensitive to departures from normality.

Fligner–Killeen Test: This is another non-parametric test for homogeneity of variances.

All the tests above, however, are usually used to compare variances of two or more groups, rather than testing a hypothesis about a single population variance.

If you are dealing with a single population and can't assume normality, it may be difficult to find an appropriate test for the population variance. You might need to resort to methods such as bootstrapping, which involves resampling your data many times to get an empirical distribution of the sample variance, and then using this distribution to test your hypothesis.

Alternatively, you could try transforming your data in a way that makes it more normally distributed, and then using the chi-square test. Logarithmic or square root transformations, for example, can sometimes make data more symmetric and reduce the impact of outliers. However, this depends on the nature of your data and may not always be appropriate.

As always, the choice of test or method should depend on the specifics of your problem and the nature of your data.
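The bootstrapping idea mentioned in the reply above could look roughly like the sketch below. It is one possible reading, not an established procedure from the thread: it takes the $\alpha$-quantile of the bootstrap sample variances as a one-sided lower confidence bound and rejects $H_0: \sigma^2 \leq \sigma_0^2$ when that bound exceeds $\sigma_0^2$. The function names, the number of resamples, and the data are assumptions, and the percentile bootstrap can be inaccurate for small or heavy-tailed samples.

```python
import random
import statistics

def bootstrap_variance_lower_bound(xs, alpha=0.05, n_boot=2000, seed=0):
    """Alpha-quantile of the bootstrap distribution of the sample variance."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(xs)
    boot_vars = sorted(
        statistics.variance(rng.choices(xs, k=n))  # resample with replacement
        for _ in range(n_boot)
    )
    return boot_vars[int(alpha * n_boot)]

def bootstrap_variance_test(xs, sigma0_sq, alpha=0.05):
    """Reject H0: sigma^2 <= sigma0_sq if the lower bound exceeds sigma0_sq."""
    lower = bootstrap_variance_lower_bound(xs, alpha=alpha)
    return lower, lower > sigma0_sq

# Made-up, clearly non-normal sample (an evenly spaced grid)
sample = [0.1 * i for i in range(1, 41)]
lower, reject = bootstrap_variance_test(sample, sigma0_sq=0.5)
print(f"lower bound = {lower:.3f}, reject H0: {reject}")
```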

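Cloud2016 · June 6, 2023

yuanfan Google's model lags quite a bit! Especially the sentence below: the Welch-Satterthwaite test is a parametric test, namely the most widely used two-sample Welch t-test, which is also common in A/B testing.

then you may want to use a nonparametric test, such as the Welch-Satterthwaite test.

nan.xiao Liechi
Let me modify the original question slightly, changing the test of the variance to a test of the scale parameter. I wonder whether a one-sample test can then be given (parametric or nonparametric would both be fine).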

how to test scale parameter in one sample when the population is unknown?

nan.xiao · June 6, 2023

Cloud2016

When you are interested in testing the scale parameter rather than the variance, you will need to use different tests. The appropriate test will depend on the underlying distribution of your data. Here are a few possibilities:

Kolmogorov-Smirnov Goodness-of-Fit Test: This test is a general method for comparing a sample with a reference probability distribution. If you have an idea of what your scaled distribution should look like, you can compare your sample with this distribution.

Mann-Whitney U Test (or Wilcoxon rank-sum test): This non-parametric test is used to compare two independent samples to determine if they come from the same distribution. It can be used to compare the scale parameters of two samples. This test is typically used for comparing medians, but it also makes use of the full distribution of the data and hence provides some information about scale.

Ansari-Bradley Test: This is a non-parametric test for the equality of the variances (or scales) of two populations.

Mood’s Median Test: This is a non-parametric test that determines whether the medians of two groups are different, which indirectly relates to scale.

Bootstrapping: Bootstrapping is a resampling method where one repeatedly samples from the observed data with replacement, and then checks the empirical distribution of the scale parameter to test the hypothesis.

Likelihood Ratio Test: If you are willing to make some parametric assumptions (like assuming a particular family of distributions), then you might use a likelihood ratio test to test whether the scale parameter is equal to a particular value. For instance, if you assume that your data is exponentially distributed, the scale parameter would be the inverse of the rate parameter.

Again, the right choice of test will depend on the specifics of your problem and the nature of your data. The key is to remember that inferences about scale often come with more assumptions and are more sensitive to the exact shape of the underlying distribution compared to inferences about location (like the mean or median).
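To make the likelihood ratio option in the list above concrete, here is a minimal sketch for the exponential case that the reply mentions. It assumes i.i.d. exponential data with scale $\theta$ and tests $H_0: \theta = \theta_0$ (a two-sided test of equality). The statistic is $2n\,(\bar{x}/\theta_0 - 1 - \log(\bar{x}/\theta_0))$, which is asymptotically chi-square with 1 degree of freedom under $H_0$. The function name and data are hypothetical.

```python
import math

def exponential_scale_lrt(xs, theta0):
    """Likelihood ratio test of H0: scale == theta0 for exponential data."""
    n = len(xs)
    ratio = (sum(xs) / n) / theta0  # MLE of the scale divided by theta0
    stat = 2.0 * n * (ratio - 1.0 - math.log(ratio))
    # Chi-square(1) upper tail: P(X > s) = erfc(sqrt(s / 2));
    # max() guards against a tiny negative value from rounding.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Made-up positive sample with mean 2.0
sample = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 0.2, 3.8, 2.0]
stat, p_value = exponential_scale_lrt(sample, theta0=1.0)
print(f"LRT statistic = {stat:.3f}, p-value = {p_value:.4f}")
```

The p-value relies on a large-sample approximation, so with only ten observations it should be read as indicative only.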

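Liechi · June 6, 2023

nan.xiao GPT-4 seems a good deal more reliable than 3.5. 3.5 will suggest Levene's test when normality fails, but it will not warn that Levene's test cannot be used in the one-sample case; it will even recommend Bartlett's test as a nonparametric method.

If the population distribution is unknown or not assumed to be normal, and you want to test whether a sample variance equals an expected value, you can use a non-parametric test called the Bartlett's test.

If you do not double-check by hand, you get led into the ditch. My state of mind part of the time: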

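Cloud2016 · June 6, 2023

nan.xiao Thanks. It seems GPT-4 solemnly avoids answering the question head-on. It could simply have said that it found no suitable test and left it at that, but instead it said a great deal that does not help with the question, though it can fool newcomers. To be honest, I had previously dug through a fair amount of material, trying to assign the many testing methods their seats according to the MECE (mutually exclusive, collectively exhaustive) principle, and the only one left without a seat was the testing problem above (one sample, unknown population distribution, a test of the scale/variance parameter). I thought that perhaps I just did not know enough to find it, so I came to ask GPT-4, and it ended up talking nonsense with a straight face.

Actually, I have a question: setting aside the lag caused by GPT-4's version updates, can GPT-4 be guaranteed to already have the most cutting-edge knowledge? If searching with it turns up no answer, does that mean nobody has worked on the problem yet? This feels like a rather important question for researchers.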

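nan.xiao · June 6, 2023

Cloud2016 Agreed. Even GPT-4 still needs to improve its reasoning ability and conciseness, as well as the hallucination that everyone complains about most. These weaknesses are exposed all at once when you ask it a creative-writing question that demands hard facts, such as writing a grant proposal: it immediately makes up a pile of nonexistent papers and forcibly caters to the setup with nested, matryoshka-style text.

Even so, GPT-4 is currently a good deal stronger than all open-source models and GPT-3.5. My personal guess is that it may be two to three years ahead of Google. If the relevant data are not too rare in the training text, then given a solid framework and context, it currently works well as a writing assistant. Of course, I belong to the "Adventist" camp and look forward to seeing how GPT-5 performs.

As for your question, my feeling is that GPT can basically guarantee coverage of knowledge (as long as the question you care about is not too obscure), so I sometimes use it to validate ideas. Many research questions require a lot of context as input, and Google is not good at finding answers to that kind of question.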

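Cloud2016 · June 6, 2023

nan.xiao @yuanfan Sorry, I was too harsh on the tools; please don't mind. I hope that when new questions come up you can still help post GPT's answers. This at least lets me know that, on the question I raised, they are of no help.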

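Cloud2016 · June 6, 2023

nan.xiao GPT can basically guarantee coverage of knowledge

On this question, the answer it provided does reflect that.

Also, does something nobody has done, or something new, count as obscure? For this kind of question, GPT could be more concise, or just give the latest results.

For different data science roles, in which respects would GPT help improve work efficiency? I am curious about this. Are there articles summarizing such experience? For a business analyst role, for example, where can GPT help?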

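yuanfan · June 6, 2023

Cloud2016 That was not harsh at all. A lot of the time I also feel the new Bing has gotten dumber than before, and more perfunctory.

For different data science roles, in which respects would GPT help improve work efficiency? I am curious about this. Are there articles summarizing such experience? For a business analyst role, for example, where can GPT help?

This is what I want to ask too. I keep wondering whether I am just not using these tools correctly, which is why I am not getting good use out of them.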

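Cloud2016 · June 6, 2023

There is another explanation: no relevant material can be found because nobody has studied it. This testing problem may simply have little practical value and not be worth studying. I was just mechanically applying the MECE principle to partition the methods.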

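Cloud2016 · September 12, 2023

I happened to see a post about a similar issue. All large language models are helpless on reasoning problems that demand hard facts. Someone posed a very simple reasoning-and-arithmetic problem, as follows:

Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?

It involves reading comprehension, and not a single LLM got it right: https://benchmarks.llmonitor.com/sally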
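For reference, the reasoning behind the puzzle can be spelled out in a few lines. This sketch assumes all the siblings belong to one family and share the same sisters; the variable names are illustrative.

```python
brothers = 3             # given; not actually needed for the answer
sisters_per_brother = 2  # given: each brother has 2 sisters

# Every brother has the same sisters, namely all the girls in the family,
# and Sally is one of those girls.
girls_in_family = sisters_per_brother

# Sally's sisters are the girls other than Sally herself.
sally_sisters = girls_in_family - 1
print(sally_sisters)  # 1
```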

yuanfan · September 13, 2023

Cloud2016
Does this question count as a brain teaser?

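Cloud2016 · September 13, 2023

yuanfan No. Exams generally do not include brain teasers, but they do test questions like the one above. Brain teasers require a higher level of understanding, are informal, and may not have a unique answer.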

Liechi · September 14, 2023

Cloud2016

Sally is one of the sisters mentioned in the statement. So, Sally has one sister.

by GPT3.5

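Cloud2016 · September 14, 2023

Liechi My result differs from yours. I just went to verify it, with a screenshot as proof, using gpt-3.5-turbo.

It just occurred to me that what we have been discussing are all deterministic problems, whereas GPT and other LLMs are essentially built on probability and statistics, and only have a relative advantage when dealing with problems that involve uncertainty. We are pitting our strengths against their weaknesses, which is a bit like bullying them.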

yuanfan · September 14, 2023

Cloud2016
Haha, "bullying" is such a "human" word. Only humans have this kind of psychological state, which comes from moral conditioning.