1. Let X be a matrix of random normal values (mean = 0, sd = 1) with 10 columns and N = 100 rows. Reset the values in the first row of the matrix to (1, 1.5, 1.4, 3, 1.9, 4, 4.9, 2.6, 3.2, 2.4). Assume that, in each row, the first 5 columns of data correspond to group A and the remaining 5 to group B (i.e., each row has ten numbers: the first five form one group and the last five form the other).
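A minimal sketch of this setup in Python/NumPy. The assignment itself targets R, where matrix(rnorm(N * 10), nrow = N) plays the same role. The seed and the names group_A and group_B are illustrative choices, not part of the exercise.

```python
import numpy as np

N = 100
rng = np.random.default_rng(seed=1)  # arbitrary seed, only for reproducibility

# N x 10 matrix of standard normal values (mean 0, sd 1)
X = rng.standard_normal((N, 10))

# Overwrite the first row with the prescribed values
X[0, :] = [1, 1.5, 1.4, 3, 1.9, 4, 4.9, 2.6, 3.2, 2.4]

# Columns 0-4 form group A, columns 5-9 form group B
group_A = X[:, :5]
group_B = X[:, 5:]
```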
1.) For each row of the matrix X, compute the following (note: p-values under both equal and unequal variances):
a) the t-statistic comparing groups A and B, assuming equal variances, and the corresponding p-value;
b) the probability of observing such a t-statistic by chance alone, using a permutation analysis. Use the following strategy: randomly permute the columns nk = 1000 times, and at each iteration recompute the t-statistic and record it in a vector. At the end, compute the p-value as the proportion of the nk permutations in which the t-statistic from the permuted data was at least as extreme as the t-statistic obtained from the real (non-permuted) data.
Present the results as a data.frame with 4 columns: ID = row number; t = t-score; p_theoretical = p-value assuming the asymptotic distribution; p_permutations = p-value from the permutations.
Sort the data.frame in descending order of p-values.
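The steps of 1.) can be sketched in Python with NumPy and SciPy; in R, t.test(..., var.equal = TRUE) and sample() would be the natural equivalents. This rests on several assumptions. The same column permutation is applied to every row within one iteration. "At least as extreme" is read as two-sided (|t_perm| >= |t_obs|). The table is sorted by p_theoretical, because the exercise does not say which p-value to sort on. A list of dicts stands in for the R data.frame, and row_t is a hypothetical helper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)  # arbitrary seed for reproducibility
N, nk = 100, 1000
X = rng.standard_normal((N, 10))
X[0, :] = [1, 1.5, 1.4, 3, 1.9, 4, 4.9, 2.6, 3.2, 2.4]


def row_t(M):
    """Equal-variance t-statistic for every row: columns 0-4 (A) vs 5-9 (B)."""
    return stats.ttest_ind(M[:, :5], M[:, 5:], axis=1, equal_var=True).statistic


# a) observed t-statistics and two-sided p-values from the t distribution (8 df).
#    For the unequal-variance (Welch) version, pass equal_var=False instead.
res = stats.ttest_ind(X[:, :5], X[:, 5:], axis=1, equal_var=True)
t_obs, p_theo = res.statistic, res.pvalue

# b) permutation analysis: shuffle the column order nk times (same shuffle
#    for all rows in one iteration) and record the recomputed t-statistics
t_perm = np.empty((nk, N))
for k in range(nk):
    t_perm[k] = row_t(X[:, rng.permutation(10)])

# Two-sided permutation p-value: share of iterations with |t_perm| >= |t_obs|
p_perm = (np.abs(t_perm) >= np.abs(t_obs)).mean(axis=0)

# Result table (list of dicts standing in for an R data.frame),
# sorted by p_theoretical in descending order
order = np.argsort(-p_theo, kind="stable")
table = [
    {
        "ID": int(i) + 1,  # 1-based row number, as in R
        "t": float(t_obs[i]),
        "p_theoretical": float(p_theo[i]),
        "p_permutations": float(p_perm[i]),
    }
    for i in order
]
```

With this ordering, table[0] holds the row with the largest theoretical p-value. Because only C(10, 5) = 252 distinct splits exist, the permutation p-values take a limited set of values.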
2.) Plot the distribution (see hist) of the t-scores obtained in step 1a), excluding the first element (which corresponds to the first row), and on the same graph draw a vertical line at the t-value of the first row.
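A possible rendering of this plot, sketched with matplotlib. In R it corresponds to hist(t[-1]) followed by abline(v = t[1]). The bin count, colours, seed and the output file name t_hist.png are arbitrary choices.

```python
import numpy as np
from scipy import stats
import matplotlib
matplotlib.use("Agg")  # render to a file; no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(seed=1)  # arbitrary seed
X = rng.standard_normal((100, 10))
X[0, :] = [1, 1.5, 1.4, 3, 1.9, 4, 4.9, 2.6, 3.2, 2.4]

# Equal-variance t-statistic per row, as in step 1a)
t_obs = stats.ttest_ind(X[:, :5], X[:, 5:], axis=1, equal_var=True).statistic

fig, ax = plt.subplots()
ax.hist(t_obs[1:], bins=20, color="lightgray", edgecolor="black")  # rows 2..N only
ax.axvline(t_obs[0], color="red", linestyle="--", label="row 1")   # first row's t
ax.set_xlabel("t-statistic")
ax.set_ylabel("Frequency")
ax.legend()
fig.savefig("t_hist.png")
```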
2. Olympic Medals
During both summer and winter Olympic Games, the medal table is often of interest to spectators and the media. The medal table is a tally of the number of medals won by each participating country during the games. A good performance on the medal table is often a source of pride for a country. However, large countries can be expected to win more medals than smaller countries, because they have a larger pool from which to recruit athletes. Smaller countries therefore often argue that medals per capita would be a better measure of performance. However, a country's medal tally may not increase in direct proportion to its population. Further, it is reasonable to expect that the medal tally also depends on the resources available to a country's athletes and on its climate (for example, access to snow).
The objective of this analysis is to explore the relationship between a country's medal tally and its population size, wealth (measured by GDP) and climate (approximated by latitude). Further, it has been proposed that a standardised measure of a country's medal tally, correcting for population size, climate and wealth, should be developed in future. You should investigate the feasibility of this proposal and discuss your findings.
The file medals.RData (load it in R with load("medals.RData")) contains an R data frame with one row for every country that has won at least one Olympic medal in the previous four Olympic Games.
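One possible way to study this relationship is a Poisson log-linear model, log E[medals] = b0 + b1*log(population) + b2*log(GDP) + b3*|latitude|. In R this would be glm(..., family = poisson). It is offered as a sketch, not as the intended solution. On this scale, b1 = 1 would mean medals grow in direct proportion to population, so the fitted b1 bears directly on the per-capita argument. The ratio of observed to expected medals is one candidate standardised measure. The data below are simulated placeholders, not the contents of medals.RData. The helper poisson_irls, the use of |latitude| and all parameter values are assumptions for illustration.

```python
import numpy as np


def poisson_irls(Xd, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    beta = np.zeros(Xd.shape[1])
    beta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(n_iter):
        eta = Xd @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu      # working response
        XtW = Xd.T * mu              # X^T W, with weights W = mu for the log link
        beta_new = np.linalg.solve(XtW @ Xd, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta


# Simulated stand-in for medals.RData (hypothetical values, for illustration only)
rng = np.random.default_rng(seed=2)
n = 500
log_pop = rng.normal(16, 1.5, n)   # log population
log_gdp = rng.normal(4, 1.5, n)    # log GDP (billions of US dollars)
abs_lat = rng.uniform(0, 60, n)    # |latitude| of the capital city
true_beta = np.array([-9.0, 0.4, 0.5, 0.02])
Xd = np.column_stack([np.ones(n), log_pop, log_gdp, abs_lat])
medals = rng.poisson(np.exp(Xd @ true_beta))

beta_hat = poisson_irls(Xd, medals)
expected = np.exp(Xd @ beta_hat)
# Candidate standardised score: observed medals relative to the model's expectation
standardised = medals / expected
```

In this parameterisation, a fitted population coefficient well below 1 would suggest that medals per capita over-rewards small countries.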
The variable descriptions are as follows:
Country: name of the competing country (only countries that have won at least one medal since 2004 are included).
Latitude: latitude of the capital city.
Summer2004: total number of medals (gold, silver and bronze) won at the 2004 Summer Olympics.
Summer2008: total number of medals won at the 2008 Summer Olympics.
Winter2006: total number of medals won at the 2006 Winter Olympics.
Winter2010: total number of medals won at the 2010 Winter Olympics.
Population2007: population in 2007 (source: World Fact Book).
GDP2009: Gross Domestic Product in 2009, in billions of US dollars (source: World Fact Book).
Bootstrap
Computer-intensive methods in statistical testing
Exact test: permutation.
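The two resampling ideas noted here can be made concrete on the first row of X. The sketch first runs an exact permutation test that enumerates all C(10, 5) = 252 ways of splitting the ten values into two groups of five. It then computes a bootstrap percentile interval for the difference in group means, resampling with replacement within each group. The helper t_equal_var, the 2000 bootstrap replicates and the seed are illustrative assumptions.

```python
import itertools
import numpy as np

# First row of X from the exercise: group A = first 5 values, group B = last 5
row = np.array([1, 1.5, 1.4, 3, 1.9, 4, 4.9, 2.6, 3.2, 2.4])


def t_equal_var(a, b):
    """Pooled-variance (equal-variance) two-sample t-statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return (a.mean() - b.mean()) / np.sqrt(sp2 * (1 / na + 1 / nb))


t_obs = t_equal_var(row[:5], row[5:])

# Exact test: enumerate every choice of 5 of the 10 values as group A
t_all = []
for idx in itertools.combinations(range(10), 5):
    mask = np.zeros(10, dtype=bool)
    mask[list(idx)] = True
    t_all.append(t_equal_var(row[mask], row[~mask]))
t_all = np.array(t_all)

# Two-sided exact p-value; the small tolerance guards against rounding ties
p_exact = np.mean(np.abs(t_all) >= np.abs(t_obs) - 1e-12)

# Bootstrap: resample within each group, record the difference in means,
# and take the 2.5% and 97.5% percentiles as a 95% interval
rng = np.random.default_rng(seed=3)
n_boot = 2000
diffs = np.array([
    rng.choice(row[:5], 5, replace=True).mean()
    - rng.choice(row[5:], 5, replace=True).mean()
    for _ in range(n_boot)
])
ci = np.percentile(diffs, [2.5, 97.5])
```

The permutation test answers "could the labels be arbitrary?", while the bootstrap describes the uncertainty of the estimated difference. Both replace distributional formulas with computation.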