为啥描述性统计量会缺少几个数据呢。。。。

Sylvanas · 2024年7月15日

找到一段批量进行wilcoxon test的代码，用自己的数据算了下咋会少数据呢。对了下原始数据，去除NA也对不上，去除0值也对不上。每个年龄段都少了几个数据。。。。。。
数据

source("wmc.R")
ageD<-read.csv("age Dixon test.csv")
groupD<read.csv("age Dixon test group.csv")
wmc(ageD$RBC ~ groupD$age, data = ageD, method = "holm")

以下为输出结果：
Descriptive Statistics

    age30_39  age19_29  age50_59  age40_49     age60

n 246.00000 109.00000 127.00000 182.00000 135.00000
median 2.70000 2.80000 2.80000 3.70000 3.80000
mad 2.66868 2.81694 2.81694 2.81694 2.81694

Multiple Comparisons (Wilcoxon Rank Sum Tests)
Probability Adjustment = holm

Group.1  Group.2       W           p

1 age30_39 age19_29 13198.5 1.000000000
2 age30_39 age50_59 14789.5 1.000000000
3 age30_39 age40_49 20771.5 1.000000000
4 age30_39 age60 12931.5 0.003244869 **
5 age19_29 age50_59 6682.5 1.000000000
6 age19_29 age40_49 9385.0 1.000000000
7 age19_29 age60 5859.5 0.054092551 .
8 age50_59 age40_49 11302.0 1.000000000
9 age50_59 age60 7265.5 0.224675626

10 age40_49 age60 10400.0 0.151463301

Signif. codes: 0 '' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1

wmc.R代码如下，感觉问题出在获取描述性统计量的部分：
`# Nonparametric pairwise multiple comparisons using the Wilcoxon Signed Rank Test

Probability values are adjusted using the p.adjust function

wmc <- function(formula, data, exact=FALSE, sort=TRUE, method="holm"){

setup

df <- model.frame(formula, data)
y <- df[[1]]
x <- as.factor(df[[2]])

reorder levels of x by median y

if(sort){
medians <- aggregate(y, by=list(x), FUN=median)[2]
index <- order(medians$x)
x <- factor(x, levels(x)[index])
}

groups <- levels(x)
k <- length(groups)

summary statistics

stats <- function(z)(c(N = length(z), Median = median(z), MAD = mad(z)))
sumstats <- t(aggregate(y, by=list(x), FUN=stats)[2])
rownames(sumstats) <- c("n", "median", "mad")
colnames(sumstats) <- groups
cat("Descriptive Statistics\n\n")
print(sumstats)

multiple comparisons

mc <- data.frame(Group.1=character(0),
Group.2=character(0),
W=numeric(0),
p.unadj=numeric(0),
p=numeric(0),
stars=character(0),
stringsAsFactors=FALSE)

perform Wilcoxon test

row <- 0
for(i in 1:k){
for(j in 1:k){
if (j > i){
row <- row + 1
y1 <- y[x==groups]
y2 <- y[x==groups[j]]
test <- wilcox.test(y1, y2, exact=exact)
mc[row,1] <- groups
mc[row,2] <- groups[j]
mc[row,3] <- test$statistic
mc[row,4] <- test$p.value
}
}
}
mc$p <- p.adjust(mc$p.unadj, method=method)

add stars

mc$stars <- " "
mc$stars[mc$p < .1] <- "."
mc$stars[mc$p < .05] <- ""
mc$stars[mc$p < .01] <- ""
mc$stars[mc$p < .001] <- ""
names(mc)[6] <- " "

cat("\nMultiple Comparisons (Wilcoxon Rank Sum Tests)\n")
cat(paste("Probability Adjustment = ", method, "\n\n", sep=""))
print(mc[-4], right=TRUE)
cat("---\nSignif. codes: 0 '' 0.001 '' 0.01 '' 0.05 '.' 0.1 ' ' 1\n")
return(invisible(NULL))

}
`

谢谢！

Sylvanas · 2024年7月16日

奇怪，手工删除NA后数据个数又能对上了。。。。