huadeng
一般来说察看函数的原代码用edit命令就可,但下面的情况怎办:
>edit(normalTest)
function(x, ...)
UseMethod("normalTest")
> dumpMethod("normalTest")
Method specification written to file "normalTest.default.q"
而"normalTest.default.q"显示为:
setDefaultMethod("normalTest",
function(x, ...)
UseMethod("normalTest")
)
其实还是没看见其定义的方法,我也不知这方法对不,希望各位高手给与帮助。谢谢
yihui
源代码恐怕不是给Windows用户看的,如果你用Linux,应该就会很容易看到
在那些zip包中,并不方便看源代码,你下载相应的*.tar.gz包之后就会看到在每个包的R文件夹下有一些文件名为*.r的文件,那些就是真正的源代码了。
比如,进入
http://cran.cnr.berkeley.edu/src/contrib/PACKAGES.html中的某一个包,会看到
Package source: *.tar.gz
fu_neng
.cor()等内置函数代码怎样查看到? 比如查它所用的公式?
yihui
上面不是说了么,下载源码包才能看,Windows binary一般是看不到的。
yihui
cor()函数不是在contrib pkg中,你得下载base source pkg
huadeng
谢老师,我问的是在splus的确情况下如何查看
huadeng
谢老师,我问的是在splus的确情况下如何查看
januslian
我记得有个函数getS3method可以查看成员函数的具体定义。但是不知道在Splus是不是这样的
sociology
[quote]
引用第0楼huadeng于2006-11-18 09:55发表的“请教一个很有用的问题”:
一般来说察看函数的原代码用edit命令就可,但下面的情况怎办:
>edit(normalTest)
function(x, ...)
UseMethod("normalTest")
> dumpMethod("normalTest")
.......[/quote]
察看源代码的方法取决于具体的环境,以下是一个例子:
t.test
methods("t.test")
getS3method("t.test","default")
在我blog上面有一些总结
http://sociology.yculblog.com/post.882537.html
另外,最详细的文章请见Rnew 2006,vol 4的R Help Desk
areg
你好,在线啊?
Statistics with R
Vincent Zoonekynd
<zoonek@math.jussieu.fr>
28th August 2005
我刚才去看的,把R程序包(UNIX)的解压了,没有找到正态性检验函数,我把查到的相关信息和参考示例给发上来:
第449/1085页
7.2.3 Gaussian distribution
This is the famous “bell-shaped” distribution.
More precisely, the central limit theorem states that if X1, X2, ... X3 are independant
identically distributed random varaibles with expectation m avd variance sˆ2, then
(X1+X2+...+Xn) - nm
---------------------
sqrt(n) s
converges in law to a gaussian distribution when n tends to infinity.
in other words, the empirical means
X1+...Xn
--------
n
is “close to” a gaussian distribution of expectation m and standard deviation s/sqrt(n).
This explains the omnipresence of the gaussian law: when you repeat an experiment a large
number of times, the average result (almost) follows a gaussian distribution.
limite.centrale <- function (r=runif, m=.5, s=1/sqrt(12), n=c(1,3,10,30), for (i in n) { x <- matrix(r(i*N),nc=i)
x <- ( apply(x, 1, sum) - i*m )/(sqrt(i)*s)
hist(x, col=’light blue’, probability=T, main=paste("n =",i),
ylim=c(0,max(.4, density(x)$y)))
lines(density(x), col=’red’, lwd=3)
curve(dnorm(x), col=’blue’, lwd=3, lty=3, add=T)
if( N>100 ) { rug(sample(x,100))
} else { rug(x)
}
}
}
op <- par(mfrow=c(2,2))
limite.centrale()
par(op)
op <- par(mfrow=c(2,2))
limite.centrale(rexp, m=1, s=1)
par(op)
op <- par(mfrow=c(2,2))
limite.centrale(rexp, m=1, s=1)
par(op)
op <- par(mfrow=c(2,2))
limite.centrale(function (n) { rnorm(n, sample(c(-3,3),n,replace=m=0, s=sqrt(10), n=c(1,2,3,10))
par(op)
Exercise: dra a plot, similar to the one above, but with the theoretical probability densities.
For the formula-savvy reader, the gaussian probability density function is
f(x) = exp( -x^2/2 ) / sqrt( 2 pi )
In R:
curve(dnorm(x), xlim=c(-3,3), col=’red’, lwd=3)
title(main=’Gaussian Probability Distribution Function’)
curve(pnorm(x), xlim=c(-3,3), col=’red’, lwd=3)
title(main=’Cumulative gaussian distribution function’)
curve(qnorm(x), xlim=c(0,1), col=’red’, lwd=3)
title(main=’Gaussian quantiles function’)
n <- 1000
x <- rnorm(n)
hist(x, probability=T, col=’light blue’, main=’Gaussian Distribution’)
lines(density(x), col=’red’, lwd=3)
curve(dnorm(x), add=T, col=’red’, lty=2, lwd=3)
legend(par(’usr’)[2], par(’usr’)[4], xjust=1,
c(’sample density’, ’theoretical density’),
lwd=2, lty=c(1,2),
col=’red’)
In the discussion above, we had assumed the mean was mu=0 and the standard deviation
sigma=1 (the “standard gaussian” distribution): to get the general case, we just apply an
affine transformation:
f(x) = exp( -( (x-mu) / sigma )^2 /2 ) / sqrt( 2 pi sigma )
The gaussian distribution is sometimes called the “normal” distribution – I shall try to avoid
this word, because in some situations, the distribution we would like to observe (the one we
would like to call “normal”) is not the gaussian one.
########################################################################
上面是statistics for R中的材料
下面是从source得到的资料(UNIX)安装包解压得的
############# t检验代码###########
t.test <- function(x, ...) UseMethod("t.test")
t.test.default <-
function(x, y = NULL, alternative = c("two.sided", "less", "greater"),
mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95,
...)
{
alternative <- match.arg(alternative)
if(!missing(mu) && (length(mu) != 1 || is.na(mu)))
stop("'mu' must be a single number")
if(!missing(conf.level) &&
(length(conf.level) != 1 || !is.finite(conf.level) ||
conf.level < 0 || conf.level > 1))
stop("'conf.level' must be a single number between 0 and 1")
if( !is.null(y) ) {
dname <- paste(deparse(substitute(x)),"and",
deparse(substitute(y)))
if(paired)
xok <- yok <- complete.cases(x,y)
else {
yok <- !is.na(y)
xok <- !is.na(x)
}
y <- y[yok]
}
else {
dname <- deparse(substitute(x))
if( paired ) stop("'y' is missing for paired test")
xok <- !is.na(x)
yok <- NULL
}
x <- x[xok]
if( paired ) {
x <- x-y
y <- NULL
}
nx <- length(x)
if(nx < 2) stop("not enough 'x' observations")
mx <- mean(x)
vx <- var(x)
estimate <- mx
if(is.null(y)) {
df <- nx-1
stderr <- sqrt(vx/nx)
if(stderr < 10 *.Machine$double.eps * abs(mx))
stop("data are essentially constant")
tstat <- (mx-mu)/stderr
method <- ifelse(paired,"Paired t-test","One Sample t-test")
names(estimate) <- ifelse(paired,"mean of the differences","mean of x")
} else {
ny <- length(y)
if(ny < 2) stop("not enough 'y' observations")
my <- mean(y)
vy <- var(y)
method <- paste(if(!var.equal)"Welch", "Two Sample t-test")
estimate <- c(mx,my)
names(estimate) <- c("mean of x","mean of y")
if(var.equal) {
df <- nx+ny-2
v <- ((nx-1)*vx + (ny-1)*vy)/df
stderr <- sqrt(v*(1/nx+1/ny))
} else {
stderrx <- sqrt(vx/nx)
stderry <- sqrt(vy/ny)
stderr <- sqrt(stderrx^2 + stderry^2)
df <- stderr^4/(stderrx^4/(nx-1) + stderry^4/(ny-1))
}
if(stderr < 10 *.Machine$double.eps * max(abs(mx), abs(my)))
stop("data are essentially constant")
tstat <- (mx - my - mu)/stderr
}
if (alternative == "less") {
pval <- pt(tstat, df)
cint <- c(-Inf, tstat + qt(conf.level, df) )
}
else if (alternative == "greater") {
pval <- pt(tstat, df, lower = FALSE)
cint <- c(tstat - qt(conf.level, df), Inf)
}
else {
pval <- 2 * pt(-abs(tstat), df)
alpha <- 1 - conf.level
cint <- qt(1 - alpha/2, df)
cint <- tstat + c(-cint, cint)
}
cint <- mu + cint * stderr
names(tstat) <- "t"
names(df) <- "df"
names(mu) <- if(paired || !is.null(y)) "difference in means" else "mean"
attr(cint,"conf.level") <- conf.level
rval <- list(statistic = tstat, parameter = df, p.value = pval,
conf.int=cint, estimate=estimate, null.value = mu,
alternative=alternative,
method=method, data.name=dname)
class(rval) <- "htest"
return(rval)
}
t.test.formula <-
function(formula, data, subset, na.action, ...)
{
if(missing(formula)
|| (length(formula) != 3)
|| (length(attr(terms(formula[-2]), "term.labels")) != 1))
stop("'formula' missing or incorrect")
m <- match.call(expand.dots = FALSE)
if(is.matrix(eval(m$data, parent.frame())))
m$data <- as.data.frame(data)
m[[1]] <- as.name("model.frame")
m$... <- NULL
mf <- eval(m, parent.frame())
DNAME <- paste(names(mf), collapse = " by ")
names(mf) <- NULL
response <- attr(attr(mf, "terms"), "response")
g <- factor(mf[[-response]])
if(nlevels(g) != 2)
stop("grouping factor must have exactly 2 levels")
DATA <- split(mf[[response]], g)
names(DATA) <- c("x", "y")
y <- do.call("t.test", c(DATA, list(...)))
y$data.name <- DNAME
if(length(y$estimate) == 2)
names(y$estimate) <- paste("mean in group", levels(g))
y
}
areg
[quote]引用第8楼sociology于2006-11-20 22:51发表的“”:
察看源代码的方法取决于具体的环境,以下是一个例子:
t.test
methods("t.test")
.......[/quote]
sociology
你好啊,刚才我也是去查看了半天,觉得t test比较典型,就举它为例,结果一提交,发现你先举了啊
areg
是不是把正态性检验放在密度函数中了?请看下面:来自source(UNIX)解压包
###############################
density <- function(x, ...) UseMethod("density")
density.default <-
function(x, bw = "nrd0", adjust = 1,
kernel = c("gaussian", "epanechnikov", "rectangular",
"triangular", "biweight", "cosine", "optcosine"),
weights = NULL, window = kernel, width,
give.Rkern = FALSE,
n = 512, from, to, cut = 3, na.rm = FALSE, ...)
{
if(length(list(...)) > 0)
warning("non-matched further arguments are disregarded")
if(!missing(window) && missing(kernel))
kernel <- window
kernel <- match.arg(kernel)
if(give.Rkern)
##-- sigma(K) * R(K), the scale invariant canonical bandwidth:
return(switch(kernel,
gaussian = 1/(2*sqrt(pi)),
rectangular = sqrt(3)/6,
triangular = sqrt(6)/9,
epanechnikov= 3/(5*sqrt(5)),
biweight = 5*sqrt(7)/49,
cosine = 3/4*sqrt(1/3 - 2/pi^2),
optcosine = sqrt(1-8/pi^2)*pi^2/16
))
if (!is.numeric(x))
stop("argument 'x' must be numeric")
name <- deparse(substitute(x))
x <- as.vector(x)
x.na <- is.na(x)
if (any(x.na)) {
if (na.rm) x <- x[!x.na]
else stop("'x' contains missing values")
}
N <- nx <- length(x)
x.finite <- is.finite(x)
if(any(!x.finite)) {
x <- x[x.finite]
nx <- length(x) # == sum(x.finite)
}
## Handle 'weights'
if(is.null(weights)) {
weights <- rep.int(1/nx, nx)
totMass <- nx/N
}
else {
if(length(weights) != N)
stop("'x' and 'weights' have unequal length")
if(!all(is.finite(weights)))
stop("'weights' must all be finite")
if(any(weights < 0))
stop("'weights' must not be negative")
wsum <- sum(weights)
if(any(!x.finite)) {
weights <- weights[x.finite]
totMass <- sum(weights) / wsum
} else totMass <- 1
## No error, since user may have wanted "sub-density"
if (!isTRUE(all.equal(1, wsum)))
warning("sum(weights) != 1 -- will not get true density")
}
n.user <- n
n <- max(n, 512)
if (n > 512) n <- 2^ceiling(log2(n)) #- to be fast with FFT
if (missing(bw) && !missing(width)) {
if(is.numeric(width)) {
## S has width equal to the length of the support of the kernel
## except for the gaussian where it is 4 * sd.
## R has bw a multiple of the sd.
fac <- switch(kernel,
gaussian = 4,
rectangular = 2*sqrt(3),
triangular = 2 * sqrt(6),
epanechnikov = 2 * sqrt(5),
biweight = 2 * sqrt(7),
cosine = 2/sqrt(1/3 - 2/pi^2),
optcosine = 2/sqrt(1-8/pi^2)
)
bw <- width / fac
}
if(is.character(width)) bw <- width
}
if (is.character(bw)) {
if(nx < 2)
stop("need at least 2 points to select a bandwidth automatically")
bw <- switch(tolower(bw),
nrd0 = bw.nrd0(x),
nrd = bw.nrd(x),
ucv = bw.ucv(x),
bcv = bw.bcv(x),
sj = , "sj-ste" = bw.SJ(x, method="ste"),
"sj-dpi" = bw.SJ(x, method="dpi"),
stop("unknown bandwidth rule"))
}
if (!is.finite(bw)) stop("non-finite 'bw'")
bw <- adjust * bw
if (bw <= 0) stop("'bw' is not positive.")
if (missing(from))
from <- min(x) - cut * bw
if (missing(to))
to <- max(x) + cut * bw
if (!is.finite(from)) stop("non-finite 'from'")
if (!is.finite(to)) stop("non-finite 'to'")
lo <- from - 4 * bw
up <- to + 4 * bw
y <- .C("massdist",
x = as.double(x),
xmass = as.double(weights),
nx = nx,
xlo = as.double(lo),
xhi = as.double(up),
y = double(2 * n),
ny = as.integer(n),
PACKAGE = "base" )$y * totMass
kords <- seq(0, 2*(up-lo), length = 2 * n)
kords[(n + 2):(2 * n)] <- -kords[n:2]
kords <- switch(kernel,
gaussian = dnorm(kords, sd = bw),
## In the following, a := bw / sigma(K0), where
## K0() is the unscaled kernel below
rectangular = {
a <- bw*sqrt(3)
ifelse(abs(kords) < a, .5/a, 0) },
triangular = {
a <- bw*sqrt(6) ; ax <- abs(kords)
ifelse(ax < a, (1 - ax/a)/a, 0) },
epanechnikov = {
a <- bw*sqrt(5) ; ax <- abs(kords)
ifelse(ax < a, 3/4*(1 - (ax/a)^2)/a, 0) },
biweight = { ## aka quartic
a <- bw*sqrt(7) ; ax <- abs(kords)
ifelse(ax < a, 15/16*(1 - (ax/a)^2)^2/a, 0) },
cosine = {
a <- bw/sqrt(1/3 - 2/pi^2)
ifelse(abs(kords) < a, (1+cos(pi*kords/a))/(2*a),0)},
optcosine = {
a <- bw/sqrt(1-8/pi^2)
ifelse(abs(kords) < a, pi/4*cos(pi*kords/(2*a))/a, 0)}
)
kords <- fft( fft(y)* Conj(fft(kords)), inv=TRUE)
kords <- Re(kords)[1:n]/length(y)
xords <- seq(lo, up, length = n)
# keep <- (xords >= from) & (xords <= to)
x <- seq(from, to, length = n.user)
structure(list(x = x, y = approx(xords, kords, x)$y, bw = bw, n = N,
call=match.call(), data.name=name, has.na = FALSE),
class="density")
}
plot.density <- function(x, main=NULL, xlab=NULL, ylab="Density", type="l",
zero.line = TRUE, ...)
{
if(is.null(xlab))
xlab <- paste("N =", x$n, " Bandwidth =", formatC(x$bw))
if(is.null(main)) main <- deparse(x$call)
plot.default(x, main=main, xlab=xlab, ylab=ylab, type=type, ...)
if(zero.line) abline(h=0, lwd=0.1, col = "gray")
invisible(NULL)
}
print.density <- function(x, digits=NULL, ...)
{
cat("\nCall:\n\t",deparse(x$call),
"\n\nData: ",x$data.name," (",x$n," obs.);",
"\tBandwidth 'bw' = ",formatC(x$bw,digits=digits), "\n\n",sep="")
print(summary(as.data.frame(x[c("x","y")])), digits=digits, ...)
invisible(x)
}
yihui
[quote]引用第6楼huadeng于2006-11-20 21:38发表的“”:
谢老师,我问的是在splus的确情况下如何查看[/quote]
很久没用S-Plus了,也不打算用了……你看Object Explorer能不能看到吧。S-Plus可没说“Open Source”哦!
fu_neng
[quote]引用第12楼谢益辉于2006-11-21 12:40发表的“”:
很久没用S-Plus了,也不打算用了……你看Object Explorer能不能看到吧。S-Plus可没说“Open Source”哦![/quote]
是啊, R比它强大多了,并且是在s基础上产生的精华, 何必还用s
sociology
[quote]引用第13楼fu_neng于2006-11-21 17:49发表的“”:
是啊, R比它强大多了,并且是在s基础上产生的精华, 何必还用s[/quote]
各有各的优势吧。比如处理大数据,splus比R好。
但是,在作图方面,R应该比Splus好,特别是在grid完善后:)
fu_neng
stats包里的cor()函数, 用下载linux源码的方法看cor.r文件,里面仍是调用
.Internal(cor(x2, y2, na.method, method == "kendall"))
用methods(cor),getS3method("cor","deafault")
等方法说找不到此cor()函数, 看来是禁止我们看它的源代码了.
yihui
肯定不是这样,既然R是Open Source的,肯定不会禁止查看源代码。我也不知道你怎么对这个问题这么感兴趣:)
cor的源文件恐怕在stats的包中不能直接找到,因为它涉及到一些内置函数,都是用C语言编写的,所以你要追根溯源的话,得去看\src\main下面的那些*.c文件,因为我也没有专门去找过cor的源代码,所以不确定是不是在cov.c那个文件中(按理说是)。
areg
S-PLUS用起来很方便,R很强大,很灵活,都是S语言为基础的,总想试试揭开它的机制,说白了,想自己试着破解一个S-PLUS的新版本长期用,毕竟试用期会结束的!
januslian
[quote]引用第15楼fu_neng于2006-11-21 18:15发表的“”:
stats包里的cor()函数, 用下载linux源码的方法看cor.r文件,里面仍是调用
.Internal(cor(x2, y2, na.method, method == "kendall"))
用methods(cor),getS3method("cor","deafault")
等方法说找不到此cor()函数, 看来是禁止我们看它的源代码了.[/quote]
R Internal 文档中有关于.Internal和.Primitive的论述。
不妨用SubVersion,checkout一个分支看看里边的源代码。
BTW:破解S+估计用不到这么多东西吧,破解GAUSS或者SAS的人不见得就懂它的内部机制啊。
fu_neng
用cor(y,y') 求到的与模型返回的R^2的平方根的值不一样,想看cor()用到的公式究竟和决定系数有什么不同