- Solved. See: http://stackoverflow.com/questions/40639138/configure-error-installing-r-3-3-2-on-ubuntu-checking-whether-bzip2-support-suf
- While building R-3.3.2 from source on Linux, configure fails to recognise bzip2:

    [yangpc@login R_package_archive]$ uname -a
    Linux login 2.6.32-279.el6.x86_64 #1 SMP Wed Jun 13 18:24:36 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
    [yangpc@login R_package_archive]$ ./configure --prefix=/panfs/home/yangpc/soft/R/R_soft/ --enable-R-shlib --with-libpth-prefix=/panfs/home/yangpc/soft/ CPPFLAGS="-I/panfs/home/yangpc/soft/lib/packages/bzip2-1.0.6/include/ -I/panfs/home/yangpc/soft/lib/packages/zlib-1.2.8/include/"
    checking zlib.h presence... yes
    checking for zlib.h... yes
    checking if zlib version >= 1.2.5... yes
    checking whether zlib support suffices... yes
    checking mmap support for zlib... yes
    checking for BZ2_bzlibVersion in -lbz2... yes
    checking bzlib.h usability... yes
    checking bzlib.h presence... yes
    checking for bzlib.h... yes
    checking if bzip2 version >= 1.0.6... no
    checking whether bzip2 support suffices...

This question has been asked on the R mailing lists and elsewhere, but no solution was ever given:
http://comments.gmane.org/gmane.comp.lang.r.hpc/1719
http://r.789695.n4.nabble.com/bzip2-td4726112.html
http://stackoverflow.com/questions/40639138/configure-error-installing-r-3-3-2-on-ubuntu-checking-whether-bzip2-support-suf
The R installation manual states that bzip2 version 1.0.6 is required:
https://cran.r-project.org/doc/manuals/r-release/R-admin.html
Confirmed once again, thank you.
- In: Post notification feature
It would be nice to get a notification when someone replies to your post, or when a thread you have taken part in gets a new reply.

    sapply(dm1new, function(x) which(dm2new %in% x))
- In: Density plot of a function
Take a look at the curve function.

    > curve(dnorm(x), xlim = c(-10, 10))
Reply to pengchy (post #9):
Found a way:

    > bud.sum <- summary(budworm.lg, cor = F)
    > bud.sum$coeff
                  Estimate Std. Error    z value     Pr(>|z|)
    (Intercept) -2.9935418  0.5526997 -5.4162175 6.087304e-08
    sexM         0.1749868  0.7783100  0.2248292 8.221122e-01
    ldose        0.9060364  0.1671016  5.4220678 5.891353e-08
    sexM:ldose   0.3529130  0.2699902  1.3071324 1.911678e-01
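As a minimal sketch of the same idea, the p-value column can be pulled out by name from the coefficient matrix that `summary()` returns. The data below are the budworm counts from this thread; coding `ldose` as `0:5` is an assumption made here because it is consistent with the coefficients printed above, and the names `coef.tab` and `p.values` are only illustrative.

```r
# Budworm data from the thread; ldose = 0:5 is an assumption (see lead-in)
ldose   <- rep(0:5, 2)
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
SF      <- cbind(numdead, numalive = 20 - numdead)

budworm.lg <- glm(SF ~ sex * ldose, family = binomial)

# coef() on a summary.glm object gives a numeric matrix,
# one row per coefficient and one column per statistic
coef.tab <- coef(summary(budworm.lg))

# Select the p-value column by its name; the result is a named numeric vector
p.values <- coef.tab[, "Pr(>|z|)"]
print(p.values)
```

Selecting the column by its full name avoids relying on partial matching of `$coeff`, and the same pattern works for the other columns such as `"Estimate"` or `"Std. Error"`.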
How can I extract the Pr column from the Coefficients table of a glm result?
budworm.lg$coefficient does not contain this information.

    > summary(budworm.lg, cor = F)

    Call:
    glm(formula = SF ~ sex * ldose, family = binomial)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -1.39849  -0.32094  -0.07592   0.38220   1.10375

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -93.5972    17.2140  -5.437 5.41e-08 ***
    sexM        -35.1163    27.6925  -1.268    0.205
    ldose         0.9060     0.1671   5.422 5.89e-08 ***
    sexM:ldose    0.3529     0.2700   1.307    0.191
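Question:
I am fitting a logistic regression with glm. One of the predictors is a factor rather than a continuous variable, and the results differ depending on whether its levels are coded as numbers or as letters. Coded as numbers, the output gives a single significance value for the predictor. Coded as letters, the output gives a significance value for every level of the predictor, but none for the predictor as a whole. Why is that?
To get a significance value for the predictor as a whole, does the factor have to be converted to a continuous numeric variable?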

    > options(contrasts = c("contr.treatment", "contr.poly"))
    > ldose <- rep(100:105, 2)
    > numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
    > sex <- factor(rep(c("M", "F"), c(6, 6)))
    > SF <- cbind(numdead, numalive = 20 - numdead)
    > budworm.lg <- glm(SF ~ sex*ldose, family = binomial)
    > summary(budworm.lg, cor = F)

    Call:
    glm(formula = SF ~ sex * ldose, family = binomial)

    Deviance Residuals:
         Min        1Q    Median        3Q       Max
    -1.39849  -0.32094  -0.07592   0.38220   1.10375

    Coefficients:
                Estimate Std. Error z value Pr(>|z|)
    (Intercept) -93.5972    17.2140  -5.437 5.41e-08 ***
    sexM        -35.1163    27.6925  -1.268    0.205
    ldose         0.9060     0.1671   5.422 5.89e-08 ***
    sexM:ldose    0.3529     0.2700   1.307    0.191
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 124.8756  on 11  degrees of freedom
    Residual deviance:   4.9937  on  8  degrees of freedom
    AIC: 43.104

    Number of Fisher Scoring iterations: 4

    > ldose <- rep(letters[1:6],2)
    > budworm.lg <- glm(SF ~ sex*ldose, family = binomial)
    Warning message:
    In model.matrix.default(mt, mf, contrasts) :
      variable 'ldose' converted to a factor
    > summary(budworm.lg, cor = F)

    Call:
    glm(formula = SF ~ sex * ldose, family = binomial)

    Deviance Residuals:
     [1]  0  0  0  0  0  0  0  0  0  0  0  0

    Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)   -25.752  52998.328   0.000        1
    sexM           22.807  52998.328   0.000        1
    ldoseb         23.555  52998.328   0.000        1
    ldosec         24.904  52998.328   0.000        1
    ldosed         25.752  52998.328   0.000        1
    ldosee         26.157  52998.328   0.000        1
    ldosef         27.138  52998.328   0.001        1
    sexM:ldoseb   -21.996  52998.328   0.000        1
    sexM:ldosec   -22.161  52998.328   0.000        1
    sexM:ldosed   -22.188  52998.328   0.000        1
    sexM:ldosee   -21.016  52998.328   0.000        1
    sexM:ldosef     1.558  74950.923   0.000        1

    (Dispersion parameter for binomial family taken to be 1)

        Null deviance: 1.2488e+02  on 11  degrees of freedom
    Residual deviance: 5.2389e-10  on  0  degrees of freedom
    AIC: 54.11

    Number of Fisher Scoring iterations: 22
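One possible way to get a single significance value for a factor as a whole, without recoding it as a number, is a likelihood-ratio test on the term instead of the per-level Wald tests that `summary()` prints. The sketch below reuses the budworm counts from this thread. It fits the additive model `sex + dose.f` rather than `sex * ldose`, which is a choice made here: with a six-level dose factor the interaction model spends 12 parameters on 12 observations, as the 0 residual degrees of freedom above show. The names `dose.f`, `fit`, `term.tab` and `drop.tab` are illustrative only.

```r
# Budworm counts from the thread, with dose coded as a six-level factor
numdead <- c(1, 4, 9, 13, 18, 20, 0, 2, 6, 10, 12, 16)
sex     <- factor(rep(c("M", "F"), c(6, 6)))
dose.f  <- factor(rep(letters[1:6], 2))
SF      <- cbind(numdead, numalive = 20 - numdead)

# Additive model (an assumption of this sketch, see the lead-in)
fit <- glm(SF ~ sex + dose.f, family = binomial)

# Sequential analysis of deviance: one row and one p-value per model term,
# terms added in the order they appear in the formula
term.tab <- anova(fit, test = "Chisq")
print(term.tab)

# Marginal version: each term is dropped in turn from the full model
drop.tab <- drop1(fit, test = "Chisq")
print(drop.tab)
```

In both tables the `dose.f` row carries 5 degrees of freedom, so the whole factor is tested at once rather than level by level.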
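I have not been here for a long time. Learned something new.
Reply to remember, discover, invent (post #2):
It does feel that way. But there is a real need for it, haha.
We have a data set distributed as shown in the figure. We want to know how random data with the same kind of distribution would affect the results of the downstream analysis, which is why we need this simulation.
I used the approach below. By repeatedly adjusting the prob argument, it more or less meets the requirement.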
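
    aa <- sample(0:10000, 1000, prob = c(seq(0.99, 0.98, length.out = 101), seq(0.1, 0.05, length.out = 900), seq(0.001, 0.000001, length.out = 9000)))

How can I simulate a distribution like the one in the figure below, that is, a Poisson distribution with a severe tail?
One idea is to use sample, but how should prob be specified? [attachment=221341,1237]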
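As an alternative to hand-tuning `prob`, one option that may be worth trying is a negative binomial distribution, which is a Poisson distribution whose rate itself varies according to a gamma distribution and therefore has a much longer right tail. This is a swapped-in technique, not the approach used in the thread, and the values `size = 0.5` and `mu = 50` are hypothetical: they would have to be tuned against the real data in the attachment.

```r
set.seed(1)   # only for reproducibility of the sketch
n <- 1000

# Direct draw: negative binomial with mean mu and dispersion size.
# Smaller size means a heavier tail; variance is mu + mu^2 / size.
x.nb <- rnbinom(n, size = 0.5, mu = 50)

# Equivalent two-step construction: draw a gamma-distributed rate for each
# observation, then draw a Poisson count with that rate
lambda <- rgamma(n, shape = 0.5, scale = 50 / 0.5)
x.mix  <- rpois(n, lambda)

# A plain Poisson would have variance close to its mean;
# here the variance is far larger
c(mean = mean(x.nb), variance = var(x.nb))
hist(x.nb, breaks = 50, main = "Heavy-tailed counts (sketch)")
```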
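The standard error is the smaller of the two: it is the standard deviation divided by the square root of the sample size.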
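A small illustration of that relationship, using simulated data (the sample of 100 normal draws is purely hypothetical):

```r
set.seed(42)                        # only for reproducibility of the sketch
x <- rnorm(100, mean = 10, sd = 2)  # hypothetical sample of size 100

s  <- sd(x)                 # sample standard deviation
se <- s / sqrt(length(x))   # standard error of the mean

c(sd = s, se = se)          # with n = 100 the standard error is sd / 10
```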
See here:
http://encyclopedia.thefreedictionary.com/Standard+error+(statistics)
Reply to 419syy (post #1):
?density
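Take a look at each argument.
Reply to jueduijingying (post #1):
There should be a way. I may not fully understand your question yet, or why you want to do it this way. My simple reading of it: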

    sps <- c("sp1", "sp2", "sp2", "sp13", "sp1")
    sp.val <- c("a 1", "b 2", "c 3")
    names(sp.val) <- c("sp1", "sp2", "sp13")
    # look up each species name in the named vector
    sps.val <- sp.val[sps]
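Would this do what you need?
Reply to bioshaw (post #3):
With data at this scale, it may be better to preprocess it in another language first, such as perl or python.
The version below also adds the vertical lines in the background.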

    parallel(~iris[1:4] | Species, iris, horizon = FALSE,
             ylim = extendrange(range(iris[1:4])),
             scales = list(y = list(at = NULL, labels = NULL), x = list(rot = 45)),
             lower = 0, upper = 1,
             panel = function(x, y, z, ...) {
               panel.abline(v = 1:4, col = "gray90")
               panel.parallel(x, y, z, ...)
             })
[attachment=216223,972]
The modified figure:
[attachment=216220,971]
Reply to wxw.name (post #3):
Many thanks! Based on this, scaling the extremes of ylim proportionally also works.
Below is Deepayan's reply. The idea is the same:

    parallel(~iris[1:4] | Species, iris, horizon = FALSE,
             ylim = extendrange(range(iris[1:4])),
             scales = list(y = list(at = NULL, labels = NULL)),
             lower = 0, upper = 1)
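The `extendrange()` call in that reply is what does the proportional scaling of the `ylim` extremes. As a rough sketch of what it computes, the padded limits can be compared with a manual calculation. The padding fraction `f = 0.05` is passed explicitly here and the names `r`, `r.pad` and `r.manual` are illustrative.

```r
# Overall min/max across the four measurement columns of iris
r <- range(iris[1:4])

# Pad both ends by 5% of the width of the range
r.pad <- grDevices::extendrange(r, f = 0.05)

# The same thing by hand: move each end outwards by f * (max - min)
r.manual <- r + c(-1, 1) * 0.05 * diff(r)

rbind(original = r, extended = r.pad, manual = r.manual)
```

A larger `f` leaves more empty space above and below the lines in each panel.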
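With the common.scale argument, the y-axis spans the overall min/max across all samples. After doing that, however, I want to label tick marks on the y-axis, and the approach below does not seem to work. I do not know how to achieve it, and searching did not turn up a good solution either.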

    parallel(~iris[1:4] | Species, iris, horiz = FALSE, common.scale = TRUE,
             scales = list(y = list(at = c(0, 2, 3))))
[attachment=216205,969]

    library(lattice)
    EE <- equal.count(ethanol$E, number = 9, overlap = 1/4)

    ## Constructing panel functions on the fly; prepanel
    xyplot(NOx ~ C | EE, data = ethanol,
           prepanel = function(x, y) prepanel.loess(x, y, span = 1),
           xlab = "Compression Ratio", ylab = "NOx (micrograms/J)",
           panel = function(x, y) {
             panel.grid(h = -1, v = 2)
             panel.xyplot(x, y)
             panel.loess(x, y, span = 1)
           },
           aspect = "xy")