A study thread: beginners are welcome to come in and see how I understand a few topics.

superdesolator

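94. PKD: NAMESPACE

Let me start by quoting a passage:[quote]The namespace controls the search strategy for variables used by functions in the package. If not found locally, R searches the package namespace first, then the imports, then the base namespace and then the normal search path.[/quote]

I remember studying the search path carefully in an earlier post (linked from the original thread).

The passage above complements that post nicely. The earlier post on environments already touched on this as well: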

nrow
#> function (x)
#> dim(x)[1L]
#> <bytecode: 0x2a4ebc8>
#> <environment: namespace:base>
dim <- function(x) c(1, 1)
dim(mtcars)
#> [1] 1 1
nrow(mtcars)
#> [1] 32

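The reason for this behaviour is that nrow looks for dim first in the namespace of its own package (base), then in the imports, then in the base namespace, and only after that on the search path.

My earlier post was about the lookup rules on the search path, whereas everything here happens before the search path is consulted.

So nrow finds dim in its own package's namespace.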
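To see where that lookup happens, one can inspect the environment that encloses nrow. This is only a small sketch in base R: the variable names are my own, and it assumes the datasets package (which provides mtcars) is attached, as it is by default.

```r
# Shadow dim() in the current environment, as in the example above.
dim <- function(x) c(1, 1)

# nrow() is enclosed by the base namespace, so names used inside it
# are resolved there before the search path is consulted.
nrow_env <- environment(nrow)
env_name <- environmentName(nrow_env)

# The dim() that nrow() sees is base's own, not our shadowing one.
sees_base_dim <- identical(get("dim", envir = nrow_env), base::dim)

# nrow() therefore still reports the real number of rows.
rows <- nrow(mtcars)

# Remove the shadowing definition again.
rm(dim)
```

The point of the sketch is that our own dim never gets a chance: it sits on the search path, which comes last in the lookup order.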

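The simplest way to work with namespaces is roxygen2, which converts tags into NAMESPACE directives. For example:

@import plyr becomes import(plyr) and @importFrom plyr ddply becomes importFrom(plyr, ddply); @export works out the function name automatically.

In other words, directives such as import(plyr) are what actually appears in the NAMESPACE file, and roxygen2 converts the tags we write next to the code in our R files into those directives.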
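As a rough illustration of that mapping, here is what a tagged function might look like. The function summarise_by is hypothetical and exists only for this sketch; the NAMESPACE lines in the trailing comment are what I would expect roxygen2 to generate, not output I have copied from a real run.

```r
#' Mean of one column per group (hypothetical example function)
#'
#' @param df A data frame.
#' @param by Name of the grouping column.
#' @param col Name of the column to average.
#' @importFrom plyr ddply
#' @export
summarise_by <- function(df, by, col) {
  # ddply is available unqualified because of the @importFrom tag above
  ddply(df, by, function(d) data.frame(mean = mean(d[[col]])))
}

# Expected directives in NAMESPACE after running roxygen2:
#   export(summarise_by)
#   importFrom(plyr, ddply)
```

Defining the function does not require plyr to be installed; plyr is only needed once summarise_by is actually called.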

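Let's start with exports:

By default roxygen2 exports nothing, so everything must be exported explicitly with @export.

The author lists the following guidelines on what to export:

1) Functions: every function you export must be documented.

2) Datasets do not need to be exported; they are available anyway.

3) S3 classes: if you want others to create instances of the class, export its constructor function.

4) S3 generics: same as 1).

5) S3 methods: every S3 method must be exported (I don't understand the reason yet; I also looked through the Writing R Extensions manual and still don't).

6) S4 classes: I haven't really figured these out. The original text is quoted below, and I have opened a thread to ask about it.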

[quote]S4 classes: if you want others to be able to extend your class, @export it. If you want others to create instances of your class, but not extend it, @export the constructor function, but not the class.

# Can extend and create

#' @export

setClass("A")

# Can extend, but constructor not exported

#' @export

B <- setClass("B")

# Can create, but not extend

#' @export C

C <- setClass("C")

# Can create and extend

#' @export D

#' @exportClass D

D <- setClass("D")[/quote]

7) S4 generics: same as 1).

8) S4 methods: best to export them all.

9) RC classes: same rules as S4 classes.
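For the S3 items in the list above, a tagged source file might look roughly like this. The class temperature and the generic describe are made up for the sketch; the comment about S3method() reflects my understanding of what roxygen2 emits for an exported method.

```r
#' Constructor for a hypothetical S3 class "temperature"
#' @param x A numeric value.
#' @export
temperature <- function(x) structure(list(value = x), class = "temperature")

#' A hypothetical generic
#' @param x An object.
#' @param ... Passed on to methods.
#' @export
describe <- function(x, ...) UseMethod("describe")

#' Method for the generic above
#' With @export on a method, roxygen2 is expected to write
#' S3method(describe, temperature) into NAMESPACE.
#' @export
describe.temperature <- function(x, ...) paste0(x$value, " degrees")

label <- describe(temperature(20))
```

Exporting the constructor lets others create instances, and registering the method lets dispatch find it even though describe.temperature itself is not a name users call directly.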

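Also note that for S4 you need:

Depends: methods in DESCRIPTION

import(methods) in NAMESPACE (@import methods can go anywhere, but is best placed in the documentation for the package as a whole)

The reason is that the methods package is loaded and attached in an interactive R session, but is not attached when running under Rscript.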

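Now for imports:

NAMESPACE also controls what our package needs from other packages.

If our package uses only a few functions from another package, we can list that package under Imports in DESCRIPTION and call the functions with ::. This is slightly less efficient than importing in NAMESPACE, but packages imported in NAMESPACE are loaded at the same time as our package, whereas with :: the other package is loaded only when a pkg::fun call is first evaluated. That gives us delayed loading. (I don't yet know exactly what load and attach mean, but clearly this approach does delay loading compared with an explicit import in NAMESPACE.)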
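The difference between loading and attaching can be observed directly from the console. This is a sketch using the splines package only because it ships with R and is usually not loaded in a fresh session; the variable names are mine.

```r
pkg <- "splines"

# Usually FALSE in a fresh session, but this depends on what ran before.
loaded_before <- pkg %in% loadedNamespaces()

# The first pkg::fun call loads the namespace on demand...
basis <- splines::bs(1:10, df = 3)

loaded_after <- pkg %in% loadedNamespaces()

# ...but it does not attach the package to the search path.
attached_after <- paste0("package:", pkg) %in% search()
```

So "loaded" means the namespace is in memory and reachable through ::, while "attached" means it also appears in search() and its exports can be used unqualified.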

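The alternative, of course, is @importFrom pkg fun.

If our package needs many functions from another package, use @import pkg.

If we need to add a method for one of our classes to an S3 generic from another package, use @importFrom pkg generic.

If we use S4, then: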

@importClassesFrom package classa classb ... to import selected S4 classes.

@importMethodsFrom package methoda methodb ... to import methods for selected S4 generics.

If we need compiled code from another package, use @useDynLib (Writing R Extensions covers this in some detail, but it is beyond me for now).

[quote]@useDynLib package imports all compiled functions.

@useDynLib package routinea routineb imports selected compiled functions.

Any @useDynLib specification containing a comma, e.g. @useDynLib mypackage, .registration = TRUE will be inserted as is into the NAMESPACE, e.g. useDynLib(mypackage, .registration = TRUE)[/quote]

Finally, the author gives a trick for registering S3 methods for a generic that lives in a suggested package. Again I quote the original:

[quote]One complexity arises when you want to register S3 methods for a generic that’s defined in a suggested package. You can’t use S3method() because the generic is not available at package load time. Instead, you can use the following code to set up hooks that load automatically.

From htmltools:

# COPYRIGHT RStudio, GPL >= 2

registerMethods <- function(methods) {
  lapply(methods, function(method) {
    pkg <- method[[1]]
    generic <- method[[2]]
    class <- method[[3]]
    func <- get(paste(generic, class, sep="."))
    if (pkg %in% loadedNamespaces()) {
      registerS3method(generic, class, func, envir = asNamespace(pkg))
    }
    setHook(
      packageEvent(pkg, "onLoad"),
      function(...) {
        registerS3method(generic, class, func, envir = asNamespace(pkg))
      }
    )
  })
}[/quote]

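We may not know what methods looks like, how lapply walks over it, or how the needed pieces are pulled out, but the setHook call clearly says: register the S3 method when that package is loaded!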
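Reading the function body, each element of methods is indexed with [[1]], [[2]] and [[3]], so a list of three-element character vectors fits. The sketch below only shows that destructuring step; the concrete values are illustrative guesses of mine, and nothing is registered.

```r
# Each element is c(package, generic, class); the values are illustrative.
methods <- list(
  c("knitr", "knit_print", "html"),
  c("knitr", "knit_print", "shiny.tag")
)

# Same destructuring as in registerMethods(), minus the registration itself.
parts <- lapply(methods, function(method) {
  list(
    pkg         = method[[1]],
    generic     = method[[2]],
    class       = method[[3]],
    # name of the function that get() would look up
    method_name = paste(method[[2]], method[[3]], sep = ".")
  )
})

first_name <- parts[[1]]$method_name
```

In the real function the looked-up method is then registered immediately if the package is already loaded, and in any case a hook is set so that it is registered whenever the package's namespace is loaded later.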

PS: remember Ctrl+Shift+D.

superdesolator

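95. PKD: Testing

This post is mainly an introduction to the author's testthat package.

A brief summary of the benefits of testing:

1) The more tests you have, the less likely things are to break later.

2) Better code is easier to test, and writing tests will very likely push us to split the code into functional pieces.

3) It is easier to see what work needs doing: fix a bug, for instance, or add a new feature when there are no bugs.

4) More confidence when changing code.

1. Structure of testthat

An expectation checks the most basic kind of result, a test groups expectations inside a test_that() call, and a context groups a series of related test_that() calls.

2. Expectations

Quoted from the original, partly in my own words.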

[quote]•equals() uses all.equal() to check for equality, allowing a small numerical tolerance.

# Passes
expect_that(10, equals(10))
# Also passes
expect_that(10, equals(10 + 1e-7))
# Fails
expect_that(10, equals(10 + 1e-6))
# Definitely fails!
expect_that(10, equals(11))

•is_identical_to() uses identical() to check for exact equality.

# Passes
expect_that(10, is_identical_to(10))
# Fails
expect_that(10, is_identical_to(10 + 1e-10))

•is_equivalent_to() is a more relaxed version of equals() that ignores attributes:

# Fails
expect_that(c("one" = 1, "two" = 2), equals(1:2))
# Passes
expect_that(c("one" = 1, "two" = 2), is_equivalent_to(1:2))

•is_a() checks that an object inherits() from a specified class.

model <- lm(mpg ~ wt, data = mtcars)
# Passes
expect_that(model, is_a("lm"))
# Fails
expect_that(model, is_a("glm"))

•matches() matches a character vector against a regular expression. The optional all argument controls whether all elements or just one element needs to match. This code is powered by str_detect() from the stringr package.

string <- "Testing is fun!"
# Passes
expect_that(string, matches("Testing"))
# Fails, match is case-sensitive
expect_that(string, matches("testing"))
# Passes, match can be a regular expression
expect_that(string, matches("T.+ting"))

•prints_text() matches the printed output of an expression against a regular expression.

a <- list(1:10, letters)
# Passes
expect_that(str(a), prints_text("List of 2"))
# Passes
expect_that(str(a), prints_text(fixed("int [1:10]")))

•shows_message() checks that an expression produces a particular message:

# Passes
expect_that(library(mgcv), shows_message("This is mgcv"))

•gives_warning() checks that an expression gives a warning.

# Passes
expect_that(log(-1), gives_warning())
expect_that(log(-1), gives_warning("NaNs produced"))
# Fails
expect_that(log(0), gives_warning())

•throws_error() verifies that the expression throws an error. You can also supply a regular expression which is applied to the text of the error.

# Fails
expect_that(1 / 2, throws_error())
# Passes
expect_that(1 / "a", throws_error())
# But better to be explicit
expect_that(1 / "a", throws_error("non-numeric argument"))

•is_true() is a useful catchall if none of the other expectations do what you want - it checks that an expression is true. is_false() is the complement of is_true().

[/quote]
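Since equals(), is_identical_to() and is_equivalent_to() are described above in terms of all.equal() and identical(), the differences can be reproduced in base R without testthat. This is a sketch of the underlying comparisons, not of testthat's internals; the variable names are mine.

```r
# equals(): all.equal() with its default tolerance of about 1.5e-8
eq_close <- isTRUE(all.equal(10, 10 + 1e-7))   # within tolerance
eq_far   <- isTRUE(all.equal(10, 10 + 1e-6))   # outside tolerance

# is_identical_to(): identical() allows no tolerance at all
id_exact <- identical(10, 10)
id_close <- identical(10, 10 + 1e-10)

# is_equivalent_to(): like equals(), but attributes such as names are ignored
named       <- c(one = 1, two = 2)
eq_named    <- isTRUE(all.equal(named, 1:2))
equiv_named <- isTRUE(all.equal(named, 1:2, check.attributes = FALSE))
```

all.equal() returns a character description rather than FALSE when the values differ, which is why each call is wrapped in isTRUE().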

These eleven expectations can be combined into many tests aimed at a specific kind of error. The author also provides shortcuts:

[quote]

Full form                                Shortcut

expect_that(x, is_true())                expect_true(x)
expect_that(x, is_false())               expect_false(x)
expect_that(x, is_a(y))                  expect_is(x, y)
expect_that(x, equals(y))                expect_equal(x, y)
expect_that(x, is_equivalent_to(y))      expect_equivalent(x, y)
expect_that(x, is_identical_to(y))       expect_identical(x, y)
expect_that(x, matches(y))               expect_match(x, y)
expect_that(x, prints_text(y))           expect_output(x, y)
expect_that(x, shows_message(y))         expect_message(x, y)
expect_that(x, gives_warning(y))         expect_warning(x, y)
expect_that(x, throws_error(y))          expect_error(x, y)

[/quote]

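3. Tests

A test is a call test_that(name, code block). The name is an informative label that tells you where, or in which feature, something went wrong; the code block is the group of expectations that checks that feature, so when something fails you can easily find the corresponding block. The author's example tests the floor_date() function from library(lubridate).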

test_that("floor_date works for different units", {
  base <- as.POSIXct("2009-08-03 12:01:59.23", tz = "UTC")

  is_time <- function(x) equals(as.POSIXct(x, tz = "UTC"))
  floor_base <- function(unit) floor_date(base, unit)

  expect_that(floor_base("second"), is_time("2009-08-03 12:01:59"))
  expect_that(floor_base("minute"), is_time("2009-08-03 12:01:00"))
  expect_that(floor_base("hour"),   is_time("2009-08-03 12:00:00"))
  expect_that(floor_base("day"),    is_time("2009-08-03 00:00:00"))
  expect_that(floor_base("week"),   is_time("2009-08-02 00:00:00"))
  expect_that(floor_base("month"),  is_time("2009-08-01 00:00:00"))
  expect_that(floor_base("year"),   is_time("2009-01-01 00:00:00"))
})

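Then, if something goes wrong, we see Test failed: "floor_date works for different units", Not expected: ..... and can quickly find the code for that test.

Each test runs in its own environment, but some actions still have effects outside it:

1) The file system: creating and deleting files

2) The search path: loading and detaching packages

3) Global options, e.g. options(), par()

The author says that if a test performs any of these actions you need to clean up after yourself. Some testing packages provide set-up and teardown methods that run automatically before and after each test. With testthat we can instead create objects outside the tests and rely on copy-on-modify to keep those outer objects unchanged (I haven't fully digested this part). Here is the original: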

[quote]When you use these actions in tests, you’ll need to clean up after yourself. Many other testing packages have set-up and teardown methods that are run automatically before and after each test. These are not so important with testthat because you can create objects outside of the tests and rely on R’s copy-on-modify semantics to keep them unchanged between test runs. To clean up other actions you can use regular R functions[/quote]
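One common way to "clean up with regular R functions" is on.exit(). The helper below is a hypothetical sketch in base R, not something from testthat: it changes an option and creates a temporary file, then restores both no matter how the code inside exits.

```r
# Hypothetical helper: run `code` with a changed option and a scratch file,
# then undo both side effects when the function exits.
with_clean_state <- function(code) {
  old_opts <- options(digits = 3)          # options() returns the old values
  tmp <- tempfile(fileext = ".txt")
  on.exit({
    options(old_opts)                      # restore global options
    unlink(tmp)                            # remove the scratch file
  }, add = TRUE)

  writeLines("scratch", tmp)
  code(tmp)
}

digits_before <- getOption("digits")
saw_file      <- with_clean_state(function(path) file.exists(path))
digits_after  <- getOption("digits")
```

Inside a test_that() block the same pattern works if the side effects are wrapped in a function like this one, so that the restore step is tied to that function's exit.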

4. Contexts

A context groups tests for related functionality into one block, usually one context per file. Here is an example that tests str_length from the stringr package:

context("String length")

test_that("str_length is number of characters", {
  expect_that(str_length("a"), equals(1))
  expect_that(str_length("ab"), equals(2))
  expect_that(str_length("abc"), equals(3))
})

test_that("str_length of missing is missing", {
  expect_that(str_length(NA), equals(NA_integer_))
  expect_that(str_length(c(NA, 1)), equals(c(NA, 1)))
  expect_that(str_length("NA"), equals(2))
})

test_that("str_length of factor is length of level", {
  expect_that(str_length(factor("a")), equals(1))
  expect_that(str_length(factor("ab")), equals(2))
  expect_that(str_length(factor("abc")), equals(3))
})

We also run the same tests with nchar in place of str_length.

5. Running tests

There are two situations in which we run tests: interactively, during development, and as automated tests.

1) test_file() and test_dir() run all the tests in a file or directory

2) auto_test() reruns tests automatically

3) R CMD check can run the tests

6. Test files and directories

The code and output below show the difference between source(path) and test_file(path): test_file reports all the results, while source stops at the first failing test.

> source("test-str_length.r")
> test_file("test-str_length.r")
.........

> source("test-nchar.r")
Error: Test failure in 'nchar of missing is missing'
* nchar(NA) not equal to NA_integer_
'is.NA' value mismatch: 0 in current 1 in target
* nchar(c(NA, 1)) not equal to c(NA, 1)
'is.NA' value mismatch: 0 in current 1 in target

> test_file("test-nchar.r")
...12..34

1. Failure: nchar of missing is missing ---------------------------------
nchar(NA) not equal to NA_integer_
'is.NA' value mismatch: 0 in current 1 in target

2. Failure: nchar of missing is missing ---------------------------------
nchar(c(NA, 1)) not equal to c(NA, 1)
'is.NA' value mismatch: 0 in current 1 in target

3. Failure: nchar of factor is length of level --------------------------
nchar(factor("ab")) not equal to 2
Mean relative difference: 0.5

4. Failure: nchar of factor is length of level --------------------------
nchar(factor("abc")) not equal to 3
Mean relative difference: 0.6666667

test_dir() runs every file in a directory whose name starts with test. Below are the results for stringr: 12 contexts, each with between 2 and 25 expectations.

> test_dir("inst/tests/")
String and pattern checks : ......
Detecting patterns : .........
Duplicating strings : ......
Extract patterns : ..
Joining strings : ......
String length : .........
Locations : ............
Matching groups : ..............
Test padding : ....
Splitting strings : .........................
Extracting substrings : ...................
Trimming strings : ........

As for how results are reported: the output shown above is the summary reporter, the default for test_dir and test_file. We can also choose the minimal reporter (E means error, F means failure):

> test_dir("inst/tests/", reporter="minimal")
...............................................

The last reporter, stop, calls stop() as soon as a failure occurs. The original says:

[quote]The stop reporter is the default and stop()s whenever a failure is encountered.[/quote]

My understanding is that the stop reporter produces output like:

4. Failure: nchar of factor is length of level --------------------------
nchar(factor("abc")) not equal to 3
Mean relative difference: 0.6666667

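7. Autotest

auto_test() takes two arguments, code_path and test_path: the directory holding the code and the directory holding the tests. Once started, auto_test() watches both. If a test file changes, it reruns that test file; if a source file changes, it reloads that source file and reruns all the test files.

As I understand it, once auto_test() is running you just keep editing; every save triggers the tests.

This replaces the traditional modify -> save -> source -> check cycle.

8. R CMD check

First we have to list testthat in the DESCRIPTION file so that R CMD check does not warn about unspecified dependencies.

Then the test code has to go somewhere R CMD check can find it.

Originally, the recommended practice was to put all test files in inst/tests and add the following code to tests/test-all.R:

library(testthat)
library(yourpackage)
test_package("yourpackage")

The currently recommended way is:

put the tests in tests/testthat, then put the following in tests/test-all.R:

library(testthat)
test_check("yourpackage")

The author says the advantage is that the tests can be installed on request, either by passing --install-tests to R CMD INSTALL or by setting this argument of install.packages():

INSTALL_opts = c("--install-tests")

Once again I'll guess at why:

Because the package code and the test code sit at known locations relative to each other, test_check probably uses test_dir and test_file internally, and so gets the advantages of those two functions (for example, over source).

9. Two styles of development

One is called exploratory programming, the other confirmatory programming.

Confirmatory: when a test fails, run auto_test() and keep editing until it passes.

Exploratory: keep cycling through source and modify.

When everything works, run document() and update NEWS.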

superdesolator

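95. PKD: Data

There are three ways to include data in an R package:

1) If you want to store parsed data and make it available to users, put it in data/

2) If you want to store parsed data but not make it available to users, put it in R/sysdata.rda

3) If you want to store raw data, it is best to put it in inst/extdata

OK, let's go through them one by one:

1. data/

This directory is the best place for example datasets. The .rda files in it should be created with save():

Each file should contain a single object, and the file name should match the object's name:

save(mtcars, file = "data/mtcars.rda")

Objects in data/ are exported automatically, so they must be documented; see the end of this post.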
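The "one object per file, file named after the object" convention is easy to try out in a temporary directory. This is a sketch only: my_cars and the temporary data directory are stand-ins I made up, not part of any package.

```r
# A throwaway directory standing in for a package's data/ folder.
data_dir <- file.path(tempdir(), "data")
dir.create(data_dir, showWarnings = FALSE)

# One object, saved to a file that carries the object's name.
my_cars <- head(mtcars)
rda_path <- file.path(data_dir, "my_cars.rda")
save(my_cars, file = rda_path)

# load() restores the object under its original name and returns that name.
env <- new.env()
loaded_names <- load(rda_path, envir = env)
round_trip_ok <- identical(env$my_cars, my_cars)
```

Because load() recreates objects under the names they were saved with, keeping the file name and the object name in sync avoids surprises about what a given .rda file contains.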

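Remember the LazyData field in DESCRIPTION? If it is true, the datasets take up no memory until we actually use them:

pryr::mem_used()
#> 24.4 MB
library(nycflights13)
pryr::mem_used()
#> 24.6 MB
invisible(flights)
pryr::mem_used()
#> 59.9 MB

As the example shows, loading the package does not bring the flights dataset into memory; it is loaded only when flights is first used. So this is clearly the recommended setting.

Also, some data files are created from raw data collected elsewhere, so it is best to keep the code that reproduces them in a data-raw directory (and remember to add that directory to .Rbuildignore).

For a real example, see nasaweather.

Finally, the author points out that if your package uses a large dataset that rarely changes, it is best to put it in a dedicated package, so users don't have to download the data again every time your code package changes.

2. R/sysdata.rda

Sometimes our functions need precomputed tables. This is the method to use then, because users should not have access to them. The file can hold any number of objects:

save(x, y, z, file = "R/sysdata.rda")

Since these objects are not used by users and are not exported, they need no documentation.

3. inst/

When we want to give examples of loading/parsing raw data, we put the original files under inst/. Any subdirectory will do, but inst/extdata is best. When the package is installed, everything in inst/ is copied into the top-level package directory, so inst/ must not contain directories with names such as R or DESCRIPTION.

To get the path:

system.file("include", "Rcpp.h", package = "Rcpp")
#> [1] "/usr/lib/R/site-library/Rcpp/include/Rcpp.h"

But note that if the file does not exist, system.file does not raise an error; it returns an empty string:

system.file("include", "Rcp.h", package = "Rcpp")
#> [1] ""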
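Because the empty string is easy to overlook, it can be worth checking the result explicitly or asking system.file() to fail instead. The sketch below uses the stats package only because it is always installed; the file name no-such-file.txt is deliberately made up.

```r
# A file that exists in every R installation.
found <- system.file("DESCRIPTION", package = "stats")

# A file that does not exist: the result is "", not an error.
not_found <- system.file("no-such-file.txt", package = "stats")

# mustWork = TRUE turns the silent "" into an error we can catch.
strict <- tryCatch(
  system.file("no-such-file.txt", package = "stats", mustWork = TRUE),
  error = function(e) "error"
)
```

Checking nzchar() on the result, or using mustWork = TRUE, makes a misspelled path fail at the point where it is built rather than later when the file is read.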

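4. vignettes/

If a vignette needs a dataset, just put the dataset in the vignettes directory and refer to it with a local path.

5. CRAN notes

If the package is going to be submitted to CRAN, we must make sure the datasets are optimally compressed. The original: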

[quote]Run tools::checkRdaFiles() to determine the best compression for each file. If you’ve lost the code for recreating the files, you can use tools::resaveRdaFiles() to save as best format, but it’s better to modify the original save() code[/quote]

Have a look at the help pages for these two functions and at ?save; there is a lot to learn there.

6. Documenting datasets

Here is the author's example:

#' Prices of 50,000 round cut diamonds.
#'
#' A dataset containing the prices and other attributes of almost 54,000
#' diamonds. The variables are as follows:
#'
#' \itemize{
#'   \item price. price in US dollars (\$326--\$18,823)
#'   \item carat. weight of the diamond (0.2--5.01)
#'   ...
#' }
#'
#' @format A data frame with 53940 rows and 10 variables
#' @source \url{http://www.diamondse.info/}
#' @name diamonds
NULL

We don't document the dataset directly; we document NULL and use @name to tell roxygen2 what is really being documented. @format describes the structure of the data; if we omit it, roxygen will fill it in from the first line of str() output. @source gives the origin of the data, often a \url{}.

PS: what follows supplements point 5; skip it if you are not interested.

A closer look at save, tools::resaveRdaFiles and tools::checkRdaFiles. You can read the help pages yourselves; here I pick out the important points and give an outline:

First, data compression in R:

[quote]Large files can be reduced considerably in size by compression. A particular 46MB R object was saved as 35MB without compression in 2 seconds, 22MB with gzip compression in 8 secs, 19MB with bzip2 compression in 13 secs and 9.4MB with xz compression in 40 secs. The load times were 1.3, 2.8, 5.5 and 5.7 seconds respectively. These results are indicative, but the relative performances do depend on the actual file: xz compressed unusually well here.

It is possible to compress later (with gzip, bzip2 or xz) a file saved with compress = FALSE: the effect is the same as saving with compression. Also, a saved file can be uncompressed and re-compressed under a different compression scheme (and see resaveRdaFiles for a way to do so from within R).[/quote]

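This shows how gzip, bzip2 and xz differ.

Now back to save:

save(..., list = character(),
     file = stop("'file' must be specified"),
     ascii = FALSE, version = NULL, envir = parent.frame(),
     compress = !ascii, compression_level,
     eval.promises = TRUE, precheck = TRUE)

... is the names of the objects to save; list is a character vector of object names; file is the path to save to; envir is the environment in which the objects are looked up; ascii chooses the format (the alternative is binary; roughly, the ASCII form is portable across platforms, and the binary form is portable across all R platforms); version is the format version, 1 for the old format and 2 for the new one (new meaning since R 1.4.0); compress is the compression method: TRUE means gzip, and the default !ascii means that binary files are gzipped by default, though we can also choose "bzip2" or "xz"; compression_level is, according to ?save, an integer giving the level of compression (I first guessed it selected the method); eval.promises says whether promise objects are forced before saving (if not, the whole associated environment has to be saved); precheck says whether the existence of the objects is checked first. Finally there is safe, which does not appear in the signature above and, as far as I can tell, belongs to save.image(): the data is first written to a temporary file that is renamed on success and removed on failure, or so it seems.

Finally, note that this function also consults options(). We can change the defaults with:

options(save.defaults = list(ascii = TRUE, safe = FALSE))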

Finally, tools::resaveRdaFiles and tools::checkRdaFiles:

Once save is understood there is not much left to say about these two functions, so I'll just give one example. First download borders.rdata from the data folder at the location linked in the original post and put it in the working directory.

Then let's have a look:

tools::checkRdaFiles("borders.rdata")
#                size ASCII compress version
# borders.rdata 79354 FALSE     gzip       2
tools::resaveRdaFiles("borders.rdata",compress="xz",2)
tools::checkRdaFiles("borders.rdata")
#                size ASCII compress version
# borders.rdata 50572 FALSE       xz       2
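For readers who cannot get hold of borders.rdata, the same comparison can be run on generated data. This is a sketch with a made-up vector x; the sizes will differ from the numbers above and depend heavily on the data, so no expected sizes are shown.

```r
# Highly repetitive data, so every compressor has something to work with.
x <- rep(seq_len(1000), times = 100)

# One temporary file per compression method.
methods_to_try <- c("gzip", "bzip2", "xz")
paths <- vapply(methods_to_try,
                function(m) tempfile(fileext = ".rda"),
                character(1))

for (m in methods_to_try) {
  save(x, file = paths[[m]], compress = m)
}

# checkRdaFiles() reports size and detected compression for each file.
info <- tools::checkRdaFiles(unname(paths))
info[, c("size", "compress")]

unlink(paths)
```

Running tools::checkRdaFiles() over a package's data/ directory in the same way shows at a glance which files would benefit from being resaved.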

OK, with save, tools::resaveRdaFiles and tools::checkRdaFiles this well understood, what could be hard about storing data in an R package?

superdesolator

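96. PKD: Compiled code

This part is hard, and the author has only written three pages, so I'll just list the points:

1. C++

Probably thanks to Rcpp, C++ is the easiest compiled language to use from R.

2. C

For wrapping old C code, or for the rare wrapper that does not use C++.

Then:

put the .c files in src/ and add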

[quote]#include <R.h>

#include <Rdefines.h>[/quote]

Wrap the function like this:

#' @useDynLib qstack car
car_ <- function(x) .Call(car, x)

[quote]

Slowest: use .Call("car")

Faster: use .Call(car)

Fastest: don’t write wrapper function[/quote]

If you use useDynLib, it is best to clean up with .onUnload():

.onUnload <- function (libpath) {
  library.dynam.unload("qstack", libpath)
}

3. Fortran

...

4. Java

The author says rJava also needs a lot of copying, so it seems to be slow.

5. Dynamically loading dll

Load the DLL with @useDynLib,

then unload:

.onUnload <- function(libpath) {
  library.dynam.unload("yourpackagename", libpath)
}

to make sure the DLL is unloaded when the package is unloaded.

6. Licensing

If we use code written by someone else, we must acknowledge it, e.g.

http://cran.rstudio.com/web/packages/optparse/COPYRIGHTS

I suspect the author hasn't finished some of these parts.

superdesolator

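97. PKD: Other components

Unfortunately the book is still being written and this part is not finished. Hadley Wickham is a very busy man (but could you please finish the book in one go first?).

The author has listed the parts he plans to write: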

[quote]

.Rbuildignore

inst/

exec/: executable scripts

po/: translation files

the inst/CITATION file describes how to cite your package. If you have published a peer reviewed article which you’d like people to cite when they use your software, this is the place to put it.

the demo/ directory contains larger scale demos, that use many features of the package.[/quote]

As you can see, exec/ and po/ are still quite unfamiliar to us.

CITATION seems to be finished. It is tied to the citation() function, and the file lives at inst/CITATION. Its main purpose is citing, which for us means citing an R package. Running citation() with no arguments shows how to cite base R:

To cite R in publications use:

  R Core Team (2012). R: A language and environment for statistical
  computing. R Foundation for Statistical Computing, Vienna, Austria.
  ISBN 3-900051-07-0, URL http://www.R-project.org/.

A BibTeX entry for LaTeX users is

  @Manual{,
    title = {R: A Language and Environment for Statistical Computing},
    author = {{R Core Team}},
    organization = {R Foundation for Statistical Computing},
    address = {Vienna, Austria},
    year = {2012},
    note = { {ISBN} 3-900051-07-0},
    url = {http://www.R-project.org/},
  }

We have invested a lot of time and effort in creating R, please cite it
when using it for data analysis. See also ‘citation("pkgname")’ for
citing R packages.

The CITATION file that produces the output above might look like this:

bibentry("Manual",
   title = "R: A Language and Environment for Statistical Computing",
   author = person("R Core Team"),
   organization = "R Foundation for Statistical Computing",
   address      = "Vienna, Austria",
   year   = version$year,
   note   = "{ISBN} 3-900051-07-0",
   url    = "http://www.R-project.org/",

   mheader = "To cite R in publications use:",

   mfooter =
     paste("We have invested a lot of time and effort in creating R,",
      "please cite it when using it for data analysis.",
      "See also", sQuote("citation(\"pkgname\")"),
      "for citing R packages.", sep = " ")
)

See ?bibentry: the first argument is bibtype. Compare mheader and mfooter with the output above; these two are optional, and the rest is the basic bibliographic information.
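A package's own inst/CITATION can follow the same pattern. The entry below is a sketch for a hypothetical package called mypkg with a made-up author; only the bibentry(), person() and toBibtex() calls from the utils package are real.

```r
# Hypothetical citation entry for an imaginary package "mypkg".
entry <- utils::bibentry(
  bibtype = "Manual",
  title   = "mypkg: A Hypothetical Example Package",
  author  = utils::person("Jane", "Doe"),
  year    = "2014",
  note    = "R package version 0.1.0",
  mheader = "To cite mypkg in publications use:"
)

# The BibTeX form that citation() would display for LaTeX users.
bibtex_lines <- utils::toBibtex(entry)
bibtex_lines
```

In a real inst/CITATION file the bibentry() call stands on its own; assigning it to entry here is just so the result can be inspected.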

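The demo section is not finished. The author only says that the R files under demo/ are tied to demo(), and that, compared with the example for a single function, a demo combines several functions.

Demos need an index; the simplest is:

#' @demoTitle my title

After that the author has written nothing more. I hope he finishes it soon, but the book is apparently not due out until June 2015, so there is a wait ahead.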

superdesolator

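98. PKD: Git and GitHub

At the time of writing the author has written almost nothing here, so I found a tutorial and worked through it. It is very good.

(The tutorial is linked from the original post.)

It took me more than 10 hours to finish and I thoroughly recommend it. After that tutorial you probably won't need to read this part once Hadley Wickham writes it.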

superdesolator

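99. PKD: Checking

If we want to release a package on CRAN, it must pass R CMD check, that is, with no warnings or errors and as few notes as possible.

To check our package, run devtools::check(), or press Ctrl+Shift+E in RStudio.

1. Check components

devtools::run_examples() has an optional argument that lets you choose which function to start from, so functions that already passed need not be rerun every time.

devtools::check_doc(): documentation problems are one of the most common causes of check failures.

2. Common problems

Here the author lists some common problems. We haven't really got this far yet, but it doesn't hurt to look ahead:

1) Sometimes check() fails because installation failed.

2) Every exported object must be documented, and so must every function argument.

3) An S3 method must have arguments consistent with its generic; note in particular the ... argument of print: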

print
#> function (x, ...)
#> UseMethod("print")
#> <bytecode: 0x30475b8>
#> <environment: namespace:base>

# BAD
print.my_class <- function(x) cat("Hi")

# GOOD
print.my_class <- function(x, ...) cat("Hi")

4) Use TRUE and FALSE, not T and F.

5) Data should be suitably compressed; see the Data post.

6) Bugs caused by examples or vignettes: note that they can only use exported functions, not internal functions.

Also pay attention to the dependencies declared in DESCRIPTION.

7) Building the PDF manual

[quote]Occasionally you’ll get an error when building the PDF manual. This is usually because the pdf is built by latex and you’ve forgotten to escape something. Debugging this is painful - your best bet is to look up the latex logs and combined tex file and work back from there to .Rd files and back to a roxygen comment. You may need to enlist the help of a local latex expert.[/quote]
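Items 3) and 4) in the list above are easy to demonstrate in a console. The sketch below uses a made-up class my_class; it shows why a method without ... breaks as soon as the generic passes on an extra argument, and why T is not a safe substitute for TRUE.

```r
obj <- structure(list(), class = "my_class")

# BAD: the method drops `...`, so an extra argument passed through
# the generic has nowhere to go and the call fails.
print.my_class <- function(x) cat("Hi\n")
bad <- tryCatch({ print(obj, digits = 3); "ok" },
                error = function(e) "error")

# GOOD: the signature matches the generic, extra arguments are absorbed.
print.my_class <- function(x, ...) cat("Hi\n")
good <- tryCatch({ print(obj, digits = 3); "ok" },
                 error = function(e) "error")

# T is an ordinary variable that can be overwritten; TRUE is a reserved word.
T <- 0
t_is_true <- isTRUE(T)
rm(T)
```

R CMD check flags both problems statically, but seeing the runtime failure makes it clearer why the rules exist.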

3.Checking on Windows

[quote]You can check packages on Windows using build_win(). This builds your package and submits it to the CRAN win-builder. 30-60 minutes after submission, you’ll receive an email which will point you to the check results.

This checks both 32-bit and 64-bit builds. So if you’re using C code, you’ll need to make sure it works for both.

By default build_win() checks with both the release and development version of R. You can override this with the version parameter.[/quote]

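This seems to say that build_win() submits the package to the CRAN win-builder to be checked, and that half an hour to an hour later we receive an email with the results.

4.Checking dependencies

I get the feeling the author hasn't finished this, because the original is nothing but this heading.

5.Continuous integration with Travis

You can skip this part for now: if we are not yet comfortable with GitHub, there is little point in learning how to automate things such as running check on every commit or on every pull request. Get reasonably fluent with GitHub first, then come back to Travis. It looks like a very powerful tool.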

superdesolator

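100. PKD: Releasing

1.Releasing a package

Two places:

CRAN, GitHub

Important files:

NEWS describes the changes in each version

README.md gives an overall introduction to the package

2.README.md

The author suggests using 3-4 paragraphs to answer these questions:

•Why should someone use your package?

•How does it compare to other existing solutions?

•What are the main functions?

He also recommends keeping it up to date when releasing new versions of the package; it is displayed on the package's GitHub home page.

3.NEWS

Here is the NEWS file of stringr as an example: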

stringr 0.5
===========

* new `str_wrap` function which gives `strwrap` output in a more
  convenient format

* new `word` function extract words from a string given user defined
  separator (thanks to suggestion by David Cooper)

* `str_locate` now returns consistent type when matching empty string
  (thanks to Stavros Macrakis)

* new `str_count` counts number of matches in a string.

* `str_pad` and `str_trim` receive performance tweaks - for large vectors
  this should give at least a two order of magnitude speed up

* str_length returns NA for invalid multibyte strings

* fix small bug in internal `recyclable` function

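We can see that:

1) Package name + version number, followed by a line of =

2) Every change is listed as a bullet; if a bullet runs over several lines, the continuation lines must be indented by at least 2 spaces

3) If there are many changes, they can be split into subsections; subsection names must be in upper case and left-aligned

4) Using markdown makes it easier to publish the NEWS elsewhere

We can use devtools::show_news(). By default it shows only the latest version; latest = FALSE changes that.

Another option is NEWS.md: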

•Can’t include in package (needs to be in .Rbuildignore), and so not on CRAN.

•Easier to read on github

•Easier to copy-and-paste into github releases

4.Version numbers

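R uses version numbers to decide whether package dependencies are satisfied, so they matter:

A version number is usually structured like this:

1.9 has the same effect as 1.9.0, but is lower than 1.9.2

1 is the major version number; change it when making incompatible API changes

9 is the minor version number; change it for backward-compatible API changes

0 is the patch version number; change it for backward-compatible fixes

In addition there is a sub-patch component, which signals that the released version is under development: 1.9.0.9000, 1.9.0.9001

But the rules above are not absolute. For example, if we change a rarely used part of the API in a backward-incompatible way, it may not be worth bumping the 1; and if we fix a really important bug, bumping the 1 may be justified.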
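Version numbers are compared component by component, not as text, which can be checked with package_version() from base R. The version strings below are just the examples from this post plus a made-up 1.10.0.

```r
released   <- package_version("1.9.0")
patched    <- package_version("1.9.2")
dev        <- package_version("1.9.0.9000")
next_minor <- package_version("1.10.0")

# A patch release is newer than the release it fixes.
patch_is_newer <- patched > released

# A development version sorts after its release but before the next patch.
dev_is_newer     <- dev > released
dev_before_patch <- dev < patched

# Components are compared as numbers: 10 > 9, even though "1.10" < "1.9" as text.
numeric_not_text <- next_minor > patched

# The individual components are available as integers.
components <- unlist(dev)
```

This is also the comparison R applies when a DESCRIPTION file says something like Depends: pkg (>= 1.9.0).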

5.Publishing on Github

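The author hasn't written this yet.

6.Publishing on CRAN

Once we have passed check we can upload to CRAN:

1.Build a package bundle with build().

2.Upload to http://cran.r-project.org/submit.html

3.Confirm submission email

4.Wait for response.

The author recommends devtools::release(). Let's leave that until we actually need it; without real experience I can't judge how well it works.

It is worth saying that if the submission fails, you make the suggested changes, update cran-comments.md describing what you did and why, and re-run release().

7.CRAN policies

The most important ones: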

•Must pass R CMD check without errors, warnings and ideally notes. If there are notes that you do not believe are important, it is almost always easier to hack around them than to persuade CRAN that they’re ok.

•Must have clearly identified IP holders, have open source license (so CRAN can distribute it) and not include any binary executables.

•Maintainer email address must be correct. CRAN will make no effort to contact apart from through this address. Make sure it’s something that’s likely to be around for a while

•Must not be more than 5 megs.

•Packages shouldn’t make external changes without explicit user permission. Don’t write to the file system, change options, install packages, quit R, send information over the internet, open external software, etc.

•You must not use :::

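If you have followed along carefully you should be familiar with all of the above; the author has mentioned each of these elsewhere.

8.Post-release and Publicising

This part is basically about git tag and about where to publicise our package: mainly Twitter, blogs, mailing lists and the package's own home page (GitHub Pages).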

superdesolator

101. PKD: Style guide

This post is about style. The author also covers it in his adr book; the two books are separate, so he may have to write it twice. Here is my brief summary of what to watch for in building a good R style:

1) File names:

They should be meaningful and end in .R

# Good
fit-models.R
utility-functions.R
# Bad
foo.r
stuff.r

If files need to be run in order, name them like this:

0-download.R
1-parse.R
2-explore.R

2) Object names:

Underscores are best. Avoid . in names, so that S3 method dispatch is not confused.

# Good
day_one
day_1

# Bad
first_day_of_the_month
DayOne
dayone
djm1

Also, don't reuse the names of existing built-in objects.

3) Syntax:

Put spaces around built-in operators (=, +, -, <- etc.), including the = in function arguments. Put a space after a comma, never before.

# Good
average <- mean(feet / 12 + inches, na.rm = TRUE)

# Bad
average<-mean(feet/12+inches,na.rm=TRUE)

The exceptions are :, :: and :::.

Put a space before an opening parenthesis, except in a function call, and no spaces inside the parentheses:

# Good
if (debug) do(x)
diamonds[5, ]

# Bad
if ( debug ) do(x)  # No spaces around debug
x[1,]   # Needs a space after the comma
x[1 ,]  # Space goes after comma not before

4) Curly braces

# Good

if (y < 0 && debug) {
  message("Y is negative")
}

if (y == 0) {
  log(x)
} else {
  y ^ x
}

# Bad

if (y < 0 && debug)
message("Y is negative")

if (y == 0) {
  log(x)
}
else {
  y ^ x
}

# Good
if (y < 0 && debug) message("Y is negative")

A single short statement may follow if on the same line; otherwise put it on the next line, indented, inside braces. Also note where else goes.

5) Indentation

long_function_name <- function(a = "a long argument",
                               b = "another argument",
                               c = "another long argument") {
  # As usual code is indented by two spaces.
}

6) Commenting guidelines

# Load data ---------------------------

# Plot data ---------------------------

PS: OK, that's the end of pkd for now.

superdesolator

102. lubridate, a package for working with dates and times

A quick introduction: this package makes date-related work easier. Here is an example taken from elsewhere. Suppose we want to count how many Fridays fall in the month of a given date; the code below does that:

The key is what functions such as months, days, wday and day do.

I find it practical and intuitive, and recommend it to anyone who doesn't know it yet.

fridays <- function(the_day){
    require(lubridate)
    day(the_day) = 1
    the_month = seq(the_day, the_day+months(1)-days(1),1)
    sum(wday(the_month) == 6)
}

That returns the number of Fridays in the month of a Date object given as arg:

> fridays(as.Date("2012-01-01"))
[1] 4
> fridays(as.Date("2012-02-01"))
[1] 4
> fridays(as.Date("2012-03-01"))
[1] 5
> fridays(as.Date("2012-04-01"))
[1] 4
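For comparison, the same count can be done in base R without lubridate. This is my own alternative sketch, not the original poster's code: fridays_base is a made-up name, and it relies on the %u format code, which gives the weekday as a number with Monday as 1, so Friday is 5.

```r
# Count the Fridays in the month that contains `the_day`, using base R only.
fridays_base <- function(the_day) {
  # First day of the month containing the_day
  first <- as.Date(format(the_day, "%Y-%m-01"))
  # First day of the following month
  next_month <- seq(first, by = "month", length.out = 2)[2]
  # Every day of the month
  days <- seq(first, next_month - 1, by = "day")
  # %u: weekday as a number, Monday = 1, so Friday = 5
  sum(format(days, "%u") == "5")
}

jan <- fridays_base(as.Date("2012-01-01"))
feb <- fridays_base(as.Date("2012-02-01"))
mar <- fridays_base(as.Date("2012-03-01"))
apr <- fridays_base(as.Date("2012-04-01"))
```

The lubridate version reads more naturally, which is the point of the recommendation; the base version mainly shows what day(), months() and wday() are saving us from writing.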

superdesolator

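103. More on the search path

I needed this knowledge again recently. I'm not sure whether the earlier posts already included it, so I'm linking it again (the link is in the original thread).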
