jianzhiying
# Load the data
# Example
gen <- c('AF-CAM01-A5-2002','AF-CAM01-A5-2002','AF-CAM01-A5-2002')
# To read from a txt file instead
# gen <- read.table('...txt',...)
# Convert to a data.frame
gen <- as.data.frame(gen)
# Name the column
names(gen) <- c('gen_info')
# Convert the column to character. Before R 4.0 strings were read in as factors by default; you can also pass stringsAsFactors=FALSE to read.table
gen$gen_info <- as.character(gen$gen_info)
# Install the sqldf package (only needed once)
install.packages('sqldf')
library(sqldf)
# Extract the country and gene info into new columns
gen$country <- substr(gen$gen_info,4,6)
gen$gen_type <- substr(gen$gen_info,10,11)
# Count rows per group
gen_stat <- sqldf('select country, gen_type, count(1) as num
from gen group by country, gen_type ')
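If installing sqldf is not an option, the same grouped count can probably be done in base R. This is only a sketch using the sample strings and substr() positions from the post; the `num` helper column of ones is my own addition, not part of the original answer.

```r
# Same sample records and substr() positions as in the post above
gen <- data.frame(
  gen_info = c('AF-CAM01-A5-2002', 'AF-CAM01-A5-2002', 'AF-CAM01-A5-2002'),
  stringsAsFactors = FALSE
)
gen$country  <- substr(gen$gen_info, 4, 6)
gen$gen_type <- substr(gen$gen_info, 10, 11)

# Helper column of ones (an assumption for this sketch); summing it counts rows
gen$num <- 1L

# One row per (country, gen_type) combination, with the row count in num
gen_stat <- aggregate(num ~ country + gen_type, data = gen, FUN = sum)
print(gen_stat)
```

aggregate() ships with base R, so nothing needs to be installed; the sqldf version is arguably easier to read if you already think in SQL.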
enthumelon
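[unknown user]
If the fields have fixed widths, just use read.fwf.
If the field widths vary: your data is delimited by '-' anyway, so why not use read.table(...,sep='-')? Remember to set stringsAsFactors=FALSE in the options.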
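Both suggestions might look roughly like this on the sample strings from the first post. Treat it as a sketch: the widths are derived from that one sample format, and the column names `prefix`, `site` and `year` are guesses of mine, not something the data documents.

```r
# Same sample records as in the first post
lines <- c('AF-CAM01-A5-2002', 'AF-CAM01-A5-2002', 'AF-CAM01-A5-2002')

# Fixed-width case: negative widths skip characters.
# Skip 'AF-', keep 3 chars (country), skip '01-', keep 2 chars (gene type).
con <- textConnection(lines)
fixed <- read.fwf(con,
                  widths = c(-3, 3, -3, 2),
                  col.names = c('country', 'gen_type'),
                  stringsAsFactors = FALSE)
close(con)

# Delimited case: split each record on '-'.
# Column names are hypothetical labels chosen for this sketch.
delim <- read.table(text = lines, sep = '-',
                    col.names = c('prefix', 'site', 'gen_type', 'year'),
                    stringsAsFactors = FALSE)

# The country code is the first three characters of the second field
delim$country <- substr(delim$site, 1, 3)

print(fixed)
print(delim)
```

The read.table route keeps working if a field gets longer or shorter, as long as '-' never appears inside a field; read.fwf only fits when every record lines up character for character.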