With large language models developing at full speed, I have recently started learning a bit about text processing myself.
library(StanfordCoreNLP)
library(StanfordCoreNLPjars)
library(NLP)
s <- as.String(paste("Stanford University is located in California.",
"It is a great university."))
s
## This runs fine
p <- StanfordCoreNLP_Pipeline(annotators = c("pos", "lemma"))
annotate(s, p)
id type start end features
1 sentence 1 45 constituents=<<integer,7>>
2 word 1 8 word=Stanford, POS=NNP, lemma=Stanford
3 word 10 19 word=University, POS=NNP, lemma=University
4 word 21 22 word=is, POS=VBZ, lemma=be
5 word 24 30 word=located, POS=VBN, lemma=locate
6 word 32 33 word=in, POS=IN, lemma=in
7 word 35 44 word=California, POS=NNP, lemma=California
8 word 45 45 word=., POS=., lemma=.
9 sentence 47 71 constituents=<<integer,6>>
10 word 47 48 word=It, POS=PRP, lemma=it
11 word 50 51 word=is, POS=VBZ, lemma=be
12 word 53 53 word=a, POS=DT, lemma=a
13 word 55 59 word=great, POS=JJ, lemma=great
14 word 61 70 word=university, POS=NN, lemma=university
15 word 71 71 word=., POS=., lemma=.
## This fails with an error
p <- StanfordCoreNLP_Pipeline(annotators = c("pos", "lemma", "ner"))
annotate(s, p)
With the "ner" annotator added, the pipeline does not run and reports an out-of-memory error:
Error in e(s, a) : java.lang.OutOfMemoryError: Java heap space
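A possible workaround, which I have not verified on this machine: the error comes from the Java heap, and rJava reads the `java.parameters` option only once, when the JVM starts. So the heap limit has to be raised in a fresh R session before any package that depends on rJava is loaded. The 4 GB value below is an assumption; the NER models may need more or less.

```r
# Run this in a FRESH R session, before rJava / StanfordCoreNLP are loaded.
# Once the JVM is running, changing java.parameters has no effect.
options(java.parameters = "-Xmx4g")  # assumed value: 4 GB maximum heap

library(NLP)
library(StanfordCoreNLP)
library(StanfordCoreNLPjars)

# Optional check: ask the JVM for its maximum heap size, in GB.
rJava::.jinit()  # harmless if the JVM is already running
rt <- rJava::.jcall("java/lang/Runtime", "Ljava/lang/Runtime;", "getRuntime")
rJava::.jcall(rt, "J", "maxMemory") / 1024^3

# Same text and pipeline as above, now including the "ner" annotator.
s <- as.String(paste("Stanford University is located in California.",
                     "It is a great university."))
p <- StanfordCoreNLP_Pipeline(annotators = c("pos", "lemma", "ner"))
annotate(s, p)
```

If the error persists, the option was probably set too late (for example, rJava was already loaded by a startup script), or the heap value is still too small.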
Appendix
Download links for the two R packages used
https://datacube.wu.ac.at/src/contrib/StanfordCoreNLP_0.1-11.tar.gz
English models (the CoreNLP jar files)
https://datacube.wu.ac.at/src/contrib/StanfordCoreNLPjars_4.5.5-1.tar.gz
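Instead of downloading the tarballs by hand, both packages can probably be installed straight from R. This is a sketch that assumes https://datacube.wu.ac.at/ is laid out as a CRAN-style repository (the `src/contrib/` paths above suggest so); the CRAN mirror is added only so that dependencies such as NLP and rJava can be resolved.

```r
# Assumption: datacube.wu.ac.at serves a CRAN-style repository.
# The second entry is a CRAN mirror, used to resolve dependencies.
install.packages(
  c("StanfordCoreNLP", "StanfordCoreNLPjars"),
  repos = c("https://datacube.wu.ac.at/", "https://cloud.r-project.org/"),
  type  = "source"
)
```

The jars package is a large download, and rJava needs a working Java installation on the system.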
sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.3
Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale:
[1] zh_CN.UTF-8/zh_CN.UTF-8/zh_CN.UTF-8/C/zh_CN.UTF-8/zh_CN.UTF-8
time zone: Asia/Shanghai
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] NLP_0.2-1 StanfordCoreNLPjars_4.5.5-1
[3] StanfordCoreNLP_0.1-11
loaded via a namespace (and not attached):
[1] compiler_4.3.2 cli_3.6.2 xml2_1.3.6 rlang_1.1.3 rJava_1.0-11
Finally
Does anyone have NLP / LLM courses to recommend? Preferably something accessible and hands-on.