Using ChatGPT to Write R Code and Data Analysis Reports

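dapengde · February 13, 2023

ChatGPT has been all the rage lately, and I suspect everyone has already had some fun teasing it. Why not share your impressions here? We could compile them into a submission for the main site.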

My first instinct was to have it write some R code:

Me: Write a piece of R code that outputs all the prime numbers below 1000.

ChatGPT:
x = 2:1000
for (i in x) {
  isprime = TRUE 
  for (j in 2:(i - 1)) {
    if (i %% j == 0) {
      isprime = FALSE
      break
    }
  }
  if (isprime) print (i)
}

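How should I put it? It does run, but the code just doesn't look like R. First, it uses a double loop. Second, it doesn't store the primes in an object before printing them (my fault for not being clear, I suppose). And it certainly doesn't call a package or wrap the logic in a custom function. It feels as if it had been translated from some other programming language.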
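One detail in the quoted loop is worth knowing: when `i` is 2, `2:(i - 1)` is `2:1`, which R evaluates as the descending sequence `c(2, 1)`, so the test `2 %% 2 == 0` marks 2 as composite and 2 is never printed. Below is a minimal sketch of a more R-like version, assuming the Sieve of Eratosthenes is an acceptable method: the result is stored in an object and the logic is wrapped in a function. `primes_below()` is a hypothetical name chosen for this example, not a function from any package.

```r
# Hypothetical helper (not from any package): return all primes
# strictly less than n, using the Sieve of Eratosthenes.
primes_below <- function(n) {
  if (n <= 2) return(integer(0))
  is_prime <- rep(TRUE, n - 1)  # is_prime[m] says whether m is prime, m = 1..n-1
  is_prime[1] <- FALSE          # 1 is not prime
  p <- 2L
  while (p * p <= n - 1) {
    if (is_prime[p]) {
      # Cross out all multiples of p from p^2 upward in one vectorized step
      is_prime[seq(p * p, n - 1, by = p)] <- FALSE
    }
    p <- p + 1L
  }
  which(is_prime)  # indices still TRUE are exactly the primes
}

p1000 <- primes_below(1000)  # keep the result in an object
print(p1000)
```

Only the outer `while` loop runs, and only up to the square root of `n`; crossing out multiples is a single vectorized assignment, which is a common way to avoid the inner loop in R.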

So I rephrased the request, switched to English, and asked it not to use loops.

我：Find all the prime numbers less than 1000 with the R language. Do not use loops.

ChatGPT:
In R, you can use the primes function from the primes package to find all the prime numbers less than 1000 without using loops. Here's how you can do it:
library(primes)
primes(1000)
This code will return an array of all prime numbers less than 1000. The primes function generates prime numbers efficiently, so you don't need to worry about performance issues.

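Satisfied! Credit where it's due: the code part was rendered as code (black background, monospaced font), so ChatGPT apparently uses Markdown. What surprised me more was the extra remark that "you don't need to worry about performance issues": it had actually guessed why I asked it to avoid loops.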
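If installing a package is not an option, the same task can also be done in base R with no explicit loop at all. The sketch below is an alternative written for this post, not ChatGPT's answer: `outer()` builds a table of remainders, and a number is kept when its only divisor in `2:999` is itself. It allocates a 998 × 998 matrix, so it only makes sense for small bounds such as 1000.

```r
# Candidate numbers below 1000
n <- 2:999

# remainders[i, j] is n[i] %% n[j]
remainders <- outer(n, n, "%%")

# A number is prime if the only divisor it has among 2:999 is itself,
# i.e. exactly one zero remainder in its row
prime_no_loop <- n[rowSums(remainders == 0) == 1]
print(prime_no_loop)
```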

What about a data analysis report? I tried it with an assignment I gave my students last year:

我：Write an article for analyzing the R built-in dataset "state.x77" with multiple linear regression about a state's murder. Use citations in the article properly.

ChatGPT:

The state.x77 dataset, one of the many built-in datasets included in the R programming environment, is an incredibly helpful resource when analyzing the correlation between different population variables and the prevalence of murders in US states. The dataset contains measurement of several continuous and discrete variables in the 50 US states concerning the year 1977. Using a multiple linear regression model, the effects of various state population attributes on the number of murders can be investigated to attempt to further understand the factors that may contribute to or be relevant to higher murder rates in different states.

The state.x77 dataset contains 16 different variables. Of these, the five considered for this analysis were population (‘Pop’), high school graduates (‘HS.Grad’), urban population (‘UrbanPop’), per capita income (‘Income’), and percent of the population below poverty level (‘Poverty’). Each of these variables is a possible contributing factor to a state’s murder rate.

After the five relevant variables were determined, an ordinary least squares multiple linear regression model was used to assess the correlation between the independent variables and the number of murders, the dependent variable. The linear regression yielded a correlation coefficient, or R² value, of 0.91; this indicates a strong correlation between the independent variables and the number of murders.

The linear regression also revealed that all of the independent variables were significant predictors of the number of murders in a state. The highest coefficient was seen for population, at 0.45. This coefficient suggests that, contrary to many people’s expectations, population size may have a greater impact on the number of murders than some of the other variables. In other words, a larger population in a state may be associated with a correspondingly larger number of murders taking place.

High school graduation rate is another contributing factor to the number of murders in a state; the coefficient was determined to be 0.25, meaning that states with higher high school graduation rates also have lower murder rates. This can likely be attributed to the fact that, with higher educational attainment, individuals may have better opportunities for employment, making crime less attractive and attractive.

The other variables in the state.x77 dataset were still seen to have significant effects on the number of murders in a state. The coefficient for urban population was 0.17, indicating that higher urban populations is associated with more murders. Similarly, income and poverty had coefficients of 0.13 and 0.13, respectively, showing that states with lower per capita incomes and higher poverty rates tend to have more murders.

In summary, the multiple linear regression performed using the state.x77 dataset highlights the correlation between different population attributes and the number of murders in US states. Of all the variables considered, population was found to be the most significant factor, meaning that larger population sizes may be associated with more murders in a given state. The other four independent variables, regarding high school graduation rates, urban populations, income, and poverty, were also seen to be significant predictors of the number of murders in a state.

References

Blumentritt, T. L. (1995). US State Datasets [Online]. Available at: http://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html

U.S. Census Bureau. (2016). State and County Quickfacts. Available at: http://www.census.gov/quickfacts/

Uh-oh. Looks like I can't assign homework like this anymore.
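For comparison, here is a minimal sketch of what the assignment actually involves in R. Note that `state.x77` is a matrix with eight columns (`Population`, `Income`, `Illiteracy`, `Life Exp`, `Murder`, `HS Grad`, `Frost`, `Area`); it has no `UrbanPop` or `Poverty` column, and `Murder` is a rate per 100,000 population rather than a count of murders. Regressing `Murder` on all the other columns is just an assumed starting point for illustration, and no results are quoted here.

```r
# data.frame() makes the names syntactic: "Life Exp" -> Life.Exp, "HS Grad" -> HS.Grad
states <- data.frame(state.x77)

# Multiple linear regression of the murder rate on all other variables
fit <- lm(Murder ~ ., data = states)
print(summary(fit))

# Coefficient table and R-squared, to check against any written report
print(coef(summary(fit)))
print(summary(fit)$r.squared)
```

Comparing these numbers with the figures in the generated article is a quick way to check whether they come from the data at all.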

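Cloud2016 · February 13, 2023

dapengde Could you ask ChatGPT for me: "Are earthquakes becoming more frequent?"
Recently a magnitude 7.8 earthquake struck Turkey (equivalent to the power of more than 400 Hiroshima atomic bombs). Hoping to ride the news cycle, I'm revising an article for submission, "Are Earthquakes Becoming More Frequent?" (Huang Xiangyun). I haven't reached any firm conclusions yet; the article mostly walks through exploratory analysis. I'd like to hear whether ChatGPT has any conclusion on this question, and what its conclusion is based on.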

dapengde · February 13, 2023

Cloud2016

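Me: Are earthquakes becoming more frequent? Recently a magnitude 7.8 earthquake struck Turkey (equivalent to the power of more than 400 Hiroshima atomic bombs). Hoping to ride the news cycle, I'm revising an article. I haven't reached any firm conclusions yet; the article mostly walks through exploratory analysis. I'd like to hear whether ChatGPT has any conclusion on this question, and what that conclusion is based on.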

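ChatGPT:
I'm sorry, but there is not yet any scientific evidence that earthquakes are becoming more frequent. In fact, earthquake frequency is uneven across the globe, and there may be more earthquakes during certain periods, but that does not mean the frequency of earthquakes is increasing. Earthquakes are caused by tectonic activity inside the Earth, and this activity is governed by complex geodynamic factors. Therefore, the frequency and intensity of earthquakes are difficult to predict.

In addition, as seismic monitoring technology improves, our ability to detect earthquakes keeps improving. This makes it easier for us to detect and record earthquakes; it does not mean that earthquakes are occurring more often.

Therefore, further research and evidence are needed before any conclusion can be drawn about an increase in earthquake frequency. If you are writing an article about earthquakes, I suggest that you clearly explain the complexity and unpredictability of earthquakes and avoid giving readers incorrect information.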

Me: Could you give some references for your answer above? Thanks.

ChatGPT:

Of course, I'd be happy to provide some relevant references:

"Seismicity rate changes following large earthquakes: observations and implications for earthquake triggering" by D. D. O'Connell and L. M. Jones, published in the journal Nature in 1994.

"The Global Seismic Hazard Assessment Program (GSHAP) in the Framework of the Global Seismic Safety Initiative (GSSI)" by A. I. Wenger et al., published in the journal Annals of Geophysics in 2000.

"Earthquake Predictability, Statistics, and Dynamics" by E. R. Kanamori and B. W. Lay, published in the journal Reviews of Geophysics in 2000.

"Global Seismicity: 1900–2010" by W. H. K. Lee and G. W. Lilienfeld, published in the journal Pure and Applied Geophysics in 2012.

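These are all authoritative articles in seismology that provide detailed information on the frequency and intensity of earthquakes. I hope this information is helpful.

Me: These references are rather old, all from the last century. Could you give some more recent ones? Thanks.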

ChatGPT:

Something went wrong, please try reloading the conversation.

And then it stopped responding to me. I don't have an account myself; I asked using someone else's.
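ChatGPT's point about detection (more earthquakes recorded because monitoring improves, not because more occur) can be illustrated with a toy simulation. Everything here is hypothetical and invented for illustration, not real seismic data: a constant true rate of 100 events per year, and a detection probability that rises linearly from 0.3 to 0.95 between 1960 and 2020. Under these assumptions the recorded counts climb even though the underlying rate never changes.

```r
set.seed(2023)  # for reproducibility

years <- 1960:2020
# Hypothetical constant true rate: Poisson(100) events in every year
true_n <- rpois(length(years), lambda = 100)
# Hypothetical detection probability improving linearly over time
p_detect <- seq(0.3, 0.95, length.out = length(years))
# Each true event is recorded independently with probability p_detect
recorded_n <- rbinom(length(years), size = true_n, prob = p_detect)

# Compare linear trends: the true counts stay flat, the recorded counts rise
print(coef(lm(true_n ~ years))["years"])
print(coef(lm(recorded_n ~ years))["years"])
```

A trend in a catalogue therefore only says something about earthquake frequency once changes in detection capability have been accounted for, for example by restricting to magnitudes that were reliably recorded over the whole period.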

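CyrusYip · February 13, 2023

At first I thought it was just a chatbot and didn't pay much attention. Later, when I saw people asking it programming questions, I realized it is actually quite useful.

I registered an account a few days ago and asked it some questions about using Linux. Its replies were half right and half wrong, not as impressive as people online make it sound. I think ChatGPT's strength is that it gives you an answer directly, whereas with a search engine you still have to read through the results yourself.

If it had come out earlier, I could have skipped a lot of pointless assignments in college.

Every reply from ChatGPT looks right, and even the wrong ones look right. It's a first-class bluffer.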

CyrusYip · February 13, 2023

dapengde I've shared a registration method that works.

https://d.cosx.org/d/423980-guo-nei-yong-hu-zhu-ce-chatgptde-fang-fa-qin-ce-ke-xing

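Cloud2016 · February 13, 2023

dapengde Thanks. I can't argue with ChatGPT's answer, but it doesn't seem to offer anything new that I could add to my article. Since I know nothing about geophysics and related fields, I made some guesses in the article, and its answer confirmed them. That seems to be one use too, although it means I have implicitly started treating it as an authority.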

fenguoerbian · February 13, 2023

CyrusYip Yes, even when its answer is wrong, its tone is always dead serious.

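yufree · February 13, 2023

Submissions about applications are welcome. You can also try the Bing version: it integrates and summarizes live search results and gives citation links, unlike the original version, which makes up references. I joined their waiting list and received a download link for the new version yesterday. After trying it, I feel that if Google moves any slower, it may really be finished. It reads the top-ranked search results for you and summarizes them into an answer, and you can keep asking follow-up questions for details. For example, when I asked it to briefly summarize the past year's research progress in a certain field, the answer largely matched my own impression (it focused on news coverage and missed some less-publicized but genuinely novel work, but it should be enough as an introduction for researchers in neighboring fields). It can fill the information gap between Wikipedia and the research frontier.

For me it is mostly a tool for summarizing, polishing text, and programming, but it already works well enough. As for repetitive filler text, both the people who write such articles and those who read them know it doesn't matter; it may not count as productive work for society as a whole, but in some industries it does.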

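yufree · February 15, 2023

As it happens, there is no Cloud Lecture (云讲堂) session this week. Would anyone above be interested in a Tencent Meeting this Saturday at 7 p.m. to talk about using ChatGPT in real work? We won't talk about the technology, the model, or its limitations; we'll only share our own experience and what we've found useful. The tentative plan is to have a few guests each spend 5 to 10 minutes on their experience, and after half an hour it becomes a fully open roundtable where anyone can unmute or share their screen. Time may be tight, but you don't need to prepare much, and anything we can't finish can continue in this thread. Is anyone above interested? @dapengde, are you free? Let's settle a list of 3 to 5 people by tomorrow night. The format is completely open, and friends from industry are welcome to share their experience too.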

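dapengde · February 15, 2023

yufree Thanks for the invitation! I'm very interested, but I'll have to get back to you later: I need to check the kids' weekend plans when I get home from work today.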

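yuanfan · February 15, 2023

CyrusYip Ye Xun, you've used ChatGPT, so you could give a talk at the event Sanshui is proposing.

only share our own experience and what we've found useful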

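CyrusYip · February 15, 2023

yuanfan Thanks for the invitation. I actually haven't used it much and am still learning, so I'll pass. I'd rather see how everyone else uses it.

yufree By the way, will there be a recording?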

dapengde · February 16, 2023

yufree I can join!

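yufree · February 16, 2023

CyrusYip This thing has only been out for two months; apart from the developers, I suspect everyone is still figuring out how to use it. So don't worry about that; if you're interested, come and chat.
There will be a recording this time, and it will be posted on Bilibili.

dapengde Thanks for your support! I'll fill in as host. For details, please refer to the announcement on our WeChat official account tonight or tomorrow.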

22640077 · February 16, 2023

https://hbsp.harvard.edu/inspiring-minds/why-all-our-classes-suddenly-became-ai-classes

Here is an article in which the author shares some views on how to use ChatGPT in teaching.

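Liechi · February 16, 2023

Using ChatGPT reminds me of what some people say about Ma Qianzu: he always sounds very reasonable, until he talks about your own field.

That's a joke. My impression of ChatGPT is even better than that. The quality of its replies depends to some extent on how the question is asked: broad, sweeping questions tend to get formulaic answers, while specific, well-defined questions usually get fairly reliable ones.

However, using it when you cannot judge whether its answers are right is dangerous.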

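CyrusYip · February 16, 2023

Liechi The quality of its replies depends to some extent on how the question is asked: broad, sweeping questions tend to get formulaic answers, while specific, well-defined questions usually get fairly reliable ones.

A blog post I read recently confirms what you said.

https://manateelazycat.github.io/think/2023/02/14/chatgpt.html

To be fair, when it comes to using ChatGPT itself, if your knowledge base is deep enough and your questions are precise enough, the chance of actually getting help is basically above 80%.

Of the 20 programming questions I asked over the past two days, roughly 18 turned out to have correct answers when I verified them.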