R: dplyr - 错误:无法修改分组变量

一则或许对你有用的小广告

欢迎加入小哈的星球 ,你将获得:专属的项目实战 / 1v1 提问 / Java 学习路线 / 学习打卡 / 每月赠书 / 社群讨论

  • 新项目:《从零手撸:仿小红书(微服务架构)》 正在持续爆肝中,基于 Spring Cloud Alibaba + Spring Boot 3.x + JDK 17...点击查看项目介绍 ;
  • 《从零手撸:前后端分离博客项目(全栈开发)》 2 期已完结,演示链接: http://116.62.199.48/ ;

截止目前, 星球 内专栏累计输出 63w+ 字,讲解图 2808+ 张,还在持续爆肝中.. 后续还会上新更多项目,目标是将 Java 领域典型的项目都整一波,如秒杀系统, 在线商城, IM 即时通讯,权限管理,Spring Cloud Alibaba 微服务等等,已有 2200+ 小伙伴加入学习 ,欢迎点击围观

我一直在探索这个博客上发表的帖子,我想我应该从回答一个简单的问题开始——我在哪些日期写的帖子最多?

我从一个包含每篇文章及其发布日期的数据框开始:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

要找到博客文章最流行的日子,我们可以编写以下聚合函数:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

所以我们可以看到几天有 6 个帖子,几天有 5 个帖子,几天有 4 个帖子,然后大概有几天有 1 个帖子。

我认为如果我们可以在博客上绘制一个直方图,在 x 轴上显示帖子数量,在 y 轴上显示该帖子数发生了多少天,例如对于 x 值为 6(帖子)我们会有ay 值为 2(出现次数)。

我最初的尝试是这样的:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

不幸的是,这是不允许的。我尝试取消分组然后再次计数:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

仍然没有运气。我在谷歌上搜索了一下,发现了一篇建议 结合使用 group_by + mutate 或 group_by + summarize 的 帖子。

我首先尝试了 mutate 方法:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

这保留了“标题”,这有点烦人。如果需要,我们可以在“day”上使用不同的方式来摆脱它,如果我们还实现了函数的第二部分,我们将得到以下结果:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

恼人的是,我们仍然有“标题”、“日期”和“日期”列,我们需要通过调用“选择”来摆脱这些列。代码也让人感觉很恶心,尤其是在几个地方使用了 distinct。

事实上,如果我们使用 summarize 而不是 mutate,我们可以简化代码:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

而且我们还摆脱了讨价还价中的额外列,这太棒了!现在我们可以绘制直方图了:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33


在这种情况下,我们实际上不需要进行第二次分组来创建条形图,因为如果我们向它提供以下数据,ggplot 会为我们做这件事:


 > library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33

不过,很高兴知道怎么做!

相关文章