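I've been exploring the posts published on this blog, and I thought I'd start by answering a simple question: on which dates did I write the most posts?
I started with a data frame containing each post and its publication date: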
> library(dplyr)
> df %>% sample_n(5)
                                                title                date
1148 Taiichi Ohno's Workplace Management: Book Review 2008-12-08 14:14:48
158     Rails: Faking a delete method with 'form_for' 2010-09-20 18:52:15
331           Retrospectives: The 4 L's Retrospective 2011-07-25 21:00:30
1035       msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03
1181                The danger of commenting out code 2009-01-17 06:02:33
To find the days with the most blog posts, we can write the following aggregation:
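A minimal sketch of what such an aggregation might look like, not necessarily the exact call used in the post. It assumes `date` is a date-time column, truncates it to a `day` column with `as.Date`, and counts rows per day. The `df` built here is a made-up stand-in with the same two columns, so the numbers are purely illustrative.

```r
library(dplyr)

# Toy stand-in for the post's `df`: made-up rows with the same two columns.
df <- data.frame(
  title = paste("post", 1:5),
  date = as.POSIXct(
    c("2008-08-14 09:00:00", "2008-08-14 18:54:03",
      "2008-12-08 14:14:48", "2008-12-08 20:00:00",
      "2009-01-17 06:02:33"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

posts_per_day <- df %>%
  mutate(day = as.Date(date, tz = "UTC")) %>%  # drop the time of day
  count(day) %>%                               # one row per day, count stored in `n`
  arrange(desc(n))                             # busiest days first

print(posts_per_day)
```

`count(day)` is shorthand for `group_by(day)` followed by a row count, which is why a single call is enough here.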
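So we can see a few days with 6 posts, a few with 5, a few with 4, and then presumably a lot of days with just 1 post.
I thought it would be nice to plot a histogram for the blog, with the number of posts on the x axis and the number of days on which that many posts were published on the y axis. For example, for an x value of 6 (posts) we'd have a y value of 2 (occurrences).
My initial attempt looked like this: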
Unfortunately that isn't allowed. I tried ungrouping and then counting again:
Still no luck. I did a bit of googling and came across a post suggesting a combination of group_by + mutate or group_by + summarize.
I tried the mutate approach first:
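A rough sketch of the group_by + mutate idea, under the same assumptions as before: `df` is a made-up stand-in, and `day` is derived from `date` with `as.Date`. It is an illustration of the technique rather than the post's exact code.

```r
library(dplyr)

# Toy stand-in for the post's `df`: made-up rows with the same two columns.
df <- data.frame(
  title = paste("post", 1:5),
  date = as.POSIXct(
    c("2008-08-14 09:00:00", "2008-08-14 18:54:03",
      "2008-12-08 14:14:48", "2008-12-08 20:00:00",
      "2009-01-17 06:02:33"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

with_counts <- df %>%
  mutate(day = as.Date(date, tz = "UTC")) %>%
  group_by(day) %>%
  mutate(n = n()) %>%   # adds the per-day count to every row, keeps all rows
  ungroup()

print(with_counts)
```

Because `mutate` never collapses rows, every post is still there, each one annotated with how many posts share its day.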
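That keeps the 'title' column, which is a bit annoying. We can get rid of it by calling distinct on 'day' if we want, and if we also implement the second part of the function we end up with the following: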
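One way the two-step mutate version could be written, as a sketch only. The `df` is a made-up stand-in, the `days` column name is my own choice, and `.keep_all = TRUE` is an assumption on my part so that `distinct` keeps the other columns, which matches the complaint about leftover columns below.

```r
library(dplyr)

# Toy stand-in for the post's `df`: made-up rows with the same two columns.
df <- data.frame(
  title = paste("post", 1:5),
  date = as.POSIXct(
    c("2008-08-14 09:00:00", "2008-08-14 18:54:03",
      "2008-12-08 14:14:48", "2008-12-08 20:00:00",
      "2009-01-17 06:02:33"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

days_per_count <- df %>%
  mutate(day = as.Date(date, tz = "UTC")) %>%
  group_by(day) %>%
  mutate(n = n()) %>%                    # posts per day, repeated on every row
  ungroup() %>%
  distinct(day, .keep_all = TRUE) %>%    # one row per day
  group_by(n) %>%
  mutate(days = n()) %>%                 # how many days share this post count
  ungroup() %>%
  distinct(n, .keep_all = TRUE)          # one row per post count

print(days_per_count)
```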
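Annoyingly, we've still got the 'title', 'date' and 'day' columns, which we'd need to get rid of with a call to select. The code also feels pretty icky, especially the use of distinct in a couple of places.
In fact, we can simplify the code if we use summarize instead of mutate: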
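A sketch of the summarize version, again on a made-up `df` and with `days` as a column name I picked for illustration. Each `summarise` collapses its groups to one row, so no `distinct` or `select` is needed.

```r
library(dplyr)

# Toy stand-in for the post's `df`: made-up rows with the same two columns.
df <- data.frame(
  title = paste("post", 1:5),
  date = as.POSIXct(
    c("2008-08-14 09:00:00", "2008-08-14 18:54:03",
      "2008-12-08 14:14:48", "2008-12-08 20:00:00",
      "2009-01-17 06:02:33"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

days_per_count <- df %>%
  mutate(day = as.Date(date, tz = "UTC")) %>%
  group_by(day) %>%
  summarise(n = n()) %>%      # one row per day: number of posts that day
  group_by(n) %>%
  summarise(days = n())       # one row per post count: number of such days

print(days_per_count)
```

Only the grouping column and the summary column survive each step, so the result has just `n` and `days`.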
And we've got rid of the extra columns into the bargain, which is great! Now we can plot our histogram:
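A sketch of how the pre-aggregated data might be plotted with ggplot2. The data is a made-up stand-in, the `days` column name and the axis labels are my own, and `stat = "identity"` tells `geom_bar` to use the y values as given instead of counting rows.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the post's `df`: made-up rows with the same two columns.
df <- data.frame(
  title = paste("post", 1:5),
  date = as.POSIXct(
    c("2008-08-14 09:00:00", "2008-08-14 18:54:03",
      "2008-12-08 14:14:48", "2008-12-08 20:00:00",
      "2009-01-17 06:02:33"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

days_per_count <- df %>%
  mutate(day = as.Date(date, tz = "UTC")) %>%
  group_by(day) %>%
  summarise(n = n()) %>%
  group_by(n) %>%
  summarise(days = n())

# x: posts published in a day, y: number of days with that many posts
p <- ggplot(days_per_count, aes(x = n, y = days)) +
  geom_bar(stat = "identity") +
  labs(x = "Posts in a day", y = "Number of days")

# print(p) draws the chart on the active graphics device
```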
In this case we don't actually need to do the second grouping to create the bar chart, since ggplot will do it for us if we feed it the following data:
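A sketch of that shortcut, assuming a reasonably recent ggplot2 where `geom_bar` counts rows per x value by default. We only compute posts per day and let the plot do the second count. As before, `df` is a made-up stand-in.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the post's `df`: made-up rows with the same two columns.
df <- data.frame(
  title = paste("post", 1:5),
  date = as.POSIXct(
    c("2008-08-14 09:00:00", "2008-08-14 18:54:03",
      "2008-12-08 14:14:48", "2008-12-08 20:00:00",
      "2009-01-17 06:02:33"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# First grouping only: one row per day with its post count in `n`
posts_per_day <- df %>%
  mutate(day = as.Date(date, tz = "UTC")) %>%
  count(day)

# geom_bar() with no y aesthetic counts how many rows fall on each value of n,
# which is exactly the "number of days with n posts" we wanted
p <- ggplot(posts_per_day, aes(x = n)) +
  geom_bar() +
  labs(x = "Posts in a day", y = "Number of days")

# print(p) draws the chart on the active graphics device
```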
Still, it's good to know how to do it!