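In my continued exploration of the Wimbledon data set, I wanted to work out whether a player had done as well as their seeding suggested they should.
To do that, I needed the difference between the round they actually reached and the round they were expected to reach. The round in the data set is an ordered factor variable.
These are all the possible values: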
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32", "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")
If we want to turn a couple of strings into values of this factor, we would do it like this:
round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)

> round
[1] Finals
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner

> expected
[1] Winner
9 Levels: Did not enter < Round of 128 < Round of 64 < Round of 32 < Round of 16 < Quarter-Finals < ... < Winner
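In this case the difference between the actual round and the expected round should be -1: the player was expected to win the tournament but lost in the final. We can calculate that difference by calling the unclass function on each variable, which strips the factor class and leaves the underlying integer level codes: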
> unclass(round) - unclass(expected)
[1] -1
attr(,"levels")
[1] "Did not enter"  "Round of 128"   "Round of 64"    "Round of 32"    "Round of 16"    "Quarter-Finals"
[7] "Semi-Finals"    "Finals"         "Winner"
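The result still carries a remnant of the factor variable (the levels attribute), so to get rid of that we can convert it to a plain numeric value: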
> as.numeric(unclass(round) - unclass(expected))
[1] -1
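As an aside, the same number can probably be reached a little more directly. The sketch below is one possible alternative, not the approach used above: it calls as.integer on each factor, which returns the level codes without the levels attribute, so no clean-up step is needed. The variable name round_diff is made up for illustration. Note that subtracting the ordered factors directly does not work; R warns that '-' is not meaningful for ordered factors and returns NA.

```r
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
           "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

round = factor("Finals", levels = rounds, ordered = TRUE)
expected = factor("Winner", levels = rounds, ordered = TRUE)

# as.integer() gives the position of each value in `rounds`
# ("Finals" is level 8, "Winner" is level 9) and drops the levels attribute
round_diff = as.integer(round) - as.integer(expected)
round_diff

# Because the factor is ordered, comparisons work on the factors themselves
round < expected
```

Here round_diff should come out as -1, and the comparison as TRUE, matching the result from unclass.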
And that's it! We can now apply this calculation to all the seeds and see how they got on.
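A minimal sketch of what applying it to several seeds might look like is shown here. Everything in it beyond the rounds vector is an assumption for illustration: the helper name round_difference, the column names, and the three example rows are made up and are not taken from the Wimbledon data set.

```r
rounds = c("Did not enter", "Round of 128", "Round of 64", "Round of 32",
           "Round of 16", "Quarter-Finals", "Semi-Finals", "Finals", "Winner")

# Hypothetical helper: signed number of rounds between actual and expected.
# Negative means the player went out earlier than their seeding suggested.
round_difference = function(actual, expected, levels = rounds) {
  actual = factor(actual, levels = levels, ordered = TRUE)
  expected = factor(expected, levels = levels, ordered = TRUE)
  as.integer(actual) - as.integer(expected)
}

# Made-up example rows, purely for illustration
seeds = data.frame(
  seed = c(1, 2, 3),
  round = c("Finals", "Winner", "Round of 32"),
  expected = c("Winner", "Finals", "Semi-Finals"),
  stringsAsFactors = FALSE
)

# The helper is vectorised, so it works on whole columns at once
seeds$difference = round_difference(seeds$round, seeds$expected)
seeds
```

With these invented rows the difference column would be -1, 1 and -3: the first seed fell one round short, the second went one round further than expected, and the third went out three rounds early.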