Asked by: 小点点

A better way to summarize percentages by group using dplyr?


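After asking similar questions on SO, such as this one, I finally got the output I wanted, but I can't help wondering whether there is a better way to get there. I would also like to know whether the last step, which removes the duplicate manager and title combinations, can be chained onto the rest with the pipe operator.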

Reproducible example:

library(dplyr)

# Sample data frame
employee = LETTERS[1:18]
manager  = c(rep("Tom", 3), rep("Sue", 4), rep("Mike", 4), rep("Jack", 7))  
title    = c(rep("Entry", 2), rep("Mid", 3), rep("Junior", 7), rep("Senior", 6))

mydata <- data.frame(employee, manager, title)

# Code gives me output I want, but wondering if there is a better way
org2 <-  mydata %>%
  group_by(manager, title) %>%
  mutate(title_count = n()) %>%  # Total number of people with given title by manager
  ungroup() %>%
  group_by(manager) %>%          # Total number of people in manager's group
  mutate(mgr_total = n()) %>%
  group_by(title, add = TRUE) %>%
  mutate(title_pctg = round(title_count/mgr_total*100, 1)) %>%  # Percent of people with given title by manager
  select(-employee)

# Remove duplicates of manager and title to summarize data wanted
org2 <- org2[!duplicated(org2[c("manager", "title")]), ]

arrange(org2, manager, title)

# A tibble: 7 x 5
# Groups:   manager, title [7]
#  manager  title title_count mgr_total title_pctg
#   <fctr> <fctr>       <int>     <int>      <dbl>
#1    Jack Junior           1         7       14.3
#2    Jack Senior           6         7       85.7
#3    Mike Junior           4         4      100.0
#4     Sue Junior           2         4       50.0
#5     Sue    Mid           2         4       50.0
#6     Tom  Entry           2         3       66.7
#7     Tom    Mid           1         3       33.3

Thanks in advance for your thoughts and help.


1 answer

Anonymous user

You can simplify this by switching the order of the group_by calls (that is, group by manager first, then by manager and title, rather than the other way around):

mydata %>% 
    group_by(manager) %>% 
    mutate(mgr_count = n()) %>% 
    group_by(title, mgr_count, add=TRUE) %>% 
    summarise(
        title_count = n(), 
        title_pctg = round(title_count / first(mgr_count) * 100, 1)
    )

# A tibble: 7 x 5
# Groups:   manager, title [?]
#  manager  title mgr_count title_count title_pctg
#   <fctr> <fctr>     <int>       <int>      <dbl>
#1    Jack Junior         7           1       14.3
#2    Jack Senior         7           6       85.7
#3    Mike Junior         4           4      100.0
#4     Sue Junior         4           2       50.0
#5     Sue    Mid         4           2       50.0
#6     Tom  Entry         3           2       66.7
#7     Tom    Mid         3           1       33.3
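
For readers on a newer dplyr, here is a minimal sketch of two variations on the same idea. It is not part of the answer above. It assumes dplyr 0.8.0 or later, where count() and add_count() accept a name argument, and the object names org_counts and org_distinct are made up for illustration. Note also that from dplyr 1.0.0 onward the add argument of group_by() is spelled .add. The first pipeline avoids the de-duplication step entirely by counting first; the second keeps the question's row-wise approach but chains the de-duplication with distinct(), which is what the question asked about.

```r
library(dplyr)

# Same sample data as in the question
mydata <- data.frame(
  employee = LETTERS[1:18],
  manager  = c(rep("Tom", 3), rep("Sue", 4), rep("Mike", 4), rep("Jack", 7)),
  title    = c(rep("Entry", 2), rep("Mid", 3), rep("Junior", 7), rep("Senior", 6))
)

# Variation 1: count() returns one row per manager/title pair,
# so there are no duplicates to remove afterwards
org_counts <- mydata %>%
  count(manager, title, name = "title_count") %>%
  group_by(manager) %>%
  mutate(
    mgr_total  = sum(title_count),                        # people per manager
    title_pctg = round(title_count / mgr_total * 100, 1)  # share of each title
  ) %>%
  ungroup() %>%
  arrange(manager, title)

# Variation 2: keep one row per employee while computing the counts,
# then drop the repeated rows with distinct() inside the same pipeline
org_distinct <- mydata %>%
  add_count(manager, title, name = "title_count") %>%
  add_count(manager, name = "mgr_total") %>%
  mutate(title_pctg = round(title_count / mgr_total * 100, 1)) %>%
  select(-employee) %>%
  distinct() %>%
  arrange(manager, title)

print(org_counts)
print(org_distinct)
```

Both pipelines should give the same seven manager and title rows as the outputs shown above, only with the count columns in a slightly different order than in the answer.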