在SO上咨询了类似的问题后,就像在这里,我终于得到了我想要的输出,但我不禁想知道是否有更好的方法到达那里。此外,我想知道是否有一种方法可以使用管道运算符来链接最后一步,从而消除经理和头衔组合的重复。
可重复的例子:
library(dplyr)
# Sample data frame
employee = LETTERS[1:18]
manager = c(rep("Tom", 3), rep("Sue", 4), rep("Mike", 4), rep("Jack", 7))
title = c(rep("Entry", 2), rep("Mid", 3), rep("Junior", 7), rep("Senior", 6))
mydata <- data.frame(employee, manager, title)
# Code gives me output I want, but wondering if there is a better way
org2 <- mydata %>%
group_by(manager, title) %>%
mutate(title_count = n()) %>% # Total number of people with given title by manager
ungroup() %>%
group_by(manager) %>% # Total number of people in manager's group
mutate(mgr_total = n()) %>%
group_by(title, add = TRUE) %>%
mutate(title_pctg = round(title_count/mgr_total*100, 1)) %>% # Percent of people with given title by manager
select(-employee)
# Remove duplicates of manager and title to summarize data wanted
org2 <- org2[!duplicated(org2[2:4]), ]
arrange(org2, manager, title)
# A tibble: 7 x 5
# Groups: manager, title [7]
# manager title title_count mgr_total title_pctg
# <fctr> <fctr> <int> <int> <dbl>
#1 Jack Junior 1 7 14.3
#2 Jack Senior 6 7 85.7
#3 Mike Junior 4 4 100.0
#4 Sue Junior 2 4 50.0
#5 Sue Mid 2 4 50.0
#6 Tom Entry 2 3 66.7
#7 Tom Mid 1 3 33.3
提前感谢您的想法和帮助。
您可以通过切换group_by
的顺序将其简化如下(即首先按manager
分组,然后按manger title
分组,而不是另一种方式);
mydata %>%
group_by(manager) %>%
mutate(mgr_count = n()) %>%
group_by(title, mgr_count, add=TRUE) %>%
summarise(
title_count = n(),
title_pctg = round(title_count / first(mgr_count) * 100, 1)
)
# A tibble: 7 x 5
# Groups: manager, title [?]
# manager title mgr_count title_count title_pctg
# <fctr> <fctr> <int> <int> <dbl>
#1 Jack Junior 7 1 14.3
#2 Jack Senior 7 6 85.7
#3 Mike Junior 4 4 100.0
#4 Sue Junior 4 2 50.0
#5 Sue Mid 4 2 50.0
#6 Tom Entry 3 2 66.7
#7 Tom Mid 3 1 33.3