Consider this data frame.
question <- data.frame("Product" = c("P001", "P001", "P001", "P002", "P002", "P002"),
"Activity" = c("sawing", "planning", "opening", "sawing", "planning", "opening"),
"Employee" = c("Tom", "Bert", "Louisa", "Bert", "Louisa", "Louisa"))
I want to compute three summaries in a new table:
The last summary is the one I cannot work out, because it needs a different "group_by". What I have so far is:
library(dplyr)

days <- question %>%
  group_by(Activity) %>%
  summarise("Number of activities" = n(),
            "Number of employees" = length(unique(Employee)))
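For example, as the result of the last summary, I would like to get:
How can I do this last step/column?
One option is to compute the number of employees per product separately and join it to the original dataset before computing the final summary: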
# n_distinct() counts unique employees per group; n() would count rows instead
summary1 <- question %>%
  group_by(Activity, Product) %>%
  summarize(n_employees_per_prod_act = n_distinct(Employee), .groups = "drop")
question %>%
left_join(summary1, by = c("Activity", "Product")) %>%
group_by(Activity) %>%
summarise("Number of activities" = n(),
"Number of employees" = length(unique(Employee)),
"Average number of employees per activity per product" = mean(n_employees_per_prod_act))
# A tibble: 3 x 4
Activity `Number of activ~ `Number of emplo~ `Average number of~
<chr> <int> <int> <dbl>
1 opening 2 1 1
2 planning 2 2 1
3 sawing 2 2 1
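On the sample data, every Activity/Product pair has exactly one row, so counting rows and counting distinct employees give the same result. The sketch below uses a small hypothetical data frame (`question2`, not part of the original question) in which one employee appears twice for the same pair, to show where the two counts diverge. The column names `rows` and `employees` are illustrative only.

```r
library(dplyr)

# Hypothetical data: Tom saws product P001 twice, Bert once
question2 <- data.frame(
  Product  = c("P001", "P001", "P001", "P002"),
  Activity = c("sawing", "sawing", "sawing", "sawing"),
  Employee = c("Tom", "Tom", "Bert", "Bert")
)

counts <- question2 %>%
  group_by(Activity, Product) %>%
  summarise(
    rows      = n(),                  # number of rows in the group
    employees = n_distinct(Employee), # number of unique employees in the group
    .groups   = "drop"
  )

print(counts)
```

If the real data can contain repeated employee entries per activity and product, the distinct count is probably the one that matches "number of employees".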
Here is my attempt. Note that I changed "number of activities" to "number of products" in the expected output.
question |>
nest_by(Activity, Product) |>
transmute(uempl = unique(data$Employee),
uempp = length(uempl)) |>
group_by(Activity) |>
summarize("Number of employees" = length(unique(uempl)),
"Average number of employees per activity per product" = mean(uempp),
"Number of products" = n())
# A tibble: 3 × 4
Activity `Number of employees` `Average number of employees…` `Number of pro…`
<chr> <int> <dbl> <int>
1 opening 1 1 2
2 planning 2 1 2
3 sawing 2 1 2
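The same three columns can also be built without nesting, by summarising at two grouping levels and joining the results. This is only a sketch of that idea, not the approach either answer uses; the intermediate names (`per_pair`, `per_activity`, `n_emp`) and the short column names are assumptions chosen for illustration.

```r
library(dplyr)

question <- data.frame(
  Product  = c("P001", "P001", "P001", "P002", "P002", "P002"),
  Activity = c("sawing", "planning", "opening", "sawing", "planning", "opening"),
  Employee = c("Tom", "Bert", "Louisa", "Bert", "Louisa", "Louisa")
)

# Level 1: distinct employees for each (Activity, Product) pair,
# then the mean of those counts per Activity
per_pair <- question %>%
  group_by(Activity, Product) %>%
  summarise(n_emp = n_distinct(Employee), .groups = "drop") %>%
  group_by(Activity) %>%
  summarise(avg_emp_per_product = mean(n_emp))

# Level 2: distinct employees and distinct products per Activity
per_activity <- question %>%
  group_by(Activity) %>%
  summarise(
    n_employees = n_distinct(Employee),
    n_products  = n_distinct(Product)
  )

# Combine both levels into one table, one row per Activity
result <- left_join(per_activity, per_pair, by = "Activity")

print(result)
```

Keeping the two grouping levels in separate steps makes it explicit which count belongs to which grouping, at the cost of one extra join.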