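I'm trying to write a function that returns the top x molecules, in descending order, ranked by how many unique PATIENT_ID values each molecule has. Only records from a given date through the latest date should be counted.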
data <- data.frame(PATIENT_ID = c(1,1,2,2), dateM = c(ymd("2020-01-05","2020-01-06","2020-05-06","2019-12-15")), MOLECULES = c("mol1", "mol1", "mol1", "mol2"))
topx <- function(data, datefrom, var, x = 5) {
  data %>%
    subset(dateM >= datefrom) %>%
    group_by(var) %>%
    summarize(pat = length(unique(PATIENT_ID))) %>%
    arrange(-pat) %>%
    head(x) %>%
    select(1)
}
topx(data = data, datefrom = "2016-04", var = MOLECULES, x = 2)
In this case, the desired result would be:
c("mol1","mol2")
However, it treats var as a literal column name instead of resolving it to MOLECULES, and gives me this error:
Error: Must group by variables found in `.data`.
* Column `var` is not found.
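Nice function. Programming with dplyr comes with its own rules and operators, which the "Programming with dplyr" vignette covers in more detail. Specifically, you need the {{ }} (curly-curly) operator, which lets group_by() use the column the caller passed as var.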
library(tidyverse)
library(lubridate)
#>
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#>
#> date, intersect, setdiff, union
data <- data.frame(PATIENT_ID = c(1,1,2,2), dateM = c(ymd("2020-01-05","2020-01-06","2020-05-06","2019-12-15")), MOLECULES = c("mol1", "mol1", "mol1", "mol2"))
topx <- function(data, datefrom, var, x = 5) {
  data %>%
    subset(dateM >= datefrom) %>%
    group_by({{ var }}) %>%
    summarize(pat = length(unique(PATIENT_ID))) %>%
    arrange(-pat) %>%
    head(x) %>%
    select(1)
}
topx(data = data, datefrom = "2016-04-01", var = MOLECULES, x = 2)
#> `summarise()` ungrouping output (override with `.groups` argument)
#> # A tibble: 2 x 1
#> MOLECULES
#> <chr>
#> 1 mol1
#> 2 mol2
Created on 2021-01-14 by the reprex package (v0.3.0)
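The question asked for a plain vector such as c("mol1","mol2"), while the function above returns a one-column tibble. Here is a minimal sketch of one way to get the vector, assuming dplyr 1.0 or later. The name topx_vec is hypothetical, and filter(), n_distinct(), desc() and pull() are swapped in for subset(), length(unique()), arrange(-pat) and select(1); they are not part of the original posts.

```r
library(dplyr)

# Same toy data as in the question; as.Date() is used so lubridate is not needed.
data <- data.frame(
  PATIENT_ID = c(1, 1, 2, 2),
  dateM      = as.Date(c("2020-01-05", "2020-01-06", "2020-05-06", "2019-12-15")),
  MOLECULES  = c("mol1", "mol1", "mol1", "mol2")
)

# topx_vec is a hypothetical name chosen for this sketch.
topx_vec <- function(data, datefrom, var, x = 5) {
  data %>%
    filter(dateM >= as.Date(datefrom)) %>%   # keep rows on or after the start date
    group_by({{ var }}) %>%                  # embrace the column passed by the caller
    summarise(pat = n_distinct(PATIENT_ID), .groups = "drop") %>%
    arrange(desc(pat)) %>%                   # most patients first
    head(x) %>%                              # keep the top x groups
    pull({{ var }})                          # return a vector instead of a tibble
}

top <- topx_vec(data, datefrom = "2016-04-01", var = MOLECULES, x = 2)
print(as.character(top))
```

With the toy data, mol1 has two distinct patients and mol2 has one, so mol1 comes first. The as.character() call only matters on R versions before 4.0, where data.frame() turns strings into factors by default.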
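I think this is a quasiquotation problem. enquo() captures the expression the caller passed, and !! unquotes it, substituting it one-to-one into the call. For details, see https://adv-r.hadley.nz/quasiquotation.html
Try: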
topx <- function(data, datefrom, var, x = 5) {
  var <- enquo(var)
  data %>%
    subset(dateM >= datefrom) %>%
    group_by(!!var) %>%
    summarize(pat = length(unique(PATIENT_ID))) %>%
    arrange(-pat) %>%
    head(x) %>%
    select(1)
}
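For what it's worth, the enquo() plus !! pattern and the {{ }} operator should behave the same way here, since {{ var }} is shorthand for capturing and unquoting in one step. A small sketch to check that on the toy data, assuming dplyr 1.0 or later and rlang 0.4.0 or later. The helper names count_embrace and count_bang are hypothetical and exist only for this comparison.

```r
library(dplyr)

# Same toy data as in the question; as.Date() is used so lubridate is not needed.
data <- data.frame(
  PATIENT_ID = c(1, 1, 2, 2),
  dateM      = as.Date(c("2020-01-05", "2020-01-06", "2020-05-06", "2019-12-15")),
  MOLECULES  = c("mol1", "mol1", "mol1", "mol2")
)

# Hypothetical helper: groups with the embrace operator.
count_embrace <- function(data, var) {
  data %>%
    group_by({{ var }}) %>%
    summarise(pat = n_distinct(PATIENT_ID), .groups = "drop")
}

# Hypothetical helper: groups with explicit capture and unquote.
count_bang <- function(data, var) {
  var <- enquo(var)        # capture the caller's expression as a quosure
  data %>%
    group_by(!!var) %>%    # unquote it inside the dplyr verb
    summarise(pat = n_distinct(PATIENT_ID), .groups = "drop")
}

res_embrace <- count_embrace(data, MOLECULES)
res_bang    <- count_bang(data, MOLECULES)

# Both helpers should produce the same grouped counts.
same <- isTRUE(all.equal(res_embrace, res_bang))
print(same)
```

The explicit form is mostly useful when the captured expression needs to be inspected or modified before it is used; otherwise {{ }} is shorter.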