Basically, I have a data frame, df:
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A G NA NA F
Pathway8 A G NA NA E
Pathway9 A G Z H F
Pathway6 A G Z H E
Pathway2 A G D NA F
Pathway5 A G D NA E
Pathway1 A D K NA F
Pathway7 A B C D F
Pathway4 A B C D E
Now I want to merge these rows, like this:
newdf
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A G NA NA F, E
Pathway9 A G Z H F, E
Pathway2 A G D NA F, E
Pathway1 A D K NA F
Pathway4 A B C D F, E
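This is a follow-up to a question I asked earlier (merging duplicate rows in a data frame). The approach from that question works for this dataset, but on my larger dataset it does not seem to combine the values. For example, here are the first few rows of the output (after I adapted the code given by @Matt Jewett, or used the explanation provided in "Concatenate string by group with dplyr"):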
Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway1 Smoothened Gl-1 Osteopontin
Pathway2 Smoothened Gl-1 BMP2 Osteopontin
Pathway3 Smoothened Gl-1 BMP2 DLX5
Pathway4 Smoothened Gl-1 BMP2 Osteopontin
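As you can see, there are a couple of problems. First, the Biomarker1 column does not appear to be aggregated. Second, several rows are duplicated. I have hit a wall looking for a fix, so any solution you can think of would be much appreciated!
Thanks a lot for your help!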
Using data.table
Simple enough:
library(data.table)
dat <- fread("Pathway Beginning1 Protein2 Protein3 Protein4 Biomarker1
Pathway3 A G NA NA F
Pathway8 A G NA NA E
Pathway9 A G Z H F
Pathway6 A G Z H E
Pathway2 A G D NA F
Pathway5 A G D NA E
Pathway1 A D K NA F
Pathway7 A B C D F
Pathway4 A B C D E")
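# Group by the four protein columns; within each group keep the first
# Pathway label and join all Biomarker1 values into one string.
dat_collapse <- dat[, .(Pathway = Pathway[1],
                        Biomarker1 = paste0(Biomarker1, collapse = ", ")),
                    by = .(Beginning1, Protein2, Protein3, Protein4)]
# Restore the original column order (grouping columns come first by default).
setcolorder(dat_collapse, names(dat))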
dat_collapse
Which results in:
Pathway Beginning1 Protein2 Protein3 Protein4 Biomarker1
1: Pathway3 A G NA NA F, E
2: Pathway9 A G Z H F, E
3: Pathway2 A G D NA F, E
4: Pathway1 A D K NA F
5: Pathway7 A B C D F, E
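If the same call leaves duplicated rows on the larger dataset, one possible cause (a guess, since that data is not shown) is stray whitespace in the grouping columns, which makes rows that look identical land in different groups. The sketch below uses a small hypothetical table, dat_messy, invented here to mimic that situation; it trims the grouping columns before collapsing and uses unique() so a repeated biomarker is listed only once. The names dat_messy, key_cols and dat_clean are illustrative and not from the original post.

```r
library(data.table)

# Hypothetical toy input: same kind of groups as above, but with stray
# whitespace ("A ", " Z") and a repeated biomarker within one group.
dat_messy <- data.table(
  Pathway    = c("Pathway3", "Pathway8", "Pathway9", "Pathway6", "Pathway1"),
  Beginning1 = c("A", "A ", "A", "A", "A"),
  Protein2   = c("G", "G", "G", "G", "D"),
  Protein3   = c(NA, NA, "Z", " Z", "K"),
  Protein4   = c(NA, NA, "H", "H", NA),
  Biomarker1 = c("F", "E", "F", "F", "F")
)

key_cols <- c("Beginning1", "Protein2", "Protein3", "Protein4")

# Trim leading/trailing whitespace in the grouping columns (NA stays NA),
# so that rows which only differ by whitespace fall into the same group.
dat_messy[, (key_cols) := lapply(.SD, trimws), .SDcols = key_cols]

# Collapse as before, but keep each biomarker only once per group.
dat_clean <- dat_messy[, .(Pathway = Pathway[1],
                           Biomarker1 = paste(unique(Biomarker1), collapse = ", ")),
                       by = key_cols]
setcolorder(dat_clean, names(dat_messy))
dat_clean
```

On this toy input the five rows collapse to three groups. If trimming does not help on the real data, it may be worth checking whether empty strings and NA are mixed in the protein columns, since those are treated as different values when grouping.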