我有2个这样的数据框
ID <- c("A","B","C")
Type <- c("PASS","PASS","FAIL")
Measurement <- c("Length","Height","Breadth")
Function <- c("Volume","Area","Circumference")
df1 <- data.frame(ID,Type,Measurement,Function)
ID <- c("A","B","C","C")
Type <- c("PASS","PASS","FAIL","FAIL")
Measurement <- c("Length","Height","Breadth","Breadth_DSPT")
df2 <- data.frame(ID,Type,Measurement)
我试图合并这两个数据帧,使其返回匹配的测量值,并返回由另一个字符串连接的匹配测量值的行。
我想要的输出是
ID Type Measurement Function
A PASS Length Volume
B PASS Height Area
C FAIL Breadth Circumference
C FAIL Breadth_DSPT Circumference
我使用这样的merge函数来获取前3行,但是我们如何匹配数据帧中的度量名称以返回所有匹配的行?
df <- merge(df1,df2,by=c("ID","Type","Measurement"),all.x=T)
实现它的一种方法是使用< code>sqldf包:
library(sqldf)
sqldf("select df1.ID, df1.Type, df2.Measurement, df1.Function
from df1 left join df2 on (df1.ID = df2.ID and
df1.Type = df2.Type and
df2.Measurement like df1.Measurement||'%')")
# ID Type Measurement Function
# 1 A PASS Length Volume
# 2 B PASS Height Area
# 3 C FAIL Breadth Circumference
# 4 C FAIL Breadth_DSPT Circumference
连接中的最后一个子句(df2。像df1一样的测量。测量||'%'代码>)意味着
df2$测量
必须等于df1$测量
后跟任何字符串,但是您可以使用SQL的%
和_
指定更灵活的条件。
如果你只是在字符串的末尾有一个连接,你可以做这样的事情:
merge(
transform(df2, tmpmeas = sub("_.+$", "", Measurement)),
df1,
by.x=c("ID","Type","tmpmeas"), by.y=c("ID","Type","Measurement")
)[-3]
# ID Type Measurement Function
#1 A PASS Length Volume
#2 B PASS Height Area
#3 C FAIL Breadth Circumference
#4 C FAIL Breadth_DSPT Circumference
您可以使用<code>数据。table库来实现这一点。首先将dataframe
转换为data表
,使用setkey
为每个表设置键,然后merge
。
dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1,ID)
setkey(dt2,ID)
merge(dt1,dt2)
# ID Type.x Measurement.x Function Type.y Measurement.y
# 1: A PASS Length Volume PASS Length
# 2: B PASS Height Area PASS Height
# 3: C FAIL Breadth Circumference FAIL Breadth
# 4: C FAIL Breadth Circumference FAIL Breadth_DSPT