提问者:小点点

合并2个数据框以返回所有匹配测量名称的行


我有2个这样的数据框

ID <- c("A","B","C")
Type <- c("PASS","PASS","FAIL")
Measurement <- c("Length","Height","Breadth")
Function <- c("Volume","Area","Circumference")
df1 <- data.frame(ID,Type,Measurement,Function)

ID <- c("A","B","C","C")
Type <- c("PASS","PASS","FAIL","FAIL")
Measurement <- c("Length","Height","Breadth","Breadth_DSPT")
df2 <- data.frame(ID,Type,Measurement)

我试图合并这两个数据帧,使其返回匹配的测量值,并返回由另一个字符串连接的匹配测量值的行。

我想要的输出是

  ID Type  Measurement      Function
   A PASS       Length        Volume
   B PASS       Height          Area
   C FAIL      Breadth Circumference
   C FAIL Breadth_DSPT Circumference

我使用这样的merge函数来获取前3行,但是我们如何匹配数据帧中的度量名称以返回所有匹配的行?

df <- merge(df1,df2,by=c("ID","Type","Measurement"),all.x=T)

共3个答案

匿名用户

实现它的一种方法是使用< code>sqldf包:

library(sqldf)

sqldf("select df1.ID, df1.Type, df2.Measurement, df1.Function
      from df1 left join df2 on (df1.ID = df2.ID and 
                                 df1.Type = df2.Type and 
                                 df2.Measurement like df1.Measurement||'%')")

#   ID Type  Measurement      Function
# 1  A PASS       Length        Volume
# 2  B PASS       Height          Area
# 3  C FAIL      Breadth Circumference
# 4  C FAIL Breadth_DSPT Circumference

连接中的最后一个子句(df2。像df1一样的测量。测量||'%')意味着df2$测量必须等于df1$测量后跟任何字符串,但是您可以使用SQL的%_指定更灵活的条件。

匿名用户

如果你只是在字符串的末尾有一个连接,你可以做这样的事情:

merge(
  transform(df2, tmpmeas = sub("_.+$", "", Measurement)),
  df1,
  by.x=c("ID","Type","tmpmeas"), by.y=c("ID","Type","Measurement")
)[-3]
#  ID Type  Measurement      Function
#1  A PASS       Length        Volume
#2  B PASS       Height          Area
#3  C FAIL      Breadth Circumference
#4  C FAIL Breadth_DSPT Circumference

匿名用户

您可以使用<code>数据。table库来实现这一点。首先将dataframe转换为data表,使用setkey为每个表设置键,然后merge

dt1 <- data.table(df1)
dt2 <- data.table(df2)
setkey(dt1,ID)
setkey(dt2,ID)
merge(dt1,dt2)

#    ID Type.x Measurement.x      Function Type.y Measurement.y
# 1:  A   PASS        Length        Volume   PASS        Length
# 2:  B   PASS        Height          Area   PASS        Height
# 3:  C   FAIL       Breadth Circumference   FAIL       Breadth
# 4:  C   FAIL       Breadth Circumference   FAIL  Breadth_DSPT