How do I extract multiple strings using regex?

Asked by: 小点点

>>> import pandas as pd
>>> df = pd.DataFrame({'Sentence':['This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm', 'I have researched the product KEY_abc_def, and KEY_blt_chm as requested', 'He got the idea from your message KEY_mno_pqr']})
>>> df
                                                Sentence
0       This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm
1  I have researched the product KEY_abc_def, and KEY_blt_chm as requested
2            He got the idea from your message KEY_mno_pqr

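I want to use a regular expression to extract the keys into a new column, without the literal KEY_ prefix. For sentences that contain more than one key, the keys should be joined with commas. The output should look like this: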

>>> df
                                                Sentence                               KEY
0      This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm    abc_def, mno_pqr, blt_chm
1  I have researched the product KEY_abc_def, and KEY_blt_chm as requested          abc_def, blt_chm     
2           He got the idea from your message KEY_mno_pqr                           mno_pqr

I tried the code below, but it does not do what I want: it only picks up the first key and ignores the rest. I am new to regex, so any suggestions would be appreciated.

df['KEY'] = df['Sentence'].str.extract(r"KEY_(\w+)", expand=True)

1 Answer

Anonymous user

Use

df['KEY'] = df['Sentence'].str.findall(r"KEY_(\w+)").str.join(",")

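Series.str.findall finds every occurrence of the pattern in each row and, because the pattern has a single capturing group, returns only the captured substrings (the part after KEY_) as a list. str.join(",") then joins each list into one comma-separated string.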
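To see the difference between "first match only" and "all matches" outside pandas, here is a minimal sketch using Python's standard re module, which Series.str.findall applies to each element. The variable names are illustrative and not from the original post.

```python
import re

sentence = "I have researched the product KEY_abc_def, and KEY_blt_chm as requested"
pattern = r"KEY_(\w+)"

# search stops at the first match, which mirrors what str.extract returns
first = re.search(pattern, sentence).group(1)

# findall returns the contents of the single capturing group for every match
all_keys = re.findall(pattern, sentence)

# join the list into one comma-separated string
joined = ",".join(all_keys)

print(first)     # abc_def
print(all_keys)  # ['abc_def', 'blt_chm']
print(joined)    # abc_def,blt_chm
```

Note that \w matches letters, digits, and underscores, so the comma after KEY_abc_def is not included in the captured key.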

Pandas test:

>>> df['KEY'] = df['Sentence'].str.findall(r"KEY_(\w+)").str.join(",")
>>> df
                                                                    Sentence                      KEY
0  This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm  abc_def,mno_pqr,blt_chm
1    I have researched the product KEY_abc_def, and KEY_blt_chm as requested          abc_def,blt_chm
2                              He got the idea from your message KEY_mno_pqr                  mno_pqr
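The question's desired output separates keys with a comma and a space, while the test above joins with a bare comma. The following self-contained sketch changes only the separator to ", " and adds one hypothetical row without any key (not part of the original data) to show that such rows end up as an empty string.

```python
import pandas as pd

df = pd.DataFrame({"Sentence": [
    "This is the results of my experiments KEY_abc_def KEY_mno_pqr KEY_blt_chm",
    "I have researched the product KEY_abc_def, and KEY_blt_chm as requested",
    "He got the idea from your message KEY_mno_pqr",
    "No keys in this sentence",  # hypothetical extra row, added for illustration
]})

# findall yields a list of captured keys per row; join turns each list into a string
df["KEY"] = df["Sentence"].str.findall(r"KEY_(\w+)").str.join(", ")

print(df["KEY"].tolist())
# ['abc_def, mno_pqr, blt_chm', 'abc_def, blt_chm', 'mno_pqr', '']
```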

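(Note, in case you did not know: I used pd.set_option('display.max_colwidth', None) to show the full contents of each column. See "How to display full (non-truncated) dataframe information in HTML when converting from pandas dataframe to HTML?")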
