例如,我有一个关于体育的列表:
sports = ["basketball", "football", "baseball"]
和一个带有一些句子的一列数据帧,
column_1
df
My favourite sport is football
I love to play basketball
Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal
我想阅读列表,以便根据列中是否包含这些单词创建第二列。 见下文
df other
My favourite sport is football football
I love to play basketball basketball
Football is a family of team sports that involve.. football
我不想使用if语句,因为我的列表包含几乎50个不同的单词。 谢谢。
尝试此操作,str.extract
import re
sports = ["basketball", "football", "baseball"]
extract_ = re.compile("(%s)" % "|".join(sports), re.IGNORECASE)
df['extract'] = df.column_1.str.extract("(%s)" % "|".join(sports))
0 football
1 basketball
2 Football
df = pd.DataFrame()
df['column_1'] = ['My favourite sport is football', 'I love to play basketball', 'Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal']
sports = ["basketball", "football", "baseball"]
list_output = []
for i in range(len(df)):
sentence = df['column_1'].iloc[i]
for s in sports:
if s.lower() in sentence.lower(): #s.lower is to avoid missing entries because they're upper case. So I'm comparing then all as lower case
list_output.append(s)
df['sport'] = list_output
用这个。 这直截了当,通俗易懂--
df['other'] = df['column1'].apply(lambda x: list(set(x.lower().split()).intersection(set(sports)))[0])
[0]
以获得运动项目列表 column1 other
0 My favourite sport is football football
1 I love to play basketball basketball
2 Football is a family of t... football