Create a new column based on a list of text

Asked by: 小点点

For example, I have a list of sports:

sports = ["basketball", "football", "baseball"]

and a DataFrame with a single column containing some sentences:

column_1
df
My favourite sport is football
I love to play basketball
Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal

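I want to go through the list and create a second column based on whether the sentence in the first column contains one of those words. See below: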

df                                                    other
My favourite sport is football                        football
I love to play basketball                             basketball
Football is a family of team sports that involve..    football

I don't want to use if statements, because my list contains almost 50 different words. Thanks.

Anonymous user

Try this, using str.extract:

import re

sports = ["basketball", "football", "baseball"]

# Build one alternation pattern; IGNORECASE lets "Football" match "football"
pattern = "(%s)" % "|".join(sports)
df['extract'] = df.column_1.str.extract(pattern, flags=re.IGNORECASE, expand=False)

0    football
1  basketball
2    Football
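
A self-contained sketch of the same str.extract idea, in case it helps. The `re.escape` call and the trailing `.str.lower()` are additions that are not in the answer above: the first guards against list entries containing regex metacharacters, the second normalises "Football" to "football" as in the desired output. Rows with no match come out as NaN.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "column_1": [
        "My favourite sport is football",
        "I love to play basketball",
        "Football is a family of team sports that involve, to varying degrees, "
        "kicking a ball to score a goal",
    ]
})
sports = ["basketball", "football", "baseball"]

# One capturing group containing every sport, escaped so the words are treated literally
pattern = "(" + "|".join(re.escape(s) for s in sports) + ")"

# expand=False returns a Series (first match per row); lower-case it to match the list
df["extract"] = (
    df["column_1"]
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()
)
print(df["extract"])
```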

Anonymous user

import pandas as pd

df = pd.DataFrame()

df['column_1'] = ['My favourite sport is football', 'I love to play basketball', 'Football is a family of team sports that involve, to varying degrees, kicking a ball to score a goal']

sports = ["basketball", "football", "baseball"]

list_output = []

for i in range(len(df)):
    sentence = df['column_1'].iloc[i]
    for s in sports:
        # Compare everything in lower case so capitalised entries are not missed
        if s.lower() in sentence.lower():
            list_output.append(s)
            break  # keep only the first match so the list stays aligned with the rows
    else:
        list_output.append(None)  # no sport found in this sentence

df['sport'] = list_output
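
The same first-match loop can probably be written more compactly with `next()` and a generator. This is only a sketch: the fourth sentence is made up here to show what happens when nothing matches, and `matches` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    "column_1": [
        "My favourite sport is football",
        "I love to play basketball",
        "Football is a family of team sports",
        "I prefer reading",  # hypothetical row with no sport in it
    ]
})
sports = ["basketball", "football", "baseball"]

# For each sentence, take the first sport found as a substring, or None if there is none
matches = [
    next((s for s in sports if s.lower() in sentence.lower()), None)
    for sentence in df["column_1"]
]
df["sport"] = matches
print(df)
```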

Anonymous user

Use this. It is straightforward and easy to understand:

df['other'] = df['column1'].apply(lambda x: list(set(x.lower().split()).intersection(set(sports)))[0])

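This applies a function that first lowercases the sentence and then splits it into words.
It then takes the intersection of the set of words in the sentence and the set of words in the sports list.
If a sentence can contain more than one sport, remove the trailing [0] to get a list of sports.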

    column1                         other
0   My favourite sport is football  football
1   I love to play basketball       basketball
2   Football is a family of t...    football
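
A sketch of the list-returning variant described above, under a few assumptions. `find_sports` is a hypothetical helper name, and the words are extracted with a regex instead of `split()` so that punctuation attached to a word (for example "football.") does not prevent a match. Sentences with no sport get an empty list rather than raising an IndexError.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "column1": [
        "My favourite sport is football",
        "I love to play basketball",
        "I play baseball, and football.",  # made-up row with two sports and punctuation
        "I prefer reading",                # made-up row with no sport
    ]
})
sports = ["basketball", "football", "baseball"]
sports_set = set(sports)


def find_sports(sentence):
    """Return the sorted list of sports that appear as whole words in the sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return sorted(words & sports_set)


df["other"] = df["column1"].apply(find_sports)
print(df)
```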
