Asked by: 小点点

Matching a list against a DataFrame


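I have a DataFrame of people with an age column. I would like to map each age to a group: Baby = 0-2 years, Child = 3-12, Young = 13-18, Young Adult = 19-30, Adult = 31-50, Senior Adult = 51-65.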

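I created lists that define these age groups, e.g. adult = list(range(31, 51)), and so on. How can I match each age against these lists and put the name of the matching list (e.g. "Adult") in a new column of the DataFrame?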

Small input: the DataFrame has three columns: df['name'], df['country'], df['age'].

Name    Country  Age
Anthony France   15
Albert  Belgium  54
.
.
.
Zahra   Tunisia  14

So I need to match the age column against the lists I already have. The output should look like this:

Name    Country  Age  Group
Anthony France   15   Young
Albert  Belgium  54   Senior Adult
.
.
.
Zahra   Tunisia  14   Young

Thanks!


3 answers

Anonymous user

Here is one way to do this with pd.cut:

import numpy as np
import pandas as pd

df = pd.DataFrame({"person_id": range(25), "age": np.random.randint(0, 100, 25)})
print(df.head(10))
==>
   person_id  age
0          0   30
1          1   42
2          2   78
3          3    2
4          4   44
5          5   43
6          6   92
7          7    3
8          8   13
9          9   76

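# include_lowest=True keeps age 0 in the first bin; by default the bins are (0, 18], (18, 50], (50, 100]
df["group"] = pd.cut(df.age, [0, 18, 50, 100], labels=["child", "adult", "senior"], include_lowest=True)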
print(df.head(10))
==>
   person_id  age   group
0          0   30   adult
1          1   42   adult
2          2   78  senior
3          3    2   child
4          4   44   adult
5          5   43   adult
6          6   92  senior
7          7    3   child
8          8   13   child
9          9   76  senior
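
For the six groups from the question, the same pd.cut idea could look roughly like the sketch below. The bin edges are my reading of the ranges in the question, and the sample ages 0 and 70 are added only to show the edge cases: include_lowest=True makes age 0 count as Baby, and anything above 65 comes out as NaN.

```python
import pandas as pd

# Sample data: the three rows from the question plus two made-up edge cases
df = pd.DataFrame({"Age": [15, 54, 14, 0, 70]})

# Right-closed bins: [0, 2], (2, 12], (12, 18], (18, 30], (30, 50], (50, 65]
bins = [0, 2, 12, 18, 30, 50, 65]
labels = ["Baby", "Child", "Young", "Young Adult", "Adult", "Senior Adult"]

# Ages outside 0-65 fall in no bin and get NaN
df["Group"] = pd.cut(df["Age"], bins=bins, labels=labels, include_lowest=True)
print(df)
```

Note that the resulting column has a categorical dtype; use .astype(str) on it if plain strings are needed.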

Anonymous user

IIUC, I would go with np.select:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Age': [3, 20, 40]})
condlist = [df.Age.between(0,2),
            df.Age.between(3,12),
            df.Age.between(13,18),
            df.Age.between(19,30),
            df.Age.between(31,50),
            df.Age.between(51,65)]

choicelist = ['Baby', 'Child', 'Young',
           'Young Adult', 'Adult', 'Senior Adult']

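# A string default keeps the result dtype consistent (np.select's own default is the integer 0)
df['Group'] = np.select(condlist, choicelist, default='Unknown')

Output:

   Age        Group
0    3        Child
1   20  Young Adult
2   40        Adult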

Anonymous user

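To make it clearer for beginners, you can define a function that returns the age group for each person, then use DataFrame.apply() with axis=1 to run it on every row and store the result in a new 'Group' column: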

import pandas as pd

def age(row):
    a = row['Age']
    if 0 <= a <= 2:  # include age 0 so newborns count as 'Baby'
        return 'Baby'
    elif 2 < a <= 12:
        return 'Child'
    elif 12 < a <= 18:
        return 'Young'
    elif 18 < a <= 30:
        return 'Young Adult'
    elif 30 < a <= 50:
        return 'Adult'
    elif 50 < a <= 65:
        return 'Senior Adult'

df = pd.DataFrame({'Name':['Anthony','Albert','Zahra'],
                   'Country':['France','Belgium','Tunisia'],
                   'Age':[15,54,14]})

df['Group'] = df.apply(age, axis=1)

print(df)

Output:

      Name  Country  Age         Group
0  Anthony   France   15         Young
1   Albert  Belgium   54  Senior Adult
2    Zahra  Tunisia   14         Young
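
If you would rather keep using the lists from the question, one option is to flatten them into a single dict and look ages up with Series.map, which avoids the row-by-row apply. This is only a minimal sketch: it assumes each list is built with range as in the question, and the names groups and age_to_group are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anthony', 'Albert', 'Zahra'],
                   'Country': ['France', 'Belgium', 'Tunisia'],
                   'Age': [15, 54, 14]})

# Group name -> list of ages, mirroring adult = list(range(31, 51)) from the question
groups = {
    'Baby': list(range(0, 3)),
    'Child': list(range(3, 13)),
    'Young': list(range(13, 19)),
    'Young Adult': list(range(19, 31)),
    'Adult': list(range(31, 51)),
    'Senior Adult': list(range(51, 66)),
}

# Invert to age -> group name so every age is a single dict lookup
age_to_group = {age: name for name, ages in groups.items() for age in ages}

# Ages that appear in no list become NaN
df['Group'] = df['Age'].map(age_to_group)
print(df)
```

This only works for whole-number ages, since the lookup is by exact value; for fractional ages a bin-based approach such as pd.cut is a better fit.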