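I have a data frame of people with age as one of the columns. I want to map each age to a group: baby = 0-2 years, child = 3-12, young = 13-18, young adult = 19-30, adult = 31-50, senior adult = 51-65.
I have created lists that define these age groups, e.g. adult = list(range(31, 51)), and so on. How can I match the name of a list such as "adult" against the data frame by creating a new column?
Small input: the data frame has three columns: df['name'], df['country'], df['age'].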
Name Country Age
Anthony France 15
Albert Belgium 54
.
.
.
Zahra Tunisia 14
So I need to match the age column against the lists I already have. The output should look like this:
Name Country Age Group
Anthony France 15 Young
Albert Belgium 54 Adult
.
.
.
Zahra Tunisia 14 Young
Thanks!
Here is one way to do this using pd.cut:
import numpy as np
import pandas as pd
# 25 people with random ages between 0 and 99
df = pd.DataFrame({"person_id": range(25), "age": np.random.randint(0, 100, 25)})
print(df.head(10))
==>
person_id age
0 0 30
1 1 42
2 2 78
3 3 2
4 4 44
5 5 43
6 6 92
7 7 3
8 8 13
9 9 76
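# Bins are right-inclusive: (0, 18], (18, 50], (50, 100].
# include_lowest=True also puts age 0 into the first bin instead of leaving it as NaN.
df["group"] = pd.cut(df.age, [0, 18, 50, 100], labels=["child", "adult", "senior"], include_lowest=True)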
print(df.head(10))
==>
person_id age group
0 0 30 adult
1 1 42 adult
2 2 78 senior
3 3 2 child
4 4 44 adult
5 5 43 adult
6 6 92 senior
7 7 3 child
8 8 13 child
9 9 76 senior
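The same idea can be stretched to the six groups from the question. This is only a minimal sketch: the bin edges are read off the ranges in the question, the label strings are my own choice of English names, and ages above 65 are assumed to stay unlabeled (NaN).

```python
import pandas as pd

# Sample rows taken from the question
df = pd.DataFrame({
    "Name": ["Anthony", "Albert", "Zahra"],
    "Country": ["France", "Belgium", "Tunisia"],
    "Age": [15, 54, 14],
})

# Right-inclusive bins: [0, 2], (2, 12], (12, 18], (18, 30], (30, 50], (50, 65]
bins = [0, 2, 12, 18, 30, 50, 65]
# Label names are illustrative, not prescribed by the question
labels = ["Baby", "Child", "Young", "Young Adult", "Adult", "Senior Adult"]

# include_lowest=True keeps age 0 inside the first bin
df["Group"] = pd.cut(df["Age"], bins=bins, labels=labels, include_lowest=True)
print(df)
```

There must be exactly one fewer label than bin edges, and the resulting column is categorical, so it keeps the group order for sorting.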
IIUC, I would go with np.select:
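import pandas as pd
import numpy as np
df = pd.DataFrame({'Age': [3, 20, 40]})
# One boolean mask per group; between() is inclusive on both ends
condlist = [df.Age.between(0, 2),
            df.Age.between(3, 12),
            df.Age.between(13, 18),
            df.Age.between(19, 30),
            df.Age.between(31, 50),
            df.Age.between(51, 65)]
# Labels in the same order as the masks above
choicelist = ['Baby', 'Child', 'Young',
              'Young Adult', 'Adult', 'Senior Adult']
# default= covers ages outside every range; 'Unknown' is just a placeholder label.
# Without it np.select falls back to 0, which newer NumPy versions may refuse to mix with string choices.
df['Adult'] = np.select(condlist, choicelist, default='Unknown')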
Output:
Age Adult
0 3 Child
1 20 Young Adult
2 40 Adult
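One thing to watch with np.select is what happens at the edges and outside all ranges. The sketch below uses made-up boundary ages and a hypothetical fallback label "Unknown" (neither comes from the question) to show that between() includes both endpoints and that unmatched rows receive the default.

```python
import numpy as np
import pandas as pd

# Boundary ages chosen for illustration, plus one age outside every range
df = pd.DataFrame({"Age": [0, 2, 3, 65, 70]})

# Only three of the six groups, to keep the sketch short
condlist = [
    df["Age"].between(0, 2),    # 0 and 2 both match
    df["Age"].between(3, 12),
    df["Age"].between(51, 65),  # 65 still matches
]
choicelist = ["Baby", "Child", "Senior Adult"]

# Rows that satisfy no condition get the default ("Unknown" is a placeholder)
df["Group"] = np.select(condlist, choicelist, default="Unknown")
print(df)
```

Note that with non-integer ages, between(0, 2) and between(3, 12) would leave a gap (for example 2.5), so this layout assumes whole-number ages.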
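To make this clearer for beginners: you can define a function that returns the age group for each person, then use DataFrame.apply() to run that function on every row and store the result in a new 'Group' column: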
import pandas as pd

def age(row):
    """Return the age-group label for one DataFrame row."""
    a = row['Age']
    if 0 <= a <= 2:  # include 0 so that newborns count as 'Baby'
        return 'Baby'
    elif 2 < a <= 12:
        return 'Child'
    elif 12 < a <= 18:
        return 'Young'
    elif 18 < a <= 30:
        return 'Young Adult'
    elif 30 < a <= 50:
        return 'Adult'
    elif 50 < a <= 65:
        return 'Senior Adult'
    return None  # ages outside 0-65 have no group

df = pd.DataFrame({'Name': ['Anthony', 'Albert', 'Zahra'],
                   'Country': ['France', 'Belgium', 'Tunisia'],
                   'Age': [15, 54, 14]})
# axis=1 passes each row (not each column) to age()
df['Group'] = df.apply(age, axis=1)
print(df)
Output:
Name Country Age Group
0 Anthony France 15 Young
1 Albert Belgium 54 Senior Adult
2 Zahra Tunisia 14 Young
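If you would rather reuse the range lists you already built, as the question describes, one option is to invert them into an age-to-name lookup and use Series.map. This is a rough sketch: the dictionary name `groups`, the helper `age_to_group`, and the label strings are all my own assumptions, and it only works for whole-number ages.

```python
import pandas as pd

# Ranges as described in the question; range() excludes its upper bound,
# so range(31, 51) covers ages 31 through 50.
groups = {
    "Baby": range(0, 3),
    "Child": range(3, 13),
    "Young": range(13, 19),
    "Young Adult": range(19, 31),
    "Adult": range(31, 51),
    "Senior Adult": range(51, 66),
}

# Invert into a flat lookup table: age -> group name
age_to_group = {a: name for name, ages in groups.items() for a in ages}

df = pd.DataFrame({
    "Name": ["Anthony", "Albert", "Zahra"],
    "Country": ["France", "Belgium", "Tunisia"],
    "Age": [15, 54, 14],
})

# Ages that appear in no list (for example 70) would become NaN
df["Group"] = df["Age"].map(age_to_group)
print(df)
```

Compared with apply, map works on the whole column at once and keeps the group definitions in one place.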