所以我有一个像这样的df
In [1]:data= {'Group': ['A','A','A','A','A','A','B','B','B','B'],
'Name': [ ' Sheldon Webb',' Traci Dean',' Chad Webster',' Ora Harmon',' Elijah Mendoza',' June Strickland',' Beth Vasquez',' Betty Sutton',' Joel Gill',' Vernon Stone'],
'Performance':[33,64,142,116,122,68,95,127,132,80]}
In [2]:df = pd.DataFrame(data, columns = ['Group', 'Name','Performance'])
Out[1]:
Group Name Performance
0 A Sheldon Webb 33
1 A Traci Dean 64
2 A Chad Webster 142
3 A Ora Harmon 116
4 A Elijah Mendoza 122
5 A June Strickland 68
6 B Beth Vasquez 95
7 B Betty Sutton 127
8 B Joel Gill 132
9 B Vernon Stone 80
我想以这样一种交替的方式对它进行排序,在一个组中,比如“a”组,第一行应该有表现最高的人(在本例中是“Chad Webster”),然后在第二行中表现最低的人(即“Sheldon Webb”),我要寻找的输出如下所示
Out[2]:
Group Name Performance
0 A Chad Webster 142
1 A Sheldon Webb 33
2 A Elijah Mendoza 122
3 A Traci Dean 64
4 A Ora Harmon 116
5 A June Strickland 68
6 B Joel Gill 132
7 B Vernon Stone 80
8 B Betty Sutton 127
9 B Beth Vasquez 95
您可以看到序列在一组中的最高和最低之间交替。
让我们尝试用groupby().transform()
检测min,max
行,然后排序:
groups = df.groupby('Group')['Performance']
mins, maxs = groups.transform('min'), groups.transform('max')
(df.assign(temp=df['Performance'].eq(mins) | df['Performance'].eq(maxs))
.sort_values(['Group','temp','Performance'],
ascending=[True, False, False])
.drop('temp', axis=1)
)
输出:
Group Name Performance
2 A Chad Webster 142
0 A Sheldon Webb 33
4 A Elijah Mendoza 122
3 A Ora Harmon 116
5 A June Strickland 68
1 A Traci Dean 64
8 B Joel Gill 132
9 B Vernon Stone 80
7 B Betty Sutton 127
6 B Beth Vasquez 95
尝试:
df['rank'] = df.groupby(['Group'])["Performance"].rank("dense", ascending=False)
df.sort_values(['Group','rank'])
结果:
Group Name Performance rank
2 A Chad Webster 142 1.0
4 A Elijah Mendoza 122 2.0
3 A Ora Harmon 116 3.0
5 A June Strickland 68 4.0
1 A Traci Dean 64 5.0
0 A Sheldon Webb 33 6.0
8 B Joel Gill 132 1.0
7 B Betty Sutton 127 2.0
6 B Beth Vasquez 95 3.0
9 B Vernon Stone 80 4.0
首先创建每个组的排名,然后根据计算出的排名对数据帧进行排序。
嗯。。。
>>> df.groupby('Group')[df.columns[1:]]
.apply(lambda x:
pd.concat([x.nlargest(x.shape[0]//2,'Performance').reset_index(),
x.nsmallest(x.shape[0] - x.shape[0]//2, 'Performance')
.reset_index()])
.sort_index()
.drop('index',1))
.reset_index().drop('level_1',1)
Group Name Performance
0 A Chad Webster 142
1 A Sheldon Webb 33
2 A Elijah Mendoza 122
3 A Traci Dean 64
4 A Ora Harmon 116
5 A June Strickland 68
6 B Joel Gill 132
7 B Vernon Stone 80
8 B Betty Sutton 127
9 B Beth Vasquez 95