提问者:小点点

如何对一个组进行排序,我在第一行得到最大的数字,在第二行得到最小的数字,在第三行得到第二大的数字,依此类推


所以我有一个像这样的df

In [1]:data= {'Group': ['A','A','A','A','A','A','B','B','B','B'],
    'Name': [ ' Sheldon Webb',' Traci Dean',' Chad Webster',' Ora Harmon',' Elijah Mendoza',' June Strickland',' Beth Vasquez',' Betty Sutton',' Joel Gill',' Vernon Stone'],
    'Performance':[33,64,142,116,122,68,95,127,132,80]}
In [2]:df = pd.DataFrame(data, columns = ['Group', 'Name','Performance'])

Out[1]:
    Group   Name    Performance
0   A   Sheldon Webb    33
1   A   Traci Dean      64
2   A   Chad Webster    142
3   A   Ora Harmon      116
4   A   Elijah Mendoza  122
5   A   June Strickland 68
6   B   Beth Vasquez    95
7   B   Betty Sutton    127
8   B   Joel Gill       132
9   B   Vernon Stone    80

我想以这样一种交替的方式对它进行排序,在一个组中,比如“a”组,第一行应该有表现最高的人(在本例中是“Chad Webster”),然后在第二行中表现最低的人(即“Sheldon Webb”),我要寻找的输出如下所示

Out[2]:
    Group   Name    Performance
0   A   Chad Webster    142
1   A   Sheldon Webb    33
2   A   Elijah Mendoza  122
3   A   Traci Dean      64
4   A   Ora Harmon      116
5   A   June Strickland 68
6   B   Joel Gill       132
7   B   Vernon Stone    80
8   B   Betty Sutton    127
9   B   Beth Vasquez    95

您可以看到序列在一组中的最高和最低之间交替。


共3个答案

匿名用户

让我们尝试用groupby().transform()检测min,max行,然后排序:

groups = df.groupby('Group')['Performance']
mins, maxs = groups.transform('min'), groups.transform('max')

(df.assign(temp=df['Performance'].eq(mins) | df['Performance'].eq(maxs))
   .sort_values(['Group','temp','Performance'],
                ascending=[True, False, False])
   .drop('temp', axis=1)
)

输出:

  Group              Name  Performance
2     A      Chad Webster          142
0     A      Sheldon Webb           33
4     A    Elijah Mendoza          122
3     A        Ora Harmon          116
5     A   June Strickland           68
1     A        Traci Dean           64
8     B         Joel Gill          132
9     B      Vernon Stone           80
7     B      Betty Sutton          127
6     B      Beth Vasquez           95

匿名用户

尝试:

df['rank'] = df.groupby(['Group'])["Performance"].rank("dense", ascending=False)
df.sort_values(['Group','rank'])

结果:

  Group              Name  Performance  rank
2     A      Chad Webster          142   1.0
4     A    Elijah Mendoza          122   2.0
3     A        Ora Harmon          116   3.0
5     A   June Strickland           68   4.0
1     A        Traci Dean           64   5.0
0     A      Sheldon Webb           33   6.0
8     B         Joel Gill          132   1.0
7     B      Betty Sutton          127   2.0
6     B      Beth Vasquez           95   3.0
9     B      Vernon Stone           80   4.0

首先创建每个组的排名,然后根据计算出的排名对数据帧进行排序。

匿名用户

嗯。。。

>>> df.groupby('Group')[df.columns[1:]]
      .apply(lambda x:
             pd.concat([x.nlargest(x.shape[0]//2,'Performance').reset_index(),
                        x.nsmallest(x.shape[0] - x.shape[0]//2, 'Performance')
                       .reset_index()])
            .sort_index()
            .drop('index',1))
      .reset_index().drop('level_1',1)

  Group              Name  Performance
0     A      Chad Webster          142
1     A      Sheldon Webb           33
2     A    Elijah Mendoza          122
3     A        Traci Dean           64
4     A        Ora Harmon          116
5     A   June Strickland           68
6     B         Joel Gill          132
7     B      Vernon Stone           80
8     B      Betty Sutton          127
9     B      Beth Vasquez           95