Asked by: 小点点

How do I find the rows that share the most elements in a DataFrame of lists?


I have a DataFrame of artists, where each artist has a list of the genres they are associated with:

    Artist         Genres             
0     A      ['Pop','Dance Pop']
1     B      ['Rock', 'Rock n Roll']
2     C      ['Electronic']
3     D      ['Pop', 'Dance Pop', 'Electro Pop']
4     E      ['Pop']
5     F      ['Dance Pop']

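I want to build an artist recommender system: given one artist, find which other artists are similar, ranked by the number of genres they have in common.

For example, suppose I want to find artists similar to A. I would like output that returns a new DataFrame such as: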

Similar Artist to A      Similar Genres
         D             ['Pop','Dance Pop']
         E                   ['Pop']
         F                ['Dance Pop']

Does anyone know how to do this?


1 answer

Anonymous user

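import pandas as pd

def rank_artist_similarity(data, artist):
    # Row(s) for the artist being queried
    artist_data = data[data.Artist == artist]
    # Union of that artist's genres (also works if the artist spans several rows)
    artist_genres = set().union(*artist_data.Genres)
    # Every other artist is a candidate
    similarity_data = data.drop(artist_data.index)
    # Keep only the shared genres, preserving each list's original order
    similarity_data['Genres'] = similarity_data.Genres.apply(lambda genres: [g for g in genres if g in artist_genres])
    similarity_lengths = similarity_data.Genres.str.len()
    # Drop artists with nothing in common. A stable sort on the negated counts puts
    # the most shared genres first and keeps ties in their original order.
    order = (-similarity_lengths[similarity_lengths > 0]).sort_values(kind='mergesort').index
    similarity_data = similarity_data.reindex(order)
    # columns= is required here; without it, rename() targets the index labels
    return similarity_data.rename(columns={'Artist': f'Similar Artist to {artist}', 'Genres': 'Similar Genres'})

df = pd.DataFrame({'Artist': ['A', 'B', 'C', 'D', 'E', 'F'], 'Genres': [['Pop', 'Dance Pop'], ['Rock', 'Rock n Roll'], ['Electronic'], ['Pop', 'Dance Pop', 'Electro Pop'], ['Pop'], ['Dance Pop']]})

rank_artist_similarity(df, 'A')
  Similar Artist to A    Similar Genres
3                   D  [Pop, Dance Pop]
4                   E             [Pop]
5                   F       [Dance Pop]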
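
If you would rather avoid the per-row apply, the same ranking can probably be done by reshaping to one row per (artist, genre) pair with DataFrame.explode and then grouping. This is only a sketch under assumptions: the function name shared_genres and the extra Shared column are my own and not part of the question, and ties are broken alphabetically by artist, which the question does not specify.

```python
import pandas as pd

def shared_genres(data, artist):
    # One row per (Artist, genre) pair instead of one list per artist
    long = data.explode('Genres')
    # Genres of the artist being queried
    target = set(long.loc[long.Artist == artist, 'Genres'])
    # Rows of other artists whose genre also belongs to the queried artist
    shared = long[(long.Artist != artist) & long.Genres.isin(target)]
    # Collect the shared genres back into one list per artist
    result = shared.groupby('Artist', sort=False)['Genres'].agg(list).reset_index()
    # Hypothetical helper column: number of genres in common
    result['Shared'] = result.Genres.str.len()
    # Most shared genres first; ties broken by artist name (an assumption)
    result = result.sort_values(['Shared', 'Artist'], ascending=[False, True])
    return result.reset_index(drop=True)

df = pd.DataFrame({
    'Artist': ['A', 'B', 'C', 'D', 'E', 'F'],
    'Genres': [['Pop', 'Dance Pop'], ['Rock', 'Rock n Roll'], ['Electronic'],
               ['Pop', 'Dance Pop', 'Electro Pop'], ['Pop'], ['Dance Pop']],
})

ranked = shared_genres(df, 'A')
print(ranked)
```

For the sample data this should rank D first with two shared genres, followed by E and F with one each. Artists with no genre in common (B and C here) drop out because none of their exploded rows survive the isin filter.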