I have a DataFrame of artists, where each artist has a list of the genres they are associated with:
   Artist  Genres
0  A       ['Pop','Dance Pop']
1  B       ['Rock, Rock n Roll']
2  C       ['Electronic']
3  D       ['Pop', 'Dance Pop', 'Electro Pop']
4  E       ['Pop']
5  F       ['Dance Pop']
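I want to build an artist recommender system: given an artist, find the other artists that are similar to it, ranked by the number of genres they have in common.
For example, if I look up artists similar to A, I want the output to be a new DataFrame like: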
Similar Artist to A  Similar Genres
D                    ['Pop','Dance Pop']
E                    ['Pop']
F                    ['Dance Pop']
Does anyone know how to do this?
import pandas as pd

def rank_artist_similarity(data, artist):
    # Row for the query artist and its genres (assumes one row per artist)
    artist_data = data[data.Artist == artist]
    artist_genres = set(*artist_data.Genres)
    # Compare against every other artist
    similarity_data = data.drop(artist_data.index)
    # Keep only the genres each artist shares with the query artist
    similarity_data.Genres = similarity_data.Genres.apply(lambda genres: list(set(genres).intersection(artist_genres)))
    # Drop artists with nothing in common, then sort by number of shared genres
    similarity_lengths = similarity_data.Genres.str.len()
    similarity_data = similarity_data.reindex(similarity_lengths[similarity_lengths > 0].sort_values(ascending=False).index)
    # columns= is required here; without it rename() targets the index labels
    similarity_data.rename(columns={'Artist': f'Similar Artist to {artist}', 'Genres': 'Similar Genres'}, inplace=True)
    return similarity_data

df = pd.DataFrame({'Artist': ['A', 'B', 'C', 'D', 'E', 'F'], 'Genres': [['Pop','Dance Pop'], ['Rock, Rock n Roll'], ['Electronic'], ['Pop', 'Dance Pop', 'Electro Pop'], ['Pop'],['Dance Pop']]})
rank_artist_similarity(df, 'A')

Output (the order of tied artists, and of genres within a list, may vary):

  Similar Artist to A    Similar Genres
3                   D  [Pop, Dance Pop]
5                   F       [Dance Pop]
4                   E             [Pop]
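
If a reproducible ordering matters, one possible variation is sketched below. It is not the approach above, only an illustration under some assumptions: each artist appears exactly once, the function name rank_by_shared_genres and the extra 'Shared Count' column are made up for this example, and ties are broken alphabetically by artist name. It filters each genre list with a list comprehension instead of a set intersection, so the genres keep their original order.

```python
import pandas as pd


def rank_by_shared_genres(data, artist):
    """Rank the other artists by how many genres they share with `artist`.

    Hypothetical helper: assumes `artist` appears exactly once in `data`.
    """
    # Genres of the query artist, as a set for fast membership tests.
    target = set(data.loc[data["Artist"] == artist, "Genres"].iloc[0])

    # Work on a copy of the remaining rows so the input stays untouched.
    others = data[data["Artist"] != artist].copy()

    # Keep each artist's own genre order while filtering to the shared ones.
    others["Similar Genres"] = others["Genres"].apply(
        lambda genres: [g for g in genres if g in target]
    )
    # 'Shared Count' is an illustrative extra column, not part of the question.
    others["Shared Count"] = others["Similar Genres"].apply(len)

    return (
        others[others["Shared Count"] > 0]
        # Most shared genres first; ties broken by artist name.
        .sort_values(["Shared Count", "Artist"], ascending=[False, True])
        .rename(columns={"Artist": f"Similar Artist to {artist}"})
        .drop(columns="Genres")
        .reset_index(drop=True)
    )


df = pd.DataFrame({
    "Artist": ["A", "B", "C", "D", "E", "F"],
    "Genres": [
        ["Pop", "Dance Pop"],
        ["Rock, Rock n Roll"],
        ["Electronic"],
        ["Pop", "Dance Pop", "Electro Pop"],
        ["Pop"],
        ["Dance Pop"],
    ],
})
result = rank_by_shared_genres(df, "A")
print(result)
```

For artist A this ranks D first (two shared genres), followed by E and F (one each). Drop the 'Shared Count' column if only the two columns from the question are wanted.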