Asked by: 小点点

Getting song lyrics from Genius with BeautifulSoup (Python 3.8)


I am trying to scrape the lyrics of a song from Genius with BeautifulSoup, but when I try to print the lyrics I get no output at all. Here is my code:

import requests 
from bs4 import BeautifulSoup
songURL = requests.get("https://genius.com/Marshmello-and-bastille-happier-lyrics")
song = songURL.content
soup = BeautifulSoup(song, 'lxml')
lyrics = soup.find_all("section")
for lyr in lyrics:
    for lyr1 in lyrics.select("p"):
        print(lyr1.text)      

Why doesn't this work? Could someone take a look? I have been trying to get this working for a while.


3 answers

Anonymous user

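The server appears to return two versions of the page: in one, the lyrics sit inside a tag with class="song_body-lyrics"; in the other, they sit inside tags whose class starts with "Lyrics__Container".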

This script tries to handle both cases:

import requests 
from bs4 import BeautifulSoup

url = 'https://genius.com/Marshmello-and-bastille-happier-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

for tag in soup.select('div[class^="Lyrics__Container"], .song_body-lyrics p'):
    t = tag.get_text(strip=True, separator='\n')
    if t:
        print(t)

Prints:

[Intro]
Lately, I've been, I've been thinking
I want you to be happier, I want you to be happier
[Verse 1]

...and so on.
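
To check the combined selector without hitting the network, here is a minimal sketch that runs it against two tiny inline snippets. The snippets, the class suffix "-abc123", the placeholder lyric lines, and the helper name extract_lyrics are all hypothetical and only imitate the two layouts described above; the built-in "html.parser" is used instead of lxml so nothing extra needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-ins for the two page layouts.
OLD_LAYOUT = (
    '<div class="song_body-lyrics"><div class="lyrics">'
    "<p>[Intro]<br>Line one<br>Line two</p></div></div>"
)
NEW_LAYOUT = '<div class="Lyrics__Container-abc123">[Intro]<br>Line one<br>Line two</div>'

# Same selector as in the answer above: either layout may match.
SELECTOR = 'div[class^="Lyrics__Container"], .song_body-lyrics p'


def extract_lyrics(html):
    """Return the lyrics text found in html, one lyric line per output line."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for tag in soup.select(SELECTOR):
        # <br> separates text nodes, so joining them with "\n" restores the lines.
        text = tag.get_text(strip=True, separator="\n")
        if text:
            blocks.append(text)
    return "\n".join(blocks)


old_result = extract_lyrics(OLD_LAYOUT)
new_result = extract_lyrics(NEW_LAYOUT)
print(old_result)
```

Both snippets should yield the same three lines. Against the live site the class names may differ or change over time, so the selector may need adjusting.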

Anonymous user

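import requests
from bs4 import BeautifulSoup

songURL = requests.get("https://genius.com/Marshmello-and-bastille-happier-lyrics")
song = songURL.content
soup = BeautifulSoup(song, 'lxml')

final_lyrics = []
# The lyrics live in <div class="lyrics">, not in a <section> tag.
lyrics = soup.find('div', {'class': "lyrics"})
# find() returns None if the server sent a layout without this div.
if lyrics is not None:
    for i in lyrics.find_all('p'):
        final_lyrics.append(i.text)
        print(i.text)  # print the text rather than the raw <p> tag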

Anonymous user

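If you look at the actual HTML source, you will see that there is no section tag, so find_all("section") returns an empty list and the loop body never runs. Here is the actual structure: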

<div class="song_body column_layout" initial-content-for="song_body">
  <div class="column_layout-column_span column_layout-column_span--primary">
    <div class="song_body-lyrics">
      
        <h2 class="text_label text_label--gray text_label--x_small_text_size u-top_margin">Happier Lyrics</h2>
      
      <div initial-content-for="lyrics">
        <div class="lyrics">
          
            <!--sse-->
            <p>[Intro]<br>
Lately, I've been, I've been thinking<br>
I want you to be happier, I want you to be happier<br>
<br>
...
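
Given that structure, a minimal sketch of why the original loop prints nothing and how the lyrics can be reached through the "lyrics" div. The SAMPLE string is a shortened copy of the excerpt above with closing tags added by assumption (the excerpt is truncated), and "html.parser" is used so the sketch runs without lxml.

```python
from bs4 import BeautifulSoup

# Shortened copy of the excerpt; the closing tags are assumed, since the excerpt is cut off.
SAMPLE = """
<div class="song_body-lyrics">
  <div initial-content-for="lyrics">
    <div class="lyrics">
      <p>[Intro]<br>
Lately, I've been, I've been thinking<br>
I want you to be happier, I want you to be happier<br>
</p>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# There is no <section> tag, so this is empty and a loop over it never executes.
sections = soup.find_all("section")

# The lyrics paragraphs sit inside <div class="lyrics">.
container = soup.find("div", class_="lyrics")
lines = []
if container is not None:
    for p in container.find_all("p"):
        # stripped_strings yields each text fragment between <br> tags, trimmed.
        lines.extend(p.stripped_strings)

print(len(sections))
print("\n".join(lines))
```

Checking the result of find() for None before using it avoids an AttributeError when the server returns the other page layout.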