Asked by: 小点点

Getting specific data from a website with Python


I'm new to Python. I'm working on a GUI and need to get the top 250 movies from the IMDb website.

def clicked(self):
    movie=self.movie_name.text()
    
    url="https://www.imdb.com/chart/top/"
    response=requests.get(url)
    html_content=response.content
    soup=BeautifulSoup(html_content,"html.parser")

    movie_name = soup.find_all("td",{"class":"titleColumn"})
    for i in movie_name:
        i=i.text

        i=i.strip()

        i=i.replace("\n","")

        if (movie == i):
            self.yazialani.setText(i) 

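The output of this code looks like this: 6. Schindler's List(1993)7. The Lord of the Rings: The Return of the King(2003)8. Pulp Fiction(1994). For my project I only want the movie name, not the year and the rank. How should I change my code?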


2 answers

Anonymous user

A basic solution could be the following, assuming your strings have the form digits + . + name_of_movie + (YEAR):

a=["6. Schindler's List(1993)", "7. The Lord of the Rings: The Return of the King(2003)", "8. Pulp Fiction(1994)"]
just_names=[]
for name in a:
    i=0
    while True:
        if name[i]=='.':
            just_names.append(name[i+2:-6]) # i+2 skips the '.' and the space after it; -6 drops the trailing "(YEAR)"
            break
        i+=1
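
The same idea can also be written with a regular expression, which avoids the manual index loop and does not raise an error when a string has no '.' in it. This is only a sketch under the same assumption about the input shape (rank, dot, title, four-digit year in parentheses); the names `PATTERN` and `extract_title` are made up for this illustration.

```python
import re

# Assumed input shape: "<rank>. <title>(<year>)", as in the list `a` above.
PATTERN = re.compile(r"^\s*\d+\.\s*(?P<title>.+?)\s*\(\d{4}\)\s*$")

def extract_title(entry):
    """Return the title part of a 'rank. title(year)' string, or None if it does not match."""
    match = PATTERN.match(entry)
    return match.group("title") if match else None

a = ["6. Schindler's List(1993)",
     "7. The Lord of the Rings: The Return of the King(2003)",
     "8. Pulp Fiction(1994)"]

# Keep only the entries that matched the assumed shape.
just_names = [title for title in map(extract_title, a) if title is not None]
print(just_names)
```

Strings that do not fit the assumed shape are skipped here instead of stopping the loop with an IndexError.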

Anonymous user

The anchor tag contains only the movie name, so select the anchor tag's text for each td.

import requests
from bs4 import BeautifulSoup

url="https://www.imdb.com/chart/top/"
response=requests.get(url)
html_content=response.content
soup=BeautifulSoup(html_content,"html.parser")

movie_name = soup.find_all("td",{"class":"titleColumn"})

for i in movie_name:
    print(i.find("a").get_text(strip=True))

Output:

The Shawshank Redemption
The Godfather
The Godfather: Part II
The Dark Knight
12 Angry Men
Schindler's List
The Lord of the Rings: The Return of the King
Pulp Fiction
Il buono, il brutto, il cattivo
The Lord of the Rings: The Fellowship of the Ring
Fight Club
Forrest Gump
Inception
Star Wars: Episode V - The Empire Strikes Back
The Lord of the Rings: The Two Towers
The Matrix
Goodfellas
One Flew Over the Cuckoo's Nest
Shichinin no samurai
Se7en
La vita è bella
Cidade de Deus
The Silence of the Lambs
Hamilton
It's a Wonderful Life
Star Wars
Saving Private Ryan
Sen to Chihiro no kamikakushi
Gisaengchung
The Green Mile
Interstellar
Léon
The Usual Suspects
Seppuku
The Lion King
Back to the Future
The Pianist
Terminator 2: Judgment Day
American History X
Modern Times
Psycho
Gladiator
City Lights
The Departed
The Intouchables
Whiplash
The Prestige
...
...
..
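
To connect this back to the question, where the text typed by the user is compared against each title, here is a rough sketch of how the anchor-text approach might be wrapped in two small functions. It runs against a hand-written HTML snippet rather than the live page, so the markup in `SAMPLE_HTML` is an assumption modeled on the `td.titleColumn` cells used above, and the live page may be structured differently. The names `extract_titles` and `find_title` are hypothetical, and the case-insensitive comparison is a design choice that was not in the original code.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet imitating the td.titleColumn markup assumed above.
SAMPLE_HTML = """
<table>
  <tr><td class="titleColumn">
      6.
      <a href="#">Schindler's List</a>
      <span>(1993)</span>
  </td></tr>
  <tr><td class="titleColumn">
      8.
      <a href="#">Pulp Fiction</a>
      <span>(1994)</span>
  </td></tr>
</table>
"""

def extract_titles(html):
    """Collect the anchor text of every td.titleColumn cell."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for cell in soup.find_all("td", {"class": "titleColumn"}):
        link = cell.find("a")
        if link is not None:  # skip cells without a link
            titles.append(link.get_text(strip=True))
    return titles

def find_title(html, query):
    """Return the title equal to `query` (ignoring case and outer spaces), or None."""
    wanted = query.strip().casefold()
    for title in extract_titles(html):
        if title.casefold() == wanted:
            return title
    return None

print(extract_titles(SAMPLE_HTML))
print(find_title(SAMPLE_HTML, "pulp fiction"))
```

In the `clicked` method from the question, `response.content` would presumably take the place of `SAMPLE_HTML`, and a non-None result could be passed to `self.yazialani.setText(...)`.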