Python - BeautifulSoup - only one result returned

Asked by: 小点点

I am trying to scrape sports schedule data from the link below:

https://sport-tv-guide.live/live/darts

I am using the following code:

import requests
from bs4 import BeautifulSoup


def makesoup(url):
    # Fetch the page and parse it with the lxml parser
    page = requests.get(url)
    return BeautifulSoup(page.text, "lxml")


def matchscrape(g_data):
    for match in g_data:
        # Extract the time cell from each listData block
        datetimes = match.find('div', class_='main time col-sm-2 hidden-xs').text.strip()
        print("DateTimes; ", datetimes)
        print('-' * 80)


def matches():
    soup = makesoup(url="https://sport-tv-guide.live/live/darts")
    matchscrape(g_data=soup.findAll("div", {"class": "listData"}))

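The problem I am running into is that only the first result is returned,

whereas two values should be output.

I printed the HTML that is received when running: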

def matches():
    soup = makesoup(url="https://sport-tv-guide.live/live/darts")
    matchscrape(g_data=soup.findAll("div", {"class": "listData"}))

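For some reason, the received HTML seems to contain only the first result. That would explain why only the first result is returned: it is the only one that can be found in the HTML that was received. What I am not sure about is why BeautifulSoup does not output the entire HTML, so that all of the results can be output.

Thanks to anyone who can assist with or solve this issue.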

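Anonymous user

There is only one time listed for today, but you can get tomorrow's times by first sending a POST request with the desired date and then reloading the page.

For example: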

import requests
from bs4 import BeautifulSoup


url = 'https://sport-tv-guide.live/live/darts'
select_date_url = 'https://sport-tv-guide.live/ajaxdata/selectdate'

with requests.Session() as s:
    # print times for today:
    print('Times for today:')
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    for t in soup.select('.time'):
        print(t.get_text(strip=True, separator=' '))

    # select tomorrow:
    s.post(select_date_url, data={'d': '2020-07-19'})

    # print times for tomorrow:
    print('Times for 2020-07-19:')
    soup = BeautifulSoup(s.get(url).content, 'html.parser')
    for t in soup.select('.time'):
        print(t.get_text(strip=True, separator=' '))

Prints:

Times for today:
Darts 17:05
Times for 2020-07-19:
Darts 19:05
Darts 19:05
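
The example above hard-codes the date sent in the POST request. As a minimal sketch, the payload could instead be built from the current date. This assumes the endpoint keeps accepting an ISO date (YYYY-MM-DD) under the key 'd', as in the answer; select_date_payload is a hypothetical helper name and not part of the site or of requests.

```python
from datetime import date, timedelta


def select_date_payload(today=None, days_ahead=1):
    """Build the form data for the date-selection POST request.

    Assumption: the endpoint expects the key 'd' holding an ISO date
    (YYYY-MM-DD), as in the hard-coded example from the answer.
    """
    if today is None:
        today = date.today()
    target = today + timedelta(days=days_ahead)
    return {"d": target.isoformat()}


# With a fixed "today" the result is reproducible.
print(select_date_payload(today=date(2020, 7, 18)))  # {'d': '2020-07-19'}
```

The returned dictionary could then be passed as data= to s.post(select_date_url, ...) in place of the literal date.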

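Anonymous user

Your matchscrape function is wrong. Instead of match.find, which returns only the first matching item, you should use match.findAll, the same way you already do in the matches function. Then iterate over the datetimes that were found, as in the example below.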

def matchscrape(g_data):
    for match in g_data:
        datetimes = match.findAll('div', class_='main time col-sm-2 hidden-xs')
        for datetime in datetimes:
            print("DateTimes; ", datetime.text.strip())
            print('-' * 80)

The second thing is how the HTML page is parsed. The page is written in HTML, so you should probably use BeautifulSoup(page.text, 'html.parser') instead of lxml.
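
To make the difference between find and find_all concrete, here is a small sketch that runs on inline markup instead of the live site. The HTML is made up for illustration and only imitates the class names from the question; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration; it only imitates the class names in the question.
SAMPLE_HTML = """
<div class="listData">
  <div class="main time col-sm-2 hidden-xs">Darts 19:05</div>
  <div class="main time col-sm-2 hidden-xs">Darts 20:15</div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
block = soup.find("div", class_="listData")

# find() stops at the first matching element.
first = block.find("div", class_="time").get_text(strip=True)

# find_all() returns every match (findAll is the older alias of the same method).
every = [div.get_text(strip=True) for div in block.find_all("div", class_="time")]

print(first)
print(every)
```

Matching on the single class "time" is a deliberate choice here: passing the full string 'main time col-sm-2 hidden-xs' to class_ only matches when the attribute value is exactly that string, so it breaks if the site reorders or adds classes.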

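Anonymous user

As @ycmelon has already answered, I also get only one timestamp. However, there are other things that could cause this kind of problem. In such cases the website often has dynamic content, and that content is not always loaded correctly by requests.

If you can confirm that the problem really is requests not fetching the site correctly, try requests_html, which opens a session that can also load the dynamic content: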

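from requests_html import HTMLSession
from bs4 import BeautifulSoup

# LINK is a placeholder for the URL of the page to fetch
session = HTMLSession()
response = session.get(LINK)
# Execute the page's JavaScript; requests_html downloads Chromium on first use
response.html.render()
html = BeautifulSoup(response.html.html, "html.parser")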
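
Before switching tools, it can help to check whether the entries are already present in the raw HTML that was received. The sketch below counts elements carrying a given class using only the standard library. ClassCounter and count_class are hypothetical helper names, and the sample string is made-up markup standing in for the response text.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    counter = ClassCounter(class_name)
    counter.feed(html)
    return counter.count


# Made-up markup standing in for the text of a real response
sample = '<div class="listData"><div class="main time">Darts 17:05</div></div>'
print(count_class(sample, "time"))
```

If the count for the real response matches what the scraper prints, the server sent only that many entries and the parser is not dropping anything; if the browser shows more, the extra entries are likely added after the initial page load.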
