Asked by: 小点点

Iterating over div elements with BeautifulSoup


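I want to iterate over this table, get each team's name, wins, and losses, and then store them in a CSV file, a list, or a JSON file. With the code below, I only get the HTML of the first row of the table, even though I use a for loop: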

from bs4 import BeautifulSoup as bs 
import requests 
from requests import get 
import pandas as pd 
import json 
import time 
from time import sleep



url = 'https://www.basketball-reference.com/international/euroleague/2020.html' 

time.sleep(2)
source = requests.get(url).text

time.sleep(4)
soup = bs(source,'lxml')

time.sleep(2)
for item in soup.find_all('div' , class_='table_outer_container'):
    #prints only first item
    team=item.div.table.tbody.tr
    print(team)

Structure of the table element:


<div class="table_outer_container">
      <div class="overthrow table_container" id="div_elg_standings">
      
  <table class="sortable stats_table now_sortable" id="elg_standings" data-cols-to-freeze="1"><caption>EuroLeague Standings Table</caption>
   <colgroup><col><col><col></colgroup>
   <thead>
      
      <tr class="over_header"><th></th>
         <th aria-label="" data-stat="Regular Season" colspan="2" class=" over_header center">Regular Season</th>
      </tr>
      

      
      <tr>
         <th aria-label="&nbsp;" data-stat="team" scope="col" class=" poptip center">&nbsp;</th>
         <th aria-label="Wins" data-stat="wins|Regular Season" scope="col" class=" poptip right" data-tip="Wins" data-over-header="Regular Season">W</th>
         <th aria-label="Losses" data-stat="losses|Regular Season" scope="col" class=" poptip right" data-tip="Losses" data-over-header="Regular Season">L</th>
      </tr>
      
   </thead>
   <tbody>
<tr data-row="0"><th scope="row" class="left " data-stat="team"><a href="/international/teams/anadolu-efes/2020.html">Anadolu Efes</a></th><td class="right " data-stat="wins|Regular Season">24</td><td class="right " data-stat="losses|Regular Season">4</td></tr>
<tr data-row="1"><th scope="row" class="left " data-stat="team"><a href="/international/teams/real-madrid/2020.html">Real Madrid</a></th><td class="right " data-stat="wins|Regular Season">22</td><td class="right " data-stat="losses|Regular Season">6</td></tr>
<tr data-row="2"><th scope="row" class="left " data-stat="team"><a href="/international/teams/barcelona/2020.html">FC Barcelona</a></th><td class="right " data-stat="wins|Regular Season">22</td><td class="right " data-stat="losses|Regular Season">6</td></tr>
<tr data-row="3"><th scope="row" class="left " data-stat="team"><a href="/international/teams/cska-moscow/2020.html">CSKA Moscow</a></th><td class="right " data-stat="wins|Regular Season">19</td><td class="right " data-stat="losses|Regular Season">9</td></tr>
<tr data-row="4"><th scope="row" class="left " data-stat="team"><a href="/international/teams/maccabi-tel-aviv/2020.html">Maccabi FOX Tel Aviv</a></th><td class="right " data-stat="wins|Regular Season">19</td><td class="right " data-stat="losses|Regular Season">9</td></tr>
<tr data-row="5"><th scope="row" class="left " data-stat="team"><a href="/international/teams/panathinaikos/2020.html">Panathinaikos OPAP</a></th><td class="right " data-stat="wins|Regular Season">14</td><td class="right " data-stat="losses|Regular Season">14</td></tr>
<tr data-row="6"><th scope="row" class="left " data-stat="team"><a href="/international/teams/ulker-fenerbahce/2020.html">Fenerbahçe Beko</a></th><td class="right " data-stat="wins|Regular Season">13</td><td class="right " data-stat="losses|Regular Season">15</td></tr>
<tr data-row="7"><th scope="row" class="left " data-stat="team"><a href="/international/teams/khimki/2020.html">Khimki</a></th><td class="right " data-stat="wins|Regular Season">13</td><td class="right " data-stat="losses|Regular Season">15</td></tr>
<tr data-row="8"><th scope="row" class="left " data-stat="team"><a href="/international/teams/vitoria/2020.html">Kirolbet Baskonia</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="9"><th scope="row" class="left " data-stat="team"><a href="/international/teams/olympiakos/2020.html">Olympiacos</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="10"><th scope="row" class="left " data-stat="team"><a href="/international/teams/zalgiris/2020.html">Žalgiris</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="11"><th scope="row" class="left " data-stat="team"><a href="/international/teams/valencia/2020.html">Valencia Basket</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="12"><th scope="row" class="left " data-stat="team"><a href="/international/teams/milano/2020.html">AX Armani Exchange Olimpia</a></th><td class="right " data-stat="wins|Regular Season">12</td><td class="right " data-stat="losses|Regular Season">16</td></tr>
<tr data-row="13"><th scope="row" class="left " data-stat="team"><a href="/international/teams/red-star/2020.html">Crvena zvezda mts</a></th><td class="right " data-stat="wins|Regular Season">11</td><td class="right " data-stat="losses|Regular Season">17</td></tr>
<tr data-row="14"><th scope="row" class="left " data-stat="team"><a href="/international/teams/villeurbanne/2020.html">LDLC ASVEL</a></th><td class="right " data-stat="wins|Regular Season">10</td><td class="right " data-stat="losses|Regular Season">18</td></tr>
<tr data-row="15"><th scope="row" class="left " data-stat="team"><a href="/international/teams/alba-berlin/2020.html">Alba Berlin</a></th><td class="right " data-stat="wins|Regular Season">9</td><td class="right " data-stat="losses|Regular Season">19</td></tr>
<tr data-row="16"><th scope="row" class="left " data-stat="team"><a href="/international/teams/triumph-moscow/2020.html">Zenit Saint Petersburg</a></th><td class="right " data-stat="wins|Regular Season">8</td><td class="right " data-stat="losses|Regular Season">20</td></tr>
<tr data-row="17"><th scope="row" class="left " data-stat="team"><a href="/international/teams/bayern-muenchen/2020.html">Bayern Munich</a></th><td class="right " data-stat="wins|Regular Season">8</td><td class="right " data-stat="losses|Regular Season">20</td></tr>

</tbody></table>

      </div>
   </div>


I would greatly appreciate help with iterating over this element correctly to get the team names, wins, and losses. Thank you in advance.


1 answer

Anonymous user

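Try the following. In your loop, item.div.table.tbody.tr returns only the first <tr> under the container, so only one row is printed. Iterating over the team links inside the container reaches every row: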

Code

import requests
from bs4 import BeautifulSoup

url = 'https://www.basketball-reference.com/international/euroleague/2020.html'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')
teams = soup.find('div', class_='table_outer_container')

for team in teams.find_all('a'):
    # each team link sits in a <th>; the enclosing <tr> also holds the wins and losses cells
    row = team.parent.parent
    team_name = team.text
    wins = row.find('td', {'data-stat': 'wins|Regular Season'}).text
    losses = row.find('td', {'data-stat': 'losses|Regular Season'}).text
    print(team_name, wins, losses)

Output

Anadolu Efes 24 4
Real Madrid 22 6
FC Barcelona 22 6
CSKA Moscow 19 9
Maccabi FOX Tel Aviv 19 9
Panathinaikos OPAP 14 14
Fenerbahçe Beko 13 15
Khimki 13 15
Kirolbet Baskonia 12 16
Olympiacos 12 16
Žalgiris 12 16
Valencia Basket 12 16
AX Armani Exchange Olimpia 12 16
Crvena zvezda mts 11 17
LDLC ASVEL 10 18
Alba Berlin 9 19
Zenit Saint Petersburg 8 20
Bayern Munich 8 20
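
Since the question also asks for a list, CSV, or JSON, here is a minimal sketch of one way to collect the rows and serialize them. It iterates over the <tr> elements in the table body rather than over the links. The names SAMPLE_HTML, parse_standings, and rows_to_csv are hypothetical helpers introduced for illustration, and the sample HTML is just two rows copied from the question so the sketch runs without a network request. For the live page, you would pass requests.get(url).text to parse_standings instead, assuming the table is present in the fetched HTML as shown in the question.

```python
import csv
import io
import json

from bs4 import BeautifulSoup

# Two rows copied from the table in the question, so the sketch runs offline.
SAMPLE_HTML = """
<div class="table_outer_container">
  <table class="sortable stats_table now_sortable" id="elg_standings">
    <thead><tr><th data-stat="team">&nbsp;</th><th>W</th><th>L</th></tr></thead>
    <tbody>
<tr data-row="0"><th scope="row" class="left " data-stat="team"><a href="/international/teams/anadolu-efes/2020.html">Anadolu Efes</a></th><td class="right " data-stat="wins|Regular Season">24</td><td class="right " data-stat="losses|Regular Season">4</td></tr>
<tr data-row="1"><th scope="row" class="left " data-stat="team"><a href="/international/teams/real-madrid/2020.html">Real Madrid</a></th><td class="right " data-stat="wins|Regular Season">22</td><td class="right " data-stat="losses|Regular Season">6</td></tr>
    </tbody>
  </table>
</div>
"""


def parse_standings(html):
    """Return a list of {'team', 'wins', 'losses'} dicts, one per table body row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="elg_standings")
    rows = []
    for tr in table.tbody.find_all("tr"):
        team = tr.find("th", {"data-stat": "team"})
        wins = tr.find("td", {"data-stat": "wins|Regular Season"})
        losses = tr.find("td", {"data-stat": "losses|Regular Season"})
        # Skip any row that lacks one of the three cells.
        if not (team and wins and losses):
            continue
        rows.append({
            "team": team.get_text(strip=True),
            "wins": int(wins.get_text(strip=True)),
            "losses": int(losses.get_text(strip=True)),
        })
    return rows


def rows_to_csv(rows):
    """Serialize the row dicts to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["team", "wins", "losses"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


standings = parse_standings(SAMPLE_HTML)
csv_text = rows_to_csv(standings)
# ensure_ascii=False keeps names such as Žalgiris readable in the JSON text.
json_text = json.dumps(standings, ensure_ascii=False, indent=2)

print(csv_text)
print(json_text)
```

To write files instead of strings, the same writer can be pointed at open('standings.csv', 'w', newline='', encoding='utf-8'), and json.dump can write to a file opened the same way; the file name here is only an example.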