Asked by: 小点点

Scraping multiple data types into the same DataFrame


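I am trying to scrape the following site: https://www.basketball-reference.com/players/a/

My end goal is to build a DataFrame from that table, plus a new column containing each player's index. For example, for the player at the top of the table, this would be Abdelal01.

My current attempt: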

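from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.basketball-reference.com/players/a"
# this is the HTML from the given URL
html = urlopen(url)
# name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(html, "html.parser")

# column names come from the <th> cells of the first table row
headers = [th.getText() for th in soup.findAll('tr')[0].findAll('th')]

rows = soup.findAll('tr')

# the player name is stored in the <th> cell of each row
player_names = [[th.getText() for th in row.findAll('th')]
                for row in rows]

names = pd.DataFrame(player_names, columns=headers)
names.head(10)

# the remaining columns are stored in the <td> cells of each row
player_stats = [[td.getText() for td in row.findAll('td')]
                for row in rows]

stats = pd.DataFrame(player_stats, columns=headers[1:])
stats['Player'] = names['Player']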

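In effect, this rebuilds the whole table, but without the URLs that point to each player. Is there a simpler way to do this than building two DataFrames, given that the two parts sit under different tags in the HTML?

And what is the best way to collect the index for each player?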


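1 answer

Anonymous user

The simplest way to extract table data is with the pandas package. It also makes the data easy to manipulate afterwards.

The read_html() function scrapes every table it finds on the page and returns them as a list of DataFrames.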

import pandas as pd
df = pd.read_html('https://www.basketball-reference.com/players/a/')[0]
df
                   Player  From    To  Pos    Ht   Wt         Birth Date                  Colleges
0          Alaa Abdelnaby  1991  1995  F-C  6-10  240      June 24, 1968                      Duke
1         Zaid Abdul-Aziz  1969  1978  C-F   6-9  235      April 7, 1946                Iowa State
2    Kareem Abdul-Jabbar*  1970  1989    C   7-2  225     April 16, 1947                      UCLA
3      Mahmoud Abdul-Rauf  1991  2001    G   6-1  162      March 9, 1969                       LSU
4       Tariq Abdul-Wahad  1998  2003    F   6-6  223   November 3, 1974  Michigan, San Jose State
...                   ...   ...   ...  ...   ...  ...                ...                       ...
161         Dennis Awtrey  1971  1982    C  6-10  235  February 22, 1948               Santa Clara
162          Gustavo Ayón  2012  2014    C  6-10  250      April 1, 1985                       NaN
163            Jeff Ayres  2010  2016    F   6-9  240     April 29, 1987             Arizona State
164         Deandre Ayton  2019  2020    C  6-11  250      July 23, 1998                   Arizona
165      Kelenna Azubuike  2007  2012    G   6-5  220  December 16, 1983                  Kentucky
df['Player']
0            Alaa Abdelnaby
1           Zaid Abdul-Aziz
2      Kareem Abdul-Jabbar*
3        Mahmoud Abdul-Rauf
4         Tariq Abdul-Wahad
               ...         
161           Dennis Awtrey
162            Gustavo Ayón
163              Jeff Ayres
164           Deandre Ayton
165        Kelenna Azubuike
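
The output above contains only the cell text, so the player index asked about in the question is still missing. One possible way to get it is to read each row's link with BeautifulSoup and take the file name from its href. The following is only a minimal sketch: the SAMPLE_HTML snippet is a made-up, trimmed-down imitation of the page's table, the href pattern /players/a/<id>.html is an assumption about the site's markup, and player_id_from_href, table_with_ids and the player_id column are hypothetical names chosen for illustration.

```python
import posixpath
from urllib.parse import urlparse

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample that imitates the page's table; the real markup
# and hrefs are assumptions and should be checked against the live page.
SAMPLE_HTML = """
<table>
  <thead>
    <tr><th>Player</th><th>From</th><th>To</th></tr>
  </thead>
  <tbody>
    <tr><th><a href="/players/a/abdelal01.html">Alaa Abdelnaby</a></th><td>1991</td><td>1995</td></tr>
    <tr><th><a href="/players/a/abdulza01.html">Zaid Abdul-Aziz</a></th><td>1969</td><td>1978</td></tr>
  </tbody>
</table>
"""


def player_id_from_href(href):
    """Return the file name of a link without its extension."""
    filename = posixpath.basename(urlparse(href).path)
    return posixpath.splitext(filename)[0]


def table_with_ids(html):
    """Build one DataFrame holding the cell text plus a player_id column."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    headers = [th.get_text(strip=True) for th in table.thead.find_all("th")]

    records = []
    for tr in table.tbody.find_all("tr"):
        # <th> and <td> cells are read together, so one pass covers the row
        cells = tr.find_all(["th", "td"])
        record = dict(zip(headers, (c.get_text(strip=True) for c in cells)))
        # the first link in the row is assumed to point at the player page
        link = tr.find("a", href=True)
        record["player_id"] = player_id_from_href(link["href"]) if link else None
        records.append(record)
    return pd.DataFrame(records)


df = table_with_ids(SAMPLE_HTML)
print(df)
```

Reading the th and td cells in a single pass avoids building two DataFrames. Every value comes back as a string here, so numeric columns would still need converting, for example with pd.to_numeric. For the live page, the same function could be given the HTML returned by urlopen, provided the table really has thead and tbody sections. pandas 1.5 and later also accept extract_links="body" in read_html, which returns each body cell as a (text, href) tuple and may be enough on its own.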