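I'm trying to parse a web page with Beautiful Soup (for the first time in my life) and I've run into a strange error. The HTML has tags nested inside other tags, and I keep getting this error:
AttributeError: 'NoneType' object has no attribute 'text'
The HTML is structured as follows: the whole grid of items on the page sits in a div with class "properties-previews"; inside it, each item has a div with class "preview", which in turn contains two more divs: "preview-media" for the photo and "preview-content" for the text information I need to parse. The "preview-content" div holds an [a] tag containing two spans with the item's price and area, plus an [h2] tag with the county, which I also need.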
<div class="properties-previews">
  <div class="preview">
    <div class="preview-media">
    <div class="preview-content">
      <a href="/properties/1042-us-highway-1-hancock-me-04634/1330428" class="preview__link">
        <span class="preview__price">$89,900</span>
        <span class="preview__size">1 ac</span>
        <div class="preview__subtitle">
          <h2 class="-g-truncated preview__subterritory">Hancock County
          </h2>
          <span class="preview__extended">-- sq ft</span>
        </div>
      </a>
So I'm trying to get $89,900 from preview_price, 1 ac from preview_size, and Hancock County from preview_subtitle. So far my Python code looks like this (I've omitted the imports and the request):
landplots = soup.find_all('div', class_ = 'properties-previews')
for l in landplots:
    plot_price = l.find('span', {"class": 'preview_price'})
    plot_square = l.find('span', {"class": 'preview_size'})
    plot_county = l.find('h2', class_ = '-g-truncated preview__subterritory').text
    plot_location = l.find('span', class_ = 'preview__locality -g-truncated').text
    print(plot_price).text
    print(plot_county)
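What am I doing wrong? I've come to understand that when one tag is nested inside another there should be some special syntax for getting at the text, but the error suggests there is no text at all (in the two values I'm printing), which confuses me. Please help!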
Each value sits in a text node, so you can call .find_next(text=True) to extract the data you need:
html = '''
<div class="properties-previews">
  <div class="preview">
    <div class="preview-content">
      <a class="preview__link" href="/properties/1042-us-highway-1-hancock-me-04634/1330428">
        <span class="preview__price">
          $89,900
        </span>
        <span class="preview__size">
          1 ac
        </span>
        <div class="preview__subtitle">
          <h2 class="-g-truncated preview__subterritory">
            Hancock County
          </h2>
          <span class="preview__extended">
            -- sq ft
          </span>
        </div>
      </a>
    </div>
  </div>
</div>
'''
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
# print(soup.prettify())
# Iterate over each listing's text container rather than the whole grid
landplots = soup.find_all('div', class_='preview-content')
for l in landplots:
    # find_next(text=True) returns the first text node after the tag opens
    plot_price = l.find('span', {"class": 'preview__price'}).find_next(text=True).get_text(strip=True)
    plot_square = l.find('span', {"class": 'preview__size'}).find_next(text=True).get_text(strip=True)
    plot_county = l.find('h2', class_='-g-truncated preview__subterritory').find_next(text=True).text
    print(plot_price)
    print(plot_square)
Output:
$89,900
1 ac
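A side note on the AttributeError in the question: find() returns None when nothing matches (for example a class typed with one underscore, 'preview_price', where the markup has two, 'preview__price'), and print() itself returns None, so print(plot_price).text fails in the same way. Below is a minimal sketch of one way to guard against that, assuming a trimmed copy of the markup. The helper name text_or_none and the SAMPLE_HTML string are made up for illustration and are not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed copy of one listing from the question's markup.
SAMPLE_HTML = """
<div class="preview-content">
  <a class="preview__link" href="#">
    <span class="preview__price">$89,900</span>
    <span class="preview__size">1 ac</span>
    <div class="preview__subtitle">
      <h2 class="-g-truncated preview__subterritory">Hancock County
      </h2>
    </div>
  </a>
</div>
"""


def text_or_none(parent, name, css_class):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
card = soup.find("div", class_="preview-content")

price = text_or_none(card, "span", "preview__price")
size = text_or_none(card, "span", "preview__size")
county = text_or_none(card, "h2", "preview__subterritory")
# Single underscore: no such class in the markup, so find() yields None.
missing = text_or_none(card, "span", "preview_price")

print(price, size, county, missing)
```

With this guard a missing element shows up as None in the result instead of raising, which makes a mistyped class name easier to spot.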
Update: it also works without any problems against the HTML DOM of the live page:
import requests
from bs4 import BeautifulSoup
url = 'https://www.landsearch.com/industrial/united-states/p1'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'lxml')
landplots = soup.find_all('div', class_='preview-content')
for l in landplots:
    plot_price = l.find('span', {"class": 'preview__price'}).find_next(text=True).get_text(strip=True)
    plot_square = l.find('span', {"class": 'preview__size'}).find_next(text=True).get_text(strip=True)
    plot_county = l.find('h2', class_='-g-truncated preview__subterritory').find_next(text=True).text
    print(plot_price)
    print(plot_square)
Output:
$89,900
1 ac
$995,000
2.32 ac
$85,000
0.93 ac
$888,000
11 ac
$599,000
21.6 ac
$225,000
3.72 ac
$100,000
6.5 ac
$75,000
4.48 ac
$749,000
8.2 ac
$225,000
84.5 ac
$225,000
84.5 ac
$275,000
29 ac
$275,000
29 ac
$40,000
0.22 ac
$2,330,000
2.8 ac
$535,000
3.71 ac
$169,900
34 ac
$499,000
1 ac
$299,000
2.53 ac
$299,000
2.53 ac
$299,000
2.53 ac
$799,000
2 ac
$199,000
0.79 ac
$997,600
3.27 ac
$699,000
1.71 ac
$529,000
1 ac
$499,900
1 ac
$50,000
1.14 ac
$250,000
55 ac
$50,000
1.14 ac
$11,000,000
31.4 ac
$1,200,000
1.68 ac
$94,900
85 ac
$896,000
2.38 ac
$189,000
1 ac
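Printing price and size on alternating lines makes it hard to see which values belong to the same listing. As a rough sketch (not the only way to do it), each "preview-content" div could be turned into one record. The two-card sample below is made up for illustration: the second card deliberately has no size span and "Example County" is a placeholder. The names parse_cards and FIELDS are hypothetical too.

```python
from bs4 import BeautifulSoup

# Hypothetical two-card sample; the second card deliberately lacks a size span.
SAMPLE_HTML = """
<div class="properties-previews">
  <div class="preview">
    <div class="preview-content">
      <span class="preview__price">$89,900</span>
      <span class="preview__size">1 ac</span>
      <h2 class="-g-truncated preview__subterritory">Hancock County</h2>
    </div>
  </div>
  <div class="preview">
    <div class="preview-content">
      <span class="preview__price">$995,000</span>
      <h2 class="-g-truncated preview__subterritory">Example County</h2>
    </div>
  </div>
</div>
"""

# Field name -> (tag name, one CSS class to match on).
FIELDS = {
    "price": ("span", "preview__price"),
    "size": ("span", "preview__size"),
    "county": ("h2", "preview__subterritory"),
}


def parse_cards(html):
    """Return one dict per listing; fields that are missing become None."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for card in soup.find_all("div", class_="preview-content"):
        record = {}
        for field, (name, css_class) in FIELDS.items():
            tag = card.find(name, class_=css_class)
            record[field] = tag.get_text(strip=True) if tag is not None else None
        records.append(record)
    return records


records = parse_cards(SAMPLE_HTML)
for record in records:
    print(record)
```

Looping over the per-listing divs rather than the outer "properties-previews" grid matters here: find() only returns the first match, so searching inside the grid as a whole would yield just the first listing's values.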