Asked by: 小点点

Beautiful Soup - getting text from a tag nested inside another tag


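I am trying to parse a web page with Beautiful Soup (for the first time in my life) and I have run into a strange error. The HTML has tags nested inside other tags, and I keep getting this error: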

AttributeError: 'NoneType' object has no attribute 'text'

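The HTML is structured as follows: the whole grid of items on the page is in a div with class "properties-previews". Inside it, each item has a div with class "preview", which contains two more divs: "preview-media" for the photo and "preview-content" for the text information I need to parse. The "preview-content" div contains an [a] tag holding two [span] tags with the item's price and area, plus an [h2] tag with the county, which I also need.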

<div class="properties-previews">
    <div class="preview">
        <div class="preview-media">
        <div class="preview-content">
            <a href="/properties/1042-us-highway-1-hancock-me-04634/1330428"
               class="preview__link">
                <span class="preview__price">$89,900</span>
                <span class="preview__size">1 ac</span>
                <div class="preview__subtitle">
                    <h2 class="-g-truncated preview__subterritory">Hancock County
                    </h2>
                    <span class="preview__extended">-- sq ft</span>
                </div>
            </a>

So I am trying to get $89,900 from preview_price, 1 ac from preview_size, and Hancock County from preview_subtitle. So far my Python code looks like this (I have omitted all the imports and the request):

landplots = soup.find_all('div', class_ = 'properties-previews')

for l in landplots:
  plot_price = l.find('span', {"class": 'preview_price'})
  plot_square = l.find('span', {"class": 'preview_size'})
  plot_county = l.find('h2', class_ = '-g-truncated preview__subterritory').text
  plot_location = l.find('span', class_ = 'preview__locality -g-truncated').text

  print(plot_price).text
  print(plot_county)

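What am I doing wrong? I had started to think that once a tag is inside another tag there must be some special syntax for getting at the words, but the error says there is no text at all (in both of the prints I am doing), which confuses me. Please help!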


1 answer

Anonymous user

Each value sits in a text node, so you can call .find_next(text=True) on the enclosing tag to extract the data you need.

html='''
<div class="properties-previews">
 <div class="preview">
  <div class="preview-content">
   <a class="preview__link" href="/properties/1042-us-highway-1-hancock-me-04634/1330428">    
    <span class="preview__price">
     $89,900
    </span>
    <span class="preview__size">
     1 ac
    </span>
    <div class="preview__subtitle">
     <h2 class="-g-truncated preview__subterritory">
      Hancock County
     </h2>
     <span class="preview__extended">
      -- sq ft
     </span>
    </div>
   </a>
  </div>
 </div>
</div>
'''

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
#print(soup.prettify())

# Each listing's text fields live in a div with class "preview-content".
landplots = soup.find_all('div', class_='preview-content')

for l in landplots:
  # find_next(text=True) returns the first text node after the tag;
  # get_text(strip=True) trims the surrounding whitespace.
  plot_price = l.find('span', {"class": 'preview__price'}).find_next(text=True).get_text(strip=True)
  plot_square = l.find('span', {"class": 'preview__size'}).find_next(text=True).get_text(strip=True)
  plot_county = l.find('h2', class_='-g-truncated preview__subterritory').find_next(text=True).get_text(strip=True)

  print(plot_price)
  print(plot_square)

Output:

$89,900
1 ac
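
For context on the original AttributeError: the question's code looks up 'preview_price' with a single underscore, while the class in the HTML is 'preview__price' with a double underscore. When find() has no match it returns None, and calling .text on None raises that error. The sketch below reproduces this on a trimmed copy of the markup. The helper name text_or_none is hypothetical, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the listing markup from the question.
html = """
<div class="preview-content">
  <a class="preview__link" href="#">
    <span class="preview__price">$89,900</span>
    <span class="preview__size">1 ac</span>
  </a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
card = soup.find("div", class_="preview-content")

# Single underscore: no such class in the markup, so find() returns None.
missing = card.find("span", class_="preview_price")
# Double underscore: matches the span.
found = card.find("span", class_="preview__price")

print(missing)                     # None
print(found.get_text(strip=True))  # $89,900


def text_or_none(parent, name, css_class):
    """Hypothetical helper: return the stripped text, or None if the tag is absent."""
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag is not None else None
```

With a guard like this, a listing that lacks one of the fields yields None for that field instead of stopping the loop.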

Update: it also works fine, without any problems, against the HTML DOM of the live page.

import requests
from bs4 import BeautifulSoup

url = 'https://www.landsearch.com/industrial/united-states/p1'
res = requests.get(url)

# The 'lxml' parser requires the lxml package to be installed.
soup = BeautifulSoup(res.content, 'lxml')

# Each listing's text fields live in a div with class "preview-content".
landplots = soup.find_all('div', class_='preview-content')

for l in landplots:
  # Take the first text node after each tag and trim its whitespace.
  plot_price = l.find('span', {"class": 'preview__price'}).find_next(text=True).get_text(strip=True)
  plot_square = l.find('span', {"class": 'preview__size'}).find_next(text=True).get_text(strip=True)
  plot_county = l.find('h2', class_='-g-truncated preview__subterritory').find_next(text=True).get_text(strip=True)

  print(plot_price)
  print(plot_square)

Output:

$89,900
1 ac    
$995,000
2.32 ac 
$85,000 
0.93 ac 
$888,000
11 ac   
$599,000
21.6 ac 
$225,000
3.72 ac 
$100,000
6.5 ac  
$75,000
4.48 ac
$749,000
8.2 ac
$225,000
84.5 ac
$225,000
84.5 ac
$275,000
29 ac
$275,000
29 ac
$40,000
0.22 ac
$2,330,000
2.8 ac
$535,000
3.71 ac
$169,900
34 ac
$499,000
1 ac
$299,000
2.53 ac
$299,000
2.53 ac
$299,000
2.53 ac
$799,000
2 ac
$199,000
0.79 ac
$997,600
3.27 ac
$699,000
1.71 ac
$529,000
1 ac
$499,900
1 ac
$50,000
1.14 ac
$250,000
55 ac
$50,000
1.14 ac
$11,000,000
31.4 ac
$1,200,000
1.68 ac
$94,900
85 ac
$896,000
2.38 ac
$189,000
1 ac
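
As a possible variation on the loop above, the same fields can be collected into a list of dictionaries using CSS selectors (select and select_one) instead of find. This is only a sketch under stated assumptions: it runs against an inline sample rather than the live site, the second listing deliberately omits its h2 to show the None case, and the names SAMPLE_HTML and parse_cards are made up for illustration.

```python
from bs4 import BeautifulSoup

# Inline sample, not fetched from the site. The second card has no <h2>
# on purpose, to illustrate how a missing field is handled.
SAMPLE_HTML = """
<div class="properties-previews">
  <div class="preview-content">
    <span class="preview__price">$89,900</span>
    <span class="preview__size">1 ac</span>
    <h2 class="-g-truncated preview__subterritory">Hancock County</h2>
  </div>
  <div class="preview-content">
    <span class="preview__price">$995,000</span>
    <span class="preview__size">2.32 ac</span>
  </div>
</div>
"""


def parse_cards(html):
    """Return one dict per listing; absent fields become None."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.preview-content"):

        def pick(selector):
            # select_one() returns None when nothing matches the selector.
            tag = card.select_one(selector)
            return tag.get_text(strip=True) if tag is not None else None

        rows.append({
            "price": pick("span.preview__price"),
            "size": pick("span.preview__size"),
            "county": pick("h2.preview__subterritory"),
        })
    return rows


rows = parse_cards(SAMPLE_HTML)
for row in rows:
    print(row)
```

Keeping the rows as dictionaries makes it straightforward to write them to CSV or load them into a table later, which printing each value on its own line does not.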