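I am scraping a website with car offers, where each listing has a model, price, mileage, and so on. I am trying to extract the mileage, which is just Cyrillic text inside a tag. However, whatever I do, it says there is no text there.
Here is the HTML code: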
<tr class="odd">
<td style="padding-right:0px;" valign="top" width="200">
<a href="offer/5ef37bc31cd64405946eff12">
<img align="left" border="0" onmouseout="UnTip()"
onmouseover="Tip('<div ><div style=\'float:left;\' class=\'ver15black\'><b>Citroen Xsara Picasso 1.6 hdi</b></div><div style=\'float:right;text-align:right;\'><span class=\'ver20black\'><strong>3,500</strong></span><br>ЛЕВА</strong></div></div><div style=\'clear:both;padding-top:5px;\'></div><center><img width=448 height=336 src=\'https://g1-bg.cars.bg/2020-06-24_2/5ef37adfca1c397c9015b753o.jpg\'></center><div style=\'clear:both\'></div><div class=\'ver13black\' style=\'padding-top:5px;\'>дизел, 2007 (нов внос) , 170011 км, Гоце Делчев</div>')"
src="https://g1-bg.cars.bg/2020-06-24_2/5ef37adfca1c397c9015b753b.jpg" style="padding-right:10px;"
width="200"/>
</a></td>
<td align="left" style="border-left:0px;padding-left:0px;" valign="top" width="360">
<span style="color:#808080;font-size: 0.85em;"><i>днес, 21:12</i></span><br/>
<a class="ver15black" href="offer/5ef37bc31cd64405946eff12"><span class="ver15black"><b>Citroen Xsara Picasso 1.6 hdi</b></span></a>
<br>
дизел, 170,011 км
<div style="word-break: break-all;margin-top: 10px; font-style: italic; font-size: 0.9em; /*line-height: 1.5em;*/ color:#666666;">
ЛИЗИНГ БЕЗ ДОКАЗВАНЕ НА ДОХОДИ С ИЗКЛЮЧИТЕЛНО ГЪВКАВИ УСЛОВИЯ Aвтомобила е нов внос от ИТАЛИЯ на реални
километри перфектен мотор, скорости, ходова ча...
</div>
</br></td>
<td align="center" valign="top" width="80"><span class="year">2007</span><br/>
нов внос
</td>
<td align="right" valign="top" width="120">
<span class="ver20black"><strong>3,500</strong></span><br/>
ЛЕВА
</td>
<td align="center" valign="top" width="150">
<span class="ownerName">частно лице</span>
<br/>
<img height="5" src="https://assets.cars.bg/desktop/images/px.gif" width="1"/>
<br/>
<span class="cityName">Гоце Делчев</span>
</td>
<td>
<div class="iconed notepadlist icon-star" id="5ef37bc31cd64405946eff12"
style="font-size: 1.8em; position: relative; float: right; cursor: pointer;" title="Запази"></div>
</td>
</tr>
The text I need to extract is this: дизел, 170,011 км
I tried two different ways of getting the data (using requests get and urllib.request):
# First attempt, using requests
from bs4 import BeautifulSoup as soup
from requests import get
import json

my_url = 'https://www.cars.bg/?go=cars&search=1&fromhomeu=1&currencyId=1&autotype=1&stateId=1&offersForD=1&offersForA=1&filterOrderBy=1&radius=1'
#headers = {"Accept-Language": "bg-BG, bg;q=0.5"}
response = get(my_url)
html_soup = soup(response.text, 'html.parser')
#type(html_soup)
page_content = html_soup.find_all('tr', class_='odd')
for container in page_content:
    name = container.td.text
    print(name)

# Second attempt, using urllib.request
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.cars.bg/?go=cars&search=1&fromhomeu=1&currencyId=1&autotype=1&stateId=1&offersForD=1&offersForA=1&filterOrderBy=1&radius=1'
#opening up connection
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
#html parsing
page_soup = soup(page_html, "html.parser")
#grab each product
odd = page_soup.findAll("tr", {"class": "odd"})
for od in odd:
    mileage = od.td.text
    print(mileage)
So my question is: how do I extract the text?
This script does not extract the text exactly as you asked, but it gets quite close.
from bs4 import BeautifulSoup as soup
from requests import get
import json

my_url = 'https://www.cars.bg/?go=cars&search=1&fromhomeu=1&currencyId=1&autotype=1&stateId=1&offersForD=1&offersForA=1&filterOrderBy=1&radius=1'
#headers = {"Accept-Language": "bg-BG, bg;q=0.5"}
response = get(my_url)
html_soup = soup(response.text, 'html.parser')
#type(html_soup)
# The mileage is in the second cell of each row, the <td> with width="360".
page_content = html_soup.find_all('td', {"width": 360})
for container in page_content:
    # Split the cell text on spaces and drop the empty pieces.
    name = container.text.split(" ")
    name = [ele for ele in name if ele != '']
    # Keep up to three tokens before 'км', plus 'км' itself. If the token
    # three places back contains a newline, start one token later.
    km = name.index('км')
    start = km - 3 if '\n' not in name[km - 3] else km - 2
    name = name[start:km + 1]
    print(name)
Output:
['дизел,', '212,000', 'км']
['дизел,', 'автоматик,', '182,000', 'км']
['дизел,', '129,000', 'км']
['бензин,', '42,000', 'км']
['дизел,', '202,000', 'км']
['бензин,', '14,000', 'км']
['бензин,', 'автоматик,', '200,000', 'км']
['дизел,', '186,000', 'км']
['бензин,', '118,000', 'км']
['дизел,', 'автоматик,', '200,000', 'км']
['дизел,', '204,000', 'км']
['дизел,', '163,000', 'км']
['дизел,', '212,000', 'км']
['бензин,', 'автоматик,', '144,000', 'км']
['дизел,', '183,000', 'км']
['дизел,', '152,000', 'км']
['бензин,', '103,000', 'км']
['дизел,', '190,000', 'км']
['дизел,', '166,000', 'км']
['дизел,', '192,000', 'км']
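
For context on why the attempts in the question print nothing: `od.td` returns only the first `<td>` of the row, and in the posted HTML that cell holds just the thumbnail image, so its `.text` is empty. The mileage is in the second cell. If a plain integer is wanted rather than a token list, a regular expression may be a little more robust than splitting on spaces. The following is only a sketch: the names `MILEAGE_RE`, `parse_mileage`, and `mileages_from_html` are hypothetical, and the pattern assumes the mileage always appears as digits (optionally with comma thousands separators) followed by `км`, as in the sample row.

```python
import re

# Assumption based on the sample row: digits, optionally grouped with
# commas ("170,011" or "170011"), followed by the unit "км".
MILEAGE_RE = re.compile(r"(?<!\d)(\d{1,3}(?:,\d{3})+|\d+)\s*км")


def parse_mileage(cell_text):
    """Return the mileage in km as an int, or None if nothing matches."""
    match = MILEAGE_RE.search(cell_text)
    if match is None:
        return None
    # Strip the thousands separators before converting.
    return int(match.group(1).replace(",", ""))


def mileages_from_html(html):
    """Collect one mileage value per <tr class="odd"> row in the HTML."""
    # Imported here so parse_mileage stays usable without bs4 installed.
    from bs4 import BeautifulSoup

    page = BeautifulSoup(html, "html.parser")
    results = []
    for row in page.find_all("tr", class_="odd"):
        cells = row.find_all("td")
        # cells[0] holds only the thumbnail; the description text,
        # including the mileage, is in cells[1].
        if len(cells) > 1:
            results.append(parse_mileage(cells[1].get_text(" ", strip=True)))
    return results


print(parse_mileage("дизел, 170,011 км"))  # 170011
```

With the page fetched as in the scripts above, `mileages_from_html(response.text)` would return a list of integers, with `None` for any row where the pattern does not match.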