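Please help: I don't know how to select a specific div with BeautifulSoup when several divs share the same class name and have no id attribute.
The page I'm trying to scrape: https://www.helpmefind.com/rose/l.php?l=2.65689.
I want to select the contents of specific divs individually and then write them to a CSV file. I'm stuck because find_all returns several divs and I don't know how to narrow the result further.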
rose_div = rose.find_all("div", class_="hdg")
returns:
[<div class="hdg">HMF Ratings:</div>, <div class="hdg">Origin:</div>, <div class="hdg">Class:</div>, <div class="hdg">Bloom:</div>, <div class="hdg">Parentage:</div>, <div class="hdg">Notes:</div>, <div class="hdg"> </div>]
I want to select, individually, the content that follows each of these divs:
<div class="hdg">Origin:</div>
<div class="hdg">Class:</div>
<div class="hdg">Bloom:</div>
<div class="hdg">Parentage:</div>
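You can use a CSS selector. For example, div.hdg:contains("Origin:") selects the div with class="hdg" that contains the text "Origin:". To get the element with class grp that immediately follows it, append + .grp to the selector. (Recent versions of Soup Sieve, the selector library behind select(), deprecate :contains in favor of :-soup-contains.)
import requests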
from bs4 import BeautifulSoup
url = 'https://www.helpmefind.com/rose/l.php?l=2.65689'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')
origin = soup.select_one('div.hdg:contains("Origin:") + .grp').text
class_ = soup.select_one('div.hdg:contains("Class:") + .grp').text
bloom = soup.select_one('div.hdg:contains("Bloom:") + .grp').text
parentage = soup.select_one('div.hdg:contains("Parentage:") + .grp').text
print(origin)
print(class_)
print(bloom)
print(parentage)
Prints:
Bred by Arai (Japan, before 2009).
Floribunda.
Light pink and white, yellow stamens. Single (4-8 petals), cluster-flowered bloom form. Blooms in flushes throughout the season.
If you know the parentage of this rose, or other details, please contact us.
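Since the question also asks about getting the values into a CSV file, here is a minimal sketch of one way to do that. It is not taken from the page itself: SAMPLE_HTML is a hypothetical stand-in that assumes each div.hdg is followed by a sibling div.grp, and the helper names extract_fields and to_csv are made up for illustration. Instead of one selector per heading, it loops over the hdg divs and uses find_next_sibling, which also avoids the :contains pseudo-class.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page. The hdg/grp pairing is an
# assumption about the markup; the "n/a" rating is placeholder text.
SAMPLE_HTML = """
<div class="hdg">HMF Ratings:</div><div class="grp">n/a</div>
<div class="hdg">Origin:</div><div class="grp">Bred by Arai (Japan, before 2009).</div>
<div class="hdg">Class:</div><div class="grp">Floribunda.</div>
<div class="hdg">Bloom:</div><div class="grp">Light pink and white, yellow stamens.</div>
<div class="hdg">Parentage:</div><div class="grp">If you know the parentage of this rose, or other details, please contact us.</div>
"""

WANTED = ["Origin", "Class", "Bloom", "Parentage"]


def extract_fields(html, wanted=WANTED):
    """Map each wanted heading to the text of the next sibling div.grp."""
    soup = BeautifulSoup(html, "html.parser")
    fields = {}
    for hdg in soup.find_all("div", class_="hdg"):
        # "Origin:" -> "Origin"
        label = hdg.get_text(strip=True).rstrip(":")
        if label not in wanted:
            continue
        # Unlike the CSS "+" combinator, this accepts the next matching
        # sibling even if it is not immediately adjacent.
        value = hdg.find_next_sibling("div", class_="grp")
        fields[label] = value.get_text(" ", strip=True) if value else ""
    return fields


def to_csv(fields, columns=WANTED):
    """Render one header row and one data row as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerow({name: fields.get(name, "") for name in columns})
    return buffer.getvalue()


fields = extract_fields(SAMPLE_HTML)
csv_text = to_csv(fields)
print(csv_text)
```

To write a real file instead of a string, the same DictWriter can be given a file opened with open("rose.csv", "w", newline="", encoding="utf-8"); the filename is only an example. For the live page, pass requests.get(url).content to extract_fields in place of SAMPLE_HTML.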