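Asked by: 小点点

Extracting text from multiple div class + div style pairs using Python/BeautifulSoup


I've tried many approaches to this problem but couldn't find an answer.

I have the following HTML: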

<section id="content4" class="tab-content">
                                                    <p>
    <div class="Text_Title">Product 1</div>
    <div style="display: inline-block;">Red Ball</div></p>
                                                    <p>
    <div class="Text_Title">Product 2</div>
    <div style="display: inline-block;">Green Ball</div></p>
                                                    <p>
    <div class="Text_Title">Product 3</div>
    <div style="display: inline-block;">Yellow Ball</div></p>
    

I'm trying to extract the text from the divs with class="Text_Title" and from the divs with style="display: inline-block;".

The output I'm trying to get:

Product 1 - Red Ball
Product 2 - Green Ball
Product 3 - Yellow Ball

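1 answer

Anonymous user

Use find_all (findAll is its older alias) to get a list of the Tag objects matching each condition, then use zip to iterate over the two lists in parallel.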

from bs4 import BeautifulSoup

input_ = """<section id="content4" class="tab-content">
                                                    <p>
    <div class="Text_Title">Product 1</div>
    <div style="display: inline-block;">Red Ball</div></p>
                                                    <p>
    <div class="Text_Title">Product 2</div>
    <div style="display: inline-block;">Green Ball</div></p>
                                                    <p>
    <div class="Text_Title">Product 3</div>
    <div style="display: inline-block;">Yellow Ball</div></p>"""

soup = BeautifulSoup(input_, "html.parser")

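# find_all (the current name for findAll) returns matching tags in document
# order; zip pairs the n-th title with the n-th value and stops at the
# shorter of the two lists.
titles = soup.find_all("div", class_="Text_Title")
values = soup.find_all("div", attrs={"style": "display: inline-block;"})

for title, value in zip(titles, values):
    print(title.text, "-", value.text)

Output:

Product 1 - Red Ball
Product 2 - Green Ball
Product 3 - Yellow Ball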
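
One caveat with zip: it pairs by position, so if one product were missing its value div, every later pair would silently shift. A possible alternative, sketched below, is to start from each title and look up the div that follows it with find_next_sibling. The function name extract_pairs and the cleaned-up sample markup are my own illustration, not part of the question, and the sketch assumes html.parser, under which each title/value pair stays nested inside its own <p>.

```python
from bs4 import BeautifulSoup

# Sample markup modelled on the question's HTML (illustrative copy).
html = """<section id="content4" class="tab-content">
<p>
    <div class="Text_Title">Product 1</div>
    <div style="display: inline-block;">Red Ball</div></p>
<p>
    <div class="Text_Title">Product 2</div>
    <div style="display: inline-block;">Green Ball</div></p>
<p>
    <div class="Text_Title">Product 3</div>
    <div style="display: inline-block;">Yellow Ball</div></p>"""


def extract_pairs(markup):
    """Return (title, value) tuples; hypothetical helper for illustration."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for title in soup.find_all("div", class_="Text_Title"):
        # find_next_sibling skips the whitespace text node between the divs
        # and returns None when no later sibling div exists.
        value = title.find_next_sibling("div")
        if value is None:
            continue  # title without a value: skip it instead of misaligning
        pairs.append((title.get_text(strip=True), value.get_text(strip=True)))
    return pairs


for name, value in extract_pairs(html):
    print(name, "-", value)
```

For the sample markup this prints the same three lines as the zip version. The difference only shows up on irregular pages, where a title with no following div in its <p> is dropped rather than being paired with the next product's value.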