Extracting h1 and strong tags with Beautiful Soup

Asked by: 小点点


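I want to extract the text string from the heading inside a div, and the text inside its strong tag.

I can get the heading with soup.h1, but I want the h1 that is specific to that div.

HTML:

So I would like to get "Here is the title" and "( And a bit more! )". Can anyone help?

Thanks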




2 answers


                        

                
Anonymous user

                




                
					
You can pass the attrs argument to find, for example:
soup.find('div', attrs={'class': 'site-content'}).h1
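For reference, here is a minimal self-contained sketch of that lookup. The markup is an assumption: it is the sample string used in the other answer on this page with a closing </div> added, since the question's own HTML is not shown. The helper name scoped_heading is hypothetical.

```python
from bs4 import BeautifulSoup

# Assumed markup, based on the sample used in the other answer on this page.
html_doc = ('<div class="site-content"><h1>Here is the title'
            '<strong>( And a bit more! )</strong></h1></div>')


def scoped_heading(markup):
    """Return (full h1 text, strong text) for the h1 inside div.site-content."""
    soup = BeautifulSoup(markup, "html.parser")
    # Restrict the search to the div first, then take its first h1.
    div = soup.find('div', attrs={'class': 'site-content'})
    h1 = div.h1
    return h1.get_text(), h1.strong.get_text()


full_text, strong_text = scoped_heading(html_doc)
print(full_text)    # heading text including the nested <strong> text
print(strong_text)  # only the <strong> text
```

Note that get_text() on the h1 includes the text of nested tags, which is why the direct-text variant is useful when only the heading's own text is wanted.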
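Edit: to get only the direct text
import bs4
for div in soup.find_all('div', attrs={'class': 'site-content'}):
    # keep only the h1's own text nodes, skipping child tags such as <strong>
    print(''.join(x for x in div.h1.contents
                  if isinstance(x, bs4.element.NavigableString)))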
With lxml and XPath, life is easier:
>>> from lxml import html
>>> root = html.parse('x.html')
>>> print(root.xpath('//div[@class="site-content"]/h1/text()'))
['Here is the title']
>>> print(root.xpath('//div[@class="site-content"]/h1//text()'))
['Here is the title', '( And a bit more! )']
>>> print(root.xpath('//div[@class="site-content"]/h1/strong/text()'))
['( And a bit more! )']
				

                
                
            

            
                        

                
Anonymous user

                




                
					
Code that uses BeautifulSoup to extract the text from the heading inside the div and from the strong tag:
>>> from bs4 import BeautifulSoup
>>> data = """<div class="site-content"><h1>Here is the title<strong>( And a bit more! )</strong></h1></div>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> reqText = soup.find('h1').text
>>> print(reqText)
Here is the title( And a bit more! )
>>> reqText1 = soup.find('strong').text
>>> print(reqText1)
( And a bit more! )
Or, to get the heading without the strong text:
>>> from bs4 import BeautifulSoup
>>> data = """<div class="site-content"><h1>Here is the title<strong>( And a bit more! )</strong></h1></div>"""
>>> soup = BeautifulSoup(data, "html.parser")
>>> soup.find('strong').text
'( And a bit more! )'
>>> h1_tag = soup.find('h1')
>>> # decompose() removes <strong> from the tree, so read its text first
>>> h1_tag.strong.decompose()
>>> h1_tag.get_text()
'Here is the title'
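Because decompose() permanently removes the strong tag from the parsed tree, a non-destructive variant may be preferable when the soup is reused afterwards. The sketch below is one way to do that, assuming the same sample markup with a closing </div> added. It relies on find_all with string=True and recursive=False to collect only the h1's direct text nodes.

```python
from bs4 import BeautifulSoup

# Assumed markup: the sample string from this answer.
data = ('<div class="site-content"><h1>Here is the title'
        '<strong>( And a bit more! )</strong></h1></div>')
soup = BeautifulSoup(data, "html.parser")

# Scope the lookup to the div, then take its h1.
h1 = soup.find('div', attrs={'class': 'site-content'}).h1

# string=True matches text nodes; recursive=False limits the search to
# direct children, so text nested inside <strong> is skipped.
own_text = ''.join(h1.find_all(string=True, recursive=False))
strong_text = h1.strong.get_text()

print(own_text)
print(strong_text)
print(soup.strong is not None)  # the tree is left unmodified
```

This leaves the strong tag in place, so both strings can be read in either order.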


		      