Here is the given HTML:
<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">
<div class="table-responsive grid_class">
<table class="table lightgallery">
<thead>
<tr class="active">
<th class="col-md-9">Col A</th>
<th class="col-md-2">Col B</th>
</tr>
</thead>
<tr>
<td class="">
<span>some text here
</span>
</span>
</span>
</td>
<td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
</tr>
<tr>
<td class="">
<span>some text here
</span>
</span>
</span>
</td>
<td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
</tr>
</table>
</div>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.bundle.min.js"></script>
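How can I get only the HTML in Python, without the libraries it pulls in?
I tried the urllib library and the requests library, but neither worked.
Any help would be appreciated.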
If you just need to read the HTML, you can use BeautifulSoup:
# python -m pip install beautifulsoup4 lxml
from bs4 import BeautifulSoup
html = '''
<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">
<div class="table-responsive grid_class">
<table class="table lightgallery">
<thead>
<tr class="active">
<th class="col-md-9">Col A</th>
<th class="col-md-2">Col B</th>
</tr>
</thead>
<tr>
<td class="">
<span>some text here
</span>
</span>
</span>
</td>
<td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
</tr>
<tr>
<td class="">
<span>some text here
</span>
</span>
</span>
</td>
<td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
</tr>
</table>
</div>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.bundle.min.js"></script>
'''
soup = BeautifulSoup(html, 'lxml')
You can then access elements and tags with .find[_all] or .select, for example:
ths = soup.find_all('th')
print([col.text for col in ths])
# ['Col A', 'Col B']
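If "without the libraries" means dropping the <link> and <script> tags that load Bootstrap, one possible approach is to remove those tags from the parsed tree and then serialize what is left. The sketch below is only an illustration under that assumption: it uses a shortened copy of the HTML from the question, the built-in 'html.parser' instead of lxml so nothing extra has to be installed, and the names second_col and cleaned_html are made up for the example. It also shows .select with a CSS selector.

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (one body row only).
html = '''
<link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">
<div class="table-responsive grid_class">
<table class="table lightgallery">
<thead>
<tr class="active">
<th class="col-md-9">Col A</th>
<th class="col-md-2">Col B</th>
</tr>
</thead>
<tr>
<td class=""><span>some text here</span></td>
<td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
</tr>
</table>
</div>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
'''

soup = BeautifulSoup(html, 'html.parser')

# Remove the tags that only reference external CSS/JS libraries.
for tag in soup.find_all(['link', 'script']):
    tag.decompose()

# .select takes a CSS selector; here: every <td> with class "text-nowrap".
second_col = [td.get_text(strip=True) for td in soup.select('td.text-nowrap')]
print(second_col)
# ['some text here also']

# What remains is the markup without the library references.
cleaned_html = str(soup).strip()
print(cleaned_html)
```

decompose() deletes a tag from the tree in place, so str(soup) afterwards contains only the <div> and the table inside it.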