Asked by: 小点点

How do I read only the HTML from a URL using Python 3?


Here is the given HTML:

    <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">

    <div class="table-responsive grid_class">
    <table class="table lightgallery">
        <thead>
        <tr class="active">
            <th class="col-md-9">Col A</th>
            <th class="col-md-2">Col B</th>
        </tr>
        </thead>

        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>
       
        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>   
        
    </table>
    </div>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.bundle.min.js"></script>


How can I get only the HTML in Python, without the libraries?

I tried the `urllib` and `requests` libraries, but neither worked.

Any help would be appreciated.


1 answer

Anonymous user

If you just want to read the HTML, you can use BeautifulSoup:

#python -m pip install beautifulsoup4 lxml

from bs4 import BeautifulSoup

html = '''
 <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">

    <div class="table-responsive grid_class">
    <table class="table lightgallery">
        <thead>
        <tr class="active">
            <th class="col-md-9">Col A</th>
            <th class="col-md-2">Col B</th>
        </tr>
        </thead>

        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>
       
        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>   
        
    </table>
</div>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.bundle.min.js"></script>
'''

soup = BeautifulSoup(html, 'lxml')
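The answer parses a hard-coded string. To read the same markup from a URL, as the question asks, one option is the standard library's `urllib.request`. The sketch below is an assumption-laden example, not the answerer's code: `fetch_html` is a hypothetical helper name, and the browser-like `User-Agent` header is sent on the assumption that some servers reject Python's default one. Note that `urlopen` downloads only the page at that URL; the Bootstrap CSS/JS files referenced by `<link>` and `<script>` are not fetched.

```python
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML source as text (hypothetical helper)."""
    # Assumption: some servers refuse Python's default User-Agent, so a
    # browser-like one is sent. Drop or change it if the site doesn't care.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        raw = response.read()
        # Prefer the charset from the Content-Type header; fall back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    # Decode bytes to text; undecodable bytes become U+FFFD instead of raising.
    return raw.decode(charset, errors="replace")
```

You would then pass the result to BeautifulSoup in place of the literal string, e.g. `soup = BeautifulSoup(fetch_html("https://example.com/page"), 'lxml')`, where the URL is a placeholder.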

You can then access tags with `.find()`, `.find_all()` or `.select()`, for example:

ths = soup.find_all('th')
print([col.text for col in ths])
# ['Col A', 'Col B']
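If "without the libraries" means dropping the Bootstrap `<link>` and `<script>` tags, the usual BeautifulSoup approach is to call `.decompose()` on each tag returned by `soup.find_all(['link', 'script'])`. As an alternative that needs neither `bs4` nor `lxml`, here is a minimal sketch using the standard library's `html.parser`. It assumes those two tag types are what should go; `LibraryStripper`, `DROP_TAGS` and `strip_libraries` are hypothetical names for this illustration.

```python
from html.parser import HTMLParser

# Assumption: "the libraries" are the Bootstrap <link> and <script> includes.
DROP_TAGS = {"link", "script"}


class LibraryStripper(HTMLParser):
    """Re-emit HTML while skipping <link> and <script> elements (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=False reports entity/char references as separate
        # events, so they are written back as references, not decoded text.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
        if tag not in DROP_TAGS:
            # get_starttag_text() returns the start tag as written in the source.
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <link ... /> or <br/>.
        if tag not in DROP_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        if tag not in DROP_TAGS:
            self.parts.append(f"</{tag}>")  # tag names arrive lowercased

    def handle_data(self, data):
        if not self.in_script:  # also drop inline script bodies
            self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def handle_comment(self, data):
        self.parts.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.parts.append(f"<!{decl}>")


def strip_libraries(html):
    """Return `html` with every <link> and <script> element removed."""
    parser = LibraryStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)
```

Calling `strip_libraries(html)` on the answer's `html` string keeps the `<div>`/`<table>` markup and drops the three Bootstrap references. End-tag names come back lowercased, which is harmless for HTML.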
