I am trying to scrape the latitude and longitude data from the "Show Map" button on this results page: https://www.psychologytoday.com/us/therapists/60148/374863?sid=5d01e84909804&ref=2&tr=resultsName
This is what I have tried so far:
button = soup.find('button', {"data-event-label":'Address_MapButton'})
print(button['data-map-lat'])
Full code:
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag
zipcode = int(input("Zipcode: "))
url = 'https://www.psychologytoday.com/us/therapists/{0}'.format(zipcode)
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
result = soup.find(class_='results-column')
addressArray = []
for tag in result:
    if isinstance(tag, Tag):
        _class = tag.get("class")
        if _class is None or _class is not None and "row" not in _class:
            continue
        link = (tag.find(class_='result-actions')).find('a', href=True)
        _href = link['href']
        address_link = requests.get(_href, headers=headers)
        soup1 = BeautifulSoup(address_link.text, 'html.parser')
        address = (soup1.find(class_='address')).find(class_="location-address-phone")
        button = soup.find('button', {"data-event-label": 'Address_MapButton'})
        print(button['data-map-lat'])
print(addressArray)
What I get back is None. I want to see the latitude coordinate.
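A minimal offline sketch of where the None most likely comes from: inside the loop, the button lookup runs on soup (the results page) rather than soup1 (the profile page just fetched from _href), and find() returns None when nothing matches. The two HTML strings, the find_map_button helper and the coordinate values below are made-up stand-ins, not the site's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the two pages; the real markup is assumed, not copied.
RESULTS_HTML = '<div class="results-column"><div class="row"><a href="/profile">Name</a></div></div>'
PROFILE_HTML = (
    '<button class="btn-location" data-event-label="Address_MapButton" '
    'data-map-lat="42.5" data-map-lon="-83.1">Show Map</button>'
)


def find_map_button(soup):
    """Hypothetical helper: the same find() call as in the question."""
    return soup.find("button", {"data-event-label": "Address_MapButton"})


results_soup = BeautifulSoup(RESULTS_HTML, "html.parser")
profile_soup = BeautifulSoup(PROFILE_HTML, "html.parser")

# Searching the results page finds no such button, so find() returns None ...
print(find_map_button(results_soup))  # None
# ... while the profile page actually contains it.
print(find_map_button(profile_soup)["data-map-lat"])  # 42.5
```

Subscripting the None from the first call (as the question's print(button['data-map-lat']) does) raises a TypeError instead of returning a value.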
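You want to extract the appropriate attributes. I use the first match for the class btn-location that also carries a data-map-lat attribute to get the element holding the coordinates, then read its data-map-lat and data-map-lon attributes.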
import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://www.psychologytoday.com/us/therapists/60148/374863?sid=5d01e84909804&ref=2&tr=ResultsName', headers={'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')  # 'lxml' needs the lxml package installed
# First element with class btn-location that also has a data-map-lat attribute
elem = soup.select_one('.btn-location[data-map-lat]')
lat = elem['data-map-lat']
lon = elem['data-map-lon']
print(lat, lon)
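To check the selector without hitting the site, here is a small offline sketch against made-up markup; only the btn-location class and the data-map-lat/data-map-lon attribute names come from the code above. It shows that the [data-map-lat] part of the selector skips a btn-location element without coordinates, and that select_one returns the first remaining match.

```python
from bs4 import BeautifulSoup

# Made-up markup: the first button has the class but no coordinates.
html = """
<button class="btn btn-location">No coordinates</button>
<button class="btn btn-location" data-map-lat="40.0" data-map-lon="-75.0">Show Map</button>
"""
soup = BeautifulSoup(html, "html.parser")

# Class selector plus attribute-presence selector: an element must have
# class btn-location AND a data-map-lat attribute to match.
matches = soup.select(".btn-location[data-map-lat]")
elem = soup.select_one(".btn-location[data-map-lat]")  # first match, or None
print(len(matches), elem["data-map-lat"], elem["data-map-lon"])  # 1 40.0 -75.0
```

select() returns every match as a list, while select_one() returns only the first match (or None if there is none).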
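Applied to your version (this parses the profile page fetched into address_link; your loop searched the results-page soup instead). Define latArray = [] once before the for loop, then inside the loop replace the button = soup.find(...) and print(...) lines with:
        soup2 = BeautifulSoup(address_link.content, 'lxml')  # profile page, not the results page
        elem = soup2.select_one('.btn-location[data-map-lat]')
        lat = elem['data-map-lat']
        lon = elem['data-map-lon']
        latArray.append(lat)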
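If some profiles turn out to have no map button, select_one returns None and the subscript in the loop raises a TypeError. One way to handle that, sketched here with a hypothetical extract_coords helper, is to return None for such pages and keep only the (lat, lon) pairs that were found. The float conversion assumes the attribute values are plain decimal numbers, and html.parser is used so the sketch does not depend on lxml.

```python
from typing import Optional, Tuple

from bs4 import BeautifulSoup


def extract_coords(html: str) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: (lat, lon) from the first .btn-location[data-map-lat], or None."""
    soup = BeautifulSoup(html, "html.parser")
    elem = soup.select_one(".btn-location[data-map-lat]")
    if elem is None or not elem.get("data-map-lon"):
        return None  # no map button, or no longitude on it
    return float(elem["data-map-lat"]), float(elem["data-map-lon"])


# Made-up profile pages standing in for address_link.text in the loop.
pages = [
    '<button class="btn-location" data-map-lat="40.0" data-map-lon="-75.0"></button>',
    "<p>No map on this profile</p>",
]
coords = [c for c in (extract_coords(p) for p in pages) if c is not None]
print(coords)  # [(40.0, -75.0)]
```

In the loop, this would be called as extract_coords(address_link.text), appending the result only when it is not None.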