Asked by: 小点点

In Scrapy, how do I extract each element of a list onto a separate row?


I am new to Scrapy and am trying to scrape some data from the URL given below.

from scrapy.spiders import Spider
from ..items import QtItem

class QuoteSpider(Spider):
    name = 'acres'
    start_urls = ['https://housing.com/in/buy/searches/AB1AC0M1P4hkd3fsj8fd9kanb']

    def parse(self, response):
        items = QtItem()

        all_div_names = response.xpath('//article')

        for bks in all_div_names:
            name = bks.xpath('//span[@class="css-fwbz9r"]/text()').get()
            price = bks.xpath('//h2[@class="css-yr18fa"]/text()').get()
            sqft = bks.xpath('//div[@class="css-1ty8tu4"]/text()').get()
            bhk = bks.xpath('//a[@class="css-163eyf0"]/text()').get()

        yield {
            'ttname': name,
            'ttprice': price,
            'ttsqft': sqft,
            'ttbhk': bhk
        }

The output in the CSV file is as follows:

{'ttname': ['Jodhpur Village, Jodhpur, Ahmedabad', 'Shapers Swastik Platinum, Narolgam, Ahmedabad', 'Gayatri
 Maitri Lake View, Zundal, Ahmedabad', 'Maruti Zenobia, Bodakdev, Ahmedabad', 'arjun greens, Naranpura, Ahme
dabad', 'Aariyana Lakeside, Shilaj, Ahmedabad', 'Ganesh Malabar County II, Chharodi, Ahmedabad', 'Jodhpur Vi
llage, Jodhpur, Ahmedabad', 'Ratna Paradise, Khoraj, Ahmedabad', 'Teraiya Adhisthan Shriya, Sola Village, Ah
medabad', 'Thaltej, Ahmedabad', 'Binori Solitaire, Bopal, Ahmedabad', 'Arvind & Safal Parishkaar Apartments,
 Amraiwadi, Ahmedabad', 'Pacifica La Habitat, Thaltej, Ahmedabad', 'Siddhivinayak Omkar Lotus, Chandkheda, A
hmedabad', 'Sthapatya Pratham Lakeview, Science City, Ahmedabad', 'Orchid Whitefield , Prahlad Nagar, Ahmeda
bad', 'Maple Tree, Memnagar, Ahmedabad', 'VISHWAS CITY , Gota, Ahmedabad', 'Gala Aria, Bopal, Ahmedabad'], '
ttprice': ['₹95.0 L', '₹17.0 L', '₹28.75 L', '₹1.35 Cr', '₹1.0 Cr', '₹3.5 Cr', '₹43.0 L', '₹47.5 L', '₹1.55
Cr', '₹1.02 Cr', '₹65.0 L', '₹1.1 Cr', '₹42.0 L', '₹1.0 Cr', '₹74.0 L', '₹1.09 Cr', '₹78.0 L', '₹1.55 Cr', '
₹30.0 L', '₹1.18 Cr'], 'ttsqft': ['1750 sq.ft', '₹5.43 K/sq.ft', '870 sq.ft', '₹1.95 K/sq.ft', '1125 sq.ft',
 '₹2.56 K/sq.ft', '1755 sq.ft', '₹7.69 K/sq.ft', '1812 sq.ft', '₹5.52 K/sq.ft', '4275 sq.ft', '₹8.19 K/sq.ft
', '1170 sq.ft', '₹3.67 K/sq.ft', '1200 sq.ft', '₹3.96 K/sq.ft', '3340 sq.ft', '₹4.64 K/sq.ft', '2040 sq.ft'
, '₹5.00 K/sq.ft', '1710 sq.ft', '₹3.80 K/sq.ft', '2214 sq.ft', '₹4.97 K/sq.ft', '1108 sq.ft', '₹3.79 K/sq.f
t', '1961 sq.ft', '₹5.10 K/sq.ft', '1960 sq.ft', '₹3.77 K/sq.ft', '1890 sq.ft', '₹5.77 K/sq.ft', '1700 sq.ft
', '₹4.59 K/sq.ft', '2400 sq.ft', '₹6.46 K/sq.ft', '954 sq.ft', '₹3.14 K/sq.ft', '2115 sq.ft', '₹5.58 K/sq.f
t'], 'ttbhk': ['3 BHK Apartment', '2 BHK Apartment', '2 BHK Apartment', '3 BHK Apartment', '3 BHK Apartment'
, '4 BHK Apartment', '2 BHK Apartment', '2 BHK Apartment', '4 BHK Apartment', '3 BHK Apartment', '3 BHK Apar
tment', '3 BHK Apartment', '2 BHK Apartment', '3 BHK Apartment', '3 BHK Apartment', '3 BHK Apartment', '3 BH
K Apartment', '3 BHK Apartment', '2 BHK Apartment', '3 BHK Apartment']}

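The problem I am facing is that whenever I export the output to a .csv file, the lists above are printed as one whole row, whereas I need each element on a separate row.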

I need the output in the following format. I entered this output manually for your reference.

https://i.stack.imgur.com/3urdn.png


1 answer

Anonymous user

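The problem is in your XPath expressions. // is shorthand for the descendant-or-self axis, and an expression that begins with // is evaluated from the document root rather than from the current node. Each query inside your loop therefore matches the elements of every article on the page, which is not what you want.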

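You can put a . before the // so that the expression applies only to the context node (here, bks), or use an alternative expression, such as a full path relative to that node.
Both of these examples return the same result: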

name = bks.xpath('div/div/span[@class="css-fwbz9r"]/text()').get()

name = bks.xpath('.//span[@class="css-fwbz9r"]/text()').get()

This will give you the result you want:

from scrapy.spiders import Spider 
from ..items import QtItem

class QuoteSpider(Spider): 
    name = 'acres' 
    start_urls = ['https://housing.com/in/buy/searches/AB1AC0M1P4hkd3fsj8fd9kanb']

    def parse(self, response):
        items = QtItem()

        all_div_names = response.xpath('//article')

        for bks in all_div_names:
            name = bks.xpath('.//span[@class="css-fwbz9r"]/text()').get()
            price = bks.xpath('.//h2[@class="css-yr18fa"]/text()').get()
            sqft = bks.xpath('.//div[@class="css-1ty8tu4"]/text()').get()
            bhk = bks.xpath('.//a[@class="css-163eyf0"]/text()').get()
            yield {
                'ttname': name,
                'ttprice': price,
                'ttsqft': sqft,
                'ttbhk': bhk
            }
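Note that the yield now sits inside the for loop, whereas in the question it was outside the loop. Scrapy writes one CSV row per yielded item, so yielding once per article is what puts each listing on its own row. The sketch below imitates that with the standard library's csv module rather than Scrapy's own exporter. The listings data and the parse_like helper are hypothetical stand-ins.

```python
import csv
import io

# Hypothetical scraped values standing in for two <article> nodes.
listings = [
    {'ttname': 'Listing A', 'ttprice': '95.0 L'},
    {'ttname': 'Listing B', 'ttprice': '17.0 L'},
]

def parse_like(rows):
    # One yield per iteration, mirroring the yield inside the spider's for loop.
    for row in rows:
        yield {'ttname': row['ttname'], 'ttprice': row['ttprice']}

# Write each yielded item as its own CSV row (a stand-in for a CSV feed export).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['ttname', 'ttprice'], lineterminator='\n')
writer.writeheader()
for item in parse_like(listings):
    writer.writerow(item)

csv_text = buffer.getvalue()
print(csv_text)
```

If the yield were moved outside the loop, only a single item would be produced for the whole page.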


  • You are not populating your QtItem; you only instantiate it and leave it untouched. You probably want something like this:

    yield QtItem(
          ttname=name,
          ttprice=price,
          ttsqft=sqft,
          ttbhk=bhk
    )
    

    I did not change this in the code above because I do not know how QtItem is defined, so I cannot be sure.