Problem downloading images with Scrapy

Asked by: 小点点

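I wrote a script with Python Scrapy to download some images from a website. When I run the script, I can see the image links in the console (they are all .jpg). However, when I open the folder where the images should be saved once the download finishes, it is empty. Where am I going wrong?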

Here is my spider (I am running it from the Sublime Text editor):

import scrapy
from scrapy.crawler import CrawlerProcess

class YifyTorrentSpider(scrapy.Spider):
    name = "yifytorrent"

    start_urls= ['https://www.yify-torrent.org/search/1080p/']

    def parse(self, response):
        for q in response.css("article.img-item .poster-thumb"):
            image = response.urljoin(q.css("::attr(src)").extract_first())
            yield {'':image}

c = CrawlerProcess({
    'USER_AGENT': 'Mozilla/5.0',   
})
c.crawl(YifyTorrentSpider)
c.start()

This is what I defined in settings.py for the images to be saved:

ITEM_PIPELINES = {
    'scrapy.pipelines.images.ImagesPipeline': 1,
}
IMAGES_STORE = "/Desktop/torrentspider/torrentspider/spiders/Images"

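To make things clearer:

The folder where I want the images saved is named Images, and I have placed it in the spiders folder under the project TorrentSpider.
The actual address of the Images folder is C:\users\wcs\desktop\torrentspider\torrentspider\spiders.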

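This question is not about getting the script to run with the help of an items.py file. So any solution that relies on an items.py file to make the download happen is not what I'm looking for.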

1 Answer

Anonymous user

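The item you are yielding does not follow Scrapy's documentation. As detailed in the media pipeline documentation, the item should have a field called image_urls. You should change your parse method to something like this.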

def parse(self, response):
    images = []
    for q in response.css("article.img-item .poster-thumb"):
        image = response.urljoin(q.css("::attr(src)").extract_first())
        images.append(image)
    yield {'image_urls': images}

I just tested it and it works. Also, as Pruthvi Kumar commented, IMAGES_STORE should be something like

IMAGES_STORE = 'Images'
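
One further point may be worth checking. The spider in the question is started through CrawlerProcess with its own settings dict, and a script launched that way does not read the project's settings.py unless those settings are passed in explicitly. The sketch below is one possible way to hand the pipeline settings to CrawlerProcess directly. It is not the asker's or answerer's tested code: build_settings is a hypothetical helper name, and the sketch assumes Pillow is installed, which ImagesPipeline requires.

```python
from pathlib import Path


def build_settings(images_dir):
    """Collect the settings the images pipeline needs in one plain dict.

    build_settings is a hypothetical helper, not part of Scrapy.
    """
    return {
        "USER_AGENT": "Mozilla/5.0",
        # Enable Scrapy's built-in images pipeline.
        "ITEM_PIPELINES": {"scrapy.pipelines.images.ImagesPipeline": 1},
        # Folder the pipeline writes to; a relative path is resolved
        # against the current working directory of the script.
        "IMAGES_STORE": str(images_dir),
    }


settings = build_settings(Path("Images"))

# Assumed usage in the spider script, replacing the original
# CrawlerProcess({...}) call:
#   c = CrawlerProcess(settings)
#   c.crawl(YifyTorrentSpider)
#   c.start()
```

With these settings and an item that carries image_urls, the downloaded files should land in a subfolder of the Images directory rather than directly inside it.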
