
Python: How to keep the original image filename when downloading images with Scrapy's ImagesPipeline?

April 19, 2019


By default, when ImagesPipeline downloads an image, the file is saved under the SHA1 hash of the image URL.
For example:
Image URL: https://www.example.com/image.jpg
SHA1 digest: 3afec3b4765f8f0a07b78f98c07b83f013567a0a
Saved filename: 3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg
However, I want to save the image under its original name; for the example above, the file saved locally should be named image.jpg.
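For reference, here is a minimal sketch of where that SHA1-based name comes from, assuming the default pipeline simply hashes the request URL (the real pipeline goes through Scrapy's own helpers, but the idea is the same; the digest shown above is only illustrative):

import hashlib

url = 'https://www.example.com/image.jpg'
# The default pipeline derives the filename from the SHA1 digest of the URL
# and stores the file as full/<digest>.jpg inside IMAGES_STORE.
# (On Python 2, which Scrapy 0.22 runs on, the .encode() call is unnecessary.)
image_guid = hashlib.sha1(url.encode('utf-8')).hexdigest()
print('full/%s.jpg' % image_guid)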

Answers on Stack Overflow suggest overriding the image_key method, but when I tried that it did not work: the overridden image_key was never called.
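For context, this is roughly the kind of override those answers propose (the class name here is just illustrative); as noted above, in my Scrapy 0.22.2 setup it was never invoked:

from scrapy.contrib.pipeline.images import ImagesPipeline

# Hypothetical pipeline illustrating the older Stack Overflow suggestion.
# image_key() is deprecated in Scrapy 0.22.x, and in my test it was never
# called, so the files kept getting SHA1-based names.
class KeepNameImagesPipeline(ImagesPipeline):
    def image_key(self, url):
        return 'full/%s' % url.split('/')[-1]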

I then looked at the ImagesPipeline source code:

class ImagesPipeline(FilesPipeline):
    """Abstract pipeline that implement the image thumbnail generation logic

    """

    MEDIA_NAME = 'image'
    MIN_WIDTH = 0
    MIN_HEIGHT = 0
    THUMBS = {}
    DEFAULT_IMAGES_URLS_FIELD = 'image_urls'
    DEFAULT_IMAGES_RESULT_FIELD = 'images'

    @classmethod
    def from_settings(cls, settings):
        cls.MIN_WIDTH = settings.getint('IMAGES_MIN_WIDTH', 0)
        cls.MIN_HEIGHT = settings.getint('IMAGES_MIN_HEIGHT', 0)
        cls.EXPIRES = settings.getint('IMAGES_EXPIRES', 90)
        cls.THUMBS = settings.get('IMAGES_THUMBS', {})
        s3store = cls.STORE_SCHEMES['s3']
        s3store.AWS_ACCESS_KEY_ID = settings['AWS_ACCESS_KEY_ID']
        s3store.AWS_SECRET_ACCESS_KEY = settings['AWS_SECRET_ACCESS_KEY']

        cls.IMAGES_URLS_FIELD = settings.get('IMAGES_URLS_FIELD', cls.DEFAULT_IMAGES_URLS_FIELD)
        cls.IMAGES_RESULT_FIELD = settings.get('IMAGES_RESULT_FIELD', cls.DEFAULT_IMAGES_RESULT_FIELD)
        store_uri = settings['IMAGES_STORE']
        return cls(store_uri)

    def file_downloaded(self, response, request, info):
        return self.image_downloaded(response, request, info)

    def image_downloaded(self, response, request, info):
        checksum = None
        for path, image, buf in self.get_images(response, request, info):
            if checksum is None:
                buf.seek(0)
                checksum = md5sum(buf)
            width, height = image.size
            self.store.persist_file(
                path, buf, info,
                meta={'width': width, 'height': height},
                headers={'Content-Type': 'image/jpeg'})
        return checksum

    def get_images(self, response, request, info):
        path = self.file_path(request, response=response, info=info)
        orig_image = Image.open(StringIO(response.body))

        width, height = orig_image.size
        if width < self.MIN_WIDTH or height < self.MIN_HEIGHT:
            # ... (the rest of the method is truncated in the original post)
In that source, there is this message:
ImagesPipeline.image_key(url) and file_key(url) methods are deprecated, please use file_path(request, response=None, info=None) instead
In other words, in the latest version of Scrapy (0.22.2), file_path replaces the image_key method.

So, in my custom image pipeline class, I overrode file_path as follows:

# coding: utf-8
__author__ = 'fly'
from scrapy.contrib.pipeline.images import ImagesPipeline
from scrapy.http import Request
from scrapy.exceptions import DropItem

class MyImagesPipeline(ImagesPipeline):
    def file_path(self, request, response=None, info=None):
        # Use the last segment of the URL (the original filename) instead of a SHA1 digest.
        image_guid = request.url.split('/')[-1]
        return 'full/%s' % (image_guid)

    def get_media_requests(self, item, info):
        # Schedule a download request for every URL in the item's image_urls field.
        for image_url in item['image_urls']:
            yield Request(image_url)

    def item_completed(self, results, item, info):
        # Drop the item if none of its images were downloaded successfully.
        image_paths = [x['path'] for ok, x in results if ok]
        if not image_paths:
            raise DropItem("Item contains no images")
        return item
The code above simply returns the original image name together with its file extension.
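To actually use the pipeline, it still has to be enabled in settings.py and fed items that carry an image_urls field. The following is only a minimal configuration sketch; the module path myproject.pipelines and the item class name are assumptions, so adjust them to your own project:

# settings.py -- minimal sketch (module path "myproject.pipelines" is an assumption)
ITEM_PIPELINES = ['myproject.pipelines.MyImagesPipeline']  # Scrapy 0.22 accepts a list;
                                                           # newer versions expect a dict with priorities
IMAGES_STORE = '/path/to/images'  # local directory where the 'full/<name>' paths are created

# items.py -- the pipeline reads image_urls and writes download results into images
from scrapy.item import Item, Field

class ImageItem(Item):
    image_urls = Field()
    images = Field()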

Author: 曾是土木人 (https://blog.csdn.net/php_fly)

Original article: https://blog.csdn.net/php_fly/article/details/19688595



