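By default, when ImagesPipeline downloads an image, the file is saved under the SHA1 hash of the image URL rather than its original name. For example:

Image URL: https://www.example.com/image.jpg
SHA1 hash: 3afec3b4765f8f0a07b78f98c07b83f013567a0a
Default file name: 3afec3b4765f8f0a07b78f98c07b83f013567a0a.jpg
Desired file name: image.jpg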
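To make the default naming concrete, here is a minimal sketch that builds a hash-based name with the standard library. It assumes the name is the SHA1 hex digest of the UTF-8 encoded URL plus a `.jpg` extension; `default_image_name` is a hypothetical helper for illustration, not a Scrapy function.

```python
import hashlib


def default_image_name(url):
    """Return a SHA1-based file name like the default described above.

    Hypothetical helper for illustration; not part of Scrapy.
    """
    # Hash the URL bytes and use the 40-character hex digest as the name.
    digest = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return "%s.jpg" % digest


name = default_image_name("https://www.example.com/image.jpg")
print(name)  # 40 hex characters followed by ".jpg"
```

The point is that the name depends only on the URL, so the original file name is lost unless the naming step is overridden.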
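An answer on Stack Overflow says you can override the image_key function. I tried it, but it did not work: my overridden image_key was never called.
So I went and read the ImagesPipeline source: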
    class ImagesPipeline(FilesPipeline):
        """Abstract pipeline that implement the image thumbnail generation logic

        """

        MEDIA_NAME = 'image'
        MIN_WIDTH = 0
        MIN_HEIGHT = 0
        THUMBS = {}
        DEFAULT_IMAGES_URLS_FIELD = 'image_urls'
        DEFAULT_IMAGES_RESULT_FIELD = 'images'

        @classmethod
        def from_settings(cls, settings):
            cls.MIN_WIDTH = settings.getint('IMAGES_MIN_WIDTH', 0)
            cls.MIN_HEIGHT = settings.getint('IMAGES_MIN_HEIGHT', 0)
            cls.EXPIRES = settings.getint('IMAGES_EXPIRES', 90)
            cls.THUMBS = settings.get('IMAGES_THUMBS', {})
            s3store = cls.STORE_SCHEMES['s3']
            s3store.AWS_ACCESS_KEY_ID = settings['AWS_ACCESS_KEY_ID']
            s3store.AWS_SECRET_ACCESS_KEY = settings['AWS_SECRET_ACCESS_KEY']

            cls.IMAGES_URLS_FIELD = settings.get('IMAGES_URLS_FIELD', cls.DEFAULT_IMAGES_URLS_FIELD)
            cls.IMAGES_RESULT_FIELD = settings.get('IMAGES_RESULT_FIELD', cls.DEFAULT_IMAGES_RESULT_FIELD)
            store_uri = settings['IMAGES_STORE']
            return cls(store_uri)

        def file_downloaded(self, response, request, info):
            return self.image_downloaded(response, request, info)

        def image_downloaded(self, response, request, info):
            checksum = None
            for path, image, buf in self.get_images(response, request, info):
                if checksum is None:
                    buf.seek(0)
                    checksum = md5sum(buf)
                width, height = image.size
                self.store.persist_file(
                    path, buf, info,
                    meta={'width': width, 'height': height},
                    headers={'Content-Type': 'image/jpeg'})
            return checksum

        def get_images(self, response, request, info):
            path = self.file_path(request, response=response, info=info)
            orig_image = Image.open(StringIO(response.body))

            width, height = orig_image.size
            if width

The excerpt stops there, but the line that matters is in get_images, where the storage path comes from file_path rather than image_key:

    path = self.file_path(request, response=response, info=info)
So in my custom ImagesPipeline subclass I overrode file_path instead. The code is as follows:
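    __author__ = 'fly'
    #coding:utf-8

    # Import path used by older Scrapy releases; newer releases expose
    # the same class as scrapy.pipelines.images.ImagesPipeline.
    from scrapy.contrib.pipeline.images import ImagesPipeline
    from scrapy.http import Request
    from scrapy.exceptions import DropItem


    class MyImagesPipeline(ImagesPipeline):

        def file_path(self, request, response=None, info=None):
            # Use the last segment of the URL as the file name.
            image_guid = request.url.split('/')[-1]
            return 'full/%s' % (image_guid)

        def get_media_requests(self, item, info):
            # Schedule one download per image URL on the item.
            for image_url in item['image_urls']:
                yield Request(image_url)

        def item_completed(self, results, item, info):
            # Drop the item if none of its images downloaded successfully.
            image_paths = [x['path'] for ok, x in results if ok]
            if not image_paths:
                raise DropItem("Item contains no images")
            return item

The overridden file_path returns the original image name together with its extension, so the file is stored as full/image.jpg instead of under its hash.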
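One caveat with `request.url.split('/')[-1]`: if the URL carries a query string, the query ends up in the file name. The sketch below shows a slightly more defensive way to derive the name, using only the Python 3 standard library. `image_path_from_url` and its `prefix` parameter are hypothetical names introduced here for illustration; they are not part of Scrapy, and the result is simply a string that a `file_path` override could return.

```python
import posixpath
from urllib.parse import urlparse


def image_path_from_url(url, prefix="full"):
    """Build a storage path from the last path segment of a URL.

    Hypothetical helper: the query string and fragment are ignored,
    so "photo.png?size=large" is stored as "photo.png".
    """
    path = urlparse(url).path        # drops "?query" and "#fragment"
    name = posixpath.basename(path)  # last segment, e.g. "image.jpg"
    if not name:
        raise ValueError("URL has no file name: %r" % url)
    return "%s/%s" % (prefix, name)


print(image_path_from_url("https://www.example.com/image.jpg"))
# full/image.jpg
print(image_path_from_url("https://www.example.com/a/b/photo.png?size=large"))
# full/photo.png
```

Note that any scheme based on the original name has a trade-off the hash-based default avoids: two different URLs that end in the same file name map to the same path, and the later download overwrites the earlier one.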
Author: 曾是土木人 (https://blog.csdn.net/php_fly)
Original post: https://blog.csdn.net/php_fly/article/details/19688595