Crawling Suning with Scrapy

May 12, 2019 | 移动技术网 (IT Programming)

# -*- coding: utf-8 -*-
import scrapy
from copy import deepcopy


class SuSpider(scrapy.Spider):
    name = 'su'
    allowed_domains = ['suning.com']
    start_urls = ['http://list.suning.com/?safp=d488778a.error1.0.4786e76351']

    def parse(self, response):
        # Top-level categories in the left-hand menu
        bcate_list = response.xpath("//div[@class='allsortleft']/ul/li")
        for bcate in bcate_list:
            item = {}
            # The class value of this top-level category's <li>
            class_name = bcate.xpath("./@class").extract_first()
            # Name of the top-level category
            item["bcate"] = bcate.xpath("./a/span/text()").extract_first()
            # Use that class value to locate every sub-category under this top-level category
            scate_list = response.xpath("//div[@class='{}']/div".format(class_name))
            for scate in scate_list:
                # Sub-category name
                item["scate"] = scate.xpath("./div[1]/a/@title").extract_first()
                # All tags under this sub-category
                tag_list = scate.xpath("./div[2]/a")
                for tag in tag_list:
                    # Tag name and link
                    item["tag"] = tag.xpath("./text()").extract_first()
                    href = tag.xpath("./@href").extract_first()
                    if not href:
                        continue  # skip tags that have no link
                    # urljoin completes scheme-less links against the page URL
                    item["tag_link"] = response.urljoin(href)
                    # Follow the tag to its product listing page
                    yield scrapy.Request(
                        item["tag_link"],
                        callback=self.good_list,
                        # deepcopy: item is reused and overwritten on the next iteration
                        meta={"item": deepcopy(item)}
                    )

    def good_list(self, response):
        item = deepcopy(response.meta["item"])
        # All products on the current listing page
        li_list = response.xpath("//div[@id='product-wrap']/div/ul/li")
        for li in li_list:
            # Product image URL, name, price and detail-page link
            img_src = li.xpath(".//div[@class='res-img']/div/a/img/@src").extract_first()
            item["good_img"] = response.urljoin(img_src) if img_src else None
            item["good_name"] = li.xpath(".//div[@class='res-info']/div/a/text()").extract_first()
            item["good_price"] = li.xpath(".//div[@class='res-info']/div/span/text()").extract_first()
            item["good_href"] = li.xpath(".//div[@class='res-info']/div/a/@href").extract_first()
            # Follow the link to the product detail page, skipping missing or placeholder links
            if item["good_href"] and item["good_href"] != "javascript:void(0);":
                yield scrapy.Request(
                    response.urljoin(item["good_href"]),
                    callback=self.good_detail,
                    meta={"item": deepcopy(item)}
                )
        # Pagination: follow the "next page" link if there is one
        next_url = response.xpath("//a[@id='nextpage']/@href").extract_first()
        if next_url:
            yield scrapy.Request(
                response.urljoin(next_url),
                callback=self.good_list,
                meta={"item": response.meta["item"]}
            )

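    def good_detail(self, response):
        item = response.meta["item"]
        # Specification options of the current product, e.g. colour and version
        size_list = response.xpath("//div[@id='j-tzm']/dl")
        for size in size_list:
            size_name = size.xpath("./dt/span/text()").extract_first()
            size_value = size.xpath("./dd/ul/li/@title").extract()
            if size_name:  # skip option groups without a label
                item[size_name] = size_value
        # Yield the item so Scrapy's pipelines and feed exports receive it
        yield item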
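
Every request in the spider carries `deepcopy(item)` in its `meta`. The likely reason is that one dict is reused and overwritten across loop iterations, while Scrapy runs the callbacks later, after the loop has moved on. The sketch below reproduces the effect without Scrapy. The `schedule` helper and the category and tag names are hypothetical, made up for illustration, and are not part of the spider.

```python
from copy import deepcopy


def schedule(tags, copy_item):
    """Mimic a loop that reuses one dict while queuing 'requests'."""
    item = {"bcate": "Phones"}  # hypothetical category name
    queued = []
    for tag in tags:
        item["tag"] = tag  # overwritten on every iteration
        queued.append(deepcopy(item) if copy_item else item)
    # Callbacks run later, so the queued items are read only after the loop ends
    return [queued_item["tag"] for queued_item in queued]


shared = schedule(["5G", "Foldable"], copy_item=False)
copied = schedule(["5G", "Foldable"], copy_item=True)
print(shared)  # every entry refers to the same dict
print(copied)  # each entry kept its own snapshot
```

Without the copy, every queued entry ends up showing the last tag, which is the kind of silent data mix-up the `deepcopy` calls guard against.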

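The spider has to complete the links it extracts before requesting them, because they come without a scheme. As a minimal sketch, assuming the links are protocol-relative (starting with `//`), the standard library's `urllib.parse.urljoin` resolves them against the page URL. The page URL and hrefs below are invented for illustration.

```python
from urllib.parse import urljoin

# Hypothetical page URL and hrefs, for illustration only
page_url = "http://list.suning.com/0-20006-0.html"

# A protocol-relative link inherits the scheme of the page
protocol_relative = urljoin(page_url, "//product.suning.com/123.html")
# A root-relative link inherits both scheme and host
root_relative = urljoin(page_url, "/0-20006-1.html")
# An absolute link is returned unchanged
absolute = urljoin(page_url, "https://www.suning.com/")

print(protocol_relative)
print(root_relative)
print(absolute)
```

Inside a Scrapy callback, `response.urljoin(href)` performs the same kind of resolution with the response's own URL as the base, so it handles relative and absolute links alike.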