
Scraping JD.com product reviews with Python

July 22, 2020 | 移动技术网IT编程

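Scraping JD.com reviews works much like scraping Taobao reviews. For background, see the two previous articles:
Analyzing Ajax requests with Python to scrape Taobao reviews
Scraping Taobao reviews with Python, updated (April 2020)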
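
The script below passes a `callback` parameter to JD's comment endpoint, so the reply arrives as JSONP: JSON wrapped in a function call such as `fetchJSON_comment98(...);`. As a minimal sketch, assuming the wrapper always has that `name(...)` shape, the payload can be unwrapped by locating the outer parentheses instead of counting characters. The helper name `strip_jsonp` and the sample string are hypothetical and are not part of the original script.

```python
import json


def strip_jsonp(text):
    """Parse the JSON payload out of a JSONP string such as name({...});

    Hypothetical helper for illustration. Returns None when no wrapper is
    found or when the payload is not valid JSON.
    """
    start = text.find("(")   # first opening parenthesis of the wrapper
    end = text.rfind(")")    # last closing parenthesis of the wrapper
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start + 1:end])
    except ValueError:       # json.JSONDecodeError is a subclass of ValueError
        return None


# Made-up sample response, not real JD data.
sample = 'fetchJSON_comment98({"comments": [{"content": "ok"}]});'
data = strip_jsonp(sample)
print(data["comments"][0]["content"])
```

Searching for the parentheses keeps working if the callback name changes, whereas a fixed slice such as `[20:-2]` depends on the exact length of `fetchJSON_comment98(`.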

import time
import re
import requests
import json
import random
import csv



class JdSpider_content():
    def __init__(self, productId, page, name):
        self.name = name  # base name for the output files
        self.page = page  # page number
        self.productId = productId  # product ID
        self.url = "https://club.jd.com/comment/productPageComments.action?"
        self.headers = {"User-Agent": "your own User-Agent",
                        "referer": "https://item.jd.com/10999284925.html",
                        "Cookie": 'your own cookie'
                        }

    def get_page(self):
        """Fetch one page of comments; return the parsed JSON, or None on failure."""
        params = {
            "productId": self.productId,
            "page": self.page,
            "callback": "fetchJSON_comment98",
            "score": "0",  # 0 = regular reviews, 1 = negative, 2 = neutral
            "sortType": "5",
            "pageSize": "10",
            "isShadowSku": "0",
            "rid": "0",
            "fold": "1"
        }
        try:
            # Send the request once and reuse the response.
            res = requests.get(self.url, params=params,
                               headers=self.headers, timeout=10)
            if res.status_code != 200:
                return None
            # The body is JSONP, "fetchJSON_comment98(...);": drop the
            # 20-character prefix and the trailing ");" to get plain JSON.
            return json.loads(res.text[20:-2])
        except (requests.RequestException, ValueError):
            return None

    def get_content(self, json_data):
        """Yield one dict per comment found on the page."""
        if json_data is not None:
            # Fall back to an empty list if the "comments" key is missing.
            for item in json_data.get("comments") or []:
                content_data = item.get("content")
                content_time = item.get("creationTime")
                content_name = item.get("nickname")
                type_size = item.get("productSize")
                type_color = item.get("productColor")
                yield {
                    "content_time": content_time,
                    "type_color": type_color,
                    "type_size": type_size,
                    "content_name": content_name,
                    "content_data": content_data,
                }

        else:
            print("This page failed to load!")
	
    def get_word(self, json_data):
        """Append the names of the hot comment tags to a keywords file."""
        if json_data is not None:
            tags = json_data.get("hotCommentTagStatistics") or []
            # Open the file once instead of once per keyword.
            with open(self.name + "_keywords.txt", "a", encoding="utf-8") as file:
                for tag in tags:
                    file.write(str(tag.get("name")) + "\n")
    # Save the results as a text file
    def write_txt(self, data):
        with open(self.name+".txt", "a", encoding="utf-8") as file:
            file.write(json.dumps(data, indent=2, ensure_ascii=False))
            file.write("\n")
    # Save one result row as CSV
    def write_csv(self, data):
        with open(self.name+".csv", "a", encoding="utf-8-sig", newline='') as file:
            # Field names must match the keys yielded by get_content.
            fieldnames = ["content_time", "type_color", "type_size",
                          "content_name", "content_data"]
            writer = csv.DictWriter(file, fieldnames=fieldnames)
            writer.writerow(data)
    # Save the results as JSON
    def write_json(self, data):
        with open(self.name+".json", "a", encoding="utf-8") as file:
            file.write(json.dumps(data, indent=2, ensure_ascii=False))

    def main(self):
        # Fetch the page once and return a generator over its comments.
        json_data = self.get_page()
        return self.get_content(json_data)


if __name__ == "__main__":
    ls = []
    for j in range(2):
        print("\n")
        print("Now on page %d" % (j+1))
        a = JdSpider_content(
            productId="24155385153", page=j+1, name="祺奥")
        # Request each page only once and reuse the parsed result.
        json_data = a.get_page()
        if j == 0:
            a.get_word(json_data)
        for i in a.get_content(json_data):
            print(i)
            ls.append(i)
        # Pause between pages to avoid an IP ban; a proxy pool also works.
        time.sleep(random.randint(15, 20))
    a.write_txt(ls)
Original article: https://blog.csdn.net/m0_46412065/article/details/107468840
