(1) Web Crawler Notes (Backup)

July 7, 2020 | 移动技术网 IT编程

'''
Day 1: fetch a page with urllib
from urllib.request import urlopen

url = 'http://quote.eastmoney.com/us/BIDU.html?from=BaiduAladdin'
response = urlopen(url)
info = response.read()    # raw response body as bytes
print(info.decode())      # decode() assumes UTF-8
print(response.info())    # response headers
'''
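The `info.decode()` call in the Day 1 notes assumes the page is UTF-8. As a hedged sketch, the charset can instead be read from the response's `Content-Type` header, falling back to UTF-8 when the server does not declare one. The helper names `decode_body` and `fetch_text` and the 10-second timeout are assumptions for illustration, not part of the original notes.

```python
from urllib.request import urlopen


def decode_body(raw, charset=None):
    """Decode raw bytes with the declared charset, falling back to UTF-8."""
    try:
        return raw.decode(charset or 'utf-8')
    except (LookupError, UnicodeDecodeError):
        # Unknown charset name or undecodable bytes: decode as UTF-8 and
        # substitute U+FFFD for anything that cannot be decoded.
        return raw.decode('utf-8', errors='replace')


def fetch_text(url, timeout=10):
    """Fetch url and decode the body using the charset from the headers."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset()
        return decode_body(response.read(), charset)
```

Keeping the decoding step in its own function means it can be checked offline, for example with `decode_body('历史'.encode('gbk'), 'gbk')`, before `fetch_text` is pointed at a real URL.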



'''
Dynamic User-Agent

Option 1: generate a User-Agent with the fake_useragent package
pip install fake_useragent
from fake_useragent import UserAgent
ua = UserAgent()
print(ua.chrome)

Option 2: pick a User-Agent at random from a hand-written list
from urllib.request import urlopen
from urllib.request import Request
from random import choice

url = 'http://quote.eastmoney.com/us/BIDU.html?from=BaiduAladdin'
user_agents = [
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
    'Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11',
]
headers = {
    'User-Agent': choice(user_agents)
}
# Print the value that is actually sent, not a second random pick.
print(headers['User-Agent'])
request = Request(url, headers=headers)
response = urlopen(request)
info = response.read()
print(info.decode())
'''
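The random User-Agent can be checked on the `Request` object itself, before anything is sent over the network. The following is a minimal sketch under that assumption; `build_request` and `USER_AGENTS` are hypothetical names introduced here, and the list reuses the three strings from the notes.

```python
from random import choice
from urllib.request import Request

USER_AGENTS = [
    'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 '
    '(KHTML, like Gecko) Version/5.1 Safari/534.50',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) '
    'Gecko/20100101 Firefox/4.0.1',
    'Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) '
    'Presto/2.8.131 Version/11.11',
]


def build_request(url, user_agents=USER_AGENTS):
    """Return a Request for url carrying one randomly chosen User-Agent."""
    return Request(url, headers={'User-Agent': choice(user_agents)})


request = build_request('http://quote.eastmoney.com/us/BIDU.html?from=BaiduAladdin')
# Request stores header names capitalized, so the key is 'User-agent'.
print(request.get_header('User-agent'))
```

Passing the result to `urlopen` then works exactly as in the notes; only the header selection has been moved into a function.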





'''
Encoding Chinese search terms, method 1: quote

from urllib.request import urlopen
from urllib.request import Request
from urllib.parse import quote

print(quote('历史'))    # percent-encoded UTF-8 bytes
url = 'https://www.baidu.com/s?wd={}'.format(quote('历史'))
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
}
print(url)
request = Request(url, headers=headers)
response = urlopen(request)
print(response.read().decode())
'''
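To see what `quote` does to the search term without sending a request, the encoding can be inspected directly. This short sketch also shows `unquote` reversing it and how `quote_plus` differs for spaces; the `'a b'` sample is an assumption added for illustration.

```python
from urllib.parse import quote, quote_plus, unquote

# Each Chinese character becomes three percent-encoded UTF-8 bytes.
encoded = quote('历史')
print(encoded)                     # %E5%8E%86%E5%8F%B2
print(unquote(encoded) == '历史')   # True

# A space becomes %20 with quote() and + with quote_plus().
print(quote('a b'))        # a%20b
print(quote_plus('a b'))   # a+b
```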

'''
Encoding Chinese search terms, method 2: urlencode

from urllib.request import urlopen
from urllib.request import Request
from urllib.parse import urlencode

arg = {
    'wd': '历史',
    'ie': 'utf-8'
}
print(urlencode(arg))    # encodes every key and value and joins them with &
url = 'https://www.baidu.com/s?{}'.format(urlencode(arg))
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50'
}
print(url)
request = Request(url, headers=headers)
response = urlopen(request)
print(response.read().decode())
'''
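One way to confirm that `urlencode` built the query string correctly is to parse the URL back into its parameters. The sketch below does that round trip offline; `search_url` is a hypothetical helper name, not something from the original notes.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def search_url(keyword):
    """Build a Baidu search URL with the keyword safely encoded."""
    return 'https://www.baidu.com/s?' + urlencode({'wd': keyword, 'ie': 'utf-8'})


url = search_url('历史')
print(url)   # https://www.baidu.com/s?wd=%E5%8E%86%E5%8F%B2&ie=utf-8

# parse_qs decodes the query again; each value comes back as a list.
params = parse_qs(urlsplit(url).query)
print(params == {'wd': ['历史'], 'ie': ['utf-8']})   # True
```

Compared with method 1, `urlencode` handles any number of parameters at once, so the URL template does not need one `{}` placeholder per argument.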


'''
Crawling Baidu Tieba
'''
from urllib.request import urlopen
from urllib.request import Request
from urllib.parse import urlencode
from random import choice
def get_html(url):
    """Fetch url with a randomly chosen User-Agent and return the raw bytes."""
    user_agents = [
        'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-us) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:2.0.1) Gecko/20100101 Firefox/4.0.1',
        'Opera/9.80 (Macintosh; Intel Mac OS X 10.6.8; U; en) Presto/2.8.131 Version/11.11',
    ]
    headers = {
        'User-Agent': choice(user_agents)
    }
    request = Request(url, headers=headers)
    response = urlopen(request)
    return response.read()

def save_html(filename, html_bytes):
    """Write the downloaded bytes to filename."""
    with open(filename, 'wb') as f:
        f.write(html_bytes)


def main():
    content = input('Tieba name to download: ')
    num = input('Number of pages: ')
    base_url = 'https://tieba.baidu.com/f?ie=utf-8&{}'
    for pn in range(int(num)):
        # pn is an offset that advances by 50 for each page.
        args = urlencode({
            'pn': pn * 50,
            'kw': content
        })
        url = base_url.format(args)
        print(url)
        html_bytes = get_html(url)
        filename = 'page_{}.html'.format(pn + 1)
        print('Downloading ' + filename)
        save_html(filename, html_bytes)

if __name__ == '__main__':
    main()
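The paging logic in `main()` can also be separated from the download step, which makes the generated URLs easy to inspect before any request is sent. This is a sketch under assumptions: `page_urls`, `BASE_URL`, and the `step` parameter are hypothetical names, and the step of 50 is taken from the `pn*50` expression in the script.

```python
from urllib.parse import urlencode

BASE_URL = 'https://tieba.baidu.com/f?ie=utf-8&{}'


def page_urls(keyword, pages, step=50):
    """Yield one list-page URL per page, advancing pn by step each time."""
    for page in range(pages):
        yield BASE_URL.format(urlencode({'kw': keyword, 'pn': page * step}))


for url in page_urls('python', 3):
    print(url)
```

Each yielded URL can then be handed to `get_html` and `save_html` from the script, so the loop in `main()` only has to deal with file names.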

Original article: https://blog.csdn.net/qq_42830971/article/details/107154486
