Someone on 博问 did not know how to do this, so I wrote it up.
Absolutely do not add multithreading to this.
import re

import requests
from lxml import etree

url = 'http://www.liyang.gov.cn/default.php?mod=article&fid=163250&s99679207_start=0'

# Fetch the listing page and parse it.
rp = requests.get(url)
re_html = etree.HTML(rp.text)

url_xpath = '//*[@id="s99679207_content"]/table/tbody/tr/td/span[1]/span/a/@href'
title_xpath = '//*[@id="s99679207_content"]/table/tbody/tr/td/span[1]/span/a/text()'
url_list = re_html.xpath(url_xpath)
title_list = re_html.xpath(title_xpath)
# Reverse the titles so that pop() returns them in the same order as url_list.
title_list = title_list[::-1]

for url_end in url_list:
    new_url = f'http://www.liyang.gov.cn/{url_end}'
    print(new_url)
    rp_1 = requests.get(new_url)
    try:
        re_1_html = etree.HTML(rp_1.text)
        # Select the href attribute, not the <a> element itself.
        data_url_xpath = '//tbody/tr[1]/td[2]/a/@href'
        data_url = re_1_html.xpath(data_url_xpath)[0]
    except Exception:
        # Fall back to a regular expression when the XPath finds nothing.
        data_list = re.findall('<a href="(.*?)" target="_blank">', rp_1.text)
        data_url = data_list[0]
    print(data_url)
    data_url = f'http://www.liyang.gov.cn/{data_url}'
    # Named pdf_rp rather than re, so the re module is not overwritten.
    pdf_rp = requests.get(data_url)
    data = pdf_rp.content
    with open(f'{title_list.pop()}.pdf', 'wb') as fw:
        fw.write(data)
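The script above builds absolute URLs by string concatenation and uses each link title directly as a file name. As a rough sketch of a more defensive variant, `urllib.parse.urljoin` can resolve an href against the site root, and characters that are invalid in file names can be replaced before the file is opened. The helper names `absolute_url` and `safe_filename`, and the sample hrefs and title, are hypothetical and are not part of the original script.

```python
import re
from urllib.parse import urljoin

BASE_URL = 'http://www.liyang.gov.cn/'  # same site as in the script above


def absolute_url(href, base=BASE_URL):
    """Resolve an href taken from a page against the site root (hypothetical helper)."""
    return urljoin(base, href)


def safe_filename(title, suffix='.pdf'):
    """Replace characters that are invalid in file names on common systems (hypothetical helper)."""
    cleaned = re.sub(r'[\\/:*?"<>|\r\n\t]', '_', title).strip()
    return f'{cleaned or "untitled"}{suffix}'


# Made-up sample values, for illustration only.
print(absolute_url('files/example.pdf'))
print(absolute_url('/files/example.pdf'))
print(safe_filename(' 2019/2020 report: final '))
```

In the download loop, `absolute_url(url_end)` and `safe_filename(title_list.pop())` could stand in for the two f-strings, assuming the hrefs on the site are relative paths or absolute URLs.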