
Web Crawler Learning Notes

June 10, 2019 | 移动技术网, IT Programming


## Pitfalls I ran into while learning web scraping, written down for fellow learners

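  • Errors such as the following
    ModuleNotFoundError: No module named 'js2xml'
    NameError: name 'js2xml' is not defined
  usually mean the library is missing: the first says it is not installed in the current environment, the second that the script never imported it.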
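
A minimal sketch for checking the first case before running a scraper, using only the standard library. The helper name `is_installed` is hypothetical and chosen for illustration; it only reports whether a top-level module can be found, it does not install anything.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in this environment."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("json", "js2xml"):
        if is_installed(name):
            print(name, "-> installed")
        else:
            print(name, "-> missing, try: pip install", name)
```

If the module is reported as installed but the script still raises NameError, the likely cause is a missing `import js2xml` line at the top of the script.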


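  • When converting a str to JSON, an error such as
    JSONDecodeError: Extra data: line 1 column 234701 (char 234700)

   means the str is not valid JSON: the parser finished one JSON value and then found more text after it.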

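  1. Mark where the JSON begins and ends with two indices, start and end, and slice it out, e.g. str[start:end];

  2. Trim the str with strip('symbol'), which removes any of the characters in 'symbol' from both ends of the string.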
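
Both approaches can be sketched as follows. The sample strings and the helper name `extract_json` are assumptions made up for illustration; a real response may need different delimiters.

```python
import json


def extract_json(text: str) -> str:
    """Cut text down to the span from the first '{' to the last '}'."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in text")
    # Slices use a colon; the end index is exclusive, hence the + 1.
    return text[start:end + 1]


# Hypothetical JSONP-style response, a common shape for scraped endpoints.
raw = 'callback({"name": "demo", "items": [1, 2, 3]});'
data = json.loads(extract_json(raw))
print(data)

# strip() treats its argument as a set of characters to remove from both
# ends, not as a literal prefix or suffix.
wrapped = '({"a": 1});'
print(json.loads(wrapped.strip("();")))
```

Slicing is the safer choice when the wrapper contains letters, because strip() could also eat characters that belong to the data.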

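   Or the data may hold several JSON documents, for example one per line, in which case each line has to be parsed on its own.

  One-liner for this problem, plus the import it needs:

  import json
  data = [json.loads(line) for line in open('tweets.json', 'r')]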
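
The one-liner above assumes exactly one JSON document per line. When documents are concatenated without newlines between them, the same "Extra data" error appears; a possible sketch for that case uses `json.JSONDecoder.raw_decode` from the standard library. The function name `iter_json_documents` and the sample string are hypothetical.

```python
import json


def iter_json_documents(text: str):
    """Yield each JSON value from a string holding several concatenated ones."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it manually.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it ended.
        value, pos = decoder.raw_decode(text, pos)
        yield value


sample = '{"id": 1}{"id": 2}\n{"id": 3}'
docs = list(iter_json_documents(sample))
print(docs)
```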

 

... (more pitfalls to be added)

 

 

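Some time later, running jupyter notebook again failed.

Error:

'jupyter' is not recognized as an internal or external command, operable program or batch file.

Cause and fix: the Anaconda Scripts directory was not on PATH. Add d:\users\23525\anaconda3\scripts to the environment variable; that directory contains jupyter-notebook.exe, pip.exe, and the other commands.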
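
A small sketch for confirming whether the fix took effect, assuming only the standard library. `find_command` and `path_contains` are hypothetical helper names; the directory passed in is whatever Scripts folder your own installation uses.

```python
import os
import shutil


def find_command(name: str):
    """Return the full path of name if it is reachable through PATH, else None."""
    return shutil.which(name)


def path_contains(directory: str) -> bool:
    """Check whether directory is listed in the PATH environment variable."""
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = os.environ.get("PATH", "").split(os.pathsep)
    return any(
        os.path.normcase(os.path.normpath(entry)) == wanted
        for entry in entries
        if entry
    )


if __name__ == "__main__":
    print("jupyter resolves to:", find_command("jupyter"))
    print("pip resolves to:", find_command("pip"))
```

Note that a changed environment variable is only visible to terminals opened after the change.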

 

Then the following error appeared:

Traceback (most recent call last):
  File "c:\programdata\anaconda3\scripts\jupyter-notebook-script.py", line 6, in <module>
    from notebook.notebookapp import main
  File "c:\programdata\anaconda3\lib\site-packages\notebook\notebookapp.py", line 47, in <module>
    from zmq.eventloop import ioloop
  File "c:\programdata\anaconda3\lib\site-packages\zmq\__init__.py", line 47, in <module>
    from zmq import backend
  File "c:\programdata\anaconda3\lib\site-packages\zmq\backend\__init__.py", line 40, in <module>
    reraise(*exc_info)
  File "c:\programdata\anaconda3\lib\site-packages\zmq\utils\sixcerpt.py", line 34, in reraise
    raise value
  File "c:\programdata\anaconda3\lib\site-packages\zmq\backend\__init__.py", line 27, in <module>
    _ns = select_backend(first)
  File "c:\programdata\anaconda3\lib\site-packages\zmq\backend\select.py", line 27, in select_backend
    mod = __import__(name, fromlist=public_api)
  File "c:\programdata\anaconda3\lib\site-packages\zmq\backend\cython\__init__.py", line 6, in <module>
    from . import (constants, error, message, context,
ImportError: DLL load failed: The specified module could not be found.

Cause: the failing frames are all inside the zmq package; searching for the error suggested reinstalling pyzmq.

Fix:

pip uninstall pyzmq 
pip install pyzmq 

 

The install step then failed with the following errors:

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

Collecting pyzmq

Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=3, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=1, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Retrying (Retry(total=0, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError("Can't connect to HTTPS URL because the SSL module is not available.")': /simple/pyzmq/

Could not fetch URL https://pypi.org/simple/pyzmq/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pyzmq/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping

Could not find a version that satisfies the requirement pyzmq (from versions: )

No matching distribution found for pyzmq

pip is configured with locations that require TLS/SSL, however the ssl module in Python is not available.

Could not fetch URL https://pypi.org/simple/pip/: There was a problem confirming the ssl certificate: HTTPSConnectionPool(host='pypi.org', port=443): Max retries exceeded with url: /simple/pip/ (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available.")) - skipping

 

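Cause (quoting an answer found while searching):

I got the same "ssl module is not available" error when running the native pip that ships with Anaconda (currently 18.1). In my case it was a system path issue, which I solved by adding the following directories to my PATH variable:

%miniconda3_dir%;%miniconda3_dir%\library\mingw-w64\bin;%miniconda3_dir%\library\usr\bin;%miniconda3_dir%\library\bin;%miniconda3_dir%\scripts;%miniconda3_dir%\bin;

Here %miniconda3_dir% should be replaced with your Miniconda (or Anaconda) installation path.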
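
To see which of the directories listed above are still missing from PATH, a rough sketch along these lines may help. The names `CONDA_SUBDIRS`, `missing_path_entries`, and `ssl_available` are hypothetical, and the install root is a value you supply yourself; nothing here modifies the environment.

```python
import os

# Sub-directories from the list above, relative to the install root.
CONDA_SUBDIRS = [
    "",
    os.path.join("library", "mingw-w64", "bin"),
    os.path.join("library", "usr", "bin"),
    os.path.join("library", "bin"),
    "scripts",
    "bin",
]


def _norm(path: str) -> str:
    """Normalize a path so comparisons ignore case and trailing separators."""
    return os.path.normcase(os.path.normpath(path))


def missing_path_entries(install_root: str, path_value: str) -> list:
    """Return the expected Conda directories that are absent from path_value."""
    present = {_norm(entry) for entry in path_value.split(os.pathsep) if entry}
    expected = [
        os.path.join(install_root, sub) if sub else install_root
        for sub in CONDA_SUBDIRS
    ]
    return [d for d in expected if _norm(d) not in present]


def ssl_available() -> bool:
    """Return True if this interpreter can import the ssl module."""
    try:
        import ssl  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    root = r"d:\users\23525\anaconda3"  # assumption: replace with your own path
    print("ssl importable:", ssl_available())
    for directory in missing_path_entries(root, os.environ.get("PATH", "")):
        print("not on PATH:", directory)
```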


 

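When a program that used to work stops running, reinstalling is the simplest option, but I wanted to actually solve the problem, so that I gain a little more control over the world. Finding problems step by step, solving them, summarizing, and preventing them: isn't that the constant pattern of human progress? May the road of human inheritance and exploration stay lit.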

 
