
_markupbase.py if not match: UnboundLocalError: local variable 'match' referenced before assignment (analysis of a parsing bug in Python's html.parser library)

February 21, 2019 | 移动技术网 IT Programming


The full error output when the bug is triggered (irrelevant local paths are masked with ****):

    **************\lib\site-packages\bs4\builder\_htmlparser.py:78: UserWarning: unknown status keyword 'end ' in marked section
      warnings.warn(msg)
    Traceback (most recent call last):
      File "**************/test.py", line 5, in <module>
        bs = BeautifulSoup(html, 'html.parser')
      File "**************\lib\site-packages\bs4\__init__.py", line 281, in __init__
        self._feed()
      File "**************\lib\site-packages\bs4\__init__.py", line 342, in _feed
        self.builder.feed(self.markup)
      File "**************\lib\site-packages\bs4\builder\_htmlparser.py", line 247, in feed
        parser.feed(markup)
      File "d:\program files\python37\lib\html\parser.py", line 111, in feed
        self.goahead(0)
      File "d:\program files\python37\lib\html\parser.py", line 179, in goahead
        k = self.parse_html_declaration(i)
      File "d:\program files\python37\lib\html\parser.py", line 264, in parse_html_declaration
        return self.parse_marked_section(i)
      File "d:\program files\python37\lib\_markupbase.py", line 160, in parse_marked_section
        if not match:
    UnboundLocalError: local variable 'match' referenced before assignment

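The bug can be triggered while parsing HTML that contains a malformed browser-detection marker. In the failing case, the opening marker is written as <!-[if IE eq 9]> and the closing marker as <![end if]->, so the pair cannot be matched and closed. The correct markers are <!--[if IE 9]> and <![endif]-->.
Sample code that triggers the bug: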

    from bs4 import BeautifulSoup

    html = """
    <!-[if IE eq 9]>
        <a href="https://www.shwww.net/">https://www.shwww.net/</a>
    <![end if]->
    """

    bs = BeautifulSoup(html, 'html.parser')
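Only the closing marker starts with <![. That prefix is what routes it to parse_marked_section, as the parse_html_declaration frame in the traceback shows. The sketch below models how the keyword after <![ is read. It is a hypothetical, simplified model, not CPython code:

- DECL_NAME is assumed to mirror the name-token pattern (_declname_match) in CPython's _markupbase.
- The end-of-buffer check of the real _scan_name is omitted.
- The keyword sets are copied from the method quoted below.

The space in "end if" stops the name token at "end", which belongs to neither set. The correct <![endif] yields "endif".

```python
import re

# Assumption: mirrors the name-token pattern (_declname_match) that CPython's
# _markupbase uses to read the keyword after "<![".
DECL_NAME = re.compile(r'[a-zA-Z][-_.a-zA-Z0-9]*\s*')

# Keyword sets taken from parse_marked_section.
STANDARD_KEYWORDS = {"temp", "cdata", "ignore", "include", "rcdata"}
MS_OFFICE_KEYWORDS = {"if", "else", "endif"}


def scan_keyword(marker):
    """Return the lower-cased keyword after '<![' (simplified _scan_name)."""
    if not marker.startswith("<!["):
        raise ValueError("not a marked-section marker: %r" % marker)
    m = DECL_NAME.match(marker, 3)  # start scanning right after '<!['
    if m is None:
        raise ValueError("no name token in %r" % marker)
    return m.group().strip().lower()


def classify(marker):
    """Report which branch of parse_marked_section the keyword would select."""
    keyword = scan_keyword(marker)
    if keyword in STANDARD_KEYWORDS:
        return "standard ]]> section"
    if keyword in MS_OFFICE_KEYWORDS:
        return "MS Office ]> section"
    return "unknown keyword %r" % keyword


for marker in ("<![endif]-->", "<![end if]->"):
    print(marker, "->", classify(marker))
```

The warning in the traceback quotes 'end ' with a trailing space. This is because the real method reports the raw scanned text rawdata[i+3:j], which includes the whitespace consumed by the name pattern.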

In Python 3.7.0, the code that triggers the bug is the parse_marked_section method starting at line 146 of \lib\_markupbase.py. Its code is as follows:
https://github.com/python/cpython/blob/bb9ddee3d4e293f0717f8c167afdf5749ebf843d/Lib/_markupbase.py#L160

    def parse_marked_section(self, i, report=1):
        rawdata= self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectname, j = self._scan_name( i+3, i )
        if j < 0:
            return j
        if sectname in {"temp", "cdata", "ignore", "include", "rcdata"}:
            # look for standard ]]> ending
            match= _markedsectionclose.search(rawdata, i+3)
        elif sectname in {"if", "else", "endif"}:
            # look for MS Office ]> ending
            match= _msmarkedsectionclose.search(rawdata, i+3)
        else:
            self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.unknown_decl(rawdata[i+3: j])
        return match.end(0)

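The malformed closing marker <![end if]-> is read as a marked section whose keyword is "end", because the space ends the name token. As a result, execution enters neither if sectname in {"temp", "cdata", "ignore", "include", "rcdata"}: nor elif sectname in {"if", "else", "endif"}:. It takes the else branch instead, which calls self.error(...).

The html.parser builder in bs4 (bs4\builder\_htmlparser.py) handles error() by only issuing a warning and then returning normally. That warning is the UserWarning: unknown status keyword 'end ' in marked section, emitted by warnings.warn(msg) at the top of the traceback. Execution then reaches if not match, but match was never assigned on this path, so UnboundLocalError is raised.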
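The failure can be reproduced without bs4 or a specific Python version. Below is a hypothetical, simplified stand-in; WarnOnlyParser is not a real class.

- It keeps the if/elif/else structure of the 3.7 method quoted above.
- Its error() only warns, which is assumed to match the behavior shown in the bs4 traceback.
- The two close-marker regexes are assumed to mirror those in CPython's _markupbase.

The wording of the UnboundLocalError message differs between Python versions, so only the exception type is printed.

```python
import re
import warnings

# Assumed to mirror the close-marker patterns in CPython's _markupbase.
_markedsectionclose = re.compile(r']\s*]\s*>')
_msmarkedsectionclose = re.compile(r']\s*>')


class WarnOnlyParser:
    """Hypothetical stand-in whose error() only warns, as in the bs4 traceback."""

    def error(self, message):
        warnings.warn(message)  # returns normally instead of raising

    def parse_marked_section(self, rawdata, raw_keyword, i=0):
        # raw_keyword is the text scanned after '<![', e.g. 'end ' or 'if '.
        sectname = raw_keyword.strip().lower()
        # Same branch structure as the 3.7 method: 'match' is only bound
        # inside the if/elif branches.
        if sectname in {"temp", "cdata", "ignore", "include", "rcdata"}:
            match = _markedsectionclose.search(rawdata, i + 3)
        elif sectname in {"if", "else", "endif"}:
            match = _msmarkedsectionclose.search(rawdata, i + 3)
        else:
            self.error("unknown status keyword %r in marked section" % raw_keyword)
        if not match:  # UnboundLocalError here if the else branch ran
            return -1
        return match.end(0)


parser = WarnOnlyParser()
print(parser.parse_marked_section("<![if IE]>", "if "))  # known keyword: 10

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        parser.parse_marked_section("<![end if]->", "end ")
    except UnboundLocalError as exc:
        print(type(exc).__name__)  # the same exception as in the traceback
print([str(w.message) for w in caught])
```

The recorded warning text matches the first line of the traceback. This confirms that the warning is issued first and the crash follows it on the same call.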

This bug is present in several Python versions. The fix is to define the match variable before if sectname in {"temp", "cdata", "ignore", "include", "rcdata"}:, so that it is bound on every path:
https://github.com/python/cpython/blob/bb9ddee3d4e293f0717f8c167afdf5749ebf843d/Lib/_markupbase.py#L152

    def parse_marked_section(self, i, report=1):
        rawdata= self.rawdata
        assert rawdata[i:i+3] == '<![', "unexpected call to parse_marked_section()"
        sectname, j = self._scan_name( i+3, i )
        if j < 0:
            return j
        match = None  # bind match on every path, including the else branch
        if sectname in {"temp", "cdata", "ignore", "include", "rcdata"}:
            # look for standard ]]> ending
            match= _markedsectionclose.search(rawdata, i+3)
        elif sectname in {"if", "else", "endif"}:
            # look for MS Office ]> ending
            match= _msmarkedsectionclose.search(rawdata, i+3)
        else:
            self.error('unknown status keyword %r in marked section' % rawdata[i+3:j])
        if not match:
            return -1
        if report:
            j = match.start(0)
            self.unknown_decl(rawdata[i+3: j])
        return match.end(0)
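The effect of the fix can be checked on the same simplified model. parse_marked_section_fixed is a hypothetical standalone function, not part of CPython. It applies the one-line change, match = None, and its regexes are again assumed to mirror those in _markupbase. With match pre-bound, an unknown keyword under a warn-only error handler now returns -1 instead of crashing.

```python
import re
import warnings

# Assumed to mirror the close-marker patterns in CPython's _markupbase.
_markedsectionclose = re.compile(r']\s*]\s*>')
_msmarkedsectionclose = re.compile(r']\s*>')


def parse_marked_section_fixed(rawdata, raw_keyword, i=0, error=warnings.warn):
    """Hypothetical, simplified model of the patched method.

    raw_keyword is the text scanned after '<![' (e.g. 'end ' or 'CDATA').
    Returns the index just past the close marker, or -1 if none is found.
    """
    sectname = raw_keyword.strip().lower()
    match = None  # the fix: 'match' is bound on every path
    if sectname in {"temp", "cdata", "ignore", "include", "rcdata"}:
        # look for standard ]]> ending
        match = _markedsectionclose.search(rawdata, i + 3)
    elif sectname in {"if", "else", "endif"}:
        # look for MS Office ]> ending
        match = _msmarkedsectionclose.search(rawdata, i + 3)
    else:
        error("unknown status keyword %r in marked section" % raw_keyword)
    if not match:
        return -1
    return match.end(0)


print(parse_marked_section_fixed("<![CDATA[x]]>", "CDATA"))  # 13
print(parse_marked_section_fixed("<![if IE]>", "if "))  # 10
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    print(parse_marked_section_fixed("<![end if]->", "end "))  # -1, no crash
```

The method already returns -1 when no close marker is found. After the fix, the caller therefore gets a result it already handles, instead of an exception.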

