python - Why does doctest fail with a string containing UTF-8 chars?
I'm reading a plain ASCII HTML file (charset=utf-8) containing the string:
<title>what’s new?</title>
This string is unusable as it came, so I wrote a function (to be used after the string has been hexlify'd). Then I included a docstring test:
    import re

    def to_be_replaced(reprstring):
        """
        :reprstring: repr(string) -- won't work otherwise

        >>> s = "<title>what’s new?</title>"
        >>> r = repr(s)
        >>> print r
        <title>what\xe2\x80\x99s new?</title>
        >>> to_be_replaced(r)
        set(['\xe2\x80\x99'])
        """
        regex = re.compile('([\x7f-\xff]{2,})')
        return set(re.findall(regex, reprstring))
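To illustrate what the pattern in to_be_replaced matches, here is a minimal Python 3 sketch. It assumes Python 3 bytes stand in for a Python 2 str, so the pattern is compiled as a bytes pattern; the name HIGH_BYTE_RUN is hypothetical. The UTF-8 encoding of ’ is three bytes in the 0x80-0xff range, so the encoded string itself contains a matching run. After repr(), each of those bytes is spelled as a four-character ASCII escape such as \xe2, which leaves nothing in 0x7f-0xff for the pattern to match.

```python
import re

# The question's pattern, written as a bytes pattern so that it works on
# Python 3 bytes (assumed here to play the role of a Python 2 str).
HIGH_BYTE_RUN = re.compile(rb"([\x7f-\xff]{2,})")


def to_be_replaced(data):
    """Return the set of runs of two or more bytes in the range 0x7f-0xff."""
    return set(HIGH_BYTE_RUN.findall(data))


# U+2019 (right single quotation mark) encodes to three bytes in UTF-8.
s = "<title>what\u2019s new?</title>".encode("utf-8")
print(to_be_replaced(s))  # {b'\xe2\x80\x99'}

# repr() spells each high byte as an ASCII escape such as \xe2, so the
# result contains only bytes below 0x80 and the pattern finds nothing.
r = repr(s).encode("ascii")
print(to_be_replaced(r))  # set()
```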
Unfortunately, the test fails:
    >"e:\python27\pythonw.exe" -u "test_to_be_replaced.py"
    to be replaced: set(['\xe2\x80\x99'])
    **********************************************************************
    File "test_to_be_replaced.py", line 14, in __main__.to_be_replaced
    Failed example:
        print r
    Expected:
        <title>what’s new?</title>
    Got:
        '<title>what\xe2\x80\x99s new?</title>'
    **********************************************************************
    File "test_to_be_replaced.py", line 16, in __main__.to_be_replaced
    Failed example:
        to_be_replaced(r)
    Expected:
        set(['’'])
    Got:
        set([])
    **********************************************************************
    1 items had failures:
       2 of   4 in __main__.to_be_replaced
    ***Test Failed*** 2 failures.
    >exit code: 0
The output above comes from running:
    if __name__ == '__main__':
        s = '<title>what\xe2\x80\x99s new?</title>'
        print 'to be replaced:', to_be_replaced(s)  # works as intended
        import doctest
        doctest.testmod()
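The Expected lines in the failure report differ from what was typed in the docstring because that docstring is an ordinary (non-raw) string literal. The following Python 3 sketch, using two hypothetical functions plain and raw, shows that \xNN escapes in a non-raw docstring are converted into characters when the module is compiled, while an r-prefixed docstring keeps the literal backslashes that repr() output contains. In Python 2.7 the same three escapes become the three UTF-8 bytes of ’, which is consistent with the report showing ’ instead of \xe2\x80\x99.

```python
def plain():
    """<title>what\xe2\x80\x99s new?</title>"""


def raw():
    r"""<title>what\xe2\x80\x99s new?</title>"""


# In the plain docstring each \xNN escape was already turned into a single
# character at compile time, so the text doctest compares against no longer
# contains the backslash sequences that repr() prints.
print("\\xe2" in plain.__doc__)  # False

# The raw docstring keeps the backslashes, matching repr() output literally.
print("\\xe2" in raw.__doc__)  # True

# Three one-character escapes versus three four-character sequences.
print(len(plain.__doc__), len(raw.__doc__))  # 28 37
```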
What can I do to make the test pass?
I'm using Python 2.7.10 on Windows 7 x32.
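For comparison, here is a minimal Python 3 sketch of a doctest for the same function that passes. It rests on stated assumptions: the function receives the encoded bytes directly (as in the __main__ block, which works as intended) rather than their repr(); the docstring is raw, so its expected output keeps the literal backslash escapes; and Python 3 bytes stand in for a Python 2 str. The name HIGH_BYTE_RUN is hypothetical. This is not a drop-in fix for the Python 2.7 script; for one thing, Python 2 would display the result as set(['\xe2\x80\x99']) rather than {b'\xe2\x80\x99'}.

```python
import doctest
import re

# Bytes pattern: runs of two or more bytes in the range 0x7f-0xff.
HIGH_BYTE_RUN = re.compile(rb"([\x7f-\xff]{2,})")


def to_be_replaced(data):
    r"""Return the set of runs of two or more bytes in the range 0x7f-0xff.

    The docstring is raw (r prefix), so the backslash escapes in the expected
    output below reach doctest unchanged and are compared with repr() text.

    >>> s = "<title>what\u2019s new?</title>".encode("utf-8")
    >>> s
    b'<title>what\xe2\x80\x99s new?</title>'
    >>> to_be_replaced(s)
    {b'\xe2\x80\x99'}
    """
    return set(HIGH_BYTE_RUN.findall(data))


if __name__ == "__main__":
    # Direct call on the encoded bytes, mirroring the question's __main__ block.
    print("to be replaced:", to_be_replaced(
        "<title>what\u2019s new?</title>".encode("utf-8")))
    doctest.testmod()
```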