python - Why does doctest fail with string containing UTF-8 chars?

- June 15, 2011

I'm reading in a plain ASCII HTML file (charset=utf-8) containing this string:

<title>whatâ€™s new?</title>

Since this string is unusable as it stands, I came up with this function (to be used after the string has been hexlify'd). Then I included a docstring test:

def to_be_replaced(reprstring):
    """
    :reprstring: repr(string) -- won't work otherwise

    >>> s = "<title>whatâ€™s new?</title>"
    >>> r = repr(s)
    >>> print r
    <title>what\xe2\x80\x99s new?</title>
    >>> to_be_replaced(r)
    set(['\xe2\x80\x99'])
    """
    regex = re.compile('([\x7f-\xff]{2,})')
    return set(re.findall(regex, reprstring))

Unfortunately, the test fails:

>"e:\python27\pythonw.exe" -u "test_to_be_replaced.py"
replaced: set(['\xe2\x80\x99'])
**********************************************************************
file "test_to_be_replaced.py", line 14, in __main__.to_be_replaced
failed example:
    print r
expected:
    <title>whatâ€™s new?</title>
got:
    '<title>what\xe2\x80\x99s new?</title>'
**********************************************************************
file "test_to_be_replaced.py", line 16, in __main__.to_be_replaced
failed example:
    to_be_replaced(r)
expected:
    set(['â€™'])
got:
    set([])
**********************************************************************
1 items had failures:
   2 of   4 in __main__.to_be_replaced
***test failed*** 2 failures.
>exit code: 0

The output above comes from running:

if __name__ == '__main__':
    s = '<title>what\xe2\x80\x99s new?</title>'
    print 'to replaced:', to_be_replaced(s)  # works as intended
    import doctest
    doctest.testmod()

What can I do in order to make the test pass?

I'm using Python 2.7.10 on Windows 7 x32.
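For what it's worth, the failure output points at two separate things, and the sketch below is one possible way to get a passing test rather than a definitive fix. First, the docstring is not raw, so Python resolves `\xe2\x80\x99` into three bytes before doctest ever sees the expected output, which is why the "expected" text shows up as â€™. Second, `repr()` returns a pure-ASCII string (with surrounding quotes) in which each high byte is spelled out as a backslash, `x` and two hex digits, so a regex over the range `\x7f-\xff` finds nothing in it and the function returns an empty set. The version here assumes the function is given the bytes themselves instead of their `repr()`, uses a raw docstring, and compares with `==` so the examples do not depend on how a set is printed. The parameter name `data` and the helper `run_examples` are made up for this illustration; the code is written to run on Python 2.7 as well as Python 3.

```python
from __future__ import print_function

import doctest
import re


def to_be_replaced(data):
    r"""Return the set of runs of two or more high bytes found in ``data``.

    The docstring is raw (note the ``r`` prefix), so doctest sees the
    backslash escapes below as written, not as already-resolved bytes.

    >>> s = b'<title>what\xe2\x80\x99s new?</title>'
    >>> to_be_replaced(s) == set([b'\xe2\x80\x99'])
    True
    >>> to_be_replaced(repr(s).encode('ascii')) == set()
    True
    """
    # Same pattern as in the question, written as a bytes literal.
    regex = re.compile(b'([\x7f-\xff]{2,})')
    return set(regex.findall(data))


def run_examples():
    """Run the doctest examples above and return (failed, attempted).

    Hypothetical helper: it parses the docstring directly, so it does not
    depend on which module counts as __main__.
    """
    test = doctest.DocTestParser().get_doctest(
        to_be_replaced.__doc__,
        {'to_be_replaced': to_be_replaced},
        'to_be_replaced', None, 0)
    result = doctest.DocTestRunner(verbose=False).run(test)
    return result.failed, result.attempted


if __name__ == '__main__':
    print(run_examples())
```

The last doctest example mirrors the second failure in the question: once the string has gone through `repr()`, there are no high bytes left for the pattern to match. Calling plain `doctest.testmod()` as in the original script should behave the same way when the file is run directly.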
