Getting the same Unicode string length in both Python 2 and 3?
Ugh, Python 2 / 3 is so frustrating... Consider this example, test.py:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sys

    if sys.version_info[0] < 3:
      text_type = unicode
      binary_type = str
      def b(x):
        return x
      def u(x):
        return unicode(x)
    else:
      text_type = str
      binary_type = bytes
      import codecs
      def b(x):
        return codecs.latin_1_encode(x)[0]
      def u(x):
        return x

    tstr = " ▲ "

    sys.stderr.write(tstr)
    sys.stderr.write("\n")
    sys.stderr.write(str(len(tstr)))
    sys.stderr.write("\n")
Running it:

    $ python2.7 test.py
     ▲ 
    5
    $ python3.2 test.py
     ▲ 
    3
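As far as I can tell, the two numbers come from what len() counts. On Python 2 a plain literal is a byte string, so len() counts the UTF-8 bytes of the source text; on Python 3 it is a text string, so len() counts code points. A minimal sketch of that, with variable names that are mine and not from test.py (the bytes are written as escapes so the snippet does not depend on the file encoding):

```python
# UTF-8 encoding of " ▲ ": a space, the three bytes of U+25B2, a space.
raw = b" \xe2\x96\xb2 "

# Decoding turns the bytes into text (unicode on Python 2, str on Python 3).
text = raw.decode("utf-8")

print(len(raw))   # 5: counts bytes
print(len(text))  # 3: counts code points
```

This runs unchanged on Python 2.6+ and Python 3, because the b"" prefix is accepted by both.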
Great, two different string sizes. Would wrapping the string in one of these wrappers I found around the net help?
For tstr = text_type(" ▲ "):

    $ python2.7 test.py
    Traceback (most recent call last):
      File "test.py", line 21, in <module>
        tstr = text_type(" ▲ ")
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
    $ python3.2 test.py
     ▲ 
    3
For tstr = u(" ▲ "):

    $ python2.7 test.py
    Traceback (most recent call last):
      File "test.py", line 21, in <module>
        tstr = u(" ▲ ")
      File "test.py", line 11, in u
        return unicode(x)
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
    $ python3.2 test.py
     ▲ 
    3
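Both Python 2 failures so far look like the same thing: unicode() called without an encoding falls back to the ASCII codec, and the first byte of ▲ (0xe2) is outside the ASCII range. A rough way to reproduce that on either Python version is to decode the bytes as ASCII explicitly. This is only a sketch, and the names raw and ascii_error are made up for illustration:

```python
# UTF-8 bytes of " ▲ ", written as escapes.
raw = b" \xe2\x96\xb2 "

ascii_error = None
try:
    # Explicit stand-in for the implicit ASCII decode that Python 2 performs.
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    ascii_error = exc

# The offending byte is at index 1, matching "position 1" in the traceback.
print(ascii_error.start)
```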
For tstr = b(" ▲ "):

    $ python2.7 test.py
     ▲ 
    5
    $ python3.2 test.py
    Traceback (most recent call last):
      File "test.py", line 21, in <module>
        tstr = b(" ▲ ")
      File "test.py", line 17, in b
        return codecs.latin_1_encode(x)[0]
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u25b2' in position 1: ordinal not in range(256)
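The Python 3 failure here seems to be a limit of Latin-1 itself: it only covers code points below 256, and ▲ is U+25B2. Here is a small sketch comparing it with the UTF-8 codec function from the same codecs module (the names latin1_ok, utf8_bytes and consumed are mine):

```python
import codecs

# " ▲ " as text, built from escaped UTF-8 bytes so the file encoding does not matter.
text = b" \xe2\x96\xb2 ".decode("utf-8")

try:
    codecs.latin_1_encode(text)
    latin1_ok = True
except UnicodeEncodeError:
    # U+25B2 has no Latin-1 representation.
    latin1_ok = False

# utf_8_encode returns (encoded bytes, number of characters consumed).
utf8_bytes, consumed = codecs.utf_8_encode(text)

print(latin1_ok)
print(len(utf8_bytes), consumed)
```

Note that even where encoding succeeds, the result is a byte string, so its length is the byte count again and not 3.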
For tstr = binary_type(" ▲ "):

    $ python2.7 test.py
     ▲ 
    5
    $ python3.2 test.py
    Traceback (most recent call last):
      File "test.py", line 21, in <module>
        tstr = binary_type(" ▲ ")
    TypeError: string argument without an encoding
Well, that makes things easy.
So, how do I get the same string length (in this case, 3) in both Python 2.7 and 3.2?
Well, it turns out unicode() in Python 2.7 has an encoding argument, and that apparently helps:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    import sys

    if sys.version_info[0] < 3:
      text_type = unicode
      binary_type = str
      def b(x):
        return x
      def u(x):
        return unicode(x, "utf-8")
    else:
      text_type = str
      binary_type = bytes
      import codecs
      def b(x):
        return codecs.latin_1_encode(x)[0]
      def u(x):
        return x

    tstr = u(" ▲ ")

    sys.stderr.write(tstr)
    sys.stderr.write("\n")
    sys.stderr.write(str(len(tstr)))
    sys.stderr.write("\n")
Running this, I get what I needed:

    $ python2.7 test.py
     ▲ 
    3
    $ python3.2 test.py
     ▲ 
    3
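For what it's worth, the same idea can probably be written without the version check, by decoding only when the value is a byte string. This is a sketch, not a drop-in replacement for the wrappers above: to_text is a hypothetical helper name, and it assumes the source file is saved as UTF-8.

```python
# -*- coding: utf-8 -*-

def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as text on both Python 2 and 3."""
    # On Python 2.6+ "bytes" is an alias of str, so plain literals are decoded.
    # On Python 3 plain literals are already text and pass through unchanged.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

tstr = to_text(" ▲ ")
print(len(tstr))
```

On Python 2 the literal still depends on the coding declaration at the top of the file matching the encoding passed to the helper. Another option on Python 2.6+ is putting from __future__ import unicode_literals at the top of the module, which makes plain literals text strings; I have not tried that with the script above.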