Getting the same Unicode string length in both Python 2 and 3?

June 15, 2015

Ugh, Python 2/3 is frustrating... Consider this example, test.py:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
    text_type = unicode
    binary_type = str
    def b(x):
        return x
    def u(x):
        return unicode(x)
else:
    text_type = str
    binary_type = bytes
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = " ▲ "

sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")

running it:

$ python2.7 test.py
 ▲ 
5
$ python3.2 test.py
 ▲ 
3
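Why 5 versus 3: in Python 2 a plain literal like " ▲ " is a byte string (`str`), so `len()` counts the bytes of its UTF-8 encoding, while in Python 3 the same literal is text and `len()` counts code points. This Python 3 sketch reproduces both numbers, assuming (as the coding line declares) that test.py is saved as UTF-8:

```python
# U+25B2 BLACK UP-POINTING TRIANGLE with one space on each side
tstr = " \u25b2 "

# Python 3 str: len() counts code points -> 3
char_len = len(tstr)

# What Python 2 holds for the same literal: the raw UTF-8 bytes, 1 + 3 + 1 = 5
utf8_bytes = tstr.encode("utf-8")
byte_len = len(utf8_bytes)

print(char_len, byte_len)  # 3 5
print(utf8_bytes)          # b' \xe2\x96\xb2 '
```

The 0xe2 byte shown here is the same one that appears in the decode errors further down.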

Great, two different string lengths. Would wrapping the string in one of the wrappers I found around the net help?

For tstr = text_type(" ▲ "):

$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = text_type(" ▲ ")
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py
 ▲ 
3

For tstr = u(" ▲ "):

$ python2.7 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = u(" ▲ ")
  File "test.py", line 11, in u
    return unicode(x)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)
$ python3.2 test.py
 ▲ 
3
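Both Python 2 failures are the same thing: `unicode()` called without an encoding decodes the byte string with Python 2's default codec, ASCII, and 0xe2 is the first byte of the triangle's UTF-8 sequence. A Python 3 sketch of that decode step on the equivalent `bytes` object (the names `raw` and `ascii_error` are just for illustration):

```python
# The bytes Python 2 holds for the literal " ▲ " in a UTF-8 source file
raw = b" \xe2\x96\xb2 "

# Decoding with ASCII (Python 2's default codec) fails on the first non-ASCII byte
try:
    raw.decode("ascii")
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

# Decoding with the encoding the file is actually saved in yields 3 characters
text = raw.decode("utf-8")
print(len(text))  # 3
```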

For tstr = b(" ▲ "):

$ python2.7 test.py
 ▲ 
5
$ python3.2 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = b(" ▲ ")
  File "test.py", line 17, in b
    return codecs.latin_1_encode(x)[0]
UnicodeEncodeError: 'latin-1' codec can't encode character '\u25b2' in position 1: ordinal not in range(256)
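The Python 3 failure here is a range problem: Latin-1 maps only the code points U+0000 to U+00FF onto single bytes, and ▲ is U+25B2. A short sketch contrasting a character inside that range with the triangle:

```python
import codecs

# 'é' is U+00E9, inside Latin-1's range, so it encodes to one byte
ok, _ = codecs.latin_1_encode(" \u00e9 ")
print(ok)  # b' \xe9 '

tstr = " \u25b2 "
print(hex(ord(tstr[1])))  # 0x25b2, far above Latin-1's 0xff limit

# Encoding the triangle with Latin-1 raises, just like b() does under Python 3
try:
    codecs.latin_1_encode(tstr)
    latin1_failed = False
except UnicodeEncodeError as exc:
    latin1_failed = True
    print(exc.start, exc.end)  # 1 2
```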

For tstr = binary_type(" ▲ "):

$ python2.7 test.py
 ▲ 
5
$ python3.2 test.py
Traceback (most recent call last):
  File "test.py", line 21, in <module>
    tstr = binary_type(" ▲ ")
TypeError: string argument without encoding
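In Python 3, `bytes()` only accepts a `str` together with an encoding; given one, it behaves like `str.encode()`. Even then it produces the 5 UTF-8 bytes again, not the 3 characters I was after. A minimal sketch:

```python
tstr = " \u25b2 "

# Without an encoding, bytes(str) raises TypeError on Python 3
try:
    bytes(tstr)
    needs_encoding = False
except TypeError as exc:
    needs_encoding = True
    print(exc)

# With an encoding it works and matches str.encode()
data = bytes(tstr, "utf-8")
print(len(data))                     # 5
print(data == tstr.encode("utf-8"))  # True
```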

Well, that makes things easy.

So, how do I get the same string length (in this case, 3) in both Python 2.7 and 3.2?

Well, it turns out that unicode() in Python 2.7 takes an encoding argument, and that apparently helps:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
if sys.version_info[0] < 3:
    text_type = unicode
    binary_type = str
    def b(x):
        return x
    def u(x):
        return unicode(x, "utf-8")  # decode the UTF-8 source bytes explicitly
else:
    text_type = str
    binary_type = bytes
    import codecs
    def b(x):
        return codecs.latin_1_encode(x)[0]
    def u(x):
        return x

tstr = u(" ▲ ")

sys.stderr.write(tstr)
sys.stderr.write("\n")
sys.stderr.write(str(len(tstr)))
sys.stderr.write("\n")

Running this gives what I needed:

$ python2.7 test.py
 ▲ 
3
$ python3.2 test.py
 ▲ 
3
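For comparison, here is one alternative sketch that avoids the version switch: decode only when the value is a byte string. `ensure_text` is a hypothetical helper name (the `six` library ships a similar `six.ensure_text`). On Python 2, `bytes` is an alias for `str`, so the plain literal gets decoded; on Python 3 it is already text and passes through. It assumes a UTF-8 source file. If only Python 3.3 or later matters, the `u" ▲ "` prefix is accepted again (PEP 414), but Python 3.2 rejects it.

```python
# -*- coding: utf-8 -*-
import sys


def ensure_text(value, encoding="utf-8"):
    """Hypothetical helper: return text (unicode on Python 2, str on Python 3).

    Byte strings are decoded with `encoding`; text is returned unchanged.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


tstr = ensure_text(" ▲ ")
sys.stderr.write(tstr + "\n")
sys.stderr.write(str(len(tstr)) + "\n")  # 3 on both Python 2.7 and 3.x
```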
