python - Convert raw byte string to Unicode without knowing the codepage beforehand


When using the right-click context menu, Windows passes the file path as a raw (byte) string.

For example:

path = 'c:\\mydir\\\x99\x8c\x85\x8d.mp3' 

Many external packages in my application expect Unicode strings, so I have to convert it to Unicode.

That would be easy if I knew the raw string's encoding beforehand (in this example, cp1255), but I can't know the encoding used locally on each computer around the world.
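When the code page is known, the conversion is a single `decode` call. A minimal sketch, using only the filename bytes from the example path; it decodes them both as cp1255 (the guess above) and as cp862 (the DOS Hebrew code page), which shows how much depends on picking the right one. The variable names are illustrative only.

```python
# Filename bytes taken from the example path in the question.
raw_name = b'\x99\x8c\x85\x8d.mp3'

# If the code page is known, one decode call is all it takes.
# cp862 is the DOS (OEM) Hebrew code page: these bytes spell the Hebrew word "shalom".
as_cp862 = raw_name.decode('cp862')

# cp1255 is the ANSI Hebrew code page. The same bytes land on symbols
# (0x99 is the trade mark sign) and on positions cp1255 leaves undefined,
# so errors='replace' is used to avoid a UnicodeDecodeError; the result
# is not the intended name.
as_cp1255 = raw_name.decode('cp1255', 'replace')
```

The cp862 reading produces a real Hebrew word, which is a strong hint that the bytes came from the OEM code page rather than the ANSI one.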

How can I convert the string to Unicode? Is win32api perhaps needed?

I have no idea why you might be getting the DOS code page (862) instead of the ANSI one (1255). How is the right-click option set up?
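The local machine's ANSI and OEM (DOS) code pages can at least be queried rather than guessed. A minimal sketch using plain ctypes and the Win32 functions `GetACP` and `GetOEMCP`; the helper names `local_code_pages` and `decode_local` are hypothetical, it assumes Python ships a codec named `cpNNN` for the returned code page, and the UTF-8 fallback on non-Windows platforms is an assumption made only so the sketch runs everywhere.

```python
import sys
import ctypes


def local_code_pages():
    """Return (ansi_cp, oem_cp) for this Windows machine, or None elsewhere."""
    if sys.platform != 'win32':
        return None
    kernel32 = ctypes.windll.kernel32
    # GetACP: the ANSI code page used by the "A" Win32 APIs (1255 on Hebrew Windows).
    # GetOEMCP: the OEM/DOS code page used by the console (862 on Hebrew Windows).
    return kernel32.GetACP(), kernel32.GetOEMCP()


def decode_local(raw, use_oem=False):
    """Decode bytes with this machine's ANSI (default) or OEM code page."""
    pages = local_code_pages()
    if pages is None:
        # Assumed fallback for non-Windows platforms: treat the bytes as UTF-8.
        return raw.decode('utf-8', 'replace')
    codepage = pages[1] if use_oem else pages[0]
    # Assumes a codec named e.g. 'cp1255' or 'cp862' exists for this code page.
    return raw.decode('cp%d' % codepage, 'replace')
```

This still only helps if you know which of the two code pages produced the bytes; the more robust route avoids bytes entirely.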

Either way: if you need to accept arbitrary Unicode characters in your arguments, you can't do it with Python 2's sys.argv. That list is populated from the bytes returned by the non-Unicode version of the Win32 API (GetCommandLineA), and that encoding is never Unicode-safe.
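To see why a byte-based argv can never be Unicode-safe, it helps to simulate the conversion from the UTF-16 command line to a single ANSI code page. A minimal sketch, assuming Python's codecs are a fair stand-in for the Windows conversion; Windows itself may substitute a "best-fit" lookalike instead of `'?'`. The helper `simulate_ansi_argv` is hypothetical.

```python
def simulate_ansi_argv(arg, ansi_codepage):
    """Approximate the bytes a byte-based argv receives for a Unicode argument.

    Characters the ANSI code page cannot represent are replaced with '?',
    so they are lost before Python ever sees them.
    """
    return arg.encode(ansi_codepage, 'replace')


name = u'\u05e9\u05dc\u05d5\u05dd.mp3'  # Hebrew filename ("shalom.mp3")

# On a Hebrew system (ANSI cp1255) the name survives the round trip.
hebrew_ok = simulate_ansi_argv(name, 'cp1255')

# On a Western system (ANSI cp1252) every Hebrew letter becomes '?'.
western_lossy = simulate_ansi_argv(name, 'cp1252')

# No single ANSI code page covers everything: a Japanese character is lost even on cp1255.
mixed = simulate_ansi_argv(u'\u05e9\u3042', 'cp1255')
```

Whatever code page you decode with afterwards, the `'?'` characters cannot be turned back into the original letters.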

Many other languages, including Java and Ruby, are in the same boat; the limitation comes from the Microsoft C runtime's implementations of the C standard library functions. To fix it, one would call the Unicode version (GetCommandLineW) on Windows instead of relying on the cross-platform standard library. Python 3 does this.

In the meantime, on Python 2 you can do it by calling GetCommandLineW yourself, but it's not pretty. You can also use CommandLineToArgvW if you want Windows-style parameter splitting. Either can be done through the win32 extensions or plain ctypes.

Example (though its step of encoding the Unicode string to UTF-8 bytes is best skipped).
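A minimal sketch of the plain-ctypes route described above, written to run on Python 2.7 and 3 and keeping the arguments as Unicode strings rather than re-encoding them. The function name `unicode_argv` is hypothetical, and trimming the result to the last `len(sys.argv)` entries assumes the interpreter and its options appear only at the start of the full command line (as in `python.exe script.py ...`). Outside Windows it simply returns `sys.argv`.

```python
import sys
import ctypes


def unicode_argv():
    """Return the command-line arguments as Unicode strings.

    On Windows this re-reads the command line with the wide-character APIs
    GetCommandLineW and CommandLineToArgvW; elsewhere it returns sys.argv.
    """
    if sys.platform != 'win32':
        return list(sys.argv)

    kernel32 = ctypes.windll.kernel32
    shell32 = ctypes.windll.shell32

    GetCommandLineW = kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = ctypes.c_wchar_p

    CommandLineToArgvW = shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [ctypes.c_wchar_p, ctypes.POINTER(ctypes.c_int)]
    CommandLineToArgvW.restype = ctypes.POINTER(ctypes.c_wchar_p)

    LocalFree = kernel32.LocalFree
    LocalFree.argtypes = [ctypes.c_void_p]
    LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    # Windows-style splitting of the full UTF-16 command line.
    argv = CommandLineToArgvW(GetCommandLineW(), ctypes.byref(argc))
    if not argv:
        raise ctypes.WinError()
    try:
        # Copy each wide string into a Python Unicode string before freeing.
        args = [argv[i] for i in range(argc.value)]
    finally:
        # CommandLineToArgvW allocates the array with LocalAlloc.
        LocalFree(argv)

    # Assumption: sys.argv lines up with the tail of the full command line,
    # so drop leading entries such as "python.exe" and interpreter options.
    start = max(argc.value - len(sys.argv), 0)
    return args[start:]
```

For the right-click case, the file path would then be read from `unicode_argv()[1]` instead of `sys.argv[1]`. On Python 3 this is unnecessary, since `sys.argv` is already built from the wide-character command line.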

