python - Convert raw byte string to Unicode without knowing the codepage beforehand
When using the right-click context menu, Windows passes the file path as a raw (byte) string.
For example:
path = 'c:\\mydir\\\x99\x8c\x85\x8d.mp3'
Many external packages in my application expect Unicode strings, so I have to convert the path to unicode.
That would be easy if we knew the raw string's encoding beforehand (in the example, cp1255). But I can't know which encoding is used locally on each computer around the world.
How can I convert the string to unicode? Perhaps using win32api is needed?
I have no idea why you might be getting the DOS code page (862) instead of the ANSI one (1255). How is the right-click option set up?
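To see why the sample bytes point at code page 862 rather than 1255, one can decode the path from the question both ways. This is only a small sketch; the variable names are illustrative and not part of the original post.

```python
# The byte string from the question (works on Python 2.6+ and Python 3).
raw = b'c:\\mydir\\\x99\x8c\x85\x8d.mp3'

# Decoded as the DOS/OEM Hebrew code page, the four high bytes become
# the Hebrew letters U+05E9 U+05DC U+05D5 U+05DD (shin, lamed, vav, final mem).
as_oem = raw.decode('cp862')

# Decoded as the ANSI Hebrew code page, the same bytes are punctuation or
# unassigned positions, so 'replace' is used to avoid a UnicodeDecodeError.
as_ansi = raw.decode('cp1255', 'replace')

looks_like_oem = as_oem == u'c:\\mydir\\\u05e9\u05dc\u05d5\u05dd.mp3'
```

Only the cp862 result is a plausible Hebrew file name, which is why the bytes appear to come from the OEM code page.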
Either way, if you need to accept arbitrary Unicode characters in arguments, you can't use Python 2's sys.argv. That list is populated from the bytes returned by the non-Unicode version of the Win32 API (GetCommandLineA), and that encoding is never Unicode-safe.
Many other languages, including Java and Ruby, are in the same boat; the limitation comes from the Microsoft C runtime's implementations of the C standard library functions. To fix it, one has to call the Unicode version (GetCommandLineW) on Windows instead of relying on the cross-platform standard library. Python 3 does this.
In the meantime, for Python 2, you can do it by calling GetCommandLineW yourself, but it's not pretty. You can also use CommandLineToArgvW if you want Windows-style parameter splitting. You can do this with the win32 extensions or with plain ctypes.
Example (though the step of encoding the Unicode string to UTF-8 bytes is best skipped).
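A rough ctypes sketch of the approach described above, returning Unicode strings directly with no re-encoding to UTF-8. The function name win32_unicode_argv is made up for illustration, and the non-Windows fallback and the trick of dropping leading interpreter arguments are assumptions of this sketch rather than anything stated in the answer.

```python
import ctypes
import sys


def win32_unicode_argv():
    """Return the process arguments as a list of Unicode strings.

    Hypothetical helper: on Windows it queries the wide-character Win32
    API; elsewhere it simply returns a copy of sys.argv.
    """
    if sys.platform != 'win32':
        return list(sys.argv)

    from ctypes import wintypes

    GetCommandLineW = ctypes.windll.kernel32.GetCommandLineW
    GetCommandLineW.argtypes = []
    GetCommandLineW.restype = wintypes.LPCWSTR

    CommandLineToArgvW = ctypes.windll.shell32.CommandLineToArgvW
    CommandLineToArgvW.argtypes = [wintypes.LPCWSTR,
                                   ctypes.POINTER(ctypes.c_int)]
    CommandLineToArgvW.restype = ctypes.POINTER(wintypes.LPWSTR)

    LocalFree = ctypes.windll.kernel32.LocalFree
    LocalFree.argtypes = [ctypes.c_void_p]
    LocalFree.restype = ctypes.c_void_p

    argc = ctypes.c_int(0)
    argv = CommandLineToArgvW(GetCommandLineW(), ctypes.byref(argc))
    if not argv:
        raise ctypes.WinError()
    try:
        # The full command line also contains the interpreter and its
        # options; drop those leading entries so the result lines up
        # with sys.argv (assumption: both lists end with the same args).
        start = argc.value - len(sys.argv)
        return [argv[i] for i in range(start, argc.value)]
    finally:
        # CommandLineToArgvW allocates one block that the caller frees.
        LocalFree(ctypes.cast(argv, ctypes.c_void_p))
```

Calling win32_unicode_argv() at startup in place of reading sys.argv gives each argument as a Unicode string on Windows, so a path like the one in the question arrives without passing through any code page.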