[Python-ideas] RFC: PEP 540 version 3 (Add a new UTF-8 mode)
Victor Stinner
victor.stinner at gmail.com
Thu Jan 12 12:10:35 EST 2017
More information about the Python-ideas mailing list
2017-01-12 17:10 GMT+01:00 Oleg Broytman <phd at phdru.name>:
>> Does it work to use a locale with encoding A for LC_CTYPE and a locale
>> with encoding B for LC_MESSAGES (and others)? Is there a risk of
>
>    It does when B is a subset of A (ascii and koi8; ascii and utf8, e.g.)

My question is more about when the A and B encodings are not compatible.

Ah yes, date, thank you for the example.

Here is my example, which uses the LC_TIME locale to format a date and the
LC_CTYPE locale to decode the resulting byte string:

date.py:
---
import locale, time

# Apply the locale settings from the environment (LANG, LC_TIME, ...)
locale.setlocale(locale.LC_ALL, "")

# Python 2: strftime() returns bytes formatted with the LC_TIME encoding
b = time.strftime("%a")

# ... but the "preferred" encoding comes from LC_CTYPE
encoding = locale.getpreferredencoding()
try:
    u = b.decode(encoding)
except UnicodeError:
    u = '<failed to decode>'
else:
    u = repr(u)
print("bytes: %r, text: %s, encoding: %r" % (b, u, encoding))
---

When all locales are the same, it works fine: 목 (U+BAA9) is the expected
result.

$ LC_TIME=ko_KR.euckr LANG=ko_KR.euckr python2 date.py
bytes: '\xb8\xf1', text: u'\ubaa9', encoding: 'EUC-KR'

You get mojibake if LC_CTYPE uses the Latin-1 encoding while LC_TIME uses
EUC-KR: the result is "¸ñ" (U+00B8, U+00F1).

$ LC_TIME=ko_KR.euckr LANG=fr_FR python2 date.py
bytes: '\xb8\xf1', text: u'\xb8\xf1', encoding: 'ISO-8859-1'

Decoding can also fail with UnicodeDecodeError:

$ LC_TIME=ko_KR.euckr LANG=fr_FR.UTF-8 python2 date.py
bytes: '\xb8\xf1', text: <failed to decode>, encoding: 'UTF-8'

Well, since we are talking about the POSIX locale, which usually uses ASCII,
it shouldn't be an issue in practice for PEP 538. I was just curious :-)

Victor
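The three runs of date.py can be reproduced without installing or switching locales, by decoding the same two bytes that strftime("%a") produced under LC_TIME=ko_KR.euckr with each of the three LC_CTYPE encodings. This is a minimal Python 3 sketch, not part of the original message; the helper name decode_as and the constant RAW are hypothetical, and ascii() is used so the output is escaped like Python 2's repr of a unicode string.

```python
# Hypothetical sketch (not from the original message): decode the bytes that
# strftime("%a") returned under LC_TIME=ko_KR.euckr with each LC_CTYPE
# encoding from the three date.py runs, without touching the process locale.
RAW = b"\xb8\xf1"  # bytes reported in the message for the Korean weekday


def decode_as(raw, encoding):
    """Mimic date.py: return the escaped text, or a marker if decoding fails."""
    try:
        text = raw.decode(encoding)
    except UnicodeError:
        return "<failed to decode>"
    # ascii() escapes non-ASCII characters, like Python 2's repr(u"...")
    return ascii(text)


for enc in ("EUC-KR", "ISO-8859-1", "UTF-8"):
    print("encoding: %r, text: %s" % (enc, decode_as(RAW, enc)))
```

Only the EUC-KR decoding yields 목 (U+BAA9); Latin-1 accepts any byte and silently produces mojibake, while UTF-8 rejects 0xB8 as an invalid start byte.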