[Python-ideas] PEP 540: Add a new UTF-8 mode
Oleg Broytman
phd at phdru.name
Fri Jan 6 14:12:16 EST 2017
On Fri, Jan 06, 2017 at 10:15:52AM +0900, INADA Naoki <songofacandy at gmail.com> wrote:
> >> Always use UTF-8
> >> ----------------
> >>
> >> Python already always uses the UTF-8 encoding on Mac OS X, Android and Windows.
> >> Since UTF-8 became the de facto encoding, it makes sense to always use it on all
> >> platforms with any locale.
> >
> > Please don't! I use different locales and encodings, sometimes it's
> > utf-8, sometimes not - but I have properly configured LC_* settings and
> > I prefer Python to follow my command. It'd be disgusting if Python
> > starts to bend me to its preferences.
>
> For stdio (including console), PYTHONIOENCODING can be used for
> supporting legacy systems.
> e.g. `export PYTHONIOENCODING=$(locale charmap)`

   This means one more thing to reconfigure when I switch locales,
instead of Python catching up automatically.

> For command line arguments and file paths, UTF-8/surrogateescape can round trip.
> But mojibake may happen when passing the path to a GUI.
>
> If we choose "Always use UTF-8 for fs encoding", I think the
> PYTHONFSENCODING envvar should be
> added again. (It should be used from startup: decoding command line arguments.)
>
> >
> >> The risk is to introduce mojibake if the locale uses a different encoding,
> >> especially for locales other than the POSIX locale.
> >
> > There is no such risk for me as I already have mojibake in my
> > systems. Two most notable sources of mojibake are:
> >
> > 1) FTP servers - people create files (both names and content) in
> > different encodings; w32 FTP clients usually send file names and
> > content in cp1251 (Russian Windows encoding), sometimes in cp866
> > (Russian Windows OEM encoding).
> >
> > 2) MP3 tags and play lists - almost always cp1251.
> >
> > So whatever my personal encoding is - koi8-r or utf-8 - I have to
> > deal with file names and content in different encodings.
>
> 3) unzip a zip file sent from Windows.
> Windows users use non-ASCII filenames, and
> create legacy (non-UTF-8) zip files very often.

   Good example, thank you! I forgot about it because I wrote my own
zip.py and unzip.py that encode/decode filenames.

> I think people using non-UTF-8 should solve encoding issues by themselves.
> People should always use ASCII or UTF-8 if they don't want to see mojibake.

   Impossible. Even if I always used UTF-8, I would still receive a lot of
cp1251/cp866.

Oleg.
-- 
     Oleg Broytman            http://phdru.name/            phd at phdru.name
           Programmers don't die, they just GOSUB without RETURN.
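
To make the round-trip claim and the zip case from the thread concrete, here is a minimal sketch. It is only an illustration, not the code from the zip.py/unzip.py mentioned above: the helper names (`roundtrip`, `redecode`, `fix_zip_member_name`) and the sample file name are hypothetical. It assumes the standard behaviour of the `surrogateescape` error handler and that `zipfile` decodes member names lacking the UTF-8 flag as cp437.

```python
def roundtrip(raw: bytes) -> bytes:
    """Decode arbitrary bytes as UTF-8/surrogateescape and encode them back."""
    text = raw.decode("utf-8", errors="surrogateescape")
    return text.encode("utf-8", errors="surrogateescape")


def redecode(text: str, legacy_encoding: str) -> str:
    """Recover readable text from a str that carries escaped legacy bytes.

    `legacy_encoding` is a guess supplied by the caller (e.g. cp1251);
    nothing here detects the encoding automatically.
    """
    raw = text.encode("utf-8", errors="surrogateescape")
    return raw.decode(legacy_encoding)


def fix_zip_member_name(name: str, legacy_encoding: str = "cp866") -> str:
    """Undo the cp437 decoding applied to a legacy (non-UTF-8) zip member name."""
    return name.encode("cp437").decode(legacy_encoding)


# Hypothetical file name as a Windows FTP client might send it (cp1251 bytes).
raw_name = "привет.txt".encode("cp1251")

# The bytes are not valid UTF-8, yet they survive the round trip unchanged.
escaped = raw_name.decode("utf-8", errors="surrogateescape")
restored = roundtrip(raw_name)

# The escaped string is mojibake until re-decoded with the right encoding.
readable = redecode(escaped, "cp1251")

# A cp866 name read through cp437 looks wrong until it is fixed up.
zip_mojibake = "привет.txt".encode("cp866").decode("cp437")
zip_fixed = fix_zip_member_name(zip_mojibake)
```

The round trip preserves the bytes, but the intermediate string is still unreadable in a GUI, which is the mojibake concern raised in the thread; the correct legacy encoding has to come from the user either way.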