PEP 263 comments
Martin von Loewis
loewis at informatik.hu-berlin.de
Thu Feb 28 09:09:23 EST 2002
More information about the Python-list mailing list
- Previous message (by thread): PEP 263 comments
- Next message (by thread): PEP 276 (was Re: Status of PEP's?)
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
"Stephen J. Turnbull" <stephen at xemacs.org> writes:

> Hi, I'm Steve Turnbull, I do XEmacs. Mostly Mule. Barry asked me to
> step up to bat on this.

Thanks for your comments!

> You don't. From now on, anything that goes into the official Python
> sources is in UTF-8. Convert any existing stuff at your leisure.
> This is recommended practice for 3rd party projects, too. People can
> do what they want with their own stuff, but they are on notice that if
> it screws up it's their problem.

It's worse than this: under the proposed change, Python would refuse to
accept source code that is not UTF-8 encoded. In turn, code that has a
euc-jp comment in it, and is happily accepted as source code by the
current Python language, would be rejected. This is like mandating that
all Emacs-Lisp files are UTF-8, whether they are part of the Emacs
sources or installed somewhere out there in the wild.

> XEmacs actually did this (half-way) three years ago. I convinced
> Steve Baur to convert everything in the XEmacs CVS repository that
> wasn't ISO 8859/1 to ISO-2022-JP (basically, start in ASCII, all other
> character sets must designate to G0 or G1, and get invoked to GL; at
> newlines, return to ASCII by designation; the "JP" part is really a
> misnomer, it's fully multilingual). Presto! no more accidental Mule
> corruption in the repository.

This is a different issue: we are not discussing the encoding that the
Python sources use in the Python CVS tree, we are discussing the
encoding that Python source code uses.

> Oh, and does Python have message catalogs and stuff like that? Do you
> really want people doing multilingual work like translation mucking
> about with random coding systems and error-prone coding cookies?
> UTF-8 detection is much easier than detecting that an iso-8859-1
> cookie should really be iso-8859-15 (a reverse Turing test).

Python supports gettext, but this is still a different issue.
The Unicode type of Python is precisely that - it is not that Python
would support different wide character implementations internally.
Again, the issue is how source code is encoded.

> So much for the alleged "backward compatibility" non-issue. :-)
> People are abusing implementation dependencies; Just Say No.

A very radical opinion :-) but I get the feeling you might be missing
the point in question ...

> Martin> Will you reject a source module just because it contains a
> Martin> latin-1 comment?
>
> That depends. Somebody is going to run it through the converter; it's
> just a question of whether it's me, or the submitter.

'you' in this case isn't the maintainer of a software package; it is
the Python source code parser...

> GNU Emacs supports your coding system cookies. XEmacs currently
> doesn't, but we will, I already figured out what the change is and
> told Barry OK. And I plan to add cookie-checking to my latin-unity
> package (which undoes the Mule screwage that says Latin-1 NO-BREAK
> SPACE != Latin-2 NO-BREAK SPACE). Other editors can do something
> similar.

I assume you are talking about the -*- coding: foo -*- stuff here?
*This* is the issue in question. Should we allow it, or should we
mandate that all Python source code (not just the code in the Python
CVS) is UTF-8?

> So people who insist on using a national coded character set in their
> editor use cookies. Then the python-dev crew prepares a couple of
> trivial scripts which munge sources from UTF-8 to national codeset +
> cookie, and back (note you have to strip the cookie on the way back),
> for the sake of people whose editor's Python-mode doesn't grok
> cookies.

Again, not the issue: most people run Python programs without ever
submitting them to python-dev :-)

You may wonder why Python (the programming language) needs to worry
about the encoding at all. The reason is that we allow Unicode
literals, in the form

    u"text"

The question is what the encoding of "text" is, on disk.
In memory, it will be 2-byte Unicode, so the interpreter needs to
convert. To do that, it must know what the encoding is, on disk. The
choices are using either UTF-8, or allowing encoding cookies.

> My apologies for the flood. I've been thinking about exactly this
> kind of transition for XEmacs for about 5 years now, this compresses
> all of that into a few dozen lines....

I'm not sure whether a similar issue exists in XEmacs: the encoding of
ELisp would be closest, but only if the Lisp interpreter ever needs to
worry about that.

Regards,
Martin
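[A tiny illustration, mine rather than Martin's, of why the interpreter must know the on-disk encoding before it can build the in-memory Unicode string: the identical bytes decode to different text depending on which encoding is assumed. The byte string below is hypothetical example data.]

```python
# The raw bytes of a Unicode literal as they might sit on disk.
# \xc3\xa9 is the UTF-8 encoding of the single character "e-acute".
source_bytes = b'u"caf\xc3\xa9"'

as_utf8 = source_bytes.decode("utf-8")      # two bytes become one character
as_latin1 = source_bytes.decode("latin-1")  # the same two bytes become two

print(as_utf8)                        # u"café"
print(as_latin1)                      # u"cafÃ©"
print(len(as_utf8), len(as_latin1))   # 7 8
```

Without a declared (or mandated) encoding, both readings are equally plausible, which is exactly why the parser needs either the UTF-8 rule or a cookie.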