[Python-Dev] Unicode exception indexing
Antoine Pitrou
solipsis at pitrou.net
Thu Nov 3 20:29:50 CET 2011
On Thu, 03 Nov 2011 18:14:42 +0100
martin at v.loewis.de wrote:
> There is a backwards compatibility issue with PEP 393 and Unicode exceptions:
> the start and end indices: are they Py_UNICODE indices, or code point indices?
>
> On the one hand, these indices are used in formatting error messages such as
> "codec can't encode character \u%04x in position %d", suggesting they are
> regular indices into the string (counting code points).
>
> On the other hand, they are used by error handlers to lookup the character,
> and existing error handlers (including the ones we have now) use
> PyUnicode_AsUnicode to find the character. This suggests that the indices
> should be Py_UNICODE indices, for compatibility (and they currently do
> work in this way). But what about error handlers written in Python?
> The indices can only be different if the string is an UCS-4 string, and
> Py_UNICODE is a two-byte type (i.e. on Windows).
>
> So what should it be?

I'd say let's do the Right Thing and accept the small compatibility
breach (surrogates on UCS-2 builds).

Regards

Antoine.
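
To make the two candidate meanings of "index" concrete, here is a minimal sketch in Python. It assumes an interpreter with PEP 393 strings (CPython 3.3 or later), where the exception's start and end count code points; utf16_offset is a hypothetical helper written for this illustration, not part of any API, and it computes the position the same character would have in a two-byte Py_UNICODE buffer.

```python
def utf16_offset(text, cp_index):
    """Hypothetical helper: map a code point index to a UTF-16 code unit index.

    Characters above U+FFFF take two code units (a surrogate pair) in a
    two-byte Py_UNICODE buffer, so they count twice here.
    """
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in text[:cp_index])


# One non-BMP character followed by a lone surrogate, which UTF-8 rejects.
text = "\U0001F600\udc80"

try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    # Assumption: PEP 393 strings, so these are code point indices.
    cp_start = exc.start
    # An error handler written in Python can slice exc.object directly.
    bad_char = exc.object[exc.start:exc.end]

# The same character sits one position later when counted in code units.
unit_start = utf16_offset(text, cp_start)
```

With code point indices, a handler written in Python can slice exc.object with start and end and get the offending character. The two numbers diverge only after a character outside the BMP, which is the case the compatibility question is about.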