[Python-Dev] Bytes path related questions for Guido
Antoine Pitrou
antoine at python.org
Sun Aug 24 16:23:52 CEST 2014
On 24/08/2014 09:04, Nick Coghlan wrote:
> On 24 August 2014 14:44, Nick Coghlan <ncoghlan at gmail.com> wrote:
>> 2. Should we add some additional helpers to the string module for
>> dealing with surrogate escaped bytes and other techniques for
>> smuggling arbitrary binary data as text?
>>
>> My proposal [3] is to add:
>>
>> * string.escaped_surrogates (constant with the 128 escaped code points)
>> * string.clean(s): replaces surrogates with '\ufffd' or another
>> specified code point
>> * string.redecode(s, encoding): encodes a string back to bytes and
>> then decodes it again using the specified encoding (the old encoding
>> defaults to 'latin-1' to match the assumptions in WSGI)
>
>
> Serhiy & Ezio convinced me to scale this one back to a proposal for
> "codecs.clean_surrogate_escapes(s)", which replaces surrogates that
> may be produced by surrogateescape (that's what string.clean() above
> was supposed to be, but my description was not correct, and the name
> was too vague for that error to be obvious to the reader)

"clean" conveys the wrong meaning. It should use a scary word such as
"trap". "Cleaning" surrogates is unlikely to be the right procedure when
dealing with surrogates produced by undecodable byte sequences.

Regards

Antoine.
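For readers following the thread, here is a rough sketch of what the helpers under discussion might look like. Neither function exists in the standard library; the names, signatures and defaults are taken from the quoted proposal and are only illustrative. The one real API used is the "surrogateescape" error handler, which maps each undecodable byte 0x80..0xFF to a lone surrogate in U+DC80..U+DCFF. The example also shows why the "clean" naming is contentious: the replacement cannot be undone.

```python
import re

# Lone surrogates U+DC80..U+DCFF are what the surrogateescape error
# handler produces for undecodable bytes 0x80..0xFF.
_ESCAPED_SURROGATES = re.compile('[\udc80-\udcff]')


def clean_surrogate_escapes(s, replacement='\ufffd'):
    """Hypothetical helper: replace escaped surrogates (lossy)."""
    return _ESCAPED_SURROGATES.sub(replacement, s)


def redecode(s, encoding, old_encoding='latin-1'):
    """Hypothetical helper: encode back to bytes, then decode again."""
    raw = s.encode(old_encoding, 'surrogateescape')
    return raw.decode(encoding, 'surrogateescape')


# A Latin-1 encoded file name that is not valid UTF-8.
raw_name = b'caf\xe9.txt'

# Decoding with surrogateescape smuggles the bad byte through as U+DCE9.
text = raw_name.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode('utf-8', 'surrogateescape')

# After "cleaning", the original byte is gone for good.
cleaned = clean_surrogate_escapes(text)

# redecode() repairs text that was decoded with the wrong codec.
mojibake = 'caf\xc3\xa9'          # UTF-8 bytes misread as Latin-1
repaired = redecode(mojibake, 'utf-8')
```

Here round_tripped equals raw_name, while cleaned.encode('utf-8') does not: once the surrogate is replaced with U+FFFD, the original file name can no longer be recovered.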