[Python-Dev] _PyUnicode_CheckConsistency() too strict?
Phil Thompson
phil at riverbankcomputing.com
Mon Feb 3 16:44:27 CET 2014
On 03-02-2014 3:35 pm, Victor Stinner wrote:
> 2014-02-03 Phil Thompson <phil at riverbankcomputing.com>:
>> For example, a string created with a maxchar of 255 (i.e. a Latin-1
>> string) must contain at least one character in the range 128-255,
>> otherwise you get an assertion failure.
>
> Yes, it's the specification of the PEP 393.
>
>> As it stands, when converting Latin-1 strings in my C extension
>> module I must first check each character and specify a maxchar of
>> 127 if the string happens to only contain ASCII characters.
>
> Use PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, latin1_str,
> length) which computes the kind for you.
>
>> What is the reasoning behind the checks being so strict?
>
> Different Python functions rely on the exact kind to compare strings.
> For example, if you search a latin1 substring in an ASCII string, the
> search returns immediately instead of searching in the string. A
> latin1 string cannot be found in an ASCII string.
>
> The main reason is in the PEP 393 itself, a string must be compact to
> not waste memory.
>
> Victor

Are you saying that code will fail if a particular Latin-1 string just
happens not to contain any character greater than 127? I would be very
surprised if that was the case. If it isn't the case then I think that
particular check shouldn't be made.

Phil
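For illustration, the per-character check described above could look roughly like the following. This is only a sketch: latin1_maxchar is a hypothetical helper name, not part of the CPython API, and it assumes the input is a plain buffer of Latin-1 bytes. It returns the smallest maxchar that the consistency check accepts for that buffer (127 for pure ASCII, 255 otherwise).

```c
#include <stddef.h>

/* Hypothetical helper (not a CPython function): scan a Latin-1 buffer
 * and return the tightest maxchar for it. A buffer with no byte above
 * 127 is pure ASCII, so 127 is returned; otherwise 255. */
static unsigned int
latin1_maxchar(const unsigned char *data, size_t length)
{
    for (size_t i = 0; i < length; i++) {
        if (data[i] > 127) {
            /* One non-ASCII byte is enough to require the Latin-1 range. */
            return 255;
        }
    }
    /* Every byte was ASCII (an empty buffer also ends up here). */
    return 127;
}
```

The result would then presumably be passed as the maxchar argument when allocating the string, e.g. with PyUnicode_New(length, maxchar). As Victor notes, PyUnicode_FromKindAndData(PyUnicode_1BYTE_KIND, ...) is meant to make this manual scan unnecessary.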