PEP 249 Compliant error handling
MRAB
python at mrabarnett.plus.com
Tue Oct 17 16:02:29 EDT 2017
More information about the Python-list mailing list
On 2017-10-17 20:25, Israel Brewster wrote:
>
>> On Oct 17, 2017, at 10:35 AM, MRAB <python at mrabarnett.plus.com
>> <mailto:python at mrabarnett.plus.com>> wrote:
>>
>> On 2017-10-17 18:26, Israel Brewster wrote:
>>> I have written and maintain a PEP 249 compliant (hopefully) DB API
>>> for the 4D database, and I've run into a situation where corrupted
>>> string data from the database can cause the module to error out.
>>> Specifically, when decoding the string, I get a "UnicodeDecodeError:
>>> 'utf-16-le' codec can't decode bytes in position 86-87: illegal
>>> UTF-16 surrogate" error. This makes sense, given that the string
>>> data got corrupted somehow, but the question is "what is the proper
>>> way to deal with this in the module?" Should I just throw an error
>>> on bad data? Or would it be better to set the errors parameter to
>>> something like "replace"? The former feels a bit more "proper" to me
>>> (there's an error here, so we throw an error), but leaves the end
>>> user dead in the water, with no way to retrieve *any* of the data
>>> (from that row at least, and perhaps any rows after it as well). The
>>> latter option sort of feels like sweeping the problem under the rug,
>>> but does at least leave an error character in the string to let
>>> them know there was an error, and will allow retrieval of any
>>> good data.
>>> Of course, if this was in my own code I could decide on a
>>> case-by-case basis what the proper action is, but since this is a
>>> module that has to work in any situation, it's a bit more complicated.
>> If a particular text field is corrupted, then raising
>> UnicodeDecodeError when trying to get the contents of that field as a
>> Unicode string seems reasonable to me.
>>
>> Is there a way to get the contents as a bytestring, or to get the
>> contents with a different errors parameter, so that the user has the
>> means to fix it (if it's fixable)?
>
> That's certainly a possibility, if that behavior conforms to the DB
> API "standards". My concern on this front is that in my experience
> working with other PEP 249 modules (specifically psycopg2), I'm pretty
> sure that columns designated as type VARCHAR or TEXT are returned as
> strings (Unicode in Python 2, although that may have been a setting I
> used), not bytes. The other complication here is that the 4D database
> doesn't use the UTF-8 encoding typically found, but rather UTF-16LE,
> and I don't know how well this is documented. So not only is the bytes
> representation completely unintelligible for human consumption, I'm
> not sure the average end-user would know what decoding to use.
>
> In the end though, the main thing in my mind is to maintain
> "standards" compatibility - I don't want to be returning bytes if all
> other DB API modules return strings, or vice versa for that matter.
> There may be some flexibility there, but as much as possible I want to
> conform to the majority/standard/whatever.
>
The average end-user might not know which encoding is being used, but
providing a way to read the underlying bytes will give a more
experienced user the means to investigate and possibly fix it: get the
bytes, figure out what the string should be, then update the field with
the correctly decoded string using normal DB instructions.
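The two options Israel weighs correspond directly to the `errors` argument of `bytes.decode`. The sketch below builds a hypothetical corrupted UTF-16LE value (an unpaired high surrogate, code unit 0xD800, between two valid characters, which is one way to get the "illegal UTF-16 surrogate" error from the original report) and shows what the standard handlers do with it. `backslashreplace` is not mentioned in the thread; it is included only as one more built-in handler that keeps the offending bytes visible.

```python
# Hypothetical corrupted 4D text value: "A", an unpaired high surrogate
# (code unit 0xD800, stored little-endian as b"\x00\xd8"), then "B".
corrupted = "A".encode("utf-16-le") + b"\x00\xd8" + "B".encode("utf-16-le")

# Option 1: strict decoding (the default) raises, so the caller gets no value.
try:
    corrupted.decode("utf-16-le")
except UnicodeDecodeError as err:
    print("strict:", err)

# Option 2: errors="replace" substitutes U+FFFD for the bad code unit
# and keeps the surrounding good characters.
print("replace:", ascii(corrupted.decode("utf-16-le", errors="replace")))

# Another built-in handler (not discussed in the thread): the offending
# bytes survive as literal \xNN escapes, so they can still be inspected.
print("backslashreplace:", corrupted.decode("utf-16-le", errors="backslashreplace"))
```

With `replace` the decoded value is `"A\ufffdB"`; with `backslashreplace` it is the literal text `A\x00\xd8B`. Either way the good characters are kept, which is the trade-off Israel describes.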
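MRAB's suggestion (keep returning `str` for text columns, but give experienced users a way to reach the raw bytes or choose a different `errors` handler) could be sketched roughly as follows. Everything named here is hypothetical: `decode_row`, `undecodable_bytes`, `text_errors` and the shape of `raw_row` are illustrative, not part of PEP 249 or of the 4D module. The sketch also assumes that `bytes` values in a raw row come only from VARCHAR/TEXT columns; a real driver would consult the column type codes instead. It relies on a standard Python fact: a `UnicodeDecodeError` already carries the undecoded input in its `object` attribute, plus `start`/`end` offsets of the bad bytes.

```python
from typing import List, Sequence, Union

# 4D sends text as UTF-16LE, per the thread.
FOURD_TEXT_ENCODING = "utf-16-le"

# Hypothetical raw column values as a driver might receive them.
RawValue = Union[bytes, int, float, None]


def decode_row(raw_row: Sequence[RawValue], text_errors: str = "strict") -> List[object]:
    """Hypothetical row decoder for a 4D DB API driver.

    bytes values (assumed here to be text columns) are decoded to str.
    With the default text_errors="strict", a corrupted field raises
    UnicodeDecodeError; callers may opt into a lossy handler such as
    "replace" to get the rest of the row back.
    """
    row: List[object] = []
    for value in raw_row:
        if isinstance(value, bytes):
            row.append(value.decode(FOURD_TEXT_ENCODING, text_errors))
        else:
            row.append(value)
    return row


def undecodable_bytes(err: UnicodeDecodeError) -> bytes:
    """Return the full raw value of the field that failed to decode."""
    return err.object


if __name__ == "__main__":
    # Hypothetical row: an integer, a good text value, a corrupted text value.
    raw_row = [42, "ok".encode(FOURD_TEXT_ENCODING), b"A\x00\x00\xd8B\x00"]
    try:
        decode_row(raw_row)
    except UnicodeDecodeError as err:
        # An experienced user can inspect these bytes, work out the intended
        # string, and write it back with an ordinary UPDATE statement.
        print("corrupted field bytes:", undecodable_bytes(err))
    # Alternatively, opt into lossy decoding to read the rest of the row.
    print(decode_row(raw_row, text_errors="replace"))
```

Keeping `strict` as the default matches the behavior MRAB considers reasonable and keeps text columns as `str`, in line with Israel's wish to behave like other DB API modules; lossy decoding only happens when the caller explicitly asks for it.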