Version 2 of my patch (mbcs2.patch):
- Also patch the encoder: fix the ignore/replace error handlers depending on the Windows version, and support any error handler by encoding character by character if encoding in strict mode fails
- Add PyUnicode_DecodeCodePageStateful() and PyUnicode_EncodeCodePage() functions
- Expose these functions as codecs.code_page_decode() and codecs.code_page_encode()
The encoder raises a RuntimeError("recursive call") (ugly message!) if the result of the error handler is a Unicode string that cannot be encoded to the code page.
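To illustrate the encoder strategy described above (strict mode first, then character by character with the error handler), here is a minimal pure-Python sketch. It is not the C code from the patch: the function name encode_with_fallback is hypothetical, cp1252 stands in for a Windows code page so the example runs on any platform, and the position returned by the error handler is ignored for simplicity.

```python
import codecs


def encode_with_fallback(text, encoding="cp1252", errors="strict"):
    """Hypothetical helper: encode in strict mode, fall back per character."""
    # Fast path: the whole string is encodable.
    try:
        return text.encode(encoding, "strict")
    except UnicodeEncodeError:
        pass

    # Slow path: encode one character at a time and call the error
    # handler for each character that cannot be encoded.
    handler = codecs.lookup_error(errors)
    out = bytearray()
    for i, ch in enumerate(text):
        try:
            out += ch.encode(encoding, "strict")
            continue
        except UnicodeEncodeError as exc:
            err = UnicodeEncodeError(encoding, text, i, i + 1, exc.reason)
        # The handler returns (replacement, new position); the position
        # is ignored in this simplified sketch.
        replacement, _ = handler(err)
        if isinstance(replacement, bytes):
            out += replacement
            continue
        try:
            out += replacement.encode(encoding, "strict")
        except UnicodeEncodeError:
            # Mirrors the behaviour described above: the replacement
            # itself cannot be encoded to the code page.
            raise RuntimeError("recursive call") from None
    return bytes(out)
```

For example, encode_with_fallback("a\u20acb\u4e2d", "cp1252", "replace") takes the slow path because U+4E2D has no cp1252 mapping, and only that character is replaced.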
More TODO:
- write tests using codecs.code_page_decode() and codecs.code_page_encode()
- Fix FIXME (e.g. support surrogates in the encoder)