With PEPs 538 and 540 implemented for 3.7, my thinking on this has evolved a bit.
A recent discussion on python-ideas [1] also introduced me to the third party library, "ftfy", which offers a wide range of tools for cleaning up improperly decoded data: https://ftfy.readthedocs.io/en/latest/
That includes a lone surrogate fixer: https://ftfy.readthedocs.io/en/latest/#ftfy.fixes.fix_surrogates
So a potential way to go here would be to a section on "Handling Improperly Decoded Text Data" to the codecs module documentation, and include ftfy as a See Also link in that new section.
If folks think that would be a reasonable way to go, then I think the clearest way to handle it would be to close this issue as "later" (which still implies "maybe never", but not as strongly as "rejected" does), and open a new issue for the suggested new section in the docs.
[1] https://mail.python.org/pipermail/python-ideas/2018-January/048583.html |