Message 195937 - Python tracker

Author	ncoghlan
Recipients	ncoghlan
Date	2013-08-23.04:02:31
Message-id	<1377230552.02.0.907422956718.issue18814@psf.upfronthosting.co.za>

Content

Prompted by issue 18713 and http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/, here are some possible utilities we could add to the codecs module to help handle and debug issues involving surrogate-escaped strings:

    def has_escaped_bytes(s):
        """Returns true if string contains surrogate escaped bytes"""
        ...

    def replace_escaped_bytes(s):
        """Replaces each surrogate escaped byte with a valid code point"""
        ...

    def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
        """Reinterprets incorrectly decoded text using a new encoding"""
        return s.encode(nominal_encoding, 'surrogateescape').decode(actual_encoding)
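
For illustration, here is a minimal sketch of how the two elided helpers might be filled in. It relies only on the documented surrogateescape behaviour (PEP 383): each undecodable byte in the range 0x80 to 0xFF is mapped to a lone surrogate in U+DC80 to U+DCFF. The function bodies, the choice of U+FFFD as the default replacement, the `replacement` parameter, and the `_is_escaped` helper are all assumptions, not part of the proposal or of the codecs module.

```python
# Hypothetical sketches of the proposed helpers; NOT part of the codecs module.

# surrogateescape maps each undecodable byte 0x80-0xFF to U+DC80-U+DCFF.
_ESCAPE_LO = "\udc80"
_ESCAPE_HI = "\udcff"


def _is_escaped(ch):
    """Return True if ch is a code point produced by surrogateescape."""
    return _ESCAPE_LO <= ch <= _ESCAPE_HI


def has_escaped_bytes(s):
    """Return True if the string contains surrogate escaped bytes."""
    return any(_is_escaped(ch) for ch in s)


def replace_escaped_bytes(s, replacement="\ufffd"):
    """Replace each surrogate escaped byte with a valid code point.

    Assumption: U+FFFD REPLACEMENT CHARACTER is used by default.
    """
    return "".join(replacement if _is_escaped(ch) else ch for ch in s)


def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
    """Reinterpret incorrectly decoded text using a new encoding."""
    # Round-trip back to the original bytes, then decode them correctly.
    return s.encode(nominal_encoding, "surrogateescape").decode(actual_encoding)


# Example: UTF-8 bytes mistakenly decoded as ASCII with surrogateescape.
raw = "café".encode("utf-8")                   # b'caf\xc3\xa9'
text = raw.decode("ascii", "surrogateescape")  # 'caf\udcc3\udca9'

# ascii() is used so printing never trips over lone surrogates or non-ASCII.
print(has_escaped_bytes(text))                 # True
print(ascii(replace_escaped_bytes(text)))      # 'caf\ufffd\ufffd'
print(ascii(decode_escaped_bytes(text, "ascii", "utf-8")))  # 'caf\xe9'
```

Replacing code points one at a time (rather than re-encoding and decoding with the 'replace' handler) keeps the "one replacement per escaped byte" behaviour described in the docstring, even when a run of escaped bytes would happen to form valid UTF-8.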

History
Date	User	Action	Args
2013-08-23 04:02:32	ncoghlan	set	recipients: + ncoghlan
2013-08-23 04:02:32	ncoghlan	set	messageid: <1377230552.02.0.907422956718.issue18814@psf.upfronthosting.co.za>
2013-08-23 04:02:31	ncoghlan	link	issue18814 messages
2013-08-23 04:02:31	ncoghlan	create