Message195886
| Author | vstinner |
|---|---|
| Recipients | Arfrever, a.badger, abadger1999, benjamin.peterson, ezio.melotti, lemburg, ncoghlan, pitrou, r.david.murray, serhiy.storchaka, vstinner |
| Date | 2013-08-22.13:18:23 |
| Message-id | <1377177503.29.0.600213251678.issue18713@psf.upfronthosting.co.za> |
| In-reply-to |
| Content | |
|---|---|
> The surrogateescape error handler is dangerous with utf-16/32. It can produce globally invalid output.
I don't understand. Can you give an example? surrogateescape can generate an invalidly encoded string with any encoding, not only UTF-16/32. Example with UTF-8:
>>> b"a\xffb".decode("utf-8", "surrogateescape")
'a\udcffb'
>>> 'a\udcffb'.encode("utf-8", "surrogateescape")
b'a\xffb'
>>> b'a\xffb'.decode("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 1: invalid start byte
So str.encode("utf-8", "surrogateescape") can produce an invalid UTF-8 sequence whenever the string contains escaped bytes. |
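
The interactive session above can also be written as a small script. This is only a sketch for checking the round trip; the helper name `roundtrip` is hypothetical and does not appear in the original message.

```python
# Sketch: bytes that are not valid UTF-8 survive a decode/encode round trip
# when both steps use the surrogateescape error handler.


def roundtrip(data: bytes, encoding: str = "utf-8") -> bytes:
    """Decode and re-encode data with surrogateescape (hypothetical helper)."""
    text = data.decode(encoding, "surrogateescape")
    return text.encode(encoding, "surrogateescape")


raw = b"a\xffb"

# The undecodable byte 0xff is mapped to the lone surrogate U+DCFF.
text = raw.decode("utf-8", "surrogateescape")
assert text == "a\udcffb"

# Encoding with the same handler restores the original, invalid bytes.
assert roundtrip(raw) == raw

# A strict decoder rejects the result, so the output is not well-formed UTF-8.
try:
    roundtrip(raw).decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)
```

The strict decode at the end fails in the same way as the traceback shown in the session, which is the point being made: the handler round-trips arbitrary bytes, valid or not.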