Message159525
| Author | serhiy.storchaka |
|---|---|
| Recipients | ezio.melotti, kennyluck, loewis, serhiy.storchaka, vstinner |
| Date | 2012-04-28.14:36:53 |
| Message-id | <1335623814.43.0.997794771221.issue13916@psf.upfronthosting.co.za> |
| In-reply-to |
| Content | |
|---|---|
The problem is that "surrogatepass" is specific to utf-8, and there is no standard way to decode lone surrogates in utf-16.
>>> "\udc80\udc80".encode("utf-16", "surrogatepass").decode("utf-16", "surrogatepass")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf16' codec can't decode bytes in position 2-3: illegal encoding
With utf-32 this "works" only thanks to a bug -- utf-32 allows lone surrogates (issue #12892).
If "surrogatepass" worked with utf-16 and utf-32, it would be natural to raise ValueError for other encodings. |
|
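The round trip from the session above can be checked per codec with a small helper. This is only a sketch: `roundtrips` is a hypothetical name, not part of the codecs API, and the result for utf-16 and utf-32 depends on the interpreter version, so no output is shown for them.

```python
def roundtrips(text, encoding, errors="surrogatepass"):
    """Return True if text survives encode + decode, False if the codec rejects it."""
    try:
        return text.encode(encoding, errors).decode(encoding, errors) == text
    except UnicodeError:
        # UnicodeError covers both UnicodeEncodeError and UnicodeDecodeError.
        return False


# Two lone low surrogates, the same string used in the session above.
lone = "\udc80\udc80"

# Try the same round trip with each UTF codec mentioned in the message.
results = {enc: roundtrips(lone, enc) for enc in ("utf-8", "utf-16", "utf-32")}
for enc, ok in results.items():
    print(enc, "round-trips" if ok else "fails")
```

With utf-8 the round trip succeeds, since that is the codec "surrogatepass" was designed for; with the default "strict" handler the same string is rejected.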
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2012-04-28 14:36:54 | serhiy.storchaka | set | recipients: + serhiy.storchaka, loewis, vstinner, ezio.melotti, kennyluck |
| 2012-04-28 14:36:54 | serhiy.storchaka | set | messageid: <1335623814.43.0.997794771221.issue13916@psf.upfronthosting.co.za> |
| 2012-04-28 14:36:53 | serhiy.storchaka | link | issue13916 messages |
| 2012-04-28 14:36:53 | serhiy.storchaka | create | |