Message148603
| Author | vstinner |
|---|---|
| Recipients | ezio.melotti, gvanrossum, lemburg, loewis, tchrist, vstinner |
| Date | 2011-11-29.20:42:29 |
| SpamBayes Score | 0.0002851445 |
| Marked as misclassified | No |
| Message-id | <1322599350.13.0.163750536411.issue12892@psf.upfronthosting.co.za> |
| In-reply-to |
| Content | |
|---|---|
Python 3.3 has a strange behaviour:
>>> '\uDBFF\uDFFF'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
>>> '\U0010ffff'.encode('utf-16-le').decode('utf-16-le')
'\U0010ffff'
I would expect text.encode(encoding).decode(encoding) == text, or else an encode or decode error.
So I agree that the encoder should reject lone surrogates. |
|
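The round-trip expectation stated in the message can be checked with a small helper. This is only a sketch: `roundtrip_status` is a hypothetical name introduced here, not part of Python or of any patch on this issue, and the result for the surrogate input depends on the interpreter version (the session quoted above corresponds to the "mismatch" case).

```python
def roundtrip_status(text, encoding):
    """Classify what happens when text is encoded and then decoded again.

    Hypothetical helper for illustration only.
    """
    try:
        data = text.encode(encoding)
    except UnicodeEncodeError:
        # The encoder rejected the input (e.g. surrogate code points).
        return "encode error"
    try:
        decoded = data.decode(encoding)
    except UnicodeDecodeError:
        # The encoder produced bytes the decoder refuses to read back.
        return "decode error"
    # Either the round trip is lossless, or the text was silently changed.
    return "ok" if decoded == text else "mismatch"


for sample in ('\U0010ffff', '\uDBFF\uDFFF'):
    # ascii() keeps the surrogates printable on any terminal encoding.
    print(ascii(sample), roundtrip_status(sample, 'utf-16-le'))
```

Under the expectation above, any result other than "mismatch" is acceptable: the text either survives the round trip unchanged or one of the two steps raises an error.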
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2011-11-29 20:42:30 | vstinner | set | recipients: + vstinner, lemburg, gvanrossum, loewis, ezio.melotti, tchrist |
| 2011-11-29 20:42:30 | vstinner | set | messageid: <1322599350.13.0.163750536411.issue12892@psf.upfronthosting.co.za> |
| 2011-11-29 20:42:29 | vstinner | link | issue12892 messages |
| 2011-11-29 20:42:29 | vstinner | create | |