Message159362
| Author | vstinner |
|---|---|
| Recipients | Arfrever, Henri.Salo, Huzaifa.Sidhpurwala, asvetlov, benjamin.peterson, ezio.melotti, loewis, pitrou, serhiy.storchaka, vstinner |
| Date | 2012-04-26.11:36:25 |
| Message-id | <1335440186.13.0.485206103926.issue14579@psf.upfronthosting.co.za> |
| In-reply-to | |
| Content | |
I ran the tests from utf16_error_handling-3.2_4.patch on Python 3.1. Two tests are failing:
- b'\x00\xd8'.decode('utf-16le', 'replace') returns '\ufffd\ufffd', but the test expects '\ufffd'
- b'\xd8\x00'.decode('utf-16be', 'replace') returns '\ufffd\ufffd', but the test expects '\ufffd'
I don't think that the test is correct: the UTF-16 decoder should resynchronize as early as possible (ignore the first invalid byte and restart at the following byte), so '\ufffd\ufffd' is the correct answer.
Other examples:
- b'\xd8\x00\x41'.decode('utf-16be', 'replace') should return '�A' ('\ufffdA')
- with the UTF-8 decoder: (b'\xC3' + '\xe9'.encode('utf-8')).decode('utf-8', 'replace') returns '\ufffd\xe9'
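To make the resynchronization rule described above concrete, here is a minimal sketch in pure Python. `decode_utf16_resync` is a hypothetical helper written only for illustration: it is not CPython's codec and not part of the patch. It assumes the rule stated in the message (emit one U+FFFD for the first offending byte, then restart decoding at the very next byte). The built-in UTF-16 decoder may group bytes differently depending on the Python version, so only the UTF-8 case is checked against the built-in codec.

```python
def decode_utf16_resync(data, byteorder="big"):
    """Hypothetical helper: decode UTF-16 with 'replace'-like semantics,
    resynchronizing one byte after each error (assumed rule, not CPython's)."""
    out = []
    i, n = 0, len(data)
    while i < n:
        if n - i < 2:
            # Truncated code unit: replace the lone byte.
            out.append("\ufffd")
            i += 1
            continue
        unit = int.from_bytes(data[i:i + 2], byteorder)
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if n - i >= 4:
                low = int.from_bytes(data[i + 2:i + 4], byteorder)
                if 0xDC00 <= low <= 0xDFFF:
                    cp = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
                    out.append(chr(cp))
                    i += 4
                    continue
            # Unpaired high surrogate: skip one byte and retry.
            out.append("\ufffd")
            i += 1
            continue
        if 0xDC00 <= unit <= 0xDFFF:
            # Unpaired low surrogate: skip one byte and retry.
            out.append("\ufffd")
            i += 1
            continue
        out.append(chr(unit))
        i += 2
    return "".join(out)


# The three UTF-16 inputs from the message, under the assumed rule.
print(ascii(decode_utf16_resync(b"\x00\xd8", "little")))
print(ascii(decode_utf16_resync(b"\xd8\x00", "big")))
print(ascii(decode_utf16_resync(b"\xd8\x00\x41", "big")))

# The UTF-8 comparison uses the built-in decoder directly.
print(ascii((b"\xC3" + "\xe9".encode("utf-8")).decode("utf-8", "replace")))
```

Under this rule the trailing byte of a broken surrogate is re-examined rather than swallowed, which is why b'\xd8\x00\x41' keeps its 'A'.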
History

| Date | User | Action | Args |
|---|---|---|---|
| 2012-04-26 11:36:26 | vstinner | set | recipients: + vstinner, loewis, pitrou, benjamin.peterson, ezio.melotti, Arfrever, asvetlov, Henri.Salo, Huzaifa.Sidhpurwala, serhiy.storchaka |
| 2012-04-26 11:36:26 | vstinner | set | messageid: <1335440186.13.0.485206103926.issue14579@psf.upfronthosting.co.za> |
| 2012-04-26 11:36:25 | vstinner | link | issue14579 messages |
| 2012-04-26 11:36:25 | vstinner | create | |