Message102024
| Author | lemburg |
|---|---|
| Recipients | dangra, ezio.melotti, lemburg, sjmachin |
| Date | 2010-03-31.18:07:43 |
| Message-id | <1270058865.03.0.672346954204.issue8271@psf.upfronthosting.co.za> |
| In-reply-to |
| Content | |
|---|---|
I guess the term "failing" byte is somewhat underdefined. Page 95 of the standard PDF (http://www.unicode.org/versions/Unicode5.2.0/ch03.pdf) suggests that one "Replace each maximal subpart of an ill-formed subsequence by a single U+FFFD". Fortunately, they explain what they are after: if a subsequent byte in the sequence does not have the high bit set (more generally, if it is not a valid continuation byte at that position), it's not to be considered part of the UTF-8 sequence of the code point. Implementing that should be fairly straightforward by adjusting the endinpos variable accordingly. Any takers? |
|
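
To make the "maximal subpart" rule concrete, here is a minimal pure-Python sketch. It is not the C decoder patch discussed in this issue: the names `_lead_info` and `decode_replace` are hypothetical, and the byte ranges are my reading of Table 3-7 (well-formed UTF-8 byte sequences) in the chapter linked above. The variable `j` plays roughly the role that endinpos would play in the real decoder: it stops at the first byte that cannot continue the sequence, so that byte is decoded again on its own.

```python
def _lead_info(lead):
    """Return (continuation_count, lo, hi) for a lead byte, or None.

    lo..hi is the allowed range for the byte right after the lead byte;
    later continuation bytes must always be in 0x80..0xBF.
    Hypothetical helper, ranges assumed from Unicode Table 3-7.
    """
    if 0xC2 <= lead <= 0xDF:
        return 1, 0x80, 0xBF
    if lead == 0xE0:
        return 2, 0xA0, 0xBF   # excludes overlong 3-byte forms
    if lead == 0xED:
        return 2, 0x80, 0x9F   # excludes surrogates
    if 0xE1 <= lead <= 0xEF:
        return 2, 0x80, 0xBF
    if lead == 0xF0:
        return 3, 0x90, 0xBF   # excludes overlong 4-byte forms
    if lead == 0xF4:
        return 3, 0x80, 0x8F   # excludes values above U+10FFFF
    if 0xF1 <= lead <= 0xF3:
        return 3, 0x80, 0xBF
    return None                # 0x80..0xC1 and 0xF5..0xFF never start a sequence


def decode_replace(data):
    """Decode UTF-8 bytes, emitting one U+FFFD per maximal ill-formed subpart."""
    out = []
    i, n = 0, len(data)
    while i < n:
        byte = data[i]
        if byte < 0x80:                      # plain ASCII
            out.append(chr(byte))
            i += 1
            continue
        info = _lead_info(byte)
        if info is None:                     # invalid lead byte: replace it alone
            out.append("\ufffd")
            i += 1
            continue
        need, lo, hi = info
        j = i + 1                            # end of the span consumed so far
        while j < n and j - i <= need and lo <= data[j] <= hi:
            j += 1
            lo, hi = 0x80, 0xBF              # only the first trailing byte is restricted
        if j - i == need + 1:                # complete, well-formed sequence
            out.append(data[i:j].decode("utf-8"))
        else:                                # truncated: one U+FFFD for bytes i..j-1
            out.append("\ufffd")
        i = j                                # the byte at j is examined afresh
    return "".join(out)
```

For example, `decode_replace(b"\xe1\x80A")` replaces the truncated two-byte prefix with a single U+FFFD and still decodes the `A`, because `A` does not have the high bit set and so is not swallowed as part of the broken sequence. As far as I know, the built-in codec in current CPython 3 releases (`data.decode("utf-8", "replace")`) follows the same policy, but treat that as something to verify on your own interpreter.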
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2010-03-31 18:07:45 | lemburg | set | recipients: + lemburg, sjmachin, ezio.melotti, dangra |
| 2010-03-31 18:07:45 | lemburg | set | messageid: <1270058865.03.0.672346954204.issue8271@psf.upfronthosting.co.za> |
| 2010-03-31 18:07:43 | lemburg | link | issue8271 messages |
| 2010-03-31 18:07:43 | lemburg | create | |