Message163706
| Author | ezio.melotti |
|---|---|
| Recipients | Brian.Jones, eric.araujo, eric.smith, ezio.melotti, hp.dekoning, loewis, python-dev |
| Date | 2012-06-24.03:11:35 |
| SpamBayes Score | -1.0 |
| Marked as misclassified | Yes |
| Message-id | <1340507496.32.0.519686741143.issue11113@psf.upfronthosting.co.za> |
| In-reply-to | |
| Content | |
|---|---|
The problem is that the standard allows some charrefs to end without a ';', but not all of them. So both "&Eacute;ric" and "&Eacuteric" will be parsed as "Éric", but only "&alpha;centauri" will result in "αcentauri"; "&alphacentauri" will be returned unchanged.

I'm now working on #15156 to use this dict in HTMLParser, and detecting the ';'-less entities is not easy. A possible solution is to keep the names that are accepted without ';' in a separate (private) dict and expose a function like HTMLParser.unescape that implements all the necessary logic.

Regarding ChainMap, the html5 dict should be a superset of the html4 one.
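A rough sketch of the separate-dict idea, assuming two hypothetical private tables: `_ENTITIES` holds names that end with ';', and `_NO_SEMICOLON` holds the subset the standard also accepts without ';'. Both are abbreviated here for illustration only. The `unescape` function is likewise hypothetical and is not the HTMLParser API. It tries an exact ';'-terminated match first, then falls back to the longest ';'-less prefix, so "&Eacuteric" becomes "Éric" while "&alphacentauri" is left alone.

```python
import re

# Hypothetical, abbreviated tables for illustration only; a real table
# would contain the full HTML5 list of named character references.
# Names that are terminated by ';' (stored with the ';').
_ENTITIES = {
    "Eacute;": "\u00c9",
    "alpha;": "\u03b1",
    "amp;": "&",
    "not;": "\u00ac",
}
# Private subset of names that the standard also accepts without ';'.
_NO_SEMICOLON = {
    "Eacute": "\u00c9",
    "amp": "&",
    "not": "\u00ac",
}

# '&' followed by a name and an optional terminating ';'.
_CHARREF = re.compile(r"&([A-Za-z][A-Za-z0-9]*;?)")


def _replace(match):
    name = match.group(1)
    # Exact match on a ';'-terminated name, e.g. "&alpha;" -> "α".
    if name in _ENTITIES:
        return _ENTITIES[name]
    # Otherwise take the longest prefix that may omit the ';',
    # e.g. "&Eacuteric" -> "É" + "ric".
    for end in range(len(name), 1, -1):
        prefix = name[:end]
        if prefix in _NO_SEMICOLON:
            return _NO_SEMICOLON[prefix] + name[end:]
    # No acceptable match: return the reference unchanged,
    # e.g. "&alphacentauri".
    return match.group(0)


def unescape(text):
    """Replace named character references in text (sketch only)."""
    return _CHARREF.sub(_replace, text)


if __name__ == "__main__":
    for s in ("&Eacute;ric", "&Eacuteric", "&alpha;centauri", "&alphacentauri"):
        print(repr(s), "->", repr(unescape(s)))
```

Keeping the ';'-less names in their own dict makes the fallback step a plain lookup. The full-name dict never has to guess whether a given name may drop its ';'.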
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2012-06-24 03:11:36 | ezio.melotti | set | recipients: + ezio.melotti, loewis, eric.smith, eric.araujo, Brian.Jones, python-dev, hp.dekoning |
| 2012-06-24 03:11:36 | ezio.melotti | set | messageid: <1340507496.32.0.519686741143.issue11113@psf.upfronthosting.co.za> |
| 2012-06-24 03:11:35 | ezio.melotti | link | issue11113 messages |
| 2012-06-24 03:11:35 | ezio.melotti | create | |