Message117046
| Author | mrabarnett |
|---|---|
| Recipients | akitada, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jaylogan, jhalcrow, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr |
| Date | 2010-09-21.11:41:33 |
| Message-id | <1285069295.64.0.118863478716.issue2636@psf.upfronthosting.co.za> |
| In-reply-to | |

Content:

I use Python 3, where len("\U00010337") == 2 on a narrow build.
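For context, a narrow build stores text as UTF-16 code units, so a code point above U+FFFF occupies two units (a surrogate pair), which is why len() reports 2. The sketch below reproduces that arithmetic. The helper names `surrogate_pair` and `utf16_length` are hypothetical. Since Python 3.3 (PEP 393) there are no narrow builds, so the sketch counts UTF-16 code units via the encoder instead of relying on len().

```python
from typing import Tuple


def surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary-plane code points need a surrogate pair")
    offset = cp - 0x10000            # 20-bit value
    high = 0xD800 + (offset >> 10)   # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits go into the low surrogate
    return high, low


def utf16_length(s: str) -> int:
    """Number of UTF-16 code units, i.e. what len() reported on a narrow build."""
    return len(s.encode("utf-16-le")) // 2  # 2 bytes per code unit, no BOM


ch = "\U00010337"
print([hex(u) for u in surrogate_pair(ord(ch))])  # ['0xd800', '0xdf37']
print(len(ch), utf16_length(ch))                  # 1 2 on Python 3.3+
```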
Yes, characters outside the Basic Multilingual Plane ("wide" Unicode) are a problem on a narrow build:
```pycon
>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]
```
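The empty results follow from that representation: the pattern's \U00010337 escape denotes one code point, while on a narrow build the subject string holds two code units, so nothing in it is equal to the pattern. The sketch below simulates the narrow-build string on a modern (3.3+) interpreter by building the surrogate pair explicitly, and uses the standard-library `re` module as a stand-in for `regex`. It is an illustration of the mismatch, not output captured from a narrow build.

```python
import re

# Simulate how a narrow build stores "a\U00010337bc": U+10337 becomes the
# surrogate pair U+D800 U+DF37, i.e. two separate str elements here.
narrow_text = "a" + "\ud800\udf37" + "bc"

# The pattern escape \U00010337 compiles to a single code point, so no match...
print(re.findall("\\U00010337", narrow_text))                       # []
# ...but spelling the pattern as the same two code units does match.
print(re.findall("\ud800\udf37", narrow_text) == ["\ud800\udf37"])  # True
```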
I'm not sure how (or whether!) to handle surrogate pairs. It _would_ make things more complicated.
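One possible, deliberately limited approach to surrogate pairs is to rewrite each literal non-BMP character in a pattern as its surrogate pair before compiling. The helper `expand_non_bmp` below is hypothetical and is not part of `regex` or `re`; it handles only literal characters, not escapes, ranges inside [...] or case-insensitive matching, which is where the extra complexity would come from.

```python
import re


def expand_non_bmp(pattern: str) -> str:
    """Hypothetical helper: replace each literal code point above U+FFFF in
    `pattern` with a non-capturing group holding its UTF-16 surrogate pair.

    Only literal characters are handled; escapes such as \\U00010337, ranges
    inside [...] and case-insensitive matching are not.
    """
    out = []
    for ch in pattern:
        cp = ord(ch)
        if cp > 0xFFFF:
            offset = cp - 0x10000
            high = chr(0xD800 + (offset >> 10))
            low = chr(0xDC00 + (offset & 0x3FF))
            # Group the pair so a following quantifier applies to both units.
            out.append("(?:" + high + low + ")")
        else:
            out.append(ch)
    return "".join(out)


narrow_text = "a\ud800\udf37bc"          # simulated narrow-build string
pattern = expand_non_bmp("\U00010337+")   # literal U+10337, one or more times
print(len(re.findall(pattern, narrow_text)))  # 1
```

The non-capturing group matters: without it, a quantifier such as + would repeat only the low surrogate.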
I suppose the moral is that if you want to use wide Unicode characters, you really should use a wide build.
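A script that depends on this can check which kind of build it is running on. A minimal sketch, relying only on the documented `sys.maxunicode` value (0xFFFF on narrow builds, 0x10FFFF on wide builds and on every build since Python 3.3); the function name `is_wide_build` is hypothetical.

```python
import sys


def is_wide_build() -> bool:
    """True if a str element can hold any code point, so non-BMP characters
    are never split into surrogate pairs."""
    return sys.maxunicode == 0x10FFFF


if is_wide_build():
    print("wide build: len('\\U00010337') ==", len("\U00010337"))
else:
    print("narrow build: non-BMP characters are stored as surrogate pairs")
```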
History

| Date | User | Action | Args |
|---|---|---|---|
| 2010-09-21 11:41:36 | mrabarnett | set | recipients: + mrabarnett, loewis, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, giampaolo.rodola, rsc, timehorse, mark, vbr, ezio.melotti, jaylogan, akitada, moreati, r.david.murray, jhalcrow |
| 2010-09-21 11:41:35 | mrabarnett | set | messageid: <1285069295.64.0.118863478716.issue2636@psf.upfronthosting.co.za> |
| 2010-09-21 11:41:33 | mrabarnett | link | issue2636 messages |
| 2010-09-21 11:41:33 | mrabarnett | create | |