Message 117046 - Python tracker

Author	mrabarnett
Recipients	akitada, amaury.forgeotdarc, collinwinter, ezio.melotti, georg.brandl, giampaolo.rodola, gregory.p.smith, jaylogan, jhalcrow, jimjjewett, loewis, mark, moreati, mrabarnett, nneonneo, pitrou, r.david.murray, rsc, sjmachin, timehorse, vbr
Date	2010-09-21.11:41:33
Message-id	<1285069295.64.0.118863478716.issue2636@psf.upfronthosting.co.za>
In-reply-to

Content

I use Python 3, where len("\U00010337") == 2 on a narrow build.

Yes, wide Unicode (code points beyond U+FFFF) on a narrow build is a problem:

>>> regex.findall("\\U00010337", "a\U00010337bc")
[]
>>> regex.findall("(?i)\\U00010337", "a\U00010337bc")
[]

I'm not sure how (or whether!) to handle surrogate pairs. Handling them _would_ make things more complicated.

I suppose the moral is that if you want to use wide Unicode then you really should use a wide build.
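
For illustration only, here is a minimal sketch of why len("\U00010337") is 2 on a narrow build: the code point lies outside the BMP, so UTF-16 stores it as a high/low surrogate pair. The helper names to_surrogate_pair and utf16_code_units are hypothetical, not part of the regex module. The sketch computes the code units by hand because Python 3.3 and later no longer distinguish narrow and wide builds.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary-plane code point into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def utf16_code_units(text):
    """Return the UTF-16 code units of text, i.e. what a narrow build would index."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


high, low = to_surrogate_pair(0x10337)
print(hex(high), hex(low))                    # 0xd800 0xdf37
print(len(utf16_code_units("\U00010337")))    # 2
print(len(utf16_code_units("a\U00010337bc"))) # 5
```

A pattern that is compiled as the single code point U+10337 has nothing to compare against in a string that holds the two units 0xD800 and 0xDF37, which is consistent with the empty results above.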

History
Date	User	Action	Args
2010-09-21 11:41:36	mrabarnett	set	recipients: + mrabarnett, loewis, georg.brandl, collinwinter, gregory.p.smith, jimjjewett, sjmachin, amaury.forgeotdarc, pitrou, nneonneo, giampaolo.rodola, rsc, timehorse, mark, vbr, ezio.melotti, jaylogan, akitada, moreati, r.david.murray, jhalcrow
2010-09-21 11:41:35	mrabarnett	set	messageid: <1285069295.64.0.118863478716.issue2636@psf.upfronthosting.co.za>
2010-09-21 11:41:33	mrabarnett	link	issue2636 messages
2010-09-21 11:41:33	mrabarnett	create