Message 266124 - Python tracker

Message266124

Author	serhiy.storchaka
Recipients	BreamoreBoy, ezio.melotti, kunkku, lemburg, loewis, martin.panter, serhiy.storchaka, vstinner
Date	2016-05-23.06:02:43
Message-id	<1463983367.37.0.631877574355.issue16182@psf.upfronthosting.co.za>
In-reply-to

Content

Yes, the readline module is broken in Python 3. The underlying C library operates on C strings and uses locale-dependent C functions to split them into characters. The Python wrapper always uses the UTF-8 encoding to convert between Python strings and C strings, so it works only in UTF-8 locales. get_begidx() and get_endidx() don't work correctly at all for non-ASCII data. We should use the locale encoding for the conversion.
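
As a rough illustration of the mismatch (not code from the patch), the sketch below mimics a wrapper that always encodes as UTF-8 while the other side decodes with the locale encoding. Latin-1 is only an assumed example of a non-UTF-8 locale encoding, and `roundtrip` is a hypothetical helper, not part of the readline module.

```python
def roundtrip(text: str, wrapper_encoding: str, locale_encoding: str) -> str:
    """Encode the way the wrapper would, then decode the way a
    locale-aware consumer would (hypothetical helper for illustration)."""
    raw = text.encode(wrapper_encoding)
    # errors="replace" keeps the sketch from raising on undecodable bytes.
    return raw.decode(locale_encoding, errors="replace")


# Encodings agree: the text survives unchanged.
print(roundtrip("\u00e9", "utf-8", "utf-8"))

# Wrapper uses UTF-8, locale is Latin-1: one character turns into two.
print(roundtrip("\u00e9", "utf-8", "latin-1"))
```

ASCII-only input passes through unchanged in both cases, which is presumably why the problem only shows up with non-ASCII data.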

The proposed patch makes the readline module use locale-dependent encoding functions instead of the default UTF-8. It also corrects the indices returned by get_begidx() and get_endidx().
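
To show why the indices need correcting, here is a minimal sketch of turning a byte offset into a character index. It assumes the C library reports offsets in bytes of the locale-encoded line; `byte_to_char_index` is a hypothetical helper and is not taken from the patch itself.

```python
def byte_to_char_index(line: str, byte_index: int, encoding: str) -> int:
    """Map a byte offset in the encoded line to a character index
    in the Python string (hypothetical helper for illustration)."""
    raw = line.encode(encoding)
    # Decode only the bytes before the offset and count the characters.
    # errors="ignore" drops a trailing partial multi-byte sequence.
    return len(raw[:byte_index].decode(encoding, errors="ignore"))


line = "caf\u00e9 x"

# In UTF-8 the accented letter takes two bytes, so "x" starts at byte 6
# but at character 5.
print(byte_to_char_index(line, 6, "utf-8"))

# In Latin-1 every character is one byte, so the two indices coincide.
print(byte_to_char_index(line, 5, "latin-1"))
```

For pure ASCII input the byte offset and the character index are the same, so the discrepancy appears only once the line contains multi-byte characters.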

History
Date	User	Action	Args
2016-05-23 06:02:47	serhiy.storchaka	set	recipients: + serhiy.storchaka, lemburg, loewis, vstinner, ezio.melotti, BreamoreBoy, martin.panter, kunkku
2016-05-23 06:02:47	serhiy.storchaka	set	messageid: <1463983367.37.0.631877574355.issue16182@psf.upfronthosting.co.za>
2016-05-23 06:02:47	serhiy.storchaka	link	issue16182 messages
2016-05-23 06:02:46	serhiy.storchaka	create