Message 283271 - Python tracker

Author	vstinner
Recipients	belopolsky, ezio.melotti, jcea, lemburg, sdaoden, serhiy.storchaka, vstinner
Date	2016-12-15.09:53:01
Message-id	<1481795581.59.0.160043064791.issue11322@psf.upfronthosting.co.za>

Content

It seems like encodings.normalize_encoding() currently has no unit tests! Before modifying it, I would prefer to see a few unit tests covering inputs such as:

* " utf 8 "
* "UtF 8"
* "utf8\xE9"
* etc.
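A minimal sketch of such tests with unittest, assuming the stdlib encodings.normalize_encoding() as shipped today. The class and method names are hypothetical. Expected values follow the rule in the function's docstring (runs of non-alphanumeric characters other than "." collapse to a single "_", and leading and trailing ones are dropped); the function also leaves case unchanged.

```python
import unittest
from encodings import normalize_encoding


class NormalizeEncodingTests(unittest.TestCase):
    """Hypothetical test cases for the inputs listed above."""

    def test_spaces_collapsed_and_stripped(self):
        # Leading/trailing separators are dropped, inner ones become "_".
        self.assertEqual(normalize_encoding(" utf 8 "), "utf_8")

    def test_case_is_preserved(self):
        # normalize_encoding() itself does not lowercase the name.
        self.assertEqual(normalize_encoding("UtF 8"), "UtF_8")

    def test_punctuation_runs_collapse_to_one_underscore(self):
        self.assertEqual(normalize_encoding("utf--8"), "utf_8")
        self.assertEqual(normalize_encoding("latin -;# 1"), "latin_1")

    def test_dot_is_kept(self):
        # "." is kept because it is used in Python package names.
        self.assertEqual(normalize_encoding("x.y"), "x.y")

    def test_non_ascii_letter(self):
        # Handling of "\xE9" may differ between CPython versions, so
        # this sketch only checks the ASCII prefix.
        self.assertTrue(normalize_encoding("utf8\xE9").startswith("utf8"))
```

Saved as a module, the tests can be run with `python -m unittest <module>`.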

Since we are talking about an optimization, I would like to see benchmark results before and after the change. I would also like to test Marc-Andre's idea of exposing the C function _Py_normalize_encoding().
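A before/after comparison could start from a small timeit micro-benchmark like the one below. The helper name bench_normalize() and the sample names are hypothetical choices for illustration; absolute numbers depend on the machine and build, so only the ratio between the two runs is meaningful. A dedicated tool such as pyperf gives more stable results.

```python
import timeit
from encodings import normalize_encoding

# Arbitrary sample of encoding names, chosen for illustration only.
SAMPLE_NAMES = ("utf-8", " utf 8 ", "UtF 8", "latin-1", "iso8859_15")


def bench_normalize(func=normalize_encoding, names=SAMPLE_NAMES,
                    number=20_000, repeat=5):
    """Return the best time per call (in seconds) for each name.

    Taking the minimum over several repeats reduces noise from other
    processes. Run once on the old code and once on the new code.
    """
    results = {}
    for name in names:
        # timeit.repeat() runs immediately, so capturing `name` is safe.
        timings = timeit.repeat(lambda: func(name), number=number, repeat=repeat)
        results[name] = min(timings) / number
    return results


if __name__ == "__main__":
    for name, seconds in bench_normalize().items():
        print(f"{name!r:>16}: {seconds * 1e9:8.1f} ns per call")
```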

_Py_normalize_encoding() works on a byte string encoded in Latin-1. To implement encodings.normalize_encoding() on top of it, we might rewrite the function to work on Py_UCS4 characters, or provide a fast version for char* and a more generic version for UCS2 and UCS4?

History
Date	User	Action	Args
2016-12-15 09:53:01	vstinner	set	recipients: + vstinner, lemburg, jcea, belopolsky, ezio.melotti, sdaoden, serhiy.storchaka
2016-12-15 09:53:01	vstinner	set	messageid: <1481795581.59.0.160043064791.issue11322@psf.upfronthosting.co.za>
2016-12-15 09:53:01	vstinner	link	issue11322 messages
2016-12-15 09:53:01	vstinner	create