Issue 36671: str.lower() loses character information when working with UTF-8


Created on 2019-04-20 07:02 by Kadam Parikh, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg340563	Author: Kadam Parikh (Kadam Parikh)	Date: 2019-04-20 07:02
When converting a particular UTF-8 character "İ" to lowercase, it doesn't behave correctly. It returns two lowercase characters instead of one. This is not as desired.

Code:

>>> print("\u0130")
İ
>>> print("\u0130".lower())
i̇
msg340567	Author: SilentGhost (SilentGhost) *	Date: 2019-04-20 07:48
This is the behaviour specified by version 11 of the Unicode standard. It is not an oversight on the part of the CPython implementation: this character (among others) lowercases to two characters.
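
For readers who want to check which two characters are produced, here is a minimal sketch using only the standard library. The variable names are illustrative and do not come from the original report.

```python
import unicodedata

upper = "\u0130"       # LATIN CAPITAL LETTER I WITH DOT ABOVE
lower = upper.lower()  # full Unicode case mapping, may change the length

# List each code point in the lowercased result together with its name.
for ch in lower:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0069 LATIN SMALL LETTER I
# U+0307 COMBINING DOT ABOVE
```

The result is an ordinary "i" followed by a combining dot above, so the dot from the uppercase letter is kept as a separate code point and the string grows from one code point to two.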

History
Date	User	Action	Args
2022-04-11 14:59:14	admin	set	github: 80852
2019-04-20 07:48:26	SilentGhost	set	status: open -> closed; nosy: + SilentGhost; messages: + msg340567; resolution: not a bug; stage: resolved
2019-04-20 07:02:42	Kadam Parikh	create