Issue 36671: str.lower() loses character information when working with UTF-8


Created on 2019-04-20 07:02 by Kadam Parikh, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg340563 - Author: Kadam Parikh (Kadam Parikh) Date: 2019-04-20 07:02
When converting the Unicode character "İ" (U+0130) to lowercase, it doesn't behave correctly: it returns two code points (a lowercase "i" followed by a combining dot) instead of one. This is not the desired behaviour.

Code:

>>> print("\u0130")
İ
>>> print("\u0130".lower())
i̇
>>>
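To see what the two characters in the output above actually are, the result can be inspected code point by code point with the standard unicodedata module. This is a minimal sketch; the variable names are illustrative only.

```python
import unicodedata

# U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE, the character from the report
capital_dotted_i = "\u0130"
lowered = capital_dotted_i.lower()

# List every code point in the lowercase result together with its Unicode name
for ch in lowered:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# prints:
# U+0069 LATIN SMALL LETTER I
# U+0307 COMBINING DOT ABOVE

# Lowercasing changed the string length from 1 to 2
print(len(capital_dotted_i), len(lowered))  # prints: 1 2
```

The second code point is a combining mark rather than a second letter, which is why the output renders as a single dotted "i".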
msg340567 - Author: SilentGhost (SilentGhost) (Python triager) Date: 2019-04-20 07:48
This behaviour follows the Unicode Standard, version 11. It is not an oversight in the CPython implementation: this character (among others) lowercases to two code points.
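The reply can be checked with the standard unicodedata module. The sketch below suggests that no information is actually lost: U+0130 is canonically equivalent to "I" followed by U+0307, and the lowercase result keeps that combining dot. It then shows one way to opt into a single-code-point result. Note that lower_turkic is a hypothetical helper written for illustration, not a CPython API; it is a simplified take on the Turkish/Azeri language-specific casing rules, which the locale-independent str.lower() does not apply.

```python
import unicodedata

dotted_capital_i = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE

# U+0130 decomposes canonically to "I" + U+0307 COMBINING DOT ABOVE,
# and str.lower() keeps that combining dot after the "i".
assert unicodedata.normalize("NFD", dotted_capital_i) == "I\u0307"
assert dotted_capital_i.lower() == "i\u0307"

# Uppercasing the result and recomposing with NFC recovers the original character.
assert unicodedata.normalize("NFC", dotted_capital_i.lower().upper()) == dotted_capital_i


def lower_turkic(text: str) -> str:
    """Hypothetical helper (not part of Python): lowercase using the Turkish/Azeri
    dotted/dotless I rules instead of the default, locale-independent mapping.

    Simplified assumption: ignores the conditional rule for "I" followed by U+0307.
    """
    # Map the two special capitals first, then apply the default lowercasing.
    return text.replace("\u0130", "i").replace("I", "\u0131").lower()


word = "\u0130STANBUL"
print(len(word.lower()), len(lower_turkic(word)))  # prints: 9 8
```

Which behaviour is "correct" depends on the language of the text, so the default mapping stays language-neutral and a caller who needs Turkish rules has to ask for them explicitly.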
History
Date                 User          Action  Args
2022-04-11 14:59:14  admin         set     github: 80852
2019-04-20 07:48:26  SilentGhost   set     status: open -> closed
                                           nosy: + SilentGhost
                                           messages: + msg340567
                                           resolution: not a bug
                                           stage: resolved
2019-04-20 07:02:42  Kadam Parikh  create