Issue 35639: Lowercasing Unicode Characters

Created on 2019-01-02 12:03 by kingofsevens, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (2)
msg332865 - Author: Erdem Uney (kingofsevens) Date: 2019-01-02 12:03
assert 'ŞİŞLİ'.lower() == 'şişli'

Lowercasing the capital İ (I with a dot above, \u0130) adds a Unicode character \u0307 after the i, and if another character follows, that dot (\u0307) appears over the following character. The behavior is different in Python 2.7.10, where it adds the dot on top of 'i'.

According to the Unicode specification, character \u0130 should be converted to character \u0069.
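The reported behavior can be inspected directly. The following is a minimal sketch, assuming Python 3.3 or later; the variable names are illustrative only. It lists the code points that `str.lower()` produces for \u0130 and shows that the lowercased word ends up longer than the expected 'şişli'.

```python
import unicodedata

# Lowercase LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130).
lowered = '\u0130'.lower()

# Inspect the resulting code points and their Unicode names.
codepoints = [f'U+{ord(ch):04X}' for ch in lowered]
names = [unicodedata.name(ch) for ch in lowered]
print(codepoints)
print(names)

# The word from the report: each İ becomes two code points,
# so the result has more code points than 'şişli'.
word = 'ŞİŞLİ'.lower()
print(len(word), len('şişli'))
print(word == 'şişli')
```

On Python 3.3+ the lowercased İ is 'i' followed by COMBINING DOT ABOVE, which is why the assertion in the report fails.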
msg332875 - Author: Ma Lin (malin) Date: 2019-01-02 13:28
Please read this discussion:
https://bugs.python.org/issue17252

The behavior in Python 3.2 and earlier is correct for Turkish users.
The behavior in Python 3.3 and later is correct for non-Turkish users.
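
Since `str.lower()` takes no locale argument, code that needs the Turkish result has to map the two dotted/dotless letters itself before lowercasing. The following is a minimal sketch of one possible workaround; `turkish_lower` is a hypothetical helper, not part of the standard library, and it assumes the input is known to be Turkish text.

```python
def turkish_lower(text: str) -> str:
    """Lowercase text using the Turkish mapping for I and İ.

    Hypothetical helper: it assumes the whole string is Turkish.
    """
    # İ (U+0130) -> i and I -> ı (U+0131), then lowercase the rest as usual.
    return text.replace('\u0130', 'i').replace('I', '\u0131').lower()


print(turkish_lower('ŞİŞLİ') == 'şişli')
print(turkish_lower('IŞIK') == 'ışık')
```

Applying this to non-Turkish text would wrongly turn every capital I into the dotless ı, so the caller has to decide which strings are Turkish.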
History
Date                 User            Action  Args
2022-04-11 14:59:09  admin           set     github: 79820
2019-01-04 21:31:09  terry.reedy     set     status: open -> closed; superseder: Latin Capital Letter I with Dot Above; resolution: duplicate; stage: resolved
2019-01-03 15:16:28  steven.daprano  set     nosy: + steven.daprano
2019-01-02 13:28:52  malin           set     nosy: + malin; messages: + msg332875
2019-01-02 12:03:43  kingofsevens    create