Issue2481
Created on 2008-03-25 14:33 by cito, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | | |
|---|---|---|
| File name | Uploaded | Description |
| wcsxfrm.diff | saurik, 2012-01-02 02:25 | Python 2.7.2: Unicode locale.strxfrm() |
| Messages (7) | |||
|---|---|---|---|
| msg64484 - (view) | Author: Christoph Zwerschke (cito) * | Date: 2008-03-25 14:33 | |
While locale.strcoll seems to work with Unicode strings, locale.strxfrm
gives a UnicodeError. Example:
###
# -*- coding: utf-8 -*-
import locale

try:
    locale.setlocale(locale.LC_ALL, 'de')
except locale.Error:  # Windows uses a different locale name
    locale.setlocale(locale.LC_ALL, 'german')

# Byte strings: both approaches work.
s = ['Ägypten', 'Zypern']
print sorted(s, cmp=locale.strcoll)  # works
print sorted(s, key=locale.strxfrm)  # works

# Unicode strings: only strcoll works.
s = [u'Ägypten', u'Zypern']
print sorted(s, cmp=locale.strcoll)  # works
print sorted(s, key=locale.strxfrm)  # UnicodeError
###
Therefore, it is not possible to sort lists of Unicode strings
efficiently with a key function. If possible, this should be fixed.
If not, the problem should at least be mentioned in the documentation.
Currently, the docs do not indicate that strcoll and strxfrm behave
differently with Unicode strings.
|
|||
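The report above says that locale.strcoll accepts Unicode strings while locale.strxfrm does not. One possible workaround, sketched here on the assumption of Python 2.7 or later (where functools.cmp_to_key is available), is to wrap strcoll as a key function. The helper names locale_sorted and set_german_locale are hypothetical, and the locale names tried are the ones from the report plus one assumed POSIX-style name.

```python
import functools
import locale


def set_german_locale():
    """Try a few German locale names and return the first that is accepted.

    'de' and 'german' come from the report; 'de_DE.UTF-8' is an assumed
    name that is common on POSIX systems. Returns None if none is installed.
    """
    for name in ("de_DE.UTF-8", "de", "german"):
        try:
            locale.setlocale(locale.LC_ALL, name)
            return name
        except locale.Error:
            continue
    return None


def locale_sorted(strings):
    """Sort text strings by the collation rules of the active locale.

    Uses locale.strcoll through functools.cmp_to_key, so locale.strxfrm
    is never called.
    """
    return sorted(strings, key=functools.cmp_to_key(locale.strcoll))
```

After calling set_german_locale(), locale_sorted([u'Zypern', u'\xc4gypten']) would sort the list under the German collation rules, if such a locale is installed. The trade-off is that strcoll runs on every comparison instead of transforming each string once, so this is likely slower than a strxfrm key on long lists.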
| msg64516 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2008-03-25 21:26 | |
FWIW, this is fixed in Python 3.0. |
|||
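For comparison, a minimal sketch of the key-based sort in Python 3, where str is always Unicode and, per the message above, locale.strxfrm accepts it. The helper name sort_with_strxfrm is hypothetical, and the ordering depends on whichever locale is active when it is called.

```python
import locale


def sort_with_strxfrm(strings):
    """Sort text strings using locale.strxfrm as the key function.

    strxfrm transforms each string once; comparing the transformed keys
    is equivalent to comparing the originals with locale.strcoll.
    """
    return sorted(strings, key=locale.strxfrm)
```

Because each string is transformed only once, this form generally scales better than a comparison-function sort.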
| msg64518 - (view) | Author: Benjamin Peterson (benjamin.peterson) * | Date: 2008-03-25 21:29 | |
Can it be backported? |
|||
| msg64524 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2008-03-25 21:43 | |
Sure, although it probably shouldn't be backported to 2.5. |
|||
| msg150438 - (view) | Author: Jay Freeman (saurik) (saurik) | Date: 2012-01-01 17:57 | |
Given that Python 3.x is still not ready for general use (and whenever this is discussed, people make it quite clear that this is to be expected and that a multi-year timeline was originally proposed for the Python 3.0 transition), it seems like this bug fix should have been backported to 2.x at some point in the four years this issue has been open. :( |
|||
| msg150448 - (view) | Author: Martin v. Löwis (loewis) * | Date: 2012-01-02 00:00 | |
saurik: can you propose a patch? |
|||
| msg150450 - (view) | Author: Jay Freeman (saurik) (saurik) | Date: 2012-01-02 02:25 | |
I have attached a tested patch against Python-2.7.2.tgz (as I do not currently know how to use hg). I am also not 100% certain how the Python build environment works; I added the wcsxfrm test to configure.in and then ran autoheader and autoconf. Note that the original code called strxfrm without checking for an error result, and neither does my new code (which is mostly based on formulaic modifications of the existing code, plus educated guesses about coding and formatting standards; feel free to change it). Finally, I noticed while working on this that --enable-unicode=no does not work (there is a check that enforces that it must be either ucs2 or ucs4), which seems like an easy fix. That said, I ran into numerous other issues trying to make a non-Unicode build and in the end gave up. My code looks like it should work, however, were someone to figure out how to build a non-Unicode Python 2.7. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:32 | admin | set | github: 46733 |
| 2020-05-31 12:28:24 | serhiy.storchaka | set | status: open -> closed; resolution: out of date; stage: resolved |
| 2012-01-02 02:25:58 | saurik | set | files: + wcsxfrm.diff; keywords: + patch; messages: + msg150450 |
| 2012-01-02 00:00:49 | loewis | set | messages: + msg150448 |
| 2012-01-01 17:57:27 | saurik | set | nosy: + saurik; messages: + msg150438 |
| 2010-08-21 22:59:37 | georg.brandl | set | versions: + Python 2.7, - Python 2.6 |
| 2008-03-25 21:43:09 | loewis | set | messages: + msg64524; versions: + Python 2.6, - Python 2.5 |
| 2008-03-25 21:29:18 | benjamin.peterson | set | nosy: + benjamin.peterson; messages: + msg64518 |
| 2008-03-25 21:26:19 | loewis | set | messages: + msg64516 |
| 2008-03-25 14:59:05 | georg.brandl | set | assignee: loewis; nosy: + loewis |
| 2008-03-25 14:33:55 | cito | create | |
