Issue 836035: strftime month name is encoded somehow

Created on 2003-11-04 20:49 by tim_evans, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Messages (10)
msg18893	Author: Tim Evans (tim_evans)	Date: 2003-11-04 20:49
On Windows XP, with some locales the month name returned by time.strftime('%B') is encoded somehow. For example:

>>> import time, locale
>>> locale.setlocale(locale.LC_ALL, '')
"Chinese_People's Republic of China.936"
>>> time.strftime('%B')
'\xca\xae\xd2\xbb\xd4\xc2'
>>> time.strftime('%d %B %Y')
'05 \xca\xae\xd2\xbb\xd4\xc2 2003'
>>> locale.setlocale(locale.LC_ALL, '')
'French_France.1252'
>>> time.strftime('%B', (2003,12,1,0,0,0,0,0,0))
'd\xe9cembre'

I'm not sure what encoding the Chinese version is using, but the French one is compatible with Latin-1. It would appear that the encoding used is locale-dependent. Ideally, the Win32 version of time.strftime would call the wide-character version of strftime (wcsftime) and return a unicode object. I haven't looked at what this does on any other operating system.
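A quick check in modern Python 3 supports the guess that the bytes are in the locale's ANSI code page: the Chinese locale name ends in ".936" and the French one in ".1252". This sketch assumes those numeric suffixes name the encodings (Python spells them cp936, an alias of its gbk codec, and cp1252).

```python
# Byte strings copied from the session above (Python 2 str objects).
CHINESE_MONTH = b'\xca\xae\xd2\xbb\xd4\xc2'
FRENCH_MONTH = b'd\xe9cembre'

# Assumption: each locale's bytes are in the code page named by the
# numeric suffix of its locale string (".936" and ".1252").
chinese = CHINESE_MONTH.decode('cp936')
french = FRENCH_MONTH.decode('cp1252')

print(chinese)  # 十一月 (November)
print(french)   # décembre
```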
msg18894	Author: Martin v. Löwis (loewis) *	Date: 2003-11-05 20:28
The result is always a byte string in the locale's encoding; for compatibility, this cannot be changed. On Windows, you can access that encoding as "mbcs". In general, you need to use locale.getpreferredencoding() to find out what encoding the string is in. Closing as not-a-bug.
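Martin's advice can be sketched in Python 3 as a small decoding helper. The name decode_locale_bytes is hypothetical; locale.getpreferredencoding(False) is the real standard-library call, and passing False asks for the current setting without calling setlocale(). The explicit encoding override is an assumption added because the reported encoding may not match the locale the C library is actually using.

```python
import locale
from typing import Optional


def decode_locale_bytes(raw: bytes, encoding: Optional[str] = None) -> str:
    """Decode bytes returned by a locale-dependent C function (hypothetical helper).

    By default, uses the encoding the locale module reports for the current
    locale; pass encoding= to override it when that guess is wrong.
    """
    if encoding is None:
        # False: report the current setting without calling setlocale().
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding)


print(decode_locale_bytes(b'd\xe9cembre', 'cp1252'))  # décembre
```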
msg18895	Author: Tim Evans (tim_evans)	Date: 2003-11-05 22:45
I'm reopening the bug, because that doesn't seem to work:

>>> import time, locale
>>> locale.setlocale(locale.LC_ALL, '')
"Chinese_People's Republic of China.936"
>>> x = time.strftime('%B')
>>> x
'\xca\xae\xd2\xbb\xd4\xc2'
>>> x.decode('mbcs')
'\xca\xae\xd2\xbb\xd4\xc2'
>>> locale.getpreferredencoding()
'cp1252'
>>> x.decode('cp1252')
'\xca\xae\xd2\xbb\xd4\xc2'

The preferred encoding is returned as cp1252, which can't be correct. And neither cp1252 nor mbcs appears to decode the string into anything containing the high-numbered characters I would expect for Chinese (neither of them changes the string at all). The following problems (may) exist:

1. locale.getpreferredencoding() doesn't work.
2. The string returned by time.strftime() is not mbcs-encoded.
3. The documentation for time.strftime() doesn't say how the string is encoded.
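A likely explanation for "neither of them changes the string at all": cp1252 maps every byte in 0xA0-0xFF to the code point with the same number, so the decoded result's repr shows the same \xNN escapes (Python 2 only adds a u prefix). The sketch below reproduces this in Python 3 and contrasts it with cp936. Treating mbcs as cp1252 is an assumption that holds only when the Windows ANSI code page is 1252.

```python
raw = b'\xca\xae\xd2\xbb\xd4\xc2'

# cp1252 differs from Latin-1 only in 0x80-0x9F, so these bytes decode to
# code points with the same numeric values (mojibake, not Chinese).
as_cp1252 = raw.decode('cp1252')
print([hex(ord(ch)) for ch in as_cp1252])
# ['0xca', '0xae', '0xd2', '0xbb', '0xd4', '0xc2']
print(as_cp1252)  # Ê®Ò»ÔÂ

# Decoding with the locale's actual code page yields three CJK characters.
as_cp936 = raw.decode('cp936')
print(len(as_cp936), as_cp936)  # 3 十一月
```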
msg18896	Author: Marc-Andre Lemburg (lemburg) *	Date: 2003-11-06 08:53
Tim, there's not much we can do about this, since the strftime() API is a direct interface to the underlying C library API. Python simply passes the arguments through to this function and returns whatever the C library has to offer. Please refer to the C library documentation for your platform for details about the encoding used for the strings.

BTW, a simple table with the month names in your application should nicely solve your problem; additionally, it gives you full control over the encoding and wording used.
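Marc-Andre's workaround, sketched in Python 3: keep the month names in the application as text, so the C library's byte encoding never enters the picture. The table contents, language keys and function names are illustrative assumptions, not an API from the thread.

```python
# Hypothetical application-owned month-name table (index 0 = January).
MONTH_NAMES = {
    'en': ['January', 'February', 'March', 'April', 'May', 'June', 'July',
           'August', 'September', 'October', 'November', 'December'],
    'fr': ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
           'août', 'septembre', 'octobre', 'novembre', 'décembre'],
    'zh': ['一月', '二月', '三月', '四月', '五月', '六月', '七月',
           '八月', '九月', '十月', '十一月', '十二月'],
}


def month_name(lang: str, month: int) -> str:
    """Return the name of month (1-12) for a language key in MONTH_NAMES."""
    return MONTH_NAMES[lang][month - 1]


def format_day_month_year(lang: str, year: int, month: int, day: int) -> str:
    """Mimic strftime('%d %B %Y') using the table instead of the C library."""
    return f'{day:02d} {month_name(lang, month)} {year}'


print(format_day_month_year('fr', 2003, 12, 1))  # 01 décembre 2003
```

The application then picks the output encoding explicitly: format_day_month_year('zh', 2003, 11, 5).encode('cp936') gives back exactly the bytes from the first message.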
msg18897	Author: Tim Evans (tim_evans)	Date: 2003-11-06 21:00
The Windows C library docs say that calling mbstowcs on the output of strftime (or calling wcsftime instead of strftime) will return the correct wide-character (UTF-16?) string. This produces something that looks like it could be correct. Decoding with the 'mbcs' encoding in Python is not equivalent to calling mbstowcs, because mbstowcs is locale-dependent.

Perhaps it would be a good idea to have time.strftime return a unicode string. As this wouldn't be backward compatible, it could be done via a new function, time.ustrftime, or via an optional unicode=True argument to the existing function.
msg18898	Author: Martin v. Löwis (loewis) *	Date: 2003-11-06 21:33
Is there any way to find out the encoding that mbstowcs uses?
msg18899	Author: Tim Evans (tim_evans)	Date: 2003-11-06 22:21
I have looked at the source code for the MS C library (it comes with VC++ 6), and I believe that something equivalent to the following is used:

char codepage[16];
GetLocaleInfo(GetThreadLocale(), LOCALE_IDEFAULTANSICODEPAGE, codepage, 16);

This returns "1252" for the "C" locale, and "936" for the Chinese locale that I was experimenting with. Python does not have an encoding "cp936", but from C, converting with an explicit code page produces the same results as mbstowcs.
msg18900	Author: Martin v. Löwis (loewis) *	Date: 2003-11-07 18:56
This tells me that we need a function to return the current locale's code page; this should return "cp936" in your case. The fact that Python does not have a codec for cp936 is an independent issue.
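The proposed helper could look roughly like this in Python 3, assuming the code page can be read from the locale string that setlocale() returns on Windows (the ".936" suffix seen earlier). The function name codec_from_windows_locale is hypothetical. Current CPython does ship a cp936 codec, as an alias of gbk, so the codec lookup that was missing in 2003 now succeeds.

```python
import codecs


def codec_from_windows_locale(locale_name: str) -> str:
    """Return a Python codec name such as 'cp936' for a Windows locale string.

    Hypothetical helper: assumes the code page is the numeric suffix after the
    last '.', e.g. "Chinese_People's Republic of China.936".
    """
    _, sep, codepage = locale_name.rpartition('.')
    if not sep or not codepage.isdigit():
        raise ValueError(f'no numeric code page in {locale_name!r}')
    name = 'cp' + codepage
    codecs.lookup(name)  # raises LookupError if this Python has no such codec
    return name


enc = codec_from_windows_locale("Chinese_People's Republic of China.936")
print(enc)                                      # cp936
print(b'\xca\xae\xd2\xbb\xd4\xc2'.decode(enc))  # 十一月
```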
msg107343	Author: Alexander Belopolsky (belopolsky) *	Date: 2010-06-08 20:29
Is this still an issue in 3.x? With time.strftime() returning unicode, I don't think any encoding issues remain.
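For comparison, a Python 3 sketch showing that time.strftime() returns str, so the caller never handles locale-encoded bytes. Switching LC_TIME temporarily only makes the example self-contained; locale names such as 'fr_FR.UTF-8' or 'French_France.1252' may not be installed, so the check uses the always-available 'C' locale. The helper month_name_in is hypothetical.

```python
import locale
import time

# December 1, 2003: a Monday (tm_wday=0), day 335 of the year, DST flag 0.
DEC_1_2003 = (2003, 12, 1, 0, 0, 0, 0, 335, 0)


def month_name_in(loc: str, t=DEC_1_2003) -> str:
    """Format the full month name of t under the LC_TIME locale loc.

    setlocale() is process-wide and not thread-safe; this is a demo only.
    """
    saved = locale.setlocale(locale.LC_TIME)  # no second argument: query only
    try:
        locale.setlocale(locale.LC_TIME, loc)
        return time.strftime('%B', t)  # already a str in Python 3
    finally:
        locale.setlocale(locale.LC_TIME, saved)


name = month_name_in('C')
print(type(name).__name__, name)  # str December
```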
msg114295	Author: Mark Lawrence (BreamoreBoy) *	Date: 2010-08-18 23:07
Closed as no reply to msg107343.

History
Date	User	Action	Args
2022-04-11 14:56:00	admin	set	github: 39503
2010-08-18 23:07:20	BreamoreBoy	set	status: open -> closed nosy: + BreamoreBoy messages: + msg114295 resolution: out of date
2010-06-08 20:29:12	belopolsky	set	assignee: belopolsky messages: + msg107343 nosy: + belopolsky
2008-08-29 23:56:38	kevinwatters	set	nosy: + kevinwatters
2003-11-04 20:49:39	tim_evans	create