Using more than 7 bit ASCII on windows.
Paul Moore
gustav at morpheus.demon.co.uk
Sun Oct 29 16:26:24 EST 2000
More information about the Python-list mailing list
On Sun, 29 Oct 2000 01:23:15 +0200, "Syver Enstad"
<syver.enstad at sensewave.com> wrote:

>"Mark Hammond" <MarkH at ActiveState.com> wrote in message
>news:39FA90D9.7070403 at ActiveState.com...
>> > Is there anyone who uses Pythonwin with a different character set on
>> > Windows? -- Japanese, Chinese, European and so on.
>
>> A few people - and they all have trouble :-(
>
>I read that extended characters should work in the interactive window on the
>active state page. But I can't seem to get it to work. Here are some
>examples from my interactive window (using python 2.0 with win32all 135 and
>the update you mentioned).
>
>(I keep most my kode under the folder: e:/våre dokumenter/kode/ (the 5th
>character should be an a with a ring over in case it doesn't display
>correctly on your screen.)
>
>>>> os.getcwd()
>'E:\\V\345re dokumenter'
>>>> os.chdir('/')
>>>> os.getcwd()
>'E:\\'
>>>> os.chdir('våre dokumenter')
>Traceback (innermost last):
>  File "<interactive input>", line 1, in ?
>OSError: [Errno 2] No such file or directory: 'v\303\245re dokumenter'
>
>The line above looks very strange to me as it seems that the a with a ring
>over is represented by two characters here, when it was only represented
>with one when calling getcwd
>
>>>> os.chdir('v\345re dokumenter')
>>>> os.getcwd()
>'E:\\v\345re dokumenter'
>
>The above is the workaround that gets me were I want.

Yes, the whole setup for non-ASCII characters seems to be very odd, if
not broken. In my case, I am using the Latin-1 character set. If I
have a directory called '10£' (not a reasonable name, but OK as an
example), then I try to use it, all sorts of odd things happen:

>>> s = '10£'
>>> s
'10\234'
# Huh? OK, I see that £ isn't ASCII, and it looks like repr() is
# returning an encoding-neutral form. Fair enough...
>>> os.chdir('10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 2] No such file or directory: '10\234'
# Well, yes - but that's what Python called it. I called it '10£',
# and that *does* exist. And anyway, '10\234' is '10£', based
# on the s example above.

# Try Unicode
>>> os.chdir(u'10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
# Urk. So what should I have put to get a Unicode string
# with the characters '1' '0' '£'?
# I'm not planning on manually UTF-8 encoding this! :-(

# Let's assume we need to use codecs - it's a lot of work...
>>> import codecs
>>> enc, dec, sr, sw = codecs.lookup('latin1')
>>> dec('10£')
(u'10\234', 3)
# That didn't get us very far...
>>> e2, d2, r2, w2 = codecs.lookup('utf8')
>>> e2('10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII decoding error: ordinal not in range(128)
>>> d2('10£')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: UTF-8 decoding error: unexpected code byte
# Oh, stuff this. I give up :-(

So come on then. How should I use Latin-1 characters over 127 in my
code? As far as I can see, Unicode has made all of this *harder*, not
easier. Looks like the net result is that Latin-1 and the like are now
as hard as the multi-byte character sets, rather than making the
multi-byte stuff as easy as Latin-1.

Someone please tell me I'm wrong, and explain how I should have done
this. You'll need to convince me that the fact that

>>> os.chdir('10£')

doesn't work is not a bug, first...

Paul.
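
For what it's worth, the octal escapes in the two sessions above can be reproduced by encoding the same characters under different encodings. The following is only a rough sketch in modern Python 3 (not the Python 2.0 used in the sessions), and the helper name octal_escapes is made up for illustration. It shows that '\234' is what '£' becomes in the DOS code page cp850, whereas Latin-1 would give '\243', and that '\303\245' is the UTF-8 form of the a-with-ring whose Latin-1 form is '\345'. The guess that the console and the filesystem calls are using different code pages is an assumption, not something the sessions prove.

```python
# Python 3 sketch; octal_escapes is a hypothetical helper, not a library call.

def octal_escapes(text, encoding):
    """Encode text and show non-printable/non-ASCII bytes as \\ooo escapes,
    roughly the way the Python 2.0 repr() displays them in the sessions above."""
    out = []
    for b in text.encode(encoding):
        if 32 <= b < 127:
            out.append(chr(b))          # printable ASCII stays as-is
        else:
            out.append("\\%03o" % b)    # everything else as 3-digit octal
    return "".join(out)

POUND = "\u00a3"    # the pound sign
A_RING = "\u00e5"   # a with a ring over it

# The pound sign: DOS code page versus Latin-1.
pound_dos = octal_escapes("10" + POUND, "cp850")       # 10\234
pound_latin1 = octal_escapes("10" + POUND, "latin-1")  # 10\243

# The a-with-ring: one byte in Latin-1, two bytes in UTF-8.
a_ring_latin1 = octal_escapes("v" + A_RING + "re", "latin-1")  # v\345re
a_ring_utf8 = octal_escapes("v" + A_RING + "re", "utf-8")      # v\303\245re

for label, value in [("cp850", pound_dos), ("latin-1", pound_latin1),
                     ("latin-1", a_ring_latin1), ("utf-8", a_ring_utf8)]:
    print(label, value)
```

If that reading is right, the name that fails in os.chdir() is simply a different byte sequence from the one stored for the directory, which would explain why the '\345' workaround succeeds while the typed character does not.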