Changing the default text codec
Fuzzyman
michael at foord.net
Mon Feb 23 10:21:29 EST 2004
Paul Prescod <paul at prescod.net> wrote in message news:<mailman.193.1077530419.27104.python-list at python.org>...
> Fuzzyman wrote:
>
> > Sorry if my terminology is wrong..... but I'm having intermittent
> > problems dealing with accented characters in python. (Only from the 8
> > bit latin-1 character set I think..)
>
> I would say that if you get a 100% failure rate in IDLE and a 100%
> success rate from a console program then your problem is not
> intermittent but environment specific.

If that was the case then I'm sure you'd be right... good not to
quibble about terminology eh ;-)
(in a few other test cases the success-fail pattern was the opposite
way round)

> > For example - if I run my program from IDLE and give it the word
> > 'degré' (containing e-acute) then I get the error :
>
> What do you mean "give it the word". Through raw_input()? Through a file?
>

Right - it is fetching the words from a Tkinter entry box using the
get() method.

> However you are getting this information, it seems to me that in IDLE
> you are getting a Unicode object rather than an 8-bit string object.
> Convert it to an 8-bit string:
>
> mydata.encode("latin-1")

Great - that might do the job. I'll try it. Thanks.

> > if letter in self.valid_letters:
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position
> > 26: ordinal not in range(128)
>
> Something looks suspicious here. I wouldn't expect self.valid_letters to
> have a 0x83 character in it because I would expect it to be hard-coded
> to ASCII in your program like:
>

Self.valid_letters *in fact* is string.lowercase - which I thought
included the 8 bit latin-1 letters as well.
(the letters are converted to lowercase by using the .lower() string
method )

> valid_letters = "abcdefghijklmnopqrstuvwxyzABCDEF..."
>
> On the other hand I wouldn't expect "letter" to have more than one
> character so how could it have a problem at position 26?
>

I'm iterating over the string.
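
One reading of the traceback quoted so far, offered as a sketch and not a diagnosis: under Python 2, testing a Unicode `letter` against an 8-bit `valid_letters` makes the interpreter decode the 8-bit string with the default `ascii` codec, and `string.lowercase` is locale-dependent, so it can carry non-ASCII bytes after the 26 ASCII letters. That would fit a failure at position 26. The snippet is written for Python 3, where text (`str`) and bytes never mix implicitly, so it performs the decode by hand. The byte string is a made-up stand-in, not the real value of `string.lowercase` on any system, and `find_ascii_failure` is a hypothetical helper name.

```python
# Made-up stand-in for a locale-extended string.lowercase under Python 2:
# the 26 ASCII letters followed by one non-ASCII byte at index 26.
valid_letters_8bit = b"abcdefghijklmnopqrstuvwxyz" + b"\x83"


def find_ascii_failure(raw):
    """Decode raw as ASCII; return (position, byte) of the first failure, or None."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError as err:
        return err.start, raw[err.start]
    return None


# The first byte that ASCII cannot decode sits right after the 26 letters.
failure = find_ascii_failure(valid_letters_8bit)

# Paul's suggestion: convert the text to an 8-bit string explicitly.
word = "degr\u00e9"                  # the example word, with e-acute
word_8bit = word.encode("latin-1")   # latin-1 uses one byte per character
```

Keeping both sides of the `in` test the same type (both text, or both 8-bit) is what removes the implicit decode; which of the two to pick depends on the rest of the program.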

> > What I'd like to do is switch by default to an 8 bit codec (latin-1 I
> > think ?????) and then offer the user the choice of either mapping the
> > accented characters to their nearest equivalent (e-acute to e for
> > example) *or* treating them as separate characters.............
>
> Why change the default codec rather than explicitly using the codec you
> care about? If you want to work in the 8-bit world rather than the
> Unicode world, just use the "encode" function on the Unicode object. If
> you want to work in the Unicode world.
>

Great - sounds good.

> > I can't work out how to change the default codec (no matter what the
> > locale) ?
>
> I'd advise against fixing the problem in that way. Convert data
> appropriately when you bring it from the outside world into the Python
> program and ignore the default codec.
>
> Paul Prescod

Thanks for your help.

Fuzzyman

http://www.voidspace.org.uk/atlantibots/pythonutils.html
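
The two options described in this message (map accented letters to their nearest plain equivalent, or treat them as separate letters) can both be handled without changing the default codec, in line with the advice to convert data at the boundary. What follows is a minimal Python 3 sketch, assuming the incoming bytes are latin-1. `fold_accents` and `prepare_word` are hypothetical names, and the `unicodedata` decomposition approach is an assumption of this sketch, not something proposed in the thread.

```python
import unicodedata


def fold_accents(text):
    """Map accented letters to their base letter, e.g. e-acute to e."""
    # NFKD splits a letter such as e-acute into 'e' plus a combining accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def prepare_word(raw, fold=True, encoding="latin-1"):
    """Decode bytes at the boundary, lowercase, and optionally fold accents."""
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    text = text.lower()
    return fold_accents(text) if fold else text


folded = prepare_word(b"Degr\xe9", fold=True)   # accents mapped to plain letters
kept = prepare_word(b"Degr\xe9", fold=False)    # accents kept as distinct letters
```

Letters that have no decomposition (for example `ø` or `ß`) pass through `fold_accents` unchanged, so a program that needs them mapped would have to add its own table.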