a simple unicode question
Mark Tolonen
metolone+gmane at gmail.com
Wed Oct 21 02:15:45 EDT 2009
"George Trojan" <george.trojan at noaa.gov> wrote in message
news:hbktk6$8bn$1 at news.nems.noaa.gov...

Thanks for all suggestions. It took me a while to find out how to
configure my keyboard to be able to type the degree sign. I prefer to
stick with pure ASCII if possible.
Where are the literals (i.e. u'\N{DEGREE SIGN}') defined? I found
http://www.unicode.org/Public/5.1.0/ucd/UnicodeData.txt
Is that the place to look?

George

Scott David Daniels wrote:
> Mark Tolonen wrote:
>>> Is there a better way of getting the degrees?
>>
>> It seems your string is UTF-8. \xc2\xb0 is UTF-8 for DEGREE SIGN. If
>> you type non-ASCII characters in source code, make sure to declare the
>> encoding the file is *actually* saved in:
>>
>> # coding: utf-8
>>
>> s = '''48° 13' 16.80" N'''
>> q = s.decode('utf-8')
>>
>> # next line equivalent to previous two
>> q = u'''48° 13' 16.80" N'''
>>
>> # couple ways to find the degrees
>> print int(q[:q.find(u'°')])
>> import re
>> print re.search(ur'(\d+)°',q).group(1)
>>
>
> Mark is right about the source, but you needn't write unicode source
> to process unicode data. Since nobody else mentioned my favorite way
> of writing unicode in ASCII, try:
>
> IDLE 2.6.3
> >>> s = '''48\xc2\xb0 13' 16.80" N'''
> >>> q = s.decode('utf-8')
> >>> degrees, rest = q.split(u'\N{DEGREE SIGN}')
> >>> print degrees
> 48
> >>> print rest
> 13' 16.80" N
>
> And if you are unsure of the name to use:
> >>> import unicodedata
> >>> unicodedata.name(u'\xb0')
> 'DEGREE SIGN'

It wouldn't be your favorite way if you were typing Chinese:

x = u'我是美国人。'

vs.

x = u'\N{CJK UNIFIED IDEOGRAPH-6211}\N{CJK UNIFIED IDEOGRAPH-662F}\N{CJK UNIFIED IDEOGRAPH-7F8E}\N{CJK UNIFIED IDEOGRAPH-56FD}\N{CJK UNIFIED IDEOGRAPH-4EBA}\N{IDEOGRAPHIC FULL STOP}'

;^)
Mark
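On George's question of where the names are defined: the names accepted by \N{...} come from the Unicode Character Database, and UnicodeData.txt is one release of that database. Python exposes its bundled copy through the unicodedata module, so names can be looked up in both directions without leaving ASCII. The following is only a rough sketch, written for Python 3 (an assumption; the session quoted above is Python 2.6, where the literals would need a u prefix and the byte string would need .decode('utf-8')). The helper name to_named_escapes is made up for illustration.

```python
import unicodedata


def to_named_escapes(text):
    """Spell every non-ASCII character as a \\N{...} escape (hypothetical helper)."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(ch)
        else:
            # unicodedata.name() raises ValueError for characters that have
            # no name in the database (some control characters, for example).
            parts.append("\\N{%s}" % unicodedata.name(ch))
    return "".join(parts)


# Name -> character, with pure ASCII source.
degree = unicodedata.lookup("DEGREE SIGN")

# Same split as in the quoted IDLE session, on a Python 3 str.
dms = "48\N{DEGREE SIGN} 13' 16.80\" N"
degrees, rest = dms.split(degree)
print(int(degrees))

# Character -> name, applied to the Chinese sentence from the reply.
named = to_named_escapes("我是美国人。")
print(named)

# Which Unicode release the bundled database corresponds to.
print(unicodedata.unidata_version)
```

The second print reproduces the long \N{CJK UNIFIED IDEOGRAPH-...} spelling shown above, which illustrates the trade-off: named escapes keep the source ASCII-only, at the cost of readability for anything beyond the odd symbol.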