Simple character translation problem
Steffen Ries
steffen.ries at sympatico.ca
Sat Sep 22 09:41:17 EDT 2001
More information about the Python-list mailing list
Martin von Loewis <loewis at informatik.hu-berlin.de> writes:

> David Eppstein <eppstein at ics.uci.edu> writes:
>
> > I have: user input text, in Mac character set encoding
> > I want: ASCII with HTML-entities coding the accented characters.
> >
> > E.g. "café" should become "caf&eacute;".
> > Is there code already in Python to do this easily?
...
> Or, you could try to use external entities where possible. For that,
> please have a look at htmlentitydefs.entitydefs. Using that is not
> straightforward: you have to invert the dictionary, and you have to
> convert the keys into Unicode keys. For the keys that are
> single-character strings (e.g. '\306'), you can use the Unicode
> character with the same ordinal. For characters above 255, you have to
> convert between the character entity and a Unicode character.

Ok, I'll bite:

--8<--
_u2html = {} # unicode to html mapping, filled on first use

def _make_u2html():
    from htmlentitydefs import entitydefs
    def c2u(c):
        # single-character values are latin-1 encoded
        if len(c) == 1:
            return unicode(c, 'latin1')
        # everything else is a numeric reference like '&#8364;'
        if c.startswith('&#'):
            return unichr(int(c[2:-1]))

    # invert entitydefs: unicode character -> '&name;'
    for entity,val in entitydefs.items():
        _u2html[c2u(val)] = "&%s;" % entity

def htmlentityEncode(s):
    """
    convert unicode string s to ascii, replace non-ascii
    characters with html entitydef or "?"
    """
    if not _u2html:
        _make_u2html()

    l = [_u2html.get(c, c) for c in s]
    return ''.join(l).encode('ascii', 'replace')
--8<--

>>> htmlentityEncode(u"café")
'caf&eacute;'

/steffen
-- 
steffen.ries at sympatico.ca <> Gravity is a myth -- the Earth sucks!
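
For readers on Python 3, where htmlentitydefs was renamed to html.entities, the same idea can be sketched roughly as follows. This is only an illustration under a few assumptions, not the code from the post: the function name html_entity_encode is made up here, the ready-made codepoint2name table is used instead of inverting entitydefs by hand, and the Mac input is assumed to be Mac Roman (the 'mac_roman' codec).

```python
from html.entities import codepoint2name


def html_entity_encode(text):
    """Return text as an ASCII-only str.

    Characters with a named HTML entity become '&name;'. Any other
    non-ASCII character becomes '?', as in the original post.
    (Hypothetical helper name; a sketch, not the poster's code.)
    """
    pieces = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Note: this also rewrites '&', '<', '>' and '"', because
        # they have named entities too.
        pieces.append("&%s;" % name if name is not None else ch)
    # 'replace' turns whatever is still non-ASCII into '?'
    return "".join(pieces).encode("ascii", "replace").decode("ascii")


# Assumption: the user's bytes are Mac Roman, where 0x8E is e-acute.
mac_bytes = b"caf\x8e"
print(html_entity_encode(mac_bytes.decode("mac_roman")))
```

Passing 'xmlcharrefreplace' instead of 'replace' would emit numeric references such as &#20013; for characters without a named entity, rather than "?".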