Re: string u'hyv\xe4' to file as 'hyvä'
Alex Willmer
alex at moreati.org.uk
Mon Dec 27 04:55:47 EST 2010
More information about the Python-list mailing list
- Previous message (by thread): Re: string u'hyv\xe4' to file as 'hyvä'
- Next message (by thread): string u'hyv\xe4' to file as 'hyvä'
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Dec 27, 6:47 am, "Mark Tolonen" <metolone+gm... at gmail.com> wrote:
> "gintare" <g.statk... at gmail.com> wrote in message
> > In file i find 'hyv\xe4' instead of hyvä.
>
> When you open a file with codecs.open(), it expects Unicode strings to be
> written to the file. Don't encode them again. Also, .writelines() expects
> a list of strings. Use .write():
>
> import codecs
> item=u'hyv\xe4'
> F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
> F.write(item)
> F.close()

Gintare, Mark's code is correct. When you read the file back, make sure you
understand what you are seeing:

>>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
>>> item2 = F2.read()
>>> item2
u'hyv\xe4'

That might look as though item2 is 7 characters long and contains a
backslash followed by x, e, 4. However, item2 is identical to item: both
contain 4 characters, the final one being a-umlaut. Python has shown the
string using a backslash escape because printing a non-ASCII character
might fail. You can see this directly if your Python session is running in
a terminal (or GUI) that can handle non-ASCII characters:

>>> print item2
hyvä
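The difference between the escaped display and the real contents can also be checked without a Unicode-capable terminal. The following is a minimal sketch, not code from the thread: it uses io.open (which takes an encoding argument much as codecs.open does) so that it runs on both Python 2.6+ and Python 3, and it writes to a temporary directory instead of /opt. The names path, same, length, escaped and raw are introduced here purely for illustration.

```python
import io
import os
import tempfile

item = u'hyv\xe4'  # 4 characters; the last one is a-umlaut (U+00E4)

# Hypothetical scratch location, so the sketch does not need write access to /opt.
path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')

# Text mode with an encoding: Unicode strings go in, UTF-8 bytes land on disk.
with io.open(path, 'w', encoding='utf8') as f:
    f.write(item)

# Reading with the same encoding decodes the bytes back to a Unicode string.
with io.open(path, 'r', encoding='utf8') as f:
    item2 = f.read()

same = (item2 == item)                    # round trip preserved the string
length = len(item2)                       # counts characters, not escape text
escaped = item2.encode('unicode_escape')  # the backslash form the prompt displays
with io.open(path, 'rb') as f:
    raw = f.read()                        # the bytes actually stored in the file

print(same, length, len(escaped), len(raw))
```

The string has 4 characters, its escaped display form has 7, and the file holds 5 bytes because UTF-8 stores a-umlaut as the two bytes 0xC3 0xA4.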