encoding problem with BeautifulSoup - problem when writing parsed text to file
jmfauth
wxjmfauth at gmail.com
Thu Oct 6 13:41:57 EDT 2011
More information about the Python-list mailing list
On 6 Oct, 06:39, Greg <gregor.hochsch... at googlemail.com> wrote:
> Brilliant! It worked. Thanks!
>
> Here is the final code for those who are struggling with similar
> problems:
>
> ## open and decode file
> # In this case, the encoding comes from the charset argument in a meta tag
> # e.g. <meta charset="iso-8859-2">
> fileObj = open(filePath,"r").read()
> fileContent = fileObj.decode("iso-8859-2")
> fileSoup = BeautifulSoup(fileContent)
>
> ## Do some BeautifulSoup magic and preserve unicode, presume result is saved in 'text' ##
>
> ## write extracted text to file
> f = open(outFilePath, 'w')
> f.write(text.encode('utf-8'))
> f.close()

or (Python2/Python3)

>>> import io
>>> with io.open('abc.txt', 'r', encoding='iso-8859-2') as f:
...     r = f.read()
...
>>> repr(r)
u'a\nb\nc\n'
>>> with io.open('def.txt', 'w', encoding='utf-8-sig') as f:
...     t = f.write(r)
...
>>> f.closed
True

jmf
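
For readers who want the io.open approach as a reusable script, here is a minimal sketch. The function name transcode_file, its parameters, and the temporary file names are made up for illustration and do not come from the thread. It assumes the source encoding is already known (for example from the page's meta charset) and leaves the BeautifulSoup step out.

```python
import io
import os
import tempfile


def transcode_file(src_path, dst_path,
                   src_encoding="iso-8859-2", dst_encoding="utf-8"):
    """Read src_path as src_encoding and rewrite it to dst_path as dst_encoding.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # newline="" turns off newline translation so line endings are preserved.
    with io.open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()  # unicode text (str on Python 3)
    with io.open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text


if __name__ == "__main__":
    # Small demonstration with throwaway files in a temporary directory.
    tmp_dir = tempfile.mkdtemp()
    src_file = os.path.join(tmp_dir, "abc.txt")
    dst_file = os.path.join(tmp_dir, "def.txt")
    with io.open(src_file, "wb") as f:
        f.write(u"\u0142\u00f3d\u017a\n".encode("iso-8859-2"))
    transcode_file(src_file, dst_file)
    with io.open(dst_file, "rb") as f:
        print(repr(f.read()))
```

Passing dst_encoding="utf-8-sig", as in the session above, writes a UTF-8 byte order mark at the start of the file. Some Windows editors rely on that mark to detect UTF-8, while other tools may show it as stray characters.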