ASCII decoding error with xml.dom.minidom
Martin von Loewis
loewis at informatik.hu-berlin.de
Sat Jun 16 14:44:41 EDT 2001
More information about the Python-list mailing list
gustafl at algonet.se (Gustaf Liljegren) writes:

> Still got a problem with encoding/decoding errors when working with
> xml.dom.minidom. I have run into something I didn't ask for. The DOM module
> continues to output everything as Unicode strings, even if the file is a
> typical 'plain text' XML file with an ISO 8859-1 encoding attribute in the
> XML declaration!

XML files are never plain text; all XML files are Unicode. See the XML
recommendation for details. The file may be represented in some
encoding; a DOM implementation is required to present the contents as
Unicode. See the DOM recommendation for details.

> The input data comes from two directions: one XML file, where the
> input takes the form of Unicode strings as described above, and a
> mailbox file, in Latin 1. Content from these two sources should be
> mixed together in an XML output file.

My guess is that you put byte strings into the DOM tree. You should
not do that; instead, you should convert all strings to Unicode before
putting them into the tree. You can get away with putting byte strings
into the tree only when all of their bytes are below 128, that is,
when they are pure ASCII.

> Ideally, I'd like the output XML file in Latin 1. I wonder if there's an
> easy way to decode everything in the DOM object to Latin 1, so that this
> won't happen?

No, that's not possible. Currently, toxml will return a Unicode
string; it is then the caller's responsibility to convert this to
UTF-8 (as toxml will not have put an encoding directive into the
document).

toxml should probably be extended to support various output
encodings. Even if it does, the DOM tree still must contain Unicode
strings only.

Regards,
Martin
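
The advice above (decode everything to Unicode on the way in, encode
once on the way out) can be sketched as follows. This is only an
illustration under stated assumptions, not code from the original
thread: it is written for Python 3, where str is Unicode and bytes is
the byte string type, and the sample XML and mailbox data are made up.
It also relies on the encoding argument that toxml accepts in current
versions of minidom, which did not exist when this message was written.

```python
from xml.dom import minidom

# Hypothetical sample inputs (not from the original thread):
# an XML document stored as ISO 8859-1 bytes, and a Latin-1 byte
# string such as one read from a mailbox file.
xml_bytes = (b'<?xml version="1.0" encoding="ISO-8859-1"?>'
             b'<people><name>\xc5sa</name></people>')
mailbox_bytes = b"G\xf6ran Sj\xf6berg"

# The parser honours the declared encoding; the DOM tree holds Unicode.
doc = minidom.parseString(xml_bytes)
first_name = doc.getElementsByTagName("name")[0].firstChild.data

# Decode the mailbox bytes to Unicode BEFORE they enter the tree.
name_text = mailbox_bytes.decode("latin-1")
new_name = doc.createElement("name")
new_name.appendChild(doc.createTextNode(name_text))
doc.documentElement.appendChild(new_name)

# Option 1: what the message describes. toxml() returns a Unicode
# string with no encoding declaration, and the caller encodes it as UTF-8.
utf8_xml = doc.toxml().encode("utf-8")

# Option 2: in current Python versions toxml() takes an encoding,
# writes a matching XML declaration, and returns bytes.
latin1_xml = doc.toxml(encoding="ISO-8859-1")
```

Either way the tree itself only ever contains Unicode strings; the
choice of Latin 1 or UTF-8 is made once, when the document is
serialized.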