iterparse and unicode
Fredrik Lundh
fredrik at pythonware.com
Thu Aug 21 01:48:57 EDT 2008
More information about the Python-list mailing list
George Sakkis wrote:

> Thank you both for the suggestions. I made a few more experiments to
> understand how iterparse behaves with respect to three dimensions:

Spending time researching undefined behaviour is pretty pointless. ET parsers expect byte streams, because that's what XML files are. If you pass one anything else, an ET implementation may attempt to convert that thing to a byte string, run the game "rogue", or do something else that it finds appropriate.

> It's interesting that the element text attributes after a successful
> parse do not necessarily have the same type, i.e. all be str or all
> unicode. I ported some text extraction code from BeautifulSoup (which
> handles all text as unicode) and I was surprised to find out that in
> xml.etree the returned text's type is not fixed, even within the same
> file. Although it's not a bug, having a mixed collection of byte and
> unicode strings from the same source makes me somewhat uneasy.

If you don't care about memory and execution performance, there are plenty of toolkits that guarantee that you always get Unicode strings.

</F>
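For readers who want uniform text types anyway, here is a minimal sketch of the two points above: hand iterparse a byte stream, and normalize element text afterwards if you need it all as Unicode. The sample document and the to_text helper are hypothetical and not part of ElementTree. The sketch assumes that any byte string the parser returns is plain ASCII, so decoding it as UTF-8 is safe. On Python 3 the parser already returns str for all text, so the helper only handles None there.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, encoded to bytes as it would be on disk.
xml_bytes = (
    u'<?xml version="1.0" encoding="utf-8"?>'
    u'<root><a>plain</a><b>caf\u00e9</b></root>'
).encode("utf-8")


def to_text(value, encoding="utf-8"):
    """Hypothetical helper: return value as a text string."""
    if value is None:
        # Elements without text have .text set to None.
        return u""
    if isinstance(value, bytes):
        # Byte strings (possible on Python 2) are decoded to Unicode.
        return value.decode(encoding)
    return value


texts = {}
# Pass a byte stream, not a Unicode string or a text-mode file.
for event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
    if elem.tag in ("a", "b"):
        texts[elem.tag] = to_text(elem.text)
```

The extra pass costs time and memory on large documents, which is the trade-off mentioned above.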