Why is xml.dom.minidom so slow?
Bjorn Pettersen
BPettersen at NAREX.com
Thu Jan 2 17:29:21 EST 2003
More information about the Python-list mailing list
> From: Martin v. Löwis [mailto:martin at v.loewis.de]
>
> "Bjorn Pettersen" <BPettersen at NAREX.com> writes:
>
> > All I'm doing boils down to:
> >
> >     response = rf.nextResponse()
> >     dom = parseString(response)
> >
> > in a loop. Am I doing something wrong?
>
> You have to give more details. What Python version? PyXML or
> stock Python? One traditional reason is that people, not
> knowingly, have used PyXML xmlproc, which is a pure-Python
> parser, instead of Expat.

Python 2.2.1 without PyXML. The full code looks like:

    import sys
    import time
    from xml.dom.minidom import parseString

    def test():
        # ResponseFile is defined elsewhere (not shown).
        rf = ResponseFile('c:/data/Testoutput.xml')
        count = 0
        start = time.time()
        try:
            while 1:
                # nextResponse() returns a complete XML
                # document as a string (raises an exception at EOF).
                response = rf.nextResponse()
                dom = parseString(response)
                count += 1
                sys.stdout.write('.')
        except Exception:
            # End of file. Note that this also swallows parse errors.
            pass
        stop = time.time()
        return count, stop - start

If I'm reading the minidom/pulldom files correctly, this should use Expat(?)

> PyXML 0.8.x has a number of speed improvements for
> minidom-with-expat (such as eliminating the SAX driver), and
> memory usage improvements (such as interning element and
> attribute names).

As a test, I tried building my own tree directly from the Expat events. This was about 4 times faster (2.89 accts/sec), but still far from fast enough... I'm starting to think a custom C++ parser might be the way to go (and here I was having such a nice day <sigh>).

> > Is there a faster way when all I need is a traversable tree
> > structure as the result?
>
> "All I need" reads quite funny in this context, as producing
> a traversable tree is one of the more expensive ways for XML
> processing. There are certainly faster ways if you *don't*
> need a traversable tree. :-)

Unfortunately, I don't set the requirements. (They go something like: "we will eventually need all the data, so put it in a form that the next step can traverse to put into a DB".)
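For illustration, the "build my own tree directly from the Expat events" approach might look roughly like the sketch below. It is written for modern Python 3 rather than the 2.2.1 used in this thread, and `Node` and `build_tree` are hypothetical names; only the `xml.parsers.expat` module and its handler attributes are real. The tree keeps just tags, attributes, children and text, skipping everything else minidom builds (DOM methods, namespace handling, comment nodes). For mixed content, the text chunks lose their ordering relative to child elements. No timing figures are claimed for it.

```python
import xml.parsers.expat


class Node:
    """Hypothetical lightweight element node (not part of any library)."""
    # __slots__ avoids a per-instance __dict__, which saves memory
    # when a document produces many nodes.
    __slots__ = ("tag", "attrs", "children", "text")

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = attrs       # dict of attribute name -> value
        self.children = []       # child Nodes in document order
        self.text = []           # character-data chunks inside this element


def build_tree(xml_data):
    """Parse one complete document with Expat and return its root Node."""
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True    # coalesce adjacent character data
    roots = []
    stack = []                   # currently open elements

    def start(tag, attrs):
        node = Node(tag, attrs)
        if stack:
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)

    def end(tag):
        stack.pop()

    def chars(data):
        if stack:
            stack[-1].text.append(data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_data, True)  # True: this is the final chunk
    return roots[0]
```

A usage example: `root = build_tree(b"<resp><acct id='1'>A</acct></resp>")` gives a root whose `children[0].attrs["id"]` is `'1'` and whose `"".join(children[0].text)` is `'A'`.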
If you think a different approach is better, I'm all ears :-)

Thanks for the interest.

-- bjorn