DOM to SAX using generators?
Alan Kennedy
alanmk at hotmail.com
Mon Oct 14 11:26:41 EDT 2002
More information about the Python-list mailing list
Mon Oct 14 11:26:41 EDT 2002
- Previous message (by thread): DOM to SAX using generators?
- Next message (by thread): Linux install question
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi all, I'm writing some tree (XML) processing stuff at the moment, and working with a variety of SAX processors to process chains of events. One of the things that I do with the trees is to generate SAX events, by doing a recursive descent on the tree. This is very straightforward, and easily implemented. However, I've been itching to make use of generators, and at first this appeared to be a classic case for using generators. DOM trees and generators seem like a very natural fit, and I see that just today Uche Ogbuji announced his article on exactly this topic. http://www-106.ibm.com/developerworks/xml/library/x-tipgenr.html However, when producing SAX events from some types of DOM node, e.g. elements, documents, etc, that contain child nodes, one needs to produce a "start" event, produce events for all of the child nodes, and then produce an "end" event. But when I try to come up with an algorithm to do that using generators, I have trouble. I can have a generator which delivers the nodes in document order, like so def genNodes(node): yield node if node.children: for child in node.children: yield genNodes(child) And then I can have a "genEvents" method, like so def genEvents(node, consumer): if node.nodeType == DOCUMENT_NODE: consumer.startDocument() # What goes here? consumer.endDocument() elif node.nodeType == ELEMENT_NODE: # etc, etc And then lastly I use a simple for loop to iterate over the nodes. for n in genNodes(tree): genEvents(n, consumer) How can I make this work? The obvious way that I can see to generate the events for the child nodes (in the section marked "What goes here") is to recursively visit the children. But that's the wrong approach, since I'll end up processing the children twice: once through the recursive descent and once through the series of nodes returned by the generator. One method that oocurs to me is that I keep a stack of nodes that have been visited so far, and generate the end events as they are popped off the stack (when?). However, this involves creating a data structure that decreases the efficiency of the algorithm. How can I make generators to do this in a way that is more efficient than the simple recursive descent approach? Actually, thinking further about it, another possibility suggests itself: have the generator yield nodes twice, like so: START = 1 END = 2 def genNodes(node): yield START, node if node.children: for child in node.children: yield genNodes(child) yield END, node But that doesn't seem entirely satisfactory either. It appears to double the number of generator calls, thus making it less efficient. Regards, -- alan kennedy ----------------------------------------------------- check http headers here: http://xhaus.com/headers email alan: http://xhaus.com/mailto/alan
- Previous message (by thread): DOM to SAX using generators?
- Next message (by thread): Linux install question
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
More information about the Python-list mailing list