PyXML, Sax, error in processing external entity reference
Uche Ogbuji
uche at ogbuji.net
Sat Feb 28 01:59:58 EST 2004
More information about the Python-list mailing list
David Dorward <dorward at yahoo.com> wrote in message news:<c1j2q8$6pb$1$8300dec7 at news.demon.co.uk>...
> I'm attempting to read an XHTML 1.1 file[1], perform some DOM manipulation,
> then write the results to a different file.
>
> I've found myself rather stuck at the first hurdle.
>
> I have the following:
>
> from xml.dom.ext.reader import Sax2
> reader = Sax2.Reader()
> f = open('dorward.me.uk/sitemap.html', 'r')
> doc = reader.fromStream(f)
>
> (dorward.me.uk/sitemap.html being a local copy of
> http://dorward.me.uk/sitemap.html)
>
> ... which outputs the following:
>
> Traceback (most recent call last):
>   File "x.py", line 4, in ?
>     doc = reader.fromStream(f)
>   File "/usr/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 372, in fromStream
>     self.parser.parse(s)
>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/expatreader.py", line 109, in parse
>     xmlreader.IncrementalParser.parse(self, source)
>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/xmlreader.py", line 123, in parse
>     self.feed(buffer)
>   File "/usr/lib/python2.3/site-packages/_xmlplus/sax/expatreader.py", line 220, in feed
>     self._err_handler.fatalError(exc)
>   File "/usr/lib/python2.3/site-packages/_xmlplus/dom/ext/reader/Sax2.py", line 340, in fatalError
>     raise exception
> xml.sax._exceptions.SAXParseException:
> http://www.w3.org/TR/xhtml-modularization/DTD/xhtml-notations-1.mod:115:0:
> error in processing external entity reference
>
> I'm not sure where I should proceed from here. Is it a bug in my code? In
> PyXML? In the DTD itself? What should I do next?

The bug is with the W3C. Through a chain of parameter entity references, http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd references http://www.w3.org/TR/xhtml-modularization/DTD/xhtml11-model-1.mod, which gives a 404 (and yes, XML heads, it is in an INCLUDE section, so the URI must be traversed unless there's a resolution through the public ID).
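A minimal sketch of what "resolution through the public ID" can look like, using the modern standard-library xml.sax rather than PyXML's Sax2 reader (an assumption: Python 3.7.1 or later, where loading external entities is off by default and has to be switched on). The catalog `LOCAL_DTDS`, the classes `CatalogResolver` and `ElementCollector`, and the placeholder DTD text are all hypothetical. A real local copy of the XHTML 1.1 DTD would in turn pull in its modules, and each of those would also need an entry the resolver can map.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

XHTML11_PUBID = "-//W3C//DTD XHTML 1.1//EN"

# Hypothetical local catalog: public ID -> DTD bytes. The placeholder below
# stands in for a real local copy of xhtml11.dtd.
LOCAL_DTDS = {
    XHTML11_PUBID: b"<!-- local stand-in for xhtml11.dtd -->\n",
}


class CatalogResolver(EntityResolver):
    """Serve known public IDs locally instead of fetching their system IDs."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.requests = []  # (publicId, systemId) pairs the parser asked for

    def resolveEntity(self, publicId, systemId):
        self.requests.append((publicId, systemId))
        if publicId in self.catalog:
            source = InputSource(systemId)
            source.setByteStream(io.BytesIO(self.catalog[publicId]))
            return source
        # Unknown entity: fall back to the default (fetch the system ID).
        return systemId


class ElementCollector(ContentHandler):
    """Record element names in document order."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def startElement(self, name, attrs):
        self.elements.append(name)


def parse_with_catalog(data):
    parser = xml.sax.make_parser()
    # Turn external entity (and external DTD) loading on so the resolver is consulted.
    parser.setFeature(feature_external_ges, True)
    resolver = CatalogResolver(LOCAL_DTDS)
    handler = ElementCollector()
    parser.setEntityResolver(resolver)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return resolver, handler


SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head><body/></html>
"""

resolver, handler = parse_with_catalog(SAMPLE)
print(resolver.requests)
print(handler.elements)
```

The design choice here is to intercept by public ID and leave anything unknown to the default behaviour, so a missing catalog entry still surfaces as a fetch (and, with a broken URI, as the same kind of error shown in the traceback).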
I'm actually rather amazed at such carelessness by the W3C, but I don't have time to dig further to see if I can figure out how things got broken. I can tell you that you can use minidom, which is OK with this because it does not even read the external DTD subset:

>>> from xml.dom import minidom
>>> doc = minidom.parse('sitemap.html')
>>> doc
<xml.dom.minidom.Document instance at 0x400635ec>
>>>

Also, 4Suite's cDomlette makes it easy for you to avoid the DTD problem:

>>> from Ft.Xml.Domlette import NoExtDtdReader
>>> doc = NoExtDtdReader.parseUri("file:sitemap.html")
>>> doc
<cDocument at 0x0x403ab42c>
>>>

http://4suite.org
http://uche.ogbuji.net/tech/akara/nodes/2003-01-01/domlettes

Good luck.

--Uche
http://uche.ogbuji.net
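For readers on a current Python, here is a sketch of the same "just don't read the external DTD" approach with the standard library only (assuming Python 3; the `PAGE` document and the `TitleGrabber` handler are made up for illustration). minidom's expat-based builder never fetches the external DTD subset, and xml.sax can be told not to load external entities through `feature_external_ges`, which plays roughly the role NoExtDtdReader plays in 4Suite.

```python
import io
import xml.sax
from xml.dom import minidom
from xml.sax.handler import ContentHandler, feature_external_ges

# Inline stand-in for sitemap.html; the DOCTYPE still points at the remote W3C DTD,
# but neither parser below tries to fetch it.
PAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sitemap</title></head>
  <body><p>Hello</p></body>
</html>
"""

# 1. minidom: the DOCTYPE is recorded, but the external DTD is never loaded.
doc = minidom.parseString(PAGE)
root = doc.documentElement
print(doc.doctype.publicId)
print(root.tagName)


# 2. SAX: explicitly switch off loading of external entities (and the DTD).
class TitleGrabber(ContentHandler):
    """Collect the text inside <title>."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def startElement(self, name, attrs):
        if name == "title":
            self.in_title = True

    def endElement(self, name):
        if name == "title":
            self.in_title = False

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self.in_title:
            self.title += content


parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, False)
grabber = TitleGrabber()
parser.setContentHandler(grabber)
parser.parse(io.BytesIO(PAGE))
print(grabber.title)
```

The trade-off is the one implied above: without the DTD, entities it declares (such as XHTML's `&nbsp;`) are not defined, so this works best for documents that avoid them.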