htmllib.py and parsing malformed HTML

KC nskhcarlso at bellsouth.net
Mon Sep 1 22:15:51 EDT 2003

Previous message (by thread): htmllib.py and parsing malformed HTML
Next message (by thread): htmllib.py and parsing malformed HTML [SOLVED]

I have written a parser using htmllib.HTMLParser, and it works fine
unless the HTML is malformed.  For example, in some instances the
provider of the HTML leaves out the <TR> tags but includes the </TR> tags.

Apparently htmllib, or more likely sgmllib, does not process an end tag if
no corresponding start tag was found.  Does anyone know a way to "fool"
the parser into handling the end tag when a start tag was not found?

Thanks,

Kevin
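
For illustration, here is a minimal sketch of one way to tolerate </TR> tags that have no matching <TR>. It is not the htmllib/sgmllib approach asked about above: those modules exist only in Python 2, so this sketch swaps in Python 3's html.parser, which calls handle_endtag for every end tag whether or not a start tag was seen. The class name RowCollector and its attributes are hypothetical, and the sketch assumes each row's cells are well-formed <TD> or <TH> elements.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect table rows, tolerating </tr> tags with no matching <tr>.

    Hypothetical helper for illustration; uses html.parser, not htmllib.
    """

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self.current = []   # cells of the row currently being built
        self.cell = None    # text pieces of the open cell, or None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling the handlers.
        if tag == "tr":
            # An explicit <tr> starts a fresh row; flush anything pending.
            self._close_row()
        elif tag in ("td", "th"):
            self._close_cell()
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            # Treat </tr> as the end of a row even if <tr> was never seen.
            self._close_row()

    def handle_data(self, data):
        # Only text inside a cell is kept.
        if self.cell is not None:
            self.cell.append(data)

    def _close_cell(self):
        if self.cell is not None:
            self.current.append("".join(self.cell).strip())
            self.cell = None

    def _close_row(self):
        self._close_cell()
        if self.current:
            self.rows.append(self.current)
            self.current = []
```

Used on input where the <TR> tags are missing, the rows are still split at each </TR>:

```python
parser = RowCollector()
parser.feed("<TABLE><TD>a</TD><TD>b</TD></TR><TD>c</TD><TD>d</TD></TR></TABLE>")
parser.close()
print(parser.rows)
```

The design choice is to key row boundaries on the end tag alone, so the parser never needs to know whether the start tag was present.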

More information about the Python-list mailing list