[Python-Dev] HTML parsing: anyone use formatter?
amk at amk.ca
Thu Oct 30 14:27:18 EST 2003
More information about the Python-Dev mailing list
- Previous message: [Python-Dev] Speeding up regular expression compilation
- Next message: [Python-Dev] HTML parsing: anyone use formatter?
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
[Crossposted to python-dev, web-sig, and xml-sig. Followups to web-sig at python.org, please.]

I'm working on bringing htmllib.py up to HTML 4.01 by adding handlers for all the missing elements. So far I've been adding empty methods to the HTMLParser class, but the existing methods actually help render the HTML by calling methods on a Formatter object. For example, the definitions for the H1 element look like this:

    def start_h1(self, attrs):
        self.formatter.end_paragraph(1)
        self.formatter.push_font(('h1', 0, 1, 0))

    def end_h1(self):
        self.formatter.end_paragraph(1)
        self.formatter.pop_font()

Question: should I continue supporting this in new methods? This approach only goes so far. A tag such as <big> or <small> is easy for me to handle, but handling <form>, <frameset>, or <table> would require greatly expanding the Formatter class's repertoire.

I suppose the more general question is: does anyone use Python's formatter module? Do we want to keep it around, or should htmllib be pushed toward doing just HTML parsing? formatter.py is a long way from being able to handle modern web pages, and it would be a lot of work to build a decent renderer.

--amk
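A minimal sketch of the parser/formatter split discussed above, written for modern Python 3, where htmllib no longer exists (removed in 3.0) and the formatter module was removed in 3.10. Assumptions: html.parser.HTMLParser does the tokenizing, a small dispatch shim routes tags to start_<tag>/end_<tag> methods the way sgmllib-based htmllib did, and RecordingFormatter is a hypothetical stand-in that merely logs the calls a real formatter would have received. The empty table handlers illustrate the "just parse" option: the element is recognized but nothing is rendered.

```python
from html.parser import HTMLParser


class RecordingFormatter:
    """Hypothetical stand-in for a formatter object: it records the
    calls a renderer would receive instead of drawing anything."""

    def __init__(self):
        self.calls = []

    def end_paragraph(self, blanklines):
        self.calls.append(("end_paragraph", blanklines))

    def push_font(self, font):
        self.calls.append(("push_font", font))

    def pop_font(self):
        self.calls.append(("pop_font",))

    def add_flowing_data(self, data):
        self.calls.append(("add_flowing_data", data))


class FormattingParser(HTMLParser):
    """Routes each tag to start_<tag>/end_<tag> if such a method exists,
    mimicking the sgmllib convention (an assumption of this sketch)."""

    def __init__(self, formatter):
        super().__init__()
        self.formatter = formatter

    def handle_starttag(self, tag, attrs):
        method = getattr(self, "start_" + tag, None)
        if method is not None:
            method(attrs)

    def handle_endtag(self, tag):
        method = getattr(self, "end_" + tag, None)
        if method is not None:
            method()

    def handle_data(self, data):
        # Plain text is passed straight through to the formatter.
        self.formatter.add_flowing_data(data)

    # Same bodies as the H1 handlers quoted in the message.
    def start_h1(self, attrs):
        self.formatter.end_paragraph(1)
        self.formatter.push_font(("h1", 0, 1, 0))

    def end_h1(self):
        self.formatter.end_paragraph(1)
        self.formatter.pop_font()

    # Empty handlers: the element is parsed but produces no rendering calls.
    def start_table(self, attrs):
        pass

    def end_table(self):
        pass


fmt = RecordingFormatter()
parser = FormattingParser(fmt)
parser.feed("<h1>Title</h1><table></table>")
parser.close()
```

After feeding the sample, fmt.calls holds the paragraph break, font push, text, paragraph break, and font pop for the heading, while the <table> tags contribute nothing. Supporting tables would mean adding new formatter methods, which is exactly the expansion the message questions.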