Python web client anyone?
Bill Bell
bill-bell at bill-bell.hamilton.on.ca
Mon Oct 15 07:44:34 EDT 2001
Paul Rubin <phr-n2001d at nightsong.com> wrote, in part:

> ... I was looking for something that actually parses the HTML on
> the retrieved page like LWP does. I wonder if there's some way to
> do that with the XML libraries (though HTML is generally not
> well-formed XML ...

Paul,

If your platform is MSW (MS Windows) then you might consider using MSHTML. It's the HTML parser+ that's embedded in IE, and it can be exercised as a COM object. Clearly a product like IE does an excellent job of parsing broken HTML docs, and MSHTML is, I believe, freely distributable.

The snag in using MSHTML with Python is that Python is as yet unable to process vtable-based interfaces (support for which is really needed to use MSHTML); see Mark Hammond's remarks of several weeks ago. One way around this problem is to model code on the 'walkall' example provided on MSDN and wrap it in some way to make what you want accessible in Python.

I have not investigated what's available for parsing HTML on other platforms. However, the same general strategy (i.e., that of exercising one of the best available web clients on the platform) might work in those cases too.

Best of luck,

Bill

"It is the time that you have wasted for your rose that makes your rose so important."--St-Exupery
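
For readers who are not on Windows, here is a minimal sketch of the kind of LWP-style link extraction Paul asked about. It is not the MSHTML approach described in this message: it swaps in the html.parser module from the Python 3 standard library, which is far less forgiving than a browser engine but does cope with unclosed tags, uppercase tag names and unquoted attribute values. The names LinkCollector and extract_links and the sample markup are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return the href of every anchor found in html_text, in order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


# Deliberately sloppy markup: unclosed <p> and <a>, uppercase tag,
# unquoted attribute value, no closing </html>.
broken = ('<html><body><p>Unclosed paragraph'
          '<a href="one.html">one<A HREF=two.html>two</body>')

if __name__ == "__main__":
    for link in extract_links(broken):
        print(link)
```

This only reports start tags as they appear; it builds no document tree, so anything that depends on how a browser would repair the nesting still calls for a real browser engine such as MSHTML.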