Web Crawler - Python or Perl?
Stefan Behnel
stefan_ml at behnel.de
Mon Jun 9 17:08:43 EDT 2008
More information about the Python-list mailing list
Ray Cote wrote:
> Beautiful Soup is a bit slower, but it will actually parse some of the
> bizarre HTML you'll download off the web.
[...]
> I don't know if some of the quicker parsers discussed require
> well-formed HTML since I've not used them. You may want to consider
> using one of the quicker HTML parsers and, when they throw a fit on the
> downloaded HTML, drop back to Beautiful Soup -- which usually gets
> _something_ useful off the page.

So does lxml.html.

And if you still feel like needing BS once in a while, there's
lxml.html.soupparser.

http://codespeak.net/lxml/elementsoup.html

Stefan
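
The "fast parser first, Beautiful Soup as a fallback" idea from this thread could be sketched roughly as below. This is only an illustration, not code from either poster: parse_with_fallback and lxml_parsers are hypothetical helper names, and the sketch assumes lxml and BeautifulSoup are both installed. Note that lxml.html already recovers from most broken markup on its own, so the fallback should rarely trigger (it can still raise on some inputs, such as an empty document).

```python
def parse_with_fallback(html, parsers):
    """Try each (name, parse_function) pair in order.

    Returns (name, result) for the first parser that does not raise.
    Hypothetical helper, not part of lxml or BeautifulSoup.
    """
    errors = []
    for name, parse in parsers:
        try:
            return name, parse(html)
        except Exception as exc:  # different parsers raise different error types
            errors.append((name, exc))
    raise ValueError("all parsers failed: %r" % (errors,))


def lxml_parsers():
    """Fast lxml.html parser first, BeautifulSoup-backed soupparser second.

    Imports are done lazily so the helper above stays usable without lxml.
    Assumption: lxml.html.soupparser needs BeautifulSoup to be importable.
    """
    import lxml.html
    import lxml.html.soupparser
    return [
        ("lxml.html", lxml.html.fromstring),
        ("soupparser", lxml.html.soupparser.fromstring),
    ]
```

With both libraries available, a call such as parse_with_fallback(page_source, lxml_parsers()) returns the name of the parser that succeeded together with the parsed root element, so a crawler can log how often the slower fallback was actually needed.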