read all available pages on a Website
Alex Martelli
aleaxit at yahoo.com
Mon Sep 13 05:09:08 EDT 2004
More information about the Python-list mailing list
Leif K-Brooks <eurleif at ecritters.biz> wrote:

> Tim Roberts wrote:
> > Brad Tilley <bradtilley at usa.net> wrote:
> >
> >>Is there a way to make urllib or urllib2 read all of the pages on a Web
> >>site?
> >
> > By the way, there are many web sites for which this sort of behavior is not
> > welcome.
>
> Any site that didn't want to be crawled would most likely use a
> robots.txt file, so you could check that before doing the crawl.

Python's Tools/webchecker/ directory has just the code you need for all
of this. The directory is part of the Python source distribution, but
it's all pure Python code, so, if your distribution is binary and omits
that directory, just download the Python source distribution, unpack it,
and there you are.

Alex
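
For readers who want to see the shape of the idea discussed above (check robots.txt, then follow same-site links), here is a minimal sketch. It is not the webchecker code, just an illustration under stated assumptions: it uses the Python 3 standard library names (urllib.robotparser, html.parser, urllib.parse) rather than the urllib/urllib2 modules of 2004, and the names LinkCollector, same_site_links, fetch_with_urllib, crawl, and the "example-crawler" agent string are all hypothetical.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen
from urllib.robotparser import RobotFileParser


class LinkCollector(HTMLParser):
    """Collect the href values of <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_site_links(base_url, html):
    """Return absolute, fragment-free links that stay on base_url's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(base_url).netloc
    links = []
    for href in collector.links:
        absolute = urldefrag(urljoin(base_url, href)).url
        if urlparse(absolute).netloc == host:
            links.append(absolute)
    return links


def fetch_with_urllib(url):
    """Download a page as text; undecodable bytes are replaced."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", "replace")


def crawl(start_url, robots, fetch=fetch_with_urllib,
          agent="example-crawler", limit=50):
    """Breadth-first crawl of one site, skipping URLs robots.txt disallows.

    robots is an already-loaded RobotFileParser; fetch is any callable
    mapping a URL to HTML text. Returns the URLs fetched, in order.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        if not robots.can_fetch(agent, url):
            continue  # the site asked crawlers to stay away from this URL
        html = fetch(url)
        visited.append(url)
        for link in same_site_links(url, html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Against a live site, one would typically create a RobotFileParser, call set_url() with the site's robots.txt address, call read(), and then pass it to crawl(). The limit argument is only a safety cap for the sketch; a polite crawler would also pause between requests.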