More regex help
Support Desk
support.desk.ipg at gmail.com
Wed Sep 24 12:25:02 EDT 2008
More information about the Python-list mailing list
I am working on a Python web crawler that extracts all links from an HTML page and adds them to a queue. The problem I am having is building absolute links from relative links, as there are so many different types of relative links. If I just append the relative links to the current URL, some websites will send the crawler into a never-ending loop.

What I am looking for is a regexp that will extract the root URL from any URL string I pass to it, such as:

'http://example.com/stuff/stuff/morestuff/index.html'
Regexp = 'http://example.com'

'http://anotherexample.com/stuff/index.php'
Regexp = 'http://anotherexample.com'

'http://example.com/stuff/stuff/'
Regexp = 'http://example.com'
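
A minimal sketch of one way to get the root URL, assuming Python 3 and only http/https URLs. The names ROOT_URL_RE, root_url, root_url_stdlib and absolute_link are hypothetical and chosen for illustration. The pattern keeps the scheme and host and stops at the first "/", "?" or "#". The standard library's urllib.parse is shown as an alternative, since its urljoin resolves relative links against the current page, which may avoid the looping caused by plain appending.

```python
import re
from urllib.parse import urljoin, urlsplit

# Hypothetical pattern: scheme, "://", then everything up to the
# next "/", "?" or "#" (assumes http or https URLs only).
ROOT_URL_RE = re.compile(r'^(https?://[^/?#]+)', re.IGNORECASE)


def root_url(url):
    """Return scheme plus host, e.g. 'http://example.com', or None if no match."""
    match = ROOT_URL_RE.match(url)
    return match.group(1) if match else None


def root_url_stdlib(url):
    """Same idea without a regexp, using the standard library's URL splitter."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}"


def absolute_link(page_url, href):
    """Resolve a link found on page_url (relative or absolute) to an absolute URL."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    for url in (
        "http://example.com/stuff/stuff/morestuff/index.html",
        "http://anotherexample.com/stuff/index.php",
        "http://example.com/stuff/stuff/",
    ):
        print(root_url(url))
```

For building absolute links, urljoin is probably the safer choice: it handles forms such as "../page.html", "/page.html" and "page.html" according to the URL resolution rules, so no root-URL extraction is needed for that step.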