Website data-mining.
SMERSH009
SMERSH009X at gmail.com
Fri Aug 3 23:18:43 EDT 2007
More information about the Python-list mailing list
On Aug 3, 7:50 pm, Coogan <pcb2... at columbia.edu> wrote:
> Hi--
>
> I'm using Python for the first time to make a plug-in for Firefox.
> The goal of this plug-in is to take the source code from a website
> and use the metadata and body text for different kinds of analysis.
> My question is: How can I retrieve data from a website? I'm not even
> sure if this is possible through Python. Any help?
>
> nieu

How about this? It will fetch the HTML source of the page.

import datetime, time, re, os, sys, traceback, smtplib, string,\
    urllib2, urllib, inspect
from urllib2 import build_opener, HTTPCookieProcessor, Request
opener = build_opener(HTTPCookieProcessor)
from urllib import urlencode

def urlopen2(url, data=None, user_agent='urlopen2'):
    """Opens Our URLS """
    if hasattr(data, "__iter__"):
        data = urlencode(data)
    headers = {'User-Agent' : user_agent}
    return opener.open(Request(url, data, headers))

###TESTCASES START HERE###
def publishedNotes():
    page = urlopen2("http://www.yourURL.com", ())
    pageRead = page.read()
    print pageRead

if __name__ == '__main__':
    publishedNotes()
    sys.exit()
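For anyone on Python 3, where urllib2 no longer exists, here is a rough sketch of the same idea using urllib.request. The names build_request and fetch_html are made up for this example, the "urlopen2" user agent is carried over from the code above, and falling back to UTF-8 when the server reports no charset is an assumption on my part.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener

# One shared opener that remembers cookies between requests.
opener = build_opener(HTTPCookieProcessor(CookieJar()))


def build_request(url, form_fields=None, user_agent="urlopen2"):
    """Build a GET request, or a POST request when form_fields is non-empty.

    build_request is a hypothetical helper name, not part of the original post.
    """
    data = None
    if form_fields:
        # urllib.request expects the POST body as bytes.
        data = urlencode(form_fields).encode("ascii")
    return Request(url, data=data, headers={"User-Agent": user_agent})


def fetch_html(url, form_fields=None, user_agent="urlopen2"):
    """Fetch a page and return its source as text."""
    request = build_request(url, form_fields, user_agent)
    with opener.open(request) as response:
        # Assumption: fall back to UTF-8 if no charset is declared.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html("http://www.yourURL.com") should give you the page source as a string. Unlike the Python 2 version, which passes an empty tuple as data, this sketch sends a plain GET unless you hand it form fields.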