Python File Handling: .xls .doc .pdf ???
Gerhard Häring
gerhard.haering at opus-gmbh.net
Fri Feb 7 03:35:43 EST 2003
More information about the Python-list mailing list
Ken Favrow <KenFavrow at attbi.com> wrote:
> I'm trying to make a somewhat simple search engine, but would need to be
> able to read .xls .doc and possibly .pdf for it to be entirely useful. I
> just need to be able to see enough content to find keywords. I've already
> done it with txt and html. How might I accomplish this with the other
> formats??

There are various utilities to convert these formats into plain text:
antiword, catdoc, xlHtml, ...

Some of these converters produce HTML. But HTML can easily be converted to
plain text:

    $commandline_browser -dump <html-file>

where commandline_browser in ('lynx', 'w3m', 'links').

http://www.spocom.com/users/gjohnson/mutt/#office might be of interest to
you, as it includes links to all of these utilities.

-- 
Gerhard
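A minimal Python sketch of the approach described in the reply, assuming the converters are installed and on the PATH. It shells out to antiword for .doc and xlHtml (the `xlhtml` command) for .xls, and because xlHtml emits HTML, it pipes that output through a command-line browser's `-dump` mode. For .pdf it uses pdftotext (from xpdf/poppler), which is not named in the post and is added here as an assumption. The names `CONVERTERS`, `build_command`, `html_dump_command` and `extract_text` are hypothetical, and converter flags may differ between tool versions.

```python
import os
import subprocess
import tempfile

# Hypothetical table: file extension -> (argv template, output_is_html).
# None marks where the input path goes. Assumes each tool is installed and
# on PATH; pdftotext is an assumption, not named in the original post.
CONVERTERS = {
    ".doc": (["antiword", None], False),        # catdoc works similarly
    ".xls": (["xlhtml", None], True),           # xlHtml emits HTML
    ".pdf": (["pdftotext", None, "-"], False),  # "-" = write text to stdout
}

BROWSERS = ("lynx", "w3m", "links")


def build_command(path):
    """Return (argv, output_is_html) for converting `path` to text."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in CONVERTERS:
        raise ValueError(f"no converter configured for {ext!r}")
    template, is_html = CONVERTERS[ext]
    argv = [path if arg is None else arg for arg in template]
    return argv, is_html


def html_dump_command(html_path, browser="lynx"):
    """Build the `<browser> -dump <html-file>` command line."""
    if browser not in BROWSERS:
        raise ValueError(f"unsupported browser {browser!r}")
    return [browser, "-dump", html_path]


def _run(argv):
    # check=True raises CalledProcessError on a non-zero exit status;
    # undecodable bytes are replaced rather than aborting the indexer.
    result = subprocess.run(argv, capture_output=True, text=True,
                            errors="replace", check=True)
    return result.stdout


def extract_text(path, browser="lynx"):
    """Convert a .doc/.xls/.pdf file to plain text for keyword search."""
    argv, is_html = build_command(path)
    output = _run(argv)
    if not is_html:
        return output
    # Write the HTML to a temporary file, then let the browser render it.
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False,
                                     encoding="utf-8") as tmp:
        tmp.write(output)
        html_path = tmp.name
    try:
        return _run(html_dump_command(html_path, browser))
    finally:
        os.remove(html_path)
```

Usage: `extract_text("report.doc")` returns plain text that can be fed to the same keyword search already used for .txt and .html files. Writing the intermediate HTML to a temporary file keeps the browser call in exactly the `-dump <html-file>` form given in the reply.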