Analyse of PDF (or EPS?)
David Boddie
davidb at mcs.st-and.ac.uk
Tue Nov 25 15:58:13 EST 2003
More information about the Python-list mailing list
Johan Holst Nielsen <johan at weknowthewayout.com> wrote in message
news:<3fbe00e8$0$95070$edfadb0f at dread11.news.tele.dk>...
> David Boddie wrote:
> > The full PDF specification is not exactly short, but it's fairly readable.
>
> Yep... I tried it... but there are no reason to do exactly the same - if
> other people already have done that. And time is an issue too ;)

Time is always an issue. How much of it do you have? ;-)

> > I have a Python library which is able to identify a lot of the structure in simple
> > documents, including basic text extraction, but I've become pretty disillusioned
> > with it because so much work is required to extract more complex information.
> >
> > Maybe it's time to stick a license on it and upload it somewhere.
>
> Well, let me know ;) Maybe I could get an demo or something? That would
> be nice :)

You may be disappointed, but here it is:

http://www.boddie.org.uk/david/Projects/Python/pdftools/

The core of the library was written in a hurry over two years ago; later
refinements make it only slightly more robust. It was never really intended
for anything other than exploring the structure of PDF files.

Basic use:

    import pdftools

    file = "MyFile.pdf"
    doc = pdftools.PDFdocument(file)

    print "Document uses PDF format version", doc.document_version()

    pages = doc.count_pages()
    print "Document contains %i pages." % pages

    if pages > 123:
        page123 = doc.read_page(123)
        contents123 = page123.read_contents()

        print "The objects found in this page:"
        print
        print contents123.contents

I've not really dealt with the coordinate system very well. Ideally, it
would be trivial to extract all the device-independent positioning
information but, whenever I start to look at this, I get distracted. :-)

Have fun, and don't expect too much,

David
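
For anyone wondering what "dealing with the coordinate system" involves: PDF content streams position things with six-number matrices [a b c d e f] (the operands of the cm operator), and a point (x, y) maps to (a*x + c*y + e, b*x + d*y + f). The following is only a rough sketch of that arithmetic. It does not use pdftools, and the function names apply_matrix and concat are made up for illustration.

```python
# Sketch of PDF-style coordinate transforms. Matrices are 6-tuples
# (a, b, c, d, e, f) as used by the "cm" operator. The helper names
# here are hypothetical and not part of pdftools.

def apply_matrix(m, x, y):
    """Map the point (x, y) through the matrix m."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)

def concat(first, second):
    """Return the matrix equivalent to applying first, then second."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )

translate = (1, 0, 0, 1, 10, 20)   # move by (10, 20)
scale = (2, 0, 0, 2, 0, 0)         # double both axes

combined = concat(translate, scale)
point = apply_matrix(combined, 1, 1)   # (22, 42)
```

Extracting device-independent positions from a real page would mean tracking these matrices through every cm, q and Q operator in the content stream, which is where most of the work lies.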