Script to extract text from PDF files
Paul Hankin
paul.hankin at gmail.com
Tue Sep 25 15:02:48 EDT 2007
More information about the Python-list mailing list
On Sep 25, 6:41 pm, brad <byte8b... at gmail.com> wrote:
> I have a very crude Python script that extracts text from some (and I
> emphasize some) PDF documents. On many PDF docs, I cannot extract text,
> but this is because I'm doing something wrong. The PDF spec is large and
> complex and there are various ways in which to store and encode text. I
> wanted to post here and ask if anyone is interested in helping make the
> script better which means it should accurately extract text from most
> any pdf file... not just some.
>
> I know the topic of reading/extracting the text from a PDF document
> natively in Python comes up every now and then on comp.lang.python...
> I've posted about it in the past myself. After searching for other
> solutions, I've resorted to attempting this on my own in my spare time.
> Using apps external to Python (pdftotext, etc.) is not really an option
> for me. If someone knows of a free native Python app that does this now,
> let me know and I'll use that instead!

Googling for 'pdf to text python' and following the first link gives
http://pybrary.net/pyPdf/

--
Paul Hankin
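
For readers wondering why a crude extractor only works on some PDFs, here is
a minimal standard-library sketch of the simplest case. It is not brad's
script and not how pyPdf works; the names extract_text, STREAM_RE and TJ_RE
are made up for illustration. It assumes the text sits in content streams
that are either uncompressed or Flate-compressed, and that it is shown with
literal strings followed by the Tj operator. Hex strings, TJ arrays, font
encodings, escape sequences and object streams are all ignored, which is
exactly where such a script fails on real documents.

```python
import re
import zlib

# Bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A literal string followed by the Tj (show text) operator: (Hello) Tj
TJ_RE = re.compile(rb"\((.*?)(?<!\\)\)\s*Tj", re.DOTALL)


def extract_text(pdf_bytes):
    """Return literal Tj strings from simple content streams (hypothetical helper)."""
    chunks = []
    for match in STREAM_RE.finditer(pdf_bytes):
        data = match.group(1)
        try:
            # Try Flate (zlib) first; decompressobj tolerates trailing bytes.
            data = zlib.decompressobj().decompress(data)
        except zlib.error:
            pass  # not zlib data: assume the stream is stored uncompressed
        for literal in TJ_RE.findall(data):
            # Assumption: bytes map one-to-one to characters (no font encoding).
            chunks.append(literal.decode("latin-1"))
    return "\n".join(chunks)
```

To try it on a file, read the bytes with open(path, "rb").read() and pass
them to extract_text. An empty result usually means the document stores its
text in one of the forms this sketch skips.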