Fw: PDF library for reading PDF files
Cameron Laird
claird at lairds.com
Mon Jan 19 08:04:34 EST 2004
In article <oxEOb.96911$Vs3.36407 at twister.socal.rr.com>,
Robert Kern <rkern at ucsd.edu> wrote:
>Cameron Laird wrote:
>> In article <Xns9474CBDE9B2D7cpl19ghumspamgourmet at 62.153.159.134>,
>> Harald Massa <cpl.19.ghum at spamgourmet.com> wrote:
>>
>>>>I am looking for a library in Python that would read PDF files and I
>>>>could extract information from the PDF with it. I have searched with
>>>>google, but only found libraries that can be used to write PDF files.
>>>
>>>reportlab has a lib called pagecatcher; it is fully supported with python,
>>>it is not free.
>>>
>>>Harald
>>
>>
>> ReportLab's libraries are great things--but they do not "extract
>> information from the PDF" in the sense I believe the original
>> questioner intended.
>
>No, but ReportLab (the company) has a product separate from reportlab
>(the package) called PageCatcher that does exactly what the OP asked
>for. It is not open source, however, and costs a chunk of change.

Let's take this one step farther. Two posts now have quite clearly
recommended ReportLab's PageCatcher
<URL: http://reportlab.com/docs/pagecatcher-ds.pdf >.
I completely understand and agree that ReportLab supports a mix of
open-source, no-fee, and for-fee products, and that PageCatcher
carries a significant license fee. I entirely agree that PageCatcher
"read[s] PDF files ... and ... extract[s] information from the PDF
with it."

HOWEVER, I suspect that what the original questioner meant by his
words was some sort of PDF-to-text "extraction" (true?) and, unless
PageCatcher has changed a lot since I got my last copy, PDF-to-text
is NOT one of its functions.
--
Cameron Laird <claird at phaseit.net>
Business: http://www.Phaseit.net