reading text in pdf, some working sample code
dieter
dieter at handshake.de
Wed Nov 22 02:37:20 EST 2017
Daniel Gross <grossd18 at gmail.com> writes:
> I am new to python and jumped right into trying to read out (english) text
> from PDF files.
>
> I tried various libraries (including slate)

You could give "pdfminer" a try.

Note, however, that it may not be possible to extract the text: PDF is a
generic format which works by mapping character codes to glyphs (i.e.
visual symbols). If your PDF uses a special map for this (especially with
non-standard glyph collections, aka "fonts"), then the text extraction,
which in fact extracts sequences of character codes, can give unusable
results.
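
A minimal sketch of what trying this could look like, assuming the
"pdfminer.six" fork is installed and provides `pdfminer.high_level.extract_text`
(the original "pdfminer" package may need a longer setup). The function names
`extract_pdf_text` and `unmapped_fraction` are made up for this illustration.
The second function relies on the "(cid:N)" placeholders that pdfminer writes
when it cannot map a character code to Unicode; it is only a rough indicator,
since a custom map can also yield valid-looking but wrong characters that no
such check would notice.

```python
import re

# pdfminer writes "(cid:N)" when it cannot map a character code to Unicode.
_CID_TOKEN = re.compile(r"\(cid:\d+\)")


def extract_pdf_text(path):
    """Return the text of the PDF at `path` (assumes pdfminer.six is installed)."""
    # Imported lazily so the helper below also works without pdfminer.
    from pdfminer.high_level import extract_text
    return extract_text(path)


def unmapped_fraction(text):
    """Return the share of extracted symbols that are "(cid:N)" placeholders."""
    cids = len(_CID_TOKEN.findall(text))
    # Count the remaining non-whitespace characters outside the placeholders.
    others = len(re.sub(r"\s", "", _CID_TOKEN.sub("", text)))
    total = cids + others
    return cids / total if total else 0.0
```

Calling `unmapped_fraction(extract_pdf_text("some.pdf"))` on a file of your own
(the file name here is a placeholder) gives a value between 0 and 1; a value
close to 1 suggests the PDF uses a character map that the extraction could not
translate back into text.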