New Python PDF Libraries 2026
last commit 2 days ago kozea/weasyprint 8K +41
added 1 year ago
WeasyPrint is a smart solution that helps web developers create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets, etc.
last commit 2 days ago py-pdf/pypdf 9K +24
added 1 year ago
A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.
last commit 5 days ago ocrmypdf/ocrmypdf 33K +116
added 1 year ago
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted.
last commit 11 months ago vikparuchuri/marker 24K +357
added 1 year ago
Marker PDF converts documents to Markdown, JSON, and HTML quickly and accurately.
last commit 11 months ago docling-project/docling 28K +1246
added 1 year ago
Docling simplifies document processing: it parses diverse formats, including advanced PDF understanding, and provides seamless integrations with the gen AI ecosystem.
last commit 11 months ago opendatalab/mineru 32K +1411
added 1 year ago