New Python PDF Libraries 2026

last commit 2 days ago kozea/weasyprint 8K +41

added 1 year ago

WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous statistical reports, invoices, tickets, etc.

Python PDF Libraries
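
As a rough illustration of the HTML-to-PDF flow described above, here is a minimal sketch. It assumes WeasyPrint is installed and uses its `HTML(string=...).write_pdf(...)` call; the helper names `build_invoice_html` and `render_invoice_pdf` and the invoice layout are hypothetical and not part of the library.

```python
from html import escape


def build_invoice_html(customer, items):
    """Return a small HTML invoice; items is a list of (description, amount) pairs.

    Hypothetical helper for illustration only, not part of WeasyPrint.
    """
    rows = "".join(
        f"<tr><td>{escape(desc)}</td><td>{amount:.2f}</td></tr>"
        for desc, amount in items
    )
    total = sum(amount for _, amount in items)
    return (
        "<html><body>"
        f"<h1>Invoice for {escape(customer)}</h1>"
        f"<table>{rows}</table>"
        f"<p>Total: {total:.2f}</p>"
        "</body></html>"
    )


def render_invoice_pdf(customer, items, out_path):
    """Render the invoice HTML to a PDF file at out_path."""
    # Imported lazily so the HTML helper works without WeasyPrint installed.
    from weasyprint import HTML

    HTML(string=build_invoice_html(customer, items)).write_pdf(out_path)
```

Calling `render_invoice_pdf("Acme", [("Widget", 2.5)], "invoice.pdf")` would write the PDF, provided WeasyPrint and its system dependencies are available.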

last commit 2 days ago py-pdf/pypdf 9K +24

added 1 year ago

A pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files.

Python PDF Libraries
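
To make the splitting use case concrete, the sketch below cuts a PDF into fixed-size chunks. It assumes pypdf is installed and relies on `PdfReader`, `PdfWriter.add_page`, and `PdfWriter.write`; the function names `chunk_ranges` and `split_pdf` and the output filename pattern are hypothetical choices for this example.

```python
def chunk_ranges(page_count, chunk_size):
    """Return (start, stop) page index pairs covering page_count pages.

    Hypothetical helper for illustration; stop is exclusive.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, page_count))
        for start in range(0, page_count, chunk_size)
    ]


def split_pdf(src_path, chunk_size, out_pattern="part_{:03d}.pdf"):
    """Write src_path out as several PDFs of at most chunk_size pages each."""
    # Imported lazily so chunk_ranges can be used without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    out_paths = []
    ranges = chunk_ranges(len(reader.pages), chunk_size)
    for index, (start, stop) in enumerate(ranges, start=1):
        writer = PdfWriter()
        for page_number in range(start, stop):
            writer.add_page(reader.pages[page_number])
        out_path = out_pattern.format(index)
        with open(out_path, "wb") as fh:
            writer.write(fh)
        out_paths.append(out_path)
    return out_paths
```

Merging is the reverse operation: add pages from several readers to a single writer before writing it out.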

last commit 5 days ago ocrmypdf/ocrmypdf 33K +116

added 1 year ago

OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched or copy-pasted.

Python OCR Libraries PDF Libraries

last commit 11 months ago vikparuchuri/marker 24K +357

added 1 year ago

Marker PDF converts documents to markdown, JSON, and HTML quickly and accurately.

Python PDF Libraries Markdown Libraries

last commit 11 months ago docling-project/docling 28K +1246

added 1 year ago

Docling simplifies document processing, parsing diverse formats — including advanced PDF understanding — and providing seamless integrations with the gen AI ecosystem.

Python AI / ML Libraries PDF Libraries Markdown Libraries Word / Excel Libraries

last commit 11 months ago opendatalab/mineru 32K +1411

added 1 year ago

A high-quality tool for converting PDFs to Markdown and JSON.

Python Markdown Libraries PDF Libraries