GitHub - Disane87/docudigger: Website scraper for getting invoices automagically as pdf (useful for taxes or DMS)

docudigger - ARCHIVED

This project has been archived and is no longer maintained.

The successor is Scrape Dojo — a much more powerful, flexible, and feature-rich web scraping platform.

Why Scrape Dojo?

Scrape Dojo is the next evolution of docudigger. While docudigger was limited to Amazon invoice scraping, Scrape Dojo is a full-featured, self-hosted web scraping & browser automation platform.

Key Features of Scrape Dojo

Declarative JSON/JSONC Workflows — Define scrapes as code, no more writing Puppeteer scripts manually
25+ Built-in Actions — Navigate, click, type, extract, loop, download, screenshot, and more
Universal Scraping — Not limited to Amazon; scrape any website with customizable workflows
Cron Scheduling & Webhooks — Automate scrapes with cron patterns, webhooks, or startup triggers
Handlebars + JSONata Templates — Dynamic templates and powerful data transformations
Encrypted Secrets — AES-256-CBC at-rest encryption for credentials
Real-time Monitoring — SSE-powered live execution tracking with a modern Angular UI
Authentication & SSO — JWT, OIDC/SSO, MFA/TOTP, API keys
Multi-Database Support — SQLite (default), MySQL, PostgreSQL
Docker-Ready — Easy deployment with Docker Compose
Modern Tech Stack — Built with NestJS, Angular, Puppeteer, TypeScript, and Nx
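
As an illustration of the "Encrypted Secrets" item above, here is a minimal sketch of AES-256-CBC at-rest encryption using Node's built-in `crypto` module. It is not Scrape Dojo's actual code: the helper names `encryptSecret` and `decryptSecret`, the `iv:ciphertext` hex storage format, and the way the key is supplied are all assumptions made for this example.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Hypothetical helpers for illustration only; not taken from Scrape Dojo.
// AES-256-CBC requires a 32-byte key and a fresh 16-byte IV per encryption.
function encryptSecret(plaintext: string, key: Buffer): string {
  const iv = randomBytes(16);
  const cipher = createCipheriv("aes-256-cbc", key, iv);
  const ciphertext = Buffer.concat([
    cipher.update(plaintext, "utf8"),
    cipher.final(),
  ]);
  // The IV is not secret, but it is needed to decrypt, so store it alongside.
  // Assumed storage format: "<iv hex>:<ciphertext hex>".
  return `${iv.toString("hex")}:${ciphertext.toString("hex")}`;
}

function decryptSecret(stored: string, key: Buffer): string {
  const [ivHex, dataHex] = stored.split(":");
  const decipher = createDecipheriv(
    "aes-256-cbc",
    key,
    Buffer.from(ivHex, "hex"),
  );
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(dataHex, "hex")),
    decipher.final(),
  ]);
  return plaintext.toString("utf8");
}

// Usage: in a real deployment the key would come from configuration,
// not be generated on the fly as it is here.
const exampleKey = randomBytes(32);
const storedValue = encryptSecret("my-portal-password", exampleKey);
const recovered = decryptSecret(storedValue, exampleKey);
```

Because each call draws a new random IV, encrypting the same credential twice yields different stored values. CBC mode alone does not detect tampering, so a setup like this would typically rely on the database's own access controls or an additional integrity check.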

Get Started with Scrape Dojo

Full documentation: scrape-dojo.com

Original Project

docudigger was a website scraper that fetched invoices automatically as PDF files (useful for taxes or a DMS). It supported Amazon invoice scraping and could be run via CLI or Docker.

Author

Marco Franke

License

MIT