feat: add PDF Keyword Highlighter script (closes #478) by SurfyPenguin · Pull Request #522 · wasmerio/Python-Scripts
PR Title
feat: add PDF Keyword Highlighter script (closes #478)
Summary
Added a new command-line Python script that highlights specified keywords in PDF files using PyMuPDF, complete with a dedicated folder, README, and entry in the main repository README.
Description
This pull request implements the PDF keyword highlighter requested in issue #478. The script writes a new highlighted output file and leaves the original PDF unchanged.
The changes are as follows:
- Created new folder `PDF Highlighter Script/` with `pdf_highlight.py` and a `README.md`
- Implemented efficient keyword highlighting using `page.get_text("words")` for fast text extraction
- Supported multiple keywords, optional case-sensitive search (`-s` flag), and punctuation stripping for accurate matching (e.g., `"keyword;"` matches `"keyword"`)
- Printed per-page and total highlight statistics in a formatted table
- Updated root `README.md` to add the new script entry in alphabetical order
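
The matching approach in the list above could look roughly like the sketch below. It is an illustration under assumptions, not the script from this PR: the helper names `normalize`, `match_words`, and `highlight_pdf` are hypothetical, and the only PyMuPDF calls used are `fitz.open`, `page.get_text("words")`, `page.add_highlight_annot`, `fitz.Rect`, and `doc.save`. The sample tuples follow the `(x0, y0, x1, y1, text, block_no, line_no, word_no)` layout that `get_text("words")` returns.

```python
import string


def normalize(token, case_sensitive=False):
    """Strip leading/trailing punctuation and, unless case-sensitive, lowercase."""
    token = token.strip(string.punctuation)
    return token if case_sensitive else token.lower()


def match_words(words, keywords, case_sensitive=False):
    """Return (rect, token) pairs for word tuples that match any keyword.

    `words` uses the layout of page.get_text("words"):
    (x0, y0, x1, y1, text, block_no, line_no, word_no).
    """
    wanted = {normalize(k, case_sensitive) for k in keywords}
    hits = []
    for x0, y0, x1, y1, text, *_ in words:
        token = normalize(text, case_sensitive)
        if token and token in wanted:
            hits.append(((x0, y0, x1, y1), token))
    return hits


def highlight_pdf(src, dst, keywords, case_sensitive=False):
    """Highlight keywords from `src`, save to `dst`, return hit counts per page."""
    import fitz  # PyMuPDF; imported lazily so the helpers above run without it

    per_page = []
    with fitz.open(src) as doc:
        for page in doc:
            hits = match_words(page.get_text("words"), keywords, case_sensitive)
            for rect, _ in hits:
                page.add_highlight_annot(fitz.Rect(rect))
            per_page.append(len(hits))
        doc.save(dst)  # written to a separate path; `src` is not modified
    return per_page


# Hand-written sample in the get_text("words") layout (not real PDF output).
sample = [
    (72.0, 72.0, 110.0, 84.0, "Keyword;", 0, 0, 0),
    (115.0, 72.0, 140.0, 84.0, "other", 0, 0, 1),
]
print(match_words(sample, ["keyword"]))
# [((72.0, 72.0, 110.0, 84.0), 'keyword')]
print(match_words(sample, ["keyword"], case_sensitive=True))
# []
```

Keeping the matching logic separate from the PyMuPDF calls means the punctuation and case handling can be checked on plain tuples without opening a PDF. One consequence of normalizing keywords the same way as page words is that a keyword made only of punctuation never matches.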
Checks
in the repository
- Made no changes that degrade the functioning of the repository
- Gave each commit a better title (unlike "updated README.md")
in the PR
- Followed the format of the pull_request_template
- Kept the Pull Request small (for the creator's welfare)
- Tested the changes made
Thank You,
Amartya Anand