DNP

Andrew Friedman

afriedman412 [at] gmail [dot] com



OPEN SOURCE CONTRIBUTIONS


PDFPlumber

Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
  • Implemented recognition of character spacing by fraction of font size (instead of total pixels)
  • Improved and streamlined code for interpreting rotated text and text written in any direction (vertically or horizontally)
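
To illustrate the first item, here is a minimal sketch of splitting characters into words with a gap threshold that scales with font size rather than using a fixed distance. It is not PDFPlumber's actual implementation: the function names, the `ratio` default of 0.25, and the `layout` helper (including its assumed glyph width) are hypothetical, and the character dicts only borrow the `text`, `x0`, `x1`, and `size` keys.

```python
def group_chars_into_words(chars, ratio=0.25):
    """Split left-to-right chars into words wherever the gap exceeds ratio * font size."""
    words, current, prev = [], "", None
    for ch in chars:
        # The threshold scales with the previous character's font size (assumed rule).
        gap_is_wide = prev is not None and (ch["x0"] - prev["x1"]) > ratio * prev["size"]
        if gap_is_wide and current:
            words.append(current)
            current = ""
        current += ch["text"]
        prev = ch
    if current:
        words.append(current)
    return words


def layout(text, size, gaps):
    """Hypothetical helper: place characters on one line with the given gaps between them."""
    chars, x = [], 0.0
    width = size * 0.5  # assumed glyph width, for illustration only
    for i, letter in enumerate(text):
        if i > 0:
            x += gaps[i - 1]
        chars.append({"text": letter, "x0": x, "x1": x + width, "size": size})
        x += width
    return chars


# The same absolute gaps are laid out at two different font sizes.
small = layout("abcd", size=10, gaps=[0.5, 4.0, 0.5])
large = layout("abcd", size=40, gaps=[0.5, 4.0, 0.5])
small_words = group_chars_into_words(small)
large_words = group_chars_into_words(large)
```

With these inputs, the 4.0 gap splits the 10-point text into two words but is treated as ordinary letter spacing in the 40-point text, which is the behavior a fixed distance threshold cannot give.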

FPDF2

Simple PDF generation for Python
  • Expanded capability to convert SVG images to PDF
  • Added support for SVG clipping paths
  • Improved SVG variable interpretation
  • Added proper formatting for multi-index tables

Textacy

Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and follow after.
  • Improved quote detection to look for text within pairs of specific matching characters, instead of text between any two consecutive quotation-mark-like characters
  • Improved accuracy of attribution and reduced false positives by expanding and adjusting the window used for attribution
  • Added code to prepare and standardize text for quote detection
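
The pair-based quote detection described above can be sketched roughly as follows. This is a plain-Python illustration and not Textacy's code: the `QUOTE_PAIRS` table and the `find_quoted_spans` function are hypothetical names, and the set of pairs is an assumption chosen for the example.

```python
# Assumed mapping from each opening character to the only character allowed to close it.
QUOTE_PAIRS = {
    '"': '"',            # straight double quotes
    "\u201c": "\u201d",  # curly double quotes
    "\u2018": "\u2019",  # curly single quotes
    "\u00ab": "\u00bb",  # guillemets
}


def find_quoted_spans(text, pairs=QUOTE_PAIRS):
    """Return the text found between each opening character and its matching closer."""
    spans = []
    closer, start = None, None
    for i, ch in enumerate(text):
        if closer is None:
            # Outside a quote: only a known opening character starts one.
            if ch in pairs:
                closer, start = pairs[ch], i
        elif ch == closer:
            # Inside a quote: only the matching closer ends it.
            spans.append(text[start + 1 : i])
            closer = None
    return spans


curly = find_quoted_spans("She said \u201cit\u2019s fine\u201d and left.")
mixed = find_quoted_spans('He yelled "stop" and then \u00abgo\u00bb.')
mismatched = find_quoted_spans('\u201chello" world')
```

Because only the matching closer ends a quote, the apostrophe inside the curly-quoted sentence does not cut the quote short, and a curly opener followed by a straight quote yields no span at all.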