Releases · Unstructured-IO/unstructured

0.22.16

Enhancements

  • Formula markdown export (element_to_md / elements_to_md): New keyword-only option formula_markdown_style, one of "auto", "display_math", or "plain" (default "auto"). "auto" emits display math ($$ ... $$) only when the text looks like notation (by a heuristic score) and contains no $ or $$, which avoids breaking Markdown and wrapping noisy OCR captions. "display_math" wraps whenever it is safe to do so and still falls back to plain text if a $ would corrupt the fences. "plain" emits the text only. The optional normalize_formula (default True) maps common Unicode operators to LaTeX-like tokens; it is placed before the keyword-only options, so callers that pass encoding or no_group_by_page positionally are unaffected. Unicode is never mapped to \sqrt{}. Module constants: FORMULA_MARKDOWN_AUTO, FORMULA_MARKDOWN_DISPLAY_MATH, FORMULA_MARKDOWN_PLAIN.
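
The style selection described in this entry can be sketched roughly as follows. This is an illustration of the decision order only, not the library's implementation: render_formula, _looks_like_notation, the character set, the threshold, and the exact fence layout are all hypothetical. Only the three style strings and the fallback rules come from the release note.

```python
import re

# Style names taken from the release note; everything else here is hypothetical.
STYLE_AUTO = "auto"
STYLE_DISPLAY_MATH = "display_math"
STYLE_PLAIN = "plain"

# Assumed heuristic: count characters that commonly appear in notation.
_NOTATION_CHARS = re.compile(r"[=+\-*/^_<>∑∫√≤≥≠±×÷\\{}]")


def _looks_like_notation(text: str, threshold: int = 2) -> bool:
    """Hypothetical score: number of operator-like characters in the text."""
    return len(_NOTATION_CHARS.findall(text)) >= threshold


def render_formula(text: str, style: str = STYLE_AUTO) -> str:
    """Return Markdown for a formula's text under the given style."""
    if style not in (STYLE_AUTO, STYLE_DISPLAY_MATH, STYLE_PLAIN):
        raise ValueError(f"unknown style: {style!r}")
    text = text.strip()
    # A literal "$" would corrupt the $$ fences, so every style falls back to plain.
    if style == STYLE_PLAIN or "$" in text:
        return text
    # "display_math" wraps unconditionally; "auto" wraps only notation-like text.
    if style == STYLE_DISPLAY_MATH or _looks_like_notation(text):
        return f"$$\n{text}\n$$"
    return text
```

Under these assumptions, render_formula("E = mc^2") is wrapped in $$ fences, while an OCR caption such as "Figure 3 caption" or any text containing a $ is returned unchanged.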

0.22.15

Security

  • security: fix(deps): upgrade vulnerable transitive dependencies [security]

0.22.14

Enhancements

  • Deduplicate PDF rendering: Remove _render_pdf_pages and delegate to unstructured-inference's convert_pdf_to_image, which already renders pages lazily, one at a time. Peak memory for path_only=True drops from O(n_pages) to O(1 page), a 97% reduction on a 100-page PDF. Bumps the unstructured-inference dependency to >=1.6.2.
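
The memory difference between eager and lazy per-page rendering can be illustrated with a small, self-contained sketch. It does not call unstructured-inference; render_page is a stand-in for a real renderer, and every function name below is hypothetical.

```python
from typing import Iterator, List


def render_page(page_number: int) -> bytes:
    """Stand-in for an expensive page render; returns a fake 1 KiB image."""
    return bytes(1024)


def render_all_eager(n_pages: int) -> List[bytes]:
    # Holds every rendered page at once: peak memory grows with n_pages.
    return [render_page(i) for i in range(n_pages)]


def render_lazy(n_pages: int) -> Iterator[bytes]:
    # Yields one page at a time: only the current page needs to be alive.
    for i in range(n_pages):
        yield render_page(i)


def save_pages(n_pages: int) -> List[str]:
    """Consume pages lazily and keep only output paths (path_only style)."""
    paths = []
    for i, image in enumerate(render_lazy(n_pages), start=1):
        path = f"page_{i}.png"  # hypothetical naming; nothing is written to disk
        paths.append(path)
        del image  # drop the reference before the next page is rendered
    return paths
```

When only paths are returned, the generator version never needs more than one rendered page in memory, which is the O(1 page) behavior the entry describes.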

0.22.13

Enhancements

  • Speed up standardize_quotes: Replace loop-based character replacement with a single str.translate() call using a pre-computed translation table. Also fixes a pre-existing bug where left smart quotes were never normalized due to duplicate dictionary keys.

0.22.12

What's Changed

  • mem: exclude unused spaCy pipeline components to reduce model memory by @KRRT7 in #4296
  • fix: pdfminer drops extractable text by @qued in #4310

Full Changelog: 0.22.10...0.22.12

0.22.10

0.22.6

What's Changed

  • fix(deps): Update security updates [SECURITY] by @utic-renovate[bot] in #4303
  • fix: Self-contained script for version extraction in release CI by @vladimir-kivi-ds in #4304

Full Changelog: 0.22.4...0.22.6

0.22.4

What's Changed

New Contributors

Full Changelog: 0.21.5...0.22.4

0.21.5

0.21.2

fix: self-install pinned spaCy model at runtime with SHA256 verificat…
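
The release title above is truncated, so the actual install mechanism is not shown here. As a general illustration of SHA256 verification of a downloaded file, a sketch might look like the following; sha256_of and verify_download are hypothetical helpers, not functions from the library.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> None:
    """Raise ValueError if the file's digest differs from the pinned value."""
    actual = sha256_of(path)
    if actual != expected_sha256.lower():
        raise ValueError(
            f"SHA256 mismatch for {path.name}: expected {expected_sha256}, got {actual}"
        )
```

Pinning the expected digest alongside the pinned version means a tampered or corrupted download is rejected before it is installed.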

0.21.1

0.21.0

Fixes

  • Replace NLTK with spaCy to remediate CVE-2025-14009: NLTK's downloader uses zipfile.extractall() without path validation, enabling RCE via malicious packages (CVSS 10.0, no patch available). spaCy models install as pip packages, eliminating the vulnerable downloader entirely.

0.20.8