[GH-1147] Fix null byte crash in semantic memory ingestion by o-love · Pull Request #1165

[GH-1147] Fix null byte crash in semantic memory ingestion by o-love · Pull Request #1165 · MemMachine/MemMachine

Prevent asyncpg.CharacterNotInRepertoireError crashes caused by null bytes (\x00) in LLM-generated strings inserted into PostgreSQL TEXT columns.

Implements defense-in-depth sanitization at two layers:

Pydantic model layer (early catch): field_validator on SemanticCommand and LLMReducedFeature strips null bytes as soon as LLM output is parsed, before it reaches any storage code.
PostgreSQL storage layer (backstop): A shared sanitize_pg_text() utility is called in add_feature() and update_feature() to strip null bytes from tag, feature, and value fields before insertion — catching any path that bypasses models (e.g. REST API direct inserts).

The sanitize_pg_text utility logs a warning when it strips null bytes so data-quality issues are visible in monitoring.

Fixes #1147

Tests added:

TestSanitizePgText — 10 unit tests covering clean passthrough, null byte stripping (single, multiple, leading, trailing, all-null), warning logging, and no-log for clean strings.
TestSemanticCommandNullByteStripping — 4 tests verifying the Pydantic validator strips null bytes from feature, tag, and value fields on SemanticCommand.
TestLLMReducedFeatureNullByteStripping — 4 tests verifying the same for LLMReducedFeature.
test_add_feature_with_null_bytes_does_not_crash — Regression test confirming null bytes in feature values do not crash storage insertion.

Test Results: 1329 passed, 9 skipped, 0 failures. Ruff lint and format clean.

N/A

Not in scope: Fixing the LLM itself (upstream model behavior), per-feature error recovery in ingestion (separate follow-up), and Neo4j storage sanitization (handles null bytes differently).
The sanitize_pg_text utility is intentionally kept separate from the Pydantic validators so it can serve as a backstop for all PostgreSQL insertion paths, not just model-validated ones.