Define APIs and models for new episodic memory by edwinyyyu · Pull Request #1199 · MemMachine/MemMachine

Purpose of the change

Motivation

  • current DeclarativeMemory does not handle multimodal content
  • current DeclarativeMemory does not handle chunking
  • current DeclarativeMemory fans out into a large number of Neo4j queries
  • current DeclarativeMemory operations are not fully atomic
  • VectorGraphStore is difficult to implement using other databases
  • a previously defined VectorStore interface already moves toward a solution
  • tangential: current DeclarativeMemory may face problems with top-k redundancy from vector search

Description

Designed to scale better than the existing Neo4j implementation; a concrete implementation is still needed to verify this.

Changes:

  • define new extensible data models to support multimodal content and chunking
  • define new APIs to allow for more efficient and atomic operations

Approach to ingestion:

  • derivatives may or may not be deduplicated/consolidated
    • vector search hits a representative, which has multiple segments linked to it
    • some filters may be applied before and others after, depending on whether the filter changes the semantic meaning
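
The representative-to-segment linkage described above could be sketched as follows. This is only an illustration: `DerivativeIndex`, `link`, and `segments_for` are hypothetical names, and consolidating derivatives by exact normalized text is an assumption. The PR leaves open whether, and how, derivatives are deduplicated.

```python
from collections import defaultdict


class DerivativeIndex:
    """Hypothetical in-memory index: one representative derivative, many segments."""

    def __init__(self) -> None:
        # representative key -> ids of the segments linked to it
        self._segments: dict[str, set[str]] = defaultdict(set)

    @staticmethod
    def _key(text: str) -> str:
        # Assumption: derivatives with identical normalized text are consolidated.
        return " ".join(text.casefold().split())

    def link(self, derivative_text: str, segment_id: str) -> bool:
        """Link a segment to its representative.

        Returns True when a new representative was created, i.e. when a new
        vector would have to be embedded and stored.
        """
        key = self._key(derivative_text)
        is_new = key not in self._segments
        self._segments[key].add(segment_id)
        return is_new

    def segments_for(self, derivative_text: str) -> set[str]:
        """Return the segments reachable from a representative hit."""
        return set(self._segments.get(self._key(derivative_text), set()))
```

With this shape, a vector search hit on one representative fans out to every linked segment through a single lookup rather than one query per segment.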

Approach to search:

  • derivatives are no longer filterable
  • for low-cardinality filters, fetch all matching entries from the DB, then run a brute-force vector search over them
  • for high-cardinality filters, run a vector DB search, then post-filter the results
  • with no filters, run a plain vector DB search
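
A minimal sketch of the three search paths, assuming "cardinality" refers to the number of entries matching the filter. Everything here is hypothetical: the `Entry` model, the `brute_force_limit` and `overfetch` parameters, and the in-memory `top_k` helper that stands in for a real vector DB query.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Entry:
    id: str
    vector: list[float]
    properties: dict[str, str] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def matches(entry: Entry, filters: dict[str, str]) -> bool:
    return all(entry.properties.get(k) == v for k, v in filters.items())


def top_k(entries: list[Entry], query: list[float], k: int) -> list[Entry]:
    # Exact scan; stands in for both the brute-force path and the vector DB.
    return sorted(entries, key=lambda e: cosine(e.vector, query), reverse=True)[:k]


def search(
    entries: list[Entry],
    query: list[float],
    filters: dict[str, str],
    k: int,
    brute_force_limit: int = 1000,  # assumed threshold, not from the PR
    overfetch: int = 4,  # assumed over-fetch factor for post-filtering
) -> list[Entry]:
    if not filters:
        # No filters: plain vector search.
        return top_k(entries, query, k)
    # Stand-in for a cheap COUNT query against the primary DB.
    match_count = sum(1 for e in entries if matches(e, filters))
    if match_count <= brute_force_limit:
        # Few matches: fetch them all, then brute-force the vector search.
        candidates = [e for e in entries if matches(e, filters)]
        return top_k(candidates, query, k)
    # Many matches: vector search first, then post-filter the hits.
    hits = top_k(entries, query, k * overfetch)
    return [e for e in hits if matches(e, filters)][:k]
```

Note that the post-filter path can return fewer than `k` results when the over-fetched hits mostly fail the filter; a real implementation would need a policy for that case.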

Approach to deletion:

  • API designed with SQL in mind
  • use reference counting with active and purging states
    • the purging state is necessary as a lock while deleting from the external DB (vector DB)
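
The reference-counting scheme could look roughly like the sketch below, using an in-memory SQLite database. The `derivative` table, its columns, the function names, and the `vector_store` dict standing in for the external vector DB are all assumptions made for illustration, not the schema from this PR.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE derivative (
        id        TEXT PRIMARY KEY,
        ref_count INTEGER NOT NULL,
        state     TEXT NOT NULL CHECK (state IN ('active', 'purging'))
    )
    """
)
vector_store: dict[str, list[float]] = {}  # stand-in for the external vector DB


def add_reference(derivative_id: str, vector: list[float]) -> bool:
    """Create the row or bump its count; refuse while the row is purging."""
    with conn:
        cur = conn.execute(
            "UPDATE derivative SET ref_count = ref_count + 1 "
            "WHERE id = ? AND state = 'active'",
            (derivative_id,),
        )
        if cur.rowcount == 1:
            return True
        exists = conn.execute(
            "SELECT 1 FROM derivative WHERE id = ?", (derivative_id,)
        ).fetchone()
        if exists:
            return False  # purging acts as a lock; the caller retries later
        conn.execute(
            "INSERT INTO derivative VALUES (?, 1, 'active')", (derivative_id,)
        )
    vector_store[derivative_id] = vector
    return True


def release_reference(derivative_id: str) -> None:
    """Soft delete: only the count drops, the vector stays until collection."""
    with conn:
        conn.execute(
            "UPDATE derivative SET ref_count = ref_count - 1 "
            "WHERE id = ? AND ref_count > 0",
            (derivative_id,),
        )


def collect_garbage() -> list[str]:
    """Purge orphaned rows: lock them, delete vectors, then delete the rows."""
    with conn:
        conn.execute(
            "UPDATE derivative SET state = 'purging' "
            "WHERE ref_count = 0 AND state = 'active'"
        )
    purging = [
        row[0]
        for row in conn.execute("SELECT id FROM derivative WHERE state = 'purging'")
    ]
    for derivative_id in purging:
        vector_store.pop(derivative_id, None)  # external deletion happens here
    with conn:
        conn.execute("DELETE FROM derivative WHERE state = 'purging'")
    return purging
```

Because rows in the purging state reject new references, the external vector deletion cannot race with a concurrent ingest that would otherwise revive the same derivative.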

Alternatives considered:

  • return UUIDs of vectors to delete when vectors become orphaned: rejected because synchronous deletion does not scale well. Instead, we will implement a garbage collection job, and deletes will be soft until that job runs

Decisions to make:

  • whether this will replace existing memory or just be a new, better memory
  • naming

TODO:

  • implement the new memory (essentially the same logic as DeclarativeMemory, but should be better)
  • change content to a discriminated union or similar
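
For the discriminated-union TODO, one possible shape is sketched below using only the standard library. The `TextContent` and `ImageContent` classes, their fields, and `parse_content` are hypothetical and not taken from this PR. If the models are Pydantic-based, a discriminator field on the union would play the same role as the manual dispatch here.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass(frozen=True)
class TextContent:
    text: str
    type: Literal["text"] = "text"  # discriminator tag


@dataclass(frozen=True)
class ImageContent:
    uri: str
    mime_type: str
    type: Literal["image"] = "image"  # discriminator tag


# The union of all supported content variants.
Content = Union[TextContent, ImageContent]


def parse_content(data: dict) -> Content:
    """Pick the variant by its 'type' tag and build it from the other fields."""
    kind = data.get("type")
    fields = {k: v for k, v in data.items() if k != "type"}
    if kind == "text":
        return TextContent(**fields)
    if kind == "image":
        return ImageContent(**fields)
    raise ValueError(f"unknown content type: {kind!r}")
```

Adding a new modality then means adding one variant class and one dispatch branch, without touching existing variants.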

Type of change

Will eventually be a breaking change or require a new API.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (does not change functionality, e.g., code style improvements, linting)
  • Documentation update
  • Project Maintenance (updates to build scripts, CI, etc., that do not affect the main project)
  • Security (improves security without changing functionality)

How Has This Been Tested?

APIs and data models not tested.

  • Unit Test
  • Integration Test
  • End-to-end Test
  • Test Script (please provide)
  • Manual verification (list step-by-step instructions)

Checklist

  • I have signed the commit(s) within this pull request
  • My code follows the style guidelines of this project (See STYLE_GUIDE.md)
  • I have performed a self-review of my own code
  • I have commented my code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added unit tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings

Maintainer Checklist

  • Confirmed all checks passed
  • Contributor has signed the commit(s)
  • Reviewed the code
  • Run, Tested, and Verified the change(s) work as expected