# DistributedPrompt (for scaling up RLMs)
Horizontally scalable prompts for Recursive Language Models (RLMs).
`DistributedPrompt` is a drop-in replacement for Python's `str` that stores
prompt data across fixed-size shards on disk or S3. Slicing
(`prompt[n:n+k]`) performs an O(1) shard lookup and fetches only the shards
that cover the requested range. This lets RLMs operate on prompts of 100M+
characters without ever holding the full text in memory.
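As a concrete illustration of the shard lookup, here is a minimal sketch of the index arithmetic for fixed-size shards; `shards_for_range` is a hypothetical helper for exposition, not part of the library's API:

```python
def shards_for_range(start: int, stop: int, shard_size: int) -> range:
    """Which shard indices cover the half-open character range [start, stop)?"""
    if stop <= start:
        return range(0)  # empty slice touches no shards
    first = start // shard_size
    last = (stop - 1) // shard_size
    return range(first, last + 1)

# A 1,000-char slice straddling a 10MB shard boundary touches exactly two shards:
print(list(shards_for_range(9_999_500, 10_000_500, 10_000_000)))  # [0, 1]
```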
## Quick start
```bash
uv sync --all-groups    # installs Python 3.12, dev & test deps
uv tool install -e .    # installs the dprompt CLI on your PATH
```
## Ingest a file
```bash
uv run dprompt ingest ../mega_prompt.md --output ./shards/ --shard-size 10_000_000
uv run dprompt info ./shards/
```
## Use from Python / REPL
```python
from distributed_prompt import DistributedPrompt, FileBackend

backend = FileBackend("./shards/")
prompt = DistributedPrompt(backend)

# O(1) shard lookup: fetches only the shard(s) covering this range
print(prompt[10_000_000:10_000_000 + 1000])

print(len(prompt))   # no I/O; length comes from metadata
print(repr(prompt))  # DistributedPrompt(length=..., shards=..., shard_size=...)

# str-like protocol
"keyword" in prompt    # scans shard-by-shard with overlap
prompt.find("needle")  # returns a character offset, or -1 if absent
```
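The cross-shard scan behind `in` and `find` can be pictured like this; a minimal sketch assuming fixed-size shards and a hypothetical `read_shard(i)` accessor, not the library's actual internals:

```python
def sharded_find(read_shard, num_shards: int, shard_size: int, needle: str) -> int:
    """Scan shards in order, carrying an overlap so boundary-spanning matches are found."""
    overlap = max(len(needle) - 1, 0)
    carry = ""  # tail of the previous shard, so matches across boundaries aren't missed
    for i in range(num_shards):
        chunk = carry + read_shard(i)
        pos = chunk.find(needle)
        if pos != -1:
            return i * shard_size - len(carry) + pos  # convert to a global offset
        carry = chunk[-overlap:] if overlap else ""
    return -1

# Toy demo: "efg" spans the boundary between shards 0 and 1
shards = ["abcde", "fghij", "klmno"]
print(sharded_find(lambda i: shards[i], 3, 5, "efg"))  # 4
```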
## S3 / MinIO backend (WIP)
```python
from distributed_prompt import DistributedPrompt
from distributed_prompt.backends.s3_backend import S3Backend, ingest_to_s3

# Upload local shards to MinIO
ingest_to_s3("./shards/", bucket="prompts", prefix="corpus-v1",
             endpoint_url="http://minio:9000")

# Read from S3
backend = S3Backend(bucket="prompts", prefix="corpus-v1",
                    endpoint_url="http://minio:9000")
prompt = DistributedPrompt(backend)
print(prompt[0:100])
```
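If the S3 backend is built on boto3 (an assumption; this backend is still WIP), credentials resolve the usual way: the `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables or the standard AWS config files, with `endpoint_url` pointing at MinIO instead of AWS.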
## Integration with RLMs
In the Recursive Language Models paper (Zhang & Khattab, 2025), the agent loop
evaluates Python in a REPL where `prompt` is a plain string. Replace it with:
```python
# Before (fails for large prompts):
prompt = open("huge_file.txt").read()

# After:
from distributed_prompt import DistributedPrompt, FileBackend
prompt = DistributedPrompt(FileBackend("./shards/"))

prompt[n:n+k]  # works identically, but with an O(1) shard fetch
```
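To make the swap concrete, here is a minimal sketch of one RLM-style REPL step with `prompt` bound to a `DistributedPrompt`; the `exec`-based loop is illustrative, not the paper's implementation:

```python
from distributed_prompt import DistributedPrompt, FileBackend

env = {"prompt": DistributedPrompt(FileBackend("./shards/"))}

# Stand-in for code the LM generated during one agent step:
generated_code = "chunk = prompt[5_000_000:5_002_000]\nprint(len(chunk))"

exec(generated_code, env)  # the slice fetches only the shards it touches
```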
The RLM's generated code can slice `prompt` as usual; the `DistributedPrompt`
object transparently fetches only the shards the slice touches. Since LM
inference latency (seconds per step) dominates shard-fetch latency
(milliseconds), the indirection is effectively free.
## Running tests
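Assuming the standard pytest layout implied by the test dependency group installed above (the exact invocation may differ):

```bash
uv run pytest
```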
## References
- Zhang, Alex and Khattab, Omar (2025). "Recursive Language Models." https://alexzhang13.github.io/blog/2025/rlm/