mem: disable ONNX mem_pattern and cpu_mem_arena on inference sessions by KRRT7 · Pull Request #495 · Unstructured-IO/unstructured-inference

Set enable_mem_pattern=False and enable_cpu_mem_arena=False on
SessionOptions for both YoloX and Detectron2 ONNX sessions.

These flags control pre-allocation strategies that trade memory for
speed on repeated inference. With both disabled, peak memory drops
~36% (553→351 MB) on the YoloX model with negligible latency impact.
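
A minimal sketch of the change described above, not the PR's actual code. `enable_mem_pattern` and `enable_cpu_mem_arena` are real attributes of `onnxruntime.SessionOptions`; the helper name `disable_memory_preallocation` and the `model.onnx` path are hypothetical. A stand-in object is used so the sketch runs without onnxruntime installed.

```python
from types import SimpleNamespace


def disable_memory_preallocation(sess_options):
    """Turn off both pre-allocation features on a SessionOptions-like object.

    Hypothetical helper for illustration; the PR may set these inline.
    """
    sess_options.enable_mem_pattern = False
    sess_options.enable_cpu_mem_arena = False
    return sess_options


# With onnxruntime installed, usage would look roughly like:
#   import onnxruntime as ort
#   opts = disable_memory_preallocation(ort.SessionOptions())
#   session = ort.InferenceSession(
#       "model.onnx", sess_options=opts, providers=["CPUExecutionProvider"]
#   )
# Stand-in object so this sketch has no onnxruntime dependency.
opts = disable_memory_preallocation(SimpleNamespace())
```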

Replace the unconditional disabling of enable_mem_pattern and
enable_cpu_mem_arena with an opt-in via ONNX_DISABLE_MEMORY_ARENA=1.

Default behavior is unchanged (arena enabled, ~15% faster inference).
Setting the env var disables both options, saving ~209 MB idle RSS
per session at a ~15% latency cost.

The env var is read once at module init time, so it must be set
before the module is imported.