mem: disable ONNX mem_pattern and cpu_mem_arena on inference sessions by KRRT7 · Pull Request #495 · Unstructured-IO/unstructured-inference
Set enable_mem_pattern=False and enable_cpu_mem_arena=False on SessionOptions for both YoloX and Detectron2 ONNX sessions. These flags control pre-allocation strategies that trade memory for speed on repeated inference. With both disabled, peak memory drops ~36% (553→351 MB) on the YoloX model with negligible latency impact.
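A minimal sketch of what this change might look like in code, not the actual diff from the PR. The helper names `disable_preallocation` and `create_session` are hypothetical; `enable_mem_pattern`, `enable_cpu_mem_arena`, `SessionOptions`, and `InferenceSession` are the onnxruntime names the description refers to. The CPU execution provider is an assumption.

```python
def disable_preallocation(sess_options):
    """Turn off both pre-allocation strategies on a SessionOptions-like object.

    Hypothetical helper: it only sets the two attributes named in the
    description above and returns the same object.
    """
    sess_options.enable_mem_pattern = False
    sess_options.enable_cpu_mem_arena = False
    return sess_options


def create_session(model_path, providers=("CPUExecutionProvider",)):
    """Build an inference session with both flags disabled.

    onnxruntime is imported lazily so the helper above can be used
    (and tested) without the package installed.
    """
    import onnxruntime as ort

    opts = disable_preallocation(ort.SessionOptions())
    return ort.InferenceSession(
        model_path, sess_options=opts, providers=list(providers)
    )
```

Keeping the flag changes in one small helper would let the YoloX and Detectron2 code paths share the same settings.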
Replace the unconditional disabling of enable_mem_pattern and enable_cpu_mem_arena with an opt-in via ONNX_DISABLE_MEMORY_ARENA=1. Default behavior is unchanged (arena enabled, ~15% faster inference). Setting the environment variable disables both options, saving ~209 MB of idle RSS per session at a ~15% latency cost. The variable is read once, at module initialization time.
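The opt-in could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: the names `arena_disabled`, `_DISABLE_ARENA`, and `configure` are hypothetical, and treating only the exact value "1" as enabling the opt-in is an assumption based on the description.

```python
import os

ENV_VAR = "ONNX_DISABLE_MEMORY_ARENA"


def arena_disabled(environ=os.environ):
    """Return True only when the variable is set to "1" (assumed parsing rule)."""
    return environ.get(ENV_VAR) == "1"


# Evaluated once when the module is imported; later changes to the
# environment do not affect sessions configured with the default.
_DISABLE_ARENA = arena_disabled()


def configure(sess_options, disable=None):
    """Apply the opt-in to a SessionOptions-like object and return it.

    With disable=None the import-time value is used. When the opt-in is
    off, the options are left untouched, so library defaults apply.
    """
    if disable is None:
        disable = _DISABLE_ARENA
    if disable:
        sess_options.enable_mem_pattern = False
        sess_options.enable_cpu_mem_arena = False
    return sess_options
```

Because the variable is read at import time, it has to be set before the module is first imported, for example by exporting ONNX_DISABLE_MEMORY_ARENA=1 in the shell that launches the process.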