[TensorRT] Fix DDS output bug during engine update by toothache · Pull Request #26272 · microsoft/onnxruntime
apsonawane pushed a commit that referenced this pull request on Oct 17, 2025

### Description
Fix a bug in the TensorRT (TRT) Execution Provider where a DDS (data-dependent shape) output tensor was not bound after an engine update.

### Motivation and Context
The `dds_output_allocator_map` is not cleared on engine update, so the output is mis-recognized as an already-known DDS output and the output allocation is never bound.
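As a plain-Python analogy of the failure mode (class and method names here are hypothetical, not the EP's actual code), the problem is a per-engine cache that survives an engine rebuild:

```python
class EngineStateSketch:
    """Hypothetical analogue of the EP's per-engine DDS bookkeeping."""

    def __init__(self):
        # maps output name -> allocator; populated lazily for DDS outputs
        self.dds_output_allocator_map = {}

    def bind_output(self, name):
        if name in self.dds_output_allocator_map:
            # treated as a known DDS output: address binding is skipped
            return "skip-bind"
        self.dds_output_allocator_map[name] = object()
        return "bound"

    def update_engine(self, clear_dds_map):
        # The fix: clear the map when the engine is rebuilt. Without this,
        # stale entries make bind_output() skip binding for the new engine.
        if clear_dds_map:
            self.dds_output_allocator_map.clear()


buggy = EngineStateSketch()
buggy.bind_output("output")
buggy.update_engine(clear_dds_map=False)
print(buggy.bind_output("output"))  # stale entry -> "skip-bind" (the bug)

fixed = EngineStateSketch()
fixed.bind_output("output")
fixed.update_engine(clear_dds_map=True)
print(fixed.bind_output("output"))  # map cleared -> "bound" again
```

The new engine has no allocator set for the output, so skipping the bind leaves the tensor unaddressed, which is exactly what the TensorRT error below reports.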
Script to reproduce the issue:
```python
# create an onnx model with:
# data -> NonZero -> Transpose -> GatherND -> output
# then run the model with onnxruntime
def create_model():
    import onnx
    from onnx import helper, TensorProto

    input = helper.make_tensor_value_info("data", TensorProto.FLOAT, ["d1", "d2"])
    output = helper.make_tensor_value_info("output", TensorProto.FLOAT, ["nzr"])
    nonzeros_node = helper.make_node("NonZero", ["data"], ["nonzeros"], "nonzeros_node")
    transpose_node = helper.make_node(
        "Transpose", ["nonzeros"], ["nonzeros_t"], "transpose_node"
    )
    gathernd_node = helper.make_node(
        "GatherND", ["data", "nonzeros_t"], ["output"], "gathernd_node"
    )
    value_info = [
        helper.make_tensor_value_info("nonzeros", TensorProto.INT64, [2, "nzr"]),
        helper.make_tensor_value_info("nonzeros_t", TensorProto.INT64, ["nzr", 2]),
    ]
    graph = helper.make_graph(
        [nonzeros_node, transpose_node, gathernd_node],
        "test_graph",
        [input],
        [output],
        value_info=value_info,
    )
    model = helper.make_model(graph)
    onnx.save(model, "model_dds.onnx")


def run_model():
    import onnxruntime as ort
    import numpy as np

    sess = ort.InferenceSession(
        "model_dds.onnx",
        providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
    )
    # The second run uses a different input shape, which triggers an engine update.
    print("Running with data shape (3,4)")
    data = np.random.randn(3, 4).astype(np.float32)
    sess.run(None, {"data": data})
    print("Running with data shape (5,6)")
    data = np.random.randn(5, 6).astype(np.float32)
    sess.run(None, {"data": data})


create_model()
run_model()
```
Before the change:
> IExecutionContext::enqueueV3: Error Code 3: API Usage Error (Parameter
check failed, condition:
mContext.profileObliviousBindings.at(profileObliviousIndex) ||
getPtrOrNull(mOutputAllocators, profileObliviousIndex). Neither address
or allocator is set for output tensor scores. Call
setOutputTensorAddress, setTensorAddress or setOutputAllocator before
enqueue/execute.) ... Status Message: TensorRT EP execution context
enqueue failed.
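For context, the repro's output has a data-dependent shape (DDS): the number of elements GatherND produces equals the number of nonzeros, which depends on the input values rather than the input shape alone. A small NumPy sketch of the same index computation:

```python
import numpy as np

data = np.array([[1.0, 0.0, 3.0],
                 [0.0, 5.0, 0.0]], dtype=np.float32)

# NonZero: a (2, nzr) matrix of indices, one column per nonzero element
nonzeros = np.stack(np.nonzero(data))                 # shape (2, 3) for this data
nonzeros_t = nonzeros.T                               # (nzr, 2), as after Transpose
gathered = data[nonzeros_t[:, 0], nonzeros_t[:, 1]]   # GatherND equivalent

print(gathered)  # [1. 3. 5.] -- length depends on the data values
```

Because `nzr` cannot be known before execution, the EP must attach an output allocator rather than a fixed output address, which is why a stale `dds_output_allocator_map` entry leads straight to the enqueue failure above.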
apsonawane pushed a commit that referenced this pull request on Oct 20, 2025 (same description as above).
apsonawane added a commit that referenced this pull request on Oct 21, 2025

Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:
- [TensorRT] Fix DDS output bug during engine update - PR: #26272 - commit id: 00e85dd
- Fix shape inference failure with in-memory external data - PR: #26263 - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible - PR: #26230 - commit id: b58911f
- [QNN-EP] Fix logic flow bug - PR: #26148 - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt - PR: #26103 - commit id: 7362518
- Update qMoE spec to support block quantization - PR: #25641 - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string - PR: #25602 - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest - PR: #26315 - commit id: ea69c4d

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>
fs-eire pushed a commit that referenced this pull request on Oct 24, 2025 (same description as above).
naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request on Nov 2, 2025 (same description as above).