[TensorRT] Fix DDS output bug during engine update by toothache · Pull Request #26272 · microsoft/onnxruntime

apsonawane pushed a commit that referenced this pull request

Oct 17, 2025
### Description
Fix a bug in the TRT Execution Provider where the DDS output tensor was
not bound after an engine update.


### Motivation and Context
The `dds_output_allocator_map` is not cleared on engine update, so the
output is mis-recognized as an already-known DDS output and the output
allocation is never bound to the new execution context.
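
Abstractly, the failure mode is a stale cache: `dds_output_allocator_map` outlives the engine it was built for, so after a rebuild the output is assumed to be already handled and never gets an allocator bound. A minimal sketch of the pattern (hypothetical names, not ORT's actual code):

```python
class EngineContext:
    """Stand-in for a freshly created TensorRT execution context."""
    def __init__(self):
        self.output_bound = False

def bind_and_run(ctx, name, dds_map):
    if name not in dds_map:
        # First time this output is seen as DDS: create an allocator
        # and bind it to the context (think setOutputAllocator).
        dds_map[name] = object()
        ctx.output_bound = True
    # else: treated as a known DDS output, so binding is skipped --
    # correct for the same engine, wrong after an engine update.
    if not ctx.output_bound:
        raise RuntimeError("Neither address nor allocator is set")

dds_map = {}
first_ctx = EngineContext()
bind_and_run(first_ctx, "output", dds_map)    # binds fine

new_ctx = EngineContext()                     # engine updated for new shapes
try:
    bind_and_run(new_ctx, "output", dds_map)  # stale entry skips binding
except RuntimeError as e:
    print("reproduces the enqueueV3 error:", e)

dds_map.clear()                               # the fix: clear on engine update
bind_and_run(new_ctx, "output", dds_map)      # binds again
```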

Script to reproduce the issue:
```python
# create an onnx model with:
# data -> NonZero -> Transpose -> GatherND -> output
# then run the model with onnxruntime

def create_model():
    import onnx
    from onnx import helper, TensorProto

    input = helper.make_tensor_value_info("data", TensorProto.FLOAT, ["d1", "d2"])
    output = helper.make_tensor_value_info("output", TensorProto.FLOAT, ["nzr"])

    nonzeros_node = helper.make_node("NonZero", ["data"], ["nonzeros"], "nonzeros_node")
    transpose_node = helper.make_node(
        "Transpose", ["nonzeros"], ["nonzeros_t"], "transpose_node"
    )
    gathernd_node = helper.make_node(
        "GatherND", ["data", "nonzeros_t"], ["output"], "gathernd_node"
    )

    value_info = [
        helper.make_tensor_value_info("nonzeros", TensorProto.INT64, [2, "nzr"]),
        helper.make_tensor_value_info("nonzeros_t", TensorProto.INT64, ["nzr", 2]),
    ]

    graph = helper.make_graph(
        [nonzeros_node, transpose_node, gathernd_node],
        "test_graph",
        [input],
        [output],
        value_info=value_info,
    )

    model = helper.make_model(graph)
    onnx.save(model, "model_dds.onnx")


def run_model():
    import onnxruntime as ort
    import numpy as np

    sess = ort.InferenceSession("model_dds.onnx", providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"])

    print("Running with data shape (3,4)")
    data = np.random.randn(3, 4).astype(np.float32)
    sess.run(None, {"data": data})

    print("Running with data shape (5,6)")
    data = np.random.randn(5, 6).astype(np.float32)
    sess.run(None, {"data": data})


create_model()
run_model()
```
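
The repro exercises a data-dependent shape (DDS): the output length equals the number of nonzero elements, which is only known at enqueue time. A NumPy equivalent of the graph, for reference only:

```python
import numpy as np

def reference(data):
    # NonZero: indices of nonzero elements, shape (2, nzr) for a 2-D input
    nonzeros = np.stack(np.nonzero(data)).astype(np.int64)
    # Transpose: (nzr, 2) index tuples, the layout GatherND expects
    nonzeros_t = nonzeros.T
    # GatherND: each length-2 index tuple selects one scalar from data
    return data[nonzeros_t[:, 0], nonzeros_t[:, 1]]

data = np.array([[1.0, 0.0], [0.0, 3.0]], dtype=np.float32)
print(reference(data))  # two nonzeros -> output of length 2
```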

Before the change:
> IExecutionContext::enqueueV3: Error Code 3: API Usage Error (Parameter
> check failed, condition:
> mContext.profileObliviousBindings.at(profileObliviousIndex) ||
> getPtrOrNull(mOutputAllocators, profileObliviousIndex). Neither address
> or allocator is set for output tensor scores. Call
> setOutputTensorAddress, setTensorAddress or setOutputAllocator before
> enqueue/execute.) ... Status Message: TensorRT EP execution context
> enqueue failed.

apsonawane pushed a commit that referenced this pull request

Oct 20, 2025
(Same PR description as above.)

apsonawane added a commit that referenced this pull request

Oct 21, 2025
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>

fs-eire pushed a commit that referenced this pull request

Oct 24, 2025
(Same PR description as above.)

naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request

Nov 2, 2025
(Same PR description as above.)