ORT 1.23.1 cherrypick 2 by adrianlizarraga · Pull Request #26182 · microsoft/onnxruntime

and others added 10 commits

September 26, 2025 14:11
…25590)

### Description

Use the session id to correlate these events with `LogSessionCreation`.

If `Run` is called from different threads, the calls can be told apart
by thread id, because `Run` is not async.
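
The idea can be sketched in a few lines of Python. This is only an illustration, not the ORT telemetry code: `log_event`, `run`, and `pair_up` are hypothetical names, and the event names simply echo `LogEvaluationStart`/`LogEvaluationStop`. It assumes, as stated above, that each run is synchronous, so its start and stop events are emitted on the same thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

events = []                      # collected (event, session_id, thread_id) records
events_lock = threading.Lock()

def log_event(name, session_id):
    # Tag every record with the session id and the calling thread's id.
    with events_lock:
        events.append((name, session_id, threading.get_ident()))

def run(session_id):
    # Hypothetical stand-in for a synchronous Run call:
    # start and stop are logged on the same thread.
    log_event("EvaluationStart", session_id)
    log_event("EvaluationStop", session_id)

def pair_up(records):
    # Group records by (session_id, thread_id). Because each run is
    # synchronous, every group alternates start, stop, start, stop, ...
    grouped = {}
    for name, session_id, thread_id in records:
        grouped.setdefault((session_id, thread_id), []).append(name)
    return grouped

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(run, [1, 1, 2, 2]))

grouped = pair_up(events)
```

Keying on the pair (session id, thread id) is what makes the start and stop events of concurrent runs unambiguous; the session id alone would interleave them.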


---------

Co-authored-by: hualxie <hualxie@microsoft.com>
### Description

Fix the WebAssembly build on macOS/arm64 by no longer appending
"-Donnxruntime_USE_KLEIDIAI=ON" to the cmake_args.

KleidiAI should not be enabled for WebAssembly build.
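
A minimal sketch of the kind of guard this describes, assuming the flag is otherwise appended for native macOS/arm64 builds. The helper `kleidiai_cmake_args` and its parameters are hypothetical and are not taken from the actual build script; only the flag string comes from the description above.

```python
def kleidiai_cmake_args(system, machine, build_wasm):
    """Return extra CMake args for KleidiAI (hypothetical helper, not the real build script)."""
    args = []
    is_macos_arm64 = system == "Darwin" and machine == "arm64"
    # Assumption: the flag is wanted for native macOS/arm64 builds only.
    # A WebAssembly build on the same host must not receive it.
    if is_macos_arm64 and not build_wasm:
        args.append("-Donnxruntime_USE_KLEIDIAI=ON")
    return args

native_args = kleidiai_cmake_args("Darwin", "arm64", build_wasm=False)
wasm_args = kleidiai_cmake_args("Darwin", "arm64", build_wasm=True)
```

The point of the guard is that the host architecture alone does not decide the flag; the build target does.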
CPU MoE Kernel
```
name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 16, max_diff: 2.682209014892578e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 1, seq_len: 32, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 16, max_diff: 2.980232238769531e-07
.name: SwigluMoEBlock, quant_bits: 0, dtype: FP32, batch: 2, seq_len: 32, max_diff: 4.172325134277344e-07
.MoE CPU kernel time: 15.721677541732786 ms
.
----------------------------------------------------------------------
Ran 5 tests in 30.217s
```
This PR adds a block-wise quantization kernel for QMoE on CPU.
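
For readers unfamiliar with the term, block-wise quantization stores one scale per fixed-size block of weights instead of one scale per row or tensor. The sketch below is a generic illustration in plain Python, not the ORT kernel: the symmetric scheme, the 4-bit default, and the names `quantize_blockwise` and `dequantize_blockwise` are all assumptions made for the example.

```python
def quantize_blockwise(row, block_size, bits=4):
    """Symmetric block-wise quantization: one scale per block of `block_size` values."""
    qmax = 2 ** (bits - 1) - 1           # 7 for 4-bit signed values
    q, scales = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer level and clamp to the signed range.
        q.extend(max(-qmax - 1, min(qmax, round(v / scale))) for v in block)
    return q, scales

def dequantize_blockwise(q, scales, block_size):
    # Each value is rescaled with the scale of the block it belongs to.
    return [v * scales[i // block_size] for i, v in enumerate(q)]

# The first block holds small weights, the second block large ones.
weights = [0.02, -0.01, 0.03, 0.00, 4.0, -2.0, 1.0, 3.0]
q, scales = quantize_blockwise(weights, block_size=4)
restored = dequantize_blockwise(q, scales, block_size=4)
max_diff = max(abs(a - b) for a, b in zip(weights, restored))
```

Because each block has its own scale, the small weights in the first block keep a rounding error proportional to their own magnitude rather than to the largest weight in the row.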
This pull request adds new APIs and updates existing ones to improve
memory and device information handling in the ONNX Runtime C# bindings.
The most significant changes introduce methods for fetching memory info
and device info for session inputs/outputs, and add support for shared
allocators and synchronization streams. There are also several updates
and renamings for LoraAdapter delegates and related APIs.

### Memory and Device Info APIs

* Added `GetMemoryInfosForInputs`, `GetMemoryInfosForOutputs`, and
`GetEpDeviceForInputs` methods to `InferenceSession.shared.cs` to fetch
memory info and device info for session inputs/outputs. These methods
utilize new native delegates for retrieving memory and device
information.
* Introduced native delegates in `NativeMethods.shared.cs` for
`OrtSessionGetMemoryInfoForInputs`, `OrtSessionGetMemoryInfoForOutputs`,
and `OrtSessionGetEpDeviceForInputs`, and wired them up in the static
constructor.

### Shared Allocator and Synchronization Stream Support

* Added delegates and static fields for creating, getting, and releasing
shared allocators, as well as for creating and managing synchronization
streams (`OrtCreateSharedAllocator`, `OrtGetSharedAllocator`,
`OrtReleaseSharedAllocator`, `OrtCreateSyncStreamForEpDevice`,
`OrtSyncStream_GetHandle`, `OrtReleaseSyncStream`).
* Added delegate for copying tensors (`OrtCopyTensors`).

### LoraAdapter API Updates

* Renamed LoraAdapter-related delegates to use the `Ort` prefix
(`OrtCreateLoraAdapter`, `OrtCreateLoraAdapterFromArray`,
`OrtReleaseLoraAdapter`) and updated their usage throughout the
codebase.

### MemoryInfo Enhancements

* Added new delegates for creating memory info with more parameters
(`OrtCreateMemoryInfoV2`), and for querying device memory type and
vendor ID (`OrtMemoryInfoGetDeviceMemType`, `OrtMemoryInfoGetVendorId`).

### Minor API Documentation Update

* Clarified the lifetime of allocators in the documentation, noting they
can be explicitly unregistered.
### Description
- Regenerates the `input_propagate_to_output.onnx` model used in [this
unit
test](https://github.com/microsoft/onnxruntime/blob/35dcab5088118117acc6086c9b6dd6dd92c7060f/onnxruntime/test/shared_lib/test_inference.cc#L497-L506)
so that it uses an ONNX IR version compatible with ONNX 1.18.0 (i.e., IR
version < 12).
- Adds script `input_propagate_to_output.py` that can be used to
regenerate the `input_propagate_to_output.onnx` model.
- Embeds missing weight values that are needed to run the existing
`test_dangling_input_segment_ids.py` script.



### Motivation and Context
The main branch is using ONNX 1.19. However, this unit test also needs
to pass in the `rel-1.23.1` branch, which is still using ONNX 1.18.0.
So, by downgrading the model's IR version, the unit test can run in both
branches.

See original PR that added the test models:
#26021
This PR fixes a few unused variables that caused compilation errors.
…n assigned (#26156)

### Description
Fixes segfault in `PluginExecutionProvider::GetCapability()` when the
underlying `OrtEp` tries to claim nodes that have already been assigned
to another EP.


### Motivation and Context
Should log a warning (instead of crashing or throwing an exception) when
a plugin EP tries to claim a node that is already assigned to another
EP.
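
The intended behavior can be sketched as follows. This is a simplified Python illustration, not the C++ implementation in `PluginExecutionProvider::GetCapability()`: the function `filter_claimed_nodes`, the dictionary of assignments, and the node and EP names are all hypothetical.

```python
import logging

logger = logging.getLogger("ep_assignment_sketch")

def filter_claimed_nodes(ep_name, claimed, assigned):
    """Keep only nodes that are not yet assigned; warn (do not raise) for the rest."""
    accepted = []
    for node in claimed:
        owner = assigned.get(node)
        if owner is not None:
            # Already owned by another EP: log a warning and skip the node
            # instead of crashing or throwing.
            logger.warning(
                "EP %s claimed node %s, which is already assigned to %s; ignoring",
                ep_name, node, owner,
            )
            continue
        accepted.append(node)
    return accepted

assigned = {"Conv_0": "ep_a"}    # hypothetical existing assignment
accepted = filter_claimed_nodes("ep_b", ["Conv_0", "Relu_1"], assigned)
```

Skipping the conflicting node keeps the rest of the claim usable, so one bad claim does not abort partitioning for the whole graph.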

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
### Description
Add a new EP Dynamic option to set HTP performance mode after session creation.

---------

Co-authored-by: quic-ashwshan <quic_ashwshan@quicinc.com>

yuslepukhin

snnn

snnn approved these changes Sep 26, 2025

HectorSVC

apsonawane

@snnn snnn deleted the adrianl/rel-1.23.1-cherrypick-2 branch

September 27, 2025 03:28

TedThemistokleous added a commit to ROCm/onnxruntime that referenced this pull request

Oct 17, 2025
* ORT 1.23.1 cherrypick 1 [REDO] (microsoft#26140)

### Description
Cherry-pick the following PRs into the ORT 1.23.1 branch:

- Fix Attention GQA implementation on CPU
  - **MANUAL MERGE**: see microsoft#26057
  - main merge date: Sept 15, 11:33am
  - pr: microsoft#25966
  - commit: d530b29
- Address edge GetMemInfo edge cases
  - main merge date: Sept 16, 10:32am
  - pr: microsoft#26021
  - commit: d251f3a
- Implement new Python APIs
  - main merge date: Sept 17, 11:44am
  - pr: microsoft#25999
  - commit: abc63e8
- MemcpyFromHost and MemcpyToHost support for plugin EPs
  - **MERGE CONFLICT** on file
    onnxruntime/test/optimizer/transpose_optimizer_test.cc. Conflicts with
    microsoft#25689
  - main merge date: Sept 23, 10:42am
  - pr: microsoft#26088
  - commit: 4545732
- [TRT RTX EP] Fix bug for generating the correct subgraph in
GetCapability microsoft#26132
  - main merge date: Sept 23, 8:54pm
  - pr: microsoft#26132
  - commit: 72e56e7



---------

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>

* ORT 1.23.1 cherrypick 2 (microsoft#26182)

### Description
Adds the following commits to the `rel-1.23.1` branch for ORT 1.23.1:


- add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart
  - main merge date: July 31, 1:05am
  - pr: microsoft#25590
  - commit: e753643
- [build] fix WebAssembly build on macOS/arm64
  - main merge date: Aug 5, 8:07am
  - pr: microsoft#25653
  - commit: 53f152b
- [CPU] MoE Kernel (microsoft#25958)
  - main merge date: Sept 10, 4:54pm
  - pr: microsoft#25958
  - commit: 930e640
- [CPU] Block-wise QMoE kernel for CPU
  - main merge date: Sept 15, 8:32am
  - pr: microsoft#26009
  - commit: 5d17734
- [C#] Implement missing APIs
  - main merge date: Sept 24, 10:50am
  - pr: microsoft#26101
  - commit: 35dcab5
- Regenerate test model with ONNX IR < 12
  - main merge date: Sept 24, 2:50pm
  - pr: microsoft#26149
  - commit: 88f2652
- [CPU] Fix compilation errors because of unused variables
  - main merge date: Sept 25, 1:21pm
  - pr: microsoft#26147
  - commit: 42fcd71
- [EP ABI] Check if nodes specified in GetCapability() have already been
assigned
  - main merge date: Sept 26, 1:24am
  - pr: microsoft#26156
  - commit: 67d3ba0
- [QNN EP] Add dynamic option to set HTP performance mode
  - main merge date: Sept 26, 11:55am
  - pr: microsoft#26135
  - commit: 6cc40fd

---------

Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: quic-tirupath <quic_tirupath@quicinc.com>
Co-authored-by: quic-ashwshan <quic_ashwshan@quicinc.com>

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: quic-tirupath <quic_tirupath@quicinc.com>
Co-authored-by: quic-ashwshan <quic_ashwshan@quicinc.com>