[CPU] Block-wise QMoE kernel for CPU by apsonawane · Pull Request #26009 · microsoft/onnxruntime

github-advanced-security[bot] found potential problems Sep 10, 2025

@apsonawane

tianleiwu

snnn pushed a commit that referenced this pull request

Sep 15, 2025
This PR adds a block-wise quantization kernel for QMoE on CPU.

snnn mentioned this pull request

Sep 15, 2025

adrianlizarraga pushed a commit that referenced this pull request

Sep 24, 2025
This PR adds a block-wise quantization kernel for QMoE on CPU.

adrianlizarraga pushed a commit that referenced this pull request

Sep 26, 2025
This PR adds a block-wise quantization kernel for QMoE on CPU.

snnn pushed a commit that referenced this pull request

Sep 27, 2025
### Description
Adds the following commits to the `rel-1.23.1` branch for ORT 1.23.1:


- add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart
  - main merge date: July 31, 1:05am
  - pr: #25590
  - commit: e753643
- [build] fix WebAssembly build on macOS/arm64
  - main merge date: Aug 5, 8:07am
  - pr: #25653
  - commit: 53f152b
- [CPU] MoE Kernel (#25958)
  - main merge date: Sept 10, 4:54pm
  - pr: #25958
  - commit: 930e640
- [CPU] Block-wise QMoE kernel for CPU
  - main merge date: Sept 15, 8:32am
  - pr: #26009
  - commit: 5d17734
- [C#] Implement missing APIs
  - main merge date: Sept 24, 10:50am
  - pr: #26101
  - commit: 35dcab5
- Regenerate test model with ONNX IR < 12
  - main merge date: Sept 24, 2:50pm
  - pr: #26149
  - commit: 88f2652
- [CPU] Fix compilation errors because of unused variables
  - main merge date: Sept 25, 1:21pm
  - pr: #26147
  - commit: 42fcd71
- [EP ABI] Check if nodes specified in GetCapability() have already been assigned
  - main merge date: Sept 26, 1:24am
  - pr: #26156
  - commit: 67d3ba0
- [QNN EP] Add dynamic option to set HTP performance mode
  - main merge date: Sept 26, 11:55am
  - pr: #26135
  - commit: 6cc40fd

---------

Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: quic-tirupath <quic_tirupath@quicinc.com>
Co-authored-by: quic-ashwshan <quic_ashwshan@quicinc.com>

TedThemistokleous added a commit to ROCm/onnxruntime that referenced this pull request

Oct 17, 2025
* ORT 1.23.1 cherrypick 1 [REDO] (microsoft#26140)

### Description
Cherry-pick the following PRs into the ORT 1.23.1 branch:

- Fix Attention GQA implementation on CPU
  - **MANUAL MERGE**: see microsoft#26057
  - main merge date: Sept 15, 11:33am
  - pr: microsoft#25966
  - commit: d530b29
- Address GetMemInfo edge cases
  - main merge date: Sept 16, 10:32am
  - pr: microsoft#26021
  - commit: d251f3a
- Implement new Python APIs
  - main merge date: Sept 17, 11:44am
  - pr: microsoft#25999
  - commit: abc63e8
- MemcpyFromHost and MemcpyToHost support for plugin EPs
  - **MERGE CONFLICT** on file onnxruntime/test/optimizer/transpose_optimizer_test.cc. Conflicts with microsoft#25689
  - main merge date: Sept 23, 10:42am
  - pr: microsoft#26088
  - commit: 4545732
- [TRT RTX EP] Fix bug for generating the correct subgraph in GetCapability microsoft#26132
  - main merge date: Sept 23, 8:54pm
  - pr: microsoft#26132
  - commit: 72e56e7


---------

Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>

* ORT 1.23.1 cherrypick 2 (microsoft#26182)

### Description
Adds the following commits to the `rel-1.23.1` branch for ORT 1.23.1:


- add session_id_ to LogEvaluationStart/Stop, LogSessionCreationStart
  - main merge date: July 31, 1:05am
  - pr: microsoft#25590
  - commit: e753643
- [build] fix WebAssembly build on macOS/arm64
  - main merge date: Aug 5, 8:07am
  - pr: microsoft#25653
  - commit: 53f152b
- [CPU] MoE Kernel (microsoft#25958)
  - main merge date: Sept 10, 4:54pm
  - pr: microsoft#25958
  - commit: 930e640
- [CPU] Block-wise QMoE kernel for CPU
  - main merge date: Sept 15, 8:32am
  - pr: microsoft#26009
  - commit: 5d17734
- [C#] Implement missing APIs
  - main merge date: Sept 24, 10:50am
  - pr: microsoft#26101
  - commit: 35dcab5
- Regenerate test model with ONNX IR < 12
  - main merge date: Sept 24, 2:50pm
  - pr: microsoft#26149
  - commit: 88f2652
- [CPU] Fix compilation errors because of unused variables
  - main merge date: Sept 25, 1:21pm
  - pr: microsoft#26147
  - commit: 42fcd71
- [EP ABI] Check if nodes specified in GetCapability() have already been assigned
  - main merge date: Sept 26, 1:24am
  - pr: microsoft#26156
  - commit: 67d3ba0
- [QNN EP] Add dynamic option to set HTP performance mode
  - main merge date: Sept 26, 11:55am
  - pr: microsoft#26135
  - commit: 6cc40fd

---------

Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: quic-tirupath <quic_tirupath@quicinc.com>
Co-authored-by: quic-ashwshan <quic_ashwshan@quicinc.com>

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: xieofxie <xieofxie@126.com>
Co-authored-by: hualxie <hualxie@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Akshay Sonawane <111780983+apsonawane@users.noreply.github.com>
Co-authored-by: quic-tirupath <quic_tirupath@quicinc.com>
Co-authored-by: quic-ashwshan <quic_ashwshan@quicinc.com>