[Mlas] optimize MlasConv using thread partition opt by zoeczy · Pull Request #25255 · microsoft/onnxruntime

added 3 commits

July 2, 2025 11:01
…e groups and batch sizes.

hariharans29 added a commit that referenced this pull request

Oct 16, 2025
…tion opt (#26103)

### Description
This is an internal-branch duplicate of #25255, plus some minor
cosmetic changes to address Copilot feedback.

### Motivation and Context
Improves the performance of NCHW Conv. Both grouped convolutions and
batched inputs should benefit from this change. For the detailed
performance numbers, please refer to #25255.
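
The description above does not spell out the partitioning scheme, so the following is only a minimal sketch of the general idea. It assumes the work is treated as a flat, batch-major index space of batch × group items split as evenly as possible across threads. `PartitionWork`, `Decode`, and `BatchGroup` are hypothetical names for illustration, not the MLAS API or the code in this PR.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper (not the MLAS API): returns the half-open range
// [begin, end) of work items assigned to `thread_id` when `total_work`
// items are split as evenly as possible across `thread_count` threads.
// Assumes thread_count > 0 and thread_id < thread_count.
std::pair<std::size_t, std::size_t> PartitionWork(std::size_t thread_id,
                                                  std::size_t thread_count,
                                                  std::size_t total_work) {
    const std::size_t per_thread = total_work / thread_count;
    const std::size_t extra = total_work % thread_count;
    // The first `extra` threads each take one additional item.
    const std::size_t begin =
        thread_id * per_thread + (thread_id < extra ? thread_id : extra);
    const std::size_t count = per_thread + (thread_id < extra ? 1 : 0);
    return {begin, begin + count};
}

// Coordinates of one work item in the assumed batch-major layout,
// where index = batch * group_count + group.
struct BatchGroup {
    std::size_t batch;
    std::size_t group;
};

// Maps a flat work index back to its (batch, group) coordinates.
BatchGroup Decode(std::size_t index, std::size_t group_count) {
    return {index / group_count, index % group_count};
}
```

With a batch size of 2 and 3 groups there are 6 work items. Under this sketch, 4 threads would receive the ranges [0, 2), [2, 4), [4, 5), and [5, 6), so no thread sits idle even though neither the batch size nor the group count alone matches the thread count.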

Credit to @zoeczy and team for this improvement and code change.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

apsonawane pushed a commit that referenced this pull request

Oct 17, 2025
…tion opt (#26103)

apsonawane pushed a commit that referenced this pull request

Oct 20, 2025
…tion opt (#26103)

apsonawane added a commit that referenced this pull request

Oct 21, 2025
Adds the following commits to the release-1.23.2 branch for ORT 1.23.2:

- [TensorRT] Fix DDS output bug during engine update
  - PR: #26272
  - commit id: 00e85dd
- Fix shape inference failure with in-memory external data
  - PR: #26263
  - commit id: d955476
- [CUDA] replace 90a-virtual by 90-virtual for forward compatible
  - PR: #26230
  - commit id: b58911f
- [QNN-EP] Fix logic flow bug
  - PR: #26148
  - commit id: b282379
- Internal Dupe of #25255 - [MLAS] Optimize MlasConv using thread partition opt
  - PR: #26103
  - commit id: 7362518
- Update qMoE spec to support block quantization
  - PR: #25641
  - commit id: 7a8ffa8
- [VitisAI] add new api to VitisAI to save graph as a string
  - PR: #25602
  - commit id: 3361d72
- [Build] Lock torch, onnxscript and onnx-ir versions to latest
  - PR: #26315
  - commit id: ea69c4d

---------

Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yateng Hong <toothache9010@gmail.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: quic-calvnguy <quic_calvnguy@quicinc.com>
Co-authored-by: quic_calvnguy <quic_calvnguy@quic_inc.com>
Co-authored-by: yifei410 <31260809+yifei410@users.noreply.github.com>
Co-authored-by: yifei <y.zhou@xilinx.com>

JonathanC-ARM pushed a commit to JonathanC-ARM/onnxruntime that referenced this pull request

Oct 24, 2025
…ead partition opt (microsoft#26103)

fs-eire pushed a commit that referenced this pull request

Oct 24, 2025
…tion opt (#26103)

naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request

Nov 2, 2025
…ead partition opt (microsoft#26103)
