Relax WeightBiasQuantization constraint for larger QDQ node group by qti-yuduo · Pull Request #25673 · microsoft/onnxruntime

@HectorSVC added the ep:QNN label (issues related to QNN execution provider) on Aug 11, 2025


snnn pushed a commit that referenced this pull request on Aug 28, 2025 (…5673)

### Description
Relax WeightBiasQuantization constraint for larger QDQ node group

### Motivation and Context
The transformer `WeightBiasQuantization` quantizes float weights in a `Q -> DQ -> Conv/ConvTranspose/Gemm's Weights -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is a problem because it skips quantization for many common patterns, such as an unfused activation following `Conv` (`DQ -> Conv -> ReLU -> Q`).

It is actually unnecessary to check for the ending `Q` here (the fold can happen anyway without changing model semantics). However, to minimize the behavior change, this PR simply extends the pattern to accept a single (non-branching), type-preserving path leading to `Q`, enabling more quantization support.
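The relaxed check described above can be sketched as follows. This is a minimal, self-contained illustration, not ORT's actual graph API: the `Node` struct and the `IsTypePreserving` op subset are hypothetical stand-ins. Instead of requiring the node's immediate (and only) consumer to be a `QuantizeLinear`, the traversal follows a single, non-branching chain of type-preserving ops until it either reaches a `QuantizeLinear` (accept) or hits a branch or a non-preserving op (keep the old behavior and skip).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical minimal stand-in for a graph node; ORT's real Node API differs.
struct Node {
  std::string op_type;
  std::vector<const Node*> children;  // consumers of this node's output
};

// Illustrative subset of ops assumed to preserve the element type.
static bool IsTypePreserving(const std::string& op) {
  return op == "Relu" || op == "Clip" || op == "MaxPool";
}

// Old check: the Conv/Gemm output had to feed exactly one QuantizeLinear.
// Relaxed check: follow a single, non-branching, type-preserving path and
// accept if it eventually reaches a QuantizeLinear.
static bool PathLeadsToQ(const Node* node) {
  const Node* cur = node;
  while (cur->children.size() == 1) {
    const Node* child = cur->children[0];
    if (child->op_type == "QuantizeLinear") return true;   // found ending Q
    if (!IsTypePreserving(child->op_type)) return false;   // type may change
    cur = child;                                           // keep walking
  }
  return false;  // fan-out or dead end: conservatively skip
}
```

With this sketch, `Conv -> ReLU -> Q` is accepted (the case the old check rejected), while a branching output or a path through a type-changing op is still skipped, matching the PR's goal of a minimal behavior change.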

@snnn mentioned this pull request on Aug 28, 2025

snnn pushed a commit that referenced this pull request on Aug 29, 2025
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long standing bug on file memory mapping on windows. (#25833)**
- **Add API for precompiled model compatibility check using just the compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866

gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request on Sep 2, 2025 (…crosoft#25673)


qti-yuduo pushed a commit to CodeLinaro/onnxruntime that referenced this pull request on Sep 24, 2025