Relax WeightBiasQuantization constraint for larger QDQ node group by qti-yuduo · Pull Request #25673 · microsoft/onnxruntime
snnn pushed a commit that referenced this pull request
Aug 28, 2025…5673) ### Description Relax WeightBiasQuantization constraint for larger QDQ node group ### Motivation and Context The `WeightBiasQuantization` transformer quantizes float weights in the `Q -> DQ -> Conv/ConvTranspose/Gemm (weights) -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is problematic because it skips quantization for many common patterns, such as an unfused activation following `Conv` (`DQ -> Conv -> ReLU -> Q`). Checking for the terminating Q is actually unnecessary here (the fold can happen anyway without changing model semantics). However, to minimize the behavior change, this PR simply extends the matched pattern to include a single-path (branch-free), type-preserving path leading to `Q`, enabling more quantization support.
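The relaxed pattern match can be sketched roughly as follows. This is a hypothetical Python illustration, not the actual C++ implementation in onnxruntime; the `Node` class, the `TYPE_PRESERVING_OPS` set, and `reaches_q_via_single_path` are invented names, and the set of ops treated as type-preserving here is illustrative only:

```python
from dataclasses import dataclass, field

# Illustrative set of ops that pass their input type through unchanged.
# The real transformer's list may differ.
TYPE_PRESERVING_OPS = {"Relu", "Clip", "MaxPool"}

@dataclass
class Node:
    op_type: str
    outputs: list = field(default_factory=list)  # consumer nodes

def reaches_q_via_single_path(node):
    """Return True if `node` feeds a QuantizeLinear, possibly through a
    branch-free chain of type-preserving ops (e.g. Conv -> Relu -> Q)."""
    current = node
    while True:
        # A branch (or a graph output) ends the single-path walk.
        if len(current.outputs) != 1:
            return False
        nxt = current.outputs[0]
        if nxt.op_type == "QuantizeLinear":
            return True
        if nxt.op_type not in TYPE_PRESERVING_OPS:
            return False
        current = nxt

# The pattern the old check rejected: DQ -> Conv -> Relu -> Q.
q = Node("QuantizeLinear")
relu = Node("Relu", [q])
conv = Node("Conv", [relu])
print(reaches_q_via_single_path(conv))  # True: Conv's weights can be folded
```

Under the old check, `conv` would fail immediately because its single consumer is `Relu`, not `QuantizeLinear`; the relaxed walk keeps following the branch-free chain until it either finds the `Q` or hits a branch or a non-type-preserving op.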
snnn
mentioned this pull request
snnn pushed a commit that referenced this pull request
Aug 29, 2025
- **Relax WeightBiasQuantization constraint for larger QDQ node group (#25673)**
- **Add cuda graph implementation for NV TRT RTX EP (#25787)**
- **python GPU IO Bindings for NVIDIA (#25776)**
- **Fixes for DynamicQuantizeMatMul and Attention3D tests (#25814)**
- **Fix a long-standing bug on file memory mapping on windows. (#25833)**
- **Add API for precompiled model compatibility check using just the compat info (#25841)**
- **Enable ABSL_FLAGS flag registration for onnxruntime_perf_test for mobile build (#25849)**
- **Add default constructor to Ort::Status. (#25860)**
- #25871
- #25878
- #25884
- #25886
- #25866
gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request
Sep 2, 2025…crosoft#25673) ### Description Relax WeightBiasQuantization constraint for larger QDQ node group ### Motivation and Context The `WeightBiasQuantization` transformer quantizes float weights in the `Q -> DQ -> Conv/ConvTranspose/Gemm (weights) -> Q -> DQ` sequence. The check on `Weights -> Q` (`children_nodes.size() != 1 || children_nodes[0]->OpType() != QDQ::QOpName`) is problematic because it skips quantization for many common patterns, such as an unfused activation following `Conv` (`DQ -> Conv -> ReLU -> Q`). Checking for the terminating Q is actually unnecessary here (the fold can happen anyway without changing model semantics). However, to minimize the behavior change, this PR simply extends the matched pattern to include a single-path (branch-free), type-preserving path leading to `Q`, enabling more quantization support.