Enable Free-threaded Python (PEP 703) support for Python 3.13t+ by tianleiwu · Pull Request #26786 · microsoft/onnxruntime
* Revert "Consistent with the configuration in the packaged cmake" (microsoft#26767)

Reverts microsoft#26104

Although this fix is correct and necessary, it needs patches in other places (such as the release pipelines and the onnxruntime inference examples). We will try to address the actual issue the PR was targeting in a subsequent release. I will reopen all the GH issues that the fix PR closed so that the actual issue is still tracked.

Will reopen: microsoft#24003 microsoft#26186 microsoft#23642 microsoft#25279 microsoft#25242

* Address security issue of loading arbitrary files as external data (microsoft#26776)

### Description

Verify that external data references in TensorProto specify a data location under the model directory structure; reject absolute paths and paths that escape the model path. Make the validation function available to bridge-based EPs. Expose ExternalDataInfo via a bridge to EPs that choose to handle the data themselves.

### Motivation and Context

This is a security concern.

* [WebGPU] Implement Split-K on GEMM (microsoft#26751)

### Description

This patch implements the `Split-K` optimization on `GEMM`.

1. Support handling `GEMM` in `MatMulFillBiasOrZeroBeforeSplitKProgram`. We need to add `beta` as a new uniform value, along with all the parameters used to handle all the cases of `GEMM` in `MatMulWriteFnSource()` (including the broadcast of `beta` on both dimensions).
2. Support `Split-K` in `GemmProgram::GenerateShaderCode()`.
3. Add cases to `GemmOptimizePackedTest` to test `Split-K` in `GEMM`.

### Motivation and Context

With this PR we can achieve about 20% improvement in `florence-2-base-decoder-with-past-fp16` and 10% improvement in `detr-resnet-50-fp16` on Lunar Lake iGPU.

* [webgpu] Support broadcast attention_bias (microsoft#26769)

Fixed microsoft#26766

* [QNN EP] Reshape: relax check on allowzero when concrete shape without 0 (microsoft#26630)

For QNN Reshape op.
Relax the check to support some cases of `allowzero=1` when we have a concrete static shape without 0. The previous check was too restrictive.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Replace -infinity by lowest for masked values (microsoft#26055)

### Description

The softmax implementation in MLAS does not expect to see only -infinity as input. The result would be wrong in that case.

### Motivation and Context

Fix a numerical issue.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update protobuf 3.20 to 4.25.8 (microsoft#26809)

### Description

Update vulnerable version of protobuf from 3.x to 4.x.

### Motivation and Context

Resolves underlying compliance issues.

* Upgrade cpuinfo version and update some CPU feature detection code (microsoft#26598)

### Description

Upgrade cpuinfo version. Update ARM64 Windows feature detection to use cpuinfo functions which are now implemented.

### Motivation and Context

Additional ARM64 Windows feature detection was implemented in cpuinfo in [this commit](pytorch/cpuinfo@f7b233b). Update ORT to use the newly implemented feature detection.

* Enable Free-threaded Python (PEP 703) support for Python 3.13t+ (microsoft#26786)

This PR resolves the `RuntimeWarning` encountered when importing `onnxruntime` in free-threaded Python environments (e.g., Python 3.13t, 3.14t). Previously, the module did not explicitly declare that it could run safely without the GIL, causing the interpreter to re-enable the GIL at runtime.

### The Warning

```text
RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module 'onnxruntime.capi.onnxruntime_pybind11_state', which has not declared that it can run safely without the GIL.
```

## Changes

### 1. Build System (`cmake/onnxruntime_python.cmake`)

* Added robust detection logic to check if the current Python interpreter is free-threaded.

**Detection strategy:**

* **Primary:** Check `sysconfig.get_config_var('Py_GIL_DISABLED')` (PEP 703 standard).
* **Fallback:** Inspect ABI flags (`ABIFLAGS` or `SOABI`) for the `t` suffix (e.g., `cp313t`), handling cases on Windows where config variables may return `None` or empty strings.
* **Caching:** Cache the result as `ORT_PYTHON_FREE_THREADED_DETECTED` (`BOOL`) to avoid re-running detection on every configure.

---

### 2. C++ Source (`onnxruntime/python/onnxruntime_pybind_module.cc`)

* Updated the `PYBIND11_MODULE` definition to conditionally enable the GIL-free slot.
* Uses `py::mod_gil_not_used()` (available in pybind11 ≥ 2.13) when `Py_GIL_DISABLED` is defined.

---

## Verification

**Environment:**

* Python 3.14t (Free-threaded)
* Windows & Linux

### Before Change

```bash
python3.14t -c "import onnxruntime"
```

Resulted in the `RuntimeWarning` shown above.

### After Change

1. **Build log confirms correct detection:**

```text
-- Python_EXECUTABLE: D:\py314t\Scripts\python.exe
-- Checking for Free-threaded Python environment...
-- Py_GIL_DISABLED=1 detected: Enabling free-threaded support for onnxruntime_pybind11_state
```

2. **Runtime behavior:**

```bash
python3.14t -c "import onnxruntime"
```

Runs silently with no warnings, confirming the module loads successfully with the GIL disabled.

## Related Issue

* [https://github.com/microsoft/onnxruntime/issues/26780](https://github.com/microsoft/onnxruntime/issues/26780)

* fix npm packaging pipeline (microsoft#26810)

* Fix ExpandDims axis handling bug. (microsoft#26805)

### Description

Fix some issues with axis handling of ExpandDims.

- It was improperly getting the rank with `TensorShape::Size()`, which actually returns the number of elements.
- The range of valid axis values is actually `[-(rank+1), rank]`.
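
The two ExpandDims points above can be illustrated with a small Python sketch. This is not the ORT C++ code: the helper names `normalize_expand_dims_axis` and `expand_dims_shape` are hypothetical, and the sketch assumes only what the description states, namely that the rank is the number of dimensions (not the element count) and that valid axes lie in `[-(rank+1), rank]`.

```python
from typing import Tuple


def normalize_expand_dims_axis(axis: int, rank: int) -> int:
    """Map an axis in [-(rank+1), rank] to a non-negative insert position.

    `rank` is the number of input dimensions. The output has rank + 1
    dimensions, so negative axes are offset by rank + 1.
    """
    out_rank = rank + 1
    if not -out_rank <= axis <= rank:
        # Error out instead of indexing out of range.
        raise ValueError(
            f"axis {axis} is outside the valid range [{-out_rank}, {rank}]"
        )
    return axis + out_rank if axis < 0 else axis


def expand_dims_shape(shape: Tuple[int, ...], axis: int) -> Tuple[int, ...]:
    """Return `shape` with a new dimension of size 1 inserted at `axis`."""
    # The rank is len(shape); the product of the dims would be the element
    # count, which is the quantity the description says was used by mistake.
    rank = len(shape)
    pos = normalize_expand_dims_axis(axis, rank)
    return shape[:pos] + (1,) + shape[pos:]
```

For a `(2, 3)` input the rank is 2 while the element count is 6, so confusing the two would accept axis values well outside the valid range `[-3, 2]`.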
### Motivation and Context

Fix a crash on out-of-range axis values. It now returns an error instead.

* [QNN_EP] ConvTranspose not calculating "pad" if "output_shape" is given. (microsoft#26665)

### Description

When "output_shape" is given in ConvTranspose, the appropriate "pad" should be calculated according to the formula in: https://onnx.ai/onnx/operators/onnx__ConvTranspose.html#convtranspose

### Motivation and Context

The current QNN_EP implementation does not handle this case and sets pad to the default value (zeros, no padding).

Signed-off-by: ankus <ankus@qti.qualcomm.com>
Co-authored-by: ankus-qti <ankus@qti.qualcomm.com>

* Add reducemin, reducemax nodes before the 1st consumer of tensor (microsoft#26768)

## Significantly reduces peak memory usage during minmax calibration

## Description

During MinMax calibration, ReduceMin and ReduceMax nodes were added at the end of the node list. Because they have no consumers, they came last in the topological execution order. As a result, the intermediate tensors were not freed and occupied memory until the ReduceMin and ReduceMax nodes consumed them.

This PR reorders the node list so that, in topological order, the ReduceMin and ReduceMax nodes execute before the original first consumer of each tensor. That way the memory is freed as soon as the original first consumer has consumed the tensor.

## Motivation and Context

During MinMax calibration of larger LLMs such as phi4 14b, even 80 GB A100 GPUs were not sufficient. Calibration always resulted in a CUDA OOM error before the first inference completed. This PR addresses the issue by significantly reducing the peak memory required during MinMax calibration.

Co-authored-by: Ronak Mahawar <rmahawar@qti.qualcomm.com>

* [Quantization] Fix static quantize runner usage. (microsoft#26624)

### Description

- Input pb files were read in incorrect order.
  - Cause: Python `sorted` was used to order the input files. However, the resulting order is incorrect because "10" is lexicographically smaller than "2".
  - Fix: Enumerate indices to read the input files.
- CumSum's output wasn't quantized.
  - Cause: CumSum wasn't registered in the QDQ registry.
  - Fix: Register CumSum with QDQDirect8bitOp.

### Motivation and Context

Fix two issues in `static_quantize_runner` usage.

* Normalize protobuf integration from 4.52 to 6.33 (microsoft#26817)

Description

Update protobuf from version 4.25.1 to 6.33.0

Motivation and Context

There is an underlying security vulnerability with 4.25.1. Updating to 6.33.0 will improve our security posture.

* WebGPU: Transpose Conv kernels in Prepack (microsoft#26675)

Prepack Conv kernels with path-aware transpose decisions, store the transposed kernels for reuse, and add ComputeContextBase helpers for node access and GPU buffer unmapping.

* [QNN EP] Upgrade QNN to 2.41.0 (microsoft#26812)

### Description

Update Qnn default version to 2.41.0.251128

* [ROCM] Add back --rocm_version and --rocm_home (microsoft#26819)

--rocm_version and --rocm_home were removed in microsoft#26712. Add them back for now until AMD updates their pipelines. See microsoft#26801 for details.

* Bump actions/download-artifact from 6 to 7 (microsoft#26793)

Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 6 to 7.
<details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/download-artifact/releases">actions/download-artifact's releases</a>.</em></p> <blockquote> <h2>v7.0.0</h2> <h2>v7 - What's new</h2> <blockquote> <p>[!IMPORTANT] actions/download-artifact@v7 now runs on Node.js 24 (<code>runs.using: node24</code>) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.</p> </blockquote> <h3>Node.js 24</h3> <p>This release updates the runtime to Node.js 24. v6 had preliminary support for Node 24, however this action was by default still running on Node.js 20. Now this action by default will run on Node.js 24.</p> <h2>What's Changed</h2> <ul> <li>Update GHES guidance to include reference to Node 20 version by <a href="https://github.com/patrikpolyak"><code>@patrikpolyak</code></a> in <a href="https://redirect.github.com/actions/download-artifact/pull/440">actions/download-artifact#440</a></li> <li>Download Artifact Node24 support by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/download-artifact/pull/415">actions/download-artifact#415</a></li> <li>fix: update <code>@actions/artifact</code> to fix Node.js 24 punycode deprecation by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/download-artifact/pull/451">actions/download-artifact#451</a></li> <li>prepare release v7.0.0 for Node.js 24 support by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/download-artifact/pull/452">actions/download-artifact#452</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/patrikpolyak"><code>@patrikpolyak</code></a> made their first contribution in <a 
href="https://redirect.github.com/actions/download-artifact/pull/440">actions/download-artifact#440</a></li> <li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/download-artifact/pull/415">actions/download-artifact#415</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/download-artifact/compare/v6.0.0...v7.0.0">https://github.com/actions/download-artifact/compare/v6.0.0...v7.0.0</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/download-artifact/commit/37930b1c2abaa49bbe596cd826c3c89aef350131"><code>37930b1</code></a> Merge pull request <a href="https://redirect.github.com/actions/download-artifact/issues/452">#452</a> from actions/download-artifact-v7-release</li> <li><a href="https://github.com/actions/download-artifact/commit/72582b9e0acd370909e83fa4a1fd0fca3ad452d8"><code>72582b9</code></a> doc: update readme</li> <li><a href="https://github.com/actions/download-artifact/commit/0d2ec9d4cbcefe257d822f108de2a1f15f8da9f6"><code>0d2ec9d</code></a> chore: release v7.0.0 for Node.js 24 support</li> <li><a href="https://github.com/actions/download-artifact/commit/fd7ae8fda6dc16277a9ffbc91cdb0eedf156e912"><code>fd7ae8f</code></a> Merge pull request <a href="https://redirect.github.com/actions/download-artifact/issues/451">#451</a> from actions/fix-storage-blob</li> <li><a href="https://github.com/actions/download-artifact/commit/d484700543354b15886d6a52910cf61b7f1d2b27"><code>d484700</code></a> chore: restore minimatch.dep.yml license file</li> <li><a href="https://github.com/actions/download-artifact/commit/03a808050efe42bb6ad85281890afd4e4546672c"><code>03a8080</code></a> chore: remove obsolete dependency license files</li> <li><a href="https://github.com/actions/download-artifact/commit/56fe6d904b0968950f8b68ea17774c54973ed5e2"><code>56fe6d9</code></a> chore: update 
<code>@actions/artifact</code> license file to 5.0.1</li> <li><a href="https://github.com/actions/download-artifact/commit/8e3ebc4ab4d2e095e5eb44ba1a4a53b6b03976ad"><code>8e3ebc4</code></a> chore: update package-lock.json with <code>@actions/artifact</code><a href="https://github.com/5"><code>@5</code></a>.0.1</li> <li><a href="https://github.com/actions/download-artifact/commit/1e3c4b4d4906c98ab57453c24efefdf16c078044"><code>1e3c4b4</code></a> fix: update <code>@actions/artifact</code> to ^5.0.0 for Node.js 24 punycode fix</li> <li><a href="https://github.com/actions/download-artifact/commit/458627d354794c71bc386c8d5839d20b5885fe2a"><code>458627d</code></a> chore: use local <code>@actions/artifact</code> package for Node.js 24 testing</li> <li>Additional commits viewable in <a href="https://github.com/actions/download-artifact/compare/v6...v7">compare view</a></li> </ul> </details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/upload-artifact from 5 to 6 (microsoft#26794)

Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 5 to 6.

<details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's releases</a>.</em></p> <blockquote> <h2>v6.0.0</h2> <h2>v6 - What's new</h2> <blockquote> <p>[!IMPORTANT] actions/upload-artifact@v6 now runs on Node.js 24 (<code>runs.using: node24</code>) and requires a minimum Actions Runner version of 2.327.1. If you are using self-hosted runners, ensure they are updated before upgrading.</p> </blockquote> <h3>Node.js 24</h3> <p>This release updates the runtime to Node.js 24. v5 had preliminary support for Node.js 24, however this action was by default still running on Node.js 20.
Now this action by default will run on Node.js 24.</p> <h2>What's Changed</h2> <ul> <li>Upload Artifact Node 24 support by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/719">actions/upload-artifact#719</a></li> <li>fix: update <code>@actions/artifact</code> for Node.js 24 punycode deprecation by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/744">actions/upload-artifact#744</a></li> <li>prepare release v6.0.0 for Node.js 24 support by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/upload-artifact/pull/745">actions/upload-artifact#745</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0">https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/upload-artifact/commit/b7c566a772e6b6bfb58ed0dc250532a479d7789f"><code>b7c566a</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/745">#745</a> from actions/upload-artifact-v6-release</li> <li><a href="https://github.com/actions/upload-artifact/commit/e516bc8500aaf3d07d591fcd4ae6ab5f9c391d5b"><code>e516bc8</code></a> docs: correct description of Node.js 24 support in README</li> <li><a href="https://github.com/actions/upload-artifact/commit/ddc45ed9bca9b38dbd643978d88e3981cdc91415"><code>ddc45ed</code></a> docs: update README to correct action name for Node.js 24 support</li> <li><a href="https://github.com/actions/upload-artifact/commit/615b319bd27bb32c3d64dca6b6ed6974d5fbe653"><code>615b319</code></a> chore: release v6.0.0 for Node.js 24 support</li> <li><a 
href="https://github.com/actions/upload-artifact/commit/017748b48f8610ca8e6af1222f4a618e84a9c703"><code>017748b</code></a> Merge pull request <a href="https://redirect.github.com/actions/upload-artifact/issues/744">#744</a> from actions/fix-storage-blob</li> <li><a href="https://github.com/actions/upload-artifact/commit/38d4c7997f5510fcc41fc4aae2a6b97becdbe7fc"><code>38d4c79</code></a> chore: rebuild dist</li> <li><a href="https://github.com/actions/upload-artifact/commit/7d27270e0cfd253e666c44abac0711308d2d042f"><code>7d27270</code></a> chore: add missing license cache files for <code>@actions/core</code>, <code>@actions/io</code>, and mi...</li> <li><a href="https://github.com/actions/upload-artifact/commit/5f643d3c9475505ccaf26d686ffbfb71a8387261"><code>5f643d3</code></a> chore: update license files for <code>@actions/artifact</code><a href="https://github.com/5"><code>@5</code></a>.0.1 dependencies</li> <li><a href="https://github.com/actions/upload-artifact/commit/1df1684032c88614064493e1a0478fcb3583e1d0"><code>1df1684</code></a> chore: update package-lock.json with <code>@actions/artifact</code><a href="https://github.com/5"><code>@5</code></a>.0.1</li> <li><a href="https://github.com/actions/upload-artifact/commit/b5b1a918401ee270935b6b1d857ae66c85f3be6f"><code>b5b1a91</code></a> fix: update <code>@actions/artifact</code> to ^5.0.0 for Node.js 24 punycode fix</li> <li>Additional commits viewable in <a href="https://github.com/actions/upload-artifact/compare/v5...v6">compare view</a></li> </ul> </details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [QNN-EP] Restrict zero pads validation check to NPU backend (microsoft#26534)

### Description

- Pad ops with all zero padding values are allowed in the QNN GPU backend.
- Restricting a recently-added validation check in the op builder to apply only to the NPU.

* Bump actions/cache from 4 to 5 (microsoft#26795)

Bumps [actions/cache](https://github.com/actions/cache) from 4 to 5.
<details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/cache/releases">actions/cache's releases</a>.</em></p> <blockquote> <h2>v5.0.0</h2> <blockquote> <p>[!IMPORTANT] <strong><code>actions/cache@v5</code> runs on the Node.js 24 runtime and requires a minimum Actions Runner version of <code>2.327.1</code>.</strong></p> <p>If you are using self-hosted runners, ensure they are updated before upgrading.</p> </blockquote> <hr /> <h2>What's Changed</h2> <ul> <li>Upgrade to use node24 by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1630">actions/cache#1630</a></li> <li>Prepare v5.0.0 release by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1684">actions/cache#1684</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/cache/compare/v4.3.0...v5.0.0">https://github.com/actions/cache/compare/v4.3.0...v5.0.0</a></p> <h2>v4.3.0</h2> <h2>What's Changed</h2> <ul> <li>Add note on runner versions by <a href="https://github.com/GhadimiR"><code>@GhadimiR</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1642">actions/cache#1642</a></li> <li>Prepare <code>v4.3.0</code> release by <a href="https://github.com/Link"><code>@Link</code></a>- in <a href="https://redirect.github.com/actions/cache/pull/1655">actions/cache#1655</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/GhadimiR"><code>@GhadimiR</code></a> made their first contribution in <a href="https://redirect.github.com/actions/cache/pull/1642">actions/cache#1642</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/cache/compare/v4...v4.3.0">https://github.com/actions/cache/compare/v4...v4.3.0</a></p> <h2>v4.2.4</h2> <h2>What's Changed</h2> <ul> <li>Update README.md by <a 
href="https://github.com/nebuk89"><code>@nebuk89</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1620">actions/cache#1620</a></li> <li>Upgrade <code>@actions/cache</code> to <code>4.0.5</code> and move <code>@protobuf-ts/plugin</code> to dev depdencies by <a href="https://github.com/Link"><code>@Link</code></a>- in <a href="https://redirect.github.com/actions/cache/pull/1634">actions/cache#1634</a></li> <li>Prepare release <code>4.2.4</code> by <a href="https://github.com/Link"><code>@Link</code></a>- in <a href="https://redirect.github.com/actions/cache/pull/1636">actions/cache#1636</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/nebuk89"><code>@nebuk89</code></a> made their first contribution in <a href="https://redirect.github.com/actions/cache/pull/1620">actions/cache#1620</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/cache/compare/v4...v4.2.4">https://github.com/actions/cache/compare/v4...v4.2.4</a></p> <h2>v4.2.3</h2> <h2>What's Changed</h2> <ul> <li>Update to use <code>@actions/cache</code> 4.0.3 package & prepare for new release by <a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a href="https://redirect.github.com/actions/cache/pull/1577">actions/cache#1577</a> (SAS tokens for cache entries are now masked in debug logs)</li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> made their first contribution in <a href="https://redirect.github.com/actions/cache/pull/1577">actions/cache#1577</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/cache/compare/v4.2.2...v4.2.3">https://github.com/actions/cache/compare/v4.2.2...v4.2.3</a></p> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/actions/cache/blob/main/RELEASES.md">actions/cache's changelog</a>.</em></p> <blockquote> <h1>Releases</h1> <h2>Changelog</h2> <h3>5.0.1</h3> <ul> <li>Update <code>@azure/storage-blob</code> to <code>^12.29.1</code> via <code>@actions/cache@5.0.1</code> <a href="https://redirect.github.com/actions/cache/pull/1685">#1685</a></li> </ul> <h3>5.0.0</h3> <blockquote> <p>[!IMPORTANT] <code>actions/cache@v5</code> runs on the Node.js 24 runtime and requires a minimum Actions Runner version of <code>2.327.1</code>. If you are using self-hosted runners, ensure they are updated before upgrading.</p> </blockquote> <h3>4.3.0</h3> <ul> <li>Bump <code>@actions/cache</code> to <a href="https://redirect.github.com/actions/toolkit/pull/2132">v4.1.0</a></li> </ul> <h3>4.2.4</h3> <ul> <li>Bump <code>@actions/cache</code> to v4.0.5</li> </ul> <h3>4.2.3</h3> <ul> <li>Bump <code>@actions/cache</code> to v4.0.3 (obfuscates SAS token in debug logs for cache entries)</li> </ul> <h3>4.2.2</h3> <ul> <li>Bump <code>@actions/cache</code> to v4.0.2</li> </ul> <h3>4.2.1</h3> <ul> <li>Bump <code>@actions/cache</code> to v4.0.1</li> </ul> <h3>4.2.0</h3> <p>TLDR; The cache backend service has been rewritten from the ground up for improved performance and reliability. <a href="https://github.com/actions/cache">actions/cache</a> now integrates with the new cache service (v2) APIs.</p> <p>The new service will gradually roll out as of <strong>February 1st, 2025</strong>. The legacy service will also be sunset on the same date. Changes in these release are <strong>fully backward compatible</strong>.</p> <p><strong>We are deprecating some versions of this action</strong>. 
We recommend upgrading to version <code>v4</code> or <code>v3</code> as soon as possible before <strong>February 1st, 2025.</strong> (Upgrade instructions below).</p> <p>If you are using pinned SHAs, please use the SHAs of versions <code>v4.2.0</code> or <code>v3.4.0</code></p> <p>If you do not upgrade, all workflow runs using any of the deprecated <a href="https://github.com/actions/cache">actions/cache</a> will fail.</p> <p>Upgrading to the recommended versions will not break your workflows.</p> <h3>4.1.2</h3> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/actions/cache/commit/9255dc7a253b0ccc959486e2bca901246202afeb"><code>9255dc7</code></a> Merge pull request <a href="https://redirect.github.com/actions/cache/issues/1686">#1686</a> from actions/cache-v5.0.1-release</li> <li><a href="https://github.com/actions/cache/commit/8ff5423e8b66eacab4e638ee52abbd2cb831366a"><code>8ff5423</code></a> chore: release v5.0.1</li> <li><a href="https://github.com/actions/cache/commit/9233019a152bc768059ac1768b8e4403b5da16c1"><code>9233019</code></a> Merge pull request <a href="https://redirect.github.com/actions/cache/issues/1685">#1685</a> from salmanmkc/node24-storage-blob-fix</li> <li><a href="https://github.com/actions/cache/commit/b975f2bb844529e1063ad882c609b224bcd66eb6"><code>b975f2b</code></a> fix: add peer property to package-lock.json for dependencies</li> <li><a href="https://github.com/actions/cache/commit/d0a0e1813491d01d574c95f8d189f62622bbb2ae"><code>d0a0e18</code></a> fix: update license files for <code>@actions/cache</code>, fast-xml-parser, and strnum</li> <li><a href="https://github.com/actions/cache/commit/74de208dcfcbe85c0e7154e7b17e4105fe2554ff"><code>74de208</code></a> fix: update <code>@actions/cache</code> to ^5.0.1 for Node.js 24 punycode fix</li> <li><a 
href="https://github.com/actions/cache/commit/ac7f1152ead02e89c14b5456d14ab17591e74cfb"><code>ac7f115</code></a> peer</li> <li><a href="https://github.com/actions/cache/commit/b0f846b50b6061d7a2ca6f1a2fea61d4a65d1a16"><code>b0f846b</code></a> fix: update <code>@actions/cache</code> with storage-blob fix for Node.js 24 punycode depr...</li> <li><a href="https://github.com/actions/cache/commit/a7833574556fa59680c1b7cb190c1735db73ebf0"><code>a783357</code></a> Merge pull request <a href="https://redirect.github.com/actions/cache/issues/1684">#1684</a> from actions/prepare-cache-v5-release</li> <li><a href="https://github.com/actions/cache/commit/3bb0d78750a39cefce0c2b5a0a9801052b4359ad"><code>3bb0d78</code></a> docs: highlight v5 runner requirement in releases</li> <li>Additional commits viewable in <a href="https://github.com/actions/cache/compare/v4...v5">compare view</a></li> </ul> </details> <br />

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [QNN EP] Fix Clip op with min or max from QDQ (microsoft#26601)

## Motivation

The QDQ node group selection logic currently routes the `Clip` op to `UnaryNodeGroupSelector`. This does not properly handle the case where the `Clip` op has `min/max` provided by Q/DQ ops (still constant initializers).

<img width="255" height="378" alt="image-2025-11-18-11-49-19-156" src="https://github.com/user-attachments/assets/ec6250ee-68f3-40fa-8f60-93b1a400d5a0" />

## Changes

- Implement a custom NodeGroupSelector so that the `Clip` op is properly tagged for the backend to consume.
- Fix QNN EP `Clip` min/max parsing and dequantize when needed.
- Add unit tests for both changes.

* Address DML crashes in WebNN QDQ subgraph tests (microsoft#26822)

### Description

This change proposes a fix for some DML crashes observed in WebNN QDQ subgraph tests. The root cause, as analyzed in the issue linked below, is that the DML EP does not have kernels implemented for the QLinear versions of LeakyRelu and Softmax. The QDQ transformer has logic to fuse a 3-node sequence in the graph into the associated QLinear custom op.
The transformer consults an operator registry, which maps EPs to ops, when computing the quantized variants. The registry currently lists the same set of operators for the CPU and DML EPs (thereby assuming that the fused node would also be supported by the EP), so I split out a separate registration for the LeakyRelu and Softmax ops and associated it with the CPU EP, while removing those ops from the prior list. I also explored other options, such as checking whether the EP has a supported kernel for the op before proceeding with the assignment, but I could not find a way to do this. I validated this change by building a private version of ORT and running the affected tests in Edge Canary. The crash did not occur for those tests with the fix in place.

### Motivation and Context

This bug causes crashes in the web browser's GPU sandbox process, which is a blocker for WebNN origin trials. The crashes need to be resolved. Fixes microsoft#26531.

---------

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>

* [QNN-EP] Update gather op input tensor cast logic. (microsoft#26835)

### Description

The Gather op builder referred to the ONNX graph when deciding whether to insert a `Cast->int32` on the indices. However, the input tensor is created by QNN and may already have been cast to int32. This mismatch resulted in a redundant Cast being added. This PR changes the Gather op builder to check the QNN tensor before adding the int64->int32 cast.

### Motivation and Context

This prevents the QNN Gather op builder from inserting a redundant Cast->int32.
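
The cast decision described for the Gather op can be pictured with a small Python sketch. This is only an illustration under stated assumptions, not the QNN EP code: `DType`, `needs_int32_cast`, and the fallback to the ONNX dtype when no QNN tensor exists yet are all hypothetical.

```python
from enum import Enum
from typing import Optional


class DType(Enum):
    """Hypothetical stand-in for tensor element types."""
    INT32 = "int32"
    INT64 = "int64"


def needs_int32_cast(onnx_indices_dtype: DType,
                     qnn_indices_dtype: Optional[DType]) -> bool:
    """Decide whether a Cast->int32 should be inserted on Gather indices.

    Assumption: if a QNN tensor already exists for the indices, its dtype is
    authoritative; otherwise fall back to the dtype recorded in the ONNX graph.
    """
    effective = (qnn_indices_dtype
                 if qnn_indices_dtype is not None
                 else onnx_indices_dtype)
    # Only int64 indices need the extra cast.
    return effective is DType.INT64
```

With this shape, an index tensor that the ONNX graph lists as int64 but that QNN has already produced as int32 no longer triggers a second cast.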
---------

Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* [WebGPU] Fix Conv prepacking mismatch with Im2ColMatMul path detection (microsoft#26833)

Fix a bug where Conv kernel prepacking could incorrectly happen when the Im2ColMatMul execution path would be taken at runtime. This caused a "Missing Input" error because:

1. In PrePackInternal, the is_fused template parameter was passed to CanApplyIm2ColMatMulProgram.
2. In ComputeInternal, activation_.activation_kind_ != ActivationKind::None was passed instead.

For fused Conv kernels (is_fused=true) without an actual activation set, this mismatch caused CanApplyIm2ColMatMulProgram to return false during prepacking (allowing prepacking to occur) but true at runtime (selecting the Im2ColMatMul path, which needs context.Input(1)).

The fix uses the same activation check in both places to ensure consistent path detection between prepack time and compute time.

* fix crash in webgpu init when OrtEnv is re-created (microsoft#26836)

Fixes a crash seen with chrome -> webnn -> webgpu-ep when OrtEnv is re-created. The issue is that on OrtEnv destruction we null the default_instance, but at init the creation of the default_instance is under std::call_once, which runs only once, so the instance is not created again.

* Add Visual Studio 2026 CMake generator support (microsoft#26802)

### Description

Add "Visual Studio 18 2026" to the list of supported CMake generators in the build script, matching the CMake 3.32+ release that added VS 2026 support.

**Changes:**

- Added `"Visual Studio 18 2026"` to `--cmake_generator` choices in `build_args.py`
- Updated fuzz testing generator validation to accept both VS 2022 and VS 2026

**Usage:**

```bash
./build.py --cmake_generator "Visual Studio 18 2026" --build_dir build
```

### Motivation and Context

CMake 3.32+ includes Visual Studio 18 2026 generator support. ONNX Runtime's build script only supported up to Visual Studio 17 2022, blocking users with VS 2026 installations.
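
As a rough picture of the kind of change described, the following Python sketch restricts `--cmake_generator` with `argparse` choices. It is not the actual `build_args.py`: the generator names come from the `build.py --help` output quoted in this commit message plus the new entry, while `make_parser`, `FUZZ_TESTING_GENERATORS`, and `supports_fuzz_testing` are hypothetical names used for illustration.

```python
import argparse

# Generator names as listed in the quoted `build.py --help` output,
# with "Visual Studio 18 2026" added.
CMAKE_GENERATORS = [
    "MinGW Makefiles",
    "Ninja",
    "NMake Makefiles",
    "NMake Makefiles JOM",
    "Unix Makefiles",
    "Visual Studio 17 2022",
    "Visual Studio 18 2026",
    "Xcode",
]

# Hypothetical: generators accepted by the fuzz testing validation.
FUZZ_TESTING_GENERATORS = {"Visual Studio 17 2022", "Visual Studio 18 2026"}


def make_parser() -> argparse.ArgumentParser:
    """Build a minimal parser that rejects unknown generator names."""
    parser = argparse.ArgumentParser(prog="build.py")
    parser.add_argument("--cmake_generator", choices=CMAKE_GENERATORS, default=None)
    parser.add_argument("--build_dir", default="build")
    return parser


def supports_fuzz_testing(generator: str) -> bool:
    """Return True if the generator passes the (hypothetical) fuzz testing check."""
    return generator in FUZZ_TESTING_GENERATORS
```

Because `choices` is enforced by `argparse`, a generator that is missing from the list is rejected before any CMake invocation, which matches the reported symptom of VS 2026 users being blocked.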
<details> <summary>Original prompt</summary>

> Recently, Visual Studio 2026 is released and CMake has the support for it: https://cmake.org/cmake/help/latest/generator/Visual%20Studio%2018%202026.html
>
> However, ONNX Runtime's build script (build.py) does not have the support for it. currently supported cmake generators are:
>
> ```
> build.py --help
>
> ...
> --cmake_generator {MinGW Makefiles,Ninja,NMake Makefiles,NMake Makefiles JOM,Unix Makefiles,Visual Studio 17 2022,Xcode}
> ...
> ```
>
> Please make a PR to add support for the latest visual studio.

</details>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>

* Proposed changes to model compilation telemetry (microsoft#26804)

### Description

This change adjusts the compilation telemetry to span the start and stop of the operation, with additional recording of various facets of the compilation session and information about the result.

### Motivation and Context

These changes enable deeper understanding of compilation usage and health, which in turn can help prioritize more investments in making compilation better overall for users of the platform (e.g., more reliable, performant, better tuned options / defaults, etc.).

---------

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>

* [EP ABI] Add weight pre-packing support to kernel-based plugin EPs (microsoft#26754)

### Description

- Adds C APIs to support pre-packing of const weights for `OrtKernelImpl` implementations.
- APIs optionally support sharing of pre-packed weight data (for CPU-accessible memory).
- Updates the example kernel (Mul) to use the new pre-packing API.

Tested by existing unit test: https://github.com/microsoft/onnxruntime/blob/549d7415e26e2b3f86c42f86e135bb746caa37b4/onnxruntime/test/autoep/test_execution.cc#L242-L256

### Motivation and Context

The [previous PR](microsoft#26206) added the base APIs that support kernel-based plugin EPs. This PR adds an additional feature that was identified as necessary for the port of the WebGPU EP.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* WebGPU: Use dedicated prepack allocator for kernel prepacking (microsoft#26857)

Remove the CreateUnmappedGPUTensor workaround by using a proper prepack allocator that creates unmapped GPU buffers directly, avoiding the need to manually unmap buffers after allocation.

* [webgpu] refactor initialization of WebGPU Context params (microsoft#26855)

### Description

Refactor initialization of WebGPU Context params.

The refactor:

- consolidates all WebGPU options into ~~3~~ 2 classes:
  - `WebGpuExecutionProviderConfig`: configuration that is passed to and stored inside the EP class.
  - ~~`WebGpuContextCreationParams`: configuration that passed to constructor of class `WebGpuContext`.~~
  - ~~`WebGpuContextInitializationParams`: configuration that passed to `WebGpuContext::Initialize()`.~~
  - `WebGpuContextConfig`: configuration that is passed to construct and initialize `WebGpuContext`.
- ensures all instances of the classes are created with default values initialized.
- ensures each of the following happens in only one place:
  - setting default values
  - parsing options
- adds `WebGpuContextFactory::DefaultContext` to allow "get or create" of the default context.

### Motivation and Context

- Make the code cleaner and more consistent.
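
The pattern in this refactor (defaults set in one place, options parsed in one place, and a "get or create" default context) can be sketched in Python. This is a language-neutral illustration, not the C++ classes from the PR: `ContextConfig`, `Context`, `ContextFactory`, and the option names `context_id` and `validation_mode` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ContextConfig:
    """Hypothetical config; field defaults are the only place defaults are set."""
    context_id: int = 0
    validation_mode: str = "basic"

    @classmethod
    def from_options(cls, options: Dict[str, str]) -> "ContextConfig":
        """The only place where string options are parsed."""
        config = cls()  # start from fully initialized defaults
        if "context_id" in options:
            config.context_id = int(options["context_id"])
        if "validation_mode" in options:
            config.validation_mode = options["validation_mode"]
        return config


class Context:
    """Hypothetical context constructed and initialized from a single config."""

    def __init__(self, config: ContextConfig) -> None:
        self.config = config


class ContextFactory:
    """Hypothetical factory offering "get or create" for the default context."""
    _default: Optional[Context] = None

    @classmethod
    def default_context(cls) -> Context:
        if cls._default is None:
            cls._default = Context(ContextConfig())
        return cls._default
```

Keeping defaults in the field declarations means a config object can never be observed half-initialized, which is the property the refactor's "created with default values initialized" bullet asks for.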
--------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Signed-off-by: ankus <ankus@qti.qualcomm.com> Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com> Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com> Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com> Co-authored-by: Jiawei Shao <jiawei.shao@intel.com> Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com> Co-authored-by: qti-yuduo <yuduow@qti.qualcomm.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com> Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com> Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: eserscor <erscor@microsoft.com> Co-authored-by: Tianlei Wu <tlwu@microsoft.com> Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: quic-ankus <quic_ankus@quicinc.com> Co-authored-by: ankus-qti <ankus@qti.qualcomm.com> Co-authored-by: rM-planet <145144519+rM-planet@users.noreply.github.com> Co-authored-by: Ronak Mahawar <rmahawar@qti.qualcomm.com> Co-authored-by: minfhong-qti <minfhong@qti.qualcomm.com> Co-authored-by: Jie Chen <jie.a.chen@intel.com> Co-authored-by: Jeff Kilpatrick <jkilpatrick@qti.qualcomm.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: qti-mattsinc <mattsinc@qti.qualcomm.com> Co-authored-by: adrastogi <aditya.rastogi@microsoft.com> Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com> Co-authored-by: Mike Hsu <quic_muchhsu@quicinc.com> Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com> Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>