Enable Free-threaded Python (PEP 703) support for Python 3.13t+ by tianleiwu · Pull Request #26786 · microsoft/onnxruntime

* Revert "Consistent with the configuration in the packaged cmake" (microsoft#26767)

Reverts microsoft#26104

Although this fix is correct and necessary, it requires patches in
other places (such as the release pipelines and the onnxruntime
inference examples). We will try to address the issue the PR was
addressing in a subsequent release. I will re-open all the GH issues
that the fix PR closed so that the actual issue is still tracked.

Will reopen:

microsoft#24003
microsoft#26186
microsoft#23642
microsoft#25279
microsoft#25242

* Address security issue of loading arbitrary files as external data (microsoft#26776)

### Description
Verify that external data references in a TensorProto specify a data
location under the model directory structure; reject absolute paths and
paths that escape the model path.
Make the validation function available to bridge-based EPs.
Expose ExternalDataInfo via a bridge to EPs that choose to handle the
data themselves.
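
As an illustration of the rule described above, here is a minimal Python sketch of a purely lexical check. The function name `is_safe_external_location` and the example paths are hypothetical and are not the ORT implementation; the check also does not resolve symlinks, which a real validator may need to consider.

```python
import os.path


def is_safe_external_location(model_dir: str, location: str) -> bool:
    """Return True if `location` stays inside `model_dir` (lexical check only)."""
    # Reject absolute paths outright.
    if os.path.isabs(location):
        return False
    base = os.path.normpath(os.path.abspath(model_dir))
    # normpath collapses ".." segments so escapes become visible.
    target = os.path.normpath(os.path.join(base, location))
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised e.g. on Windows when the paths are on different drives.
        return False


if __name__ == "__main__":
    for candidate in ("weights.bin", "data/weights.bin", "../secret.bin", "/etc/passwd"):
        print(candidate, is_safe_external_location("/models/demo", candidate))
```

The check compares normalized paths instead of matching on the string `..`, so a location such as `a/../../x` is also rejected.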


### Motivation and Context
This is a security concern.

* [WebGPU] Implement Split-K on GEMM  (microsoft#26751)

### Description
This patch implements the `Split-K` optimization on `GEMM`. 

1. Support handling `GEMM` in `MatMulFillBiasOrZeroBeforeSplitKProgram`.
We need to add `beta` as a new uniform value and all the parameters that
are used to handle all the cases of `GEMM` in `MatMulWriteFnSource()`
(including the broadcast of `beta` on both dimensions).
2. Support `Split-K` in `GemmProgram::GenerateShaderCode()`.
3. Add cases to `GemmOptimizePackedTest` to test `Split-K` in `GEMM`.
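
To make the idea concrete, the following is a small CPU-side Python sketch of Split-K applied to GEMM (`Y = alpha * A @ B + beta * C`). It is not the WebGPU shader code: the function name `gemm_split_k` is hypothetical, `C` is assumed to already have the full output shape (no broadcasting), and the partial sums are accumulated sequentially, whereas a GPU implementation would typically run the K-chunks in parallel.

```python
def gemm_split_k(a, b, c=None, alpha=1, beta=1, split_k=2):
    """Compute alpha * A @ B + beta * C by splitting the K dimension into chunks."""
    m, k, n = len(a), len(b), len(b[0])
    # Step 1: pre-fill the output with beta * C, or zero when there is no C.
    out = [[beta * c[i][j] if c is not None else 0 for j in range(n)]
           for i in range(m)]
    # Step 2: each K-chunk adds its partial product into the pre-filled output.
    chunk = max(1, -(-k // split_k))  # ceiling division
    for start in range(0, k, chunk):
        stop = min(start + chunk, k)
        for i in range(m):
            for j in range(n):
                partial = sum(a[i][p] * b[p][j] for p in range(start, stop))
                out[i][j] += alpha * partial
    return out


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6]]
    B = [[1, 0], [0, 1], [1, 1]]
    C = [[1, 1], [1, 1]]
    print(gemm_split_k(A, B, C, alpha=2, beta=3, split_k=2))
```

Because the bias term is written before any partial product is added, every chunk can use the same accumulate-only write path, which is why the fill step has to know about `beta`.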

### Motivation and Context
With this PR we can achieve about 20% improvement in
`florence-2-base-decoder-with-past-fp16` and 10% improvement in
`detr-resnet-50-fp16` on Lunar Lake iGPU.

* [webgpu] Support broadcast attention_bias (microsoft#26769)

Fixed microsoft#26766

* [QNN EP] Reshape: relax check on allowzero when concrete shape without 0 (microsoft#26630)

For the QNN Reshape op, relax the check to support some cases of
`allowzero=1` when the shape is concrete and static and contains no 0.
The previous check was too restrictive.
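
The reason this relaxation is safe can be sketched in Python: under ONNX Reshape semantics, `allowzero` only changes how a `0` in the target shape is interpreted, so a static shape without `0` resolves identically either way. The helper `resolve_reshape` below is a hypothetical reference, not QNN EP code.

```python
from math import prod


def resolve_reshape(input_shape, target, allowzero=0):
    """Resolve an ONNX Reshape target shape into concrete output dims."""
    dims = list(target)
    if allowzero and 0 in dims and -1 in dims:
        raise ValueError("shape cannot contain both 0 and -1 when allowzero=1")
    if not allowzero:
        # allowzero=0: a 0 copies the corresponding input dimension.
        dims = [input_shape[i] if d == 0 else d for i, d in enumerate(dims)]
    if -1 in dims:
        known = prod(d for d in dims if d != -1)
        if known == 0:
            raise ValueError("cannot infer -1 when another dimension is 0")
        dims[dims.index(-1)] = prod(input_shape) // known
    return dims


if __name__ == "__main__":
    # No 0 in the target shape: allowzero makes no difference.
    print(resolve_reshape([2, 3, 4], [6, -1], allowzero=0))
    print(resolve_reshape([2, 3, 4], [6, -1], allowzero=1))
    # With a 0 in the target shape the two modes diverge.
    print(resolve_reshape([2, 3, 4], [0, 12], allowzero=0))
    print(resolve_reshape([2, 3, 4], [0, 12], allowzero=1))
```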

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* Replace -infinity by lowest for masked values (microsoft#26055)

### Description
The softmax implementation in MLAS does not expect an input that
consists only of -infinity; the result is wrong in that case. Masked
values are therefore set to the lowest finite value instead.
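
The numerical issue can be reproduced with a plain max-subtracting softmax in Python. This is a generic reference implementation, not the MLAS kernel, and `-sys.float_info.max` stands in for the "lowest" value of the tensor's floating-point type.

```python
import math
import sys


def softmax(values):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    neg_inf = float("-inf")
    lowest = -sys.float_info.max
    # All -inf: (-inf) - (-inf) is NaN, so every output is NaN.
    print(softmax([neg_inf] * 3))
    # All "lowest": the differences are 0, so the result is uniform.
    print(softmax([lowest] * 3))
```

When every element is masked, `-inf - (-inf)` produces NaN, while a finite lowest value yields a well-defined uniform distribution.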



### Motivation and Context
Fix a numerical issue.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* Update protobuf 3.20 to 4.25.8 (microsoft#26809)

### Description
Update vulnerable version of protobuf from 3.x to 4.x


### Motivation and Context
Resolves underlying compliance issues.

* Upgrade cpuinfo version and update some CPU feature detection code (microsoft#26598)

### Description

Upgrade cpuinfo version.
Update ARM64 Windows feature detection to use cpuinfo functions which
are now implemented.

### Motivation and Context

Additional ARM64 Windows feature detection was implemented in cpuinfo in
[this
commit](pytorch/cpuinfo@f7b233b).
Update ORT to use the newly implemented feature detection.

* Enable Free-threaded Python (PEP 703) support for Python 3.13t+ (microsoft#26786)

This PR resolves the `RuntimeWarning` encountered when importing
`onnxruntime` in free-threaded Python environments (e.g., Python 3.13t,
3.14t).

Previously, the module did not explicitly declare that it could run
safely without the GIL, causing the interpreter to re-enable the GIL at
runtime.

### The Warning

```text
RuntimeWarning: The global interpreter lock (GIL) has been enabled to load module
'onnxruntime.capi.onnxruntime_pybind11_state', which has not declared that it can
run safely without the GIL.
```

## Changes

### 1. Build System (`cmake/onnxruntime_python.cmake`)

* Added robust detection logic to check if the current Python
interpreter is free-threaded.

**Detection strategy:**

* **Primary:** Check `sysconfig.get_config_var('Py_GIL_DISABLED')` (PEP
703 standard).

* **Fallback:** Inspect ABI flags (`ABIFLAGS` or `SOABI`) for the `t`
suffix (e.g., `cp313t`), handling cases on Windows where config
variables may return `None` or empty strings.

* **Caching:** Cache the result as `ORT_PYTHON_FREE_THREADED_DETECTED`
(`BOOL`) to avoid re-running detection on every configure.
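
The detection strategy above lives in CMake, but the same logic can be sketched in Python for clarity. The function `is_free_threaded` and its `config_vars` parameter are hypothetical and exist only so the sketch can be exercised with sample values; the actual CMake code may differ in detail.

```python
import re
import sysconfig


def is_free_threaded(config_vars=None):
    """Best-effort check for a free-threaded (PEP 703) interpreter."""
    if config_vars is None:
        config_vars = sysconfig.get_config_vars()
    # Primary: Py_GIL_DISABLED is 1 on free-threaded builds.
    if str(config_vars.get("Py_GIL_DISABLED") or "").strip() == "1":
        return True
    # Fallback: the variable may be None or empty, so inspect the ABI tags.
    if "t" in (config_vars.get("ABIFLAGS") or ""):
        return True
    soabi = config_vars.get("SOABI") or ""
    # Matches tags such as "cp313t" or "cpython-313t-...".
    return re.search(r"cp(?:ython-)?\d+t", soabi) is not None


if __name__ == "__main__":
    print("free-threaded:", is_free_threaded())
```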

---

### 2. C++ Source (`onnxruntime/python/onnxruntime_pybind_module.cc`)

* Updated the `PYBIND11_MODULE` definition to conditionally enable the
GIL-free slot.
* Uses `py::mod_gil_not_used()` (available in pybind11 ≥ 2.13) when
`Py_GIL_DISABLED` is defined.

---

## Verification

**Environment:**

* Python 3.14t (Free-threaded)
* Windows & Linux

### Before Change

```bash
python3.14t -c "import onnxruntime"
```

Resulted in the `RuntimeWarning` shown above.

### After Change

1. **Build log confirms correct detection:**

```text
-- Python_EXECUTABLE: D:\py314t\Scripts\python.exe
-- Checking for Free-threaded Python environment...
-- Py_GIL_DISABLED=1 detected: Enabling free-threaded support for onnxruntime_pybind11_state
```

2. **Runtime behavior:**

```bash
python3.14t -c "import onnxruntime"
```

Runs silently with no warnings, confirming the module loads successfully
with the GIL disabled.
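
The same check can be scripted. The sketch below is an assumption about how one might automate it, not part of the PR: the helper `import_and_check_gil` is hypothetical, it relies on `sys._is_gil_enabled()` (available in Python 3.13+, treated as "enabled" on older versions), and it only observes the warning if the module has not been imported earlier in the process.

```python
import importlib
import sys
import warnings


def import_and_check_gil(module_name):
    """Import a module and report GIL-related warnings and the GIL state."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        importlib.import_module(module_name)
    gil_warnings = [
        str(w.message)
        for w in caught
        if issubclass(w.category, RuntimeWarning)
        and "global interpreter lock" in str(w.message)
    ]
    # Older interpreters have no such function; the GIL is always on there.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return gil_warnings, is_gil_enabled()


if __name__ == "__main__":
    # Replace "json" with "onnxruntime" in an environment where it is installed.
    print(import_and_check_gil("json"))
```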

## Related Issue

* https://github.com/microsoft/onnxruntime/issues/26780

* fix npm packaging pipeline (microsoft#26810)

* Fix ExpandDims axis handling bug. (microsoft#26805)

### Description

Fix some issues with axis handling of ExpandDims.
- It was improperly getting the rank with `TensorShape::Size()` which
actually gets the number of elements.
- The range of valid axis values is actually `[-(rank+1), rank]`.
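
A short Python sketch of the corrected axis handling follows. The function names are hypothetical and shapes are plain tuples. The key points are that the rank is the number of dimensions (not the element count) and that the output has `rank + 1` dimensions, which is why `rank` itself is a valid axis.

```python
def normalize_expand_dims_axis(axis, rank):
    """Map an axis in [-(rank+1), rank] to a non-negative insert position."""
    if not -(rank + 1) <= axis <= rank:
        raise ValueError(f"axis {axis} out of range [{-(rank + 1)}, {rank}]")
    return axis + rank + 1 if axis < 0 else axis


def expand_dims_shape(shape, axis):
    """Return the shape after inserting a dimension of size 1 at `axis`."""
    rank = len(shape)  # number of dimensions, not number of elements
    position = normalize_expand_dims_axis(axis, rank)
    return tuple(shape[:position]) + (1,) + tuple(shape[position:])


if __name__ == "__main__":
    print(expand_dims_shape((2, 3), 0))
    print(expand_dims_shape((2, 3), -1))
```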

### Motivation and Context

Fix crash on out-of-range axis values. Now it will error out instead.

* [QNN_EP] ConvTranspose not calculating "pad" if "output_shape" is given. (microsoft#26665)

### Description
When `output_shape` is given for ConvTranspose, the appropriate `pads`
should be calculated using the formula in the ONNX operator specification:
https://onnx.ai/onnx/operators/onnx__ConvTranspose.html#convtranspose
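
For reference, the formula from the ONNX ConvTranspose specification can be written as a short Python sketch. The function name `conv_transpose_pads` is hypothetical and this is not the QNN EP code; the sketch assumes the resulting total padding is non-negative.

```python
def conv_transpose_pads(input_size, output_shape, kernel_shape, strides,
                        dilations=None, output_padding=None, auto_pad="NOTSET"):
    """Compute ONNX-style pads [begin..., end...] from a given output_shape."""
    n = len(input_size)
    dilations = dilations or [1] * n
    output_padding = output_padding or [0] * n
    begins, ends = [], []
    for i in range(n):
        total = (strides[i] * (input_size[i] - 1) + output_padding[i]
                 + ((kernel_shape[i] - 1) * dilations[i] + 1) - output_shape[i])
        half = total // 2
        if auto_pad == "SAME_UPPER":
            begin, end = half, total - half
        else:
            begin, end = total - half, half
        begins.append(begin)
        ends.append(end)
    return begins + ends


if __name__ == "__main__":
    print(conv_transpose_pads([3, 3], [6, 7], [3, 3], [2, 2]))
```

Leaving the pads at zero in this example would imply an output of 7 along the first spatial axis, not the requested 6.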



### Motivation and Context
The current QNN EP implementation does not handle this case and sets the
pads to the default value (zeros, i.e. no padding).

Signed-off-by: ankus <ankus@qti.qualcomm.com>
Co-authored-by: ankus-qti <ankus@qti.qualcomm.com>

* Add reducemin, reducemax nodes before the 1st consumer of tensor (microsoft#26768)

## Significantly reduces peak memory usage during MinMax calibration

## Description
During MinMax calibration, the ReduceMin and ReduceMax nodes were
appended to the end of the node list. Because they have no consumers,
they came last in the topological execution order. As a result, the
intermediate tensors could not be freed: each one stayed in memory until
its ReduceMin and ReduceMax nodes consumed it. This PR reorders the node
list so that, in topological order, the ReduceMin and ReduceMax nodes
execute before the original first consumer of the tensor. The memory can
then be freed as soon as the original first consumer has consumed the
tensor.
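
The reordering can be sketched in Python on a simplified node list. Nodes are plain dictionaries here, the list is assumed to be topologically sorted already, and the function name `add_minmax_nodes` and the output naming scheme are hypothetical, not the calibrator's actual code.

```python
def add_minmax_nodes(nodes, tensor):
    """Insert ReduceMin/ReduceMax for `tensor` just before its first consumer."""
    reduce_nodes = [
        {"op": op, "inputs": [tensor], "outputs": [f"{tensor}_{op}"]}
        for op in ("ReduceMin", "ReduceMax")
    ]
    for index, node in enumerate(nodes):
        if tensor in node["inputs"]:
            # Place the reductions ahead of the first consumer.
            return nodes[:index] + reduce_nodes + nodes[index:]
    # No consumer (e.g. a graph output): appending is the only option.
    return nodes + reduce_nodes


if __name__ == "__main__":
    graph = [
        {"op": "MatMul", "inputs": ["x", "w"], "outputs": ["t"]},
        {"op": "Relu", "inputs": ["t"], "outputs": ["y"]},
    ]
    print([n["op"] for n in add_minmax_nodes(graph, "t")])
```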

## Motivation and Context
During MinMax calibration of larger LLMs such as phi4 14b, even 80 GB
A100 GPUs were not sufficient. Calibration always failed with a CUDA OOM
error before the first inference completed. This PR addresses that issue
by significantly reducing the peak memory requirement of MinMax
calibration.

Co-authored-by: Ronak Mahawar <rmahawar@qti.qualcomm.com>

* [Quantization] Fix static quantize runner usage. (microsoft#26624)

### Description
- Input pb files were read in the wrong order.
- Cause: Python `sorted` was used to order the input files. However,
this yields the wrong order because "10" is lexicographically smaller
than "2".
  - Fix: Enumerate indices to read the input files.
- CumSum's output wasn't quantized.
  - Cause: CumSum wasn't registered into QDQ registry.
  - Fix: Register CumSum with QDQDirect8bitOp.
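
The ordering problem is easy to reproduce in Python. The file name pattern `input_{}.pb` below is an assumption for illustration only; the actual names used by `static_quantize_runner` may differ.

```python
def index_order(count, pattern="input_{}.pb"):
    """Build file names by enumerating indices instead of sorting strings."""
    return [pattern.format(i) for i in range(count)]


if __name__ == "__main__":
    names = index_order(12)
    # Lexicographic sorting puts "input_10.pb" before "input_2.pb".
    print(sorted(names)[:4])
    # Enumerating indices keeps the numeric order.
    print(names[:4])
```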


### Motivation and Context
Fix two issues in `static_quantize_runner` usage.

* Normalize protobuf integration from 4.25 to 6.33 (microsoft#26817)

### Description

Update protobuf from version 4.25.1 to 6.33.0

### Motivation and Context

There is an underlying security vulnerability with 4.25.1. Updating to
6.33.0 will improve our security posture.

* WebGPU: Transpose Conv kernels in Prepack (microsoft#26675)

Prepack Conv kernels with path-aware transpose decisions, store the
transposed kernels for reuse, and add ComputeContextBase helpers for
node access and GPU buffer unmapping.

* [QNN EP] Upgrade QNN to 2.41.0 (microsoft#26812)

### Description

Update the default QNN version to 2.41.0.251128.

* [ROCM] Add back --rocm_version and --rocm_home (microsoft#26819)

`--rocm_version` and `--rocm_home` were removed in microsoft#26712. Add
them back for now, until AMD updates their pipelines.

See microsoft#26801 for details.

* Bump actions/download-artifact from 6 to 7 (microsoft#26793)

Bumps
[actions/download-artifact](https://github.com/actions/download-artifact)
from 6 to 7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/download-artifact/releases">actions/download-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v7.0.0</h2>
<h2>v7 - What's new</h2>
<blockquote>
<p>[!IMPORTANT]
actions/download-artifact@v7 now runs on Node.js 24 (<code>runs.using:
node24</code>) and requires a minimum Actions Runner version of 2.327.1.
If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<h3>Node.js 24</h3>
<p>This release updates the runtime to Node.js 24. v6 had preliminary
support for Node 24, however this action was by default still running on
Node.js 20. Now this action by default will run on Node.js 24.</p>
<h2>What's Changed</h2>
<ul>
<li>Update GHES guidance to include reference to Node 20 version by <a
href="https://github.com/patrikpolyak"><code>@patrikpolyak</code></a>
in <a
href="https://redirect.github.com/actions/download-artifact/pull/440">actions/download-artifact#440</a></li>
<li>Download Artifact Node24 support by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/download-artifact/pull/415">actions/download-artifact#415</a></li>
<li>fix: update <code>@actions/artifact</code> to fix Node.js 24
punycode deprecation by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/download-artifact/pull/451">actions/download-artifact#451</a></li>
<li>prepare release v7.0.0 for Node.js 24 support by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/download-artifact/pull/452">actions/download-artifact#452</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/patrikpolyak"><code>@patrikpolyak</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/download-artifact/pull/440">actions/download-artifact#440</a></li>
<li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/download-artifact/pull/415">actions/download-artifact#415</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/download-artifact/compare/v6.0.0...v7.0.0">https://github.com/actions/download-artifact/compare/v6.0.0...v7.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/actions/download-artifact/commit/37930b1c2abaa49bbe596cd826c3c89aef350131"><code>37930b1</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/download-artifact/issues/452">#452</a>
from actions/download-artifact-v7-release</li>
<li><a
href="https://github.com/actions/download-artifact/commit/72582b9e0acd370909e83fa4a1fd0fca3ad452d8"><code>72582b9</code></a>
doc: update readme</li>
<li><a
href="https://github.com/actions/download-artifact/commit/0d2ec9d4cbcefe257d822f108de2a1f15f8da9f6"><code>0d2ec9d</code></a>
chore: release v7.0.0 for Node.js 24 support</li>
<li><a
href="https://github.com/actions/download-artifact/commit/fd7ae8fda6dc16277a9ffbc91cdb0eedf156e912"><code>fd7ae8f</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/download-artifact/issues/451">#451</a>
from actions/fix-storage-blob</li>
<li><a
href="https://github.com/actions/download-artifact/commit/d484700543354b15886d6a52910cf61b7f1d2b27"><code>d484700</code></a>
chore: restore minimatch.dep.yml license file</li>
<li><a
href="https://github.com/actions/download-artifact/commit/03a808050efe42bb6ad85281890afd4e4546672c"><code>03a8080</code></a>
chore: remove obsolete dependency license files</li>
<li><a
href="https://github.com/actions/download-artifact/commit/56fe6d904b0968950f8b68ea17774c54973ed5e2"><code>56fe6d9</code></a>
chore: update <code>@actions/artifact</code> license file to 5.0.1</li>
<li><a
href="https://github.com/actions/download-artifact/commit/8e3ebc4ab4d2e095e5eb44ba1a4a53b6b03976ad"><code>8e3ebc4</code></a>
chore: update package-lock.json with <code>@actions/artifact@5.0.1</code></li>
<li><a
href="https://github.com/actions/download-artifact/commit/1e3c4b4d4906c98ab57453c24efefdf16c078044"><code>1e3c4b4</code></a>
fix: update <code>@actions/artifact</code> to ^5.0.0 for Node.js 24
punycode fix</li>
<li><a
href="https://github.com/actions/download-artifact/commit/458627d354794c71bc386c8d5839d20b5885fe2a"><code>458627d</code></a>
chore: use local <code>@actions/artifact</code> package for Node.js 24
testing</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/download-artifact/compare/v6...v7">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump actions/upload-artifact from 5 to 6 (microsoft#26794)

Bumps
[actions/upload-artifact](https://github.com/actions/upload-artifact)
from 5 to 6.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/upload-artifact/releases">actions/upload-artifact's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.0</h2>
<h2>v6 - What's new</h2>
<blockquote>
<p>[!IMPORTANT]
actions/upload-artifact@v6 now runs on Node.js 24 (<code>runs.using:
node24</code>) and requires a minimum Actions Runner version of 2.327.1.
If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<h3>Node.js 24</h3>
<p>This release updates the runtime to Node.js 24. v5 had preliminary
support for Node.js 24, however this action was by default still running
on Node.js 20. Now this action by default will run on Node.js 24.</p>
<h2>What's Changed</h2>
<ul>
<li>Upload Artifact Node 24 support by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/719">actions/upload-artifact#719</a></li>
<li>fix: update <code>@actions/artifact</code> for Node.js 24 punycode
deprecation by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/744">actions/upload-artifact#744</a></li>
<li>prepare release v6.0.0 for Node.js 24 support by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/upload-artifact/pull/745">actions/upload-artifact#745</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0">https://github.com/actions/upload-artifact/compare/v5.0.0...v6.0.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/actions/upload-artifact/commit/b7c566a772e6b6bfb58ed0dc250532a479d7789f"><code>b7c566a</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/745">#745</a>
from actions/upload-artifact-v6-release</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/e516bc8500aaf3d07d591fcd4ae6ab5f9c391d5b"><code>e516bc8</code></a>
docs: correct description of Node.js 24 support in README</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/ddc45ed9bca9b38dbd643978d88e3981cdc91415"><code>ddc45ed</code></a>
docs: update README to correct action name for Node.js 24 support</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/615b319bd27bb32c3d64dca6b6ed6974d5fbe653"><code>615b319</code></a>
chore: release v6.0.0 for Node.js 24 support</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/017748b48f8610ca8e6af1222f4a618e84a9c703"><code>017748b</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/upload-artifact/issues/744">#744</a>
from actions/fix-storage-blob</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/38d4c7997f5510fcc41fc4aae2a6b97becdbe7fc"><code>38d4c79</code></a>
chore: rebuild dist</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/7d27270e0cfd253e666c44abac0711308d2d042f"><code>7d27270</code></a>
chore: add missing license cache files for <code>@actions/core</code>,
<code>@actions/io</code>, and mi...</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/5f643d3c9475505ccaf26d686ffbfb71a8387261"><code>5f643d3</code></a>
chore: update license files for <code>@actions/artifact@5.0.1</code> dependencies</li>
<li><a
href="https://github.com/actions/upload-artifact/commit/1df1684032c88614064493e1a0478fcb3583e1d0"><code>1df1684</code></a>
chore: update package-lock.json with <code>@actions/artifact@5.0.1</code></li>
<li><a
href="https://github.com/actions/upload-artifact/commit/b5b1a918401ee270935b6b1d857ae66c85f3be6f"><code>b5b1a91</code></a>
fix: update <code>@actions/artifact</code> to ^5.0.0 for Node.js 24
punycode fix</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/upload-artifact/compare/v5...v6">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [QNN-EP] Restrict zero pads validation check to NPU backend (microsoft#26534)

### Description
- Pad ops with all-zero padding values are allowed in the QNN GPU
backend.
- Restrict a recently added validation check in the op builder so that
it applies only to the NPU backend.

* Bump actions/cache from 4 to 5 (microsoft#26795)

Bumps [actions/cache](https://github.com/actions/cache) from 4 to 5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/releases">actions/cache's
releases</a>.</em></p>
<blockquote>
<h2>v5.0.0</h2>
<blockquote>
<p>[!IMPORTANT]
<strong><code>actions/cache@v5</code> runs on the Node.js 24 runtime and
requires a minimum Actions Runner version of
<code>2.327.1</code>.</strong></p>
<p>If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<hr />
<h2>What's Changed</h2>
<ul>
<li>Upgrade to use node24 by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1630">actions/cache#1630</a></li>
<li>Prepare v5.0.0 release by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1684">actions/cache#1684</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v4.3.0...v5.0.0">https://github.com/actions/cache/compare/v4.3.0...v5.0.0</a></p>
<h2>v4.3.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add note on runner versions by <a
href="https://github.com/GhadimiR"><code>@GhadimiR</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1642">actions/cache#1642</a></li>
<li>Prepare <code>v4.3.0</code> release by <a
href="https://github.com/Link"><code>@Link</code></a>- in <a
href="https://redirect.github.com/actions/cache/pull/1655">actions/cache#1655</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/GhadimiR"><code>@GhadimiR</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1642">actions/cache#1642</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v4...v4.3.0">https://github.com/actions/cache/compare/v4...v4.3.0</a></p>
<h2>v4.2.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Update README.md by <a
href="https://github.com/nebuk89"><code>@nebuk89</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1620">actions/cache#1620</a></li>
<li>Upgrade <code>@actions/cache</code> to <code>4.0.5</code> and move
<code>@protobuf-ts/plugin</code> to dev dependencies by <a
href="https://github.com/Link"><code>@Link</code></a>- in <a
href="https://redirect.github.com/actions/cache/pull/1634">actions/cache#1634</a></li>
<li>Prepare release <code>4.2.4</code> by <a
href="https://github.com/Link"><code>@Link</code></a>- in <a
href="https://redirect.github.com/actions/cache/pull/1636">actions/cache#1636</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/nebuk89"><code>@nebuk89</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1620">actions/cache#1620</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v4...v4.2.4">https://github.com/actions/cache/compare/v4...v4.2.4</a></p>
<h2>v4.2.3</h2>
<h2>What's Changed</h2>
<ul>
<li>Update to use <code>@actions/cache</code> 4.0.3 package &amp;
prepare for new release by <a
href="https://github.com/salmanmkc"><code>@salmanmkc</code></a> in <a
href="https://redirect.github.com/actions/cache/pull/1577">actions/cache#1577</a>
(SAS tokens for cache entries are now masked in debug logs)</li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a href="https://github.com/salmanmkc"><code>@salmanmkc</code></a>
made their first contribution in <a
href="https://redirect.github.com/actions/cache/pull/1577">actions/cache#1577</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/cache/compare/v4.2.2...v4.2.3">https://github.com/actions/cache/compare/v4.2.2...v4.2.3</a></p>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/actions/cache/blob/main/RELEASES.md">actions/cache's
changelog</a>.</em></p>
<blockquote>
<h1>Releases</h1>
<h2>Changelog</h2>
<h3>5.0.1</h3>
<ul>
<li>Update <code>@azure/storage-blob</code> to <code>^12.29.1</code> via
<code>@actions/cache@5.0.1</code> <a
href="https://redirect.github.com/actions/cache/pull/1685">#1685</a></li>
</ul>
<h3>5.0.0</h3>
<blockquote>
<p>[!IMPORTANT]
<code>actions/cache@v5</code> runs on the Node.js 24 runtime and
requires a minimum Actions Runner version of <code>2.327.1</code>.
If you are using self-hosted runners, ensure they are updated before
upgrading.</p>
</blockquote>
<h3>4.3.0</h3>
<ul>
<li>Bump <code>@actions/cache</code> to <a
href="https://redirect.github.com/actions/toolkit/pull/2132">v4.1.0</a></li>
</ul>
<h3>4.2.4</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.5</li>
</ul>
<h3>4.2.3</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.3 (obfuscates SAS token in
debug logs for cache entries)</li>
</ul>
<h3>4.2.2</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.2</li>
</ul>
<h3>4.2.1</h3>
<ul>
<li>Bump <code>@actions/cache</code> to v4.0.1</li>
</ul>
<h3>4.2.0</h3>
<p>TLDR; The cache backend service has been rewritten from the ground up
for improved performance and reliability. <a
href="https://github.com/actions/cache">actions/cache</a> now integrates
with the new cache service (v2) APIs.</p>
<p>The new service will gradually roll out as of <strong>February 1st,
2025</strong>. The legacy service will also be sunset on the same date.
Changes in this release are <strong>fully backward
compatible</strong>.</p>
<p><strong>We are deprecating some versions of this action</strong>. We
recommend upgrading to version <code>v4</code> or <code>v3</code> as
soon as possible before <strong>February 1st, 2025.</strong> (Upgrade
instructions below).</p>
<p>If you are using pinned SHAs, please use the SHAs of versions
<code>v4.2.0</code> or <code>v3.4.0</code></p>
<p>If you do not upgrade, all workflow runs using any of the deprecated
<a href="https://github.com/actions/cache">actions/cache</a> will
fail.</p>
<p>Upgrading to the recommended versions will not break your
workflows.</p>
<h3>4.1.2</h3>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/actions/cache/commit/9255dc7a253b0ccc959486e2bca901246202afeb"><code>9255dc7</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1686">#1686</a>
from actions/cache-v5.0.1-release</li>
<li><a
href="https://github.com/actions/cache/commit/8ff5423e8b66eacab4e638ee52abbd2cb831366a"><code>8ff5423</code></a>
chore: release v5.0.1</li>
<li><a
href="https://github.com/actions/cache/commit/9233019a152bc768059ac1768b8e4403b5da16c1"><code>9233019</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1685">#1685</a>
from salmanmkc/node24-storage-blob-fix</li>
<li><a
href="https://github.com/actions/cache/commit/b975f2bb844529e1063ad882c609b224bcd66eb6"><code>b975f2b</code></a>
fix: add peer property to package-lock.json for dependencies</li>
<li><a
href="https://github.com/actions/cache/commit/d0a0e1813491d01d574c95f8d189f62622bbb2ae"><code>d0a0e18</code></a>
fix: update license files for <code>@actions/cache</code>,
fast-xml-parser, and strnum</li>
<li><a
href="https://github.com/actions/cache/commit/74de208dcfcbe85c0e7154e7b17e4105fe2554ff"><code>74de208</code></a>
fix: update <code>@actions/cache</code> to ^5.0.1 for Node.js 24
punycode fix</li>
<li><a
href="https://github.com/actions/cache/commit/ac7f1152ead02e89c14b5456d14ab17591e74cfb"><code>ac7f115</code></a>
peer</li>
<li><a
href="https://github.com/actions/cache/commit/b0f846b50b6061d7a2ca6f1a2fea61d4a65d1a16"><code>b0f846b</code></a>
fix: update <code>@actions/cache</code> with storage-blob fix for
Node.js 24 punycode depr...</li>
<li><a
href="https://github.com/actions/cache/commit/a7833574556fa59680c1b7cb190c1735db73ebf0"><code>a783357</code></a>
Merge pull request <a
href="https://redirect.github.com/actions/cache/issues/1684">#1684</a>
from actions/prepare-cache-v5-release</li>
<li><a
href="https://github.com/actions/cache/commit/3bb0d78750a39cefce0c2b5a0a9801052b4359ad"><code>3bb0d78</code></a>
docs: highlight v5 runner requirement in releases</li>
<li>Additional commits viewable in <a
href="https://github.com/actions/cache/compare/v4...v5">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [QNN EP] Fix Clip op with min or max from QDQ (microsoft#26601)

## Motivation
The QDQ node group selection logic currently routes the `Clip` op to
`UnaryNodeGroupSelector`. That selector does not properly handle the case
where the `Clip` op receives `min`/`max` from Q/DQ ops (these are still
constant initializers).

<img width="255" height="378" alt="image-2025-11-18-11-49-19-156"
src="https://github.com/user-attachments/assets/ec6250ee-68f3-40fa-8f60-93b1a400d5a0"
/>

## Changes:

- Implement a custom NodeGroupSelector so that the `Clip` op is properly
tagged for the backend to consume.
- Fix QNN EP `Clip` min/max parsing and de-quantize the values when needed.
- Add unit tests for both changes.
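
The de-quantize step for a `Clip` bound that arrives through a DQ node can be sketched as follows. This is a minimal illustration of standard linear dequantization, `(q - zero_point) * scale`, and not the QNN EP code; the helper names and the sample scale and zero-point values are hypothetical.

```python
def dequantize(q, scale, zero_point):
    """Standard linear dequantization of a single quantized value."""
    return (q - zero_point) * scale


def resolve_clip_bound(value, scale=None, zero_point=0):
    """Return a float Clip bound (hypothetical helper).

    When scale is None the bound is a plain float initializer. Otherwise it
    is a quantized constant feeding a DQ node and is de-quantized first.
    """
    if scale is None:
        return float(value)
    return dequantize(int(value), scale, zero_point)


# Quantized bounds with an assumed scale of 0.5 and zero point of 0.
clip_min = resolve_clip_bound(0, scale=0.5, zero_point=0)
clip_max = resolve_clip_bound(12, scale=0.5, zero_point=0)
```

With these assumed parameters the quantized bounds 0 and 12 map to the float range used by a ReLU6-style clip.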

* Address DML crashes in WebNN QDQ subgraph tests  (microsoft#26822)

### Description
This change proposes a fix to some DML crashes observed in WebNN QDQ
subgraph tests. The root cause of the problem as analyzed in the issue
linked below is that the DML EP does not have kernels implemented for
the QLinear versions of LeakyRelu and Softmax. The QDQ transformer has
logic to fuse a 3-node sequence in the graph to the associated QLinear
custom op. There is an operator registry which contains the mapping
between EPs and the ops that the transformer consults for computing the
quantized variants.

The registry currently has the CPU and DML EPs supporting the same sets
of operators (thereby assuming that the fused node would also be
supported by the EP), so what I did was split out a separate
registration for the LeakyRelu and Softmax ops and associate them with
the CPU EP, while removing those ops from the prior list.
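
A minimal sketch of the registration split described above, assuming a simplified registry that maps each fusable op to the set of EPs allowed to receive the fused QLinear node. The `register` and `can_fuse` helpers and the `Add`/`Mul` placeholder ops are hypothetical; this is not the ONNX Runtime registry code.

```python
CPU_EP = "CPUExecutionProvider"
DML_EP = "DmlExecutionProvider"

# op type -> EPs for which the QDQ transformer may create the QLinear variant
registry = {}


def register(ops, eps):
    """Associate each op with the EPs assumed to have a QLinear kernel for it."""
    for op in ops:
        registry.setdefault(op, set()).update(eps)


# Placeholder ops that stay registered for both EPs.
register(["Add", "Mul"], [CPU_EP, DML_EP])
# Split-out registration: these two are associated with the CPU EP only.
register(["LeakyRelu", "Softmax"], [CPU_EP])


def can_fuse(op_type, node_ep):
    """Fuse DQ -> op -> Q only if the node's EP is registered for that op."""
    return node_ep in registry.get(op_type, set())
```

Under this model a `Softmax` node assigned to the DML EP is left as a QDQ sequence instead of being fused into an op the EP cannot run.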

I also explored other options, such as checking whether the EP has a
supported kernel for the op before proceeding with the assignment, but I
could not find a way to do this.

I validated this change by building a private version of ORT and running
the affected tests in Edge Canary. The crash did not occur for those
tests with the fix in place.

### Motivation and Context
This bug causes crashes in the web browser's GPU sandbox process, which
is a blocker for WebNN origin trials. The crashes need to be resolved.

Fixes microsoft#26531.

---------

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>

* [QNN-EP] Update gather op input tensor cast logic. (microsoft#26835)

### Description
The Gather op builder referred to the ONNX graph when deciding whether to
insert `Cast->int32` on the indices. However, the input tensor is created by
QNN and may already have been cast to int32. This mismatch resulted in a
redundant Cast being added. This PR changes the Gather op builder to check the
QNN tensor before adding the int64->int32 cast.
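
The decision change can be sketched as below. This is an illustrative model only: the function names are hypothetical, data types are represented as strings, and the fallback to the ONNX type when no QNN tensor exists yet is an assumption.

```python
from typing import Optional


def needs_cast_from_onnx(onnx_dtype: str) -> bool:
    """Old behavior: decide from the ONNX graph type alone."""
    return onnx_dtype == "int64"


def needs_cast_from_qnn(onnx_dtype: str, qnn_dtype: Optional[str]) -> bool:
    """New behavior: prefer the type of the already-created QNN tensor."""
    effective = qnn_dtype if qnn_dtype is not None else onnx_dtype
    return effective == "int64"


# Indices are int64 in the ONNX graph, but the QNN tensor was already cast.
redundant_before = needs_cast_from_onnx("int64")
redundant_after = needs_cast_from_qnn("int64", "int32")
```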


### Motivation and Context
This prevents the QNN Gather op builder from inserting a redundant
Cast->int32.

---------

Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* [WebGPU] Fix Conv prepacking mismatch with Im2ColMatMul path detection (microsoft#26833)

Fix a bug where Conv kernel prepacking could incorrectly happen when the
Im2ColMatMul execution path would be taken at runtime. This caused a
"Missing Input" error because:

1. In PrePackInternal, the is_fused template parameter was passed to
CanApplyIm2ColMatMulProgram
2. In ComputeInternal, activation_.activation_kind_ !=
ActivationKind::None was passed instead

For fused Conv kernels (is_fused=true) without an actual activation set,
this mismatch caused CanApplyIm2ColMatMulProgram to return false during
prepacking (allowing prepacking to occur) but true at runtime (selecting
the Im2ColMatMul path which needs context.Input(1)).

The fix uses the same activation check in both places to ensure
consistent path detection between prepack time and compute time.
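
The mismatch can be modeled in a few lines. This is a sketch under simplifying assumptions: `can_apply_im2col_matmul` is a placeholder that depends only on the activation flag, whereas the real `CanApplyIm2ColMatMulProgram` takes more inputs, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConvKernel:
    is_fused: bool        # compile-time flag: kernel was built as a fused Conv
    has_activation: bool  # runtime state: activation kind is not None


def can_apply_im2col_matmul(activation_flag):
    """Placeholder predicate: the real check takes more inputs than this."""
    return not activation_flag


def prepacks_weights_buggy(kernel):
    # Prepack time consulted is_fused ...
    return not can_apply_im2col_matmul(kernel.is_fused)


def takes_im2col_path(kernel):
    # ... while compute time consulted the actual activation.
    return can_apply_im2col_matmul(kernel.has_activation)


def prepacks_weights_fixed(kernel):
    # Same argument as compute time, so both decisions always agree.
    return not can_apply_im2col_matmul(kernel.has_activation)


# Fused kernel with no activation set: the case that triggered the bug.
kernel = ConvKernel(is_fused=True, has_activation=False)
conflict_before = prepacks_weights_buggy(kernel) and takes_im2col_path(kernel)
conflict_after = prepacks_weights_fixed(kernel) and takes_im2col_path(kernel)
```

A conflict means the weights were prepacked even though the runtime path still expects the original weight input.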

* fix crash in webgpu init when OrtEnv is re-created (microsoft#26836)

Fixes a crash seen with Chrome -> WebNN -> WebGPU EP when OrtEnv is
recreated.
The issue is that on OrtEnv destruction we null the default_instance, but
at init the creation of the default_instance is under std::call_once, which
runs only once per process, so the instance is not created again for the new
OrtEnv.
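
The failure mode can be reproduced with a small Python model of `std::call_once`. Everything here is illustrative: `OnceFlag`, `BuggyFactory`, and `FixedFactory` are hypothetical names, and the lock-guarded "create if missing" variant is just one possible remedy, since the description does not spell out the actual fix.

```python
import threading


class OnceFlag:
    """Mimics std::once_flag: the callable runs at most once, ever."""

    def __init__(self):
        self._lock = threading.Lock()
        self._done = False

    def call_once(self, fn):
        with self._lock:
            if not self._done:
                fn()
                self._done = True


class BuggyFactory:
    _once = OnceFlag()
    _default = None

    @classmethod
    def _create(cls):
        cls._default = object()

    @classmethod
    def default(cls):
        cls._once.call_once(cls._create)  # skipped after the first call
        return cls._default

    @classmethod
    def on_env_destroyed(cls):
        cls._default = None  # the once-flag is not reset


class FixedFactory:
    _lock = threading.Lock()
    _default = None

    @classmethod
    def default(cls):
        with cls._lock:
            if cls._default is None:  # re-create whenever it is missing
                cls._default = object()
            return cls._default

    @classmethod
    def on_env_destroyed(cls):
        with cls._lock:
            cls._default = None
```

In the buggy variant the second environment observes a null default instance; in the fixed variant it gets a fresh one.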

* Add Visual Studio 2026 CMake generator support (microsoft#26802)

### Description

Add "Visual Studio 18 2026" to the list of supported CMake generators in
the build script, matching the CMake 4.2 release that added VS 2026
support.

**Changes:**
- Added `"Visual Studio 18 2026"` to `--cmake_generator` choices in
`build_args.py`
- Updated fuzz testing generator validation to accept both VS 2022 and
VS 2026

**Usage:**
```bash
./build.py --cmake_generator "Visual Studio 18 2026" --build_dir build
```
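
A sketch of how such a `choices` restriction behaves with `argparse`. The parser below is a stand-in and not the contents of `build_args.py`; the other generator names are the ones the build script previously accepted, and the `make_parser` helper is hypothetical.

```python
import argparse

CMAKE_GENERATORS = [
    "MinGW Makefiles",
    "Ninja",
    "NMake Makefiles",
    "NMake Makefiles JOM",
    "Unix Makefiles",
    "Visual Studio 17 2022",
    "Visual Studio 18 2026",  # newly accepted value
    "Xcode",
]


def make_parser():
    """Build a reduced parser with only the two options used in the example."""
    parser = argparse.ArgumentParser(prog="build.py")
    parser.add_argument("--cmake_generator", choices=CMAKE_GENERATORS)
    parser.add_argument("--build_dir")
    return parser


args = make_parser().parse_args(
    ["--cmake_generator", "Visual Studio 18 2026", "--build_dir", "build"]
)
```

Any value outside the list makes `argparse` print a usage error and exit, which is the behavior users hit before the new generator name was added.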

### Motivation and Context

CMake 4.2+ includes Visual Studio 18 2026 generator support. ONNX
Runtime's build script only supported up to Visual Studio 17 2022,
blocking users with VS 2026 installations.


---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>

* Proposed changes to model compilation telemetry (microsoft#26804)

### Description
This change adjusts the compilation telemetry to span the start and stop
of the operation, with additional recording of various facets of the
compilation session and information about the result.
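
One way to picture telemetry that spans the start and stop of an operation is a context manager that emits paired events. This is a generic sketch, not the ONNX Runtime telemetry code: the event names, the field names, and the in-memory `events` list are all hypothetical.

```python
import time
from contextlib import contextmanager

# In-memory sink standing in for a real telemetry backend (hypothetical).
events = []


@contextmanager
def compilation_span(session_info):
    """Emit a start event, then a stop event carrying status and duration."""
    start = time.monotonic()
    events.append({"event": "compile_start", **session_info})
    status = "success"
    try:
        yield
    except Exception as exc:
        status = "failure:" + type(exc).__name__
        raise
    finally:
        events.append({
            "event": "compile_stop",
            "status": status,
            "duration_s": time.monotonic() - start,
            **session_info,
        })


with compilation_span({"ep": "example_ep", "input_kind": "file"}):
    pass  # the compilation work would run here
```

Because the stop event is emitted in `finally`, failed compilations are recorded as well as successful ones.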

### Motivation and Context
These changes enable deeper understanding of compilation usage and
health, which in turn can help prioritize more investments in making
compilation better overall for users of the platform (e.g., more
reliable, performant, better tuned options / defaults, etc.).

---------

Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>

* [EP ABI] Add weight pre-packing support to kernel-based plugin EPs (microsoft#26754)

### Description
- Adds C APIs to support pre-packing of constant weights for
`OrtKernelImpl` implementations.
- The APIs optionally support sharing of pre-packed weight data (for
CPU-accessible memory).
- Updates the example kernel (Mul) to use the new pre-packing API. Tested by
an existing unit test:
https://github.com/microsoft/onnxruntime/blob/549d7415e26e2b3f86c42f86e135bb746caa37b4/onnxruntime/test/autoep/test_execution.cc#L242-L256
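
As a conceptual illustration of weight pre-packing with optional sharing, consider the toy model below. It does not reflect the shape of the new C APIs: `SharedPrepackCache`, `MulKernel`, `pre_pack`, and the cache key are all hypothetical, and the "packing" is a trivial conversion standing in for a real layout change.

```python
class SharedPrepackCache:
    """Hypothetical cache that lets kernels share packed weights."""

    def __init__(self):
        self._store = {}

    def get_or_pack(self, key, pack_fn):
        if key not in self._store:
            self._store[key] = pack_fn()
        return self._store[key]


class MulKernel:
    """Toy element-wise Mul kernel whose second input is a constant weight."""

    def __init__(self):
        self.packed_weight = None

    def pre_pack(self, weight, input_index, cache=None):
        """Pack the constant weight once, before any compute call.

        Returns True if the kernel now holds a packed copy of the weight.
        """
        if input_index != 1:
            return False  # only the weight input is pre-packed in this sketch

        def pack():
            return tuple(float(w) for w in weight)

        if cache is not None:
            self.packed_weight = cache.get_or_pack(("Mul", tuple(weight)), pack)
        else:
            self.packed_weight = pack()
        return True

    def compute(self, x):
        return [a * b for a, b in zip(x, self.packed_weight)]


cache = SharedPrepackCache()
k1, k2 = MulKernel(), MulKernel()
k1.pre_pack([2, 3], 1, cache)
k2.pre_pack([2, 3], 1, cache)
```

Both kernels end up pointing at the same packed buffer, which is the benefit that sharing is meant to provide.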


### Motivation and Context
The [previous PR](microsoft#26206)
added the base APIs that support kernel-based plugin EPs. This PR adds
an additional feature that was identified as necessary for the port of
WebGPU EP.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* WebGPU: Use dedicated prepack allocator for kernel prepacking (microsoft#26857)

Remove the CreateUnmappedGPUTensor workaround by using a dedicated prepack
allocator that creates unmapped GPU buffers directly, which avoids the need
to manually unmap buffers after allocation.

* [webgpu] refactor initialization of WebGPU Context params (microsoft#26855)

### Description

Refactor the initialization of WebGPU Context params.

The refactor:
- consolidates all WebGPU options into 2 classes:
  - `WebGpuExecutionProviderConfig`: configuration that is passed to and
stored inside the EP class.
  - `WebGpuContextConfig`: configuration that is passed to construct and
initialize `WebGpuContext`.
- ensures all instances of the classes are created with default values
initialized.
- ensures each of the following happens in only one place:
  - setting default values
  - parsing options
- adds `WebGpuContextFactory::DefaultContext` to allow "get or create" of
the default context.
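
The goals listed above (defaults set in one place, options parsed in one place, and a "get or create" default context) can be sketched with a small Python model. The class names only mirror the roles of `WebGpuContextConfig` and `WebGpuContextFactory`; the fields, option keys, and method names are hypothetical.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextConfig:
    # Defaults live only here, so every instance starts fully initialized.
    context_id: int = 0
    validation_mode: str = "basic"

    @classmethod
    def from_options(cls, options):
        """The single place where string options are parsed."""
        defaults = cls()
        return cls(
            context_id=int(options.get("context_id", defaults.context_id)),
            validation_mode=options.get("validation_mode", defaults.validation_mode),
        )


class Context:
    def __init__(self, config):
        self.config = config


class ContextFactory:
    _lock = threading.Lock()
    _contexts = {}

    @classmethod
    def create_context(cls, config):
        """Return the context for config.context_id, creating it if needed."""
        with cls._lock:
            ctx = cls._contexts.get(config.context_id)
            if ctx is None:
                ctx = Context(config)
                cls._contexts[config.context_id] = ctx
            return ctx

    @classmethod
    def default_context(cls):
        """Get or create the context built from the default configuration."""
        return cls.create_context(ContextConfig())
```

Repeated calls to `default_context()` return the same object, and a config parsed from an empty option map equals the default config.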

### Motivation and Context

- Make the code cleaner and more consistent.

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Signed-off-by: ankus <ankus@qti.qualcomm.com>
Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Mu-Chein Hsu <quic_muchhsu@quicinc.com>
Co-authored-by: Hariharan Seshadri <shariharan91@gmail.com>
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
Co-authored-by: Jiawei Shao <jiawei.shao@intel.com>
Co-authored-by: Jiajia Qin <jiajiaqin@microsoft.com>
Co-authored-by: qti-yuduo <yuduow@qti.qualcomm.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: eserscor <erscor@microsoft.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: quic-ankus <quic_ankus@quicinc.com>
Co-authored-by: ankus-qti <ankus@qti.qualcomm.com>
Co-authored-by: rM-planet <145144519+rM-planet@users.noreply.github.com>
Co-authored-by: Ronak Mahawar <rmahawar@qti.qualcomm.com>
Co-authored-by: minfhong-qti <minfhong@qti.qualcomm.com>
Co-authored-by: Jie Chen <jie.a.chen@intel.com>
Co-authored-by: Jeff Kilpatrick <jkilpatrick@qti.qualcomm.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: qti-mattsinc <mattsinc@qti.qualcomm.com>
Co-authored-by: adrastogi <aditya.rastogi@microsoft.com>
Co-authored-by: Aditya Rastogi <adityar@ntdev.microsoft.com>
Co-authored-by: Mike Hsu <quic_muchhsu@quicinc.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: fs-eire <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>