Convert Initializers to OrtValues Phase 2 by yuslepukhin · Pull Request #25320 · microsoft/onnxruntime

added 2 commits (July 7, 2025 19:35):
- Fix reshaping for external weights
- Fix Fusion Helper

jywu-msft changed the title from "Convert Initializers to OrtValues" to "Convert Initializers to OrtValues Phase 2"

Jul 8, 2025


adrianlizarraga added a commit that referenced this pull request

Jul 24, 2025
…tValues (#25482)

### Description
- Adds APIs to get information (file path, file offset, byte size) for
initializers with data in external files. This allows EPs to do their
own custom memory-mapping of initializer data. By default, EPs that
don't have specific requirements can still use
`ValueInfo_GetInitializerValue` to get an `OrtValue` with memory-mapped
initializer data.
- Updates `OrtGraph` to load the `OrtValue` for an external initializer
only on demand. This avoids memory-mapping all external initializers
before the first call to `OrtEp::GetCapability`.
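The on-demand loading described above can be sketched as follows. This is an illustrative Python sketch, not ONNX Runtime code; the `ExternalInfo` and `LazyInitializer` names are hypothetical:

```python
# Sketch: read external initializer data only on first access, then cache it.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExternalInfo:
    path: str        # file that stores the initializer's data
    offset: int      # byte offset within the file
    byte_size: int   # size of the data in bytes

@dataclass
class LazyInitializer:
    info: ExternalInfo
    _data: Optional[bytes] = field(default=None, repr=False)

    def value(self) -> bytes:
        # The file is touched only on the first call, not at graph-load time.
        if self._data is None:
            with open(self.info.path, "rb") as f:
                f.seek(self.info.offset)
                self._data = f.read(self.info.byte_size)
        return self._data
```

Until `value()` is called, no file I/O happens for that initializer, which is the point of deferring the work past `GetCapability`.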

Follow up to #25320

New API functions:

| Function | Summary |
|-----------|--------------|
| `ValueInfo_GetExternalInitializerInfo` | Gets `OrtExternalInitializerInfo` from `OrtValueInfo` (or `NULL`). Must be released with `ReleaseExternalInitializerInfo`. |
| `ReleaseExternalInitializerInfo` | Releases the `OrtExternalInitializerInfo` instance. |
| `ExternalInitializerInfo_GetFilePath` | Returns the relative path to the file that stores the initializer's data. |
| `ExternalInitializerInfo_GetFileOffset` | Returns the byte offset within the file where the initializer's data is stored. |
| `ExternalInitializerInfo_GetByteSize` | Returns the size in bytes of the initializer's data within the file. |
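For illustration, this is what an EP might do with the (file path, offset, byte size) triple: map only the initializer's byte range rather than reading the whole weights file. A Python sketch stands in for the C API here; the `map_initializer` helper is hypothetical:

```python
# Sketch: memory-map a specific byte range of a weights file.
import mmap

def map_initializer(path: str, offset: int, byte_size: int) -> bytes:
    """Map the given byte range; mmap offsets must be allocation-aligned."""
    align = mmap.ALLOCATIONGRANULARITY
    aligned = (offset // align) * align   # round down to an aligned offset
    delta = offset - aligned              # bytes to skip inside the mapping
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), delta + byte_size, offset=aligned,
                       access=mmap.ACCESS_READ) as mm:
            return bytes(mm[delta:delta + byte_size])
```

The alignment fixup matters: the initializer's offset within the file is arbitrary, but `mmap` only accepts offsets that are multiples of the allocation granularity.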



---------

Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>

RyanMetcalfeInt8 pushed a commit to RyanMetcalfeInt8/onnxruntime that referenced this pull request

Jul 29, 2025

wcy123 pushed a commit to wcy123/onnxruntime that referenced this pull request

Aug 1, 2025


carzh pushed a commit that referenced this pull request

Aug 7, 2025
### Description

Make protobuf weights refer to OrtValues on load.
Create OrtValues for initializers loaded from the ORT format, for uniformity.
Adjust Graph::ToGraphProto() so it does not export in-memory
references in external data.
Make CoreML process external data, including in-memory references, so it
can copy it.

### Motivation and Context
Follow up for #23979

adrianlizarraga pushed a commit that referenced this pull request

Aug 8, 2025
### Description

It is related to #25320 and #23979: enable tensor raw-data sharing for
externalized tensor protos tagged with kTensorProtoMemoryAddressTag.

### Motivation and Context

With #25320 and #23979, all initializer tensor protos are associated with
an OrtValue; the VitisAI EP needs to adapt to this change.
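The raw-data-sharing idea can be sketched in Python. This is not ONNX Runtime code: the protos are simplified dicts, and the tag string below is a stand-in for the real `kTensorProtoMemoryAddressTag` value:

```python
# Sketch: a tensor proto whose external_data location carries an in-memory tag
# resolves to an already-loaded buffer (shared, zero-copy), not to a file.
MEM_ADDR_TAG = "<in-memory-tag>"  # stand-in for kTensorProtoMemoryAddressTag

def resolve_raw_data(proto: dict, shared_buffers: dict) -> memoryview:
    loc = proto["external_data"]["location"]
    if loc == MEM_ADDR_TAG:
        # In-memory external data: share the existing OrtValue buffer.
        return memoryview(shared_buffers[proto["name"]])
    # File-based external data: a real EP would read or map the file here.
    raise NotImplementedError(f"file-based location: {loc}")
```

Because the returned view aliases the original buffer, no copy is made; that is the sharing an EP has to account for once all initializers are backed by OrtValues.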

Co-authored-by: mingyue <mingyue@amd.com>

adrianlizarraga pushed a commit that referenced this pull request

Aug 9, 2025

sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request

Aug 11, 2025

sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request

Aug 11, 2025

sanketkaleoss pushed a commit to sanketkaleoss/onnxruntime that referenced this pull request

Aug 11, 2025

gedoensmax pushed a commit to gedoensmax/onnxruntime that referenced this pull request

Sep 2, 2025

snnn mentioned this pull request

Oct 8, 2025

snnn pushed a commit that referenced this pull request

Oct 8, 2025
When Constant nodes have tensors larger than 127 bytes, they are converted
to OrtValues with in-memory external data for efficiency. However, ONNX
shape inference rejects TensorProtos with data_location=EXTERNAL, as it
cannot distinguish between in-memory and file-based external data.

This fix modifies InferenceContextImpl::getInputData() to detect in-memory
external data and materialize it into a temporary TensorProto with embedded
data that ONNX shape inference can process.

Fixes #26261

The issue was introduced in commit 3b97d79 (PR #25320), which converted
large initializers to OrtValues. This regression caused models with Constant
nodes having tensors just over 127 bytes to fail to load with shape
inference errors.

Changes:
- Modified getInputData() to check for in-memory external data using
  utils::HasExternalDataInMemory()
- When detected, retrieves the OrtValue and creates a temporary TensorProto
  with embedded data (use_tensor_buffer=false)
- Added temp_tensor_protos_ member to store these temporary protos so they
  outlive the shape inference call
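The detect-and-materialize step can be sketched as follows. This is a simplified Python illustration with dict-based protos, not the actual graph.cc code:

```python
# Sketch: before handing a tensor to shape inference, turn an in-memory
# external-data reference into a temporary proto with embedded bytes,
# since the inference engine rejects data_location=EXTERNAL outright.
def materialize_for_shape_inference(proto: dict, ort_values: dict,
                                    temp_protos: list) -> dict:
    if proto.get("data_location") == "EXTERNAL":
        # Copy the OrtValue's bytes into a proto with embedded data.
        temp = {"name": proto["name"],
                "dims": proto["dims"],
                "raw_data": bytes(ort_values[proto["name"]])}
        temp_protos.append(temp)   # must outlive the shape inference call
        return temp
    return proto                   # already embedded; use as-is
```

Keeping the temporaries in a member list (here `temp_protos`) mirrors the `temp_tensor_protos_` member: the caller holds raw pointers into these protos, so they cannot be freed until inference completes.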

snnn mentioned this pull request

Oct 8, 2025


yuslepukhin added a commit that referenced this pull request

Oct 14, 2025
## Description

Fixes #26261

This PR resolves a regression introduced in v1.23.0 where models with
Constant nodes containing tensors larger than 127 bytes fail to load
with a shape inference error.

### Root Cause

Commit 3b97d79 (PR #25320) introduced an optimization to convert
large Constant node tensors (> 127 bytes) into OrtValues with in-memory
external data references for better memory management. However, ONNX
shape inference cannot distinguish between in-memory and file-based
external data, and rejects any TensorProto with `data_location =
EXTERNAL`.

### The Fix

Modified `InferenceContextImpl::getInputData()` to:
1. Detect tensors with in-memory external data using
`utils::HasExternalDataInMemory()`
2. Retrieve the corresponding OrtValue
3. Create a temporary TensorProto with embedded data (not external
reference)
4. Provide this temporary proto to ONNX shape inference

This allows ONNX shape inference to access the actual tensor data
without rejecting it as external.

### Memory Impact

This fix introduces a minor and temporary increase in memory usage
during the model loading phase.

- **When:** The additional memory is allocated only when the shape
inference engine needs to access the data of a constant tensor that is
larger than 127 bytes. This is a one-time event during the initial
analysis of the model.
- **What:** The fix creates a temporary in-memory copy of the tensor
data.
- **Duration:** This temporary copy is released as soon as shape
inference is complete.

The impact on the overall peak memory usage of the application is
expected to be negligible. The memory usage during inference is not
affected. While it is theoretically possible for the temporary tensor to
be large if a multi-gigabyte constant tensor is used for shape
inference, this is a highly unlikely scenario in practice for
well-designed models.

### Testing

- Tested with the problematic model from issue #26261
- All optimization levels now work correctly (DISABLE_ALL, BASIC,
EXTENDED, ALL)
- Unit tests to be added

### Changes

- **onnxruntime/core/graph/graph.cc**:
  - Modified the `getInputData()` method in the `InferenceContextImpl` class
  - Added a `temp_tensor_protos_` member to store temporary TensorProtos
during shape inference

## TODO

- [ ] Add unit tests
- [ ] Run full test suite

---------

Co-authored-by: Dmitri Smirnov <dmitrism@microsoft.com>

apsonawane pushed a commit that referenced this pull request

Oct 17, 2025

apsonawane pushed a commit that referenced this pull request

Oct 20, 2025

fs-eire pushed a commit that referenced this pull request

Oct 24, 2025

yuslepukhin added a commit that referenced this pull request

Oct 30, 2025

naomiOvad pushed a commit to naomiOvad/onnxruntime that referenced this pull request

Nov 2, 2025