Add proper broadcasting logic to constant_value operator family. by mzient · Pull Request #6104 · NVIDIA/DALI
Greptile Overview
Greptile Summary
This PR fixes incorrect broadcasting behavior in the constant_value operator family (Full, FullLike, etc.). Previously, data was repeated along the fastest (innermost) dimension regardless of shape, producing unexpected patterns for non-trivial input shapes.
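To make the difference concrete, here is a minimal pure-Python sketch contrasting the two behaviors on a flat, row-major buffer. The function names `tile_flat` and `broadcast_flat` are hypothetical and do not come from the PR; `tile_flat` only illustrates the "repeat along the innermost dimension regardless of shape" behavior the summary describes, and is not the actual pre-fix DALI code.

```python
def volume(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for extent in shape:
        total *= extent
    return total


def tile_flat(values, out_shape):
    """Illustrative 'old' behavior: cycle the flat input over the flat output."""
    return [values[i % len(values)] for i in range(volume(out_shape))]


def broadcast_flat(values, in_shape, out_shape):
    """NumPy-style broadcast of a row-major input; returns a flat row-major list."""
    # Left-pad the input shape with 1s so both shapes have the same rank.
    padded = [1] * (len(out_shape) - len(in_shape)) + list(in_shape)
    out = []
    for flat in range(volume(out_shape)):
        src, step, rest = 0, 1, flat
        # Walk dimensions from innermost to outermost.
        for in_ext, out_ext in zip(reversed(padded), reversed(out_shape)):
            idx = rest % out_ext
            rest //= out_ext
            if in_ext != 1:          # extent-1 dimensions are broadcast
                src += idx * step
            step *= in_ext
        out.append(values[src])
    return out


# Input of shape (2, 1) filled into an output of shape (2, 3):
print(tile_flat([1, 2], (2, 3)))               # [1, 2, 1, 2, 1, 2]
print(broadcast_flat([1, 2], (2, 1), (2, 3)))  # [1, 1, 1, 2, 2, 2]
```

The second result is what `np.full((2, 3), [[1], [2]])` produces, which is the behavior the tests in this PR compare against.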
Key changes:
- Introduced a `RepeatInner` function for efficient 1D tiling, with optimizations for power-of-2 sizes
- Added a `Broadcast` function for general n-dimensional numpy-style broadcasting using stride-based iteration
- Optimized dispatch: uses the fast `RepeatInner` path when the input is effectively 1D (a scalar, or when `in_shape.back() == volume(in_shape)`), and falls back to the general `Broadcast` for complex shapes
- The stride calculation correctly handles right-to-left alignment (numpy semantics), setting stride=0 for broadcasted dimensions
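The stride-based approach in the last two points can be sketched in pure Python as follows. This is an assumption-laden illustration, not the C++ code from `constant_value.cc`: the names `broadcast_strides` and `broadcast` are hypothetical, strides are counted in elements, and the output is collected into a flat row-major list.

```python
def broadcast_strides(in_shape, out_shape):
    """Input strides aligned right-to-left with out_shape.

    A stride of 0 marks a broadcast dimension: moving along it in the
    output does not advance the read position in the input.
    """
    offset = len(out_shape) - len(in_shape)
    if offset < 0:
        raise ValueError("input has more dimensions than output")
    strides = [0] * len(out_shape)
    step = 1  # row-major stride of the current input dimension
    for d in range(len(out_shape) - 1, -1, -1):
        # Dimensions missing on the left of in_shape behave like extent 1.
        in_ext = in_shape[d - offset] if d >= offset else 1
        if in_ext == 1:
            strides[d] = 0
        elif in_ext == out_shape[d]:
            strides[d] = step
        else:
            raise ValueError("shapes cannot be broadcast")
        step *= in_ext
    return strides


def broadcast(values, in_strides, out_shape, dim=0, src=0, out=None):
    """Recursively visit every output element in row-major order."""
    if out is None:
        out = []
    if dim == len(out_shape):
        out.append(values[src])
        return out
    for i in range(out_shape[dim]):
        broadcast(values, in_strides, out_shape, dim + 1,
                  src + i * in_strides[dim], out)
    return out


strides = broadcast_strides((3, 1), (2, 3, 4))
print(strides)  # [0, 1, 0]
result = broadcast([10, 20, 30], strides, (2, 3, 4))
print(result[:12])  # [10, 10, 10, 10, 20, 20, 20, 20, 30, 30, 30, 30]
```

Here the input of shape `(3, 1)` is aligned against the last two output dimensions, so only the middle output dimension advances through the input; the outer and inner dimensions both get stride 0 and simply repeat.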
Testing:
- Comprehensive new tests covering scalar inputs, simple 1D broadcasting (including power-of-2 inner extents), and complex multi-dimensional broadcasting along various axes
- All tests validate against numpy's `np.full` behavior for correctness
Confidence Score: 4/5
- This PR is safe to merge - it fixes a bug in broadcasting logic with proper test coverage validating against numpy behavior.
- Score reflects a well-tested bug fix with comprehensive test cases. The implementation follows established patterns in the codebase (stride-based broadcasting) and uses existing utility functions. Minor deduction because the code uses `assert` in production code, which may not halt in release builds, though the assertion condition is verified earlier in `CanBroadcastShapes`.
- The `constant_value.cc` implementation is the primary focus - the broadcasting stride calculation and recursive broadcast function should be verified for correctness with edge cases.
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| dali/operators/generic/constant_value.cc | 4/5 | Adds numpy-style broadcasting logic for the constant_value operator family. Introduces RepeatInner for efficient 1D tiling and Broadcast for general n-dimensional broadcasting with stride-based iteration. The code correctly handles scalar inputs, 1D-like inputs (fast path), and general multi-dimensional broadcasting cases. |
| dali/test/python/operator_1/test_constant_value.py | 5/5 | Adds comprehensive test cases for the new broadcasting behavior. Tests cover scalar inputs, simple 1D broadcasting (including power-of-2 inner extents), and complex multi-dimensional broadcasting along various axes. All tests compare against numpy's np.full for correctness verification. |
Sequence Diagram
sequenceDiagram
participant Client
participant RunImpl as ConstantValue::RunImpl
participant RepeatInner
participant Broadcast
participant CalcStrides
Client->>RunImpl: Execute operator
RunImpl->>RunImpl: Get output shapes and input fill_value
alt Input is scalar or 1D-like (in_shape.back() == volume(in_shape))
RunImpl->>RepeatInner: Fast path tiling
RepeatInner->>RepeatInner: Check if in_size is power of 2
alt Power of 2
RepeatInner->>RepeatInner: Use bitwise AND for index
else Not power of 2
RepeatInner->>RepeatInner: Use block copy with remainder
end
else General n-dimensional broadcasting
RunImpl->>RunImpl: Calculate input strides (0 for broadcast dims)
RunImpl->>CalcStrides: Calculate output strides
RunImpl->>Broadcast: Recursive broadcast with strides
Broadcast->>Broadcast: Iterate over dimensions
Note over Broadcast: stride=0 causes repetition<br/>along broadcast dimensions
end
RunImpl->>Client: Return filled tensor
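
The fast-path branch of the diagram can be sketched in pure Python as below. This is only an illustration under stated assumptions: `choose_path` and `repeat_inner` are hypothetical names mirroring the diagram, `values` is assumed to be a plain list, and the real operator works on raw buffers in C++ rather than building lists.

```python
def volume(shape):
    """Number of elements in a tensor of the given shape."""
    total = 1
    for extent in shape:
        total *= extent
    return total


def choose_path(in_shape):
    """Pick the path from the diagram: scalar or 1D-like input uses tiling."""
    if len(in_shape) == 0 or in_shape[-1] == volume(in_shape):
        return "repeat_inner"
    return "broadcast"


def repeat_inner(values, out_size):
    """Tile a flat list of values until out_size elements are produced."""
    n = len(values)
    if n == 0:
        raise ValueError("cannot tile an empty input")
    if n & (n - 1) == 0:
        # Power-of-2 length: i % n equals i & (n - 1), avoiding a division.
        mask = n - 1
        return [values[i & mask] for i in range(out_size)]
    # Otherwise copy whole blocks, then the leftover prefix.
    full, rem = divmod(out_size, n)
    return values * full + values[:rem]


print(choose_path(()))               # repeat_inner
print(choose_path((1, 1, 5)))        # repeat_inner
print(choose_path((3, 1)))           # broadcast
print(repeat_inner([1, 2, 3, 4], 6)) # [1, 2, 3, 4, 1, 2]
print(repeat_inner([1, 2, 3], 7))    # [1, 2, 3, 1, 2, 3, 1]
```

The check `in_shape[-1] == volume(in_shape)` holds exactly when every dimension except the last has extent 1, which is why flat tiling is sufficient in that case.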