Add proper broadcasting logic to constant_value operator family. by mzient · Pull Request #6104 · NVIDIA/DALI

Greptile Overview

Greptile Summary

This PR fixes incorrect broadcasting behavior in the constant_value operator family (Full, FullLike, etc.). Previously, data was repeated along the fastest (innermost) dimension regardless of shape, producing unexpected patterns for non-trivial input shapes.

Key changes:

  • Introduced RepeatInner function for efficient 1D tiling with optimizations for power-of-2 sizes
  • Added Broadcast function for general n-dimensional numpy-style broadcasting using stride-based iteration
  • Optimized dispatch: uses the fast RepeatInner path when the input is effectively 1D (a scalar, or in_shape.back() == volume(in_shape), i.e. every dimension except the innermost has extent 1) and falls back to the general Broadcast for all other shapes
  • The stride calculation correctly handles right-to-left alignment (numpy semantics), setting stride=0 for broadcasted dimensions
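
The stride-based scheme described in the last two bullets can be illustrated with a minimal pure-Python sketch. This is not the code from constant_value.cc: the names broadcast_strides and broadcast are hypothetical, data is held in flat row-major lists, and the shape check is folded into the stride calculation rather than done in a separate CanBroadcastShapes step.

```python
def broadcast_strides(in_shape, out_shape):
    """Element strides for reading a row-major input of in_shape as out_shape.

    Hypothetical helper: shapes are right-aligned (numpy semantics) and
    any dimension that must be repeated gets stride 0.
    """
    pad = len(out_shape) - len(in_shape)
    if pad < 0:
        raise ValueError("input has more dimensions than output")
    # Missing leading dimensions behave like extent 1.
    padded = [1] * pad + list(in_shape)
    strides = [0] * len(out_shape)
    step = 1  # row-major stride of the current input dimension
    for d in reversed(range(len(out_shape))):
        if padded[d] == 1:
            strides[d] = 0  # broadcast (or trivial) dimension
        elif padded[d] == out_shape[d]:
            strides[d] = step
        else:
            raise ValueError("shapes cannot be broadcast")
        step *= padded[d]
    return strides


def broadcast(data, in_shape, out_shape):
    """Recursively fill a flat row-major output of out_shape from data."""
    strides = broadcast_strides(in_shape, out_shape)
    out = []

    def fill(dim, offset):
        if dim == len(out_shape):
            out.append(data[offset])
            return
        for i in range(out_shape[dim]):
            # With stride 0 the same input element is re-read for every i.
            fill(dim + 1, offset + i * strides[dim])

    fill(0, 0)
    return out


print(broadcast([1, 2, 3], (3,), (2, 3)))    # [1, 2, 3, 1, 2, 3]
print(broadcast([1, 2, 3], (3, 1), (3, 2)))  # [1, 1, 2, 2, 3, 3]
```

The second call shows the case the PR description says was previously mishandled: a (3, 1) input must repeat each value along the innermost output axis rather than tile the flat buffer.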

Testing:

  • Comprehensive new tests covering scalar inputs, simple 1D broadcasting (including power-of-2 inner extents), and complex multi-dimensional broadcasting along various axes
  • All tests validate against numpy's np.full behavior for correctness

Confidence Score: 4/5

  • This PR is safe to merge: it fixes a bug in the broadcasting logic and includes test coverage that validates the result against numpy behavior.
  • The score reflects a well-tested bug fix with comprehensive test cases. The implementation follows established patterns in the codebase (stride-based broadcasting) and uses existing utility functions. There is a minor deduction because the code uses assert in production code, which may not halt in release builds, although the asserted condition is verified earlier in CanBroadcastShapes.
  • The constant_value.cc implementation is the primary focus - the broadcasting stride calculation and recursive broadcast function should be verified for correctness with edge cases.

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| dali/operators/generic/constant_value.cc | 4/5 | Adds numpy-style broadcasting logic for the constant_value operator family. Introduces RepeatInner for efficient 1D tiling and Broadcast for general n-dimensional broadcasting with stride-based iteration. The code correctly handles scalar inputs, 1D-like inputs (fast path), and general multi-dimensional broadcasting cases. |
| dali/test/python/operator_1/test_constant_value.py | 5/5 | Adds comprehensive test cases for the new broadcasting behavior. Tests cover scalar inputs, simple 1D broadcasting (including power-of-2 inner extents), and complex multi-dimensional broadcasting along various axes. All tests compare against numpy's np.full for correctness verification. |

Sequence Diagram

sequenceDiagram
    participant Client
    participant RunImpl as ConstantValue::RunImpl
    participant RepeatInner
    participant Broadcast
    participant CalcStrides

    Client->>RunImpl: Execute operator
    RunImpl->>RunImpl: Get output shapes and input fill_value
    
    alt Input is scalar or 1D-like (in_shape.back() == volume(in_shape))
        RunImpl->>RepeatInner: Fast path tiling
        RepeatInner->>RepeatInner: Check if in_size is power of 2
        alt Power of 2
            RepeatInner->>RepeatInner: Use bitwise AND for index
        else Not power of 2
            RepeatInner->>RepeatInner: Use block copy with remainder
        end
    else General n-dimensional broadcasting
        RunImpl->>RunImpl: Calculate input strides (0 for broadcast dims)
        RunImpl->>CalcStrides: Calculate output strides
        RunImpl->>Broadcast: Recursive broadcast with strides
        Broadcast->>Broadcast: Iterate over dimensions
        Note over Broadcast: stride=0 causes repetition<br/>along broadcast dimensions
    end
    
    RunImpl->>Client: Return filled tensor
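
The fast-path branch of the diagram can be sketched in pure Python as follows. The names is_effectively_1d and repeat_inner are hypothetical and the real RepeatInner may differ in detail. The sketch assumes the shapes were already found compatible, i.e. the input's inner extent is 1 or equals the output's inner extent.

```python
from math import prod


def is_effectively_1d(in_shape):
    """True for a scalar or when only the innermost extent exceeds 1.

    Mirrors the condition in_shape.back() == volume(in_shape).
    """
    return len(in_shape) == 0 or in_shape[-1] == prod(in_shape)


def repeat_inner(data, out_size):
    """Tile the flat sequence data until out_size elements are produced."""
    n = len(data)
    if n == 0:
        raise ValueError("input must not be empty")
    if n & (n - 1) == 0:
        # Power-of-2 length: i % n can be computed as i & (n - 1).
        mask = n - 1
        return [data[i & mask] for i in range(out_size)]
    # General length: whole-block copies followed by a partial block.
    full, rem = divmod(out_size, n)
    return list(data) * full + list(data[:rem])


print(is_effectively_1d((1, 1, 4)))      # True
print(is_effectively_1d((2, 3)))         # False
print(repeat_inner([1, 2, 3, 4], 8))     # [1, 2, 3, 4, 1, 2, 3, 4]
print(repeat_inner([1, 2, 3], 7))        # [1, 2, 3, 1, 2, 3, 1]
```

Inputs that fail the is_effectively_1d check would take the general stride-based path instead.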