Always use proper CUDA stream for GPU Tensor(List) copy. Don't use stream 0. by mzient · Pull Request #6071 · NVIDIA/DALI

greptile-apps[bot] reviewed on Oct 21, 2025

@mzient changed the title from "Always use proper CUDA stream for GPU Tensor(List) copy." to "Always use proper CUDA stream for GPU Tensor(List) copy. Don't use stream 0."

Oct 21, 2025

JanuszL

mzient

klecki

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>


…stream.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>


mdabek-nvidia pushed a commit to mdabek-nvidia/DALI that referenced this pull request

Nov 27, 2025
…ream 0. (NVIDIA#6071)

* Use proper stream in TensorList and Tensor copy.
* Fix usages of DynamicScratchpad.
* Set non-host stream when setting last input for repeat-last inputs.
* Fix C API tests.
* Refactor copy stream/device selection.

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

* Review: copy_to_external.
* Workaround (WAR) for device selection in H2D and D2H copies and in copies with a special stream.

Known issues: D2D copy across devices doesn't work with buffers allocated with the VMM API, regardless of the permissions set with `cuMemSetAccess`.

---------
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
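
The stream and device selection described in the commit message can be illustrated with a small host-only model. This is a hypothetical sketch, not DALI's implementation: `StreamHandle`, `BufferInfo`, `CopyPlan`, and `SelectCopyStream` are invented names, the priority order (explicitly requested stream, then the destination's stream, then the source's stream, then a caller-supplied fallback) is an assumption, and so is the choice of the destination device for D2D copies. The only idea taken from this PR is that a GPU copy should never silently end up on stream 0.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <stdexcept>

// Hypothetical stand-ins; none of these are DALI or CUDA types.
using StreamHandle = std::uintptr_t;        // models the value of a cudaStream_t
constexpr StreamHandle kDefaultStream = 0;  // models CUDA's default stream ("stream 0")

enum class Backend { CPU, GPU };

struct BufferInfo {
  Backend backend;
  int device_id;                      // -1 for host memory
  std::optional<StreamHandle> order;  // stream the buffer is associated with, if any
};

struct CopyPlan {
  int device_id;        // device to make current for the copy
  StreamHandle stream;  // stream to issue the copy on
};

// A stream is usable only if it is set and is not the default stream.
inline bool IsUsable(const std::optional<StreamHandle> &s) {
  return s.has_value() && *s != kDefaultStream;
}

// Assumed policy: requested stream > destination's stream > source's stream > fallback.
inline CopyPlan SelectCopyStream(const BufferInfo &dst, const BufferInfo &src,
                                 std::optional<StreamHandle> requested,
                                 StreamHandle fallback) {
  // Precondition: at least one side is on the GPU (host-to-host needs no stream).
  assert(dst.backend == Backend::GPU || src.backend == Backend::GPU);

  // H2D and D2H use the device of the GPU side; D2D uses the destination (assumption).
  const int device = dst.backend == Backend::GPU ? dst.device_id : src.device_id;

  StreamHandle stream = fallback;
  if (IsUsable(requested)) {
    stream = *requested;
  } else if (dst.backend == Backend::GPU && IsUsable(dst.order)) {
    stream = *dst.order;
  } else if (src.backend == Backend::GPU && IsUsable(src.order)) {
    stream = *src.order;
  }

  // Refuse to fall through to stream 0 instead of using it implicitly.
  if (stream == kDefaultStream)
    throw std::logic_error("no non-default stream available for the copy");
  return CopyPlan{device, stream};
}
```

In this model, a caller passing stream 0 as the requested stream is treated the same as passing no stream at all, and the function throws rather than returning stream 0 when no other candidate exists.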