Always use proper CUDA stream for GPU Tensor(List) copy. Don't use stream 0. by mzient · Pull Request #6071 · NVIDIA/DALI

bot reviewed Oct 21, 2025

mzient changed the title ~~Always use proper CUDA stream for GPU Tensor(List) copy.~~ Always use proper CUDA stream for GPU Tensor(List) copy. Don't use stream 0.

Oct 21, 2025

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

…stream.

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

mdabek-nvidia pushed a commit to mdabek-nvidia/DALI that referenced this pull request

Nov 27, 2025

…ream 0. (NVIDIA#6071)

* Use proper stream in TensorList and Tensor copy.
* Fix usages of DynamicScratchpad.
* Set non-host stream when setting last input for repeat-last inputs.
* Fix C API tests.
* Refactor copy stream/device selection

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

* Review: copy_to_external.
* Workaround (WAR) for device selection in H2D and D2H copies and in copies with a special stream.

Known issues: D2D copy across devices doesn't work with buffers allocated with VMM API, regardless of permissions used in `cuMemSetAccess`.

---------
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
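
The idea behind "don't use stream 0" and the "copy stream/device selection" refactor can be illustrated with a small, self-contained sketch. This is not DALI's code: `Stream`, `Buffer`, `CopyPlan`, `usable` and `plan_copy` are hypothetical names, and the preference order (explicit request, then destination, then source, then a fallback) is an assumption made for illustration only. The one rule taken from the PR title is that the result is never stream 0.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for cudaStream_t; 0 models the legacy default stream.
using Stream = std::uintptr_t;
constexpr Stream kStream0 = 0;

// Hypothetical buffer description: where the memory lives and which stream
// (if any) its pending work is ordered on.
struct Buffer {
  bool on_device;                // false = host memory
  int device_id;                 // meaningful only when on_device is true
  std::optional<Stream> stream;  // associated stream, if any
};

struct CopyPlan {
  Stream stream;  // stream to issue the copy on; never stream 0
  int device_id;  // device to make current before issuing the copy
};

// "No stream" and "stream 0" are both treated as unusable.
inline bool usable(const std::optional<Stream>& s) {
  return s.has_value() && *s != kStream0;
}

// Picks a stream and device for a copy with at least one GPU side.
// Assumed preference: requested, destination, source, then the fallback.
inline CopyPlan plan_copy(const Buffer& dst, const Buffer& src,
                          std::optional<Stream> requested, Stream fallback) {
  assert(dst.on_device || src.on_device);  // host-to-host needs no stream
  assert(fallback != kStream0);            // the fallback must be a real stream

  Stream s = usable(requested)  ? *requested
           : usable(dst.stream) ? *dst.stream
           : usable(src.stream) ? *src.stream
                                : fallback;

  // H2D: only the destination has a device. D2H: only the source has one.
  // D2D: this sketch arbitrarily picks the destination's device.
  int dev = dst.on_device ? dst.device_id : src.device_id;
  return {s, dev};
}
```

In real CUDA code, the selected device would typically be made current with `cudaSetDevice` and the selected stream passed to `cudaMemcpyAsync`. The fallback would be a stream the library owns, for example one created with `cudaStreamCreateWithFlags`.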