Always use proper CUDA stream for GPU Tensor(List) copy. Don't use stream 0. by mzient · Pull Request #6071 · NVIDIA/DALI
bot reviewed Oct 21, 2025
mzient changed the title from "Always use proper CUDA stream for GPU Tensor(List) copy." to "Always use proper CUDA stream for GPU Tensor(List) copy. Don't use stream 0."
mdabek-nvidia pushed a commit to mdabek-nvidia/DALI that referenced this pull request on Nov 27, 2025:

…ream 0. (NVIDIA#6071)

* Use proper stream in TensorList and Tensor copy.
* Fix usages of DynamicScratchpad.
* Set non-host stream when setting last input for repeat-last inputs.
* Fix C API tests.
* Refactor copy stream/device selection

Signed-off-by: Michał Zientkiewicz <mzient@gmail.com>

* Review: copy_to_external.
* WAR device selection for H2D and D2H copy and copy with special stream.

Known issues: D2D copy across devices doesn't work with buffers allocated with VMM API, regardless of permissions used in `cuMemSetAccess`.

---------

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
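The idea in the PR title, issuing copies on an explicit stream instead of stream 0, can be illustrated with the plain CUDA runtime API. This is a minimal sketch, not DALI's implementation: `copy_on_stream` and `roundtrip_check` are hypothetical helpers, and the sketch assumes a single GPU, pinned host buffers, and unified virtual addressing (required for `cudaMemcpyDefault`). It runs H2D, D2D, and D2H copies on one non-blocking stream and then synchronizes only that stream.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message if a CUDA runtime call fails.
#define CUDA_CHECK(expr)                                              \
  do {                                                                \
    cudaError_t err_ = (expr);                                        \
    if (err_ != cudaSuccess) {                                        \
      std::fprintf(stderr, "%s failed: %s\n", #expr,                  \
                   cudaGetErrorString(err_));                         \
      std::abort();                                                   \
    }                                                                 \
  } while (0)

// Hypothetical helper (not a DALI API): enqueue a copy on `stream`.
// cudaMemcpyDefault lets the runtime infer the direction (H2D, D2H, D2D)
// from the pointer values, which requires unified virtual addressing.
void copy_on_stream(void *dst, const void *src, size_t bytes,
                    cudaStream_t stream) {
  CUDA_CHECK(cudaMemcpyAsync(dst, src, bytes, cudaMemcpyDefault, stream));
}

// Hypothetical check: host -> device -> device -> host on one stream.
// Returns true if the data survives the round trip unchanged.
bool roundtrip_check(size_t n) {
  const size_t bytes = n * sizeof(int);

  // A non-blocking stream does not implicitly synchronize with the
  // legacy default stream (stream 0).
  cudaStream_t stream;
  CUDA_CHECK(cudaStreamCreateWithFlags(&stream, cudaStreamNonBlocking));

  // Pinned host buffers, so the H2D/D2H copies can run asynchronously.
  int *h_in = nullptr, *h_out = nullptr;
  CUDA_CHECK(cudaHostAlloc(reinterpret_cast<void **>(&h_in), bytes,
                           cudaHostAllocDefault));
  CUDA_CHECK(cudaHostAlloc(reinterpret_cast<void **>(&h_out), bytes,
                           cudaHostAllocDefault));
  int *d_a = nullptr, *d_b = nullptr;
  CUDA_CHECK(cudaMalloc(reinterpret_cast<void **>(&d_a), bytes));
  CUDA_CHECK(cudaMalloc(reinterpret_cast<void **>(&d_b), bytes));

  for (size_t i = 0; i < n; ++i) {
    h_in[i] = static_cast<int>(i);
    h_out[i] = -1;
  }

  // Work on a single stream runs in issue order, so no synchronization
  // is needed between these three copies.
  copy_on_stream(d_a, h_in, bytes, stream);   // H2D
  copy_on_stream(d_b, d_a, bytes, stream);    // D2D (same device)
  copy_on_stream(h_out, d_b, bytes, stream);  // D2H

  // Wait for this stream only, not for the whole device.
  CUDA_CHECK(cudaStreamSynchronize(stream));

  bool ok = true;
  for (size_t i = 0; i < n; ++i) ok = ok && (h_out[i] == h_in[i]);

  CUDA_CHECK(cudaFree(d_a));
  CUDA_CHECK(cudaFree(d_b));
  CUDA_CHECK(cudaFreeHost(h_in));
  CUDA_CHECK(cudaFreeHost(h_out));
  CUDA_CHECK(cudaStreamDestroy(stream));
  return ok;
}
```

On a machine with a working CUDA device, `roundtrip_check(1 << 20)` should return `true`. The sketch deliberately stays on one device: cross-device D2D copies are the case the commit message lists as a known issue for buffers allocated with the VMM API.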