[feat] Add `SeqlenBalancedSampler` and enhance `StreamingDataset` support by NINGBENZHE · Pull Request #70 · Ascend/TransferQueue
- Add `SeqlenBalancedSampler`, based on the Karmarkar-Karp algorithm, to balance sequence lengths across DP ranks for GRPO training
- Add streaming mode support for `StreamingDataset` via the `should_check_consumption_status` parameter
- Add a `polling_mode` sampler cache lookup in the controller to avoid redundant sampling when data is insufficient
- Replace `print()` with `logger.info()` in the controller
- Downgrade 1D tensor warnings to info level in the client and metadata
- Add comprehensive unit tests for `SeqlenBalancedSampler` and `KarmarkarKarp`

Signed-off-by: 宁本哲 <ningbenzhe@xiaohongshu.com>
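For reviewers unfamiliar with the balancing idea, here is a minimal sketch of k-way Karmarkar-Karp (the largest differencing method) applied to sequence lengths. It is illustrative only. The function name `karmarkar_karp` and its `equal_size` flag are hypothetical and are not the API of `SeqlenBalancedSampler` in this PR. `equal_size` assumes each DP rank should receive the same number of samples.

```python
import heapq
from typing import List


def karmarkar_karp(seqlens: List[int], k: int, equal_size: bool = False) -> List[List[int]]:
    """Hypothetical sketch: split indices of `seqlens` into k groups with near-equal length sums.

    With equal_size=True (an assumption for illustration), every group gets
    len(seqlens) // k items, e.g. so each DP rank sees the same sample count.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if equal_size and len(seqlens) % k != 0:
        raise ValueError("equal_size requires len(seqlens) to be divisible by k")

    # Process indices from the longest sequence to the shortest.
    order = sorted(range(len(seqlens)), key=lambda i: seqlens[i], reverse=True)

    def make_state(items: List[int]):
        # A state is k (sum, indices) subsets, sorted by sum in descending order.
        parts = [(0, []) for _ in range(k)]
        for j, idx in enumerate(items):
            parts[j] = (seqlens[idx], [idx])
        parts.sort(key=lambda p: p[0], reverse=True)
        return parts

    heap = []
    counter = 0  # unique tie-breaker so heapq never compares the state lists
    step = k if equal_size else 1  # one item per subset per state when equal_size
    for start in range(0, len(order), step):
        state = make_state(order[start:start + step])
        spread = state[0][0] - state[-1][0]
        heapq.heappush(heap, (-spread, counter, state))  # largest spread pops first
        counter += 1

    if not heap:
        return [[] for _ in range(k)]

    while len(heap) > 1:
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        # Differencing step: pair the heaviest subset of `a` with the lightest of `b`.
        merged = [(sa + sb, ia + ib) for (sa, ia), (sb, ib) in zip(a, reversed(b))]
        merged.sort(key=lambda p: p[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, counter, merged))
        counter += 1

    _, _, final = heap[0]
    return [sorted(indices) for _, indices in final]
```

Because each merge pairs subsets one-to-one, the `equal_size` path keeps group sizes identical. The result is a heuristic and is not guaranteed to be optimal. For `[8, 7, 6, 5, 4]` with `k=2`, for example, this sketch yields sums of 16 and 14 rather than the perfect 15/15 split.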
0oshowero0 changed the title from "[Feat] Add SeqlenBalancedSampler and enhance StreamingDataset support" to "[feat] Add SeqlenBalancedSampler and enhance StreamingDataset support"