[feat] Add `SeqlenBalancedSampler` and enhance `StreamingDataset` support by NINGBENZHE · Pull Request #70 · Ascend/TransferQueue

@NINGBENZHE

- Add `SeqlenBalancedSampler`, based on the Karmarkar-Karp algorithm, to
  balance sequence lengths across DP ranks for GRPO training
- Add streaming mode support for `StreamingDataset` via the
  `should_check_consumption_status` parameter
- Add a `polling_mode` sampler cache lookup in the controller to avoid
  redundant sampling when data is insufficient
- Replace `print()` with `logger.info()` in the controller
- Downgrade 1D tensor warnings to info level in the client and metadata
- Add unit tests for `SeqlenBalancedSampler` and `KarmarkarKarp`
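
For readers unfamiliar with the balancing idea in the first bullet, here is a minimal sketch of k-way Karmarkar-Karp (largest differencing) partitioning. It is an illustration only, not the code in this PR: the function name `karmarkar_karp_partition` and its arguments are hypothetical, and it does not enforce equal group sizes, which a real sampler may also need.

```python
import heapq
from itertools import count


def karmarkar_karp_partition(seqlens, k):
    """Split indices of `seqlens` into k groups with similar total length.

    Hypothetical helper for illustration; not the PR's implementation.
    """
    if not seqlens:
        return [[] for _ in range(k)]

    tie = count()  # unique tie-breaker so the heap never compares bucket lists
    heap = []
    for idx, length in enumerate(seqlens):
        # Each state holds k buckets (sum, indices), sorted by sum descending.
        buckets = [(length, [idx])] + [(0, []) for _ in range(k - 1)]
        # Priority is the spread (max sum - min sum); negate for a max-heap.
        heapq.heappush(heap, (-length, next(tie), buckets))

    while len(heap) > 1:
        # Take the two states with the largest spreads.
        _, _, a = heapq.heappop(heap)
        _, _, b = heapq.heappop(heap)
        # Pair the largest bucket of `a` with the smallest bucket of `b`,
        # so the two spreads partially cancel.
        merged = [
            (sum_a + sum_b, idx_a + idx_b)
            for (sum_a, idx_a), (sum_b, idx_b) in zip(a, reversed(b))
        ]
        merged.sort(key=lambda bucket: bucket[0], reverse=True)
        spread = merged[0][0] - merged[-1][0]
        heapq.heappush(heap, (-spread, next(tie), merged))

    return [indices for _, indices in heap[0][2]]


if __name__ == "__main__":
    lengths = [8, 7, 6, 5, 4]
    groups = karmarkar_karp_partition(lengths, k=2)
    for rank, group in enumerate(groups):
        print(rank, group, sum(lengths[i] for i in group))
```

Each returned group would correspond to one DP rank. The heuristic is not guaranteed to be optimal, but the gap between the heaviest and lightest group never exceeds the longest single sequence.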

Signed-off-by: 宁本哲 <ningbenzhe@xiaohongshu.com>

@0oshowero0 changed the title from "[Feat] Add SeqlenBalancedSampler and enhance StreamingDataset support" to "[feat] Add SeqlenBalancedSampler and enhance StreamingDataset support"

Apr 1, 2026
