THUKEG
GLM (General Language Model)
Python, 3.4k stars, 345 forks
slime is an LLM post-training framework for RL Scaling.
Python, 4.5k stars, 598 forks
An optimized deep prompt tuning strategy comparable to fine-tuning across scales and tasks
Python, 2.1k stars, 207 forks
ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search (NeurIPS 2024)
Python, 692 stars, 50 forks
RL Scaling and Test-Time Scaling (ICML'25)
114 stars, 1 fork
Scaling Agentic Reinforcement Learning with a Multi-Turn, Multi-Task Framework
Python, 230 stars, 16 forks