hijkzzz - Overview
🔭 I'm an RLer + NLPer/2 + MLSyser/2.
An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)
Python, 9.1k stars, 889 forks
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
6.9k stars, 369 forks
Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)
Python, 711 stars, 134 forks
A Multi-threaded Implementation of AlphaZero (C++)
Python, 387 stars, 49 forks
Multi-agent PPO with noise (97% win rates on Hard scenarios of SMAC)
Python, 76 stars, 6 forks
Convolutional Neural Network with CUDA (MNIST 99.23%)
C++, 196 stars, 40 forks