hijkzzz - Overview

🔭 I'm a RLer + NLPer/2 + MLSyser/2.

Pinned

An Easy-to-use, Scalable and High-performance Agentic RL Framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Async RL)

Python · 9.1k stars · 889 forks
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

6.9k stars · 369 forks
Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)

Python · 711 stars · 134 forks
A Multi-threaded Implementation of AlphaZero (C++)

Python · 387 stars · 49 forks
Multi-agent PPO with noise (97% win rates on Hard scenarios of SMAC)

Python · 76 stars · 6 forks
Convolutional Neural Network with CUDA (MNIST 99.23%)

C++ · 196 stars · 40 forks