hijkzzz - Overview

🔭 I'm an RLer + NLPer/2 + MLSyser/2.

Jian Hu's GitHub stats

Pinned

  1. An easy-to-use, scalable, and high-performance agentic RL framework based on Ray (PPO & DAPO & REINFORCE++ & TIS & vLLM & Ray & Async RL)

    Python · 9.1k stars · 889 forks

  2. A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.

    6.9k stars · 369 forks

  3. Fine-tuned MARL algorithms on SMAC (100% win rates on most scenarios)

    Python · 711 stars · 134 forks

  4. A multi-threaded implementation of AlphaZero (C++)

    Python · 387 stars · 49 forks

  5. Multi-agent PPO with noise (97% win rates on Hard scenarios of SMAC)

    Python · 76 stars · 6 forks

  6. Convolutional neural network with CUDA (MNIST 99.23%)

    C++ · 196 stars · 40 forks