danesherbs - Overview

Pinned

  1. Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

    Python · 18.2k stars · 2.9k forks

  2. OpenAI Frontier Evals

    Python · 1.2k stars · 143 forks

  3. MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

    Python · 1.4k stars · 236 forks

  4. BitBlaster-16 is a 16-bit computer built from scratch using only NAND gates and data flip-flops as primitives! :)

    Python · 2 stars

  5. Want to get better at making estimates under uncertainty? No? Well, now you can!

    Python · 4 stars

  6. Implementation of OpenAI's "Learning to Summarize from Human Feedback"

    Jupyter Notebook · 7 stars · 1 fork