JunShern - Overview


Pinned repositories

  1. MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

    Python · 1.5k stars · 235 forks

  2. Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

    Python · 18.2k stars · 2.9k forks

  3. Exploring Few-Shot Adaptation of Language Models with Tables

    Jupyter Notebook · 24 stars · 1 fork

  4. Skeleton/template for an ML research codebase.

    Python · 2 stars

  5. Explorable tutorial for concepts in algorithmic music composition using p5.js-sound.

    JavaScript · 36 stars · 8 forks