JunShern - Overview


Pinned repositories

  1. MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.

    Python · 1.5k stars · 235 forks

  2. Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

    Python · 18.2k stars · 2.9k forks

  3. Exploring Few-Shot Adaptation of Language Models with Tables

    Jupyter Notebook · 24 stars · 1 fork

  4. Skeleton/template for an ML research codebase.

    Python · 2 stars

  5. Explorable tutorial for concepts in algorithmic music composition using p5.js-sound.

    JavaScript · 36 stars · 8 forks