HanFa - Overview
Concise proof of concept of speculative decoding for causal LMs. It compares standard sampling with speculative decoding and reports tokens/sec and the average number of tokens accepted per draft call.
Python
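The accept/reject rule at the heart of speculative decoding can be sketched with toy models. This is a minimal illustration and not the repository's code: `toy_dist`, `target_dist`, `draft_dist`, `speculative_step`, `generate`, and the vocabulary size are all hypothetical names and values, and the "models" are deterministic pseudo-random distributions rather than real causal LMs. A "draft call" is assumed here to mean one round of `k` draft proposals. Timing (tokens/sec) is left out because it is meaningless for toy distributions.

```python
import random

VOCAB = 4  # toy vocabulary size (assumption for illustration)

def toy_dist(context, salt):
    """Hypothetical stand-in for a model: a fixed distribution per context."""
    key = salt
    for tok in context:
        key = (key * 31 + tok + 1) % 1_000_003
    rng = random.Random(key)
    weights = [rng.random() + 0.1 for _ in range(VOCAB)]  # strictly positive
    total = sum(weights)
    return [w / total for w in weights]

def target_dist(context):
    return toy_dist(context, salt=1)  # plays the role of the large model

def draft_dist(context):
    return toy_dist(context, salt=2)  # plays the role of the small model

def sample(weights, rng):
    # random.choices accepts unnormalized, non-negative weights.
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def speculative_step(context, k, rng):
    """Draft k tokens, verify them, return (new_tokens, n_accepted)."""
    drafted, q_dists, ctx = [], [], list(context)
    for _ in range(k):
        q = draft_dist(ctx)
        tok = sample(q, rng)
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    out, ctx = [], list(context)
    for tok, q in zip(drafted, q_dists):
        p = target_dist(ctx)
        # Accept the drafted token with probability min(1, p/q).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q).
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            out.append(sample(residual, rng))
            return out, len(out) - 1
    # All k drafts accepted: take one bonus token from the target.
    out.append(sample(target_dist(ctx), rng))
    return out, k

def generate(n_tokens, k, seed=0):
    """Generate n_tokens; return them with the mean accepted drafts per step."""
    rng = random.Random(seed)
    context, accepted, calls = [0], 0, 0  # token 0 acts as a start symbol
    while len(context) - 1 < n_tokens:
        new, n_acc = speculative_step(context, k, rng)
        context.extend(new)
        accepted += n_acc
        calls += 1
    return context[1:n_tokens + 1], accepted / calls

if __name__ == "__main__":
    tokens, avg_accepted = generate(n_tokens=50, k=4)
    print(f"generated {len(tokens)} tokens, "
          f"avg accepted per draft call: {avg_accepted:.2f}")
```

Each step emits between 1 and `k + 1` tokens, so the average accepted count approaches `k` as the draft distribution gets closer to the target. With real models, the verification loop would be a single batched forward pass of the target over all `k` drafted positions, which is where the speedup comes from.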