Announcing: 𝗪𝗪-𝗣𝗚𝗗 — 𝗪𝗲𝗶𝗴𝗵𝘁𝗪𝗮𝘁𝗰𝗵𝗲𝗿 𝗣𝗿𝗼𝗷𝗲𝗰𝘁𝗲𝗱 𝗚𝗿𝗮𝗱𝗶𝗲𝗻𝘁 𝗗𝗲𝘀𝗰𝗲𝗻𝘁 🚀 I just released WW-PGD, a small PyTorch add-on that wraps standard optimizers…
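The teaser describes wrapping standard optimizers with projected gradient descent. The truncated excerpt does not show WW-PGD's actual API, so here is a minimal, library-agnostic sketch of the underlying PGD loop; all names below are illustrative and are not taken from the WW-PGD package:

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Project x onto the L2 ball of the given radius (the constraint set)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def pgd(grad_fn, x0, project, lr=0.1, steps=200):
    """Projected gradient descent: take an ordinary gradient step,
    then project the iterate back onto the feasible set."""
    x = x0.astype(float)
    for _ in range(steps):
        x = project(x - lr * grad_fn(x))
    return x

# Toy problem: minimize ||x - t||^2 subject to ||x|| <= 1.
# The constrained optimum is t / ||t||.
t = np.array([3.0, 4.0])
x_star = pgd(lambda x: 2 * (x - t), np.zeros(2), project_l2_ball)
# x_star is approximately [0.6, 0.8], i.e. t / ||t||
```

The key design point is that the projection runs after each ordinary gradient step, which is what makes this kind of scheme easy to bolt onto an existing optimizer.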

There is a deep connection between the open-source weightwatcher tool, which implements ideas from the theory of Heavy Tailed Self-Regularization…

Recently, Meta released the Llama 3.2 1B and 3B Instruct fine-tuned LLMs, to mixed reviews. On the one hand, they're ranking…

Are you fine-tuning an open-source LLM like Llama, Mistral, or Qwen? That is, instruct fine-tuning. Whether you…

Double Descent (DD) is something that has surprised statisticians, computer scientists, and deep learning practitioners, but it was known in the…

Recently, Microsoft Research published the LASER method ("Layer-Selective Rank Reduction") in the very popular paper The Truth is in There:…
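At its core, the LASER idea replaces a chosen weight matrix with a low-rank approximation of itself. A minimal sketch of rank-k truncation via SVD follows; this is illustrative code, not taken from the LASER paper or any library:

```python
import numpy as np

def low_rank_approx(W, k):
    """Keep only the top-k singular directions of W (truncated SVD)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then recombine with the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Apply to a random matrix as a stand-in for a real layer's weights.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))
W_reduced = low_rank_approx(W, 2)  # rank-2 replacement for W
```

In LASER, such a reduction is applied selectively, layer by layer, which is where the paper's surprising accuracy gains come from.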

Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MoE) 8x7B model outperforms other…

Evaluating LLMs is hard, especially when you don't have a lot of test data. In the last post, we saw how to…

If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen…

WeightWatcher 0.7 has just been released, and it includes new and improved advanced features for analyzing Deep Neural Networks…