Machine Learning Manuscripts

Research that shapes intelligent systems

Deep dives into machine learning, neural architectures, and AI research, written for those who want to understand, not just follow.

All (6), AI Safety (1), Computer Vision (1), Deep Learning (3), NLP (1), Uncategorized (0)

Latest

Featured

Gemini, Mistral

Mixture of Experts – How Modern LLMs Scale Without Proportional Cost

A useful mental model for thinking about large language models is that they are very large lookup tables. During training, they compress an enormous amount of information about…

MLM Papers

Jun 8, 2026 6 min read

Recent Manuscripts

alignment, fine-tuning

RLHF Explained – How Language Models Learn to Follow Instructions

If you used an early GPT model – the kind available before 2022 – and asked it to explain something clearly, it would often respond by continuing your…

Jun 8, 2026 6 min read

backpropagation, gradients

The Vanishing Gradient Problem – Why Deep Networks Forget Where They Came From

There is a paradox buried in the design of deep neural networks. The deeper you make a network, the more expressive it should become – more layers means…

Jun 8, 2026 6 min read

diffusion models, generative AI

Diffusion Models Explained – From Noisy Images to State-of-the-Art Generation

For most of the 2010s, if you wanted to generate photorealistic images with a neural network, you worked with GANs – Generative Adversarial Networks. The results were impressive.…

Jun 6, 2026 6 min read

factuality, hallucination

Why Your Model Hallucinates – And What Researchers Are Actually Doing About It

If you’ve used a large language model for anything factual – a date, a citation, a person’s biography – you’ve probably encountered it: the model answers with complete…

Jun 6, 2026 5 min read

architecture, attention

Attention Is All You Need – But Do We Understand Why?

In 2017, a Google Brain paper with a quietly confident title rewired how the world thinks about sequence modeling. Attention Is All You Need didn’t just introduce a…

Jun 6, 2026 4 min read
