Ship AI Without the GPU Bill
4 papers on making models smaller, faster, and cheaper. Directly relevant to every PM deciding what to build on.
Edward Hu et al.
LoRA cuts trainable parameters by up to 10,000x and GPU memory requirements by 3x while matching full fine-tuning quality on large language models.
Why this paper
Fine-tune large models with up to 10,000x fewer trainable parameters. The technique behind most custom models your team might build.
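Under the hood, LoRA freezes the pretrained weight matrix and learns only a small low-rank update beside it. A minimal PyTorch sketch of that idea (our own illustrative names, not the paper's reference implementation):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update:
    y = W x + (B A) x * (alpha / rank). Illustrative sketch only."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)   # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: A projects down to `rank`, B projects back up.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        # B starts at zero, so training begins exactly at the pretrained model.
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scale
```

Wrap only the attention projections of a 7B model this way at rank 8 and well under 1% of its parameters remain trainable, which is where the headline savings come from.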
FlashAttention — coming soon
Up to 3x faster training and memory that scales linearly with sequence length. The reason modern LLM stacks run on custom attention kernels in production.
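The core trick is an online softmax: stream over blocks of keys and values, keep a running max and sum per query row, and never materialize the full attention matrix. A pure-PyTorch sketch of the math, with our own function and argument names (the actual speedup comes from fusing this into a single GPU kernel):

```python
import torch

def tiled_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                    block: int = 128) -> torch.Tensor:
    """Blockwise attention with an online softmax (masking/dropout omitted).
    q, k, v: (seq, dim). Numerically matches softmax(q k^T / sqrt(d)) v."""
    seq, dim = q.shape
    scale = dim ** -0.5
    out = torch.zeros_like(q)
    row_max = torch.full((seq, 1), float("-inf"), dtype=q.dtype, device=q.device)
    row_sum = torch.zeros(seq, 1, dtype=q.dtype, device=q.device)
    for start in range(0, seq, block):        # stream over key/value blocks
        kb = k[start:start + block]
        vb = v[start:start + block]
        scores = (q @ kb.T) * scale           # (seq, block) partial scores
        new_max = torch.maximum(row_max, scores.max(dim=-1, keepdim=True).values)
        correction = torch.exp(row_max - new_max)   # rescale earlier partials
        probs = torch.exp(scores - new_max)
        row_sum = row_sum * correction + probs.sum(dim=-1, keepdim=True)
        out = out * correction + probs @ vb
        row_max = new_max
    return out / row_sum                      # normalize once at the end
```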
Albert Q. Jiang et al.
Mixtral 8x7B beats Llama 2 70B on most benchmarks while activating only 13B parameters per token.
Why this paper
Sparse expert routing: 47B total params but only 13B active per token. The key to cost-efficient scale.
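In code terms, a small router scores every expert for each token, but only the top two actually run. A PyTorch sketch of that routing pattern (illustrative names and sizes, not Mistral's implementation; load-balancing losses omitted):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top2MoE(nn.Module):
    """Sparse mixture-of-experts layer: each token is processed by its
    top-2 experts only, so compute scales with active, not total, params."""

    def __init__(self, dim: int = 512, hidden: int = 2048,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        ])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Score all experts, run only the chosen two.
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)      # renormalize over chosen experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e             # tokens whose k-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, k].unsqueeze(-1) * expert(x[mask])
        return out
```

Every token pays for two expert MLPs no matter how many experts the layer holds, which is how Mixtral gets 47B parameters of capacity at roughly 13B parameters of per-token cost.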
Hugo Touvron et al.
Llama 2's chat models outperform other open-source models on most benchmarks and, on human evaluations of helpfulness and safety, rival their closed-source competitors.
Why this paper
Open source at scale. Llama 2 changed what's possible for product teams without $100k GPU budgets.
Unlock the full analysis for each paper
Deep-dive articles, expert annotations, PM action plans, and interactive experiments — all for $6/mo.
Go Pro — $6/mo