AI Research Digest·Vol. 2026 / Wk 12·62 Papers · 9 Fields

Weekly Research Digest

AI Papers for Product Managers.

Seminal AI research, distilled into actionable product insights every week.

62 Curated Papers · 9 Categories · 0 Must-Reads

Go deeper for $6/mo

Full article, action plan, use cases, quiz — every paper, every week.

Paper of the Day
ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu et al.

ReAct fuses reasoning and acting in LLMs, enabling real-time interaction with external tools for superior results.

Reasoning · Agents · Tool Use · Pro Deep Dive · 2 min read
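The loop behind ReAct is easy to sketch. Below is a minimal, hypothetical version: `call_llm` and `run_tool` are stand-ins for a model API and a tool executor, neither of which comes from the paper.

```python
# Minimal ReAct-style loop (illustrative sketch, not the paper's code).
# `call_llm` is assumed to return {"thought": str, "action": str, "input": str}.

def react_episode(question, call_llm, run_tool, max_steps=5):
    """Alternate Thought -> Action -> Observation until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = call_llm(transcript + "Thought:")      # model reasons in text
        transcript += f"Thought:{step['thought']}\n"
        if step["action"] == "Finish":
            return step["input"]                       # final answer
        observation = run_tool(step["action"], step["input"])
        transcript += (f"Action: {step['action']}[{step['input']}]\n"
                       f"Observation: {observation}\n")
    return None
```

The point of the interleaving is that each tool observation lands back in the transcript, so the next "Thought" can react to it.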
Free — Full Access

Try before you subscribe

These papers are fully unlocked — deep dive, action plan, simulator, and quiz included. No account needed.

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI

DeepSeek-R1 uses RL to supercharge reasoning in LLMs, rivaling OpenAI with no supervised fine-tuning.

Reasoning · Training · Free Preview · Jan 2025 · 2 min read

LoRA: Low-Rank Adaptation of Large Language Models

Edward Hu, Yelong Shen, Phillip Wallis et al.

LoRA cuts trainable parameters by up to 10,000x and GPU memory by 3x while matching full fine-tuning quality on large language models.

Efficiency · Training · Free Preview · Oct 2021 · 2 min read
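The trick is replacing a full weight update with two small trainable matrices. A NumPy sketch (dimensions and initialization here are illustrative, not the paper's code):

```python
import numpy as np

# Illustrative LoRA forward pass. The frozen weight W is untouched; only the
# low-rank factors A and B are trained, so trainable parameters drop from
# d*k to r*(d+k).

d, k, r = 512, 512, 8                  # hypothetical dimensions, rank r << d
rng = np.random.default_rng(0)
W = rng.normal(size=(d, k))            # frozen pretrained weight
A = rng.normal(size=(r, k)) * 0.01     # trainable down-projection
B = np.zeros((d, r))                   # trainable up-projection, zero-init

def lora_forward(x):
    # y = W x + B A x : base output plus the low-rank update
    return W @ x + B @ (A @ x)

x = rng.normal(size=k)
# With B zero-initialized, the adapted layer starts identical to the base.
assert np.allclose(lora_forward(x), W @ x)
```

The parameter math is where the headline numbers come from: here `r*(d+k)` = 8,192 trainable values versus `d*k` = 262,144 for a full update.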

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao, Dian Yu, Jeffrey Zhao et al.

Tree of Thoughts enhances language models by enabling strategic, multi-path reasoning for complex problem solving.

Reasoning · Free Preview · 2 min read

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu et al.

ReAct fuses reasoning and acting in LLMs, enabling real-time interaction with external tools for superior results.

Reasoning · Agents · Tool Use · Free Preview · 2 min read

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans et al.

Chain-of-Thought Prompting elevates reasoning in LLMs, outperforming finetuned GPT-3 on complex math tasks.

Reasoning · Free Preview · 2 min read
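In practice, chain-of-thought is a prompting pattern: show a worked example with its reasoning before asking the new question. A sketch in the style of the paper's exemplars:

```python
# Few-shot chain-of-thought prompt. The exemplar includes the intermediate
# reasoning before the answer, which is what elicits step-by-step reasoning
# on the new question at the end.
cot_prompt = (
    "Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. "
    "How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
    "5 + 6 = 11. The answer is 11.\n\n"
    "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
    "How many apples do they have?\n"
    "A:"
)
```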

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu et al.

Train AI with its own feedback to reduce need for human labels and increase precision in behavior control.

Alignment · Safety · Free Preview · Dec 2022 · 2 min read

Training language models to follow instructions with human feedback

Long Ouyang, Jeffrey Wu, Xu Jiang et al.

InstructGPT outperforms GPT-3 using human feedback, showing size isn't everything in AI models.

Alignment · Training · Free Preview · Mar 2022 · 2 min read

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan et al.

Larger language models offer more sample efficiency, enabling better results with smaller datasets and fixed compute resources.

Scaling · Training · Free Preview · Jan 2020 · 2 min read
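The headline result is a power law: loss falls predictably with parameter count. A one-line sketch of the parameter-count term, using the paper's approximately fitted constants (treat the exact numbers as indicative):

```python
# L(N) = (N_c / N) ** alpha_N, the paper's power-law fit for loss vs.
# non-embedding parameter count N. Constants are the reported fits
# (N_c ~ 8.8e13, alpha_N ~ 0.076), quoted here as approximations.

def loss_vs_params(n_params, n_c=8.8e13, alpha=0.076):
    return (n_c / n_params) ** alpha
```

The practical reading: every 10x in parameters buys a fixed multiplicative drop in loss, which is what makes compute budgets plannable.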

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar et al.

Transformers revolutionize AI by ditching recurrence and convolutions, shining with sheer parallelizable efficiency.

Architecture · Scaling · Free Preview · Jun 2017 · 2 min read
15 papers

Reasoning

The science behind o1, o3 — the fastest-growing product line in AI.

OpenAI o3 System Card

OpenAI

o3 sets new reasoning benchmarks, outscoring 99.8% of competitive programmers on Codeforces.

Reasoning · Safety · Scaling · Pro · Apr 2025 · 2 min

QwQ-32B: Embracing the Intelligence Era

Qwen Team, Alibaba Group

QwQ-32B matches 671B param models using RL, revolutionizing size-efficiency in AI reasoning.

Reasoning · Open Source · Training · Pro · Mar 2025 · 2 min

Gemini 2.5 Pro Technical Report

Google DeepMind

Gemini 2.5 Pro tops major AI benchmarks with a novel thinking mode and unprecedented 1M token context.

Reasoning · Multimodal · Scaling · Pro · Mar 2025 · 2 min

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

ByteDance Seed, Qiying Yu, Zheng Zhang et al.

DAPO: Raising the bar in LLM training with open-source reinforcement learning breakthroughs.

Reasoning · Training · Open Source · Pro · Mar 2025 · 2 min

Claude 3.7 Sonnet: Extended Thinking

Anthropic

Claude 3.7 Sonnet redefines AI reasoning with extended thinking, outperforming the competition on complex tasks like coding.

Reasoning · Alignment · Safety · Pro · Feb 2025 · 2 min

Kimi k1.5: Scaling Reinforcement Learning with LLMs

Kimi Team, Moonshot AI

Long-context RL brings LLMs closer to true reasoning, enhancing AI's problem-solving abilities.

Reasoning · Training · Scaling · Pro · Jan 2025 · 2 min

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-AI

DeepSeek-R1 uses RL to supercharge reasoning in LLMs, rivaling OpenAI with no supervised fine-tuning.

Reasoning · Training · Free Preview · Jan 2025 · 2 min

OpenAI o1: Learning to Reason with LLMs

OpenAI

OpenAI o1 redefines AI reasoning, matching PhD-level performance in science and programming challenges.

Reasoning · Training · Scaling · Pro · Sep 2024 · 2 min

Scaling LLM Test-Time Compute Optimally

Charlie Snell, Jaehoon Lee, Kelvin Xu, Aviral Kumar

Smaller models can beat larger ones by optimizing test-time compute for problem difficulty.

Reasoning · Scaling · Training · Pro · Aug 2024 · 2 min

Let's Verify Step by Step

Hunter Lightman, Vineet Kosaraju, Yura Burda et al.

Process supervision beats outcome supervision in AI reasoning accuracy—think 78.2% vs 72.4% success in math tasks.

Reasoning · Alignment · Training · Pro · May 2023 · 2 min

Sparks of Artificial General Intelligence: Early Experiments with GPT-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan et al.

GPT-4 edges closer to AGI, excelling in diverse tasks from law to vision.

Reasoning · Multimodal · Safety · Pro · Mar 2023 · 2 min

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Xuezhi Wang, Jason Wei, Dale Schuurmans et al.

Self-consistency in language models improves reasoning performance by over 17% on complex tasks.

Reasoning · Scaling · Pro · 2 min
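The method itself is a few lines: sample several reasoning paths, keep each path's final answer, and majority-vote. A sketch, with `sample_answer` as a hypothetical stand-in for one stochastic chain-of-thought run:

```python
from collections import Counter

# Self-consistency decoding sketch: run the same prompt several times with
# sampling enabled, extract each run's final answer, return the most common.

def self_consistent_answer(sample_answer, n_samples=5):
    answers = [sample_answer() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```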

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Shunyu Yao, Dian Yu, Jeffrey Zhao et al.

Tree of Thoughts enhances language models by enabling strategic, multi-path reasoning for complex problem solving.

Reasoning · Free Preview · 2 min

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Jason Wei, Xuezhi Wang, Dale Schuurmans et al.

Chain-of-Thought Prompting elevates reasoning in LLMs, outperforming finetuned GPT-3 on complex math tasks.

Reasoning · Free Preview · 2 min
8 papers

Multimodal

Vision, image generation, and omni-models — GPT-4o and Sora's foundations.

Llama 4: The Frontier of Multimodal Intelligence

Meta AI

Llama 4 sets new standards in open-source AI with powerful multimodal capabilities and unmatched context window.

Multimodal · Open Source · Architecture · MoE · Pro · Apr 2025 · 2 min

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Google DeepMind

Gemini 1.5 Pro sets a new benchmark with near-perfect retrieval across millions of tokens.

Multimodal · Architecture · MoE · Pro · Dec 2023 · 2 min

GPT-4 Technical Report

OpenAI

GPT-4: Human-like performance on professional exams signals a new era of AI collaboration.

Multimodal · Architecture · Pro · Mar 2023 · 2 min

High-Resolution Image Synthesis with Latent Diffusion Models

Robin Rombach, Andreas Blattmann, Dominik Lorenz et al.

Latent space diffusion cuts AI image generation from 100s of GPU days to a fraction while retaining quality.

Multimodal · Architecture · Pro · Dec 2022 · 2 min

Robust Speech Recognition via Large-Scale Weak Supervision

Alec Radford, Jong Wook Kim, Tao Xu et al.

Whisper approaches human-level speech accuracy using vast weakly supervised audio data from the internet.

Multimodal · Training · Open Source · Pro · Dec 2022 · 2 min

Flamingo: a Visual Language Model for Few-Shot Learning

Jean-Baptiste Alayrac, Jeff Donahue, Pauline Luc et al.

Flamingo redefines few-shot learning by outperforming extensively fine-tuned models with minimal task-specific data.

Multimodal · Architecture · Pro · Apr 2022 · 2 min

Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol et al.

Hierarchical models boost image generation diversity without losing realism, even matching styles like a digital Picasso.

Multimodal · Pro · Apr 2022 · 2 min

Learning Transferable Visual Models From Natural Language Supervision

Alec Radford, Jong Wook Kim, Chris Hallacy et al.

CLIP bridges vision and language, unlocking powerful image models without traditional labeled datasets.

Multimodal · Pro · Feb 2021 · 2 min
13 papers

Architecture

The bedrock — every AI PM is expected to speak fluently about these.

Phi-4 Technical Report

Marah Abdin, Jyoti Aneja, Harkirat Behl et al.

Phi-4 sets a new standard using synthetic data to match GPT-4o's STEM skills with fewer parameters.

Architecture · Efficiency · Open Source · Reasoning · Pro · Dec 2024 · 2 min

DeepSeek-V3 Technical Report

DeepSeek-AI

DeepSeek-V3 matches GPT-4o with less compute; frontier AI on non-frontier budgets.

Architecture · MoE · Efficiency · Open Source · Pro · Dec 2024 · 2 min

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan et al.

Phi-3-mini puts a GPT-3.5 rival in your pocket, thanks to better data, not more parameters.

Architecture · Efficiency · Open Source · Pro · Apr 2024 · 2 min

Mixtral of Experts

Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux et al.

Mixtral 8x7B revolutionizes efficiency, beating Llama 2 70B while using only 12.9B parameters per token.

Architecture · MoE · Efficiency · Pro · Jan 2024 · 1 min
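The "parameters per token" figure comes from top-2 routing: each token is processed by only 2 of 8 experts. An illustrative NumPy sketch of the routing step (not Mixtral's implementation):

```python
import numpy as np

# Top-k mixture-of-experts forward pass for one token. The router scores
# all experts, only the top k run, and their outputs are mixed by a
# softmax over the chosen experts' router logits.

def moe_forward(x, experts, router_w, k=2):
    logits = router_w @ x                               # one score per expert
    top = np.argsort(logits)[-k:]                       # indices of top-k experts
    gates = np.exp(logits[top])
    gates /= gates.sum()                                # softmax over chosen experts
    return sum(g * experts[i](x) for g, i in zip(gates, top))
```

With 8 experts and k=2, only a quarter of the expert parameters are active per token, which is why an 8x7B model can be priced like a much smaller dense one.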

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Albert Gu, Tri Dao

Mamba models outpace Transformers with 5x throughput and linear scaling for long-sequence tasks.

Architecture · Efficiency · Pro · Dec 2023 · 1 min

Fast Inference from Transformers via Speculative Decoding

Yaniv Leviathan, Matan Kalman, Yossi Matias

Speculative decoding accelerates Transformer inference by 2-3x with identical output quality.

Architecture · Efficiency · Pro · May 2023 · 2 min
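The idea in miniature: a cheap draft model proposes a few tokens, and the target model verifies them in one pass, keeping the longest agreeing prefix. The sketch below uses a greedy accept test for clarity; the actual method uses a probabilistic rejection rule that provably preserves the target model's output distribution.

```python
# Greedy speculative-decoding sketch. `draft_next` and `target_next` are
# hypothetical next-token functions (context -> token); in practice the
# target model checks all k draft tokens in a single batched forward pass.

def speculative_step(draft_next, target_next, prefix, k=4):
    proposed, ctx = [], list(prefix)
    for _ in range(k):                       # draft proposes k tokens cheaply
        t = draft_next(ctx)
        proposed.append(t)
        ctx.append(t)
    accepted, ctx = [], list(prefix)
    for t in proposed:                       # target verifies the proposals
        if target_next(ctx) == t:            # agreement: keep the draft token
            accepted.append(t)
            ctx.append(t)
        else:                                # disagreement: take the target's
            accepted.append(target_next(ctx))
            break
    return accepted
```

Each step emits at least one target-quality token and often several, which is where the 2-3x speedup comes from.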

Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

William Fedus, Barret Zoph, Noam Shazeer

Switch Transformers scale models to trillion parameters with efficient sparsity and faster pre-training.

Architecture · MoE · Scaling · Efficiency · Pro · Jan 2021 · 2 min

Language Models are Few-Shot Learners

Tom Brown, Benjamin Mann, Nick Ryder et al.

GPT-3 scales up to 175 billion parameters, acing tasks with few examples and no fine-tuning.

Architecture · Scaling · Pro · May 2020 · 2 min

Scaling Laws for Neural Language Models

Jared Kaplan, Sam McCandlish, Tom Henighan et al.

Larger language models offer more sample efficiency, enabling better results with smaller datasets and fixed compute resources.

Scaling · Training · Free Preview · Jan 2020 · 2 min

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

BERT revolutionizes NLP by learning context from both directions, improving accuracy across key benchmarks.

Architecture · Training · Pro · Oct 2018 · 2 min

Attention Is All You Need

Ashish Vaswani, Noam Shazeer, Niki Parmar et al.

Transformers revolutionize AI by ditching recurrence and convolutions, shining with sheer parallelizable efficiency.

Architecture · Scaling · Free Preview · Jun 2017 · 2 min
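The paper's core operation, scaled dot-product attention, fits in a few lines of NumPy (an intuition sketch, not a production kernel):

```python
import numpy as np

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
# Every query attends to every key in parallel, which is the
# parallelizability the blurb refers to.

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                 # weighted mix of values
```

Each output row is a convex combination of the value rows, weighted by how well the query matches each key.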

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré

FlashAttention accelerates Transformers by 15% and cuts memory demand, revolutionizing long-sequence efficiency.

Architecture · Efficiency · Pro · 2 min

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Patrick Lewis, Ethan Perez, Aleksandra Piktus et al.

RAG models redefine NLP by combining retrieval and generation, achieving state-of-the-art boosts in open domain QA tasks.

RAG · Architecture · Pro · 2 min
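The control flow is retrieve-then-generate. A toy sketch with hypothetical `score` and `generate` functions; note that the paper additionally trains the retriever and generator jointly, which this does not show:

```python
# RAG control flow in miniature: rank documents against the question,
# stuff the top k into the prompt, and let the generator answer from them.
# `score(question, doc)` and `generate(prompt)` are hypothetical stand-ins.

def rag_answer(question, corpus, score, generate, k=2):
    ranked = sorted(corpus, key=lambda doc: -score(question, doc))
    context = "\n".join(ranked[:k])
    return generate(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")
```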
5 papers

Open Source

Open-weight models reshaping the competitive landscape and what ships in products.

Qwen2.5 Technical Report

Qwen Team, Alibaba Group

Qwen2.5-72B rivals GPT-4o, redefining open-source AI capabilities in STEM and multilingual tasks.

Open Source · Architecture · Scaling · Pro · Sep 2024 · 2 min

Gemma 2: Improving Open Language Models at a Practical Size

Google DeepMind

Gemma 2 matches bigger closed models in performance with smaller, efficient open architectures.

Open Source · Architecture · Efficiency · Pro · Jun 2024 · 2 min

The Llama 3 Herd of Models

Meta AI

Llama 3 pushes boundaries with a massive 405B-parameter model supporting 128K token context.

Open Source · Architecture · Multimodal · Pro · Jul 2024 · 2 min

Mistral 7B

Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch et al.

Mistral 7B shatters barriers by outperforming larger models like Llama 2 13B with just 7 billion parameters.

Open Source · Architecture · Efficiency · Pro · Oct 2023 · 2 min

Llama 2: Open Foundation and Fine-Tuned Chat Models

Hugo Touvron, Louis Martin, Kevin Stone et al.

Llama 2 outperforms open-source chat models, challenging its closed-source rivals in safety and dialogue optimization.

Open Source · Safety · Pro · Jul 2023 · 2 min
6 papers

Alignment

Making models helpful, harmless, and honest — the core product differentiator.

GRPO: Group Relative Policy Optimization for Reasoning

DeepSeek-AI

GRPO halves RL training resource needs for advanced reasoning in AI, making it a standard approach by 2025.

Alignment · Reasoning · Training · Pro · Feb 2024 · 2 min

Constitutional AI: Harmlessness from AI Feedback

Yuntao Bai, Saurav Kadavath, Sandipan Kundu et al.

Train AI with its own feedback to reduce need for human labels and increase precision in behavior control.

Alignment · Safety · Free Preview · Dec 2022 · 2 min

Training language models to follow instructions with human feedback

Long Ouyang, Jeffrey Wu, Xu Jiang et al.

InstructGPT outperforms GPT-3 using human feedback, showing size isn't everything in AI models.

Alignment · Training · Free Preview · Mar 2022 · 2 min

Learning to Summarize with Human Feedback

Nisan Stiennon, Long Ouyang, Jeff Wu et al.

Reinforcement learning aligns AI summarization with human preferences, outperforming GPT-3.

Alignment · Training · Pro · Sep 2020 · 2 min

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, Oleg Klimov

PPO simplifies RL, optimizing AI training with fewer resources and boosting performance across top tech firms.

Alignment · Training · Pro · Jul 2017 · 2 min

Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Rafael Rafailov, Archit Sharma, Eric Mitchell et al.

DPO makes chatbots more predictable by turning language models into reward models without complex RL training.

Alignment · Efficiency · Pro · 2 min
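DPO's loss for a single preference pair can be written directly, with no reward model or RL loop. A sketch (the log-probability inputs are hypothetical; `beta` is the paper's strength-of-KL hyperparameter):

```python
import math

# DPO loss for one (chosen, rejected) preference pair. pi_* and ref_* are
# log-probabilities of each response under the trained policy and the
# frozen reference model. Minimizing this pushes the policy to prefer the
# chosen response, relative to the reference.

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))   # -log sigmoid(margin)
```

At initialization (policy equals reference) the margin is zero and the loss is log 2; raising the chosen response's likelihood lowers it.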
10 papers

Agents

The frontier — Operator, Deep Research, Codex agents.

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

Carlos E. Jimenez, John Yang, Alexander Wettig et al.

Current AI models barely scratch the surface in solving real-world software issues from GitHub.

Agents · Tool Use · Pro · Oct 2023 · 2 min

AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation

Qingyun Wu, Gagan Bansal, Jieyu Zhang et al.

AutoGen empowers multi-agent LLM apps with interactive, customizable agent conversations enhancing development flexibility.

Agents · Tool Use · Pro · Aug 2023 · 2 min

Voyager: An Open-Ended Embodied Agent with Large Language Models

Guanzhi Wang, Yuqi Xie, Yunfan Jiang et al.

Voyager sets a new standard in AI autonomy, unlocking Minecraft tech-tree milestones up to 15.3x faster than prior methods.

Agents · Tool Use · Reasoning · Pro · May 2023 · 2 min

Generative Agents: Interactive Simulacra of Human Behavior

Joon Sung Park, Joseph C. O'Brien, Carrie J. Cai et al.

Generative agents simulate life-like human behavior, making AI feel more authentic and engaging.

Agents · Reasoning · Pro · Apr 2023 · 2 min

Reflexion: Language Agents with Verbal Reinforcement Learning

Noah Shinn, Federico Cassano, Ashwin Gopinath et al.

Reflexion enables language agents to learn from feedback without costly retraining, enhancing decision-making efficiency.

Agents · Reasoning · Pro · Mar 2023 · 2 min

Competition-Level Code Generation with AlphaCode

Yujia Li, David Choi, Junyoung Chung et al.

AlphaCode ranks in top 54.3% of competitive programmers, showcasing AI's coding prowess.

Agents · Reasoning · Pro · Feb 2022 · 2 min

Evaluating Large Language Models Trained on Code

Mark Chen, Jerry Tworek, Heewoo Jun et al.

Codex rewrites the future of code with a 70.2% success rate, leaving GPT-3's 0% in the dust.

Agents · Tool Use · Pro · Jul 2021 · 2 min

AgentBench: Evaluating LLMs as Agents

Xiao Liu, Hao Yu, Hanchen Zhang et al.

AgentBench shows LLMs like GPT-4 excel at acting autonomously, outpacing open-source rivals significantly.

Agents · Pro · 2 min

Toolformer: Language Models Can Teach Themselves to Use Tools

Timo Schick, Jane Dwivedi-Yu, Roberto Dessì et al.

Toolformer empowers language models to smartly use APIs, rivaling larger models’ performance with fewer resources.

Agents · Tool Use · Pro · 2 min

ReAct: Synergizing Reasoning and Acting in Language Models

Shunyu Yao, Jeffrey Zhao, Dian Yu et al.

ReAct fuses reasoning and acting in LLMs, enabling real-time interaction with external tools for superior results.

Reasoning · Agents · Tool Use · Free Preview · 2 min
1 paper

Scaling

Previously featured papers from our weekly automation.

Emergent Abilities of Large Language Models

Jason Wei, Yi Tay, Rishi Bommasani et al.

Larger language models develop unexpected skills, challenging our predictions and scaling strategies.

Scaling · Reasoning · Pro · Jun 2022 · 2 min
2 papers

Training

How models are built — scaling laws, data mixtures, and what makes them capable.

LoRA: Low-Rank Adaptation of Large Language Models

Edward Hu, Yelong Shen, Phillip Wallis et al.

LoRA cuts trainable parameters by up to 10,000x and GPU memory by 3x while matching full fine-tuning quality on large language models.

Efficiency · Training · Free Preview · Oct 2021 · 2 min

Training Compute-Optimal Large Language Models

Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch et al.

Training models with balanced size and tokens outperforms bloated giants like GPT-3 and Megatron.

Scaling · Training · Efficiency · Pro · 2 min
2 papers

Safety

Preparedness Framework and safety culture — central to every product decision.

Measuring Massive Multitask Language Understanding

Dan Hendrycks, Collin Burns, Steven Basart et al.

GPT-3 narrows the gap to human-level multitask performance, scoring nearly 20 points above chance on the MMLU benchmark.

Safety · Reasoning · Pro · Sep 2020 · 2 min

TruthfulQA: Measuring How Models Mimic Human Falsehoods

Stephanie Lin, Jacob Hilton, Owain Evans

Larger AI models may not mean more truthful results, contradicting the bigger-is-better narrative.

Safety · Pro · 2 min

Stay ahead of the curve.

Every Sunday, get the top AI papers distilled into actionable product insights. Full methodological breakdowns, industry impact analysis, and exclusive deep dives.