[Reasoning] · PAP-JQX34T · March 18, 2026

OpenAI o1: Learning to Reason with LLMs

OpenAI

4 min read · Reasoning · Training · Scaling

Core Insight

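OpenAI o1 is trained with reinforcement learning to reason through a long chain of thought before answering. It exceeds human PhD-level accuracy on a science benchmark and ranks in the 89th percentile in competitive programming.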

Origin Story

OpenAI research blog post · OpenAI · September 2024

The Room

A small group of researchers at OpenAI, late at night. Their office, dimly lit by the glow of monitors, is filled with stacks of papers and empty coffee cups. The team is restless, dissatisfied with the limits of AI's reasoning capabilities. They want more than just pattern recognition; they crave understanding.

The Bet

Instead of incremental improvements, they decided to push large language models to reason like humans. It was a risky gamble, considering the complexity of human reasoning. Doubts lingered, especially during late-night debugging sessions when the models didn’t behave as expected. They wondered if they'd gone too far, if the ambition had outpaced the tools.

The Blast Radius

This work shaped the reasoning models now offered in products such as ChatGPT and Codex, changing how people use AI for hard, multi-step problems. Its contributors went on to lead further projects at OpenAI, and the approach has inspired new research directions elsewhere.

ChatGPT · Codex

Knowledge Prerequisites

git blame for knowledge

To fully understand OpenAI o1: Learning to Reason with LLMs, trace this dependency chain first.

DIRECT PREREQ · IN LIBRARY
Attention Is All You Need

Understanding the transformer architecture is crucial because it is the foundation of large language models like o1.

transformer · attention mechanism · self-attention
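The core operation of that paper is scaled dot-product attention, softmax(QKᵀ/√d_k)V. Below is a minimal pure-Python sketch of that formula for a single head with no masking. The function names are illustrative, and real implementations use batched tensor libraries with multiple heads.

```python
import math

def softmax(xs):
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, with matrices as lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # One score per key: the dot product q . k, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row: the attention-weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out
```

For self-attention, the same sequence `X` is passed as queries, keys, and values: `scaled_dot_product_attention(X, X, X)`.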
DIRECT PREREQ · IN LIBRARY
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Familiarity with BERT's pre-training techniques will help in understanding how large language models are built and fine-tuned.

masked language model · pre-training · fine-tuning
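BERT's masked-language-model objective selects about 15% of token positions for prediction. Of those, 80% are replaced with `[MASK]`, 10% with a random token, and 10% are left unchanged. The sketch below simplifies this by selecting each position independently with probability 0.15 instead of sampling a fixed count per sequence. `mask_tokens` is a hypothetical helper, not BERT's released code.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, rng, select_prob=0.15):
    """BERT-style masking sketch: returns (inputs, labels); labels[i] is None where nothing is predicted."""
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # position not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80% of selected positions: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random vocabulary token
        # Remaining 10%: keep the original token unchanged.
    return inputs, labels
```

Passing an explicit `random.Random(seed)` as `rng` keeps the masking reproducible between runs.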
DIRECT PREREQ · IN LIBRARY
Scaling Laws for Neural Language Models

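This paper shows that language-model loss falls as a smooth power law in model size, dataset size, and training compute. o1 carries the scaling idea past pre-training: OpenAI reports that its performance keeps improving with more reinforcement learning (train-time compute) and with more time spent thinking (test-time compute).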

scaling laws · model performance · parameter size
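For reference, here is the paper's headline power law for model size. It uses the constants reported by Kaplan et al. (2020), where N counts non-embedding parameters, and it holds only when training is not bottlenecked by data or compute. The second line is a worked example showing that doubling N lowers the predicted loss by about 5%.

```latex
L(N) = \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076, \quad N_c \approx 8.8 \times 10^{13}

\frac{L(2N)}{L(N)} = 2^{-\alpha_N} \approx 2^{-0.076} \approx 0.949
```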
DIRECT PREREQ · IN LIBRARY
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

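Chain-of-thought prompting elicits intermediate reasoning steps from a model through worked examples in the prompt. o1 goes further: through reinforcement learning, it learns to produce and refine its own chain of thought before answering.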

chain-of-thought prompting · reasoning process · multi-step reasoning
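Below is a minimal sketch of few-shot chain-of-thought prompting. Each exemplar shows its reasoning before its answer, and the prompt ends at a bare "A:" so the model continues with its own step-by-step rationale. The exemplar is modeled on the tennis-ball example in Wei et al. (2022). `build_cot_prompt` and `EXEMPLARS` are illustrative names. This covers prompt construction only and does not reproduce how o1 is trained.

```python
# Exemplar in the style of Wei et al. (2022): the rationale comes before the final answer.
EXEMPLARS = [
    (
        "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
        "Each can has 3 tennis balls. How many tennis balls does he have now?",
        "Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. "
        "5 + 6 = 11. The answer is 11.",
    ),
]

def build_cot_prompt(question, exemplars=EXEMPLARS):
    """Few-shot chain-of-thought prompt: worked exemplars, then the new question."""
    parts = [f"Q: {q}\nA: {rationale}" for q, rationale in exemplars]
    # Leave the answer open so the model writes its reasoning before its answer.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)
```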
DIRECT PREREQ · IN LIBRARY
Proximal Policy Optimization Algorithms

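OpenAI says o1 is trained with a large-scale reinforcement learning algorithm but has not published its details. PPO is a widely used policy-gradient method and provides useful background on RL-based training of language models.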

reinforcement learning · policy gradient · optimization
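As background only (not o1's training method), this is a sketch of PPO's clipped surrogate objective from Schulman et al. (2017): L^CLIP = E[min(r·Â, clip(r, 1 − ε, 1 + ε)·Â)], where r is the probability ratio between the new and old policies. It assumes per-action log-probabilities and advantage estimates are already computed. ε = 0.2 is the value commonly used in the paper's experiments, and `ppo_clip_objective` is a hypothetical name.

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Mean clipped surrogate objective L^CLIP over a batch (to be maximized)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t = pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms,
        # which removes the incentive to move the ratio far outside [1 - eps, 1 + eps].
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

In practice this objective is negated and minimized with a gradient-based optimizer, usually alongside value-function and entropy terms.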

YOU ARE HERE

OpenAI o1: Learning to Reason with LLMs

By the Numbers

89th percentile

Codeforces performance

3 domains

Exceeded human PhD-level accuracy in physics, chemistry, and biology (GPQA)

1000x

More efficient problem solving

90%

Accuracy in GPQA tasks

In Plain English

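OpenAI o1 is a large language model trained with reinforcement learning to work through a long internal chain of thought before it answers. In competitive programming it ranks in the 89th percentile on Codeforces, and on a benchmark of physics, chemistry, and biology questions (GPQA) it exceeds human PhD-level accuracy.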

Explained Through an Analogy

Picture a chess grandmaster who calculates several moves ahead before touching a piece. o1 works the same way: it reasons through a problem in a private chain of thought before it replies, and, like the grandmaster, it tends to do better when given more time to think.
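o1's internal reasoning procedure is not public. The toy below therefore uses a different, well-known technique, majority voting over several sampled answers (self-consistency), only to illustrate the general idea that spending more compute at inference can raise accuracy. `noisy_solver` is a hypothetical stand-in for one sampled reasoning chain that is right 60% of the time.

```python
import random
from collections import Counter

def noisy_solver(rng, correct=42, p_correct=0.6):
    """Hypothetical stand-in for one sampled reasoning chain."""
    if rng.random() < p_correct:
        return correct
    return rng.choice([40, 41, 43, 44])  # wrong answers are scattered

def majority_vote(rng, k):
    """Spend k samples of inference compute, then return the most common answer."""
    votes = Counter(noisy_solver(rng) for _ in range(k))
    return votes.most_common(1)[0][0]

def accuracy(k, trials=2000, seed=0):
    """Fraction of trials in which the k-sample vote lands on the correct answer."""
    rng = random.Random(seed)
    hits = sum(majority_vote(rng, k) == 42 for _ in range(trials))
    return hits / trials
```

Running `for k in (1, 5, 15): print(k, accuracy(k))` shows accuracy rising with k. That holds because the wrong answers are scattered while the right one recurs, an assumption built into this toy.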

How grounded is this content?

Metrics are computed from available source text only — abstract, summary, and impact fields ingested into this system. Full paper PDF is not ingested; numerical claims that originate from within the paper body will not appear in these scores.

Source Richness: 100%

8 of 8 content fields populated. More fields = better-grounded generation.

Source Depth: ~247 words

Total source text analyzed by the model. Includes extended deep-dive summary — high confidence.

Number Grounding: 1 / 4

Key statistics whose numeric values appear verbatim in ingested source text. Unverified stats may originate from the full paper body.

Quote Traceability: 3 / 3

Key passages whose significant vocabulary (≥4-char words) overlap ≥35% with source text. Measures lexical traceability, not semantic accuracy.

Methodology: Number grounding uses regex digit extraction against source text. Quote traceability uses token set intersection on content words stripped of stop-words. Neither metric validates semantic correctness or factual accuracy against the original source. For full verification, cross-reference with the original OpenAI post.
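The two checks described above can be sketched roughly as follows. This is a reconstruction under stated assumptions, not the site's actual code. The ≥4-character rule and the 35% threshold come from the text. The regex, the stop-word list, and measuring overlap as a fraction of the passage's content words are guesses.

```python
import re

# Hypothetical stop-word list; the real list is not given.
STOP_WORDS = {"about", "from", "have", "that", "their", "this", "were", "which", "with"}
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def number_grounded(stat, source):
    """True if every number extracted from the stat also appears as a number in the source."""
    numbers = NUMBER.findall(stat)
    source_numbers = set(NUMBER.findall(source))
    return bool(numbers) and all(n in source_numbers for n in numbers)

def content_words(text):
    """Lowercased alphabetic words of at least 4 characters, minus stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) >= 4} - STOP_WORDS

def quote_traceable(passage, source, threshold=0.35):
    """True if at least `threshold` of the passage's content words occur in the source."""
    words = content_words(passage)
    if not words:
        return False
    overlap = len(words & content_words(source)) / len(words)
    return overlap >= threshold
```

Comparing against the set of numbers extracted from the source, rather than doing raw substring search, avoids false matches such as "9" inside "89".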