[Reasoning]·PAP-GRSEED·March 17, 2026

Self-Consistency Improves Chain of Thought Reasoning in Language Models

Xuezhi Wang, Jason Wei, Dale Schuurmans et al.

4 min read · Reasoning · Scaling

Core Insight

Sampling multiple reasoning paths and taking a majority vote over their final answers improves language-model reasoning performance by up to 17.9% on complex tasks.

Origin Story

arXiv preprint, February 2022 · Google Research · Xuezhi Wang, Jason Wei et al.

The Room

In a bustling Google Research lab, the team gathers around a whiteboard, surrounded by stacks of papers and empty coffee cups. They're grappling with the limits of current language models, frustrated by their inability to reason through complex tasks without stumbling. The room buzzes with a mix of tension and curiosity, as they ponder how to push beyond these boundaries.

The Bet

While many were focused on fine-tuning existing models, this team made a bold move: they decided to explore multiple reasoning paths simultaneously. It was a risky venture, and there were moments of doubt — especially when earlier tests showed little improvement. But in a last-minute push, they realized a unique way to harness self-consistency, transforming skepticism into cautious optimism.

The Blast Radius

Without this paper, the refined reasoning capabilities in models like PaLM and ChatGPT might not exist. These advances have reshaped how AI tackles complex, nuanced tasks. The authors continue to influence the field, with some leading innovative projects at Google, while others have ventured into new AI frontiers.

PaLM · ChatGPT

Knowledge Prerequisites

git blame for knowledge

To fully understand Self-Consistency Improves Chain of Thought Reasoning in Language Models, trace this dependency chain first. Papers in our library are linked — click to read them.

DIRECT PREREQ · IN LIBRARY
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Understanding how chain-of-thought prompts can encourage reasoning capabilities in language models is crucial before analyzing improvements made through self-consistency.

chain-of-thought prompting · reasoning in language models · prompt engineering
DIRECT PREREQ · IN LIBRARY
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

A comprehension of transformer architectures and their pre-training is essential for understanding advanced models that involve reasoning processes.

transformer architecture · bidirectional transformer · language understanding
DIRECT PREREQ · IN LIBRARY
Training Compute-Optimal Large Language Models

Insights into the principles of training large language models efficiently lay the groundwork for understanding improvements in their reasoning capabilities.

training efficiency · compute-optimal models · large-scale model training
DIRECT PREREQ · IN LIBRARY
Scaling Laws for Neural Language Models

Knowledge of scaling laws helps in recognizing how model performance relates to size and training resources, foundational for grasping further optimizations like self-consistency.

scaling laws · model performance · resource efficiency
DIRECT PREREQ · IN LIBRARY
Attention Is All You Need

This seminal paper introduces the attention mechanisms that underpin most modern language models, providing the basis for any advanced exploration of their features, including reasoning.

attention mechanism · transformers · self-attention

YOU ARE HERE

Self-Consistency Improves Chain of Thought Reasoning in Language Models

By the Numbers

17.9%

performance improvement on GSM8K

12.2%

performance improvement on AQuA

11.0%

performance improvement on SVAMP

In Plain English

The paper introduces 'self-consistency', a novel decoding strategy that enhances chain-of-thought prompting in language models. By sampling diverse reasoning paths and keeping the most consistent final answer, it improves task performance: +17.9% on GSM8K and +12.2% on AQuA.
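The decoding strategy can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; `sample_path` and `extract_answer` are hypothetical stand-ins for sampling one chain-of-thought completion at nonzero temperature and parsing its final answer.

```python
from collections import Counter
from typing import Callable

def self_consistency(sample_path: Callable[[], str],
                     extract_answer: Callable[[str], str],
                     n_samples: int = 40) -> str:
    """Sample diverse reasoning paths, then majority-vote the final answer.

    sample_path: draws one chain-of-thought completion (e.g. via
        temperature sampling from the model).
    extract_answer: parses the final answer out of a completion.
    """
    answers = [extract_answer(sample_path()) for _ in range(n_samples)]
    # Marginalize over reasoning paths: the answer reached by the most
    # paths wins, even if no single path is trusted on its own.
    return Counter(answers).most_common(1)[0][0]
```

The key design choice is that votes are cast on final answers, not on the reasoning text itself, so paths that differ in wording but agree on the result reinforce each other.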

Explained Through an Analogy

Imagine handing the same scrambled Rubik's Cube to several solvers, each trying a different sequence of moves, and trusting the final arrangement most of them reach. This is self-consistency: sampling diverse reasoning paths and keeping the answer on which they agree.


How grounded is this content?

Metrics are computed from available source text only — abstract, summary, and impact fields ingested into this system. Full paper PDF is not ingested; numerical claims that originate from within the paper body will not appear in these scores.

Source Richness: 100%

8 of 8 content fields populated. More fields = better-grounded generation.

Source Depth: ~228 words

Total source text analyzed by the model. Includes extended deep-dive summary — high confidence.

Number Grounding: 3 / 3

Key statistics whose numeric values appear verbatim in ingested source text. Unverified stats may originate from the full paper body.

Quote Traceability: 3 / 3

Key passages whose significant vocabulary (≥4-char words) overlap ≥35% with source text. Measures lexical traceability, not semantic accuracy.

Methodology: Number grounding uses regex digit extraction against source text. Quote traceability uses token set intersection on content words stripped of stop-words. Neither metric validates semantic correctness or factual accuracy against the original paper. For full verification, cross-reference with the original paper via the arXiv link above.
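The methodology above describes two simple checks; a rough sketch follows. The function names and the stop-word list are illustrative assumptions, not this site's actual code.

```python
import re
from typing import Iterable

# Illustrative stop-word list; a real system would use a larger one.
STOP_WORDS = {"the", "and", "that", "with", "from", "this", "have", "will"}

def number_grounding(stats: Iterable[str], source: str) -> int:
    """Count key statistics whose digits appear verbatim in the source text.

    Uses regex digit extraction, as the methodology describes: a stat is
    'grounded' only if its exact numeric string occurs in the source.
    """
    source_numbers = set(re.findall(r"\d+(?:\.\d+)?", source))
    return sum(1 for s in stats if s in source_numbers)

def quote_traceability(quote: str, source: str, threshold: float = 0.35) -> bool:
    """Check whether >=35% of a quote's content words appear in the source.

    Content words are lowercase runs of >=4 letters with stop-words removed;
    the score is the token-set intersection ratio, so it measures lexical
    overlap, not semantic accuracy.
    """
    def content_words(text: str) -> set[str]:
        return {w for w in re.findall(r"[a-z]{4,}", text.lower())
                if w not in STOP_WORDS}
    quote_words = content_words(quote)
    if not quote_words:
        return False
    return len(quote_words & content_words(source)) / len(quote_words) >= threshold
```

Note that both checks are purely lexical: a number copied into a false claim, or a quote that reshuffles source words into a different meaning, would still pass.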