[Agents] · PAP-KWQQAJ · March 17, 2026

Reflexion: Language Agents with Verbal Reinforcement Learning

Noah Shinn, Federico Cassano, Ashwin Gopinath et al.

4 min read · Agents · Reasoning

Core Insight

Reflexion enables language agents to learn from feedback without costly retraining, enhancing decision-making efficiency.

Origin Story

arXiv preprint, March 2023 · Northeastern University, MIT, and Princeton · Noah Shinn et al.

The Room

A handful of researchers spread across Northeastern University, MIT, and Princeton, pushing the boundaries of language models in 2023. The labs buzz with anticipation and the hum of machines, but there's an undercurrent of impatience. They are weary of the inefficiencies—each refinement requiring a costly retraining cycle. They dream of a system that can learn more like a human, taking in feedback and adjusting on the fly.

The Bet

While others focused on scaling up data and computing power, this team gambled on something different: verbal reinforcement. The notion felt risky, almost whimsical, as if they were teaching a machine to self-correct with words alone. There were moments of doubt, especially when initial results were mixed, but they pressed on, believing that a more efficient learning process was just within reach.

The Blast Radius

Without this paper, adaptive language agents that can learn from verbal feedback might still be a distant goal. The potential for real-time learning systems in conversational AI would be stunted. The authors have since moved into diverse roles, some continuing their exploration of AI efficiencies, while others are venturing into new territories in tech innovation.

Adaptive Agents with Reflexion · Verbal AI Decision Systems

Knowledge Prerequisites

git blame for knowledge

To fully understand Reflexion: Language Agents with Verbal Reinforcement Learning, trace this dependency chain first. Papers in our library are linked — click to read them.

DIRECT PREREQ · IN LIBRARY
Attention Is All You Need

This paper introduces the transformer architecture, which is foundational for understanding how modern language models, such as those used for language agents, are structured.

self-attention · transformer model · positional encoding
DIRECT PREREQ · IN LIBRARY
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Understanding BERT's approach to pre-training large language models is crucial for grasping how language models can be fine-tuned for specific tasks.

masked language modeling · bidirectional context · fine-tuning
DIRECT PREREQ · IN LIBRARY
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

This paper explores using reinforcement learning to enhance reasoning abilities in large language models, a useful contrast to Reflexion's training-free, verbal approach to reinforcement.

reinforcement learning · reasoning enhancement · reward signals
DIRECT PREREQ · IN LIBRARY
Proximal Policy Optimization Algorithms

Understanding PPO is useful background: it is the standard weight-updating reinforcement learning algorithm whose costly gradient updates Reflexion's verbal reinforcement is designed to sidestep.

policy gradient · clipping mechanism · trust region
DIRECT PREREQ · IN LIBRARY
ReAct: Synergizing Reasoning and Acting in Language Models

This paper introduces methods for combining reasoning and action in language models, which are relevant for language agents that learn via interaction.

reasoning-execution loop · interaction-driven learning · agent

YOU ARE HERE

Reflexion: Language Agents with Verbal Reinforcement Learning
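The reasoning-acting loop that ReAct (the last prerequisite above) introduces can be sketched in a few lines. This is a hypothetical illustration, not a real API: `call_llm` and `execute` are stand-ins for a language-model call and an environment step.

```python
# Sketch of a ReAct-style loop: the model alternates a free-text
# "thought" with an "action" executed against an environment, and the
# resulting observation is appended to the context for the next step.
# `call_llm` and `execute` are hypothetical stand-ins.

def call_llm(prompt: str) -> str:
    # Stand-in for a language-model call; returns a thought/action pair.
    return "Thought: look it up\nAction: search[reflexion]"

def execute(action: str) -> str:
    # Stand-in for an environment step (search tool, code runner, ...).
    return f"Observation: result of {action}"

def react_episode(task: str, max_steps: int = 3) -> list[str]:
    trajectory = [f"Task: {task}"]
    for _ in range(max_steps):
        step = call_llm("\n".join(trajectory))   # reason + choose action
        trajectory.append(step)
        action = step.split("Action:")[-1].strip()
        trajectory.append(execute(action))       # act, then observe
    return trajectory
```

Reflexion builds directly on this pattern: the trajectory produced by such a loop is what the agent later reflects on.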

In Plain English

Reflexion agents improve decision-making by using verbal feedback stored in memory, bypassing model retraining. This method significantly outperforms baseline models in sequential decision-making, coding, and language reasoning tasks.

Explained Through an Analogy

Imagine an intern remembering each piece of advice given by a mentor, subtly adapting instead of needing a full retraining camp each time. Reflexion lets AI agents act like this savvy intern, using past conversations to guide future interactions intelligently.
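The trial-reflect-retry cycle described above can be sketched as a short loop. This is a minimal illustration of the idea, not the paper's implementation: `attempt`, `evaluate`, and `reflect` are hypothetical stand-ins for the actor model, the evaluator, and the self-reflection step.

```python
# Minimal sketch of the Reflexion loop: the agent attempts a task, an
# evaluator scores the attempt, a verbal self-reflection is written into
# an episodic memory buffer, and the next attempt sees that memory as
# context. No model weights are ever updated.

def attempt(task: str, memory: list[str]) -> str:
    # Stand-in for the actor LLM; memory is prepended as context.
    return f"attempt at {task!r} given {len(memory)} reflections"

def evaluate(output: str) -> bool:
    # Stand-in evaluator (unit tests, heuristics, or an LLM judge).
    return "3 reflections" in output  # toy rule: succeed on trial 4

def reflect(output: str) -> str:
    # Stand-in self-reflection: a verbal lesson, not a gradient update.
    return f"Lesson learned from: {output}"

def reflexion(task: str, max_trials: int = 5) -> tuple[bool, list[str]]:
    memory: list[str] = []              # episodic memory of verbal feedback
    for _ in range(max_trials):
        output = attempt(task, memory)
        if evaluate(output):
            return True, memory
        memory.append(reflect(output))  # learn via words, not retraining
    return False, memory
```

The design choice worth noticing is that all learning lives in `memory`, a plain list of text, which is why no retraining cycle is needed between trials.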


How grounded is this content?

Metrics are computed from available source text only — abstract, summary, and impact fields ingested into this system. Full paper PDF is not ingested; numerical claims that originate from within the paper body will not appear in these scores.

Source Richness: 88%

7 of 8 content fields populated. More fields = better-grounded generation.

Source Depth: ~216 words

Total source text analyzed by the model. Includes extended deep-dive summary — high confidence.

Methodology: Number grounding uses regex digit extraction against source text. Quote traceability uses token-set intersection on content words stripped of stop-words. Neither metric validates semantic correctness or factual accuracy against the original paper. For full verification, cross-reference with the original paper on arXiv.
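The quote-traceability check described in the methodology can be sketched as follows. The stop-word list and the grounding score shown here are illustrative assumptions, not the site's actual configuration.

```python
# Sketch of token-set-intersection quote traceability: strip stop-words,
# take the set of content words from a quote and from the source text,
# and score what fraction of the quote's content words appear in the
# source. Stop-word list is a small illustrative sample.
import re

STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "by"}

def content_words(text: str) -> set[str]:
    words = re.findall(r"[a-z']+", text.lower())
    return {w for w in words if w not in STOP_WORDS}

def traceability(quote: str, source: str) -> float:
    q, s = content_words(quote), content_words(source)
    return len(q & s) / len(q) if q else 0.0  # fraction of quote grounded

source = "Reflexion agents learn from verbal feedback stored in memory"
print(traceability("verbal feedback stored in memory", source))  # → 1.0
```

As the methodology notes, a high overlap score only shows the words appear in the source; it says nothing about whether the quote's meaning is faithful.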