
Less is More: Recursive Reasoning with Tiny Networks

Paper at a Glance
- Paper Title: Less is More: Recursive Reasoning with Tiny Networks
- Authors: Alexia Jolicoeur-Martineau
- Affiliation: Samsung SAIL Montréal
- Published in: arXiv 2025
- Link to Paper: https://arxiv.org/abs/2510.04871
The Gist of It: TL;DR
In one sentence: This paper introduces the Tiny Recursive Model (TRM), a simple and remarkably parameter-efficient approach that uses a single, 2-layer network to recursively improve its own answers, achieving state-of-the-art results on hard reasoning puzzles where even massive Large Language Models (LLMs) fail.
Why It Matters: The Big Picture
While Large Language Models (LLMs) have revolutionized AI, they have a well-known Achilles’ heel: complex, multi-step reasoning. On tasks like solving Sudoku puzzles or the abstract patterns in ARC-AGI, their auto-regressive nature means a single wrong token can derail the entire solution. Methods like Chain-of-Thought (CoT) have helped, but they are often expensive and unreliable.
Recently, a novel method called the Hierarchical Reasoning Model (HRM) proposed an alternative: using two small, recurrent networks to iteratively refine a solution. It showed great promise but was complicated, relying on two separate models and some shaky theoretical assumptions.
This paper asks a powerful question: can we achieve the same or better results by radically simplifying this recursive idea? The answer, it turns out, is a resounding yes. The proposed Tiny Recursive Model (TRM) shows that for certain hard problems, a much smaller, simpler, and more elegant approach doesn’t just compete—it dominates.
The Core Idea: How It Works
1. The Problem They’re Solving
The previous model, HRM, was a step in the right direction but had several issues. It used two different networks (a “low-level” and “high-level” one), justified by complex biological arguments. It relied on a “1-step gradient approximation” based on the assumption that the network reached a fixed point, which wasn’t guaranteed. And its training was inefficient, requiring extra forward passes. TRM is designed to strip away all this complexity and get to the core of what makes recursive reasoning work.
2. The Key Innovation
The central idea of TRM is iterative refinement with a single, tiny network. Instead of separate networks for different “hierarchies” of reasoning, TRM uses just one small model. This model takes three things as input:
- The initial question (`x`).
- Its current best guess for the answer (`y`).
- A latent “reasoning” state (`z`), which acts like a scratchpad or internal chain of thought.

The model then repeatedly loops, first thinking (updating `z`) and then acting (updating `y`), progressively improving its answer.
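To make the single-network idea concrete, here is a minimal PyTorch-style sketch (not the authors' code): the same tiny network is reused both to update the reasoning state `z` and to update the answer `y`. The class name, the dimensions, and the residual MLP blocks standing in for the paper's 2-layer network are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class TinyRecursiveNet(nn.Module):
    """One small network reused for both the 'think' and 'act' updates (sketch)."""

    def __init__(self, dim: int = 512, hidden: int = 2048):
        super().__init__()
        # Two residual MLP blocks stand in for the paper's tiny 2-layer network.
        self.block1 = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )
        self.block2 = nn.Sequential(
            nn.LayerNorm(dim), nn.Linear(dim, hidden), nn.GELU(), nn.Linear(hidden, dim)
        )

    def forward(self, *states: torch.Tensor) -> torch.Tensor:
        # The caller passes (x, y, z) when thinking and (y, z) when acting;
        # summing the embeddings is one simple way to combine them.
        h = torch.stack(states, dim=0).sum(dim=0)
        h = h + self.block1(h)
        h = h + self.block2(h)
        return h
```

In the paper, the 2-layer network uses self-attention for the Maze and ARC tasks and an attention-free MLP variant for fixed-size Sudoku grids; the residual MLP blocks above are only a stand-in for either.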
3. The Method, Step-by-Step
The TRM process, illustrated in Figure 1 of the paper, is a nested loop of refinement.

- Initialization: The model starts with the embedded question `x`, an initial (often blank) answer `y`, and an initial reasoning state `z`.
- Inner Loop: Recursive Reasoning: For a set number of steps (`n`, e.g., 6), the network updates its latent state `z`. It takes the question, the current answer, and the current reasoning state as input and produces a new, improved reasoning state: `z_new = network(x, y_current, z_current)`. This is where the model “thinks” about the problem.
- Prediction Update: After the inner loop completes, the model makes a new prediction for the answer `y` based on the previous answer and the final reasoning state: `y_new = network(y_current, z_final)`.
- Outer Loop: Deep Supervision: The inner loop plus the prediction update make up one full “reasoning cycle.” TRM repeats this cycle up to 16 times (`N_sup`). After each cycle, a loss is calculated between the predicted answer `y` and the true answer. Crucially, gradients flow through the whole cycle but never across cycles: the refined `y` and `z` are detached from the computation graph before being fed in as the starting point for the next cycle. This technique, called deep supervision, allows the model to learn as if it were an extremely deep network without the prohibitive memory cost of backpropagating through everything at once. A code sketch of the full procedure follows this list.
This recursive process allows the model to find and correct its own mistakes, progressively converging on the correct solution in a highly parameter-efficient way.
Key Experimental Results
TRM’s performance is remarkable, especially given its size (only 5-7M parameters for most experiments).
- Sudoku-Extreme: TRM achieves an incredible 87.4% accuracy, crushing the previous HRM’s 55.0% and far surpassing LLMs, which score 0.0% on this difficult, small-data benchmark (Table 4).
- Maze-Hard & ARC-AGI: Using a version with self-attention for larger grids, TRM consistently outperforms HRM and most LLMs. It scores 85.3% on Maze-Hard (vs. 74.5% for HRM) and achieves 44.6% on ARC-AGI-1 and 7.8% on ARC-AGI-2, significantly better than HRM’s 40.3% and 5.0% (Tables 4 & 5).
- Less is More: The ablation studies in Table 1 are the heart of the paper. They show that a single network is better than two, 2 layers are better than 4, and backpropagating through the entire reasoning cycle is far superior to HRM’s 1-step approximation (improving Sudoku accuracy from 56.5% to 87.4%).
A Critical Look: Strengths & Limitations
Strengths / Contributions
- Radical Simplicity and Efficiency: TRM replaces the complex, biologically-inspired design of HRM with a single, tiny network. This not only makes the model more elegant but also drastically reduces the parameter count (7M vs. 27M) while improving performance.
- State-of-the-Art on Hard Reasoning: The paper demonstrates that for structured reasoning tasks with limited data, a focused, recursive approach can massively outperform the brute-force scale of LLMs. It’s a compelling proof-of-concept for an alternative AI paradigm.
- Strong Empirical Grounding: Unlike its predecessor, TRM’s design choices are meticulously justified through comprehensive ablation studies. The paper systematically shows why each simplification (single network, 2 layers, full backpropagation) leads to better results.
Limitations / Open Questions
- Task-Specific Architecture: The best-performing model for Sudoku (fixed grid size) used an MLP-based architecture, while the models for Maze and ARC (variable grid sizes) required self-attention. This suggests that the architecture is not one-size-fits-all and needs to be tailored to the problem.
- Supervised and Deterministic Nature: TRM is trained via supervised learning to produce a single, deterministic output. This makes it unsuitable for problems with multiple correct answers or for tasks that require generative, diverse outputs.
- Generalization to Broader Domains: TRM excels in constrained, puzzle-like environments. It remains an open question how this recursive reasoning approach would scale or apply to the open-ended, messy world of natural language understanding where LLMs thrive.
Contribution Level: Significant Improvement. This work takes a promising but complex idea (HRM) and refines it into a much simpler, more powerful, and better-justified model. It provides a strong and clear demonstration that for certain classes of problems, architectural ingenuity can be far more effective than just scaling up parameters.
Conclusion: Potential Impact
“Less is More” delivers a clear and impactful message: in the race to build more intelligent systems, the answer isn’t always a bigger model. The Tiny Recursive Model (TRM) provides a compelling blueprint for how small, efficient networks can achieve sophisticated reasoning through iterative self-correction.
This research should inspire developers working on specialized AI for science, logistics, and verification, where reliable, step-by-step reasoning is paramount. The next frontier for this line of work could be exploring how to make these recursive models more generative or how to integrate their focused reasoning capabilities into the broader knowledge base of LLMs.