
Introducing IRG: A Protocol for Persistent, Structured AI Reasoning

TL;DR

We’re releasing the Iterative Reasoning Graph specification — an open protocol for AI systems that reason in explicit, persistent, revisable structures rather than ephemeral token streams. The reasoning doesn’t disappear after the response. It lives in a graph you can inspect, replay, and audit. GitHub: arcus-labs/IRG-spec

The Problem

Modern AI systems reason in isolated episodes.

A model generates a response. The reasoning evaporates. You see the output but not the process. If something went wrong, you can’t point to where. If you want to improve it, you retrain or re-prompt and hope.

The techniques we’ve developed to address this — chain-of-thought, self-critique, tool orchestration — are workarounds, not architecture. They happen inside the prompt or get logged after the fact. They don’t persist. They don’t revise. They don’t compound.

What’s missing is a first-class representation of reasoning itself.

What IRG Is

An Iterative Reasoning Graph is an explicit, evolving graph of executable reasoning nodes.

Each node represents a reasoning operation: a draft, a critique, a fact-check, a revision, a risk assessment, a decision to abstain. Edges encode why nodes exist — dependency, refinement, invalidation, supersession.

This isn’t a log of what happened. It’s a structure that governs what happens next.
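The structure described above can be sketched with a few data types. This is a minimal illustration in Python, not the spec's normative schema; the class and field names are ours:

```python
from dataclasses import dataclass, field
from enum import Enum


class EdgeKind(Enum):
    """Edge semantics: why a node exists relative to another."""
    DEPENDENCY = "dependency"
    REFINEMENT = "refinement"
    INVALIDATION = "invalidation"
    SUPERSESSION = "supersession"


@dataclass
class Node:
    """One reasoning operation: a draft, critique, fact-check, etc."""
    id: str
    op: str              # e.g. "draft", "fact-check", "revise"
    payload: str = ""    # the operation's output, once executed
    valid: bool = True   # flipped off when an invalidation edge fires


@dataclass
class Edge:
    src: str
    dst: str
    kind: EdgeKind


@dataclass
class Graph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add(self, node: Node) -> None:
        self.nodes[node.id] = node

    def link(self, src: str, dst: str, kind: EdgeKind) -> None:
        self.edges.append(Edge(src, dst, kind))
```

Even at this level of detail, the key property holds: the graph is a durable object you can store, query, and mutate, independent of any single model call.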

When an IRG node executes, it can produce new output, spawn refinement or critique nodes, or invalidate conclusions that depended on it.

The graph persists across time. Reasoning survives beyond a single response. Errors can be traced to specific nodes. Corrections are local — you fix the subgraph, not the whole system.

Why Graphs, Why Iteration

Graphs because reasoning isn’t linear. Real thinking branches, backtracks, and revises. A chain-of-thought is a transcript. A graph is a map.

Iteration because getting it right the first time is the exception. Good reasoning involves drafting, checking, critiquing, and refining. IRG makes iteration a first-class primitive, not something you simulate by prompting the model again.

A canonical IRG cycle:

  1. Clarify — surface missing assumptions
  2. Draft — generate an initial response
  3. Evaluate — fact-check, critique, assess coherence
  4. Predict — estimate downstream risks and failure modes
  5. Revise — apply targeted fixes
  6. Converge or iterate — stop when stable, abstain when stuck

Each step is a node. The whole process is inspectable.
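The cycle above can be sketched as a driver loop. The step functions here are toy stand-ins, not part of the spec; a real system would dispatch each step to a model or tool and record it as a graph node:

```python
def evaluate(draft):
    """Fact-check / critique: return a list of issues found (toy logic)."""
    return [] if draft.startswith("revised") else ["unverified claim"]

def predict(draft):
    """Estimate downstream risks of shipping this draft (toy logic)."""
    return ["low risk"]

def revise(draft, issues):
    """Apply targeted fixes for the reported issues (toy logic)."""
    return "revised " + draft

def run_cycle(question, max_iters=3):
    trace = [("clarify", f"surface assumptions in: {question}")]
    draft = f"draft answer to: {question}"
    trace.append(("draft", draft))
    for _ in range(max_iters):
        issues = evaluate(draft)
        trace.append(("evaluate", issues))
        trace.append(("predict", predict(draft)))
        if not issues:
            trace.append(("converge", draft))      # stable: stop
            return draft, trace
        draft = revise(draft, issues)
        trace.append(("revise", draft))
    trace.append(("abstain", "no stable answer"))  # stuck: abstain
    return None, trace
```

Note that abstention is a return value, not an exception: a terminal state of the graph like any other.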

What This Changes

Debugging

Today: “The model hallucinated.”
With IRG: “Node 7 (fact-check) passed incorrectly because retrieval returned outdated sources. Node 9 (revision) didn’t trigger because confidence stayed above threshold.”

You can see exactly where reasoning failed and why.

Correction

Today: Fix a problem by retraining, prompt engineering, or hoping the next run is better.
With IRG: Edit the affected node. Invalidate downstream nodes. Re-run the subgraph. The fix is local, immediate, and interpretable.
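This local-correction flow can be sketched in a few lines, assuming nodes are plain dicts and edges are (src, dst) dependency pairs. The function names are illustrative, not from the spec:

```python
def downstream(edges, start):
    """All node ids reachable from `start` along dependency edges."""
    seen, frontier = set(), [start]
    while frontier:
        n = frontier.pop()
        for src, dst in edges:
            if src == n and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen

def fix_node(nodes, edges, node_id, new_payload, rerun):
    """Edit one node, invalidate its downstream subgraph, re-run it."""
    nodes[node_id]["payload"] = new_payload       # the local edit
    stale = downstream(edges, node_id)
    for n in stale:
        nodes[n]["valid"] = False                 # invalidate downstream
    for n in sorted(stale):  # a real system would use topological order
        nodes[n]["payload"] = rerun(nodes, n)     # re-run the subgraph only
        nodes[n]["valid"] = True
    return stale
```

Everything upstream of the edited node is untouched, which is the point: the cost of a fix is proportional to the subgraph it affects, not to the whole reasoning history.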

Safety

Today: “We have content filters.” No evidence of what actually happened at inference time.
With IRG: Every safety check is a node. You can prove it ran, show what it evaluated, explain why it passed or failed. Safety becomes auditable by architecture, not by assertion.

Long-horizon reasoning

Today: Context window is the ceiling. Reasoning can’t persist or compound across sessions.
With IRG: The graph survives. Pick up where you left off. Build on prior reasoning instead of starting fresh.

Minimal Compliance

The spec defines six requirements for IRG compliance:

  1. Persistent reasoning structure — reasoning survives beyond a single inference
  2. Executable nodes — reasoning steps are units that can run, revise, or invalidate
  3. Explicit relations — edges encode dependency, refinement, invalidation
  4. Iterative revision — critique and revision are first-class, not simulated
  5. Inspectability — the graph is exportable and auditable
  6. Termination semantics — convergence, abstention, and failure are explicit states

Systems that rely solely on linear chain-of-thought, stateless self-critique, or prompt-only orchestration don’t qualify — even if they exhibit iterative behavior. The structure has to be real.
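Requirements 5 and 6 are the easiest to make concrete: the graph exports to a plain document, and the terminal state is an explicit field rather than an inference from behavior. A sketch, with field names that are ours and not mandated by the spec:

```python
import json

def export_graph(nodes, edges, termination):
    """Serialize a graph to JSON with an explicit termination state."""
    if termination not in {"converged", "abstained", "failed"}:  # req. 6
        raise ValueError(f"unknown termination state: {termination}")
    return json.dumps({
        "nodes": nodes,
        "edges": [{"src": s, "dst": d, "kind": k} for s, d, k in edges],
        "termination": termination,
    }, indent=2)
```

An auditor never has to ask "did it converge or just stop?" — the answer is in the artifact.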

What IRG Is Not

Not chain-of-thought. CoT is linear, ephemeral, and generated incidentally during inference. IRG is graph-structured, persistent, and explicitly authored.

Not a prompting technique. The IRG exists outside the prompt boundary. It governs model invocation, not the other way around.

Not a model architecture. IRG is model-agnostic. It can orchestrate transformers, diffusion models, symbolic systems, or external tools.

Not fine-tuning. IRG doesn’t modify weights. It encodes corrections into structure, not parameters.

Not just a log. Logs record what happened. IRG determines what happens next.

Relationship to EIE

Last week we released EIE — a protocol for measuring epistemic integrity in AI systems.

The relationship is simple:

IRG provides architectural affordances that may improve epistemic behavior — explicit verification nodes, revision under critique, abstention as a first-class outcome. But IRG doesn’t guarantee good epistemic behavior. Poorly designed nodes or biased evaluators can still produce bad outcomes.

EIE measures whether the system actually behaved well, regardless of architecture. You can run EIE on IRG systems and non-IRG systems alike. If IRG is doing its job, EIE scores should reflect it.

What’s Coming

The spec released today defines the protocol. Next comes the implementation layer.

Reason is a cognitive engineering language that compiles to IRG. Instead of hand-wiring graphs, you write structured reasoning strategies — “thoughts” — that declare what operations to perform, what to verify, when to revise, when to stop.

Reason compiles to IRG. IRG executes against models and tools. The trace is fully auditable.
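Reason's syntax hasn't been published, so the following is purely hypothetical: it imagines a "thought" as declarative data that compiles into IRG node and edge specs. Every name below is invented for illustration:

```python
# Hypothetical sketch only: Reason's actual syntax is unreleased.
# A strategy declared as data, "compiled" into a linear IRG fragment.
strategy = {
    "name": "careful-answer",
    "steps": ["clarify", "draft", "evaluate", "revise"],
    "stop_when": "no issues remain",
}

def compile_to_irg(strategy):
    nodes = [{"id": f"n{i}", "op": op}
             for i, op in enumerate(strategy["steps"])]
    edges = [{"src": f"n{i}", "dst": f"n{i + 1}", "kind": "dependency"}
             for i in range(len(nodes) - 1)]
    return {"nodes": nodes, "edges": edges,
            "stop_when": strategy["stop_when"]}
```

Whatever the surface syntax turns out to be, the division of labor is the interesting part: the language declares intent, the graph carries execution, and the trace carries the audit.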

We’ll have more to share soon. If you want early access, reach out.

Why Open

We’re releasing IRG under CC-BY-4.0 because reasoning infrastructure shouldn’t be proprietary.

If AI systems are going to be trusted in high-stakes contexts — medicine, law, finance, security — the way they reason needs to be inspectable, comparable, and governable. That requires shared protocols, not walled gardens.

We’ll compete on implementation. The spec is open.

Get Involved

The spec is v0.3. We're looking for implementers, reviewers, and critique of the protocol itself.

IRG treats reasoning as what it is: a structure that persists, evolves, and governs. Not a side effect of token generation. Not a log you review after the fact. A first-class artifact you can inspect, debug, and improve.

That’s the foundation. Now we build on it.