Closed-Loop RAG: Architecting self-correction via real-time metric orchestration
Dec 24, 2025
The "Fragility" Problem in Open-Loop RAG
In our projects, we often see standard RAG systems break easily. The typical “Naive RAG” setup is an open-loop chain:
Input -> Retrieval -> Generation -> Output
The fragility stems from the fact that Retrieval and Generation are stochastic.
Retrieval is stochastic
Vector similarity
Approximate nearest neighbour search
Embedding noise
Randomness in ranking
Context window limits
Small embedding differences can change which documents land in the top-k.
Generation is stochastic
Token sampling
Temperature
Model randomness
Prompt sensitivity
The LLM can hallucinate or drift unpredictably.
If the retriever returns weak or irrelevant chunks, the generator will likely produce a weak answer. Even when retrieval is effective, the LLM can still hallucinate or stray off-topic. In an open loop, these errors silently reach the user.
To take RAG from a demo to something an enterprise can trust, we can incorporate a basic idea from Control Theory - feedback.
A reliable RAG system can’t be a linear pipeline. It should be a process that monitors itself and fixes issues on the fly.
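Concretely, the control loop can be as small as the sketch below; `retrieve`, `generate`, `evaluate`, and `heal` are hypothetical stand-ins for the components described in the phases that follow.

```python
# Minimal closed-loop RAG skeleton (illustrative sketch, not a full implementation).
# `retrieve`, `generate`, `evaluate`, and `heal` are hypothetical callables.

MAX_ATTEMPTS = 3

def closed_loop_rag(query, retrieve, generate, evaluate, heal):
    working_query = query
    answer = None
    for _ in range(MAX_ATTEMPTS):
        context = retrieve(working_query)
        answer = generate(query, context)
        metrics = evaluate(query, context, answer)       # Phase 1: instrumentation
        if metrics["healthy"]:
            return answer                                # only healthy answers reach the user
        working_query = heal(working_query, context, metrics)  # Phases 2-3: diagnose and repair
    return answer  # after max attempts, return the best effort or escalate to a human
```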
That’s where Agentic AI becomes a control layer - detecting deviations and correcting them immediately. This article breaks the Self-Healing RAG design into three phases:
Phase 1: Rigorous instrumentation
You can’t heal what you can’t see. For a RAG system to fix itself, we need to measure the health of the Retriever and the Generator separately. They fail for different reasons, so their metrics must differ accordingly.
1. Retriever metrics
These metrics tell us whether the LLM has any chance at all. If the context is noisy or incomplete, the rest of the system cannot compensate.
Context precision: What portion of the retrieved chunks were relevant to the query? Low precision means the LLM is forced to read noise.
Context recall: Did we retrieve enough information to answer the question? Critical for multi-hop or reasoning-heavy tasks.
Hit rate: How often the correct document appears in the top-k results.
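Offline, when a labeled evaluation set is available, hit rate is straightforward to compute. The sketch below assumes each example carries the ids of its known-relevant documents and that the retriever returns objects with an `id` attribute.

```python
# Offline hit@k over a labeled evaluation set (sketch).
# Assumes examples like {"query": "...", "relevant_doc_ids": {"d12", "d40"}}
# and a retriever whose results expose an `id` attribute.
def hit_rate_at_k(examples, retriever, k=5):
    hits = 0
    for ex in examples:
        retrieved_ids = {doc.id for doc in retriever(ex["query"], top_k=k)}
        if retrieved_ids & ex["relevant_doc_ids"]:  # any ground-truth doc in the top-k?
            hits += 1
    return hits / len(examples)
```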
But how do we measure precision without ground truth?
This is a practical problem because, at runtime, we don’t have a human-verified “correct” document to compare against. So, instead of classical IR precision, we use a real-time relevance check.
In simple terms, we let a small LLM act as a judge and ask it: “Is this chunk useful for answering the user's question?” If the judge says “yes,” we count it. If it says “no,” we treat it as noise.
This becomes our runtime precision:
Precision = Number of relevant chunks / Total chunks retrieved
In our experience, this approach works surprisingly well in enterprise settings.
Some teams also combine this with lightweight similarity checks or re-rankers, but the basic idea remains the same - estimate precision through semantic relevance, not ground truth.
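A minimal sketch of this runtime check, assuming a hypothetical `judge_llm(prompt)` call that returns the judge model’s raw text:

```python
# Runtime context precision via a small LLM judge (sketch).
# `judge_llm` is a hypothetical function that sends a prompt to a fast judge model.
JUDGE_PROMPT = (
    "Question: {question}\n\n"
    "Chunk: {chunk}\n\n"
    "Is this chunk useful for answering the question? Answer YES or NO."
)

def runtime_precision(question, chunks, judge_llm):
    if not chunks:
        return 0.0
    relevant = 0
    for chunk in chunks:
        verdict = judge_llm(JUDGE_PROMPT.format(question=question, chunk=chunk))
        if verdict.strip().upper().startswith("YES"):
            relevant += 1
    return relevant / len(chunks)  # Precision = relevant chunks / total chunks retrieved
```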
2. Generator metrics
Even with perfect retrieval, the model can drift, hallucinate, or misinterpret the user. So, we measure the generator independently.
Faithfulness (groundedness): Are the claims in the answer supported by the retrieved context?
Answer relevance: Does the answer address the user’s question?
These are the same metrics used in frameworks like RAGAS and DeepEval and can be computed quickly using “LLM-as-a-Judge” techniques.
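A simplified version of both checks, again with a hypothetical `judge_llm` call; frameworks like RAGAS and DeepEval compute finer-grained, claim-level scores, so treat this as a sketch of the idea rather than their API.

```python
# Generator checks via LLM-as-a-Judge (coarse sketch; production frameworks score claim by claim).
def faithfulness(answer, context, judge_llm):
    prompt = (
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Is every claim in the answer supported by the context? Answer YES or NO."
    )
    return judge_llm(prompt).strip().upper().startswith("YES")

def answer_relevance(question, answer, judge_llm):
    prompt = (
        f"Question: {question}\nAnswer: {answer}\n\n"
        "Does the answer directly address the question? Answer YES or NO."
    )
    return judge_llm(prompt).strip().upper().startswith("YES")
```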
Phase 2: The diagnosis (Thresholding and logic)
Once the system collects these signals, a State Monitor checks each metric against a threshold (e.g., is context precision above 0.7?).
By observing which combination of metrics fails, the system can pinpoint the cause of the problem. This leads to a simple failure matrix:

| Scenario | Retriever Score | Generator Score | Diagnosis |
| --- | --- | --- | --- |
| A | Low | N/A | The LLM never had the right context. Retriever failure. |
| B | High | Low (Faithfulness) | LLM hallucinated despite good context. |
| C | High | Low (Relevance) | LLM misunderstood intent or drifted. |
Each diagnosis triggers a different healing process.
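In code, the monitor can be a few lines of thresholded logic; the threshold values below are illustrative, not recommendations.

```python
# State Monitor: map metric combinations to a diagnosis (sketch).
# Threshold values are illustrative; tune them per corpus and model.
def diagnose(metrics, precision_min=0.7, faithfulness_min=0.8, relevance_min=0.8):
    if metrics["context_precision"] < precision_min:
        return "RETRIEVER_FAILURE"   # Scenario A: the LLM never had the right context
    if metrics["faithfulness"] < faithfulness_min:
        return "HALLUCINATION"       # Scenario B: good context, unfaithful answer
    if metrics["answer_relevance"] < relevance_min:
        return "INTENT_DRIFT"        # Scenario C: the answer misses the user's intent
    return "HEALTHY"
```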
Phase 3: Agentic intervention (The self-heal)
This is where the system becomes truly autonomous. Instead of returning a bad answer or logging an error, it dispatches an agent to fix the underlying cause and try again before the user ever sees the failure.
Scenario A: Healing the retriever (Search agent)
Trigger: Low precision or recall
Actions the agent can take (a sketch follows this list):
Query transformation (HyDE): Create a hypothetical “ideal answer,” convert it to an embedding, and re-run retrieval.
Query expansion: Break the query into sub-questions to improve recall.
Metadata filtering: Restrict retrieval (e.g., by date, category) to reduce noise.
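A sketch combining these three moves, assuming hypothetical `llm` and `retriever` callables and chunk objects with an `id` attribute:

```python
# Search agent: repair retrieval before regenerating (sketch).
# `llm` and `retriever` are hypothetical callables; chunks are assumed to have an `id`.
def heal_retrieval(query, llm, retriever, filters=None, top_k=5):
    # Query transformation (HyDE): draft a hypothetical ideal answer and retrieve
    # against it, since answer-shaped text often embeds closer to the right chunks.
    hypothetical = llm(f"Write a short, plausible answer to: {query}")
    chunks = list(retriever(hypothetical, top_k=top_k, filters=filters))

    # Query expansion: break the query into sub-questions to improve recall.
    subs = llm(f"Split this into 2-3 standalone sub-questions, one per line: {query}")
    for sub in subs.splitlines():
        if sub.strip():
            chunks.extend(retriever(sub.strip(), top_k=2, filters=filters))

    # Metadata filtering happens via `filters` (e.g., a date or category constraint).
    # De-duplicate and return the widened context.
    seen, merged = set(), []
    for chunk in chunks:
        if chunk.id not in seen:
            seen.add(chunk.id)
            merged.append(chunk)
    return merged
```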
Scenario B: Healing the generator (Critic agent)
Trigger: Low faithfulness or relevance
Actions (a sketch follows this list):
Self-correction loop: Ask the LLM to rewrite its answer using only the provided context, with citations.
Context pruning: Remove low-quality chunks and regenerate with a tighter context window.
Model switching: Route difficult queries to a more capable model temporarily.
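A sketch of the critic agent’s response, reusing the per-chunk judge verdicts from Phase 1; `llm` and `strong_llm` are hypothetical model callables, and chunks are assumed to expose a `text` attribute.

```python
# Critic agent: repair the answer rather than the retrieval (sketch).
def heal_generation(question, chunks, chunk_is_relevant, diagnosis, llm, strong_llm):
    # Context pruning: keep only the chunks the Phase 1 judge marked as relevant.
    kept = [c for c, relevant in zip(chunks, chunk_is_relevant) if relevant]
    context = "\n\n".join(c.text for c in kept)

    # Self-correction loop: regenerate strictly from the pruned context, with citations.
    prompt = (
        f"Context:\n{context}\n\nQuestion: {question}\n\n"
        "Answer using ONLY the context above and cite the chunk supporting each claim. "
        "If the context is insufficient, say so."
    )

    # Model switching: route stubborn hallucinations to a more capable model.
    model = strong_llm if diagnosis == "HALLUCINATION" else llm
    return model(prompt)
```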
Most systems today are reactive - they fix errors once they detect them. The future is proactive RAG, where the system improves itself.
Every time an agent self-heals, it appends to a high-quality dataset of “retrieval + generation failures” (a record schema is sketched after this list), recording:
Which agent intervened
What failure occurred
Which query patterns trigger the failure
Which documents or chunks were responsible
How the fix was applied
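One possible shape for such a record, as a hypothetical dataclass; the field names are assumptions to adapt to your stack.

```python
# One failure record per self-heal event (hypothetical schema, adapt as needed).
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class FailureRecord:
    query: str
    diagnosis: str                     # e.g. "RETRIEVER_FAILURE", "HALLUCINATION"
    agent: str                         # which agent intervened
    offending_chunk_ids: list[str]     # documents or chunks responsible
    fix_applied: str                   # e.g. "HyDE rewrite", "context pruning"
    metrics_before: dict
    metrics_after: dict
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
```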
Over time, this dataset drives self-improvement actions like:
Updating the vector store when retrieval fails,
Refining the embedding model,
Adjusting prompts based on repeated patterns.
Self-healing fixes errors in real time (short term). Self-improvement uses those errors to make the entire RAG system smarter and more reliable over time (long term).