
The MIT Research Behind Delibera's Architecture

Where multi-model deliberation came from, and why it took 10 months of research before we shipped anything.

The Research Question

Can you build an AI system whose output is more accurate than any of its constituent models — not by averaging, not by ensemble voting, but by adversarial deliberation?

The honest answer, before we started, was: we didn't know.

The Setup

Three independent frontier models, each given the same matter but a distinct analytical mandate. Structured rounds of cross-critique. Mandatory dissent. Gap-driven search for external verification.
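
To make that concrete, here is a minimal Python sketch of what one such deliberation loop could look like. Every name in it (Position, deliberate, the exact-match consensus test) is a hypothetical illustration of the pattern, not Delibera's implementation, and the gap-driven search for external verification is reduced to a comment.

```python
from dataclasses import dataclass, field
from typing import Callable

# A "model" here is just a function from a prompt to a text answer.
Model = Callable[[str], str]

@dataclass
class Position:
    author: str
    answer: str
    critiques: list[str] = field(default_factory=list)

def deliberate(matter: str, models: dict[str, Model], rounds: int = 3) -> dict:
    # Round 0: each model answers the same matter independently,
    # under its own analytical mandate (baked into the model or its prompt).
    positions = {name: Position(name, m(matter)) for name, m in models.items()}

    for _ in range(rounds):
        # Cross-critique: every model attacks every other model's position.
        # Dissent is mandatory: the prompt requires at least one weakness.
        for critic_name, critic in models.items():
            for target in positions.values():
                if target.author == critic_name:
                    continue
                prompt = (
                    f"Matter: {matter}\n"
                    f"Position by {target.author}: {target.answer}\n"
                    "Name at least one concrete weakness in this position."
                )
                target.critiques.append(critic(prompt))

        # Revision: each model updates its answer in light of the critiques.
        # (The gap-driven external verification step would slot in here.)
        for name, model in models.items():
            pos = positions[name]
            prompt = (
                f"Matter: {matter}\n"
                f"Your position: {pos.answer}\n"
                f"Critiques: {pos.critiques}\n"
                "Revise your position, or defend it and state your uncertainty."
            )
            positions[name] = Position(name, model(prompt))

    # Naive exact-match consensus, for illustration only: full agreement is
    # reported as consensus; anything else is surfaced as dissent.
    answers = [p.answer for p in positions.values()]
    if len(set(answers)) == 1:
        return {"consensus": answers[0], "dissent": []}
    return {"consensus": None, "dissent": answers}
```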

We evaluated against the Phare Hallucination Benchmark and a proprietary adversarial misinformation suite. The results, after iteration, were consistent and surprising.

The Result

  • 82.6% hallucination resistance on Phare (350 samples)
  • 13.1% hallucination rate on adversarial misinformation (791 samples) vs. 22.3% for the next-best model, a 41% reduction
  • The only system tested that expressed uncertainty (26% of challenging cases)
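
(That 41% is the relative drop: (22.3 − 13.1) / 22.3 ≈ 0.41.)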

The interesting finding wasn't just the lower hallucination rate: the improvement came from the *disagreement* between models, not from any single model getting smarter. The dissent produced the accuracy.

What It Means for You

Two things:

  • When three independent models agree, the answer is very likely correct.
  • When they disagree, that disagreement is itself valuable, and it's usually what you need to make a better decision.

Delibera preserves both: the consensus where it exists, and the disagreement where it doesn't.
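
To see how that shape falls out of the hypothetical sketch above, here is an equally hypothetical usage of deliberate(), with stub models standing in for the three frontier models so the example runs end to end:

```python
# Hypothetical usage of the deliberate() sketch above. The stub models
# always agree here; real model calls would produce genuine dissent.
models = {
    "analyst": lambda prompt: "Answer A",
    "skeptic": lambda prompt: "Answer A",
    "advocate": lambda prompt: "Answer A",
}

result = deliberate("An illustrative matter.", models)

if result["consensus"] is not None:
    # All three positions converged: high confidence in the answer.
    print("Consensus:", result["consensus"])
else:
    # Disagreement is preserved as-is, never averaged into one answer.
    for position in result["dissent"]:
        print("Dissent:", position)
```

The design choice this illustrates is the thesis of the post: when the positions diverge, the system returns each one rather than blending them into a single averaged answer.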

Want to see this in practice?

Request a live briefing. We’ll run a real deliberation you can keep.

Request a Briefing