When people talk about progress in AI, they usually talk about bigger models. More parameters. More data. More compute.
That path has produced incredible results. But it also hides a deeper question we kept running into while building and using AI systems every day: What does it actually mean for an AI system to understand something well enough to be trusted?
*[Pipeline diagram: Query Classification → Spawning Agents (Research Agent, Analyst Agent, Validator Agent, Divergent Thinker) → Building Consensus → Final Synthesis]*
Intelligence Is Not Singular
One thing became clear very early: no single model, no matter how strong, is consistently right in every situation. Even the best systems disagree with themselves when asked the same question in different ways. They miss edge cases. They sound confident when they should not.
Humans solved this problem a long time ago. Not by becoming perfect individuals, but by thinking together.
Committees. Peer review. Code reviews. Research groups. Investment committees. The most reliable decisions are almost never made by one voice alone.
That insight shaped KingLeemer from the start.
The Original Idea
The earliest version of KingLeemer was not ambitious. It was not autonomous. It did not use tools or browse the web.
It simply asked: What if multiple strong models answered the same question independently, and we forced them to disagree before agreeing?
Systems like Grok-4-Heavy reinforced that this direction mattered: intelligence could be collective, not singular.
But we did not want chaos. We did not want voting. We did not want averages.
We wanted structure.
So we built KingLeemer V1.
*[Architecture diagram: User Query → Orchestrator (Query Classifier) → Agent Pool (4 parallel agents) → Consensus Engine (clustering + judge) → Synthesis → Final Answer]*
Why We Built V1 First
V1 is deliberately restrained. It does not act. It does not browse endlessly. It does not spawn uncontrolled agents.
It does one thing well: reasoning discipline.
Multiple models respond independently. Their claims are extracted, grouped, challenged, and merged. When there is disagreement, it is surfaced and resolved, not hidden.
V1 exists because we believe something fundamental:
If an AI system cannot reason reliably without tools, giving it tools will not make it safer. It will just make it louder.
Before autonomy, you need epistemic grounding. Before action, you need judgment.
V1 gave us that foundation. It also gave us something else: speed.
Most questions do not need autonomy. They need clarity. V1 remains fast, economical, and strong. It still handles the majority of real-world queries better than a single model ever could.
1. Extract Claims — parse each agent response into atomic factual claims.
2. Cluster by Similarity — group claims using embeddings (cosine similarity > 0.85).
3. Judge Conflicts — an LLM judge resolves contested clusters only.
4. Synthesize — merge agreements and resolutions into the final answer.
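To make the clustering step concrete, here is a minimal sketch, assuming an `embed` helper that returns one embedding vector per claim. The greedy single-pass grouping is illustrative, not necessarily how the actual `ConsensusEngine` clusters:

```typescript
// Minimal claim-clustering sketch. `embed` is an assumed helper;
// the greedy single-pass strategy is illustrative, not the real engine.
type Claim = { text: string; agentId: string };
type Cluster = { centroid: number[]; claims: Claim[] };

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function clusterClaims(
  claims: Claim[],
  embed: (text: string) => Promise<number[]>,
  threshold = 0.85, // the similarity cutoff described above
): Promise<Cluster[]> {
  const clusters: Cluster[] = [];
  for (const claim of claims) {
    const vector = await embed(claim.text);
    // Join the first sufficiently similar cluster, or start a new one.
    const match = clusters.find((c) => cosine(c.centroid, vector) >= threshold);
    if (match) match.claims.push(claim);
    else clusters.push({ centroid: vector, claims: [claim] });
  }
  return clusters;
}
```

Clusters that end up containing matching claims from every agent surface as unanimous; mixed clusters are the contested ones handed to the judge.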
For example, claims extracted from a React performance query:

| Claim | Status |
|---|---|
| Use React.memo for expensive components | unanimous |
| Always use useCallback for event handlers | contested |
| Virtual DOM diffing is the main bottleneck | unanimous |
| Code splitting improves initial load time | unanimous |
| useMemo should wrap all computed values | strong |
| Server components reduce client bundle size | strong |
Why We Did Not Skip to Autonomy
At some point, though, another limitation became impossible to ignore.
There are problems that cannot be solved by thinking alone.
- Current information
- Verification
- Calculations
- Real-world data
- Multi-step tasks
Pretending a reasoning-only system can handle those well is dishonest.
So we built KingLeemer V2. But only after V1 worked.
V2 Is Not "V1, But Bigger"
KingLeemer V2 is not just a more powerful version of V1. It is a different kind of system.
- V2 plans
- It delegates
- It checks its work
- It retries when something fails
- It knows when to stop
Instead of a council answering a question, V2 behaves more like a manager running a project. Specialists are created only when needed. Tools are used deliberately. Every step is logged.
Nothing happens in the dark.
After nearly 40,000 lines of code, this distinction became unavoidable: autonomy is not about cleverness. It is about process.
And process requires governance.
*[Architecture diagram: User Query → Planner Agent (creates plan) → Agent Pool (tool loops) → Decision Engine (debate + vote) → Reflection (Critic → Refiner) → Synthesis (with citations)]*
*[Tool loop diagram: Observe (query + tools + memory) → Think (reason + plan + decide) → Act (tool call + wait) → Result (output, errors, extracted facts)]*
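As a rough sketch, that observe → think → act cycle can be expressed as a bounded loop. Every name here (`Observation`, `Action`, the callbacks) is a hypothetical placeholder, not KingLeemer's actual interface:

```typescript
// Illustrative observe → think → act loop with a hard step budget.
// All types and callbacks are hypothetical stand-ins.
type Observation = { query: string; toolResults: string[]; memory: string[] };
type Action =
  | { kind: "tool"; name: string; input: unknown }
  | { kind: "finish"; answer: string };

async function toolLoop(
  observe: (results: string[]) => Promise<Observation>,          // query + tools + memory
  think: (obs: Observation) => Promise<Action>,                  // reason + plan + decide
  act: (a: { name: string; input: unknown }) => Promise<string>, // tool call + wait
  maxSteps = 8, // assumed per-agent step budget
): Promise<string> {
  const toolResults: string[] = [];
  for (let step = 0; step < maxSteps; step++) {
    const observation = await observe(toolResults);
    const action = await think(observation);
    if (action.kind === "finish") return action.answer;
    // The result (output, errors, facts) feeds the next observation.
    toolResults.push(await act(action));
  }
  throw new Error("Step budget exhausted without a final answer");
}
```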
*[Reflection diagram: Critic ("What's wrong?") → Refiner ("Fix the issues") → Verifier ("Is it correct now?")]*
| Aspect | V1 (Council) | V2 (Autonomous) |
|---|---|---|
| Execution | Single-turn, parallel | Multi-step tool loops |
| Planning | Static classification | Dynamic, adaptive planning |
| Agents | Fixed pool | Dynamic spawning |
| Tools | None | Web, code, browser, MCP |
| Memory | None | Working + episodic + semantic |
| Decisions | Clustering + judge | Debate + voting + consensus |
| Search | Single path | Tree search (LATS) |
| Durability | Request-scoped | Durable jobs |
| Learning | None | Reflection, self-improvement |
Why Both Versions Exist, On Purpose
We did not build V2 to replace V1. We built V2 because not every problem needs autonomy, and not every question deserves it.
V1 is still the right tool when:
- Speed matters
- The task is conceptual
- The answer lives in reasoning rather than execution
V2 is for when the system needs to do something, not just think about it.
Trying to collapse both into one system would make each worse. So KingLeemer has two modes, not because we could not choose, but because intelligence itself is not one-dimensional.
The current model pool:

- GPT-5.2 (OpenAI)
- Claude Sonnet 4.5 (Anthropic)
- Gemini 3 Flash (Google)
- Grok-4.1-Fast (xAI)
- GLM-4.7 (Zhipu AI)
- Kimi K2.5 (Moonshot AI)
- Sonar Pro (Perplexity)
- DeepSeek R1 (DeepSeek)
Discipline Over Hype
One thing we have been careful about is resisting the urge to call KingLeemer "agentic" just because it can be.
Autonomy without boundaries is not impressive. It is reckless.
That is why V2 is built with:
- Strict budgets
- Explicit permissions
- Limited tool access
- A complete audit trail of every decision
If KingLeemer does something, it can explain why. If it uses a source, you can see it. If it reaches a conclusion, you can trace how.
That matters more than raw power.
Budget Controls
- Max cost per run (EUR)
- Max time per run (ms)
- Max agents per run
- Max steps per agent
Safety Measures
- Domain allowlists for browser
- Tool output sanitization
- Prompt injection defense
- Complete audit trail
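As a sketch, those limits can live in one guardrail config that is checked before every step. The field names and default values below are invented for illustration, not KingLeemer's actual configuration:

```typescript
// Hypothetical run-level guardrails; names and defaults are illustrative.
interface RunGuardrails {
  maxCostEur: number;       // max cost per run (EUR)
  maxDurationMs: number;    // max time per run (ms)
  maxAgents: number;        // max agents per run
  maxStepsPerAgent: number; // max steps per agent
  allowedDomains: string[]; // browser domain allowlist
}

const guardrails: RunGuardrails = {
  maxCostEur: 0.5,
  maxDurationMs: 120_000,
  maxAgents: 8,
  maxStepsPerAgent: 8,
  allowedDomains: ["wikipedia.org", "github.com"],
};

// Checked before each step; exceeding any limit aborts the run cleanly.
function withinBudget(spentEur: number, elapsedMs: number, g: RunGuardrails): boolean {
  return spentEur <= g.maxCostEur && elapsedMs <= g.maxDurationMs;
}
```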
A sample run: $0.1732 total cost, 19.0k input tokens, 10.5k output tokens.
The stack:

- Vercel AI SDK — streaming, tool calling, structured outputs
- OpenRouter — unified API for 200+ models
- Prisma — type-safe database ORM
- pgvector — vector embeddings for semantic memory
- QStash — durable background job execution
- Playwright MCP — browser automation for web agents
- Firecrawl — deep web extraction
- Next.js 15 — React framework with App Router
How the Code Works
KingLeemer is built on a modular architecture that separates concerns cleanly:
V1 Core Components
- `ParallelAgentExecutor` — runs multiple agents concurrently with `Promise.allSettled()`
- `ConsensusEngine` — extracts claims, clusters by embedding similarity, judges conflicts
- `SynthesisEngine` — streams the final merged answer to the user
- `BudgetController` — enforces cost and time limits
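The fan-out itself is small. Here is a minimal sketch, assuming each agent exposes an async `answer(query)` method (an assumption for illustration):

```typescript
// Minimal council fan-out: run every agent concurrently and keep
// whatever settles successfully. The Agent shape is assumed.
interface Agent {
  id: string;
  answer(query: string): Promise<string>;
}

async function runCouncil(agents: Agent[], query: string) {
  const settled = await Promise.allSettled(agents.map((a) => a.answer(query)));
  return settled.flatMap((result, i) =>
    result.status === "fulfilled"
      ? [{ agentId: agents[i].id, text: result.value }]
      : [], // a failed model is dropped, not fatal
  );
}
```

Using `Promise.allSettled()` rather than `Promise.all()` means one slow or failing provider degrades the council instead of aborting the whole run.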
V2 Core Components
- `PlannerAgent` — analyzes queries, creates structured plans, spawns specialists
- `RunStateManager` — persists state to Prisma, enables resumability
- `AgentSpawner` — dynamic agent creation with depth and budget limits
- `ToolLoop` — observe → think → act → repeat cycle
- `DebateEngine` — multi-round argumentation for conflict resolution
- `ReflectionLoop` — Critic → Refiner → Verifier self-improvement
- `TreeSearch` (LATS) — multi-path exploration with UCT selection
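To illustrate what "depth and budget limits" mean in practice, here is a hedged sketch of guarded spawning; the class shape and limits are assumptions, not the real `AgentSpawner`:

```typescript
// Hypothetical depth- and count-limited spawning.
interface SpawnRequest {
  role: string;        // e.g. "researcher", "validator"
  parentDepth: number; // depth of the requesting agent
}

class Spawner {
  private spawned = 0;
  constructor(private maxDepth = 2, private maxAgents = 8) {}

  spawn(req: SpawnRequest): { role: string; depth: number } {
    // Refuse to create agents beyond the depth or count budget.
    if (req.parentDepth + 1 > this.maxDepth) throw new Error("Max spawn depth exceeded");
    if (this.spawned >= this.maxAgents) throw new Error("Agent budget exhausted");
    this.spawned++;
    return { role: req.role, depth: req.parentDepth + 1 };
  }
}
```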
Data Models
- `KingLeemerRun` — durable run container with plan, budget, working memory
- `KingLeemerEvent` — append-only event log for auditability
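As TypeScript shapes, the two models might look roughly like this; the fields are inferred from the descriptions above, not the actual Prisma schema:

```typescript
// Rough shapes inferred from the prose; the real Prisma models will differ.
interface KingLeemerRun {
  id: string;
  plan: unknown;                              // structured plan from the PlannerAgent
  budget: { maxCostEur: number; spentEur: number };
  workingMemory: Record<string, unknown>;
  status: "running" | "completed" | "failed"; // assumed states
}

interface KingLeemerEvent {
  id: string;
  runId: string;
  seq: number;      // append-only ordering for auditability
  type: string;     // event kind, e.g. "plan_created" (assumed name)
  payload: unknown;
  createdAt: Date;
}
```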
Main Orchestrator
The `KingLeemerOrchestrator` class manages the entire run lifecycle, from planning to synthesis.
```typescript
export class KingLeemerOrchestrator {
  private config: OrchestratorConfig;
  private runState: RunStateManager | null = null;
  private planner: PlannerAgent | null = null;
  private spawner: AgentSpawner | null = null;

  async execute(runId: string): Promise<KingLeemerRunV2> {
    // Phase 0: Load the durable run state (query, budget, working memory).
    // `initialize` and `finalize` are elided helpers, shown here only so
    // the excerpt reads end to end.
    const { query } = await this.initialize(runId);

    // Phase 1: Create initial plan
    const plan = await this.planner!.createPlan(query);

    // Phase 2: Execute plan with tool loops
    const results = await this.executePlan(plan);

    // Phase 3: Reflect and update if needed
    await this.reflectAndUpdate(plan, results);

    // Phase 4: Build consensus and synthesize
    const finalAnswer = await this.synthesize(results);

    // Persist the answer and return the completed run record
    return this.finalize(runId, finalAnswer);
  }
}
```

What This Is Really About
KingLeemer is not about winning benchmarks. It is not about chasing the next model release.
It is about building AI systems that:
- Know when to be confident
- Know when to hesitate
- Know when to verify
- Know when to stop
V1 taught us how to think together. V2 teaches us how to act responsibly.
Both are necessary.
Looking Forward
If there is one belief behind KingLeemer, it is this:
The future of AI is not a single super-intelligence. It is systems that coordinate intelligence well.
KingLeemer is our attempt to build that future carefully, one layer at a time.
KingLeemer began as an attempt to answer one question honestly: What does it actually mean for an AI system to understand something well enough to be trusted? The answer, it turns out, is not bigger models. It is better coordination. It is structured disagreement. It is knowing when to stop.
— Repath Khan
Founder, LeemerChat