Kimi K2.5: Moonshot AI’s Frontier Multimodal Model, Now Live on LeemerChat

Context: 262K tokens

Pretraining: ~15T multimodal tokens

Agent Swarm: Up to 100 sub-agents

Tool Calls: Up to 1,500

Frontier status

Why Kimi K2.5 is a frontier model

Frontier models are defined by their ability to solve real-world tasks end-to-end: multimodal understanding, tool orchestration, and long-horizon reasoning. K2.5 meets that bar with leading or competitive benchmark results across agentic, coding, and vision tasks, while maintaining a 262K-token context and autonomous orchestration.

Native multimodal reasoning

Kimi K2.5 fuses vision and language natively, enabling visual coding, UI parsing, and multimodal reasoning without separate adapters.

Long-horizon agentic depth

A 262K-token context plus self-directed agent swarms let it sustain complex plans and multi-step workflows across long tool chains.

Frontier-grade coding

Competitive scores on SWE-bench Verified (76.8) and SWE-bench Multilingual (73) make K2.5 a serious frontier contender for real software work.

Agent swarm

Self-directed agent swarms at scale

For complex tasks, Kimi K2.5 can self-direct an agent swarm with up to 100 sub-agents, executing parallel workflows across up to 1,500 tool calls. The swarm is automatically created and orchestrated by K2.5 without any predefined subagents or workflow.

Self-directed agent swarm

Kimi K2.5 can automatically spawn and orchestrate up to 100 sub-agents for complex tasks. No predefined subagents. No manual workflow design.

Up to 1,500 tool calls across parallel workflows

The swarm can execute parallel workflows across up to 1,500 tool calls, compressing research, coding, and data synthesis into a single run.

Up to 4.5x faster execution

Compared with a single-agent setup, K2.5 completes tasks up to 4.5x faster by coordinating parallel sub-tasks.
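To make the limits above concrete, here is a minimal sketch of a fan-out orchestrator that caps concurrent sub-agents at 100 and shares a 1,500-call tool budget among them. It is an illustration only and not Kimi K2.5's actual orchestration or API: `ToolBudget`, `fake_tool_call`, `sub_agent`, and `run_swarm` are hypothetical names, and the "tool" is a stand-in that does no real work.

```python
import asyncio

MAX_SUB_AGENTS = 100    # sub-agent cap quoted in the post
MAX_TOOL_CALLS = 1500   # tool-call cap quoted in the post


class ToolBudget:
    """Shared counter that refuses tool calls once the budget is spent."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def try_acquire(self) -> bool:
        # asyncio runs on one thread and there is no await between the
        # check and the increment, so no lock is needed here.
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


async def fake_tool_call(task: str, step: int) -> str:
    """Hypothetical stand-in for a real tool (search, code execution, ...)."""
    await asyncio.sleep(0)  # yield so other sub-agents can make progress
    return f"{task}:step{step}"


async def sub_agent(task: str, steps: int, budget: ToolBudget) -> list:
    """Make up to `steps` tool calls, stopping early if the budget runs out."""
    results = []
    for step in range(steps):
        if not budget.try_acquire():
            break
        results.append(await fake_tool_call(task, step))
    return results


async def run_swarm(tasks: list, steps_per_task: int):
    """Fan tasks out to sub-agents; return per-task results and calls used."""
    budget = ToolBudget(MAX_TOOL_CALLS)
    gate = asyncio.Semaphore(MAX_SUB_AGENTS)  # at most 100 agents in flight

    async def guarded(task: str) -> list:
        async with gate:
            return await sub_agent(task, steps_per_task, budget)

    outputs = await asyncio.gather(*(guarded(t) for t in tasks))
    return dict(zip(tasks, outputs)), budget.used


# 120 sub-tasks x 20 steps would need 2,400 calls, so the budget is the limit.
results, used = asyncio.run(run_swarm([f"subtask-{i}" for i in range(120)], 20))
```

In this sketch the semaphore bounds how many sub-agents run at once, while the shared budget bounds total tool calls across the whole run; a real orchestrator would also need planning, result merging, and error handling.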

Benchmark snapshot

Recreated benchmark highlights

The chart below recreates the published benchmark snapshot comparing Kimi K2.5 with GPT-5.2 (xhigh), Claude Opus 4.5, and Gemini 3 Pro.

Scores (%)

Category  Benchmark                    Kimi K2.5  GPT-5.2 (xhigh)  Claude Opus 4.5  Gemini 3 Pro
Agents    Humanity's Last Exam (Full)  50.2       45.5             43.2             45.8
Agents    BrowseComp                   74.9       65.8             57.8             59.2
Agents    DeepSearchQA                 77.1       71.3             76.1             63.2
Coding    SWE-bench Verified           76.8       80               80.9             76.2
Coding    SWE-bench Multilingual       73         72               77.5             65
Image     MMMU Pro                     78.5       79.5             74               81
Image     MathVision                   84.2       83               77.1             86.1
Image     OmniDocBench 1.5*            88.8       85.7             87.7             88.5
Video     VideoMMMU                    86.6       85.9             84.4             87.6
Video     LongVideoBench               79.8       76.5             67.2             77.7

* OmniDocBench score is computed as (1 − normalized Levenshtein distance) × 100, where a higher score denotes superior accuracy.
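As a rough illustration of the footnote's formula, the sketch below computes (1 − normalized Levenshtein distance) × 100 for a pair of strings. It assumes the distance is normalized by the length of the longer string, which is a common convention but is not confirmed by the footnote, and it is not the official OmniDocBench scorer. The function names `levenshtein` and `edit_score` are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def edit_score(prediction: str, reference: str) -> float:
    """(1 - normalized Levenshtein distance) * 100; higher is better.

    Assumption: normalize by the longer of the two strings.
    """
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 100.0  # two empty strings match exactly
    return (1 - levenshtein(prediction, reference) / longest) * 100


# "kitten" -> "sitting" takes 3 edits over a max length of 7.
example = edit_score("kitten", "sitting")
```

Under this normalization an exact match scores 100 and two strings with no characters in common score 0.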

Ready to build with Kimi K2.5?

The most powerful multimodal Kimi model is now live — free to try on LeemerChat for every user.
