Model launch · April 24, 2026 · 9 min read
The Trillion-Parameter Open-Weight Wave Hits LeemerChat: DeepSeek V4, MiMo 2.5, Kimi K2.6, and a Limited Ling Promo
We refreshed our partner lineup around Chinese-led trillion-scale open models—DeepSeek V4 Pro and Flash, Xiaomi MiMo 2.5 Pro and 2.5, Moonshot Kimi K2.6, plus a time-bound free route for InclusionAI Ling 2.6 1T. Here is why we bundled them, how they differ, and how to benchmark them on real work instead of leaderboard screenshots alone.
- Partner refresh: 5 vendor families
- Context ceiling: Up to 1M tokens
- Access: Free daily tier
- Ling promo ends: Apr 30, 2026
LeemerChat exists so you can run serious models on serious work without duct-taping a dozen dashboards together. This journal entry centers on the trillion-parameter-class MiMo and Kimi lines (the open-weight stacks Xiaomi and Moonshot ship when they want agentic coding and multimodal depth at full scale), alongside DeepSeek V4, Tencent's efficiency-first Hy3 Preview MoE, and the time-boxed Ling promo. Everything below sits in the same selector, with partner-tier access so you can judge the models on your own prompts and repos, not only on leaderboard PDFs.
Prior vs current: SWE-bench Verified by family
Nothing here is “legacy-only”: the Current rows are the April 2026-era releases you ship against today. The Prior rows are the last public step in the same vendor line so you can see generational lift (Hy2→Hy3, GLM-4.7→GLM-5, K2→K2.5). Claude Opus and GLM come from different companies, so compare Opus 4.5→4.6 inside Anthropic and GLM-4.7→GLM-5 inside Z.AI; mixing them as “old vs new” would be a cross-vendor comparison, not a generational one. Figures are vendor-reported or widely cited snapshots taken under mixed eval settings; your failing tests still win the argument.
| Family | Generation | Model | SWE-bench Verified | Notes |
|---|---|---|---|---|
| Anthropic · Claude Opus | Prior | Claude Opus 4.5 | 81.0% | Nov 2025 snapshot |
| Anthropic · Claude Opus | Current | Claude Opus 4.6 | 80.8% | Apr 2026 · ≈flat vs 4.5 (within run variance) |
| Z.AI · GLM | Prior | GLM-4.7 | 73.8% | Pre–GLM-5 coding / agent stack |
| Z.AI · GLM | Current | GLM-5 | 77.8% | ~+4.0 pp vs GLM-4.7 in vendor cohort |
| Moonshot · Kimi | Prior | Kimi K2 | 66.0% | Earlier K2 instruct line |
| Moonshot · Kimi | Current | Kimi K2.5 | 76.8% | ~+10.8 pp vs K2 (trillion-class instruct) |
| Tencent · Hunyuan Hy | Prior | Hy2 | 53.0% | Prior Hunyuan generation |
| Tencent · Hunyuan Hy | Current | Hy3 Preview | 74.4% | ~+21.4 pp vs Hy2 (295B MoE instruct) |
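The generational lift quoted in the Notes column is plain subtraction in percentage points. A minimal sketch that recomputes it from the table's own figures follows; the names `SCORES`, `lift_pp`, and `lifts` are illustrative and not part of any LeemerChat tooling.

```python
# SWE-bench Verified scores (percent) copied from the table above.
# Each entry maps family -> ((prior model, score), (current model, score)).
SCORES = {
    "Anthropic · Claude Opus": (("Claude Opus 4.5", 81.0), ("Claude Opus 4.6", 80.8)),
    "Z.AI · GLM": (("GLM-4.7", 73.8), ("GLM-5", 77.8)),
    "Moonshot · Kimi": (("Kimi K2", 66.0), ("Kimi K2.5", 76.8)),
    "Tencent · Hunyuan Hy": (("Hy2", 53.0), ("Hy3 Preview", 74.4)),
}


def lift_pp(prior: float, current: float) -> float:
    """Generational lift in percentage points, rounded to one decimal."""
    return round(current - prior, 1)


def lifts() -> dict:
    """Lift per family, computed only within the same vendor line."""
    return {
        family: lift_pp(prior[1], current[1])
        for family, (prior, current) in SCORES.items()
    }


for family, delta in lifts().items():
    print(f"{family}: {delta:+.1f} pp")
```

Because the underlying scores come from mixed eval settings, these deltas are best read as rough orientation within a vendor line rather than as a ranking across vendors.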
Terminal & coding (vendor-reported)
- Hy3 Preview: Terminal-Bench 2.0 ~54.4% (instruct), up from ~23.2% for Hy2 on the same harness (about +31.2 pp).
- Hy3 Preview — LiveCodeBench v6 ~34.9% (pre-train / code evals, per Tencent model card).
- MiMo, Kimi, DeepSeek, and GLM trade leads on GDPVal-style agent suites and terminal workloads in public aggregator charts—evidence that the Chinese open-weight cohort is tier-one on real tasks, not only on static exams.


Xiaomi MiMo 2.5 Pro and MiMo 2.5
The trillion-scale MiMo line: flagship agent runs, native omnimodal perception, and 1M-token context for repo-scale work.
xiaomi/mimo-v2.5-pro · xiaomi/mimo-v2.5
- Scale class: ~1T MiMo flagship
- Pro focus: SWE + long agents
- Context: 1M tokens

MiMo 2.5 Pro is the stack Xiaomi positions for general agentic capability, complex software engineering, and workloads that stretch into thousands of tool calls. MiMo 2.5 keeps native omnimodal perception with near-Pro agent behavior at roughly half the inference cost—if you are standardizing on a single trillion-parameter Xiaomi brain, start here.
Moonshot Kimi K2.6 (and K2.5)
Moonshot’s trillion-class multimodal flagship for long-horizon coding, UI synthesis from prompts, and swarm-scale orchestration.
moonshotai/kimi-k2.6:nitro · moonshotai/kimi-k2.5:nitro
- Scale class: ~1T K2.6 path
- K2.6 context: 256k tokens
- K2.5 context: 262k tokens
Kimi K2.6 is the forward Moonshot release when you want the latest multimodal agent tooling—long coding marathons, UI generation from mixed text and visuals, and orchestration across many sub-agents. Kimi K2.5 stays in the roster for teams locked to that revision; both sit in the same trillion-parameter strategic lane as MiMo, not as mid-size helpers.
DeepSeek V4 Pro and V4 Flash
MoE depth and MoE speed share one architecture language: hybrid attention, 1M context, configurable reasoning modes.
deepseek/deepseek-v4-pro · deepseek/deepseek-v4-flash
- Pro (total / active): ~1.6T / 49B
- Flash (total / active): ~284B / 13B
- Context: 1M tokens

V4 Pro trades more compute per token for depth: hybrid attention keeps long documents tractable, and reasoning modes let you bias toward fast answers or slower, more thorough passes. V4 Flash is tuned for throughput-heavy chat, IDE copilots, and batched agent loops where cost and responsiveness matter as much as peak benchmark scores.
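The total / active split above is what makes the cost trade-off concrete: in a mixture-of-experts model only the active parameters run for each token. A rough sketch of that ratio follows, using the approximate figures from this section; `MODELS` and `active_fraction` are illustrative names, and the parameter counts are the rounded values quoted here, not exact specifications.

```python
# Approximate parameter counts in billions, as quoted in this section.
# Each entry maps model ID -> (total parameters, active parameters per token).
MODELS = {
    "deepseek/deepseek-v4-pro": (1600.0, 49.0),
    "deepseek/deepseek-v4-flash": (284.0, 13.0),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's weights that participate in each token."""
    return active_b / total_b


for model_id, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b) * 100
    print(f"{model_id}: {active_b:.0f}B of {total_b:.0f}B active ({share:.1f}%)")
```

On these figures Pro activates roughly 3% of its weights per token and Flash roughly 5%, yet Pro still runs close to four times as many active parameters as Flash, which is consistent with the depth-versus-throughput positioning described above.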
Tencent Hy3 Preview
295B MoE (~21B active), 256k context, reasoning modes off / low / high; built for agentic coding and production tool loops.
tencent/hy3-preview:free
- Architecture: 295B MoE / 21B active
- Reasoning: Disabled · Low · High
- Context: 256k tokens
Hy3 Preview is Tencent's efficiency play: a Hunyuan 3 base model that chases reliable multi-step workflows instead of raw parameter bragging rights. Dial reasoning down when you need snappy IDE assistance, or crank it up when the task is a gnarly repo-wide fix—it is the partner pick when you want a Chinese-lab MoE tuned for SWE-bench-class evidence without standing up your own cluster.
InclusionAI Ling 2.6 1T
Limited partner promo: instant instruct flagship routed on OpenRouter, free daily access for a fixed window.
inclusionai/ling-2.6-1t:free
- Positioning: Fast instruct 1T-class
- Promo ends: Apr 30, 2026
- UI mark: OpenRouter icon
Ling 2.6 1T targets roughly a quarter of the cost of comparable tiers while still competing on math, coding, and SWE-bench-style evaluations. In-product we surface the OpenRouter mark beside Ling so it is easy to spot next to permanent partner entries. After the window closes, the route may rotate or leave the free tier, so treat it as a scheduled flight, not a permanent default.
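If you script against the promo route, it can help to encode the end date so a pipeline does not silently depend on it afterwards. The sketch below is one way to do that. `pick_route` and the choice of fallback are hypothetical, and treating Apr 30, 2026 as inclusive in your local date is an assumption, since this post does not state a cutoff time or time zone.

```python
from datetime import date

LING_ROUTE = "inclusionai/ling-2.6-1t:free"
# End date from this post; inclusive handling is an assumption.
PROMO_END = date(2026, 4, 30)


def pick_route(today: date, fallback: str) -> str:
    """Use the Ling promo route while the window is open, else the fallback."""
    return LING_ROUTE if today <= PROMO_END else fallback


# Hypothetical fallback: another free-tier route named in this post.
FALLBACK = "tencent/hy3-preview:free"

print(pick_route(date(2026, 4, 29), FALLBACK))
print(pick_route(date(2026, 5, 1), FALLBACK))
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the cutoff behavior easy to test before the window actually closes.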
Why this wave matters
For most developers, the important story is competition: when Xiaomi and Moonshot keep pushing trillion-parameter MiMo and Kimi revisions, Tencent answers with MoE efficiency (Hy3), and DeepSeek widens the 1M-context MoE stack, closed APIs have to run faster to earn their margin. LeemerChat is widening the bench, not picking a single winner—partner ordering elevates DeepSeek, Xiaomi, Moonshot, Tencent, and the Ling promo so coding-heavy routes surface earlier alongside Western flagships.
Benchmarks help orient you, but your repository wins the argument. Run the same refactor, failing test, and product brief through V4 Pro, MiMo 2.5 Pro, Kimi K2.6, Hy3 Preview, and Ling while the promo is live. Track latency, edit distance, and how often you reach for a second model to clean up the first pass; that is the only scoreboard that matters for shipping.
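The comparison described above can be sketched as a small harness. Everything here is illustrative: `call_model` is a placeholder you would replace with your own client, `fake_call_model` exists only so the sketch runs offline, and `difflib`'s similarity ratio stands in for edit distance (1.0 means the output matches the change you actually shipped).

```python
import difflib
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Result:
    model: str
    task: str
    latency_s: float
    similarity: float  # 1.0 = identical to the reference edit


def run_bench(
    models: List[str],
    tasks: Dict[str, Tuple[str, str]],  # task name -> (prompt, reference output)
    call_model: Callable[[str, str], str],
) -> List[Result]:
    """Run every task through every model, timing each call."""
    results = []
    for model in models:
        for task_name, (prompt, reference) in tasks.items():
            start = time.perf_counter()
            output = call_model(model, prompt)
            latency = time.perf_counter() - start
            similarity = difflib.SequenceMatcher(None, output, reference).ratio()
            results.append(Result(model, task_name, latency, similarity))
    return results


def fake_call_model(model: str, prompt: str) -> str:
    """Offline stand-in for a real client; returns canned text per model."""
    return "return a + b" if model.endswith("-pro") else "return a - b"


if __name__ == "__main__":
    models = ["deepseek/deepseek-v4-pro", "xiaomi/mimo-v2.5-pro", "moonshotai/kimi-k2.6:nitro"]
    tasks = {"fix-add": ("Fix the failing test for add().", "return a + b")}
    for r in run_bench(models, tasks, fake_call_model):
        print(f"{r.model:32s} {r.task:10s} {r.latency_s:.4f}s sim={r.similarity:.2f}")
```

The third metric, how often you reach for a second model to clean up the first pass, is a human judgment, so it is simplest to log by hand alongside these rows.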