
Model launch · April 24, 2026 · 9 min read

The Trillion-Parameter Open-Weight Wave Hits LeemerChat: DeepSeek V4, MiMo 2.5, Kimi K2.6, and a Limited Ling Promo

We refreshed our partner lineup around Chinese-led trillion-scale open models—DeepSeek V4 Pro and Flash, Xiaomi MiMo 2.5 Pro and 2.5, Moonshot Kimi K2.6, plus a time-bound free route for InclusionAI Ling 2.6 1T. Here is why we bundled them, how they differ, and how to benchmark them on real work instead of leaderboard screenshots alone.

Partner refresh: 5 vendor families
Context ceiling: Up to 1M tokens
Access: Free daily tier
Ling promo ends: Apr 30, 2026

LeemerChat exists so you can run serious models on serious work without duct-taping a dozen dashboards together. This journal entry centers on the trillion-parameter-class MiMo and Kimi lines, the open-weight stacks Xiaomi and Moonshot ship when they want agentic coding and multimodal depth at full scale, alongside DeepSeek V4, Tencent's efficiency-first Hy3 Preview MoE, and the time-boxed Ling promo. All of them sit in the same model selector with partner-tier access, so you can judge them on your own prompts and repos, not only on leaderboard PDFs.

Prior vs current: SWE-Bench Verified by family

Nothing here is “legacy-only”: the Current rows are the April 2026-era releases you ship against today. The Prior rows are the last public step in the same vendor line, so you can see generational lift (Hy2→Hy3, GLM-4.7→GLM-5, K2→K2.5). Claude Opus and GLM come from different companies, so compare Opus 4.5→4.6 inside Anthropic and GLM-4.7→GLM-5 inside Z.AI; reading one vendor's Prior row against another vendor's Current row would be a cross-vendor comparison, not a generational one. Figures are vendor-reported or widely cited snapshots taken under mixed eval settings, so treat the deltas as approximate; your failing tests still win the argument.

| Family | Generation | Model | SWE-bench Verified | Notes |
|---|---|---|---|---|
| Anthropic · Claude Opus | Prior | Claude Opus 4.5 | 81.0% | Nov 2025 snapshot |
| Anthropic · Claude Opus | Current | Claude Opus 4.6 | 80.8% | Apr 2026 · ≈flat vs 4.5 (within run variance) |
| Z.AI · GLM | Prior | GLM-4.7 | 73.8% | Pre–GLM-5 coding / agent stack |
| Z.AI · GLM | Current | GLM-5 | 77.8% | ~+4.0 pp vs GLM-4.7 in vendor cohort |
| Moonshot · Kimi | Prior | Kimi K2 | 66.0% | Earlier K2 instruct line |
| Moonshot · Kimi | Current | Kimi K2.5 | 76.8% | ~+10.8 pp vs K2 (trillion-class instruct) |
| Tencent · Hunyuan Hy | Prior | Hy2 | 53.0% | Prior Hunyuan generation |
| Tencent · Hunyuan Hy | Current | Hy3 Preview | 74.4% | ~+21.4 pp vs Hy2 (295B MoE instruct) |
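The percentage-point deltas in the Notes column are plain subtractions of the Prior score from the Current score. A minimal sketch that recomputes them from the figures in the table (the dictionary layout and function names are illustrative, not part of any LeemerChat tooling):

```python
# SWE-bench Verified scores copied from the table above
# (vendor-reported, mixed eval settings): (prior, current) in percent.
SCORES = {
    "Claude Opus": (81.0, 80.8),   # Opus 4.5 -> Opus 4.6
    "GLM": (73.8, 77.8),           # GLM-4.7 -> GLM-5
    "Kimi": (66.0, 76.8),          # K2 -> K2.5
    "Hunyuan Hy": (53.0, 74.4),    # Hy2 -> Hy3 Preview
}


def lift_pp(prior: float, current: float) -> float:
    """Generational lift in percentage points, rounded to one decimal."""
    return round(current - prior, 1)


def lifts(scores: dict) -> dict:
    """Map each vendor family to its Prior -> Current lift."""
    return {family: lift_pp(p, c) for family, (p, c) in scores.items()}


if __name__ == "__main__":
    for family, delta in lifts(SCORES).items():
        print(f"{family}: {delta:+.1f} pp")
```

Because each delta stays inside one vendor line, the result is a generational comparison only; it says nothing about how two vendors rank against each other under a shared harness.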

Terminal & coding (vendor-reported)

  • Hy3 Preview — Terminal-Bench 2.0 ~54.4% (instruct); Hy2 → Hy3 jump ~23.2% → ~54.4% on the same harness.
  • Hy3 Preview — LiveCodeBench v6 ~34.9% (pre-train / code evals, per Tencent model card).
  • MiMo, Kimi, DeepSeek, and GLM trade leads on GDPVal-style agent suites and terminal workloads in public aggregator charts—evidence that the Chinese open-weight cohort is tier-one on real tasks, not only on static exams.

[Figure: line chart of SWE-Bench Verified scores from late 2025 through April 2026 for Hy3 Preview, Kimi, GLM, and Claude]
Composite SWE-Bench Verified trend chart (Hy3, Kimi, GLM, Claude). Use it to see how fast the gap closed between trillion-scale Chinese partners and Western flagships across one winter.

Xiaomi

Xiaomi MiMo 2.5 Pro and MiMo 2.5

The trillion-scale MiMo line: flagship agent runs, native omnimodal perception, and 1M-token context for repo-scale work.

  • xiaomi/mimo-v2.5-pro
  • xiaomi/mimo-v2.5
Scale class: ~1T MiMo flagship
Pro focus: SWE + long agents
Context: 1M tokens

[Figure: Artificial Analysis charts ranking GLM, Kimi, MiMo, DeepSeek, and Ling across agentic and terminal benchmarks]
Aggregates like Artificial Analysis show MiMo and Kimi sparring with GLM and DeepSeek across GDPVal-style agent suites, which is useful evidence that trillion-class Chinese partners are mainline options, not experiments.

MiMo 2.5 Pro is the stack Xiaomi positions for general agentic capability, complex software engineering, and workloads that stretch into thousands of tool calls. MiMo 2.5 keeps native omnimodal perception with near-Pro agent behavior at roughly half the inference cost—if you are standardizing on a single trillion-parameter Xiaomi brain, start here.

Moonshot AI · Kimi

Moonshot Kimi K2.6 (and K2.5)

Moonshot’s trillion-class multimodal flagship for long-horizon coding, UI synthesis from prompts, and swarm-scale orchestration.

  • moonshotai/kimi-k2.6:nitro
  • moonshotai/kimi-k2.5:nitro
Scale class: ~1T K2.6 path
K2.6 context: 256k tokens
K2.5 context: 262k tokens

Kimi K2.6 is the forward Moonshot release when you want the latest multimodal agent tooling—long coding marathons, UI generation from mixed text and visuals, and orchestration across many sub-agents. Kimi K2.5 stays in the roster for teams locked to that revision; both sit in the same trillion-parameter strategic lane as MiMo, not as mid-size helpers.

DeepSeek

DeepSeek V4 Pro and V4 Flash

MoE depth and MoE speed share one architecture language: hybrid attention, 1M context, configurable reasoning modes.

  • deepseek/deepseek-v4-pro
  • deepseek/deepseek-v4-flash
Pro (total / active): ~1.6T / 49B
Flash (total / active): ~284B / 13B
Context: 1M tokens

[Figure: benchmark chart comparing DeepSeek V4 Pro Max with Claude, GPT, and Gemini on knowledge, reasoning, and agentic tasks]
Independent evaluation snapshots place DeepSeek V4-class models at the top of coding and agent leaderboards alongside other frontier stacks. Use them as orientation, then validate on your own codebase.

V4 Pro trades more compute per token for depth: hybrid attention keeps long documents tractable, and reasoning modes let you bias toward fast answers or slower, more thorough passes. V4 Flash is tuned for throughput-heavy chat, IDE copilots, and batched agent loops where cost and responsiveness matter as much as peak benchmark scores.
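The context ceilings quoted so far (1M tokens for MiMo 2.5 and DeepSeek V4, 256k for Kimi K2.6, 262k for Kimi K2.5) make a quick pre-flight check worthwhile before you send a repo-scale prompt. The sketch below is illustrative only: it reads "1M" and "256k" as round decimal numbers, uses a rough four-characters-per-token estimate that real tokenizers will not match, and the `reserve_for_output` default is a hypothetical value.

```python
# Context ceilings as quoted in this post, read as round decimal figures
# (an assumption; the providers' exact limits may differ).
CONTEXT_TOKENS = {
    "xiaomi/mimo-v2.5-pro": 1_000_000,
    "xiaomi/mimo-v2.5": 1_000_000,
    "moonshotai/kimi-k2.6:nitro": 256_000,
    "moonshotai/kimi-k2.5:nitro": 262_000,
    "deepseek/deepseek-v4-pro": 1_000_000,
    "deepseek/deepseek-v4-flash": 1_000_000,
}

CHARS_PER_TOKEN = 4  # crude heuristic, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Very rough token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN + 1


def models_that_fit(prompt_tokens: int, reserve_for_output: int = 8_000) -> list:
    """Return model ids whose ceiling covers the prompt plus an output reserve."""
    needed = prompt_tokens + reserve_for_output
    return sorted(m for m, limit in CONTEXT_TOKENS.items() if limit >= needed)


if __name__ == "__main__":
    # A prompt estimated at 300k tokens only fits the 1M-context routes.
    print(models_that_fit(300_000))
```

For anything near a ceiling, count tokens with the provider's own tokenizer instead of the heuristic.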

Tencent

Tencent Hy3 Preview

295B MoE (~21B active), 256K context, reasoning modes off / low / high—built for agentic coding and production tool loops.

  • tencent/hy3-preview:free
Architecture: 295B MoE / 21B active
Reasoning: Disabled · Low · High
Context: 256k tokens

Hy3 Preview is Tencent's efficiency play: a Hunyuan 3 base model that chases reliable multi-step workflows instead of raw parameter bragging rights. Dial reasoning down when you need snappy IDE assistance, or crank it up when the task is a gnarly repo-wide fix—it is the partner pick when you want a Chinese-lab MoE tuned for SWE-bench-class evidence without standing up your own cluster.
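One way to read the total / active figures quoted for DeepSeek V4 Pro, V4 Flash, and Hy3 Preview is as the share of weights a mixture-of-experts model uses for each token. The sketch below only does that arithmetic on the approximate numbers in this post. Treating active parameters as a proxy for per-token compute is a simplification, and the helper name is illustrative.

```python
# (total, active) parameters in billions, as quoted in this post (approximate).
MOE_SPECS = {
    "deepseek/deepseek-v4-pro": (1600, 49),
    "deepseek/deepseek-v4-flash": (284, 13),
    "tencent/hy3-preview:free": (295, 21),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of parameters active per token in a MoE model."""
    return active_b / total_b


if __name__ == "__main__":
    for model, (total_b, active_b) in MOE_SPECS.items():
        share = active_fraction(total_b, active_b)
        print(f"{model}: {active_b}B of {total_b}B active ({share:.1%})")
```

On these figures V4 Pro activates about 3% of its weights per token, V4 Flash about 5%, and Hy3 Preview about 7%. Memory footprint still follows the total count, so the ratio speaks to compute per token, not to hosting cost.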

OpenRouter

InclusionAI Ling 2.6 1T

Limited partner promo: instant instruct flagship routed on OpenRouter, free daily access for a fixed window.

  • inclusionai/ling-2.6-1t:free
Positioning: Fast instruct 1T-class
Promo ends: Apr 30, 2026
UI mark: OpenRouter icon

Ling 2.6 1T targets roughly a quarter of the cost of comparable tiers while still competing on math, coding, and SWE-bench-style evaluations. In the product we surface the OpenRouter mark beside Ling so it is easy to spot next to permanent partner entries. After the window closes, the route may rotate or leave the free tier, so treat it as a scheduled flight, not a permanent default.
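If you wire the Ling route into scripts, it may help to encode the promo window so nothing silently depends on it after Apr 30, 2026. This is a minimal sketch under assumptions: the end date is treated as inclusive, time zones are ignored, and the fallback route is a hypothetical choice, not a LeemerChat recommendation.

```python
from datetime import date

LING_ROUTE = "inclusionai/ling-2.6-1t:free"
LING_PROMO_END = date(2026, 4, 30)           # promo end date from this post
FALLBACK_ROUTE = "tencent/hy3-preview:free"  # hypothetical fallback choice


def pick_route(today: date,
               preferred: str = LING_ROUTE,
               fallback: str = FALLBACK_ROUTE) -> str:
    """Use the Ling promo route only while the window is open (end date inclusive)."""
    if preferred == LING_ROUTE and today > LING_PROMO_END:
        return fallback
    return preferred


if __name__ == "__main__":
    print(pick_route(date(2026, 4, 30)))  # still inside the window
    print(pick_route(date(2026, 5, 1)))   # window closed, falls back
```

Passing `today` in explicitly keeps the check easy to test and avoids surprises from the machine's local clock.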

Why this wave matters

For most developers, the important story is competition: when Xiaomi and Moonshot keep pushing trillion-parameter MiMo and Kimi revisions, Tencent answers with MoE efficiency (Hy3), and DeepSeek widens the 1M-context MoE stack, closed APIs have to run faster to earn their margin. LeemerChat is widening the bench, not picking a single winner—partner ordering elevates DeepSeek, Xiaomi, Moonshot, Tencent, and the Ling promo so coding-heavy routes surface earlier alongside Western flagships.

Benchmarks help orient you, but your repository wins the argument. Run the same refactor, failing test, and product brief through V4 Pro, MiMo 2.5 Pro, Kimi K2.6, Hy3 Preview, and Ling while the promo is live. Track latency, edit distance, and how often you reach for a second model to clean up the first pass; that is the only scoreboard that matters for shipping.
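The scoreboard described above can be kept in a few lines of code. The sketch below is one possible shape, not a LeemerChat API: `call_model` is a hypothetical callable you supply for whatever client you use, closeness to a reference edit is measured with `difflib.SequenceMatcher` (a similarity ratio, not a true Levenshtein distance), and the `accept_threshold` that flags a second pass is an arbitrary illustrative value.

```python
import difflib
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunResult:
    model: str
    latency_s: float
    similarity: float         # 1.0 means identical to the reference edit
    needed_second_pass: bool  # proxy for reaching for a second model


def edit_similarity(reference: str, candidate: str) -> float:
    """Similarity ratio in [0, 1] between the reference and the model output."""
    return difflib.SequenceMatcher(None, reference, candidate).ratio()


def run_trial(model: str, prompt: str, reference: str,
              call_model: Callable[[str, str], str],
              accept_threshold: float = 0.9) -> RunResult:
    """Time one call and score its output against a known-good reference."""
    start = time.perf_counter()
    output = call_model(model, prompt)
    latency = time.perf_counter() - start
    similarity = edit_similarity(reference, output)
    return RunResult(model, latency, similarity, similarity < accept_threshold)


def second_pass_rate(results: list) -> float:
    """Share of trials whose output fell below the acceptance threshold."""
    if not results:
        return 0.0
    return sum(r.needed_second_pass for r in results) / len(results)


if __name__ == "__main__":
    # Stand-in for a real client: echoes a fixed patch so the script runs offline.
    def fake_call_model(model: str, prompt: str) -> str:
        return "return a + b"

    trial = run_trial("deepseek/deepseek-v4-pro", "fix add()", "return a + b",
                      fake_call_model)
    print(trial)
```

Run the same prompts and references through each route and compare latency, similarity, and second-pass rate side by side. A reference diff from your own repository makes the similarity score far more meaningful than any generic sample.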
