Model Refresh · April 4, 2026 · 9 min read
Frontier Model Refresh

We just moved a chunk of the frontier into the free tier.

This launch is about more than adding model IDs. We re-cut the lineup around what users actually do — long-context coding, screenshot-heavy debugging, multimodal planning, and agent loops that keep running after the first response. Three frontier models go free. Three premium slots get sharper. The old overlap gets cleaned out.

LeemerChat Team

leemerchat.com

Free frontier (3): MiMo V2 Pro, GLM-5V Turbo, and Gemma 4 31B IT now ship as free partner models.

Premium refresh (3): GPT-5.4, GLM-5, and MiniMax M2.7 now anchor the paid frontier tier.

Biggest window (1M): GPT-5.4 and MiMo V2 Pro push the active catalog into million-token territory.

Why we did this

Frontier is only useful if you can actually use it.

We've always believed the gap between "what free users get" and "what the best AI looks like" is a gap worth closing. This refresh is the biggest single step we've taken toward that.

Frontier access shouldn't gate-keep building

We built LeemerChat so builders, students, and researchers in Ireland and beyond could work with the best models — not just the ones they can afford. Moving MiMo V2 Pro, GLM-5V Turbo, and Gemma 4 31B IT to the free tier is the most direct way we can act on that.

Partner economics changed what's possible

Our partner model program means we can absorb inference costs on select models without passing them to free users. These three models have the capability and cost profile that makes that math work — which is why they're going free instead of staying behind a paywall.

Agentic work needs large contexts to be real

Short context windows force users into artificial workarounds. Giving free users access to a 1M-token agent model (MiMo V2 Pro) and 200K+ multimodal options means they can run real workflows — not toy demos.

Free Partner Models

Xiaomi MiMo-V2-Pro

Free Partner
xiaomi/mimo-v2-pro
1M context

Xiaomi's global top-tier agent model. MiMo-V2-Pro is the result of Xiaomi's full-stack AI research — trained on the principle that intelligence is about prediction and compression. With a 1M-token window, it is purpose-built for long-horizon agent loops: repository-spanning coding sessions, multi-document planning, and autonomous workflows that sustain context across hundreds of tool calls.

Why it matters for free users

Free users now have access to the same million-token window that was previously exclusive to premium tiers. This is the biggest context jump in LeemerChat's free catalog history.

1M-token context for true long-horizon work
Optimized for autonomous agent loops
Top-tier coding and planning on par with flagship models
Multimodal omni variant available (MiMo-V2-Omni)
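The long-horizon agent loops MiMo V2 Pro targets can be sketched in a few lines. The snippet below is a minimal illustration, not LeemerChat's implementation: `call_model` is a stubbed stand-in for whatever chat-completions client you would actually use, and the tool names are hypothetical.

```python
import json

# Hypothetical tool registry -- in a real workflow these would read files,
# run test suites, call APIs, etc.
TOOLS = {
    "read_file": lambda path: f"<contents of {path}>",
    "run_tests": lambda: "2 passed, 0 failed",
}

def call_model(messages):
    """Stand-in for a chat-completions call to a model like
    xiaomi/mimo-v2-pro. Stubbed here: request a file once, then finish."""
    saw_tool_result = any(m["role"] == "tool" for m in messages)
    if not saw_tool_result:
        return {"tool": "read_file", "args": {"path": "app.py"}}
    return {"answer": "Patched the bug in app.py."}

def agent_loop(task, max_steps=10):
    """Plan -> act -> observe until the model returns a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "answer" in reply:            # model signals it is done
            return reply["answer"]
        tool = TOOLS[reply["tool"]]      # execute the requested tool
        result = tool(**reply.get("args", {}))
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step budget exhausted"
```

Note that `messages` grows on every step; that accumulation is exactly why a loop sustaining hundreds of tool calls needs a window measured in hundreds of thousands of tokens.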

Z.AI GLM-5V-Turbo

Free Partner
z-ai/glm-5v-turbo
202K context

Z.AI's first native multimodal agent foundation model. GLM-5V-Turbo fuses vision and language at the architecture level — not as an adapter bolt-on. It handles image, video, and text inputs in a single model, making it the right tool when your workflow starts from a screenshot, a UI bug, a diagram, or a scanned document rather than a clean text prompt.

Why it matters for free users

Most multimodal models treat vision as secondary. GLM-5V-Turbo is designed around the perceive → plan → execute loop, which means it doesn't just describe images — it acts on them.

Native multimodal: image, video, and text in one model
202K context for long-form visual workflows
Built for vision-grounded coding and debugging
Seamless agent integration with full tool loop support

Google Gemma 4 31B IT

Free Partner
google/gemma-4-31b-it
256K context

Google DeepMind's dense 31B flagship in the Gemma 4 family. Unlike the MoE variants in the same family, the 31B Dense model runs all 30.7B parameters on every token — providing deep, coherent reasoning for complex problems. With a 256K context window, it handles full codebases, long research documents, and multilingual work across 140+ languages.

Why it matters for free users

Gemma 4 31B scores 89.2% on AIME 2026 (no tools), 84.3% on GPQA Diamond, and 80% on LiveCodeBench v6 — placing it firmly at the frontier for open-weight models. This is the pragmatic benchmark-grounded choice for structured coding and reasoning work.

89.2% AIME 2026 (no tools) — top open-weight math score
84.3% GPQA Diamond — deep scientific reasoning
Native function calling and agentic workflow support
Built-in thinking mode for step-by-step reasoning
256K context with variable image resolution support
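Native function calling generally means the model emits a structured call against a declared schema, which your code then dispatches. The sketch below uses a generic OpenAI-style tool declaration, not Gemma's official format, and `get_weather` is a made-up tool for illustration.

```python
import json

# Hypothetical tool, declared as a JSON-schema style definition.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(tool_call, handlers):
    """Route a structured tool call emitted by the model to local code."""
    fn = handlers[tool_call["name"]]
    return fn(**json.loads(tool_call["arguments"]))

handlers = {"get_weather": lambda city: f"14°C and raining in {city}"}

# The model would emit something shaped like this:
call = {"name": "get_weather", "arguments": '{"city": "Dublin"}'}
dispatch(call, handlers)  # "14°C and raining in Dublin"
```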

Premium Frontier Models

OpenAI GPT-5.4

Premium
openai/gpt-5.4
1M context

The new flagship premium OpenAI slot in LeemerChat. OpenAI positions GPT-5.4 as their primary recommendation for complex reasoning, advanced coding, and multi-step problem solving. The 1M-token context window fundamentally changes what counts as a single-session problem — full codebases, extended research arcs, and enterprise workflows all fit inside one context.

Why we made this the recommendation

GPT-5.4 replaces the older GPT-5 chat variants that fragmented the OpenAI experience. One model, one recommendation, clear upgrade path.

1M-token context — full codebase in a single session
OpenAI flagship for complex reasoning and coding
Multi-step agentic planning and execution
Strongest general-purpose premium pick

Z.AI GLM-5

Premium
z-ai/glm-5
80K context

Z.AI's frontier open-source language model, built for complex engineering and long-horizon agentic systems. GLM-5 scores 50.4% on Humanity's Last Exam with tools — outperforming GPT-5.4 (45.5%) and Claude Opus 4.5 (43.4%) on that benchmark. It brings serious RL-infrastructure improvements over GLM-4.7, with major gains in coding reliability and synthesis-heavy work.

Why we made this the recommendation

GLM-5 directly replaces the GLM-4.7 line. For users doing complex engineering, agent planning, or multi-model synthesis work, this is a material capability upgrade — not just a version bump.

50.4% Humanity's Last Exam (with tools) — beats GPT and Claude on this benchmark
Frontier open-source scale with commercial viability
Strong coding reliability improvements over GLM-4.7
Built for agent planning and complex synthesis work

MiniMax M2.7

Premium
minimax/minimax-m2.7
200K context

MiniMax's next-generation productivity model, designed for autonomous real-world workflows and continuous self-improvement through multi-agent collaboration. M2.7 scores 56.2% on SWE-Pro, 57.0% on Terminal Bench 2, and achieves 1495 ELO on GDPval-AA — setting a new standard for multi-agent systems. It consolidates the M2.1/M2.5 split into one clear recommendation.

Why we made this the recommendation

Where M2.5 was strong on office-style document work, M2.7 extends that into live debugging, root cause analysis, and financial modeling workflows. The single-model consolidation removes the M2.1 vs. M2.5 confusion entirely.

56.2% SWE-Pro — strong production engineering performance
1495 ELO on GDPval-AA for multi-agent systems
57.0% Terminal Bench 2 — genuine CLI/ops-grade execution
Live debugging, financial modeling, full document generation
200K context across the full workflow
Benchmark Snapshot

These aren't marketing claims. The numbers hold up.

Selected benchmarks for the new free and premium models. Where numbers aren't available from official model cards, we leave the cell blank rather than interpolate.

Benchmark               Category      Gemma 4 31B   GLM-5    MiniMax M2.7
AIME 2026 (no tools)¹   Math          89.2%         92.7%    —
GPQA Diamond            Science       84.3%         —        —
LiveCodeBench v6        Coding        80.0%         —        —
SWE-Pro                 Engineering   —             —        56.2%
HLE (with tools)²       Research      26.5%         50.4%    —
Terminal Bench 2        Ops/CLI       —             —        57.0%
MMLU Pro                Knowledge     85.2%         —        —

¹ Gemma 4 31B and GLM-5 both clear SOTA thresholds.
² GLM-5 beats GPT-5.4 (45.5%) and Claude (43.4%) here.

Benchmarks sourced from official model cards. "—" indicates score not available from public model card at time of writing. Results marked with notes indicate peer-model comparisons from the same source.

What we cleaned up

Fewer tiers. Less selector noise. Clearer picks.

Adding new models without removing old ones creates confusion. We retired overlapping entries and collapsed model families to ensure every slot in the catalog has a clear reason to exist.

Retired the GPT-5 chat variants

We removed the older GPT-5 chat-focused variants from active surfaces. GPT-5.4 is a strict upgrade — better reasoning, 1M context, and OpenAI's own recommendation for complex coding and planning work. One model, clearer pick.

GLM-4.7 → GLM-5 and GLM-5V Turbo

The GLM-4.7 line is retired. GLM-5 (premium) and GLM-5V Turbo (free) replace it entirely. Z.AI's benchmark improvements are real: GLM-5 clears 50% on Humanity's Last Exam with tools, beating Claude Opus 4.5 on that benchmark. The multimodal Turbo variant goes free.

MiniMax M2.1 + M2.5 → M2.7

We had two MiniMax slots creating unnecessary choice confusion. M2.7 consolidates them. It scores 56.2% on SWE-Pro, 57.0% on Terminal Bench 2, and 1495 ELO on GDPval-AA. One premium MiniMax recommendation, not two.

What frontier means now

Frontier is about operating range, not benchmark bragging.

Agentic work is the organizing principle

These models are here because they're good at execution loops: repo reading, long planning arcs, multimodal debugging, and document-grounded reasoning. That's what real users actually run.

Frontier no longer means premium-only

The biggest change is economic, not cosmetic. Free users now get serious frontier-grade options — not fallback models that only make sense for lightweight chat.

Context size is infrastructure

202K, 256K, and 1M contexts aren't features — they're the difference between fitting a real project in one session or not. Every model in this refresh qualifies on context range.
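What these windows buy can be made concrete with a rough token budget. The 4-characters-per-token heuristic below is a common rule of thumb, not an exact tokenizer; the window sizes are the ones listed above, and the 20K-token reply reserve is an illustrative assumption.

```python
# Rough rule of thumb: ~4 characters per token for English text and code.
CHARS_PER_TOKEN = 4

WINDOWS = {
    "z-ai/glm-5v-turbo": 202_000,
    "google/gemma-4-31b-it": 256_000,
    "xiaomi/mimo-v2-pro": 1_000_000,
}

def estimated_tokens(total_chars):
    return total_chars // CHARS_PER_TOKEN

def models_that_fit(total_chars, reserve=20_000):
    """Which models can hold the input plus a token reserve for the reply?"""
    need = estimated_tokens(total_chars) + reserve
    return [m for m, w in WINDOWS.items() if w >= need]

# A ~2 MB repository (~500K estimated tokens) only fits the 1M window:
models_that_fit(2_000_000)  # ["xiaomi/mimo-v2-pro"]
```

By this estimate, a project under roughly 700 KB of text fits every model in the refresh, while repository-scale inputs are exactly where the 1M window stops being a spec-sheet number and starts being infrastructure.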

Launch Offer

Claim your first month for $1

We start from a base price of $1 USD and localize the displayed price with a live exchange-rate conversion. If you are in Ireland, you will see euros; if you are in India, you will see rupees.


Exchange rate data via ExchangeRate-API. Offer display adapts at runtime based on detected location.
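The display logic amounts to a rate lookup with a USD fallback. The sketch below assumes the rates map has already been fetched at runtime (ExchangeRate-API style feeds return a JSON map of USD-based rates); the rates and currency symbols shown are illustrative, not live.

```python
def localize_price(base_usd, currency, rates):
    """Convert a USD base price for display, falling back to plain USD
    when no rate is available for the detected region's currency."""
    rate = rates.get(currency)
    if rate is None:
        return f"${base_usd:.2f} USD"
    symbol = {"EUR": "€", "INR": "₹"}.get(currency, currency + " ")
    return f"{symbol}{base_usd * rate:.2f}"

# Illustrative rates -- at runtime these would come from a live
# USD-based exchange-rate feed.
rates = {"EUR": 0.92, "INR": 83.10}

localize_price(1.00, "EUR", rates)  # "€0.92"
localize_price(1.00, "XYZ", rates)  # falls back to "$1.00 USD"
```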

Source notes

This post is based on provider docs, official model cards, and public model pages. Benchmark figures are sourced directly from the relevant model card or provider documentation. Where a provider makes high-level positioning claims without a public benchmark sheet, we phrase those cautiously or omit them. Cells marked "—" indicate data not publicly available at time of writing.

Related Posts

April 9, 2026

GLM-5.1 Is Now Free on LeemerChat — The Model That Works for 8+ Hours Without Stopping

Z.AI's GLM-5.1 is the first model built for long-horizon autonomous coding — running independently for 8+ hours, planning, executing, and self-improving without human input. It beats or matches GPT-5.4 and Claude Opus 4.6 on several benchmarks. Here's why we made it free, how it compares, and why Pro still matters.

February 11, 2026

GLM-5 Is Live: Frontier Open-Source Scale for Complex Engineering and Agentic Work

GLM-5 launches on LeemerChat with major upgrades in scale, training data, and RL infrastructure. Built for long-horizon agentic systems, coding reliability, and complex reasoning under production constraints.

March 2, 2026

Get Ready for Mission Control: The Next Evolution of Agentic Execution

Mission Control is our next-generation agentic research and execution platform. It represents a fundamental shift in how we interact with AI—moving away from rigid pipelines and chat interfaces, and stepping into the era of autonomous, goal-oriented swarms.

February 22, 2026

The Foundry Report: Why Fine-Tuned Models Are Still the Sharpest Weapon in Enterprise AI

Tinker is now generally available. Vision input, Kimi K2 Thinking, and LoRA Without Regret are reshaping what custom model training looks like in 2026. Here's why fine-tuning is more strategically important than ever — and how LeemerLabs Model Foundry is building the infrastructure to prove it.
