GLM-5 Is Live: Frontier Open-Source Scale for Complex Engineering and Agentic Work

Total Parameters

745B

Active Parameters

40-50B

Training Tokens

28.5T

Context Length

128K+

Frontier Open Source

Why GLM-5 matters

We believe scaling remains one of the strongest levers for improving intelligence efficiency on the path to AGI. But raw scale alone is not enough. What makes GLM-5 compelling is how pre-training scale, systems efficiency, and reinforcement learning infrastructure were advanced together.

GLM-5 represents a major step forward in open-weight AI systems. With approximately 745 billion total parameters and a Mixture-of-Experts (MoE) architecture that activates only 40-50 billion parameters per token, it delivers frontier-level performance with manageable computational costs.

The result is a model that pushes beyond GLM-4.7 on reasoning, coding, and agentic execution benchmarks, while narrowing the gap to top closed frontier systems like Claude Opus 4.5, Gemini 3 Pro, and GPT-5.2. Practically, this means better reliability in long tool chains, fewer collapses on multi-step objectives, and stronger completion quality under real production constraints.

Native Multimodal

Core Capabilities

GLM-5 is designed for complex systems engineering and long-horizon agentic tasks. It excels across reasoning, coding, and autonomous execution domains.

Complex Reasoning

Advanced mathematical reasoning, scientific analysis, and logical problem-solving at frontier level.

Code Intelligence

State-of-the-art software engineering, debugging, and multi-language code generation capabilities.

Agentic Execution

Long-horizon task planning, tool orchestration, and autonomous workflow execution.

Systems Engineering

End-to-end system design, architecture planning, and complex engineering workflows.

Terminal Operations

Advanced command-line operations, shell scripting, and infrastructure automation.

Long Context

Process massive documents, codebases, and multi-step reasoning chains efficiently.

Architecture

Technical Architecture

Mixture-of-Experts (MoE)

745B total parameters with only 40-50B active per token. Smart routing selects specialized experts for each input, maximizing capability while minimizing compute.

DeepSeek Sparse Attention

Advanced sparse attention mechanisms enable efficient long-context processing without proportional compute overhead. Handle 128K+ context windows with ease.

Async RL Infrastructure

New "slime" reinforcement learning system increases training throughput for faster iteration and improved post-training quality.

Scale-Up Training

Pre-training corpus expanded from 23T to 28.5T tokens. Total parameters scaled from 355B to 745B for broader coverage and stronger generalization.

Generation-over-Generation

GLM-5 vs GLM-4.7 Improvements

Significant gains across agentic, reasoning, and coding benchmarks demonstrate the impact of increased scale and improved post-training.

Humanity's Last Exam

+17.8%improvement

42.8%50.4%

Terminal-Bench 2.0

+37.1%improvement

41.0%56.2%

MCP-Atlas

+30.4%improvement

52.0%67.8%

CyberGym

+83.8%improvement

23.5%43.2%

Tool-Decathlon

+59.7%improvement

23.8%38.0%

Reasoning Benchmarks

Academic & Reasoning Performance

GLM-5 demonstrates strong performance across mathematical reasoning, scientific knowledge, and academic benchmarks — competitive with leading frontier models.

Benchmark	Description	GLM-5	GLM-4.7	Claude Opus 4.5	Gemini 3 Pro	GPT-5.2	DeepSeek-V3.2	Kimi K2.5
Humanity's Last Exam	Academic reasoning benchmarkNo tools	30.5%	24.8%	28.4%	37.2%	35.4%	—	—
Humanity's Last Exam	With tool access enabledWith tools	50.4%	42.8%	43.4%	45.8%	45.5%	—	—
AIME 2026 I	Mathematics competition	92.7%	92.9%	93.3%	90.6%	—	92.7%	92.5%
HMMT Nov. 2025	Harvard-MIT Math Tournament	96.9%	93.5%	91.7%	93.0%	97.1%	90.2%	91.1%
IMO AnswerBench	International Math Olympiad	82.5%	82.0%	78.5%	83.3%	86.3%	78.3%	81.8%
GPQA-Diamond	Graduate-level science Q&A	86.0%	85.7%	87.0%	91.9%	92.4%	82.4%	87.6%

Coding Benchmarks

Software Engineering Excellence

Top-tier performance on real-world coding tasks, from repository-level changes to terminal-based workflows and cybersecurity challenges.

Benchmark	Description	GLM-5	GLM-4.7	Claude Opus 4.5	Gemini 3 Pro	GPT-5.2	DeepSeek-V3.2	Kimi K2.5
SWE-bench Verified	Real-world software engineering	77.8%	73.8%	80.9%	76.2%	80.0%	—	76.8%
SWE-bench Multilingual	Cross-language code tasks	73.3%	66.7%	77.5%	65.0%	72.0%	—	73.0%
Terminal-Bench 2.0	Agentic terminal workflows	56.2%	41.0%	59.3%	54.2%	54.0%	—	—
CyberGym	Cybersecurity challenges	43.2%	23.5%	50.6%	39.9%	—	17.3%	41.3%

Agentic Benchmarks

Autonomous Agent Performance

Exceptional capability in multi-step planning, tool orchestration, and long-horizon task execution — key for production AI agents.

Benchmark	Description	GLM-5	GLM-4.7	Claude Opus 4.5	Gemini 3 Pro	GPT-5.2	DeepSeek-V3.2	Kimi K2.5
BrowseComp	Web browsing & research	75.9%	67.5%	67.8%	59.2%	65.8%	—	74.9%
MCP-Atlas	Multi-step MCP workflows	67.8%	52.0%	65.2%	66.6%	68.0%	—	—
τ²-Bench	Agentic tool use & planning	89.7%	87.4%	91.6%	90.7%	85.5%	—	—
Tool-Decathlon	Long-horizon real-world tasks	38.0%	23.8%	43.5%	36.4%	46.3%	35.2%	27.8%

Head-to-Head Comparison

Key Benchmark Highlights

Visual comparison of GLM-5 against leading frontier models across critical benchmarks.

Reasoning

Humanity's Last Exam (with tools)

Score

GLM-550.4%

Gemini 3 Pro45.8%

GPT-5.245.5%

Claude Opus 4.543.4%

GLM-4.742.8%

Coding

SWE-bench Verified

Score

Claude Opus 4.580.9%

GPT-5.280%

GLM-577.8%

Gemini 3 Pro76.2%

GLM-4.773.8%

Agents

MCP-Atlas

Score

GPT-5.268%

GLM-567.8%

Gemini 3 Pro66.6%

Claude Opus 4.565.2%

GLM-4.752%

Terminal

Terminal-Bench 2.0

Score

Claude Opus 4.559.3%

GLM-556.2%

GPT-5.254%

Gemini 3 Pro54.2%

GLM-4.741%

Applications

Designed for Real-World Impact

GLM-5 excels in scenarios requiring deep reasoning, complex coding, and autonomous execution.

Software Engineering

Build, debug, and refactor complex codebases. Excel at SWE-bench tasks, multilingual coding, and long-horizon development workflows.

AI Agents & Automation

Deploy autonomous agents for research, data processing, and multi-step business workflows with reliable long-horizon execution.

Systems Architecture

Design distributed systems, cloud infrastructure, and complex technical architectures with deep reasoning capabilities.

Research & Analysis

Process massive documents, perform literature reviews, and synthesize insights across long-form content.

Specifications

Technical Specifications

Architecture

Mixture-of-Experts (MoE)

Total Parameters

~745 billion

Active Parameters

40-50 billion per token

Pre-training Data

28.5 trillion tokens

Context Window

128,000+ tokens

Attention Mechanism

DeepSeek Sparse Attention (DSA)

License

MIT (Open Source)

Inference Framework

vLLM compatible

Developer

Zhipu AI (Z.ai)

Advantages

Why Choose GLM-5?

Best-in-class performance among open-source models on reasoning benchmarks

Top-tier coding performance with 77.8% on SWE-bench Verified

Exceptional agentic capabilities with 67.8% on MCP-Atlas

Massive 745B parameter scale with efficient MoE architecture

MIT licensed — full commercial use, fine-tuning, and deployment freedom

Cost-efficient inference with only 40-50B active parameters per token

Native long-context support for complex document and codebase analysis

Strong multilingual capabilities across coding and reasoning tasks

Open Source

MIT Licensed — Full Freedom

GLM-5 is released under the MIT license, enabling commercial use, fine-tuning, and research deployment without restrictive licensing barriers. Build with confidence.

MIT

License

100%

Open Weight

What this unlocks in LeemerChat

Stronger model reliability for long-horizon engineering prompts
Better planning + execution in multi-tool, multi-turn agent loops
Higher quality code reasoning under multilingual and terminal-heavy tasks
A more capable open-source control brain for orchestrated workflows
Complex systems engineering with deep reasoning capabilities
Autonomous agent workflows with reliable long-horizon execution

Try GLM-5 in chat View all Z.AI models

Available now on LeemerChat

Experience the Frontier of Open Source

GLM-5 represents a new standard for open-weight AI. Best-in-class performance, MIT licensed, and ready for your most demanding engineering and agentic workflows.

Start chatting with GLM-5 Explore more posts