GLM 4.5 AirX

zai

GLM 4.5 AirX is the high-speed variant of GLM-4.5-Air, delivering faster responses while maintaining strong performance. With 106B total parameters and 12B active per forward pass, it combines the efficiency of the Air architecture with optimized inference at more than 100 tokens per second. It is suited to low-latency production deployments where speed matters alongside intelligent agent capabilities.

Capabilities

Tool Use

Extended Thinking

Example Use Cases

Applications that need a fast GLM agent model

Low-latency agentic workflows

Lightweight deployments that need ultra-fast responses

Technical Specifications

Context Window

128,000 tokens

Max Output

96,000 tokens

Cache Miss Cost

$1.10 per 1M tokens

Non-Reasoning Cost

$4.50 per 1M tokens

Cache Read Cost

$0.22 per 1M tokens

Web Search Cost

$15 per 1K calls

Code Execution Cost

$0.19 per 1K calls
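
The figures above can be combined into a rough per-request estimate. The following is a minimal sketch in Python, not an official billing formula. It assumes that "Cache Miss Cost" applies to uncached input tokens, "Cache Read Cost" to cached input tokens, and "Non-Reasoning Cost" to output tokens, and that output tokens count against the context window. None of these mappings is stated on this page. The function and parameter names are hypothetical.

```python
from decimal import Decimal

# Values copied from the specifications listed above.
CONTEXT_WINDOW = 128_000               # tokens
MAX_OUTPUT = 96_000                    # tokens
CACHE_MISS_PER_M = Decimal("1.10")     # USD per 1M tokens
CACHE_READ_PER_M = Decimal("0.22")     # USD per 1M tokens
NON_REASONING_PER_M = Decimal("4.50")  # USD per 1M tokens
WEB_SEARCH_PER_K = Decimal("15")       # USD per 1K calls
CODE_EXEC_PER_K = Decimal("0.19")      # USD per 1K calls

MILLION = Decimal(1_000_000)
THOUSAND = Decimal(1_000)


def max_output_for_prompt(prompt_tokens: int) -> int:
    """Largest output budget for a prompt of the given size.

    Assumption: prompt and output share the context window.
    """
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


def estimate_cost(
    uncached_input_tokens: int,
    cached_input_tokens: int,
    output_tokens: int,
    web_searches: int = 0,
    code_executions: int = 0,
) -> Decimal:
    """Estimated cost in USD under the assumed price mapping."""
    return (
        Decimal(uncached_input_tokens) / MILLION * CACHE_MISS_PER_M
        + Decimal(cached_input_tokens) / MILLION * CACHE_READ_PER_M
        + Decimal(output_tokens) / MILLION * NON_REASONING_PER_M
        + Decimal(web_searches) / THOUSAND * WEB_SEARCH_PER_K
        + Decimal(code_executions) / THOUSAND * CODE_EXEC_PER_K
    )


if __name__ == "__main__":
    # Hypothetical workload: 800K uncached input tokens, 200K cached input
    # tokens, 100K output tokens, 10 web searches, 100 code executions.
    print(estimate_cost(800_000, 200_000, 100_000, 10, 100))
    print(max_output_for_prompt(50_000))
```

Decimal is used instead of float so that small per-call amounts add up without rounding drift. If the provider's billing categories differ from the assumed mapping, only the constants and the three token terms need to change.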

⚠️ Legacy

Made legacy on

Reason

Superseded by GLM 5

Recommended Replacement

GLM 5