The high-speed variant of GLM-4.5-Air, built for ultra-fast responses while maintaining strong performance. It has 106B total parameters, with 12B active per forward pass, and pairs the efficiency of the Air architecture with optimized inference that exceeds 100 tokens per second. It suits low-latency production deployments that need both speed and capable agent behavior.
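To get a feel for what the stated throughput means in practice, here is a rough sketch that converts a decode rate into generation time. The function name and the 100 tokens-per-second default are illustrative assumptions taken from the floor quoted above. The estimate covers decoding only and ignores network latency, queueing, and prompt processing.

```python
def estimated_generation_seconds(output_tokens: int,
                                 tokens_per_second: float = 100.0) -> float:
    """Estimate decode time for a response of `output_tokens` tokens.

    Assumption: a constant decode rate. The default of 100 tokens/s is the
    floor quoted in the description, so real decode time may be shorter.
    Network, queueing, and prompt-processing time are not included.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return output_tokens / tokens_per_second


if __name__ == "__main__":
    # A 500-token answer at the quoted floor of 100 tokens/s.
    print(estimated_generation_seconds(500))  # 5.0
```

Because the description says throughput exceeds 100 tokens per second, the figure this returns is best read as an upper bound on decode time, not a measured latency.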
Need a fast GLM agent model
Low-latency agentic workflows
Lightweight performance with ultra-fast response
128,000 tokens
96,000 tokens
$1.10 per 1M tokens
$4.50 per 1M tokens
$0.22 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls
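The rates above are quoted per 1M tokens and per 1K calls, so a budget estimate is a simple proportion. The sketch below is a minimal illustration with hypothetical helper names. Since this extract does not label which rate applies to which kind of usage, the rate is passed in explicitly instead of being assumed.

```python
def token_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of `tokens` tokens at a rate quoted per 1M tokens."""
    if tokens < 0 or usd_per_million_tokens < 0:
        raise ValueError("tokens and rate must be non-negative")
    return tokens / 1_000_000 * usd_per_million_tokens


def call_cost_usd(calls: int, usd_per_thousand_calls: float) -> float:
    """Cost of `calls` calls at a rate quoted per 1K calls."""
    if calls < 0 or usd_per_thousand_calls < 0:
        raise ValueError("calls and rate must be non-negative")
    return calls / 1_000 * usd_per_thousand_calls


if __name__ == "__main__":
    # Illustrative only: 200,000 tokens at one of the listed per-1M rates,
    # and 2,000 calls at one of the listed per-1K rates.
    print(f"${token_cost_usd(200_000, 1.10):.2f}")
    print(f"${call_cost_usd(2_000, 15.0):.2f}")
```

Check each rate against the provider's current pricing table before relying on the result, since the label for each figure is not part of this page extract.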