A completely free multimodal model with native function calling support from the GLM-4.6V series. Handles image, video, and document understanding at zero cost while supporting tool invocation for building multimodal agents. With 128K context, it provides substantial capability for visual understanding workflows without any API costs.
Try NowNeed free vision model
Budget multimodal tasks with tool use
Zero-cost image understanding
128,000 tokens
24,000 tokens
$0 per 1M tokens
$0 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls