A free multimodal model from the GLM-4.6V series with native function-calling support. It handles image, video, and document understanding at no cost and can invoke tools, which makes it suitable for building multimodal agents. With a 128K-token context window, it offers substantial capacity for visual-understanding workflows without any API costs.
128,000 tokens
24,000 tokens
$0
$0
$15
$0.19