A compact 8B multimodal model that excels on a variety of critical benchmarks for language, text, and image capabilities. It focuses on low latency and best-in-class image understanding across multiple languages.
Fast multilingual image understanding
Budget multimodal cross-lingual tasks
High-throughput visual multilingual processing
16,000 tokens
4,000 tokens
$0.50 per 1M tokens
$1.50 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls