A compact 12B multimodal model with image understanding alongside text capabilities.
Image understanding tasks
Multimodal with small footprint
Budget vision model
128,000 tokens
4,000 tokens
$0.15 per 1M tokens
$0.15 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls