A compact 12B-parameter multimodal model that combines image understanding with text capabilities.
Image understanding at low cost
Multimodal tasks under budget constraints
Lightweight vision and text tasks
128,000 tokens
$0.15
$15
$0.19
Untested