A compact 12B-parameter multimodal model that combines image understanding with text capabilities.
128,000 tokens
$0.15
$15
$0.19
Small 12B vision model; superseded by Pixtral Large