A compact 12B-parameter multimodal model that handles image understanding alongside text.
128,000 tokens
4,000 tokens
$0.15 per 1M tokens
8 files
Poor tool-calling capabilities
$0 per month
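
For a rough sense of what the per-token rate means in practice, here is a minimal Python sketch that converts a token count into an estimated cost. It assumes the $0.15 per 1M tokens rate applies uniformly to every token billed; the listing does not say whether input and output are priced differently. The function name `estimate_cost_usd` is hypothetical and not part of any provider API.

```python
from decimal import Decimal

# Rate taken from the listing; assumed to apply to all billed tokens alike.
PRICE_PER_MILLION_TOKENS = Decimal("0.15")  # USD


def estimate_cost_usd(token_count: int) -> Decimal:
    """Return the estimated cost in USD for the given number of tokens."""
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    return PRICE_PER_MILLION_TOKENS * Decimal(token_count) / Decimal(1_000_000)


if __name__ == "__main__":
    # Example: a request totalling 128,000 tokens.
    print(estimate_cost_usd(128_000))
```

Decimal arithmetic is used instead of floats so that small per-request amounts do not pick up binary rounding noise when summed over many requests.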