A multimodal model from the Llama 4 collection that uses a mixture-of-experts (MoE) architecture for text and image tasks. Its vision capabilities make it suited to multimodal applications.
131,072 tokens
8,192 tokens
$0.15 per 1M tokens
$0.60 per 1M tokens
8 files
Yes
Poor tool-calling capabilities; hallucinates web searches
$0 per month