Aya Vision 32B

Cohere

A state-of-the-art 32-billion-parameter multimodal model that performs strongly across a range of language and vision benchmarks. It supports 23 languages with full image understanding, so you can pass in images and text together and receive a single coherent response. Its primary focus is multilingual performance.


Capabilities

Image Input

Example Use Cases

Multilingual image understanding

Cross-lingual visual question answering

Multimodal multilingual document analysis

Technical Specifications

Context Window

16,000 tokens

Max Output

4,000 tokens

Cache Miss Cost

$0.50 per 1M tokens

Non-Reasoning Cost

$1.50 per 1M tokens

Web Search Cost

$15 per 1K calls

Code Execution Cost

$0.19 per 1K calls
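The rates above combine linearly, so the cost of a request can be estimated from its token counts and tool calls. The following is a minimal sketch, not an official calculator. It assumes that "Cache Miss Cost" applies to uncached input tokens, that "Non-Reasoning Cost" applies to output tokens, and that the 16,000-token context window covers prompt and completion together. `check_budget` and `estimate_cost` are hypothetical helpers, not part of any Cohere SDK.

```python
# Rates and limits copied from the specification table.
# ASSUMPTION: "Cache Miss Cost" = input tokens, "Non-Reasoning Cost" = output tokens.
CONTEXT_WINDOW = 16_000        # tokens (assumed to include prompt + completion)
MAX_OUTPUT = 4_000             # tokens
INPUT_USD_PER_M = 0.50         # assumed: "Cache Miss Cost"
OUTPUT_USD_PER_M = 1.50        # assumed: "Non-Reasoning Cost"
WEB_SEARCH_USD_PER_K = 15.00   # "Web Search Cost"
CODE_EXEC_USD_PER_K = 0.19     # "Code Execution Cost"


def check_budget(input_tokens: int, output_tokens: int) -> None:
    """Hypothetical helper: raise ValueError if a request exceeds the stated limits."""
    if output_tokens > MAX_OUTPUT:
        raise ValueError(f"output {output_tokens} exceeds max output {MAX_OUTPUT}")
    total = input_tokens + output_tokens
    if total > CONTEXT_WINDOW:
        raise ValueError(f"{total} tokens exceeds context window {CONTEXT_WINDOW}")


def estimate_cost(input_tokens: int, output_tokens: int,
                  web_searches: int = 0, code_runs: int = 0) -> float:
    """Hypothetical helper: estimate the USD cost of one request."""
    check_budget(input_tokens, output_tokens)
    return (
        input_tokens / 1_000_000 * INPUT_USD_PER_M
        + output_tokens / 1_000_000 * OUTPUT_USD_PER_M
        + web_searches / 1_000 * WEB_SEARCH_USD_PER_K
        + code_runs / 1_000 * CODE_EXEC_USD_PER_K
    )


if __name__ == "__main__":
    print(f"${estimate_cost(10_000, 2_000):.4f}")
```

Under these assumptions, a 10,000-token prompt with a 2,000-token reply costs about $0.008, and a single web search adds $0.015, roughly tripling that request's cost.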

⚠️ Legacy

Made legacy on

Reason

Untested

Recommended Replacement

Claude Haiku 4.5