Aya Vision 32B

Cohere

A 32B-parameter multimodal model that performs strongly on a range of language, text, and image benchmarks. It serves 23 languages with full image understanding: you pass in images and text and get a single coherent response. Its focus is state-of-the-art multilingual performance.

Capabilities

Image Input

Technical Specifications

Context Window

16,000 tokens

Max Output

4,000 tokens
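
The two limits above can be checked before sending a request. The following is a minimal sketch, not part of any Cohere SDK: the function name `fits_budget` is hypothetical, it assumes that output tokens count against the 16,000-token context window, and it assumes `prompt_tokens` already includes whatever tokens the image inputs consume.

```python
CONTEXT_WINDOW = 16_000  # tokens, from the specifications above
MAX_OUTPUT = 4_000       # tokens, from the specifications above


def fits_budget(prompt_tokens: int, requested_output: int) -> bool:
    """Return True if a request stays within both token limits.

    Assumption (not confirmed by the model card): the requested output
    counts against the context window together with the prompt.
    """
    if prompt_tokens < 0 or requested_output < 0:
        raise ValueError("token counts must be non-negative")
    # The output cap applies regardless of how short the prompt is.
    if requested_output > MAX_OUTPUT:
        return False
    return prompt_tokens + requested_output <= CONTEXT_WINDOW
```

Under these assumptions, a request that asks for the full 4,000 output tokens leaves at most 12,000 tokens for the prompt.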

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.50

Non-Reasoning Output

$1.50
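
As a rough illustration of how the per-1M-token rates translate into the cost of a single request, here is a small sketch. The function name `estimate_cost_usd` is hypothetical, and it assumes every input token is billed at the cache-miss rate and every output token at the non-reasoning output rate.

```python
INPUT_USD_PER_M = 0.50   # cache-miss input, USD per 1M tokens
OUTPUT_USD_PER_M = 1.50  # non-reasoning output, USD per 1M tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request from its token counts.

    Assumption: all input is billed as cache-miss input; any cached-input
    discount is ignored.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000
```

For instance, a request with 12,000 input tokens and 4,000 output tokens comes to $0.006 + $0.006 = $0.012 at these rates.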

Legacy

Made legacy on

Reason

Vision research model; superseded by Command A Vision

Recommended Replacement

Command A Vision