Command A Vision

Cohere

Command A Vision is Cohere's first model capable of processing images. It excels at enterprise use cases such as analyzing charts, graphs, and diagrams, table understanding, OCR, document Q&A, and object detection. It officially supports English, Portuguese, Italian, French, German, and Spanish, and has a 128K-token context window.

Capabilities

Image Input

Technical Specifications

Context Window: 128,000 tokens

Max Output: 8,000 tokens
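
The two limits above can be checked before a request is sent. The following is a minimal sketch, not an official client: `fits_limits` is a hypothetical helper, and it assumes that input and output tokens share the 128,000-token context window, which this page does not state. It also assumes the caller already has token counts, including whatever tokens an image input consumes.

```python
# Limits copied from the technical specifications above.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 8_000


def fits_limits(input_tokens: int, requested_output_tokens: int) -> bool:
    """Hypothetical helper: True if a request stays within both limits.

    Assumption: input and output tokens count against the same context window.
    """
    if input_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # The output cap applies regardless of how small the input is.
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return input_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS
```

For example, `fits_limits(120_000, 8_000)` passes under this assumption, while one more input token would not.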

Pricing

Token Costs (per 1M tokens)

Cache Miss Input: $2.50

Non-Reasoning Output: $10
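
Because the rates are quoted per 1M tokens, the cost of a single request is each token count divided by 1,000,000 and multiplied by its rate. The sketch below illustrates that arithmetic with the two rates listed above. `estimate_cost_usd` is a hypothetical helper, and it assumes every input token is billed at the cache-miss rate and that the input count already includes any tokens attributed to images.

```python
from decimal import Decimal

# Rates copied from the pricing list above (USD per 1M tokens).
INPUT_USD_PER_MILLION = Decimal("2.50")  # cache-miss input
OUTPUT_USD_PER_MILLION = Decimal("10")   # non-reasoning output
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of one request.

    Assumption: all input tokens are billed at the cache-miss rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    output_cost = Decimal(output_tokens) * OUTPUT_USD_PER_MILLION / TOKENS_PER_MILLION
    return input_cost + output_cost
```

A request with 100,000 input tokens and 2,000 output tokens would come to $0.25 + $0.02 = $0.27 under these assumptions. `Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.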