MiMo V2 Omni

Name: MiMo V2 Omni
Brand: Xiaomi

MiMo V2 Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities (visual grounding, multi-step planning, tool use, and code execution), making it well suited to complex real-world tasks that span modalities. It has a 256K-token context window.

Capabilities

Thinking

Tool Use

Image Input

Technical Specifications

Context Window

262,144 tokens

Max Output

65,536 tokens
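
The two limits above can be combined into a rough prompt budget. The sketch below assumes that prompt and output tokens share the same 262,144-token window, which this listing does not state; the function name `max_prompt_tokens` is hypothetical and used only for illustration.

```python
# Limits copied from the Technical Specifications above.
CONTEXT_WINDOW_TOKENS = 262_144
MAX_OUTPUT_TOKENS = 65_536


def max_prompt_tokens(requested_output_tokens: int = MAX_OUTPUT_TOKENS) -> int:
    """Return the prompt budget left after reserving room for the output.

    Assumption (not confirmed by the listing): input and output tokens
    are counted against one shared context window.
    """
    if not 0 <= requested_output_tokens <= MAX_OUTPUT_TOKENS:
        raise ValueError(
            f"requested_output_tokens must be between 0 and {MAX_OUTPUT_TOKENS}"
        )
    return CONTEXT_WINDOW_TOKENS - requested_output_tokens


if __name__ == "__main__":
    # Reserve the full output allowance, then a smaller one.
    print(max_prompt_tokens())
    print(max_prompt_tokens(4_096))
```

Under that assumption, reserving the full output allowance leaves 196,608 tokens for the prompt.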

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.40

Non-Reasoning Output

Cache Read Input

$0.08

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19
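
As a rough illustration of how the per-1M-token and per-1K-call prices combine, here is a minimal cost estimator. The prices are copied from the listing above; the function `estimate_cost` and its parameter names are hypothetical. The listing shows no price for Non-Reasoning Output, so the sketch takes the output price as an argument rather than guessing one.

```python
from decimal import Decimal
from typing import Optional

# Prices in USD, copied from the listing above.
CACHE_MISS_INPUT_PER_1M = Decimal("0.40")
CACHE_READ_INPUT_PER_1M = Decimal("0.08")
WEB_SEARCH_PER_1K = Decimal("15")
CODE_EXECUTION_PER_1K = Decimal("0.19")

TOKENS_PER_UNIT = Decimal(1_000_000)  # token prices are per 1M tokens
CALLS_PER_UNIT = Decimal(1_000)       # tool prices are per 1K calls


def estimate_cost(
    cache_miss_tokens: int,
    cache_read_tokens: int = 0,
    web_searches: int = 0,
    code_executions: int = 0,
    output_tokens: int = 0,
    output_price_per_1m: Optional[Decimal] = None,
) -> Decimal:
    """Estimate the USD cost of a workload from the listed prices.

    output_price_per_1m is caller-supplied because the listing gives no
    output price; billing output tokens without it raises an error.
    """
    if output_tokens and output_price_per_1m is None:
        raise ValueError("output_price_per_1m is required to bill output tokens")

    cost = Decimal(cache_miss_tokens) * CACHE_MISS_INPUT_PER_1M / TOKENS_PER_UNIT
    cost += Decimal(cache_read_tokens) * CACHE_READ_INPUT_PER_1M / TOKENS_PER_UNIT
    cost += Decimal(web_searches) * WEB_SEARCH_PER_1K / CALLS_PER_UNIT
    cost += Decimal(code_executions) * CODE_EXECUTION_PER_1K / CALLS_PER_UNIT
    if output_tokens:
        cost += Decimal(output_tokens) * output_price_per_1m / TOKENS_PER_UNIT
    return cost


if __name__ == "__main__":
    # 1M uncached input tokens, 500K cached input tokens,
    # 10 web searches, and 100 code executions.
    print(estimate_cost(1_000_000, 500_000, web_searches=10, code_executions=100))
```

For the workload in the example, the components are $0.40 + $0.04 + $0.15 + $0.019, for a total of $0.609 before any output tokens. `Decimal` is used instead of floats so that small per-call amounts add up exactly.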

Legacy

Made legacy on March 19, 2026

Reason

Multimodal omni model; limited integration

Recommended Replacement

MiMo V2.5 Pro