MiniMax-01

minimax

MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion activated per token at inference, and can handle a context of up to 4 million tokens. The text model adopts a hybrid architecture combining Lightning Attention, Softmax Attention, and Mixture-of-Experts (MoE). The vision model adopts the "ViT-MLP-LLM" framework and is trained on top of the text model.
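The hybrid stack described above can be sketched as follows. The interleaving ratio used here (one Softmax Attention layer after every seven Lightning Attention layers) and the 16-layer depth are illustrative assumptions, not confirmed specifications:

```python
# Sketch of a hybrid attention stack: mostly Lightning Attention
# (linear-complexity) layers, with a periodic Softmax Attention layer.
# ASSUMPTION: the 1-in-8 ratio below is for illustration only.

def layer_types(num_layers: int, softmax_every: int = 8) -> list[str]:
    """Assign an attention type to each transformer layer."""
    return [
        "softmax_attention" if (i + 1) % softmax_every == 0
        else "lightning_attention"
        for i in range(num_layers)
    ]

stack = layer_types(16)
# Each layer's feed-forward block is a Mixture-of-Experts: only a
# subset of experts fires per token, which is how roughly 45.9B of
# the 456B parameters are active at inference time.
```

The MoE routing, not shown here, is what separates total parameter count (456B) from activated parameter count (45.9B).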


Capabilities

Image Input

Example Use Cases

Image understanding with large context

Multimodal tasks combining text and vision

Tasks requiring an ultra-long context window
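For the image-input use cases above, a request typically carries the image inline. The sketch below builds an OpenAI-style chat message with a base64 data URL, a common convention for vision-language endpoints; the exact schema MiniMax-VL-01 expects may differ, so treat the field names as assumptions:

```python
import base64

# Hypothetical OpenAI-style multimodal message; field names are
# an assumed convention, not a confirmed MiniMax-VL-01 schema.
def image_message(prompt: str, image_bytes: bytes,
                  mime: str = "image/png") -> dict:
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }

msg = image_message("Describe this chart.", b"\x89PNG...")  # placeholder bytes
```

With the 1M-token context window listed below, many such messages (or one message plus a very large document) can fit in a single request.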

Technical Specifications

Context Window

1,000,192 tokens

Max Output

1,000,192 tokens

Cache Miss Cost

$0.20 per 1M tokens

Non-Reasoning Cost

$1.10 per 1M tokens

Web Search Cost

$15 per 1K calls

Code Execution Cost

$0.19 per 1K calls
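A back-of-the-envelope cost estimate from the rates above. This sketch assumes the cache-miss rate applies to input tokens and the non-reasoning rate to output tokens (an interpretation, not stated in the table), and that every input token is a cache miss:

```python
# Rates from the specification table above:
#   $0.20 / 1M tokens (cache miss), $1.10 / 1M tokens (non-reasoning),
#   $15 / 1K web-search calls, $0.19 / 1K code-execution calls.
# ASSUMPTION: cache-miss rate = input tokens, non-reasoning rate =
# output tokens, all input tokens miss the cache.

def estimate_cost(input_tokens: int, output_tokens: int,
                  web_searches: int = 0, code_execs: int = 0) -> float:
    """Return an estimated request cost in USD."""
    cost = input_tokens / 1e6 * 0.20
    cost += output_tokens / 1e6 * 1.10
    cost += web_searches / 1e3 * 15.0
    cost += code_execs / 1e3 * 0.19
    return round(cost, 6)

# 500K input tokens + 2K output tokens + one web search:
# 0.5 * 0.20 + 0.002 * 1.10 + 0.001 * 15 = 0.1172
```

Note that at these rates the per-call tool charges (web search especially) can dominate token costs for short requests.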

⚠️ Legacy

Made legacy on

Reason

Untested

Recommended Replacement

Qwen3.5 Plus