MiniMax-01 combines MiniMax-Text-01 for text generation with MiniMax-VL-01 for image understanding. The family totals 456 billion parameters, of which 45.9 billion are activated per token, and supports a context window of up to 4 million tokens. The text model uses a hybrid architecture that combines Lightning Attention (a linear-attention variant), softmax attention, and Mixture-of-Experts (MoE). The vision model follows the "ViT-MLP-LLM" framework and is trained on top of the text model.
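The key contrast in the hybrid architecture is between standard softmax attention, which costs O(n²) in sequence length, and linear attention, which replaces the softmax with a feature map so the key-value state can be accumulated in O(n). The sketch below illustrates that contrast only; it is a simplified single-head, unbatched illustration and does not reproduce the block-wise tiling of the actual Lightning Attention kernel. The feature map `phi` is an arbitrary choice for the example, not the one used by MiniMax-01.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard causal softmax attention: materializes an n x n score
    # matrix, so cost grows quadratically with sequence length n.
    n = Q.shape[0]
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    scores = np.where(np.tril(np.ones((n, n), dtype=bool)), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Linear attention: a positive feature map replaces the softmax,
    # so a running (d_k x d_v) state summarizes all past key-value
    # pairs and each step costs O(1) -- O(n) for the whole sequence.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # illustrative feature map (an assumption)
    Qf, Kf = phi(Q), phi(K)
    state = np.zeros((Q.shape[-1], V.shape[-1]))  # running sum of outer(k_t, v_t)
    norm = np.zeros(Q.shape[-1])                  # running sum of k_t for normalization
    out = np.zeros_like(V)
    for t in range(Q.shape[0]):
        state += np.outer(Kf[t], V[t])
        norm += Kf[t]
        out[t] = (Qf[t] @ state) / (Qf[t] @ norm + 1e-6)
    return out

rng = np.random.default_rng(0)
n, dk, dv = 8, 4, 4
Q, K, V = (rng.normal(size=(n, dk)), rng.normal(size=(n, dk)),
           rng.normal(size=(n, dv)))
print(softmax_attention(Q, K, V).shape)  # (8, 4)
print(linear_attention(Q, K, V).shape)   # (8, 4)
```

Because the linear-attention state is a fixed-size matrix regardless of how many tokens have been seen, stacking mostly linear-attention layers with occasional softmax layers is one way such a hybrid can reach multi-million-token contexts.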
Image understanding with large context
Multimodal tasks combining text and vision
Tasks that need an ultra-long context window
Context length: 1,000,192 tokens
Max output: 1,000,192 tokens
Input: $0.20 per 1M tokens
Output: $1.10 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls