Kimi's most versatile model, with a native multimodal architecture that accepts both visual and text input. It combines thinking and non-thinking modes with dialogue and agent capabilities. With a 262K-token context window and up to 252K tokens of output, it handles complex multimodal workflows at a low price point.
Need vision and reasoning combined
Multimodal agent tasks
Versatile model for mixed dialogue and agent work
Context window: 262,144 tokens
Max output: 252,144 tokens
$0.45 per 1M tokens
$2.80 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls
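To make the per-token rates concrete, here is a minimal sketch of the cost arithmetic. The listing does not label its prices, so this assumes that $0.45 per 1M tokens is the input rate and $2.80 per 1M tokens is the output rate. The function name and constants are hypothetical and not part of any Kimi SDK, and the two per-1K-call fees are left out because the listing does not say what they cover.

```python
# Assumption: the listing does not label its prices; treating $0.45 as the
# input rate and $2.80 as the output rate is a guess based on common practice.
INPUT_USD_PER_MILLION_TOKENS = 0.45
OUTPUT_USD_PER_MILLION_TOKENS = 2.80

# Output capacity as listed above.
MAX_OUTPUT_TOKENS = 252_144


def estimate_token_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated token cost in USD for a single request.

    Hypothetical helper for illustration only; per-call fees are excluded.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if output_tokens > MAX_OUTPUT_TOKENS:
        raise ValueError("output_tokens exceeds the listed output capacity")
    input_cost = input_tokens * INPUT_USD_PER_MILLION_TOKENS
    output_cost = output_tokens * OUTPUT_USD_PER_MILLION_TOKENS
    return (input_cost + output_cost) / 1_000_000


if __name__ == "__main__":
    # 100,000 input tokens and 20,000 output tokens
    print(f"${estimate_token_cost(100_000, 20_000):.3f}")
```

Under these assumed rates, a request with 100,000 input tokens and 20,000 output tokens costs about $0.101, with the output tokens accounting for slightly more than half despite being a fifth of the volume.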