InternVL3 is a series of advanced multimodal large language models (MLLMs). Compared with InternVL 2.5, InternVL3 shows stronger multimodal perception and reasoning. InternVL3 is also benchmarked against the Qwen2.5 Chat models, whose pre-trained base models initialize its language component. Because it uses Native Multimodal Pre-Training, the InternVL3 series outperforms the Qwen2.5 series in overall text performance.
Multimodal visual perception
Image understanding with reasoning
Vision-language analysis
32,768 tokens
32,768 tokens
$0.15 per 1M tokens
$0.60 per 1M tokens
$0.075 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls