The Qwen3 series VL models offer significantly enhanced multimodal reasoning, with particular optimization for STEM and mathematical reasoning. Visual perception and recognition have been comprehensively improved, and OCR has received a major upgrade.
Hardest visual reasoning tasks
STEM or math with image input
Complex multimodal analysis requiring thinking
131,072 tokens
32,768 tokens
$0.40 per 1M tokens
$4 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls