The "Thinking" edition of Qwen3-VL 8B Dense offers multimodal understanding and reasoning with a reduced memory footprint. It supports ultra-long contexts (e.g., long videos and documents) and 2D/3D visual localization, and it enhances image/video comprehension, spatial perception, and object recognition.
Lightweight visual reasoning
Budget multimodal reasoning tasks
Small model for image analysis with thinking
131,072 tokens
32,768 tokens
$0.18 per 1M tokens
$2.10 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls