Qwen3-VL 8B Dense model has a reduced memory footprint and delivers comprehensive improvements in image/video understanding, ultra-long context support (e.g., long videos and documents), spatial perception, and object recognition, enabling it to handle complex real-world tasks.
Try NowLightweight vision model needed
Budget image or video understanding
Resource-constrained multimodal task
131,072 tokens
32,768 tokens
$0.18 per 1M tokens
$0.70 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls