Qwen3 VL 8B Instruct

Alibaba

Qwen3-VL 8B is a dense model with a reduced memory footprint. It brings improvements in image and video understanding, long-context support (e.g., long videos and documents), spatial perception, and object recognition, which help it handle complex real-world tasks.

Capabilities

Tool Use

Image Input

Example Use Cases

Lightweight vision model needed

Budget image or video understanding

Resource-constrained multimodal task

Technical Specifications

Context Window

131,072 tokens

Max Output

32,768 tokens

Cache Miss Cost

$0.18 per 1M tokens

Non-Reasoning Cost

$0.70 per 1M tokens

Web Search Cost

$15 per 1K calls

Code Execution Cost

$0.19 per 1K calls
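
The figures above can be combined into a rough per-request budget check and cost estimate. The sketch below is illustrative only and rests on assumptions the listing does not confirm: that "Cache Miss Cost" is the price of uncached input tokens, that "Non-Reasoning Cost" is the price of output tokens, and that input and output tokens share the context window. The function names `fits_limits` and `estimate_cost_usd` are hypothetical helpers, not part of any API.

```python
# Figures copied from the specification list above.
CONTEXT_WINDOW_TOKENS = 131_072
MAX_OUTPUT_TOKENS = 32_768
CACHE_MISS_USD_PER_M = 0.18      # assumed to price uncached input tokens
NON_REASONING_USD_PER_M = 0.70   # assumed to price output tokens
WEB_SEARCH_USD_PER_K = 15.00     # per 1K web search calls
CODE_EXEC_USD_PER_K = 0.19       # per 1K code execution calls


def fits_limits(input_tokens: int, output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumes input and output tokens count against the same context window.
    """
    return (
        output_tokens <= MAX_OUTPUT_TOKENS
        and input_tokens + output_tokens <= CONTEXT_WINDOW_TOKENS
    )


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    web_searches: int = 0,
    code_executions: int = 0,
) -> float:
    """Estimate the cost of one request in USD under the assumptions above."""
    token_cost = (
        input_tokens * CACHE_MISS_USD_PER_M
        + output_tokens * NON_REASONING_USD_PER_M
    ) / 1_000_000
    call_cost = (
        web_searches * WEB_SEARCH_USD_PER_K
        + code_executions * CODE_EXEC_USD_PER_K
    ) / 1_000
    return token_cost + call_cost
```

Under these assumptions, a request with 100,000 uncached input tokens and 2,000 output tokens fits within the limits and would cost about $0.0194, while ten web search calls on their own would add $0.15.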

⚠️ Legacy

Made legacy on

Reason

Untested

Recommended Replacement

Qwen3 Max