Qwen3.5 397B A17B

Alibaba

The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that combines a linear attention mechanism with a sparse Mixture-of-Experts (MoE) design, achieving higher inference efficiency. It delivers state-of-the-art performance comparable to leading-edge models across a wide range of tasks, including language understanding, logical reasoning, code generation, agent-based tasks, image understanding, video understanding, and graphical user interface (GUI) interaction. With its robust code-generation and agent capabilities, the model generalizes well across diverse agent scenarios.

Try Now

Capabilities

Tool Use

Image Input

Extended Thinking

Example Use Cases

Need strong multimodal reasoning from Alibaba

Complex coding or agent tasks with vision

Cost-effective large MoE model with image understanding
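The use cases above combine image input with tool use. As a minimal sketch, assuming the model is served behind an OpenAI-compatible chat-completions endpoint (the model identifier and the `get_product_price` tool below are hypothetical placeholders, not confirmed by this page), a single request carrying both an image and a tool definition could be shaped like this:

```python
import json

# Hypothetical model id -- check your provider's catalog for the exact string.
MODEL_ID = "qwen3.5-397b-a17b"

def build_vision_tool_request(image_url: str, question: str) -> dict:
    """Build an OpenAI-compatible chat payload that pairs an image with a tool."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                # Content parts mix text and an image reference in one turn.
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    # Hypothetical tool for illustration only.
                    "name": "get_product_price",
                    "description": "Look up the price of a product by name.",
                    "parameters": {
                        "type": "object",
                        "properties": {"name": {"type": "string"}},
                        "required": ["name"],
                    },
                },
            }
        ],
    }

payload = build_vision_tool_request(
    "https://example.com/shelf.jpg",
    "Which product on this shelf is cheapest? Use the price tool if needed.",
)
print(json.dumps(payload, indent=2))
```

The model can then answer from the image directly or emit a tool call for the agent loop to execute.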

Technical Specifications

Context Window

256,000 tokens

Max Output

64,000 tokens

Cache Miss Cost

$0.60 per 1M tokens

Non-Reasoning Cost

$3.60 per 1M tokens

Web Search Cost

$15 per 1K calls

Code Execution Cost

$0.19 per 1K calls
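Token costs scale linearly with the listed per-million rates. As a rough sketch, assuming the "Cache Miss Cost" prices uncached input tokens and the "Non-Reasoning Cost" prices generated output tokens (an assumption about this pricing table; confirm with the provider's billing docs), a per-request estimate is:

```python
# Rates from the pricing table above, in USD per 1M tokens.
INPUT_PER_M = 0.60   # "Cache Miss Cost" -- assumed to apply to uncached input tokens
OUTPUT_PER_M = 3.60  # "Non-Reasoning Cost" -- assumed to apply to output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the token cost of one request in USD."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Filling the full 256K-token context and generating a 4K-token reply:
cost = estimate_cost(256_000, 4_000)
print(f"${cost:.4f}")  # 0.1536 input + 0.0144 output = $0.1680
```

Per-call charges for web search ($15 per 1K calls) and code execution ($0.19 per 1K calls) would be added on top, independent of token counts.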