Qwen3.5 122B A10B

Alibaba

Qwen3.5 122B-A10B is a native vision-language model built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) design to improve inference efficiency. In overall performance, it is second only to Qwen3.5-397B-A17B. Its text capabilities significantly outperform those of Qwen3-235B-2507, and its visual capabilities surpass those of Qwen3-VL-235B.

Capabilities

Image Input

Extended Thinking

Tool Use

Example Use Cases

Strong multimodal reasoning at moderate cost

Vision-language tasks surpassing Qwen3-VL quality

Efficient MoE model with image understanding

Technical Specifications

Context Window

256,000 tokens

Max Output

64,000 tokens
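
As a rough illustration of how the two limits above interact, the sketch below clamps an output budget so that a request stays within both. It assumes the 256,000-token context window is shared between input and output, which this page does not state; the function name `max_output_budget` is hypothetical.

```python
CONTEXT_WINDOW = 256_000  # tokens, from the specification above
MAX_OUTPUT = 64_000       # tokens, from the specification above


def max_output_budget(input_tokens: int) -> int:
    """Return the largest output length that still fits both limits.

    Assumption (not stated on this page): input and output tokens
    share the same context window.
    """
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - input_tokens
    # Never exceed the output cap, and never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))
```

Under that assumption, a 200,000-token prompt would leave room for at most 56,000 output tokens, while a 10,000-token prompt would be limited only by the 64,000-token output cap.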

Pricing

Token Costs (per 1M tokens)

Cache Miss Input
≤ 128,000 input tokens: $0.115
> 128,000 input tokens: $0.287
Non-Reasoning Output
≤ 128,000 input tokens: $0.917
> 128,000 input tokens: $2.294
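
The tiered rates above can be turned into a per-request estimate. The sketch below is a minimal, unofficial calculation. It assumes that the tier is chosen by the request's total input token count and that the chosen rate applies to every token in the request (not only to tokens beyond the threshold). It covers only the cache-miss input and non-reasoning output rates listed here. The name `estimate_token_cost` is hypothetical.

```python
TIER_THRESHOLD = 128_000  # input tokens; boundary between the two price tiers

# USD per 1M tokens: (rate at or below the threshold, rate above it)
INPUT_RATES = (0.115, 0.287)   # cache-miss input
OUTPUT_RATES = (0.917, 2.294)  # non-reasoning output


def estimate_token_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD token cost of a single request.

    Assumption: the tier is selected by input length and applies to
    all input and output tokens of the request.
    """
    tier = 0 if input_tokens <= TIER_THRESHOLD else 1
    input_cost = input_tokens / 1_000_000 * INPUT_RATES[tier]
    output_cost = output_tokens / 1_000_000 * OUTPUT_RATES[tier]
    return input_cost + output_cost
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens comes to about $0.0133, while 200,000 input tokens with the same output comes to about $0.0620.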

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19
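
Tool charges are quoted per 1,000 calls, so a per-call estimate is a simple proration. The sketch below assumes calls are billed pro rata and not in whole blocks of 1,000, which this page does not confirm. The dictionary keys and the function name `estimate_tool_cost` are hypothetical labels, not API identifiers.

```python
# USD per 1,000 calls, from the table above
TOOL_RATES_PER_1K = {
    "web_search": 15.00,
    "code_execution": 0.19,
}


def estimate_tool_cost(calls: dict[str, int]) -> float:
    """Estimate the USD tool cost for a mapping of tool name -> call count.

    Assumption: billing is prorated per call.
    """
    return sum(
        TOOL_RATES_PER_1K[name] * count / 1_000
        for name, count in calls.items()
    )
```

For example, 200 web searches and 1,000 code executions would come to about $3.19 under this assumption.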