Qwen3.5 Flash

Alibaba

The Qwen3.5 Flash models are natively vision-language models built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts (MoE) design for higher inference efficiency. Compared with the Qwen3 series, they deliver a leap in performance on both pure-text and multimodal tasks, offering fast response times while balancing inference speed against overall quality.
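To make "sparse mixture-of-experts" concrete, here is a minimal NumPy sketch of generic top-k MoE routing: a router scores each token, only the k highest-scoring experts run, and their outputs are mixed by the router's softmax weights. This illustrates the general technique only; it is not Qwen's actual implementation, and all names here are hypothetical.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=2):
    """Sparse MoE layer sketch: each token activates only its top-k experts.

    x       : (tokens, d_model) input activations
    gate_w  : (d_model, n_experts) router weights
    experts : list of (d_model, d_model) expert weight matrices
    k       : experts active per token (sparse when k << n_experts)
    """
    logits = x @ gate_w                          # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = topk[t]
        # softmax over only the selected experts' scores
        w = np.exp(logits[t, sel] - logits[t, sel].max())
        w /= w.sum()
        for weight, e in zip(w, sel):
            out[t] += weight * (x[t] @ experts[e])
    return out

rng = np.random.default_rng(0)
d, n_experts, tokens = 8, 4, 3
x = rng.standard_normal((tokens, d))
gate_w = rng.standard_normal((d, n_experts))
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
y = topk_moe_forward(x, gate_w, experts, k=2)
print(y.shape)  # (3, 8): same shape as input, but only 2 of 4 experts ran per token
```

The efficiency win is that compute per token scales with k, not with the total number of experts, so the model can carry many parameters while each forward pass touches only a fraction of them.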


Capabilities

Thinking

Tool Use

Image Input

Technical Specifications

Context Window

1,000,000 tokens

Max Output

64,000 tokens

Pricing

Token Costs (per 1M tokens)

Cache Miss Input
  ≤ 128,000 input tokens   $0.029
  ≤ 256,000 input tokens   $0.115
  > 256,000 input tokens   $0.172

Non-Reasoning Output
  ≤ 128,000 input tokens   $0.287
  ≤ 256,000 input tokens   $1.147
  > 256,000 input tokens   $1.72

Cache Read Input
  ≤ 128,000 input tokens   $0.003
  ≤ 256,000 input tokens   $0.012
  > 256,000 input tokens   $0.017

Cache Write Input
  ≤ 128,000 input tokens   $0.036
  ≤ 256,000 input tokens   $0.143
  > 256,000 input tokens   $0.215
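The tiers above can be turned into a per-request cost estimate. The sketch below assumes (this page does not confirm it) that an entire request is billed at the rate of the tier its total input length falls into, and it uses the non-reasoning output price; the function name and structure are illustrative only.

```python
# USD per 1M tokens, keyed by the tier's upper bound on total input length.
RATES = {
    128_000:      {"cache_miss": 0.029, "cache_read": 0.003, "output": 0.287},
    256_000:      {"cache_miss": 0.115, "cache_read": 0.012, "output": 1.147},
    float("inf"): {"cache_miss": 0.172, "cache_read": 0.017, "output": 1.72},
}

def request_cost(input_tokens, output_tokens, cached_tokens=0):
    """Estimated USD cost of one request (non-reasoning output pricing).

    Assumption: the whole request is billed at the tier matching its
    total input length; cached input tokens pay the cache-read rate.
    """
    tier = next(r for bound, r in sorted(RATES.items()) if input_tokens <= bound)
    uncached = input_tokens - cached_tokens
    return (uncached * tier["cache_miss"]
            + cached_tokens * tier["cache_read"]
            + output_tokens * tier["output"]) / 1_000_000

# e.g. 100K input tokens (30K of them cache hits) producing 2K output tokens:
print(round(request_cost(100_000, 2_000, cached_tokens=30_000), 6))  # 0.002694
```

Note that the three tiers overlap as written (a 100K-token input satisfies both "≤ 128,000" and "≤ 256,000"), so the sketch applies the lowest matching tier, which is the usual reading of tiered token pricing.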

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19
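Since tool pricing is quoted per 1,000 calls, the per-call figures follow by simple division; the constants below just restate the prices above.

```python
WEB_SEARCH_PER_1K = 15.00   # USD per 1,000 web search calls
CODE_EXEC_PER_1K = 0.19     # USD per 1,000 code execution calls

per_web_search_call = WEB_SEARCH_PER_1K / 1000   # $0.015 per call
per_code_exec_call = CODE_EXEC_PER_1K / 1000     # roughly $0.0002 per call
print(per_web_search_call, per_code_exec_call)
```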

Legacy

Made legacy on

Reason

Superseded by Qwen3.6 Flash

Recommended Replacement

Qwen3.6 Flash