Qwen3.5 Flash

Alibaba

The Qwen3.5 Flash models are native vision-language models built on a hybrid architecture that combines a linear attention mechanism with a sparse mixture-of-experts design for higher inference efficiency. Compared to the Qwen3 series, they deliver a leap in performance on both pure-text and multimodal tasks, offering fast response times while balancing inference speed against overall quality.

Try Now

Capabilities

Image Input

Extended Thinking

Tool Use

Example Use Cases

Fast multimodal reasoning

Cost-efficient vision-language tasks with large contexts

High-throughput text and image processing

Technical Specifications

Context Window

1,000,000 tokens

Max Output

64,000 tokens

Pricing

Token Costs (per 1M tokens)

| Price type | ≤ 128K input | ≤ 256K input | > 256K input |
|---|---|---|---|
| Cache Miss Input | $0.029 | $0.115 | $0.172 |
| Cache Read Input | $0.003 | $0.012 | $0.017 |
| Cache Write Input | $0.036 | $0.143 | $0.215 |
| Non-Reasoning Output | $0.287 | $1.147 | $1.72 |
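Since both input and output rates are tiered by input length, estimating a request's cost takes a small calculation. The sketch below shows one way to do it in Python, using the cache-miss and non-reasoning-output rates from the table above; the function names and the assumption that the output rate is selected by the request's input length are illustrative, not part of any official API.

```python
# Estimate request cost from the published per-1M-token rates.
# Tier boundaries (128K / 256K) and prices are taken from the pricing
# table above; everything else here is an illustrative sketch.

def tier_price(input_tokens: int, prices: tuple[float, float, float]) -> float:
    """Pick the per-1M-token rate based on total input length."""
    le_128k, le_256k, gt_256k = prices
    if input_tokens <= 128_000:
        return le_128k
    if input_tokens <= 256_000:
        return le_256k
    return gt_256k

# (≤128K, ≤256K, >256K) rates in dollars per 1M tokens, from the table.
CACHE_MISS_INPUT = (0.029, 0.115, 0.172)
NON_REASONING_OUTPUT = (0.287, 1.147, 1.72)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of an uncached request: cache-miss input plus output.

    Assumes the output tier is chosen by the request's input length.
    """
    rate_in = tier_price(input_tokens, CACHE_MISS_INPUT)
    rate_out = tier_price(input_tokens, NON_REASONING_OUTPUT)
    return (input_tokens / 1_000_000) * rate_in + (output_tokens / 1_000_000) * rate_out

# A 100K-token input with a 4K-token output falls in the ≤128K tier:
# 0.1 * $0.029 + 0.004 * $0.287 ≈ $0.004
```

Cache-read and cache-write rates could be folded in the same way for requests that reuse a cached prefix.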

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19