DeepSeek V4 Flash

DeepSeek

The efficiency-optimized DeepSeek V4 model with 284B parameters (13B activated) and a massive 1M token context window. Built on a new hybrid attention architecture (Compressed Sparse Attention + Heavily Compressed Attention) for dramatically lower cost on long contexts, it runs at a fraction of V3.2 inference cost while matching or exceeding its quality. It switches between fast non-thinking responses and explicit chain-of-thought reasoning with configurable effort (up to "max" for the hardest problems); tool calls are supported in both modes. Ideal for high-throughput chat, coding assistance, and agent workflows over large documents or codebases.
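As a rough illustration of the mode switch described above, the sketch below builds a chat-completion payload that either leaves reasoning off for fast responses or enables it with a chosen effort level. The model id and the `reasoning_effort` field are assumptions modeled on OpenAI-compatible chat APIs, not confirmed parameter names; consult the official DeepSeek API reference for the real request schema.

```python
def build_request(prompt: str, thinking: bool = False, effort: str = "medium") -> dict:
    """Build a hypothetical chat-completion payload, optionally enabling reasoning.

    Note: "deepseek-v4-flash" and "reasoning_effort" are illustrative
    placeholders, not documented API values.
    """
    payload = {
        "model": "deepseek-v4-flash",  # hypothetical model id
        "messages": [{"role": "user", "content": prompt}],
    }
    if thinking:
        # Configurable effort, up to "max" for the hardest problems.
        payload["reasoning_effort"] = effort
    return payload

# Fast non-thinking request vs. maximum-effort reasoning request.
fast = build_request("Summarize this changelog.")
deep = build_request("Prove this loop invariant.", thinking=True, effort="max")
```

Tool definitions (e.g. web search or code execution) would be attached to the same payload in either mode, since the card notes tool calls are supported in both.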

Capabilities

Thinking

Tool Use

Technical Specifications

Context Window

1,000,000 tokens

Max Output

131,072 tokens

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0

Non-Reasoning Output

$0

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19
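Since tool pricing is quoted per 1K calls, total tool cost scales linearly with call volume. A minimal sketch of that arithmetic, using the two rates listed above (the dictionary keys are illustrative names, not API identifiers):

```python
# Per-1K-call tool prices from the pricing table above (USD).
TOOL_PRICE_PER_1K = {
    "web_search": 15.00,
    "code_execution": 0.19,
}

def tool_cost(tool: str, calls: int) -> float:
    """Total USD cost for `calls` invocations of `tool` at per-1K-call rates."""
    return TOOL_PRICE_PER_1K[tool] * calls / 1000

# e.g. 2,500 web searches cost $37.50; 10,000 code executions cost about $1.90.
```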