Mercury 2

Provider: Inception

Mercury 2 is an extremely fast reasoning LLM and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving over 1,000 tokens/sec on standard GPUs. It supports tunable reasoning levels, a 128K context window, native tool use, and schema-aligned JSON output. It is built for coding workflows where latency compounds, for real-time voice and search, and for agent loops.
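As a sketch of how the tunable reasoning levels and schema-aligned JSON output might be exercised, assuming an OpenAI-compatible chat-completions request body; the model identifier `mercury-2` and the field names `reasoning_effort` and `response_format` are assumptions here, not confirmed parts of Inception's API:

```python
import json

def build_request(prompt, reasoning_level="medium", json_schema=None):
    """Assemble a chat-completions-style request body.

    `reasoning_effort` and `response_format` are hypothetical field
    names; check the provider's API reference for the real ones.
    """
    body = {
        "model": "mercury-2",  # assumed model identifier
        "messages": [{"role": "user", "content": prompt}],
        # Tunable reasoning level (assumed parameter name)
        "reasoning_effort": reasoning_level,
    }
    if json_schema is not None:
        # Request schema-aligned JSON output (assumed field shape)
        body["response_format"] = {"type": "json_schema",
                                   "json_schema": json_schema}
    return body

req = build_request(
    "Summarize the diff",
    reasoning_level="low",
    json_schema={"name": "summary",
                 "schema": {"type": "object",
                            "properties": {"summary": {"type": "string"}}}},
)
print(json.dumps(req, indent=2))
```

Lower reasoning levels trade answer depth for latency, which matters most in the agent-loop and real-time settings described above.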

Capabilities

- Thinking
- Tool Use

Technical Specifications

Context Window: 128,000 tokens
Max Output: 32,768 tokens

Pricing

Token Costs (per 1M tokens)

Cache Miss Input: $0.25
Cache Read Input: $0.025
Non-Reasoning Output: $0.75

Tool Costs (per 1K calls)

Web Search: $15
Code Execution: $0.19
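The per-token prices above compose linearly per request. A minimal sketch of estimating one request's cost from its token counts (the counts in the example are illustrative, not from the source):

```python
# Per-1M-token prices from the table above, in USD
PRICE_PER_M = {
    "cache_miss_input": 0.25,
    "cache_read_input": 0.025,
    "output": 0.75,  # non-reasoning output
}

def request_cost(cache_miss_tokens, cache_read_tokens, output_tokens):
    """Estimate the USD token cost of one request (tool calls excluded)."""
    return (
        cache_miss_tokens * PRICE_PER_M["cache_miss_input"]
        + cache_read_tokens * PRICE_PER_M["cache_read_input"]
        + output_tokens * PRICE_PER_M["output"]
    ) / 1_000_000

# e.g. 10K fresh prompt tokens, 100K cache-read tokens, 2K output tokens:
cost = request_cost(10_000, 100_000, 2_000)
print(f"${cost:.4f}")  # → $0.0065
```

Note how the 10x cache-read discount dominates in agent loops, where most of the prompt is a stable prefix replayed on every turn.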

Legacy

Made legacy on

Reason: Mercury 2 from Inception; limited availability and testing

Recommended Replacement: Qwen3.6 Plus