Mercury 2

Inception

Mercury 2 is an extremely fast reasoning LLM and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving over 1,000 tokens/sec on standard GPUs. It supports tunable reasoning levels, a 128K context window, native tool use, and schema-aligned JSON output. It is built for latency-sensitive workloads: coding workflows where latency compounds, real-time voice and search, and agent loops.
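The schema-aligned JSON output described above can be requested the way most OpenAI-compatible APIs accept a structured-output constraint. The sketch below only builds the request body; the model identifier "mercury-2", the endpoint shape, and the example schema are illustrative assumptions, not confirmed values from this page.

```python
# Hypothetical structured-output request body for an OpenAI-compatible
# chat-completions endpoint. Model name and schema are assumptions.
import json

schema = {
    "type": "object",
    "properties": {
        "summary": {"type": "string"},
        "confidence": {"type": "number"},
    },
    "required": ["summary", "confidence"],
}

payload = {
    "model": "mercury-2",  # assumed identifier, check the provider's model list
    "messages": [{"role": "user", "content": "Summarize the release notes."}],
    "response_format": {
        "type": "json_schema",
        "json_schema": {"name": "summary", "schema": schema, "strict": True},
    },
}

print(json.dumps(payload, indent=2))
```

With a constraint like this, the model's output is forced to parse as JSON matching the schema, which is what makes fast structured output usable inside agent loops.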


Capabilities

Extended Thinking

Tool Use

Example Use Cases

Latency-critical reasoning or agent loops

Real-time coding workflows where speed compounds

Fast structured output with tool calling
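The tool-use and agent-loop use cases above follow a common pattern: declare a tool schema, let the model emit a call, run it locally, and return the result. A minimal sketch, assuming the widely used OpenAI-style function-calling wire format; the tool name "get_weather" and its stub body are hypothetical:

```python
# Sketch of local dispatch for a model-issued tool call.
# Tool name, schema, and wire format are assumptions, not from this page.
import json

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> dict:
    # Stub; a real agent would call a weather API here.
    return {"city": city, "temp_c": 21}

def dispatch(tool_call: dict) -> str:
    # Route a model-issued tool call to the matching local function
    # and return a JSON string to feed back as the tool result.
    registry = {"get_weather": get_weather}
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(registry[name](**args))

# Simulated tool call as it might appear in a model response:
result = dispatch({"function": {"name": "get_weather",
                                "arguments": '{"city": "Oslo"}'}})
```

In a latency-critical loop, each round trip is one model call plus one local dispatch, so per-token speed directly bounds how fast the whole loop iterates.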

Technical Specifications

Context Window: 128,000 tokens

Max Output: 32,768 tokens

Pricing

Token Costs (per 1M tokens)

Cache Miss Input: $0.25

Non-Reasoning Output: $0.75

Cache Read Input: $0.025

Tool Costs (per 1K calls)

Web Search: $15

Code Execution: $0.19

⚠️ Legacy

Made legacy on

Reason: Untested

Recommended Replacement: Qwen3.5 Plus