Mercury 2 is an extremely fast reasoning model and the first reasoning diffusion LLM (dLLM). Instead of generating tokens sequentially, Mercury 2 produces and refines multiple tokens in parallel, achieving over 1,000 tokens/sec on standard GPUs. It supports tunable reasoning levels, 128K context, native tool use, and schema-aligned JSON output. It is built for coding workflows where latency compounds, for real-time voice and search, and for agent loops.
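As a rough illustration of the capabilities listed above, the sketch below calls the model through an OpenAI-compatible Chat Completions client. The base URL, the model identifier, and the reasoning-level parameter are illustrative assumptions rather than confirmed API details; only the capabilities themselves (tunable reasoning levels, fast generation, coding use) come from the description above.

```python
# Minimal sketch, assuming Mercury 2 is exposed through an OpenAI-compatible
# Chat Completions endpoint. Base URL, model name, and the reasoning-level
# knob below are hypothetical placeholders, not documented values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="mercury-2",  # hypothetical model identifier
    messages=[
        {"role": "user", "content": "Refactor this function to remove the nested loop."}
    ],
    # Hypothetical knob for the tunable reasoning levels mentioned above;
    # passed via extra_body since it is not part of the standard request schema.
    extra_body={"reasoning_effort": "medium"},
)

print(response.choices[0].message.content)
```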
Use cases:
- Latency-critical reasoning or agent loops
- Real-time coding workflows where speed compounds
- Fast structured output with tool calling (see the sketch after this list)
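For the structured-output and tool-calling use case, the sketch below again assumes an OpenAI-compatible endpoint and uses the standard Chat Completions tool format. The tool itself (`search_docs`), the base URL, and the model name are hypothetical; the "native tool use" and schema-aligned JSON capabilities are what the listing above claims.

```python
# Tool-calling sketch under the same OpenAI-compatible assumption.
# The search_docs tool is invented purely for illustration.
from openai import OpenAI

client = OpenAI(base_url="https://api.example.com/v1", api_key="YOUR_API_KEY")

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",  # hypothetical tool
        "description": "Search internal documentation and return matching snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="mercury-2",  # hypothetical model identifier
    messages=[{"role": "user", "content": "Find the retry policy for the payments client."}],
    tools=tools,
)

# If the model decides to call the tool, the arguments arrive as JSON
# matching the parameter schema declared above.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```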
Context window: 128,000 tokens
Max output: 32,768 tokens
Pricing: $0.25 · $0.75 · $0.025 · $15 · $0.19