Mercury

Provider: Inception

Mercury is a diffusion-based large language model from Inception Labs, designed for ultra-fast inference with sub-second latency. It supports a 128K context window, native tool calling, and structured outputs. Mercury excels at general-purpose reasoning, chat, and agent workflows where speed is paramount.
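
Since Mercury advertises structured outputs, a request for schema-constrained JSON can be sketched in the OpenAI-compatible chat-completions style. This is an illustration, not official Inception documentation: the model name "mercury", the schema, and the payload shape are assumptions.

```python
import json

# Hypothetical payload for an OpenAI-compatible /chat/completions endpoint.
# The model name "mercury" and the example schema are assumptions.
def build_structured_request(prompt: str, schema: dict) -> dict:
    """Build a chat-completions payload asking for JSON that matches `schema`."""
    return {
        "model": "mercury",
        "messages": [{"role": "user", "content": prompt}],
        # OpenAI-style structured output: constrain the reply to a JSON schema.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "answer", "schema": schema, "strict": True},
        },
        "max_tokens": 16384,  # matches the Max Output listed below
    }

schema = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "population": {"type": "integer"},
    },
    "required": ["city", "population"],
}
payload = build_structured_request("What is the largest city in Japan?", schema)
body = json.dumps(payload)  # serialized request body
```

A client would POST `body` to the provider's chat-completions endpoint and parse the JSON reply against the same schema.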

Capabilities

Tool Use
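
A tool-use request in the common OpenAI function-calling format declares the tools the model may invoke. A minimal sketch, assuming an OpenAI-compatible API; the `get_weather` tool and its parameters are hypothetical, not part of Mercury's API.

```python
# Hypothetical tool declaration in the OpenAI function-calling format.
# "get_weather" is an illustration only.
def build_tool_request(prompt: str) -> dict:
    """Build a chat-completions payload that offers the model one tool."""
    tools = [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Look up the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }]
    return {
        "model": "mercury",
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

request = build_tool_request("Is it raining in Osaka right now?")
```

If the model decides to call the tool, the response carries a `tool_calls` entry whose arguments the client executes before sending the result back in a follow-up message.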

Technical Specifications

Context Window: 128,000 tokens

Max Output: 16,384 tokens

Pricing

Token Costs (per 1M tokens)

Cache Miss Input: $0.25

Non-Reasoning Output: $1.00

Tool Costs (per 1K calls)

Web Search: $15.00

Code Execution: $0.19
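
The token and tool rates above can be combined into a simple per-request cost estimate. A sketch using the listed prices; the function itself and the simplifying assumption that every input token is a cache miss are mine, not part of the published pricing.

```python
# Per-unit rates derived from the pricing tables above.
INPUT_PER_TOKEN = 0.25 / 1_000_000    # cache-miss input, $ per token
OUTPUT_PER_TOKEN = 1.00 / 1_000_000   # non-reasoning output, $ per token
WEB_SEARCH_PER_CALL = 15.00 / 1_000   # $ per web search call
CODE_EXEC_PER_CALL = 0.19 / 1_000     # $ per code execution call

def estimate_cost(input_tokens: int, output_tokens: int,
                  web_searches: int = 0, code_execs: int = 0) -> float:
    """Estimate dollar cost, assuming all input tokens are cache misses."""
    return (input_tokens * INPUT_PER_TOKEN
            + output_tokens * OUTPUT_PER_TOKEN
            + web_searches * WEB_SEARCH_PER_CALL
            + code_execs * CODE_EXEC_PER_CALL)

# Example: 100K input tokens, 5K output tokens, one web search
# -> 0.025 + 0.005 + 0.015 = $0.045
cost = estimate_cost(100_000, 5_000, web_searches=1)
```

Actual billing may differ (e.g. cache-hit input pricing is not listed here), so treat this as a rough upper bound under the cache-miss assumption.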

Legacy

Made legacy on:

Reason: Original Mercury; superseded by Mercury 2

Recommended Replacement: Qwen3.6 Plus