Mercury


Mercury is a diffusion-based large language model from Inception Labs, designed for ultra-fast inference with sub-second latency. It supports a 128K context window, native tool calling, and structured outputs. Mercury excels at general-purpose reasoning, chat, and agent workflows where speed is paramount.
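The snippet below is a minimal sketch of calling Mercury through an OpenAI-compatible chat completions endpoint with the `openai` Python SDK. The base URL, the `mercury` model identifier, and the `INCEPTION_API_KEY` environment variable are assumptions; check the provider's documentation for the actual values.

```python
# Minimal sketch: chat completion against an OpenAI-compatible endpoint.
# The base_url, model name, and env var below are assumptions, not confirmed values.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",   # assumed endpoint
    api_key=os.environ["INCEPTION_API_KEY"],      # assumed env var name
)

response = client.chat.completions.create(
    model="mercury",  # assumed model identifier
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize diffusion language models in two sentences."},
    ],
)
print(response.choices[0].message.content)
```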


Capabilities

Tool Use

Example Use Cases

Latency-sensitive tasks requiring instant responses

General-purpose chat and reasoning

Tool calling and structured output workflows (see the sketch below)
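As a sketch of the tool-calling workflow listed above, the example below registers one function tool in the OpenAI-compatible `tools` format and reads back the model's tool call. The `get_weather` tool and its schema are hypothetical, and the endpoint and model name are the same assumptions as in the earlier snippet.

```python
# Hypothetical tool-calling sketch against an OpenAI-compatible endpoint.
import json
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",   # assumed endpoint
    api_key=os.environ["INCEPTION_API_KEY"],      # assumed env var name
)

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="mercury",  # assumed model identifier
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

# If the model decided to call the tool, the arguments arrive as a JSON string.
call = response.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```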

Technical Specifications

Context Window: 128,000 tokens
Max Output: 16,384 tokens
Cache Miss Cost: $0.25 per 1M tokens
Non-Reasoning Cost: $1 per 1M tokens
Web Search Cost: $15 per 1K calls
Code Execution Cost: $0.19 per 1K calls
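
The listed rates can be combined into a rough per-request cost estimate, as in the sketch below. It assumes the $0.25 per 1M cache-miss rate applies to input (prompt) tokens and the $1 per 1M non-reasoning rate to output (completion) tokens; verify that mapping against the provider's pricing page.

```python
# Rough per-request cost estimate from the rates listed above.
# Assumption: the cache-miss rate applies to input tokens and the
# non-reasoning rate to output tokens; verify against the pricing page.

INPUT_COST_PER_TOKEN = 0.25 / 1_000_000    # cache-miss rate, USD per token
OUTPUT_COST_PER_TOKEN = 1.00 / 1_000_000   # non-reasoning rate, USD per token
WEB_SEARCH_COST_PER_CALL = 15.00 / 1_000   # USD per web search call
CODE_EXEC_COST_PER_CALL = 0.19 / 1_000     # USD per code execution call


def estimate_cost(input_tokens: int, output_tokens: int,
                  web_searches: int = 0, code_executions: int = 0) -> float:
    """Return the estimated USD cost of a single request."""
    return (input_tokens * INPUT_COST_PER_TOKEN
            + output_tokens * OUTPUT_COST_PER_TOKEN
            + web_searches * WEB_SEARCH_COST_PER_CALL
            + code_executions * CODE_EXEC_COST_PER_CALL)


# Example: a 4,000-token prompt, a 1,000-token reply, and one web search.
print(f"${estimate_cost(4_000, 1_000, web_searches=1):.6f}")  # $0.017000
```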

⚠️ Legacy

Made legacy on

Reason: Untested
Recommended Replacement: Qwen3.5 Plus