Mercury is a diffusion-based large language model from Inception Labs, designed for ultra-fast inference with sub-second latency. It supports a 128K context window, native tool calling, and structured outputs. Mercury excels at general-purpose reasoning, chat, and agent workflows where speed is paramount.
It is best suited for:

- Latency-sensitive tasks requiring instant responses
- General-purpose chat and reasoning
- Tool calling and structured output workflows (see the sketch below)
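The following is a minimal sketch of a tool-calling request, assuming Mercury is served through an OpenAI-compatible API; the base URL, model identifier, and the get_weather tool are illustrative assumptions, not confirmed values from this page.

```python
# Minimal tool-calling sketch against an assumed OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.inceptionlabs.ai/v1",  # assumed endpoint URL
    api_key="YOUR_API_KEY",
)

# One hypothetical tool, described in the standard OpenAI tool schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # illustrative tool, not part of the API
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="mercury",  # assumed model identifier
    messages=[{"role": "user", "content": "What is the weather in Paris?"}],
    tools=tools,
)

# If the model chose to call the tool, its arguments arrive as a JSON string.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```

Structured outputs would typically be requested through the same client via the OpenAI-style response_format parameter; consult the provider's documentation for the exact schema it accepts.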
Context window: 128,000 tokens
Max output: 16,384 tokens
Input: $0.25 per 1M tokens
Output: $1 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls
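To make the token pricing concrete, here is a small sketch that estimates per-request cost from the two token rates above; the function name and example token counts are illustrative, and the separately listed per-1K-call fees are not included.

```python
# Cost estimate from the listed rates: $0.25 per 1M input tokens,
# $1 per 1M output tokens.
INPUT_PRICE_PER_TOKEN = 0.25 / 1_000_000
OUTPUT_PRICE_PER_TOKEN = 1.00 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the token cost of one request in dollars (illustrative helper)."""
    return (input_tokens * INPUT_PRICE_PER_TOKEN
            + output_tokens * OUTPUT_PRICE_PER_TOKEN)

# A 10,000-token prompt with a 1,000-token completion:
# 10,000 * $0.25/1M + 1,000 * $1/1M = $0.0025 + $0.001 = $0.0035
print(f"${request_cost(10_000, 1_000):.4f}")  # -> $0.0035
```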