Nemotron Nano 9B V2

nvidia

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA, and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via a system prompt. If the user prefers the model to provide its final answer without intermediate reasoning traces, it can be configured to do so.

Try Now

Capabilities

Tool Use

Extended Thinking

Example Use Cases

Lightweight nvidia reasoning

Budget tool-calling with thinking

Small unified reasoning model

Technical Specifications

Context Window

131,072 tokens

Max Output

131,072 tokens

Cache Miss Cost

$0.04 per 1M tokens

Non-Reasoning Cost

$0.16 per 1M tokens

Web Search Cost

$15 per 1K calls

Code Execution Cost

$0.19 per 1K calls

⚠️ Legacy

Made legacy on

Reason

Untested

Recommended Replacement

Qwen3.5 Plus