A small, fast model that excels at RAG, tool use, agents, and similar tasks requiring complex reasoning and multiple steps. With 128K context window and extremely low pricing, it is ideal for high-throughput production workloads where cost and speed are paramount.
Try Now128,000 tokens
4,000 tokens
$0.0375
$0.15
$15
$0.19
7B model; too small for production chat