Trinity Mini is a 26B-parameter (3B active) sparse mixture-of-experts language model, engineered for efficient inference over long contexts with robust function calling and multi-step agent workflows. With 128K context, it delivers an outstanding price-to-performance ratio while maintaining coherent multi-turn reasoning and reliable tool use. Ideal for production deployments where speed and cost efficiency are paramount.
Fast inference on a budget
Function calling and agent workflows
Long-context processing with minimal compute
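Since the model is positioned around function calling and agent workflows, a minimal sketch of how a tool-enabled request might be assembled for an OpenAI-compatible chat completions endpoint may be useful. The model slug `trinity-mini` and the `get_weather` tool are illustrative assumptions, not confirmed names:

```python
import json

def build_tool_call_request(user_message: str) -> dict:
    """Assemble a chat request exposing one tool to the model."""
    return {
        "model": "trinity-mini",  # hypothetical slug, check your provider's catalog
        "messages": [{"role": "user", "content": user_message}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",  # hypothetical example tool
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide when to call the tool
    }

payload = build_tool_call_request("What's the weather in Oslo?")
print(json.dumps(payload, indent=2))
```

In a multi-step agent loop, the model's `tool_calls` response would be executed locally and the result appended as a `tool`-role message before the next turn.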
Context length: 131,072 tokens
Max output: 131,072 tokens
Input: $0.045 per 1M tokens
Output: $0.15 per 1M tokens
$15 per 1K calls
$0.19 per 1K calls
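To make the price-to-performance claim concrete, here is a small sketch that estimates per-request cost from the listed per-1M-token rates ($0.045 input, $0.15 output); the request sizes in the example are illustrative:

```python
# Listed rates in USD per 1M tokens (from the pricing above).
INPUT_PRICE_PER_M = 0.045
OUTPUT_PRICE_PER_M = 0.15

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD token cost of one request at the listed rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M + \
           (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A completely full 131,072-token context with a 2,048-token reply:
print(f"${estimate_cost(131_072, 2_048):.6f}")  # → $0.006205
```

Even a maximally packed context window costs well under a cent in token charges, which is the basis of the "long-context processing with minimal compute" claim; any per-call fees would be added on top.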