DeepSeek R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, fine-tuned on reasoning outputs generated by DeepSeek R1. This distillation yields strong results across benchmarks, including AIME 2024 pass@1 of 70.0, MATH-500 pass@1 of 94.5, and a CodeForces rating of 1633, making its performance competitive with larger frontier models.

Capabilities

Thinking
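R1-style distilled models typically emit their chain of thought inside `<think>` tags before the final answer. A minimal sketch for separating the reasoning from the answer, assuming that tag convention (the helper name and sample text are illustrative):

```python
import re

def split_thinking(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer), assuming <think>...</think> markup."""
    m = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    if not m:
        # No reasoning block found: treat the whole completion as the answer.
        return "", text.strip()
    reasoning = m.group(1).strip()
    answer = text[m.end():].strip()
    return reasoning, answer

reasoning, answer = split_thinking("<think>2 + 2 is 4.</think>The answer is 4.")
print(answer)  # The answer is 4.
```

Stripping the `<think>` block before displaying or storing responses keeps downstream consumers from treating intermediate reasoning as the model's final output.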

Technical Specifications

Context Window: 131,072 tokens

Max Output: 16,384 tokens
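The context window bounds prompt plus output combined, while the max-output figure caps generated tokens on their own. A short sketch of how a request's generation budget follows from the two limits above (token counts are illustrative; exact counts require the model's tokenizer):

```python
# Limits from the specification above.
CONTEXT_WINDOW = 131_072  # total tokens: prompt + generated output
MAX_OUTPUT = 16_384       # hard cap on generated tokens per request

def max_generatable(prompt_tokens: int) -> int:
    """Largest output-token budget a request with this prompt size can use."""
    if prompt_tokens >= CONTEXT_WINDOW:
        raise ValueError("prompt alone exceeds the context window")
    # Whichever is smaller: the output cap, or the context space left over.
    return min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens)

print(max_generatable(1_000))    # short prompt: limited by MAX_OUTPUT -> 16384
print(max_generatable(120_000))  # long prompt: limited by remaining context -> 11072
```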

Pricing

Token Costs (per 1M tokens)

Cache Miss Input: $0.70

Non-Reasoning Output: $0.80
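The per-million-token rates above translate into request cost with simple arithmetic. A sketch, assuming every input token is billed at the cache-miss rate (the function name and token counts are illustrative):

```python
# Rates from the pricing table above, in dollars per 1M tokens.
INPUT_PER_M = 0.70   # cache-miss input
OUTPUT_PER_M = 0.80  # non-reasoning output

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request, billing all input at the cache-miss rate."""
    return (input_tokens * INPUT_PER_M + output_tokens * OUTPUT_PER_M) / 1_000_000

# Example: 10,000 input tokens and 2,000 output tokens.
print(f"${estimate_cost(10_000, 2_000):.4f}")  # $0.0086
```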

Legacy

Made legacy on

Reason

Distilled model; superseded by native DeepSeek V3.2

Recommended Replacement

DeepSeek V3.2