DeepSeek R1 Distill Llama 70B

DeepSeek R1 Distill Llama 70B is a distilled large language model based on Llama-3.3-70B-Instruct, fine-tuned on reasoning outputs from DeepSeek R1. Distillation transfers R1's reasoning ability into the smaller Llama base, yielding strong benchmark results (AIME 2024 pass@1: 70.0; MATH-500 pass@1: 94.5; CodeForces rating: 1633) that are competitive with larger frontier models.

Capabilities

Thinking

Example Use Cases

Reasoning tasks with an open-source 70B model

Math or competitive programming

Distilled R1 reasoning at lower cost

Technical Specifications

Context Window

131,072 tokens

Max Output

16,384 tokens
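When sizing prompts, the input and the requested completion typically share the same context window, so the maximum usable input shrinks as the output budget grows. This is a minimal sketch of that budget check, assuming the shared-window behavior described here (the function name is illustrative, not part of any API):

```python
CONTEXT_WINDOW = 131_072  # total tokens per request (from the spec above)
MAX_OUTPUT = 16_384       # maximum completion tokens (from the spec above)

def max_input_tokens(desired_output: int = MAX_OUTPUT) -> int:
    """Largest input that still leaves room for the requested completion,
    assuming input and output share one context window."""
    return CONTEXT_WINDOW - min(desired_output, MAX_OUTPUT)

print(max_input_tokens())  # 114688
```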

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.70

Non-Reasoning Output

$0.80
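The per-token rates above can be turned into a per-request cost estimate. A minimal sketch, using only the two rates listed here (the helper function is hypothetical; real bills may also include cache-hit input and reasoning-output tiers not shown on this page):

```python
# USD per 1M tokens, from the pricing table above
CACHE_MISS_INPUT_PER_M = 0.70
NON_REASONING_OUTPUT_PER_M = 0.80

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost for one request at the listed rates,
    treating all input as cache misses."""
    return (input_tokens * CACHE_MISS_INPUT_PER_M
            + output_tokens * NON_REASONING_OUTPUT_PER_M) / 1_000_000

# e.g. 100k input tokens and 10k output tokens
print(f"${estimate_cost(100_000, 10_000):.4f}")  # $0.0780
```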

Legacy

Made legacy on

Reason

Untested

Recommended Replacement

Qwen3.6 Plus