Reka Flash 3

Reka

Reka Flash 3 is a general-purpose, instruction-tuned large language model with 21 billion parameters, developed by Reka. It excels at general chat, coding, instruction following, and function calling. With a 64K-token context length and training optimized via reinforcement learning (RLOO), it delivers performance competitive with proprietary models at a smaller parameter footprint. Its compact size and support for efficient quantization (down to roughly 11 GB at 4-bit precision) make it well suited to low-latency, local, or on-device deployments, and it emits explicit reasoning tags that expose its internal thought process. Reka Flash 3 is primarily an English model with limited multilingual capability. The model weights are released under the Apache 2.0 license.
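The quoted ~11 GB figure at 4-bit precision follows from simple arithmetic over the 21B parameter count. A back-of-envelope sketch (the per-weight overhead for quantization scales is an illustrative assumption, not a published figure):

```python
# Back-of-envelope memory estimate for 4-bit quantized weights.
# PARAMS comes from the spec above; the overhead figure is an assumption.

PARAMS = 21e9          # parameter count
BITS_PER_WEIGHT = 4    # 4-bit quantization

raw_gb = PARAMS * BITS_PER_WEIGHT / 8 / 1e9        # weights alone: 10.5 GB

# Group-wise quantization typically stores a scale (and sometimes a zero
# point) per group of weights, adding on the order of 0.25 extra bits
# per weight -- an illustrative assumption here.
OVERHEAD_BITS = 0.25
total_gb = PARAMS * (BITS_PER_WEIGHT + OVERHEAD_BITS) / 8 / 1e9

print(f"weights only: {raw_gb:.1f} GB, with overhead: ~{total_gb:.1f} GB")
# prints: weights only: 10.5 GB, with overhead: ~11.2 GB
```

With metadata overhead included, the estimate lands close to the ~11 GB the card quotes.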

Capabilities

Thinking

Tool Use
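The Thinking capability pairs with the explicit reasoning tags mentioned in the overview: applications usually want to strip or display the trace separately from the final answer. A minimal sketch of that split follows; the tag name "reasoning" is an assumption for illustration, so check Reka's documentation for the exact markers the model emits.

```python
import re

def split_reasoning(text: str, tag: str = "reasoning") -> tuple[str, str]:
    """Return (reasoning, answer) from output wrapped in explicit tags.

    The tag name is a hypothetical placeholder; substitute whatever
    marker the model actually emits.
    """
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer

raw = "<reasoning>2 + 2 is 4.</reasoning>The answer is 4."
thoughts, answer = split_reasoning(raw)
# thoughts == "2 + 2 is 4.", answer == "The answer is 4."
```

Keeping the trace out of the user-facing answer (while logging it for debugging) is the usual pattern for reasoning models.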

Technical Specifications

Context Window

65,536 tokens

Max Output

65,536 tokens

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.10

Non-Reasoning Output

$0.20

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19
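The per-1M-token rates above make request costs easy to estimate. A hedged sketch, with made-up token counts for illustration:

```python
# Cost estimate from the listed rates; token counts are illustrative.

INPUT_RATE = 0.10    # $/1M tokens, cache-miss input
OUTPUT_RATE = 0.20   # $/1M tokens, non-reasoning output

input_tokens = 200_000
output_tokens = 50_000

cost = input_tokens / 1e6 * INPUT_RATE + output_tokens / 1e6 * OUTPUT_RATE
print(f"${cost:.2f}")  # prints $0.03
```

Tool calls (web search, code execution) are billed per 1K calls on top of token costs.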

Legacy

Made legacy on

Reason

Reka pivoted away from model hosting.

Recommended Replacement

Qwen3.6 Plus