Gemma 4 26B A4B

Google

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Although it has 25.2B total parameters, only 3.8B are active per token during inference, delivering quality approaching that of a dense ~31B model at a fraction of the compute cost. It supports multimodal input, including text, images, and video (up to 60 s at 1 fps), and features a 256K-token context window, native function calling, a configurable thinking/reasoning mode, and structured output support. Released under the Apache 2.0 license.


Capabilities

Thinking

Tool Use

Image Input
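The Tool Use capability can be illustrated with a function-calling round trip. This is an assumed sketch: the JSON-schema tool declaration mirrors the common OpenAI-compatible convention, and the `get_weather` tool is a made-up example, not part of the model's documentation:

```python
# Hypothetical function-calling sketch: declare a tool schema, then route a
# simulated model tool-call message to a local function.
import json

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",            # illustrative tool name
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:
    return f"Sunny in {city}"             # stub implementation

def dispatch(tool_call: dict) -> str:
    """Execute the local function named in a model tool-call message."""
    args = json.loads(tool_call["function"]["arguments"])
    if tool_call["function"]["name"] == "get_weather":
        return get_weather(**args)
    raise ValueError("unknown tool")

# Simulated tool call in the shape the model might emit:
reply = dispatch({"function": {"name": "get_weather",
                               "arguments": '{"city": "Oslo"}'}})
```

In a real loop, `reply` would be sent back to the model as a tool-result message so it can compose the final answer.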

Example Use Cases

Efficient multimodal reasoning

Budget-friendly vision-language tasks

Technical Specifications

Context Window: 262,144 tokens

Max Output: 262,144 tokens

Pricing

Token Costs (per 1M tokens)

Cache Miss Input: $0.13

Non-Reasoning Output: $0.40

Tool Costs (per 1K calls)

Web Search: $15

Code Execution: $0.19
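The rates above combine into a simple per-request estimate. The constants below are taken from the pricing table; the usage figures in the example call are made up for illustration:

```python
# Back-of-the-envelope cost estimate from the listed rates.
CACHE_MISS_INPUT_PER_M = 0.13   # $ per 1M input tokens (cache miss)
OUTPUT_PER_M = 0.40             # $ per 1M non-reasoning output tokens
WEB_SEARCH_PER_K = 15.0         # $ per 1K web-search calls
CODE_EXEC_PER_K = 0.19          # $ per 1K code-execution calls

def estimate_cost(input_tokens: int, output_tokens: int,
                  searches: int = 0, code_runs: int = 0) -> float:
    """Total dollar cost for a given token and tool usage."""
    return (input_tokens / 1e6 * CACHE_MISS_INPUT_PER_M
            + output_tokens / 1e6 * OUTPUT_PER_M
            + searches / 1e3 * WEB_SEARCH_PER_K
            + code_runs / 1e3 * CODE_EXEC_PER_K)

# Example: 2M cache-miss input tokens, 500K output tokens, 10 web searches
# -> 0.26 + 0.20 + 0.15 = $0.61
cost = estimate_cost(2_000_000, 500_000, searches=10)
```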

Legacy

Made legacy on:

Reason: Untested

Recommended Replacement: Qwen3.6 Plus