Gemma 3n 4B

Google

Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices such as phones, laptops, and tablets. It accepts multimodal inputs, including text, images, and audio, enabling tasks such as text generation, speech recognition, translation, and image analysis. Innovations such as Per-Layer Embedding (PLE) caching and the MatFormer architecture let Gemma 3n selectively activate model parameters, dynamically managing memory usage and computational load and significantly reducing runtime resource requirements. The model was trained on data covering over 140 languages and offers a 32K-token context window. Because it can scale its active parameters to the task and device at hand, it is well suited to privacy-focused, offline-capable applications and on-device AI solutions.


Technical Specifications

Context Window

32,768 tokens

Max Output

32,768 tokens
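Since the context window and maximum output are both 32,768 tokens, a long prompt eats into the room available for the completion. A minimal sketch of that budgeting, assuming prompt and completion share the single 32,768-token window (an assumption; some serving APIs budget them separately):

```python
CONTEXT_WINDOW = 32_768  # total tokens the model can attend to

def max_completion_tokens(prompt_tokens: int,
                          context_window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for the completion after the prompt, assuming the
    prompt and completion share one context window."""
    if prompt_tokens >= context_window:
        raise ValueError("prompt alone fills the context window")
    return context_window - prompt_tokens

# e.g. a 2,000-token prompt leaves 30,768 tokens for the completion
print(max_completion_tokens(2_000))
```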

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.02

Non-Reasoning Output

$0.04
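At the listed per-1M-token rates, the cost of a single request is straightforward arithmetic. A small sketch (the example token counts are hypothetical):

```python
INPUT_PER_M = 0.02   # $ per 1M cache-miss input tokens
OUTPUT_PER_M = 0.04  # $ per 1M non-reasoning output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request at the listed per-1M-token rates."""
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# 10,000 input tokens + 2,000 output tokens:
# 0.01 * $0.02 + 0.002 * $0.04 = $0.0002 + $0.00008 = $0.00028
print(request_cost(10_000, 2_000))
```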

Legacy

Made legacy on

Reason

4B edge model; no tool support; superseded by Gemma 4

Recommended Replacement

Gemma 4 26B A4B