ERNIE 4.5 VL 28B A3B

Baidu

A multimodal Mixture-of-Experts chat model with 28B total parameters, of which 3B are activated per token. It handles text and vision understanding through a heterogeneous MoE structure with modality-isolated routing, and is built on infrastructure designed for high-throughput training and inference. Post-training uses SFT, DPO, and UPO, with RLVR alignment aimed at cross-modal reasoning and generation. The model is described as supporting a 131K context length; the context window listed under Technical Specifications on this page is 30,000 tokens.

Capabilities

Thinking

Tool Use

Image Input

Example Use Cases

Budget multimodal reasoning

Image understanding with thinking

Lightweight vision-language tasks

Technical Specifications

Context Window

30,000 tokens

Max Output

8,000 tokens
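
As a rough illustration of how the two limits above interact, the following sketch checks a request against them before it is sent. It assumes that prompt and output tokens share the 30,000-token context window, which this page does not state; `fits_budget` and the constant names are hypothetical and not part of any provider API.

```python
# Limits as listed under Technical Specifications on this page.
CONTEXT_WINDOW_TOKENS = 30_000
MAX_OUTPUT_TOKENS = 8_000


def fits_budget(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if a request stays within both listed limits.

    Assumption (not stated on the page): the prompt and the generated
    output count against the same context window.
    """
    if prompt_tokens < 0 or requested_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # The output cap applies regardless of how short the prompt is.
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    print(fits_budget(20_000, 8_000))
    print(fits_budget(25_000, 8_000))
```

Under that assumption, a prompt longer than 22,000 tokens leaves less than the full 8,000-token output allowance, so long image-plus-text prompts may need a smaller output request.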

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.14

Non-Reasoning Output

$0.56

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19
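
The listed rates can be combined into a per-request estimate. The sketch below is only a worked example of that arithmetic: it bills all input at the cache-miss rate and all output at the non-reasoning rate, because those are the only token rates shown here. `estimate_cost_usd` and the constant names are hypothetical.

```python
from decimal import Decimal

# Rates as listed in the Pricing section of this page.
INPUT_USD_PER_1M = Decimal("0.14")       # cache-miss input, per 1M tokens
OUTPUT_USD_PER_1M = Decimal("0.56")      # non-reasoning output, per 1M tokens
WEB_SEARCH_USD_PER_1K = Decimal("15")    # per 1K web search calls
CODE_EXEC_USD_PER_1K = Decimal("0.19")   # per 1K code execution calls


def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    web_searches: int = 0,
    code_executions: int = 0,
) -> Decimal:
    """Estimate the cost of one request in USD.

    Assumption: no cached input and no separately priced reasoning
    output, since this page lists no rates for either.
    """
    token_cost = (
        INPUT_USD_PER_1M * input_tokens + OUTPUT_USD_PER_1M * output_tokens
    ) / Decimal(1_000_000)
    tool_cost = (
        WEB_SEARCH_USD_PER_1K * web_searches
        + CODE_EXEC_USD_PER_1K * code_executions
    ) / Decimal(1_000)
    return token_cost + tool_cost


if __name__ == "__main__":
    # 20,000 input tokens, 2,000 output tokens, one web search.
    print(estimate_cost_usd(20_000, 2_000, web_searches=1))
```

For that example the token portion is $0.00392 and the single web search adds $0.015, so tool calls can dominate the bill for short requests.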

Legacy

Made legacy on

Reason

Untested

Recommended Replacement

Qwen3.6 Plus