ERNIE 4.5 VL 28B A3B

Baidu

A multimodal Mixture-of-Experts chat model with 28B total parameters, of which 3B are activated per token. It handles text and vision understanding through a heterogeneous MoE structure with modality-isolated routing. The model is built on scaling-efficient infrastructure for high-throughput training and inference, and is post-trained with SFT, DPO, and UPO, plus RLVR alignment, for cross-modal reasoning and generation. It supports a 131K context length.

Capabilities

Thinking

Tool Use

Image Input

Technical Specifications

Context Window

30,000 tokens

Max Output

8,000 tokens
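
The two limits above can be combined into a simple pre-flight check. The sketch below assumes that prompt tokens and generated tokens share the 30,000-token window, which this listing does not state explicitly; the function name `max_output_budget` is hypothetical and not part of any API.

```python
# Limits copied from the specifications above.
CONTEXT_WINDOW = 30_000  # tokens
MAX_OUTPUT = 8_000       # tokens


def max_output_budget(prompt_tokens: int) -> int:
    """Largest output-token limit that still fits for a given prompt size.

    Assumption: prompt and output tokens count against the same window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Never exceed the per-response cap, never go below zero.
    return max(0, min(MAX_OUTPUT, remaining))
```

Under that assumption, a 25,000-token prompt leaves room for at most 5,000 output tokens, and any prompt of 22,000 tokens or fewer can use the full 8,000-token output cap.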

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.14

Non-Reasoning Output

$0.56

Tool Costs (per 1K calls)

Web Search

$15

Code Execution

$0.19
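
As a rough illustration of how these rates add up, the sketch below estimates the cost of a request from the four prices listed above. It covers only cache-miss input and non-reasoning output, since no other token rates appear in this listing; the function name `estimate_cost` and its parameters are hypothetical.

```python
from decimal import Decimal

# Rates copied from the pricing list above (USD).
INPUT_PER_M = Decimal("0.14")      # cache-miss input, per 1M tokens
OUTPUT_PER_M = Decimal("0.56")     # non-reasoning output, per 1M tokens
WEB_SEARCH_PER_K = Decimal("15")   # web search, per 1K calls
CODE_EXEC_PER_K = Decimal("0.19")  # code execution, per 1K calls


def estimate_cost(input_tokens: int, output_tokens: int,
                  web_searches: int = 0, code_runs: int = 0) -> Decimal:
    """Estimated cost in USD; ignores any rates not shown in the listing."""
    token_cost = (INPUT_PER_M * input_tokens
                  + OUTPUT_PER_M * output_tokens) / Decimal(1_000_000)
    tool_cost = (WEB_SEARCH_PER_K * web_searches
                 + CODE_EXEC_PER_K * code_runs) / Decimal(1_000)
    return token_cost + tool_cost
```

For example, `estimate_cost(20_000, 2_000, web_searches=2)` gives 0.03392 USD, of which 0.03 USD comes from the two web searches. `Decimal` is used instead of floats so that small per-token amounts do not pick up rounding noise.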

Legacy

Made legacy on

Reason

Vision model; limited availability outside China

Recommended Replacement

Qwen3.6 Plus