Molmo2 8B

Allen AI

Molmo2-8B is an open vision-language model developed by the Allen Institute for AI (Ai2) as part of the Molmo2 family. It supports image, video, and multi-image understanding and grounding. It is built on Qwen3-8B and uses SigLIP 2 as its vision backbone. It outperforms other open-weight, open-data models on short-video, counting, and captioning tasks, and remains competitive on long-video tasks.

Capabilities

Image Input

Technical Specifications

Context Window

36,864 tokens

Max Output

36,864 tokens
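
As a rough illustration of how the two limits above interact, the sketch below computes how many tokens remain for generation after a prompt of a given size. It assumes that prompt and output tokens share the same 36,864-token window, which this listing does not state. The function name `max_output_budget` is hypothetical and not part of any provider API.

```python
# Limits taken from the listing above.
CONTEXT_WINDOW = 36_864
MAX_OUTPUT = 36_864


def max_output_budget(prompt_tokens: int) -> int:
    """Return the tokens left for generation.

    Assumption (not confirmed by the listing): the prompt and the
    generated output share one context window.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    # Never exceed the output cap, and never go below zero.
    return max(0, min(remaining, MAX_OUTPUT))


print(max_output_budget(30_000))  # 6864
```

Under that assumption, a prompt that already fills the window leaves no room for output, so long video or multi-image inputs may need to be trimmed before sending.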

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.20

Non-Reasoning Output

$0.20
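
The per-million rates above translate into a per-request cost as sketched below. This is a minimal estimate that assumes every input token is billed at the cache-miss rate and every output token at the non-reasoning rate. The function name `estimate_cost` is hypothetical, and any cached-input or reasoning-output rates are not covered because the listing does not give them.

```python
from decimal import Decimal

# Rates from the listing above, in US dollars per 1M tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.20")   # cache-miss input
OUTPUT_PRICE_PER_MILLION = Decimal("0.20")  # non-reasoning output
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the dollar cost of one request at the listed rates."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_MILLION
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_MILLION
    )
    return total / TOKENS_PER_MILLION


# 30,000 input tokens and 2,000 output tokens cost $0.0064.
print(estimate_cost(30_000, 2_000))
```

`Decimal` is used instead of floating point so that small per-request amounts add up exactly when summed over many requests.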

Retired

Made legacy on

Reason

Research-focused 8B vision model; too small for production use

Recommended Replacement

Qwen3.6 Plus

Retired on