Spotlight

Arcee

Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32K-token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong accuracy on captioning, visual question answering, and diagram analysis. As a result, Spotlight fits neatly into agent workflows where screenshots, charts, or UI mock-ups need to be interpreted on the fly. Early benchmarks show it matching or outscoring larger VLMs such as LLaVA-1.6 13B on popular VQA and POPE alignment tests.


Capabilities

Image Input

Technical Specifications

Context Window

131,072 tokens

Max Output

65,537 tokens
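The two limits above interact when you size a request. The sketch below computes the largest completion budget you could request for a given prompt length. It assumes that the 131,072-token context window covers prompt and completion together, which is common for hosted models but is not stated on this page, and that image inputs have already been converted to a token count. The function name max_completion_tokens is hypothetical.

```python
# Limits as listed in the Technical Specifications above.
CONTEXT_WINDOW = 131_072  # tokens
MAX_OUTPUT = 65_537       # tokens


def max_completion_tokens(prompt_tokens: int) -> int:
    """Hypothetical helper: largest completion that still fits.

    Assumes (not confirmed by this listing) that the context window
    is shared between prompt tokens (text plus image tokens) and
    completion tokens.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = CONTEXT_WINDOW - prompt_tokens
    if remaining <= 0:
        return 0  # the prompt alone already fills the window
    # The completion is capped by both the output limit and the space left.
    return min(MAX_OUTPUT, remaining)


if __name__ == "__main__":
    for prompt in (10_000, 100_000, 131_072):
        print(prompt, max_completion_tokens(prompt))
```

With a 10,000-token prompt the output cap (65,537) is the binding limit; with a 100,000-token prompt only 31,072 tokens of window remain, so that becomes the limit instead.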

Pricing

Token Costs (per 1M tokens)

Cache Miss Input

$0.18

Non-Reasoning Output

$0.18
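Because both listed rates are quoted per 1M tokens, the cost of a single request is a simple weighted sum. The sketch below assumes every input token is billed at the cache-miss rate (no cache-hit rate is listed here), that image tokens are billed as ordinary input tokens, and that all output is non-reasoning output. It uses Decimal so small per-request amounts are not distorted by binary floating point. The function name request_cost is hypothetical.

```python
from decimal import Decimal

# Rates from the Pricing table above, in USD per 1M tokens.
INPUT_PRICE_PER_M = Decimal("0.18")   # cache-miss input
OUTPUT_PRICE_PER_M = Decimal("0.18")  # non-reasoning output
TOKENS_PER_UNIT = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Hypothetical helper: estimated USD cost of one request.

    Assumes all input is billed at the cache-miss rate and that
    image tokens are counted as ordinary input tokens.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) * INPUT_PRICE_PER_M
    output_cost = Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    return (input_cost + output_cost) / TOKENS_PER_UNIT


if __name__ == "__main__":
    # e.g. a screenshot-heavy prompt of 20,000 tokens and a 1,000-token answer
    print(request_cost(20_000, 1_000))
```

Since the input and output rates are identical, the estimate reduces to total tokens × $0.18 / 1M; keeping them as separate constants makes it easy to adjust if the rates ever diverge.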

Legacy

Made legacy on

Reason

Arcee Spotlight model; limited testing

Recommended Replacement

Qwen3.6 Plus