Spotlight

arcee

Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32k-token context window, enabling rich multimodal conversations that combine lengthy documents with one or more images. Training emphasized fast inference on consumer GPUs while retaining strong captioning, visual question answering, and diagram-analysis accuracy. As a result, Spotlight slots neatly into agent workflows where screenshots, charts, or UI mock-ups need to be interpreted on the fly. Early benchmarks show it matching or outscoring larger VLMs such as LLaVA-1.6 13B on popular VQA and POPE alignment tests.

Capabilities

Image Input

Example Use Cases

Lightweight image understanding

Screenshot or diagram interpretation

Visual question answering on a budget

Technical Specifications

Context Window: 131,072 tokens
Max Output: 65,537 tokens
Cache Miss Cost: $0.18 per 1M tokens
Non-Reasoning Cost: $0.18 per 1M tokens
Web Search Cost: $15 per 1K calls
Code Execution Cost: $0.19 per 1K calls
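To make the per-unit prices above concrete, here is a minimal sketch of a cost estimate for a single workload. It assumes that the cache miss cost applies to uncached input tokens and the non-reasoning cost to output tokens; the listing does not spell this out. The function name `estimate_cost` and its parameters are hypothetical and are not part of any official SDK or billing API.

```python
from decimal import Decimal

# Prices copied from the specifications above (USD).
# Assumption: "Cache Miss Cost" is billed on uncached input tokens,
# and "Non-Reasoning Cost" is billed on output tokens.
INPUT_PRICE_PER_M_TOKENS = Decimal("0.18")
OUTPUT_PRICE_PER_M_TOKENS = Decimal("0.18")
WEB_SEARCH_PRICE_PER_K_CALLS = Decimal("15")
CODE_EXEC_PRICE_PER_K_CALLS = Decimal("0.19")


def estimate_cost(input_tokens, output_tokens, web_searches=0, code_executions=0):
    """Return an estimated cost in USD (hypothetical helper, not an official calculator)."""
    # Token prices are quoted per one million tokens.
    token_cost = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M_TOKENS
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M_TOKENS
    ) / Decimal(1_000_000)
    # Tool prices are quoted per one thousand calls.
    tool_cost = (
        Decimal(web_searches) * WEB_SEARCH_PRICE_PER_K_CALLS
        + Decimal(code_executions) * CODE_EXEC_PRICE_PER_K_CALLS
    ) / Decimal(1_000)
    return token_cost + tool_cost
```

Under these assumptions, `estimate_cost(1_000_000, 0)` comes to 0.18 USD, matching the listed price for one million tokens. `Decimal` is used rather than floats so that small per-call amounts add up exactly.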

⚠️ Legacy

Made legacy on

Reason: Untested
Recommended Replacement: Qwen3.5 Plus