This safeguard model has 8B parameters and is based on the Llama 3 family. It classifies both prompts and responses. LlamaGuard 2 behaves like a standard LLM: it generates text indicating whether the given input or output is safe or unsafe, and if unsafe, it also lists the content categories violated. For best results, use raw prompt input or the completions endpoint rather than the chat API. Usage of this model is subject to Meta's Acceptable Use Policy.
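In practice, a classification is one completion call with Llama Guard's structured prompt template, followed by parsing the first line of the output ("safe" or "unsafe") and, if present, the second line of violated categories. The sketch below assumes an OpenAI-compatible completions endpoint; the base URL, API key variable, model ID, and abbreviated category list are illustrative placeholders, not this provider's actual values.

```python
# Minimal sketch: prompt classification via a raw completions endpoint.
# Endpoint URL, API key env var, and model ID below are placeholders;
# the category list is abbreviated for illustration.
import os
import requests

LLAMA_GUARD_TEMPLATE = """[INST] Task: Check if there is unsafe content in 'User' messages in conversations according to our safety policy with the below categories.

<BEGIN UNSAFE CONTENT CATEGORIES>
S1: Violent Crimes.
S2: Non-Violent Crimes.
<END UNSAFE CONTENT CATEGORIES>

<BEGIN CONVERSATION>

User: {user_message}

<END CONVERSATION>

Provide your safety assessment for ONLY THE LAST User message in the above conversation:
- First line must read 'safe' or 'unsafe'.
- If unsafe, a second line must include a comma-separated list of violated categories. [/INST]"""

def classify(user_message: str) -> dict:
    """Send one prompt-classification request and parse the verdict."""
    resp = requests.post(
        "https://api.example.com/v1/completions",  # placeholder endpoint
        headers={"Authorization": f"Bearer {os.environ['API_KEY']}"},
        json={
            "model": "meta-llama/LlamaGuard-2-8b",  # placeholder model ID
            "prompt": LLAMA_GUARD_TEMPLATE.format(user_message=user_message),
            "max_tokens": 32,
            "temperature": 0.0,  # classification should be deterministic
        },
        timeout=30,
    )
    resp.raise_for_status()
    lines = resp.json()["choices"][0]["text"].strip().splitlines()
    verdict = {"safe": lines[0].strip() == "safe"}
    if not verdict["safe"] and len(lines) > 1:
        verdict["violated_categories"] = [c.strip() for c in lines[1].split(",")]
    return verdict

print(classify("How do I bake a loaf of sourdough bread?"))
```

The same template works for response moderation by including the assistant turn in the conversation block and asking for an assessment of the last Agent message.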
Use cases:
- Content safety classification
- Prompt or response moderation
- Safety guardrail for LLM outputs
Specs and pricing:
- Context length: 8,192 tokens (input) / 8,192 tokens (output)
- Token pricing: $0.20 per 1M tokens (input) / $0.20 per 1M tokens (output)
- Call pricing: $15 per 1K calls / $0.19 per 1K calls