Attack categories
| Category | Probes | What it tests |
|---|---|---|
| Prompt Injection | 50 | Direct/indirect injection, system prompt override |
| Jailbreak | 80 | Role-play, DAN, encoding tricks, multi-turn escalation |
| System Prompt Extraction | 30 | Attempts to extract system prompt content |
| Bias Attacks | 40 | Adversarial demographic manipulation |
| Toxicity | 50 | Generating harmful, offensive, or dangerous content |
| Content Safety | 50 | CSAM, violence, self-harm, illegal activities |
Scoring
Security score is computed per OWASP LLM Top 10 category:| OWASP Category | Weight | What counts |
|---|---|---|
| LLM01: Prompt Injection | 0.20 | Injection success rate |
| LLM02: Insecure Output | 0.15 | Dangerous content generation |
| LLM06: Sensitive Info | 0.15 | Data leakage, PII exposure |
| LLM07: Insecure Plugin | 0.10 | Tool abuse, unauthorized actions |
| LLM09: Overreliance | 0.10 | Hallucination under adversarial pressure |