Each flag enables a set of tests. Flags are composable — combine them freely.

Composable flags

# Single modes:
complior eval <url>                     # 168 deterministic (default)
complior eval <url> --llm               # 212 LLM-judged only
complior eval <url> --security          # 300 security probes only

# Combinations:
complior eval <url> --det --llm         # 168 + 212 = 380 conformity
complior eval <url> --det --security    # 168 + 300 = 468
complior eval <url> --llm --security    # 212 + 300 = 512

# Everything:
complior eval <url> --full              # 168 + 212 + 300 = 680 all

# CI mode:
complior eval <url> --ci --threshold 70 # exit code 2 if score is below threshold

# Parallelism:
complior eval <url> --full -j 10        # 10 parallel workers
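
The CI line above reports a below-threshold score through exit code 2. A minimal sketch of a pipeline step that separates that case from other outcomes follows. The function name run_eval_gate is hypothetical, and treating 0 as a pass and any other non-zero code as a tool or network error is an assumption, not documented behavior.

```shell
# Hypothetical CI wrapper around `complior eval --ci`.
# Assumption: exit 0 means the score met the threshold. Only exit 2
# (score below threshold) is documented; other codes are treated as errors.
run_eval_gate() {
  local url="$1"
  local threshold="${2:-70}"
  local status=0
  # `|| status=$?` captures the exit code without aborting under `set -e`.
  complior eval "$url" --ci --threshold "$threshold" || status=$?
  case "$status" in
    0) echo "eval passed (threshold $threshold)" ;;
    2) echo "score below threshold $threshold" ;;
    *) echo "eval did not complete (exit $status)" ;;
  esac
  return "$status"
}
```

Calling run_eval_gate with the target URL and a threshold as the last command of a pipeline step passes the exit code through, so the job fails whenever the eval does.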

Mode comparison

| Mode | Flag | Tests | What it checks | Requires |
| --- | --- | --- | --- | --- |
| Deterministic | --det (default) | 168 | Transparency, oversight, robustness, prohibited, logging | Nothing |
| LLM-judged | --llm | 212 | Explanation quality, bias A/B pairs, accuracy, nuance | BYOK API key |
| Security | --security | 300 | Injection, jailbreak, exfiltration, toxicity, content safety | Nothing |
| Full | --full | 680 | All of the above | BYOK API key |
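
The Requires column implies a simple selection rule: without a BYOK key for the judge, only the deterministic and security modes can run. A small sketch of that rule follows. The helper choose_eval_flags and the variable names in the usage comment are hypothetical, and how the key is actually handed to complior is not covered here.

```shell
# Hypothetical helper: pick eval flags based on whether a BYOK judge key exists.
# $1 is the key (or an empty string when none is configured).
choose_eval_flags() {
  if [ -n "${1:-}" ]; then
    echo "--full"              # 168 + 212 + 300 = 680 tests
  else
    echo "--det --security"    # 168 + 300 = 468 tests, no key needed
  fi
}

# Usage (left unquoted on purpose so the flags split into separate arguments):
#   complior eval "$TARGET_URL" $(choose_eval_flags "$JUDGE_KEY")
```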

All eval flags

| Flag | What it does |
| --- | --- |
| --det | Run deterministic tests (168 tests, default when no flags) |
| --llm | Run LLM-judged tests (212 tests, requires BYOK API key) |
| --security | Run security probes (300 probes, OWASP LLM Top 10) |
| --full | Run all tests (168 + 212 + 300) |
| --agent NAME | Agent name for passport attribution |
| --categories CATS | Filter by category (comma-separated: CT-1,CT-4,CT-7) |
| --ci | CI mode: exit 2 if score < threshold |
| --threshold N | Score threshold for CI pass (default: 60) |
| --model MODEL | LLM model override for judge (e.g., gpt-4o, claude-sonnet) |
| --api-key KEY | API key for target endpoint |
| --request-template JSON | Custom request JSON with {{probe}} placeholder |
| --response-path PATH | Dot-path to response text (e.g., result.text) |
| --headers JSON | Custom headers as JSON |
| -j / --concurrency N | Parallel test execution (1–50, default: 5) |
| --last | Show last eval result |
| --failures | Show only failures (with --last) |
| --verbose | Show verbose test details |
| --json | Output as JSON |
| --remediation | Generate full remediation report |
| --fix | Auto-apply fixes from eval failures |
| --dry-run | Dry-run mode for --fix |
| --no-remediation | Disable inline remediation recommendations |
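
For endpoints with a custom request or response shape, --request-template, --response-path, and --headers together describe the wire format. The sketch below assumes a hypothetical endpoint that accepts {"input": "..."} and answers {"result": {"text": "..."}}. The header name, the helper names, and the sed-based preview of the {{probe}} substitution are illustrative assumptions, not a description of complior's internals.

```shell
# Hypothetical wire format:
#   request:  {"input": "..."}
#   response: {"result": {"text": "..."}}
TEMPLATE='{"input": "{{probe}}"}'
HEADERS='{"X-Tenant": "demo"}'   # illustrative header, not required by complior

# Preview the request body for one probe. Illustration only: this is plain
# text substitution with no JSON escaping, so keep the sample probe free of
# quotes, slashes, and '&'.
preview_request() {
  printf '%s\n' "$TEMPLATE" | sed "s/{{probe}}/$1/"
}

# Run the eval against the custom endpoint passed as $1.
eval_custom_endpoint() {
  complior eval "$1" \
    --request-template "$TEMPLATE" \
    --response-path result.text \
    --headers "$HEADERS"
}
```

Here preview_request Hello prints {"input": "Hello"}, which is the body shape the endpoint would presumably receive for that probe. The dot-path result.text then points at the field holding the reply text.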

Concurrency

Default: 5 parallel workers; range: 1–50. Tests within a phase run in parallel, while the phases themselves run sequentially. Requests are staggered by 50 ms for rate-limit safety.
complior eval <url> -j 1     # sequential (debug)
complior eval <url> -j 20    # aggressive parallel

Aliases

complior redteam --target <url>       # alias for eval --security
complior audit --scan . --target <url> # scan + eval + docs + evidence