complior eval tests a running AI system. While scan analyzes code statically, eval sends 680 probes to a live endpoint and checks real behavior against EU AI Act requirements.
How it works
Probes are sent
680 test probes are sent to your endpoint across 3 phases: deterministic (168), LLM-judged (212), and security attacks (300).
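The three phase counts add up to the 680 total. A minimal Python sketch of that breakdown follows; the dictionary keys and variable names are illustrative assumptions, not identifiers taken from complior.

```python
# Probe counts per phase, as stated above. The key names are illustrative
# labels (assumed), not identifiers from complior itself.
PHASES = {
    "deterministic": 168,
    "llm_judged": 212,
    "security": 300,
}

total = sum(PHASES.values())

# Share of the full run that each phase represents, as a percentage.
shares = {name: round(100 * count / total, 1) for name, count in PHASES.items()}

print(total)   # 680
print(shares)
```

Security attacks make up the largest share of a full run, which is worth keeping in mind when estimating how long an eval against a slow endpoint may take.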
Responses evaluated
Each response is scored against EU AI Act criteria. Bias is measured via A/B paired tests, and security via attack success rate.
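The two metrics named here can be sketched in a few lines of Python. This illustrates the general techniques only, not complior's scoring code: the function names, the input shapes, and the use of a mean paired score difference as the bias measure are all assumptions.

```python
from statistics import mean


def attack_success_rate(attack_succeeded: list[bool]) -> float:
    """Fraction of attack probes that got past the system (lower is better)."""
    if not attack_succeeded:
        return 0.0
    return sum(attack_succeeded) / len(attack_succeeded)


def paired_bias_gap(pairs: list[tuple[float, float]]) -> float:
    """Mean score difference (A minus B) over paired prompts.

    Each pair holds the scores for two prompts that differ in a single
    attribute, such as a name or gender. A gap near 0 suggests the attribute
    did not change the outcome. (Assumed bias measure, for illustration.)
    """
    if not pairs:
        return 0.0
    return mean(a - b for a, b in pairs)


# Hypothetical results: 3 of 300 attack probes succeeded.
outcomes = [True] * 3 + [False] * 297
print(attack_success_rate(outcomes))  # 0.01

# Hypothetical paired scores on a 0..1 scale.
print(paired_bias_gap([(0.75, 0.75), (1.0, 0.5), (0.5, 0.5)]))
```

Pairing the prompts means each A/B comparison controls for everything except the attribute under test, so a nonzero gap points at that attribute and not at differences in prompt wording.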
Target adapters
- Generic HTTP
- OpenAI
- Anthropic
- Ollama
Sends POST { "message": "{{probe}}" } and reads the response body as text.
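To make the request shape concrete, here is a minimal Python sketch of an adapter that fills the {{probe}} placeholder, sends the POST, and reads the reply as text. The function names and the choice of the standard-library urllib are assumptions for illustration; complior's own adapter may work differently.

```python
import json
import urllib.request

# Body template from the text above; {{probe}} is the placeholder.
TEMPLATE = '{ "message": "{{probe}}" }'


def render_body(template: str, probe: str) -> bytes:
    """Substitute the probe into the template as a JSON-escaped string."""
    # json.dumps adds surrounding quotes and escapes special characters;
    # strip the outer quotes because the template already supplies them.
    escaped = json.dumps(probe)[1:-1]
    return template.replace("{{probe}}", escaped).encode("utf-8")


def send_probe(url: str, probe: str, timeout: float = 30.0) -> str:
    """POST one probe and return the response body as text."""
    request = urllib.request.Request(
        url,
        data=render_body(TEMPLATE, probe),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# The rendered body stays valid JSON even when the probe contains quotes.
print(json.loads(render_body(TEMPLATE, 'Say "hi"'))["message"])  # Say "hi"
```

Calling `send_probe("http://localhost:8000/chat", "Hello")` would send one probe to a hypothetical local endpoint. Escaping the probe before substitution matters because security probes often contain quotes and newlines that would otherwise break the JSON body.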

Passport integration
With --agent, eval results are written directly into the passport as a compliance.eval block. The block has 20+ fields, including conformity and security scores, per-category pass rates, bias details, and hallucination rate.
Eval Modes
4 composable modes: deterministic, LLM-judged, security, full.
Conformity Tests
11 EU AI Act categories, 370 deterministic + LLM-judged tests.