complior eval tests a running AI system. While scan analyzes code statically, eval sends 680 probes to a live endpoint and checks real behavior against EU AI Act requirements.
complior eval --target http://localhost:3000/api/chat
scan  = analyzes CODE        (static, offline, development)
eval  = tests SYSTEM         (dynamic, live endpoint, pre-deploy)

How it works

1. Configure target

Point eval at your AI system’s endpoint.
complior eval --target http://localhost:3000/api/chat

2. Probes are sent

680 test probes are sent to your endpoint across 3 phases: deterministic (168), LLM-judged (212), and security attacks (300).

3. Responses evaluated

Each response is scored against EU AI Act criteria. Bias is measured via A/B paired tests, and security via attack success rate.

4. Results recorded

Eval reports a Conformity Score and a Security Score with a per-category breakdown, identifies critical gaps, and updates the evidence chain.
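
The two measurements named in the evaluation step can be illustrated with a small sketch. This is not complior's implementation: the function names, the representation of each response as a numeric score, and the 0.1 tolerance are all assumptions made for illustration. It computes an attack success rate as the fraction of attack probes that got through, and flags A/B pairs whose scores diverge by more than the tolerance.

```python
from __future__ import annotations

from dataclasses import dataclass

# Probe counts per phase, as stated in the steps above.
PHASES = {"deterministic": 168, "llm_judged": 212, "security": 300}


@dataclass
class PairedResult:
    """Scores for two prompts that differ only in one attribute (hypothetical shape)."""
    score_a: float
    score_b: float


def attack_success_rate(attack_succeeded: list[bool]) -> float:
    """Fraction of attack probes that succeeded; lower is better."""
    if not attack_succeeded:
        return 0.0
    return sum(attack_succeeded) / len(attack_succeeded)


def biased_pairs(pairs: list[PairedResult], tolerance: float = 0.1) -> list[int]:
    """Indices of A/B pairs whose scores differ by more than `tolerance` (assumed threshold)."""
    return [i for i, p in enumerate(pairs) if abs(p.score_a - p.score_b) > tolerance]


if __name__ == "__main__":
    print(sum(PHASES.values()))                              # 680
    print(attack_success_rate([True, False, False, False]))  # 0.25
    print(biased_pairs([PairedResult(0.9, 0.85), PairedResult(0.9, 0.5)]))
```

A paired design like this isolates one variable: both prompts in a pair are identical except for the attribute under test, so a large score gap points at that attribute rather than at prompt wording.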

Target adapters

complior eval --target http://localhost:3000/api/chat
Sends a POST request with the body { "message": "{{probe}}" } and reads the response body as text.
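
To try eval locally without a real model, a stand-in endpoint that accepts this request shape can be handy. The sketch below uses only Python's standard library. The handler name, the route check, the echo reply, and the assumption that the body arrives as JSON with a "message" key are illustrative, inferred from the description above rather than taken from complior itself.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_payload(probe: str) -> bytes:
    """Fill the {{probe}} slot of the request body described above."""
    return json.dumps({"message": probe}).encode("utf-8")


class ChatHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for an AI system served at /api/chat."""

    def do_POST(self):
        if self.path != "/api/chat":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            message = json.loads(self.rfile.read(length))["message"]
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "expected a JSON body with a 'message' field")
            return
        # Reply with plain text, since the response body is read as text.
        reply = f"echo: {message}".encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(reply)))
        self.end_headers()
        self.wfile.write(reply)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def make_server(port: int = 3000) -> HTTPServer:
    """Bind the stand-in endpoint to the loopback interface."""
    return HTTPServer(("127.0.0.1", port), ChatHandler)


if __name__ == "__main__":
    make_server().serve_forever()
```

An echo service like this is only good for checking connectivity and request format. It will not produce meaningful scores.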

Example output

[Screenshot: complior eval --det running 176 conformity tests against a live endpoint]
Tests scroll in real time with colored PASS/FAIL indicators. After all probes complete, a full results screen appears with the conformity score, compliance gaps, and a per-category breakdown.

Passport integration

With --agent, eval results are written directly into the passport:
complior eval --target http://localhost:3000/api/chat --agent order-processor
This populates the compliance.eval block with 20+ fields including conformity/security scores, per-category pass rates, bias details, and hallucination rate.

Eval Modes

4 composable modes: deterministic, LLM-judged, security, full.

Conformity Tests

11 EU AI Act categories, 380 deterministic + LLM-judged tests (168 + 212).