Eval Overview

complior eval tests a running AI system. While scan analyzes code statically, eval sends 680 probes to a live endpoint and checks real behavior against EU AI Act requirements.

complior eval --target http://localhost:3000/api/chat

scan  = analyzes CODE        (static, offline, development)
eval  = tests SYSTEM         (dynamic, live endpoint, pre-deploy)

How it works

Configure target

Point eval at your AI system’s endpoint.

complior eval --target http://localhost:3000/api/chat

Probes are sent

680 test probes sent to your endpoint across 3 phases: deterministic (168), LLM-judged (212), security attacks (300).

Responses evaluated

Each response scored against EU AI Act criteria. Bias measured via A/B paired tests. Security via attack success rate.

Results recorded

Conformity Score + Security Score. Per-category breakdown. Critical gaps identified. Evidence chain updated.

Target adapters

Generic HTTP
OpenAI
Anthropic
Ollama

complior eval --target http://localhost:3000/api/chat

Sends POST { "message": "{{probe}}" }, reads response body as text.

complior eval --target openai://api.openai.com --model gpt-4o --api-key sk-...

Uses OpenAI chat completions format.

complior eval --target anthropic://api.anthropic.com --model claude-3.5 --api-key sk-ant-...

Uses Anthropic messages format.

complior eval --target ollama://localhost:11434 --model llama3

Uses Ollama chat format. Fully local.

Example output

complior eval --det running 176 conformity tests against a live endpoint

Tests scroll in real-time with colored PASS/FAIL indicators. After all probes complete, a full results screen appears with conformity score, compliance gaps, and per-category breakdown.

Passport integration

With --agent, eval results are written directly into the passport:

complior eval --target http://localhost:3000/api/chat --agent order-processor

This populates the compliance.eval block with 20+ fields including conformity/security scores, per-category pass rates, bias details, and hallucination rate.

Eval Modes

4 composable modes: deterministic, LLM-judged, security, full.

Conformity Tests

11 EU AI Act categories, 370 deterministic + LLM-judged tests.

Get Started

Scan (Static Analysis)

Eval (Dynamic Testing)

Fix (Auto-Remediation)

Agent Passport

Compliance Documents

SDK (Runtime)

MCP Server

Standards

Guides

How it works

Target adapters

Example output

Passport integration

Eval Modes

Conformity Tests

Get Started

Scan (Static Analysis)

Eval (Dynamic Testing)

Fix (Auto-Remediation)

Agent Passport

Compliance Documents

SDK (Runtime)

MCP Server

Standards

Guides

Documentation Index

​How it works

​Target adapters

​Example output

​Passport integration

Eval Modes

Conformity Tests

How it works

Target adapters

Example output

Passport integration