This page covers two complementary safety mechanisms: the safety filter, which scans LLM responses for harmful content (Art. 15), and the HITL gate, which requires human approval for critical actions (Art. 14).

Safety Filter

The safety filter is a post-hook that scans LLM responses against pattern-based rules across 5 categories.

Categories

| Category | Patterns | Severity Range | Description |
| --- | --- | --- | --- |
| violence | 4 | HIGH–CRITICAL | Lethal violence instructions, weapon construction, attack planning |
| self_harm | 3 | HIGH–CRITICAL | Self-harm instructions, self-injury encouragement |
| illegal_instructions | 4 | MEDIUM–CRITICAL | Hacking, drug manufacturing, counterfeiting, law evasion |
| pii_leakage | 4 | MEDIUM–HIGH | SSN/credit card/IBAN numbers and credentials in output |
| hallucination_indicator | 4 | LOW–MEDIUM | False memory claims, unverifiable citations, unverified legal claims |

Severity Weights

| Severity | Weight |
| --- | --- |
| LOW | 0.1 |
| MEDIUM | 0.3 |
| HIGH | 0.6 |
| CRITICAL | 1.0 |
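
The weights do not by themselves say how several findings combine into the aggregate score that is compared against the threshold. As a rough sketch only, assuming the aggregate is the highest weight among the findings (a hypothetical rule; the SDK may sum or otherwise combine them), the lookup could look like the following. `SEVERITY_WEIGHTS` and `aggregateScore` are illustrative names, not SDK exports.

```typescript
// Severity weights as listed in this section.
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

const SEVERITY_WEIGHTS: Record<Severity, number> = {
  LOW: 0.1,
  MEDIUM: 0.3,
  HIGH: 0.6,
  CRITICAL: 1.0,
};

// ASSUMPTION: the aggregate is the maximum weight across all findings.
// An empty list yields 0, i.e. nothing was flagged.
function aggregateScore(severities: readonly Severity[]): number {
  return severities.reduce(
    (max, severity) => Math.max(max, SEVERITY_WEIGHTS[severity]),
    0,
  );
}

// One LOW and one HIGH finding: the HIGH weight (0.6) dominates.
const exampleScore = aggregateScore(['LOW', 'HIGH']);
console.log(exampleScore);
```

Under this assumption a single HIGH finding would already exceed a threshold of 0.5, while any number of MEDIUM findings would not.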

Configuration

import { complior } from '@complior/sdk';
import OpenAI from 'openai';

const client = complior(new OpenAI(), {
  safetyFilter: true,
  safetyMode: 'block',      // 'block' | 'warn' | 'log'
  safetyThreshold: 0.5,     // 0–1, lower = stricter
});

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| safetyFilter | boolean | false | Enable safety scanning |
| safetyMode | 'block' \| 'warn' \| 'log' | 'block' | Action on threshold exceeded |
| safetyThreshold | number | 0.5 | Aggregate score threshold |

  • block — throws SafetyViolationError
  • warn — adds findings to response metadata, does not throw
  • log — logs findings, no metadata or throw
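
The three modes can be pictured as one dispatch on the aggregate score. The sketch below is self-contained and does not call the SDK: `SafetyViolationError` is a local stand-in class, `applySafetyMode` and `ModeResult` are hypothetical names, and it assumes "exceeded" means strictly greater than the threshold.

```typescript
type SafetyMode = 'block' | 'warn' | 'log';

interface Finding {
  readonly category: string;
  readonly severity: string;
  readonly evidence: string;
  readonly score: number;
}

// Local stand-in for the SDK's SafetyViolationError (illustration only).
class SafetyViolationError extends Error {
  readonly findings: readonly Finding[];
  constructor(findings: readonly Finding[]) {
    super(`Safety threshold exceeded (${findings.length} finding(s))`);
    this.name = 'SafetyViolationError';
    this.findings = findings;
  }
}

// Hypothetical result shape; the real response metadata layout may differ.
interface ModeResult {
  readonly action: 'pass' | 'warn' | 'log';
  readonly findings: readonly Finding[]; // populated only in 'warn' mode
}

function applySafetyMode(
  mode: SafetyMode,
  aggregate: number,
  threshold: number,
  findings: readonly Finding[],
): ModeResult {
  // ASSUMPTION: "exceeded" means strictly greater than the threshold.
  if (aggregate <= threshold) return { action: 'pass', findings: [] };
  if (mode === 'block') throw new SafetyViolationError(findings);
  if (mode === 'warn') return { action: 'warn', findings };
  // 'log': record the findings, attach nothing, do not throw.
  console.log('[safety]', findings);
  return { action: 'log', findings: [] };
}

const piiFinding: Finding = {
  category: 'pii_leakage',
  severity: 'HIGH',
  evidence: '<matched text>',
  score: 0.6,
};
const warned = applySafetyMode('warn', 0.6, 0.5, [piiFinding]);
console.log(warned.action);
```

In `block` mode the same call would throw, so callers of a blocking client would wrap their requests in `try`/`catch`.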

SafetyFinding

interface SafetyFinding {
  readonly category: string;  // e.g., 'violence', 'self_harm'
  readonly severity: string;  // 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
  readonly evidence: string;  // Matched text
  readonly score: number;     // Pattern weight
}

Human-in-the-Loop Gate

The HITL gate is a post-hook that pauses execution when LLM output matches critical action patterns. A human operator must approve or deny before the response is returned.
If no onGateTriggered callback is configured, the gate auto-denies all matches (fail-safe per Art.14). Always provide a callback in production.

Built-in Rules

4 default rules cover the most critical action categories:

| Rule ID | Category | Pattern | Example Match |
| --- | --- | --- | --- |
| financial | Financial transaction | Transfer/send/pay + amount | "transfer $5,000 to account" |
| data_deletion | Data deletion | Delete/drop/purge + data | "delete all user records" |
| permission_change | Permission change | Grant/revoke + access/role | "grant admin access to user" |
| safety_critical | Safety-critical system | Deploy/shutdown + production | "deploy to production system" |

Configuration

import { complior } from '@complior/sdk';
import type { GateRequest, GateDecision } from '@complior/sdk';
import OpenAI from 'openai';

const client = complior(new OpenAI(), {
  hitlGate: true,
  hitlGateTimeoutMs: 60_000, // 1 minute (default: 5 minutes)
  onGateTriggered: async (request: GateRequest): Promise<GateDecision> => {
    console.log(`Action detected: ${request.rule.description}`);
    console.log(`Matched text: "${request.matchedText}"`);
    console.log(`Provider: ${request.provider}, Method: ${request.method}`);

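    // promptHumanOperator is an application-defined function (for example a
    // chat prompt or admin UI); it is not imported from the SDK in this example.
    const approved = await promptHumanOperator(request);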
    return approved
      ? { approved: true }
      : { approved: false, reason: 'Operator rejected the action' };
  },
});

Custom Rules

Override built-in rules with hitlGateRules:
import { complior } from '@complior/sdk';
import type { GateRule } from '@complior/sdk';
import OpenAI from 'openai';

const customRules: GateRule[] = [
  {
    id: 'pii_export',
    description: 'PII data export',
    pattern: /\b(export|download|extract)\b.*\b(personal data|PII|user data|customer data)\b/i,
    category: 'data_export',
  },
  {
    id: 'model_deployment',
    description: 'ML model deployment',
    pattern: /\b(deploy|publish|release)\b.*\b(model|checkpoint|weights)\b/i,
    category: 'model_ops',
  },
];

const client = complior(new OpenAI(), {
  hitlGate: true,
  hitlGateRules: customRules,
  onGateTriggered: async (req) => {
    // Your approval logic
    return { approved: true };
  },
});
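
Before wiring a custom pattern into the gate, it can help to run it against sample output directly. The helper below is a hypothetical stand-in: `firstMatchingRule` is not an SDK function, and how the SDK orders or evaluates rules is not documented here. It returns the first rule whose pattern matches, together with the matched text.

```typescript
// Local copy of the GateRule shape so the snippet runs without the SDK.
interface GateRule {
  readonly id: string;
  readonly description: string;
  readonly pattern: RegExp;
  readonly category: string;
}

const sampleRules: readonly GateRule[] = [
  {
    id: 'pii_export',
    description: 'PII data export',
    pattern: /\b(export|download|extract)\b.*\b(personal data|PII|user data|customer data)\b/i,
    category: 'data_export',
  },
];

// Hypothetical helper: first rule whose pattern matches, or undefined.
function firstMatchingRule(
  text: string,
  candidates: readonly GateRule[],
): { rule: GateRule; matchedText: string } | undefined {
  for (const rule of candidates) {
    const match = rule.pattern.exec(text);
    if (match) return { rule, matchedText: match[0] };
  }
  return undefined;
}

const hit = firstMatchingRule('I will export all customer data to CSV.', sampleRules);
const miss = firstMatchingRule('Here is a summary of the report.', sampleRules);
console.log(hit?.rule.id, hit?.matchedText, miss);
```

Because `.` does not match newlines, a verb and its object on separate lines will not trigger this pattern; add the `s` flag if multi-line output should match.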

Types

interface GateRule {
  readonly id: string;
  readonly description: string;
  readonly pattern: RegExp;
  readonly category: string;
}

interface GateRequest {
  readonly rule: GateRule;
  readonly matchedText: string;
  readonly provider: string;
  readonly method: string;
  readonly timestamp: number;
}

type GateDecision =
  | { approved: true }
  | { approved: false; reason?: string };

Timeout Behavior

  • Default timeout: 5 minutes (300000 ms)
  • On timeout: throws HumanGateDeniedError with reason: 'timeout'
  • On denial: throws HumanGateDeniedError with reason: 'denied'
  • On approval: response passes through with hitlGateTriggered: true in metadata
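
The outcomes listed above can be modeled with a timer around the approval callback. This is a sketch of the documented behavior, not the SDK's implementation: `HumanGateDeniedError` is a local stand-in class, and `awaitDecision` and its return shape are hypothetical.

```typescript
type GateDecision =
  | { approved: true }
  | { approved: false; reason?: string };

// Local stand-in for the SDK's HumanGateDeniedError (illustration only).
class HumanGateDeniedError extends Error {
  readonly reason: 'timeout' | 'denied';
  constructor(reason: 'timeout' | 'denied') {
    super(`Human gate denied: ${reason}`);
    this.name = 'HumanGateDeniedError';
    this.reason = reason;
  }
}

// Resolves on approval; rejects with HumanGateDeniedError on denial or timeout.
function awaitDecision(
  decide: () => Promise<GateDecision>,
  timeoutMs: number = 300_000, // 5 minutes, matching the documented default
): Promise<{ hitlGateTriggered: true }> {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new HumanGateDeniedError('timeout')),
      timeoutMs,
    );
    // Promise.resolve().then(...) also captures synchronous throws from decide.
    Promise.resolve()
      .then(decide)
      .then(
        (decision) => {
          clearTimeout(timer);
          if (decision.approved) {
            resolve({ hitlGateTriggered: true });
          } else {
            reject(new HumanGateDeniedError('denied'));
          }
        },
        (err) => {
          clearTimeout(timer);
          reject(err);
        },
      );
  });
}
```

A decision that arrives after the timer fires is ignored, because a promise settles only once; that mirrors the fail-safe stance of denying when no timely human answer is available.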

Error Handling

SafetyViolationError and HumanGateDeniedError reference.

Agent Mode

Circuit breaker for cascading agent failures.