The SDK provides two complementary safety mechanisms: the safety filter scans LLM responses for harmful content (Art. 15), and the HITL gate requires human approval for critical actions (Art. 14).
Safety Filter
The safety filter is a post-hook that scans LLM responses against pattern-based rules across 5 categories.
Categories
| Category | Patterns | Severity Range | Description |
|---|---|---|---|
| violence | 4 | HIGH–CRITICAL | Lethal violence instructions, weapon construction, attack planning |
| self_harm | 3 | HIGH–CRITICAL | Self-harm instructions, self-injury encouragement |
| illegal_instructions | 4 | MEDIUM–CRITICAL | Hacking, drug manufacturing, counterfeiting, law evasion |
| pii_leakage | 4 | MEDIUM–HIGH | SSN/credit card/IBAN numbers and credentials in output |
| hallucination_indicator | 4 | LOW–MEDIUM | False memory claims, unverifiable citations, unverified legal claims |
Severity Weights
| Severity | Weight |
|---|---|
| LOW | 0.1 |
| MEDIUM | 0.3 |
| HIGH | 0.6 |
| CRITICAL | 1.0 |
Configuration
import { complior } from '@complior/sdk';
import OpenAI from 'openai';

const client = complior(new OpenAI(), {
  safetyFilter: true,
  safetyMode: 'block', // 'block' | 'warn' | 'log'
  safetyThreshold: 0.5, // 0–1, lower = stricter
});
| Field | Type | Default | Description |
|---|---|---|---|
| safetyFilter | boolean | false | Enable safety scanning |
| safetyMode | 'block' \| 'warn' \| 'log' | 'block' | Action on threshold exceeded |
| safetyThreshold | number | 0.5 | Aggregate score threshold |
block — throws SafetyViolationError
warn — adds findings to response metadata, does not throw
log — logs findings, no metadata or throw
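The three modes differ only in what happens once the threshold is exceeded. The sketch below illustrates that dispatch. The `applySafetyMode` helper, the `Outcome` shape, and the local `SafetyViolationError` class are hypothetical stand-ins for illustration, not the SDK's actual implementation or error shape.

```typescript
type SafetyMode = 'block' | 'warn' | 'log';

interface Finding {
  category: string;
  severity: string;
  score: number;
}

// Local stand-in for the SDK's SafetyViolationError; the real class may differ.
class SafetyViolationError extends Error {
  readonly findings: Finding[];
  constructor(findings: Finding[]) {
    super(`Safety threshold exceeded (${findings.length} finding(s))`);
    this.name = 'SafetyViolationError';
    this.findings = findings;
  }
}

// Hypothetical result shape: what ends up in response metadata and in the log.
interface Outcome {
  metadata: { safetyFindings?: Finding[] };
  logged: string[];
}

// Hypothetical helper: called only after the threshold has been exceeded.
function applySafetyMode(mode: SafetyMode, findings: Finding[]): Outcome {
  const outcome: Outcome = { metadata: {}, logged: [] };
  switch (mode) {
    case 'block':
      // Stop the response from reaching the caller.
      throw new SafetyViolationError(findings);
    case 'warn':
      // Attach findings to metadata; the response is still returned.
      outcome.metadata.safetyFindings = findings;
      return outcome;
    case 'log':
      // Record findings only; metadata stays untouched.
      outcome.logged = findings.map((f) => `${f.category}:${f.severity}`);
      return outcome;
  }
}
```

In `block` mode the caller would wrap the request in `try`/`catch`; in `warn` mode it would inspect the response metadata instead.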
SafetyFinding
interface SafetyFinding {
  readonly category: string; // e.g., 'violence', 'self_harm'
  readonly severity: string; // 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL'
  readonly evidence: string; // Matched text
  readonly score: number; // Pattern weight
}
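To show how pattern matches could turn into `SafetyFinding` objects and a threshold decision, here is a minimal sketch. The weights come from the Severity Weights table. The two rules, the `scan`, `aggregateScore`, and `exceedsThreshold` helpers, and above all the aggregation strategy (highest single score, compared with a strict `>`) are assumptions made for illustration; the SDK's real patterns and scoring may differ.

```typescript
type Severity = 'LOW' | 'MEDIUM' | 'HIGH' | 'CRITICAL';

interface SafetyFinding {
  readonly category: string;
  readonly severity: string;
  readonly evidence: string;
  readonly score: number;
}

// Weights taken from the Severity Weights table.
const SEVERITY_WEIGHT: Record<Severity, number> = {
  LOW: 0.1,
  MEDIUM: 0.3,
  HIGH: 0.6,
  CRITICAL: 1.0,
};

interface Rule {
  category: string;
  severity: Severity;
  pattern: RegExp;
}

// Illustrative rules only; these are not the SDK's built-in patterns.
const RULES: Rule[] = [
  { category: 'pii_leakage', severity: 'HIGH', pattern: /\b\d{3}-\d{2}-\d{4}\b/ },
  { category: 'hallucination_indicator', severity: 'LOW', pattern: /\bas I (recall|remember)\b/i },
];

// Runs every rule against the text and records one finding per matching rule.
function scan(text: string): SafetyFinding[] {
  const findings: SafetyFinding[] = [];
  for (const rule of RULES) {
    const match = rule.pattern.exec(text);
    if (match) {
      findings.push({
        category: rule.category,
        severity: rule.severity,
        evidence: match[0],
        score: SEVERITY_WEIGHT[rule.severity],
      });
    }
  }
  return findings;
}

// ASSUMPTION: the aggregate is the highest single score (0 when nothing matched).
function aggregateScore(findings: readonly SafetyFinding[]): number {
  return findings.reduce((max, f) => Math.max(max, f.score), 0);
}

// ASSUMPTION: "exceeded" means strictly greater than the threshold.
function exceedsThreshold(findings: readonly SafetyFinding[], threshold = 0.5): boolean {
  return aggregateScore(findings) > threshold;
}
```

Under these assumptions, a single HIGH finding (0.6) trips the default threshold of 0.5, while LOW and MEDIUM findings alone do not.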
Human-in-the-Loop Gate
The HITL gate is a post-hook that pauses execution when LLM output matches critical action patterns. A human operator must approve or deny before the response is returned.
If no onGateTriggered callback is configured, the gate auto-denies all matches (fail-safe per Art.14). Always provide a callback in production.
Built-in Rules
4 default rules cover the most critical action categories:
| Rule ID | Category | Pattern | Example Match |
|---|---|---|---|
| financial | Financial transaction | Transfer/send/pay + amount | "transfer $5,000 to account" |
| data_deletion | Data deletion | Delete/drop/purge + data | "delete all user records" |
| permission_change | Permission change | Grant/revoke + access/role | "grant admin access to user" |
| safety_critical | Safety-critical system | Deploy/shutdown + production | "deploy to production system" |
Configuration
import { complior } from '@complior/sdk';
import type { GateRequest, GateDecision } from '@complior/sdk';
import OpenAI from 'openai';

const client = complior(new OpenAI(), {
  hitlGate: true,
  hitlGateTimeoutMs: 60_000, // 1 minute (default: 5 minutes)
  onGateTriggered: async (request: GateRequest): Promise<GateDecision> => {
    console.log(`Action detected: ${request.rule.description}`);
    console.log(`Matched text: "${request.matchedText}"`);
    console.log(`Provider: ${request.provider}, Method: ${request.method}`);
    const approved = await promptHumanOperator(request);
    return approved
      ? { approved: true }
      : { approved: false, reason: 'Operator rejected the action' };
  },
});
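The configuration above calls `promptHumanOperator`, which the snippet does not define. One possible shape is sketched here, assuming the operator is asked a yes/no question through an injectable `ask` function. The `Ask` type, the `GateRequestLike` interface, and the two-argument signature are hypothetical; in the callback above you would bind `ask` in a closure to keep the one-argument call.

```typescript
// Minimal shape of the GateRequest fields this sketch reads.
interface GateRequestLike {
  rule: { description: string };
  matchedText: string;
}

// Hypothetical: any function that shows a question to an operator
// (CLI prompt, chat message, ticket) and resolves with their reply.
type Ask = (question: string) => Promise<string>;

async function promptHumanOperator(request: GateRequestLike, ask: Ask): Promise<boolean> {
  const answer = await ask(
    `Approve "${request.rule.description}" (matched: "${request.matchedText}")? [y/N] `,
  );
  // Anything other than an explicit "y" or "yes" counts as a rejection.
  return /^y(es)?$/i.test(answer.trim());
}
```

Defaulting to rejection on any unrecognized answer keeps the sketch consistent with the gate's fail-safe behavior.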
Custom Rules
Override built-in rules with hitlGateRules:
import type { GateRule } from '@complior/sdk';

const customRules: GateRule[] = [
  {
    id: 'pii_export',
    description: 'PII data export',
    pattern: /\b(export|download|extract)\b.*\b(personal data|PII|user data|customer data)\b/i,
    category: 'data_export',
  },
  {
    id: 'model_deployment',
    description: 'ML model deployment',
    pattern: /\b(deploy|publish|release)\b.*\b(model|checkpoint|weights)\b/i,
    category: 'model_ops',
  },
];

const client = complior(new OpenAI(), {
  hitlGate: true,
  hitlGateRules: customRules,
  onGateTriggered: async (req) => {
    // Your approval logic
    return { approved: true };
  },
});
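Before wiring custom rules into the client, it can help to check the patterns against sample outputs. The sketch below reuses the two rules above with a locally declared `GateRule` interface; the `firstMatch` helper is hypothetical and only approximates how a gate might pick a rule, so treat it as a test aid rather than the SDK's matching logic.

```typescript
interface GateRule {
  readonly id: string;
  readonly description: string;
  readonly pattern: RegExp;
  readonly category: string;
}

const customRules: GateRule[] = [
  {
    id: 'pii_export',
    description: 'PII data export',
    pattern: /\b(export|download|extract)\b.*\b(personal data|PII|user data|customer data)\b/i,
    category: 'data_export',
  },
  {
    id: 'model_deployment',
    description: 'ML model deployment',
    pattern: /\b(deploy|publish|release)\b.*\b(model|checkpoint|weights)\b/i,
    category: 'model_ops',
  },
];

// Hypothetical helper: returns the first rule whose pattern matches, plus the matched text.
function firstMatch(
  text: string,
  rules: readonly GateRule[],
): { rule: GateRule; matchedText: string } | undefined {
  for (const rule of rules) {
    const match = rule.pattern.exec(text);
    if (match) return { rule, matchedText: match[0] };
  }
  return undefined;
}

const hit = firstMatch('Please export all customer data to CSV', customRules);
console.log(hit?.rule.id, hit?.matchedText);
```

Note that `\b` word boundaries mean "exporting customer data" does not match the `pii_export` rule; widen the verb group if inflected forms should trigger the gate.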
Types
interface GateRule {
  readonly id: string;
  readonly description: string;
  readonly pattern: RegExp;
  readonly category: string;
}

interface GateRequest {
  readonly rule: GateRule;
  readonly matchedText: string;
  readonly provider: string;
  readonly method: string;
  readonly timestamp: number;
}

type GateDecision =
  | { approved: true }
  | { approved: false; reason?: string };
Timeout Behavior
Default timeout: 5 minutes (300000 ms)
On timeout: throws HumanGateDeniedError with reason: 'timeout'
On denial: throws HumanGateDeniedError with reason: 'denied'
On approval: response passes through with hitlGateTriggered: true in metadata
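The outcomes listed above can be sketched as a race between the operator callback and a timer. The `runGate` helper and the local `HumanGateDeniedError` class are hypothetical, written only to mirror the documented behavior. Using reason `'denied'` when no callback is configured is an assumption, since the documentation states only that the gate auto-denies in that case.

```typescript
type GateDecision = { approved: true } | { approved: false; reason?: string };

// Local stand-in for the SDK's HumanGateDeniedError; the real class may differ.
class HumanGateDeniedError extends Error {
  readonly reason: 'timeout' | 'denied';
  constructor(reason: 'timeout' | 'denied') {
    super(`Human gate denied: ${reason}`);
    this.name = 'HumanGateDeniedError';
    this.reason = reason;
  }
}

// Hypothetical helper: resolves with the response on approval, throws otherwise.
async function runGate<T>(
  response: T,
  onGateTriggered: (() => Promise<GateDecision>) | undefined,
  timeoutMs = 300_000, // documented default: 5 minutes
): Promise<{ response: T; metadata: { hitlGateTriggered: true } }> {
  // No callback configured: auto-deny (fail-safe). Reason value is an assumption.
  if (!onGateTriggered) throw new HumanGateDeniedError('denied');

  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<'timeout'>((resolve) => {
    timer = setTimeout(() => resolve('timeout'), timeoutMs);
  });

  try {
    // Whichever settles first wins: the operator's decision or the timer.
    const result = await Promise.race([onGateTriggered(), timeout]);
    if (result === 'timeout') throw new HumanGateDeniedError('timeout');
    if (!result.approved) throw new HumanGateDeniedError('denied');
    return { response, metadata: { hitlGateTriggered: true } };
  } finally {
    // Always clear the timer so it does not keep the process alive.
    clearTimeout(timer);
  }
}
```

A caller would distinguish the two failure cases by checking `error.reason` in its `catch` block, for example retrying on `'timeout'` but not on `'denied'`.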
Error Handling: SafetyViolationError and HumanGateDeniedError reference.
Agent Mode: Circuit breaker for cascading agent failures.