Guardrails
Ryvion includes built-in safety guardrails for AI inference: PII detection, prompt injection defense, and content safety filters. These run automatically and produce audit trail entries for compliance.
PII Detection
Personally identifiable information is detected before it reaches the model. When PII is found, it can be:
- Flagged -- marked in the audit trail but still processed
- Redacted -- removed or masked before inference
- Blocked -- the request is rejected with an error
Detected PII types
- Email addresses
- Phone numbers
- Social security numbers and national IDs
- Credit card numbers
- Physical addresses
- Names (when contextually identifiable)
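To make the flag/redact/block behavior concrete, here is a minimal local sketch of the redaction action for a few of the types above. This is illustrative only: the pattern names, regexes, and `redact` helper are hypothetical, and Ryvion's server-side detectors are not limited to regular expressions (contextual name detection, for instance, requires more than pattern matching).

```python
import re

# Hypothetical patterns for three of the PII types listed above.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> tuple[str, list[str]]:
    """Mask each detected PII span and report which types were found."""
    found = []
    for pii_type, pattern in PII_PATTERNS.items():
        if pattern.search(text):
            found.append(pii_type)
            # Replace the match with a placeholder, keeping the rest intact.
            text = pattern.sub(f"[{pii_type.upper()}_REDACTED]", text)
    return text, found
```

The "flagged" action would record `found` in the audit trail while passing the original text through; "blocked" would reject the request whenever `found` is non-empty.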
Prompt Injection Defense
Requests are scanned for prompt injection attempts -- inputs designed to override system instructions or extract sensitive information.
When a prompt injection is detected:
- The attempt is logged in the audit trail
- The request may be blocked depending on severity
- The receipt records the guardrail trigger
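As a rough illustration of what such a scan looks for, the sketch below matches a few well-known injection phrasings. The patterns and `scan_for_injection` helper are hypothetical; a production defense would use far more than keyword matching.

```python
import re

# A few common injection phrasings, matched case-insensitively.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I),
    re.compile(r"disregard (the |your )?system prompt", re.I),
    re.compile(r"reveal (the |your )?(system prompt|hidden instructions)", re.I),
]

def scan_for_injection(text: str) -> bool:
    """Return True if the input matches a known injection pattern."""
    return any(p.search(text) for p in INJECTION_PATTERNS)
```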
Content Safety
Content safety filters screen both inputs and outputs with configurable thresholds per category:
| Category | Description |
|---|---|
| `violence` | Graphic violence or harm instructions |
| `sexual` | Explicit sexual content |
| `hate` | Hate speech or discriminatory content |
| `self_harm` | Self-harm or suicide-related content |
| `illegal` | Instructions for illegal activities |
Configurable thresholds
Set per-category thresholds to control how the guardrails respond:
```bash
curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "guardrails": {
      "pii_redaction": true,
      "content_filter": {
        "violence": "block",
        "sexual": "block",
        "hate": "block",
        "self_harm": "block",
        "illegal": "flag"
      }
    }
  }'
```
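For SDK-style usage, the same request can be sketched in Python with only the standard library. The endpoint and payload mirror the curl example above; `YOUR_API_KEY` remains a placeholder, and the `build_request` helper is illustrative rather than an official client.

```python
import json
import urllib.request

def build_request(api_key: str) -> urllib.request.Request:
    """Build (without sending) the guardrails request from the curl example."""
    payload = {
        "model": "phi-4",
        "messages": [{"role": "user", "content": "Hello"}],
        "guardrails": {
            "pii_redaction": True,
            "content_filter": {
                "violence": "block",
                "sexual": "block",
                "hate": "block",
                "self_harm": "block",
                "illegal": "flag",
            },
        },
    }
    return urllib.request.Request(
        "https://api.ryvion.ai/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# To actually send it:
# response = urllib.request.urlopen(build_request("YOUR_API_KEY"))
```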
Threshold values:
"block"-- reject the request and return a400error"flag"-- allow the request but log a warning to the audit trail"off"-- disable the filter for this category
Audit trail
Every guardrail activation is recorded in the audit trail with:
- Timestamp -- when the check ran
- Type -- which guardrail was triggered (PII, injection, safety)
- Action -- what was done (flagged, redacted, blocked)
- Details -- specifics of what was detected (without exposing the sensitive content)
View guardrail events in the Audit Trail dashboard or query via the API.
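The four fields above could be modeled as a simple record like the one below. The `GuardrailEvent` class and field names are hypothetical, not Ryvion's actual schema; the key property to preserve is that `details` describes what was detected without storing the sensitive content itself.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class GuardrailEvent:
    guardrail: str   # which guardrail fired: "pii" | "injection" | "safety"
    action: str      # what was done: "flagged" | "redacted" | "blocked"
    details: str     # e.g. "2 email addresses redacted" -- never the raw PII
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

event = GuardrailEvent("pii", "redacted", "2 email addresses redacted")
```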
EU AI Act compliance
Guardrails are designed to support EU AI Act Article 14 requirements:
- Transparency -- all guardrail actions are logged and auditable
- Human oversight -- flagged events can be reviewed by humans
- Risk management -- automated detection of high-risk content
- Documentation -- full audit trail of all safety measures applied
Configuration
Guardrails are enabled by default on all inference requests. They run transparently without requiring changes to your API calls.
Relationship to receipts
Guardrail events are linked to cryptographic receipts. Each receipt records whether any guardrails were triggered during the job, creating an end-to-end audit trail from request to response.
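One way such a linkage can work is for the receipt to commit to the ordered list of guardrail events via a hash, so that tampering with the audit trail breaks receipt verification. The sketch below illustrates that idea under stated assumptions: the `bind_events` helper, the `guardrails_triggered` and `guardrail_events_hash` fields, and the job id are all hypothetical, and Ryvion's actual receipt format may differ.

```python
import hashlib
import json

def bind_events(receipt: dict, events: list[dict]) -> dict:
    """Embed a commitment to the guardrail events in a receipt dict."""
    # Canonical JSON (sorted keys) so the hash is deterministic.
    canonical = json.dumps(events, sort_keys=True).encode()
    receipt["guardrails_triggered"] = len(events) > 0
    receipt["guardrail_events_hash"] = hashlib.sha256(canonical).hexdigest()
    return receipt

receipt = bind_events({"job_id": "job_123"},
                      [{"type": "pii", "action": "redacted"}])
```

A verifier holding the audit-trail events can recompute the hash and compare it against the receipt, tying the response back to the exact guardrail checks that ran.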