Guardrails

Ryvion includes built-in safety guardrails for AI inference: PII detection, prompt injection defense, and content safety filters. These run automatically and produce audit trail entries for compliance.

PII Detection

Personally identifiable information is detected before it reaches the model. When PII is found, it can be:

Flagged -- marked in the audit trail but still processed
Redacted -- removed or masked before inference
Blocked -- the request is rejected with an error

Detected PII types

Email addresses
Phone numbers
Social security numbers and national IDs
Credit card numbers
Physical addresses
Names (when contextually identifiable)

Prompt Injection Defense

Requests are scanned for prompt injection attempts -- inputs designed to override system instructions or extract sensitive information.

When a prompt injection is detected:

The attempt is logged in the audit trail
The request may be blocked depending on severity
The receipt records the guardrail trigger

Content Safety

Content safety filters screen both inputs and outputs with configurable thresholds per category:

Category	Description
`violence`	Graphic violence or harm instructions
`sexual`	Explicit sexual content
`hate`	Hate speech or discriminatory content
`self_harm`	Self-harm or suicide-related content
`illegal`	Instructions for illegal activities

Configurable thresholds

Set per-category thresholds to control how the guardrails respond:

curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "guardrails": {
      "pii_redaction": true,
      "content_filter": {
        "violence": "block",
        "sexual": "block",
        "hate": "block",
        "self_harm": "block",
        "illegal": "flag"
      }
    }
  }'

Threshold values:

"block" -- reject the request and return a 400 error
"flag" -- allow the request but log a warning to the audit trail
"off" -- disable the filter for this category

Audit trail

Every guardrail activation is recorded in the audit trail with:

Timestamp -- when the check ran
Type -- which guardrail was triggered (PII, injection, safety)
Action -- what was done (flagged, redacted, blocked)
Details -- specifics of what was detected (without exposing the sensitive content)

View guardrail events in the Audit Trail dashboard or query via the API.

EU AI Act compliance

Guardrails are designed to support EU AI Act Article 14 requirements:

Transparency -- all guardrail actions are logged and auditable
Human oversight -- flagged events can be reviewed by humans
Risk management -- automated detection of high-risk content
Documentation -- full audit trail of all safety measures applied

Configuration

Guardrails are enabled by default on all inference requests. They run transparently without requiring changes to your API calls.

Relationship to receipts

Guardrail events are linked to cryptographic receipts. Each receipt records whether any guardrails were triggered during the job, creating an end-to-end audit trail from request to response.