Skip to content

Guardrails

Ryvion includes built-in safety guardrails for AI inference: PII detection, prompt injection defense, and content safety filters. These run automatically and produce audit trail entries for compliance.

PII Detection

Personally identifiable information is detected before it reaches the model. When PII is found, it can be:

  • Flagged -- marked in the audit trail but still processed
  • Redacted -- removed or masked before inference
  • Blocked -- the request is rejected with an error

Detected PII types

  • Email addresses
  • Phone numbers
  • Social security numbers and national IDs
  • Credit card numbers
  • Physical addresses
  • Names (when contextually identifiable)

Prompt Injection Defense

Requests are scanned for prompt injection attempts -- inputs designed to override system instructions or extract sensitive information.

When a prompt injection is detected:

  • The attempt is logged in the audit trail
  • The request may be blocked depending on severity
  • The receipt records the guardrail trigger

Content Safety

Content safety filters screen both inputs and outputs with configurable thresholds per category:

CategoryDescription
violenceGraphic violence or harm instructions
sexualExplicit sexual content
hateHate speech or discriminatory content
self_harmSelf-harm or suicide-related content
illegalInstructions for illegal activities

Configurable thresholds

Set per-category thresholds to control how the guardrails respond:

curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Hello"}],
    "guardrails": {
      "pii_redaction": true,
      "content_filter": {
        "violence": "block",
        "sexual": "block",
        "hate": "block",
        "self_harm": "block",
        "illegal": "flag"
      }
    }
  }'

Threshold values:

  • "block" -- reject the request and return a 400 error
  • "flag" -- allow the request but log a warning to the audit trail
  • "off" -- disable the filter for this category

Audit trail

Every guardrail activation is recorded in the audit trail with:

  • Timestamp -- when the check ran
  • Type -- which guardrail was triggered (PII, injection, safety)
  • Action -- what was done (flagged, redacted, blocked)
  • Details -- specifics of what was detected (without exposing the sensitive content)

View guardrail events in the Audit Trail dashboard or query via the API.

EU AI Act compliance

Guardrails are designed to support EU AI Act Article 14 requirements:

  • Transparency -- all guardrail actions are logged and auditable
  • Human oversight -- flagged events can be reviewed by humans
  • Risk management -- automated detection of high-risk content
  • Documentation -- full audit trail of all safety measures applied

Configuration

Guardrails are enabled by default on all inference requests. They run transparently without requiring changes to your API calls.

Relationship to receipts

Guardrail events are linked to cryptographic receipts. Each receipt records whether any guardrails were triggered during the job, creating an end-to-end audit trail from request to response.