RAG-Powered Chat

Attach a knowledge base to chat completions to ground LLM responses in your documents. The system automatically retrieves relevant chunks and includes them as context.

How it works

  1. You send a chat completion request with a knowledge_base parameter
  2. The hub searches your knowledge base for chunks relevant to the user's message
  3. Retrieved chunks are injected into the prompt as context
  4. The LLM generates a response grounded in your documents
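Conceptually, steps 2 and 3 amount to selecting the highest-scoring chunks and prepending them to the prompt. The sketch below is illustrative only: the hub's retrieval logic is internal, and the function name, chunk shape, and system-message format are assumptions, not the actual implementation.

```python
# Illustrative sketch of retrieval-and-injection (steps 2-3 above).
# The chunk dicts and prompt format are hypothetical, not the hub's real internals.

def build_grounded_messages(user_message, scored_chunks, top_k=3):
    """Select the top-k most relevant chunks and prepend them as context."""
    top = sorted(scored_chunks, key=lambda c: c["score"], reverse=True)[:top_k]
    context = "\n\n".join(c["text"] for c in top)
    system = "Answer using only the following context:\n\n" + context
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
```

The grounded message list is then what the LLM actually sees in step 4.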

Example

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

# Use the extra_body parameter for knowledge_base
response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Summarize our deployment docs"}],
    extra_body={"knowledge_base": "KB_ID"},
)
print(response.choices[0].message.content)

curl

curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role":"user","content":"Summarize our deployment docs"}],
    "knowledge_base": "KB_ID"
  }'

Parameters

The knowledge_base parameter is added to the standard Chat Completions request body.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| knowledge_base | string | Yes | ID of the knowledge base to search |

All other chat completion parameters (model, messages, stream, temperature, etc.) work as normal.
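For example, a request body can combine knowledge_base with stream and temperature; the values below are placeholders:

```python
import json

# knowledge_base sits alongside the standard Chat Completions fields;
# everything else keeps its usual meaning.
payload = {
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Summarize our deployment docs"}],
    "stream": True,
    "temperature": 0.2,
    "knowledge_base": "KB_ID",
}
body = json.dumps(payload)
```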

When to use RAG vs. semantic search

| Approach | Use case |
| --- | --- |
| RAG-powered chat | You want the LLM to synthesize an answer from your documents |
| Semantic search | You want to retrieve raw document chunks without LLM processing |

For semantic search only, see Semantic Search.

Pricing

RAG-powered chat combines two costs:

  • Search: $0.01 CAD per query (automatic retrieval step)
  • Chat completion: $0.06 CAD per 1M tokens (including retrieved context tokens)
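As a rough worked example under the rates above (the token count is illustrative, and this assumes the per-token rate applies to all tokens in the request):

```python
# Estimated cost of one RAG-powered chat request, in CAD.
SEARCH_FEE = 0.01              # flat per-query retrieval fee
TOKEN_RATE = 0.06 / 1_000_000  # per token, including retrieved context tokens

def rag_request_cost(total_tokens):
    return SEARCH_FEE + total_tokens * TOKEN_RATE

# A request consuming 50,000 tokens total:
cost = rag_request_cost(50_000)  # 0.01 + 0.003 = 0.013 CAD
```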

Prerequisites

  1. Create a knowledge base
  2. Upload at least one document
  3. Wait for embedding to complete
  4. Use the knowledge base ID in your chat completion request
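Step 3 is typically a polling loop. The sketch below leaves the status check injectable because the status endpoint is not documented here; the "ready" / "processing" / "failed" values are assumptions, not the hub's actual status names.

```python
import time

# Sketch of step 3: poll until the knowledge base finishes embedding.
# get_status is a callable you supply that queries the real status endpoint;
# the status strings here are hypothetical.

def wait_for_embedding(get_status, timeout=300, interval=5, sleep=time.sleep):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "ready":
            return True
        if status == "failed":
            raise RuntimeError("embedding failed")
        sleep(interval)
    raise TimeoutError("knowledge base not ready in time")
```

Once this returns, the knowledge base ID can be used in chat completion requests as shown above.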