RAG-Powered Chat
Attach a knowledge base to chat completions to ground LLM responses in your documents. The system automatically retrieves relevant chunks and includes them as context.
How it works
- You send a chat completion request with a knowledge_base parameter
- The hub searches your knowledge base for chunks relevant to the user's message
- Retrieved chunks are injected into the prompt as context
- The LLM generates a response grounded in your documents
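The injection step can be pictured with a short sketch. The hub's actual prompt template is internal and not documented here, so `build_grounded_messages` and its system-prompt wording are purely illustrative:

```python
def build_grounded_messages(user_message: str, chunks: list[str]) -> list[dict]:
    """Illustrative only: prepend retrieved chunks as a system context block."""
    context = "\n\n".join(chunks)
    return [
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": user_message},
    ]

msgs = build_grounded_messages(
    "Summarize our deployment docs", ["chunk A", "chunk B"]
)
```

The model then sees both the retrieved context and the original question in a single prompt, which is what grounds its answer in your documents.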
Example
Python
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

# Pass knowledge_base through the extra_body parameter
response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Summarize our deployment docs"}],
    extra_body={"knowledge_base": "KB_ID"},
)
print(response.choices[0].message.content)
```
curl
```bash
curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role":"user","content":"Summarize our deployment docs"}],
    "knowledge_base": "KB_ID"
  }'
```
Parameters
The knowledge_base parameter is added to the standard Chat Completions request body.
| Parameter | Type | Required | Description |
|---|---|---|---|
| knowledge_base | string | Yes | ID of the knowledge base to search |
All other chat completion parameters (model, messages, stream, temperature, etc.) work as normal.
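For instance, a raw request body simply mixes the hub-specific field with standard parameters; the `stream` and `temperature` values below are arbitrary examples:

```python
import json

# Standard Chat Completions body plus the hub-specific knowledge_base field
payload = {
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Summarize our deployment docs"}],
    "knowledge_base": "KB_ID",  # extension field: triggers the retrieval step
    "stream": True,             # standard parameters work unchanged
    "temperature": 0.2,
}
print(json.dumps(payload, indent=2))
```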
When to use RAG vs. semantic search
| Approach | Use case |
|---|---|
| RAG-powered chat | You want the LLM to synthesize an answer from your documents |
| Semantic search | You want to retrieve raw document chunks without LLM processing |
For semantic search only, see Semantic Search.
Pricing
RAG-powered chat combines two costs:
- Search: $0.01 CAD per query (automatic retrieval step)
- Chat completion: $0.06 CAD per 1M tokens (including retrieved context tokens)
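A quick way to estimate a per-request cost from these two rates (the 50,000-token figure below is made up for illustration):

```python
def rag_chat_cost(total_tokens: int, queries: int = 1) -> float:
    """Estimate cost in CAD: $0.01 per search query + $0.06 per 1M tokens."""
    SEARCH_COST_CAD = 0.01
    TOKEN_COST_CAD_PER_M = 0.06
    return queries * SEARCH_COST_CAD + total_tokens / 1_000_000 * TOKEN_COST_CAD_PER_M

# e.g. one query whose prompt + retrieved context + response total 50,000 tokens
print(f"${rag_chat_cost(50_000):.4f} CAD")  # → $0.0130 CAD
```

Note that retrieved context chunks count toward the token total, so larger knowledge-base chunks raise the completion side of the cost.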
Prerequisites
- Create a knowledge base
- Upload at least one document
- Wait for embedding to complete
- Use the knowledge base ID in your chat completion request