RAG-Powered Chat

Attach a knowledge base to chat completions to ground LLM responses in your documents. The system automatically retrieves relevant chunks and includes them as context.

How it works

  1. You send a chat completion request with a knowledge_base parameter
  2. The hub searches your knowledge base for chunks relevant to the user's message
  3. Retrieved chunks are injected into the prompt as context
  4. The LLM generates a response grounded in your documents
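Conceptually, steps 2 and 3 amount to selecting the highest-scoring chunks and prepending them to the prompt. The sketch below is illustrative only: the hub's retrieval logic is internal, and the function name, chunk shape, and system-message format are assumptions, not the actual implementation.

```python
# Illustrative sketch of retrieval-and-injection (steps 2-3 above).
# The chunk dicts and prompt format are hypothetical, not the hub's real internals.

def build_grounded_messages(user_message, scored_chunks, top_k=3):
    """Select the top-k most relevant chunks and prepend them as context."""
    top = sorted(scored_chunks, key=lambda c: c["score"], reverse=True)[:top_k]
    context = "\n\n".join(c["text"] for c in top)
    system = "Answer using only the following context:\n\n" + context
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
```

The grounded message list is then what the LLM actually sees in step 4.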

Example

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

# Use the extra_body parameter for knowledge_base
response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Summarize our deployment docs"}],
    extra_body={"knowledge_base": "KB_ID"},
)
print(response.choices[0].message.content)

curl

curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role":"user","content":"Summarize our deployment docs"}],
    "knowledge_base": "KB_ID"
  }'

Parameters

The knowledge_base parameter is added to the standard Chat Completions request body.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| knowledge_base | string | Yes | ID of the knowledge base to search |

All other chat completion parameters (model, messages, stream, temperature, etc.) work as normal.
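For example, a request body can combine knowledge_base with stream and temperature; the values below are placeholders:

```python
import json

# knowledge_base sits alongside the standard Chat Completions fields;
# everything else keeps its usual meaning.
payload = {
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Summarize our deployment docs"}],
    "stream": True,
    "temperature": 0.2,
    "knowledge_base": "KB_ID",
}
body = json.dumps(payload)
```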

When to use RAG vs. semantic search

| Approach | Use case |
| --- | --- |
| RAG-powered chat | You want the LLM to synthesize an answer from your documents |
| Semantic search | You want to retrieve raw document chunks without LLM processing |

For semantic search only, see Semantic Search.

Pricing

RAG-powered chat combines two costs:

  • Search: $0.01 CAD per query (automatic retrieval step)
  • Chat completion: $0.06 CAD per 1M tokens (including retrieved context tokens)
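As a rough worked example under the rates above (the token count is illustrative, and this assumes the per-token rate applies to all tokens in the request):

```python
# Estimated cost of one RAG-powered chat request, in CAD.
SEARCH_FEE = 0.01              # flat per-query retrieval fee
TOKEN_RATE = 0.06 / 1_000_000  # per token, including retrieved context tokens

def rag_request_cost(total_tokens):
    return SEARCH_FEE + total_tokens * TOKEN_RATE

# A request consuming 50,000 tokens total:
cost = rag_request_cost(50_000)  # 0.01 + 0.003 = 0.013 CAD
```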

Prerequisites

  1. Create a knowledge base
  2. Upload at least one document
  3. Wait for embedding to complete
  4. Use the knowledge base ID in your chat completion request
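Step 3 is typically a polling loop. The sketch below leaves the status check injectable because the status endpoint is not documented here; the "ready" / "processing" / "failed" values are assumptions, not the hub's actual status names.

```python
import time

# Sketch of step 3: poll until the knowledge base finishes embedding.
# get_status is a callable you supply that queries the real status endpoint;
# the status strings here are hypothetical.

def wait_for_embedding(get_status, timeout=300, interval=5, sleep=time.sleep):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status == "ready":
            return True
        if status == "failed":
            raise RuntimeError("embedding failed")
        sleep(interval)
    raise TimeoutError("knowledge base not ready in time")
```

Once this returns, the knowledge base ID can be used in chat completion requests as shown above.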