# Chat Completions
OpenAI-compatible chat inference with streaming support. Works with any OpenAI SDK.
## Endpoint

```
POST /v1/chat/completions
```
## Authentication

Pass your API key either as a Bearer token in the `Authorization` header or in the `X-API-Key` header. See Authentication.
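Both forms are accepted. A minimal sketch using `requests` (the endpoint and field names come from this page; everything else is illustrative):

```python
import requests

# Either header authenticates the request; the OpenAI SDKs send Bearer auth.
headers = {"Authorization": "Bearer YOUR_API_KEY"}
# headers = {"X-API-Key": "YOUR_API_KEY"}  # equivalent alternative

resp = requests.post(
    "https://api.ryvion.ai/v1/chat/completions",
    headers=headers,
    json={"model": "phi-4", "messages": [{"role": "user", "content": "Hello"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
```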
## Request body

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID: `phi-4`, `ryvion-llama-3.2-3b`, or `tinyllama` |
| `messages` | array | Yes | Array of message objects with `role` and `content` |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | number | No | Sampling temperature (0-2, default: 0.7) |
| `max_tokens` | number | No | Maximum number of tokens to generate |
| `knowledge_base` | string | No | Knowledge base ID for RAG-powered completions |
| `jurisdiction` | string | No | Route the request to a specific jurisdiction (e.g., `CA`, `DE`) |
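The optional parameters combine freely. A hedged Python sketch: `temperature` and `max_tokens` are standard OpenAI parameters, while `jurisdiction` and `knowledge_base` are Ryvion extensions, so the OpenAI Python SDK forwards them through its `extra_body` kwarg (the curl examples below pass them as top-level body fields):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.ryvion.ai/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.2,  # lower = more deterministic (range 0-2)
    max_tokens=256,   # cap on generated tokens
    extra_body={"jurisdiction": "DE"},  # Ryvion extension: pin the jurisdiction
)
```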
### Message object

| Field | Type | Description |
|---|---|---|
| `role` | string | One of `system`, `user`, or `assistant` |
| `content` | string | The message content |
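A multi-turn conversation is just an ordered list of these objects, oldest first. An illustrative example:

```python
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Which models support streaming?"},
    {"role": "assistant", "content": "All of them: phi-4, ryvion-llama-3.2-3b, and tinyllama."},
    {"role": "user", "content": "Great, stream the next answer."},
]
```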
## Non-streaming request

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
### Node.js

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.ryvion.ai/v1",
  apiKey: "YOUR_KEY",
});

const res = await client.chat.completions.create({
  model: "phi-4",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(res.choices[0].message.content);
```
### curl

```bash
curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"phi-4","messages":[{"role":"user","content":"Hello"}]}'
```
## Streaming request

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

stream = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
Streaming uses Server-Sent Events (SSE). Each chunk carries a `delta` with the next piece of generated content, and the stream also identifies which node picked up the request, along with that node's GPU and location.
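If you are not using an SDK, you can consume the SSE stream directly. A minimal sketch, assuming the standard OpenAI framing (`data: <json>` lines terminated by `data: [DONE]`), which is the convention OpenAI-compatible streams follow:

```python
import json
import requests

resp = requests.post(
    "https://api.ryvion.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "phi-4",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    # SSE payload lines look like: data: {...}
    if not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content") or "", end="", flush=True)
```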
## Response format

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "phi-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 9,
    "total_tokens": 14
  }
}
```
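With the Python SDK these fields are attributes on the response object; for example, using the `response` from the non-streaming example above:

```python
# Token accounting from the usage block of the response above.
print(response.usage.prompt_tokens)       # 5
print(response.usage.completion_tokens)   # 9
print(response.usage.total_tokens)        # 14
print(response.choices[0].finish_reason)  # "stop"
```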
## RAG-powered completions

Attach a knowledge base to ground responses in your documents:

```bash
curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role":"user","content":"Summarize our deployment docs"}],
    "knowledge_base": "KB_ID"
  }'
```
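With the Python SDK, the same request uses the `extra_body` pattern shown under Request body:

```python
response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Summarize our deployment docs"}],
    # knowledge_base is a Ryvion extension, forwarded as a top-level body field.
    extra_body={"knowledge_base": "KB_ID"},
)
```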
See RAG-Powered Chat for details.
## Available models

| Model | Description | Streaming |
|---|---|---|
| `phi-4` | Microsoft Phi-4 14B | Yes |
| `ryvion-llama-3.2-3b` | Llama 3.2 3B | Yes |
| `tinyllama` | BitNet 1-bit, CPU-only | Yes |
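If the gateway also exposes the OpenAI-compatible model listing endpoint (an assumption; this page only documents `/v1/chat/completions`), the SDK can enumerate these IDs at runtime:

```python
# Assumes a GET /v1/models endpoint exists, as on most
# OpenAI-compatible gateways; not confirmed by this page.
for model in client.models.list():
    print(model.id)  # e.g. "phi-4", "ryvion-llama-3.2-3b", "tinyllama"
```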
## Cryptographic receipts

Every completion generates a receipt. The current RYV1 signature covers the core execution fields `job_id`, `node_public_key`, `result_hash`, and `metering_units`. Model, jurisdiction, and any retrieved-chunk provenance are attached separately as audit metadata. See Receipts.
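To make the covered-fields split concrete, here is a purely illustrative verification sketch. The signature algorithm (Ed25519) and the canonical field encoding below are assumptions for illustration, not the documented RYV1 format; see Receipts for the actual scheme:

```python
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_receipt(receipt: dict, signature: bytes, public_key_bytes: bytes) -> bool:
    # Hypothetical sketch: assumes the RYV1 signature is Ed25519 over a
    # canonical JSON encoding of exactly the four covered fields. Model,
    # jurisdiction, and chunk provenance are audit metadata, so they are
    # deliberately excluded from the signed message here.
    covered = {
        k: receipt[k]
        for k in ("job_id", "node_public_key", "result_hash", "metering_units")
    }
    message = json.dumps(covered, sort_keys=True, separators=(",", ":")).encode()
    try:
        Ed25519PublicKey.from_public_bytes(public_key_bytes).verify(signature, message)
        return True
    except Exception:
        return False
```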