Chat Completions

OpenAI-compatible chat inference with streaming support. Works with any OpenAI SDK.

Endpoint

POST /v1/chat/completions

Authentication

Include your API key either as a Bearer token in the Authorization header or in an X-API-Key header. See Authentication.
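
For example, to authenticate with the X-API-Key header instead of a Bearer token (a minimal sketch using the requests library rather than an SDK):

import requests

# Same request as the examples below, authenticated via the
# X-API-Key header instead of Authorization: Bearer.
response = requests.post(
    "https://api.ryvion.ai/v1/chat/completions",
    headers={"X-API-Key": "YOUR_KEY"},
    json={"model": "phi-4", "messages": [{"role": "user", "content": "Hello"}]},
)
print(response.json()["choices"][0]["message"]["content"])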

Request body

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model ID: phi-4, ryvion-llama-3.2-3b, or tinyllama |
| messages | array | Yes | Array of message objects with role and content |
| stream | boolean | No | Enable SSE streaming (default: false) |
| temperature | number | No | Sampling temperature (0-2, default: 0.7) |
| max_tokens | number | No | Maximum tokens to generate |
| knowledge_base | string | No | Knowledge base ID for RAG-powered completions |
| jurisdiction | string | No | Route to a specific jurisdiction (e.g., CA, DE) |
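
The optional parameters combine with a standard OpenAI-style request. With the OpenAI Python SDK, fields the SDK does not define as named arguments, such as jurisdiction, can be passed through its extra_body option (a sketch; the named arguments cover the standard fields):

from openai import OpenAI

client = OpenAI(base_url="https://api.ryvion.ai/v1", api_key="YOUR_KEY")

# Standard sampling controls go in as named arguments; the
# Ryvion-specific jurisdiction field goes through extra_body.
response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.2,
    max_tokens=256,
    extra_body={"jurisdiction": "DE"},
)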

Message object

| Field | Type | Description |
| --- | --- | --- |
| role | string | One of system, user, or assistant |
| content | string | The message content |
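
A typical messages array pairs a system prompt with one or more conversation turns, for example:

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Hello"},
]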

Non-streaming request

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.ryvion.ai/v1",
  apiKey: "YOUR_KEY",
});

const res = await client.chat.completions.create({
  model: "phi-4",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(res.choices[0].message.content);

curl

curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"phi-4","messages":[{"role":"user","content":"Hello"}]}'

Streaming request

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

stream = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)

for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")

Streaming uses Server-Sent Events (SSE). Each chunk carries incremental delta content, and the stream also identifies the node that served the request, along with its GPU and location.
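
If you are not using an SDK, the stream can be consumed as raw SSE. A minimal sketch, assuming the standard OpenAI-compatible framing of data: <json> lines terminated by data: [DONE] (the exact shape of the node metadata fields is not shown here):

import json

import requests

resp = requests.post(
    "https://api.ryvion.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "phi-4",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)

# Each SSE event arrives as a "data: <json>" line; the stream
# ends with "data: [DONE]".
for line in resp.iter_lines():
    if not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    if chunk.get("choices"):
        print(chunk["choices"][0]["delta"].get("content") or "", end="")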

Response format

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "phi-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 9,
    "total_tokens": 14
  }
}
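
The usage block reports token counts for the request, with total_tokens the sum of prompt_tokens and completion_tokens. With the SDK, the same fields are attributes on the response object:

# Token accounting from the non-streaming example above.
print(response.usage.prompt_tokens, response.usage.completion_tokens)
print(response.usage.total_tokens)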

RAG-powered completions

Attach a knowledge base to ground responses in your documents:

curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role":"user","content":"Summarize our deployment docs"}],
    "knowledge_base": "KB_ID"
  }'
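
The same request through the Python SDK passes knowledge_base via extra_body, as with jurisdiction above (a sketch):

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Summarize our deployment docs"}],
    extra_body={"knowledge_base": "KB_ID"},
)
print(response.choices[0].message.content)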

See RAG-Powered Chat for details.

Available models

| Model | Description | Streaming |
| --- | --- | --- |
| phi-4 | Microsoft Phi-4 14B | Yes |
| ryvion-llama-3.2-3b | Llama 3.2 3B | Yes |
| tinyllama | BitNet 1-bit, CPU-only | Yes |

Cryptographic receipts

Every completion generates a receipt. The current RYV1 signature covers the core execution fields job_id, node_public_key, result_hash, and metering_units. Model, jurisdiction, and any retrieved chunk provenance are attached separately as audit metadata. See Receipts.
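
As an illustration of what the signature covers, the sketch below verifies a receipt under two loud assumptions: that RYV1 uses Ed25519 with a hex-encoded node_public_key, and that the covered fields are canonicalized as compact, key-sorted JSON. The actual canonicalization, key encoding, and signature scheme are specified in Receipts and may differ.

import json

from nacl.exceptions import BadSignatureError
from nacl.signing import VerifyKey

# The fields the RYV1 signature covers, per the paragraph above.
COVERED_FIELDS = ("job_id", "node_public_key", "result_hash", "metering_units")

def verify_receipt(receipt: dict, signature: bytes) -> bool:
    # Assumption: covered fields serialized as compact, key-sorted JSON.
    covered = {k: receipt[k] for k in COVERED_FIELDS}
    message = json.dumps(covered, sort_keys=True, separators=(",", ":")).encode()
    # Assumption: Ed25519 public key, hex-encoded in the receipt.
    key = VerifyKey(bytes.fromhex(receipt["node_public_key"]))
    try:
        key.verify(message, signature)
        return True
    except BadSignatureError:
        return False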