# Chat Completions
OpenAI-compatible chat inference with streaming support. Works with any OpenAI SDK.
## Endpoint

```
POST /v1/chat/completions
```
## Authentication

Pass your API key either as a Bearer token in the `Authorization` header or in the `X-API-Key` header. See Authentication.
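Both forms are accepted. A minimal sketch using `requests` (the endpoint and field names come from this page; everything else is illustrative):

```python
import requests

# Either header authenticates the request; the OpenAI SDKs send Bearer auth.
headers = {"Authorization": "Bearer YOUR_API_KEY"}
# headers = {"X-API-Key": "YOUR_API_KEY"}  # equivalent alternative

resp = requests.post(
    "https://api.ryvion.ai/v1/chat/completions",
    headers=headers,
    json={"model": "phi-4", "messages": [{"role": "user", "content": "Hello"}]},
)
print(resp.json()["choices"][0]["message"]["content"])
```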
## Request body

| Parameter | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model ID: `phi-4`, `ryvion-llama-3.2-3b`, or `tinyllama` |
| `messages` | array | Yes | Array of message objects with `role` and `content` |
| `stream` | boolean | No | Enable SSE streaming (default: `false`) |
| `temperature` | number | No | Sampling temperature (0-2, default: 0.7) |
| `max_tokens` | number | No | Maximum number of tokens to generate |
| `knowledge_base` | string | No | Knowledge base ID for RAG-powered completions |
| `jurisdiction` | string | No | Route the request to a specific jurisdiction (e.g., `CA`, `DE`) |
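The optional parameters combine freely. A hedged Python sketch: `temperature` and `max_tokens` are standard OpenAI parameters, while `jurisdiction` and `knowledge_base` are Ryvion extensions, so the OpenAI Python SDK forwards them through its `extra_body` kwarg (the curl examples below pass them as top-level body fields):

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.ryvion.ai/v1", api_key="YOUR_KEY")

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
    temperature=0.2,  # lower = more deterministic (range 0-2)
    max_tokens=256,   # cap on generated tokens
    extra_body={"jurisdiction": "DE"},  # Ryvion extension: pin the jurisdiction
)
```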
### Message object

| Field | Type | Description |
|---|---|---|
| `role` | string | One of `system`, `user`, or `assistant` |
| `content` | string | The message content |
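A multi-turn conversation is just an ordered list of these objects, oldest first. An illustrative example:

```python
messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Which models support streaming?"},
    {"role": "assistant", "content": "All of them: phi-4, ryvion-llama-3.2-3b, and tinyllama."},
    {"role": "user", "content": "Great, stream the next answer."},
]
```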
## Non-streaming request

### Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
```
### Node.js

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.ryvion.ai/v1",
  apiKey: "YOUR_KEY",
});

const res = await client.chat.completions.create({
  model: "phi-4",
  messages: [{ role: "user", content: "Hello" }],
});
console.log(res.choices[0].message.content);
```
### curl

```bash
curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"phi-4","messages":[{"role":"user","content":"Hello"}]}'
```
## Streaming request

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_KEY",
)

stream = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```
Streaming uses Server-Sent Events (SSE). Each chunk carries a `delta` with the next piece of generated content, and the stream also identifies which node picked up the request, along with that node's GPU and location.
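If you are not using an SDK, you can consume the SSE stream directly. A minimal sketch, assuming the standard OpenAI framing (`data: <json>` lines terminated by `data: [DONE]`), which is the convention OpenAI-compatible streams follow:

```python
import json
import requests

resp = requests.post(
    "https://api.ryvion.ai/v1/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "phi-4",
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
    },
    stream=True,
)
for line in resp.iter_lines():
    # SSE payload lines look like: data: {...}
    if not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    print(chunk["choices"][0]["delta"].get("content") or "", end="", flush=True)
```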
## Response format

```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "phi-4",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 5,
    "completion_tokens": 9,
    "total_tokens": 14
  }
}
```
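With the Python SDK these fields are attributes on the response object; for example, using the `response` from the non-streaming example above:

```python
# Token accounting from the usage block of the response above.
print(response.usage.prompt_tokens)       # 5
print(response.usage.completion_tokens)   # 9
print(response.usage.total_tokens)        # 14
print(response.choices[0].finish_reason)  # "stop"
```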
## RAG-powered completions

Attach a knowledge base to ground responses in your documents:

```bash
curl -X POST https://api.ryvion.ai/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role":"user","content":"Summarize our deployment docs"}],
    "knowledge_base": "KB_ID"
  }'
```
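With the Python SDK, the same request uses the `extra_body` pattern shown under Request body:

```python
response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Summarize our deployment docs"}],
    # knowledge_base is a Ryvion extension, forwarded as a top-level body field.
    extra_body={"knowledge_base": "KB_ID"},
)
```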
See RAG-Powered Chat for details.
## Available models

| Model | Description | Streaming |
|---|---|---|
| `phi-4` | Microsoft Phi-4 14B | Yes |
| `ryvion-llama-3.2-3b` | Llama 3.2 3B | Yes |
| `tinyllama` | BitNet 1-bit, CPU-only | Yes |
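If the gateway also exposes the OpenAI-compatible model listing endpoint (an assumption; this page only documents `/v1/chat/completions`), the SDK can enumerate these IDs at runtime:

```python
# Assumes a GET /v1/models endpoint exists, as on most
# OpenAI-compatible gateways; not confirmed by this page.
for model in client.models.list():
    print(model.id)  # e.g. "phi-4", "ryvion-llama-3.2-3b", "tinyllama"
```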
## Cryptographic receipts

Every completion generates a receipt. The current RYV1 signature covers the core execution fields `job_id`, `node_public_key`, `result_hash`, and `metering_units`. Model, jurisdiction, and any retrieved-chunk provenance are attached separately as audit metadata. See Receipts.
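To make the covered-fields split concrete, here is a purely illustrative verification sketch. The signature algorithm (Ed25519) and the canonical field encoding below are assumptions for illustration, not the documented RYV1 format; see Receipts for the actual scheme:

```python
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify_receipt(receipt: dict, signature: bytes, public_key_bytes: bytes) -> bool:
    # Hypothetical sketch: assumes the RYV1 signature is Ed25519 over a
    # canonical JSON encoding of exactly the four covered fields. Model,
    # jurisdiction, and chunk provenance are audit metadata, so they are
    # deliberately excluded from the signed message here.
    covered = {
        k: receipt[k]
        for k in ("job_id", "node_public_key", "result_hash", "metering_units")
    }
    message = json.dumps(covered, sort_keys=True, separators=(",", ":")).encode()
    try:
        Ed25519PublicKey.from_public_bytes(public_key_bytes).verify(signature, message)
        return True
    except Exception:
        return False
```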