
OpenAI-compatible inference

Run open-source AI on a verified GPU network.

Drop-in replacement for the OpenAI API. Your requests run on distributed GPUs with signed receipts proving exactly what model ran, on which node, with what result.

Python — two lines to switch from OpenAI

from openai import OpenAI

client = OpenAI(
    base_url="https://ryvion-hub.fly.dev/v1",
    api_key="YOUR_RYVION_API_KEY",
)

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    # The final chunk's delta has no content; guard against printing "None".
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Built for real infrastructure work

OpenAI-compatible API
Signed inference receipts
Distributed GPU network
Per-token pricing

The problem

You have no proof of what ran your request

Every inference API is a black box. You send a prompt, you get a response. You trust that the model you asked for is the model that ran. Ryvion changes that.

01

No verification

Other APIs say they ran GPT-4 or Llama 70B. You have no way to prove it. For compliance-sensitive workloads, this is unacceptable.

02

Centralized risk

One provider, one data center, one point of failure. If they go down, change pricing, or change terms — you have no alternative.

03

No data sovereignty

Your prompts go wherever the provider decides. No control over which country processes your data.

How it works

Three steps to verified inference

Same API you already use. New guarantees you can't get anywhere else.

Step 1

Swap your base URL

Point the OpenAI SDK at Ryvion. One line change. Your existing code, prompts, and tools keep working.

Step 2

Run inference on distributed GPUs

Your requests route to real GPU nodes across the network. Streaming and non-streaming, chat and embeddings.
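Embeddings returned by the network can be compared entirely on your side. A minimal sketch of scoring two embeddings with cosine similarity; the vectors here are stand-in values, and the `client.embeddings.create(...)` call named in the comment is the standard OpenAI SDK method, shown only to indicate where real vectors would come from:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; in practice these would come from
# client.embeddings.create(model="all-MiniLM-L6-v2", input=[...]).
doc_vec = [0.1, 0.3, 0.5]
query_vec = [0.2, 0.1, 0.4]
score = cosine_similarity(doc_vec, query_vec)
```

Scores near 1.0 mean the two texts are semantically close; near 0.0 means unrelated.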

Step 3

Get a signed receipt for every request

Every inference run produces a cryptographic receipt — proof of which model ran, on which node, with what result. Audit-ready by default.

Available models

Chat, images, audio, and embeddings

Eight models across four modalities. Only models that online nodes can actually serve are listed.

Model                Type        Speed
phi-4                Chat        Fast
ryvion-llama-3.2-3b  Chat        Very fast
tinyllama            Chat        Instant
sdxl-turbo           Image       ~4s / image
stable-diffusion-xl  Image       ~15s / image
whisper-1            Audio       Real-time
whisper-large        Audio       Near real-time
all-MiniLM-L6-v2     Embeddings  Fast

Chat models run via llama.cpp, images via Stable Diffusion, audio via Whisper, and embeddings via all-MiniLM-L6-v2, all on real GPU nodes. More models are added as operators bring hardware online.

Verified inference

Every request produces a signed receipt

Not a log. A cryptographic proof tied to the node that ran your inference.

curl — try it now

curl https://ryvion-hub.fly.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_RYVION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

What the receipt proves

Which model processed the request
Which physical GPU node ran it
Ed25519 signature from the node
Timestamp and processing duration
Input/output token counts
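A receipt carrying these properties can be sanity-checked client-side before you archive it. A minimal sketch using only the standard library; the field names below mirror the list above but are assumptions, not Ryvion's documented schema:

```python
import json

# Hypothetical field names mirroring the receipt properties above;
# the real receipt schema may differ.
REQUIRED_FIELDS = {
    "model", "node_id", "signature",
    "timestamp", "duration_ms",
    "input_tokens", "output_tokens",
}

def check_receipt(raw: str) -> dict:
    # Parse the receipt JSON and fail loudly if any expected field is absent.
    receipt = json.loads(raw)
    missing = REQUIRED_FIELDS - receipt.keys()
    if missing:
        raise ValueError(f"receipt missing fields: {sorted(missing)}")
    return receipt

sample = json.dumps({
    "model": "phi-4",
    "node_id": "node-abc123",
    "signature": "base64-ed25519-signature",
    "timestamp": "2025-01-01T00:00:00Z",
    "duration_ms": 412,
    "input_tokens": 9,
    "output_tokens": 42,
})
receipt = check_receipt(sample)
```

Verifying the Ed25519 signature itself additionally requires the node's public key and a signature library such as PyNaCl; that step is omitted here.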

This is what makes Ryvion different from every other inference API. Verifiable compute, not blind trust.

Why Ryvion

Not another inference proxy

Other APIs resell GPU time. Ryvion is a verified compute network where every response is provably real.

Integration
Ryvion: OpenAI-compatible. Change base_url in one line, keep your existing code.
Traditional API: Custom SDKs, proprietary formats, migration effort for each provider.

Verification
Ryvion: Signed receipts per inference — cryptographic proof of model, node, and result.
Traditional API: Trust the provider. No proof of what model actually ran your request.

Infrastructure
Ryvion: Distributed GPU network. No single point of failure. Operators compete on price.
Traditional API: Centralized data centers. Provider sets the price. You hope they stay online.

Privacy
Ryvion: Jurisdiction-aware routing. Your data stays where you specify.
Traditional API: Your prompts go wherever the provider decides. No control.

Pricing
Ryvion: Per-token, transparent. Network competition drives prices down over time.
Traditional API: Per-token but opaque margins. Prices set by one company.

Developers

Use the OpenAI SDK with one line changed. Get verified inference with signed receipts, per-token pricing, and scoped API keys.

Node operators

Contribute GPU capacity to the network. Open-source node agent (Apache 2.0). Earn from every verified inference your node completes.

FAQ

Straight answers

Common questions about Ryvion.

How is Ryvion different from OpenAI or other inference APIs?

Ryvion is OpenAI-compatible — same API format, same SDKs. The difference: your requests run on a distributed GPU network with signed receipts proving exactly what model ran on which node. No other API gives you that level of verifiability.

How do I switch from OpenAI to Ryvion?

Change your base URL to https://ryvion-hub.fly.dev/v1 and use your Ryvion API key. That's it. Your existing code, prompts, and tools keep working.

What models are available?

Chat: Phi-4 (14B), Llama 3.2 3B, TinyLlama (1.1B). Images: SDXL Turbo, Stable Diffusion XL. Audio: Whisper base and large-v3. Embeddings: all-MiniLM-L6-v2. More models are added as operators bring hardware online.

What are signed receipts?

Every inference request produces a cryptographic receipt — proof that a specific model ran on a specific node and produced a specific result. This matters for compliance, audit trails, and trust.

How does pricing work?

Chat and embeddings: $0.06 per million tokens. Image generation: $0.03 per image. Audio transcription: $0.006 per minute. Every new account gets $10 in free credits — no credit card required. Additional credits via Stripe. Failed jobs are automatically refunded.
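The rates above can be folded into a quick cost estimator. The prices are taken from this page; bundling them into one helper is my own framing, not an official calculator:

```python
# Rates from this page (USD).
CHAT_PER_MILLION_TOKENS = 0.06   # chat and embeddings, per 1M tokens
PRICE_PER_IMAGE = 0.03           # per generated image
AUDIO_PER_MINUTE = 0.006         # per transcribed minute

def estimate_cost(tokens=0, images=0, audio_minutes=0.0):
    # Sum the three meters into a single USD estimate.
    return (tokens / 1_000_000 * CHAT_PER_MILLION_TOKENS
            + images * PRICE_PER_IMAGE
            + audio_minutes * AUDIO_PER_MINUTE)

# 2M chat tokens, 10 images, and 30 minutes of audio:
cost = estimate_cost(tokens=2_000_000, images=10, audio_minutes=30)
```

At those rates the example run comes to $0.60, comfortably inside the $10 of free credits.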