
OpenAI-compatible inference

Run open-source AI on a verified GPU network.

Drop-in replacement for the OpenAI API. Your requests run on distributed GPUs with signed receipts proving exactly what model ran, on which node, with what result.

Python — two lines to switch from OpenAI

from openai import OpenAI

client = OpenAI(
    base_url="https://ryvion-hub.fly.dev/v1",
    api_key="YOUR_RYVION_API_KEY",
)

response = client.chat.completions.create(
    model="phi-4",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

for chunk in response:
    # The final chunk's delta has no content; guard against printing "None".
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)

Built for real infrastructure work

OpenAI-compatible API
Signed inference receipts
Distributed GPU network
Per-token pricing

The problem

You have no proof of what ran your request

Every inference API is a black box. You send a prompt, you get a response. You trust that the model you asked for is the model that ran. Ryvion changes that.

01

No verification

Other APIs say they ran GPT-4 or Llama 70B. You have no way to prove it. For compliance-sensitive workloads, this is unacceptable.

02

Centralized risk

One provider, one data center, one point of failure. If they go down, change pricing, or change terms — you have no alternative.

03

No data sovereignty

Your prompts go wherever the provider decides. No control over which country processes your data.

How it works

Three steps to verified inference

Same API you already use. New guarantees you can't get anywhere else.

Step 1

Swap your base URL

Point the OpenAI SDK at Ryvion. One line change. Your existing code, prompts, and tools keep working.

Step 2

Run inference on distributed GPUs

Your requests route to real GPU nodes across the network. Streaming and non-streaming, chat and embeddings.
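Embeddings returned by the network can be compared entirely on your side. A minimal sketch of scoring two embeddings with cosine similarity; the vectors here are stand-in values, and the `client.embeddings.create(...)` call named in the comment is the standard OpenAI SDK method, shown only to indicate where real vectors would come from:

```python
import math

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in vectors; in practice these would come from
# client.embeddings.create(model="all-MiniLM-L6-v2", input=[...]).
doc_vec = [0.1, 0.3, 0.5]
query_vec = [0.2, 0.1, 0.4]
score = cosine_similarity(doc_vec, query_vec)
```

Scores near 1.0 mean the two texts are semantically close; near 0.0 means unrelated.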

Step 3

Get a signed receipt for every request

Every inference run produces a cryptographic receipt — proof of which model ran, on which node, with what result. Audit-ready by default.

Available models

Chat, images, audio, and embeddings

Eight models across four modalities. Only models that online nodes can actually serve are listed.

Model                Type        Speed
phi-4                Chat        Fast
ryvion-llama-3.2-3b  Chat        Very fast
tinyllama            Chat        Instant
sdxl-turbo           Image       ~4s / image
stable-diffusion-xl  Image       ~15s / image
whisper-1            Audio       Real-time
whisper-large        Audio       Near real-time
all-MiniLM-L6-v2     Embeddings  Fast

Chat models run via llama.cpp, images via Stable Diffusion, audio via Whisper, and embeddings via all-MiniLM-L6-v2, all on real GPU nodes. More models are added as operators bring hardware online.

Verified inference

Every request produces a signed receipt

Not a log. A cryptographic proof tied to the node that ran your inference.

curl — try it now

curl https://ryvion-hub.fly.dev/v1/chat/completions \
  -H "Authorization: Bearer YOUR_RYVION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "phi-4",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

What the receipt proves

Which model processed the request
Which physical GPU node ran it
Ed25519 signature from the node
Timestamp and processing duration
Input/output token counts
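A receipt carrying these properties can be sanity-checked client-side before you archive it. A minimal sketch using only the standard library; the field names below mirror the list above but are assumptions, not Ryvion's documented schema:

```python
import json

# Hypothetical field names mirroring the receipt properties above;
# the real receipt schema may differ.
REQUIRED_FIELDS = {
    "model", "node_id", "signature",
    "timestamp", "duration_ms",
    "input_tokens", "output_tokens",
}

def check_receipt(raw: str) -> dict:
    # Parse the receipt JSON and fail loudly if any expected field is absent.
    receipt = json.loads(raw)
    missing = REQUIRED_FIELDS - receipt.keys()
    if missing:
        raise ValueError(f"receipt missing fields: {sorted(missing)}")
    return receipt

sample = json.dumps({
    "model": "phi-4",
    "node_id": "node-abc123",
    "signature": "base64-ed25519-signature",
    "timestamp": "2025-01-01T00:00:00Z",
    "duration_ms": 412,
    "input_tokens": 9,
    "output_tokens": 42,
})
receipt = check_receipt(sample)
```

Verifying the Ed25519 signature itself additionally requires the node's public key and a signature library such as PyNaCl; that step is omitted here.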

This is what makes Ryvion different from every other inference API. Verifiable compute, not blind trust.

Why Ryvion

Not another inference proxy

Other APIs resell GPU time. Ryvion is a verified compute network where every response is provably real.

Integration
Ryvion: OpenAI-compatible. Change base_url in one line, keep your existing code.
Traditional API: Custom SDKs, proprietary formats, migration effort for each provider.

Verification
Ryvion: Signed receipts per inference — cryptographic proof of model, node, and result.
Traditional API: Trust the provider. No proof of what model actually ran your request.

Infrastructure
Ryvion: Distributed GPU network. No single point of failure. Operators compete on price.
Traditional API: Centralized data centers. Provider sets the price. You hope they stay online.

Privacy
Ryvion: Jurisdiction-aware routing. Your data stays where you specify.
Traditional API: Your prompts go wherever the provider decides. No control.

Pricing
Ryvion: Per-token, transparent. Network competition drives prices down over time.
Traditional API: Per-token but opaque margins. Prices set by one company.

Developers

Use the OpenAI SDK with one line changed. Get verified inference with signed receipts, per-token pricing, and scoped API keys.

Node operators

Contribute GPU capacity to the network. Open-source node agent (Apache 2.0). Earn from every verified inference your node completes.

FAQ

Straight answers

Common questions about Ryvion.

How is Ryvion different from OpenAI or other inference APIs?

Ryvion is OpenAI-compatible — same API format, same SDKs. The difference: your requests run on a distributed GPU network with signed receipts proving exactly what model ran on which node. No other API gives you that level of verifiability.

How do I switch from OpenAI to Ryvion?

Change your base URL to https://ryvion-hub.fly.dev/v1 and use your Ryvion API key. That's it. Your existing code, prompts, and tools keep working.

What models are available?

Chat: Phi-4 (14B), Llama 3.2 3B, TinyLlama (1.1B). Images: SDXL Turbo, Stable Diffusion XL. Audio: Whisper base and large-v3. Embeddings: all-MiniLM-L6-v2. More models are added as operators bring hardware online.

What are signed receipts?

Every inference request produces a cryptographic receipt — proof that a specific model ran on a specific node and produced a specific result. This matters for compliance, audit trails, and trust.

How does pricing work?

Chat and embeddings: $0.06 per million tokens. Image generation: $0.03 per image. Audio transcription: $0.006 per minute. Every new account gets $10 in free credits — no credit card required. Additional credits via Stripe. Failed jobs are automatically refunded.
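The rates above can be folded into a quick cost estimator. The prices are taken from this page; bundling them into one helper is my own framing, not an official calculator:

```python
# Rates from this page (USD).
CHAT_PER_MILLION_TOKENS = 0.06   # chat and embeddings, per 1M tokens
PRICE_PER_IMAGE = 0.03           # per generated image
AUDIO_PER_MINUTE = 0.006         # per transcribed minute

def estimate_cost(tokens=0, images=0, audio_minutes=0.0):
    # Sum the three meters into a single USD estimate.
    return (tokens / 1_000_000 * CHAT_PER_MILLION_TOKENS
            + images * PRICE_PER_IMAGE
            + audio_minutes * AUDIO_PER_MINUTE)

# 2M chat tokens, 10 images, and 30 minutes of audio:
cost = estimate_cost(tokens=2_000_000, images=10, audio_minutes=30)
```

At those rates the example run comes to $0.60, comfortably inside the $10 of free credits.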