Rate Limits

Ryvion uses a pay-per-use model with spending velocity controls rather than hard request rate limits.

Spending velocity

New accounts can spend at most $5 CAD per hour, measured over a rolling one-hour window. This prevents runaway costs from misconfigured clients or compromised API keys.

As your account history builds, the velocity limit increases automatically. If you need a higher limit sooner, contact support.
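To avoid running into the limit unexpectedly, a client can keep its own estimate of spending inside the rolling window. The following is a minimal sketch, not part of the Ryvion client: the `SpendingWindow` class and its methods are hypothetical names, it assumes the caller knows the cost of each request in CAD, and the hub's own accounting remains authoritative.

```python
import time
from collections import deque


class SpendingWindow:
    """Client-side estimate of spending inside a rolling time window."""

    def __init__(self, limit_cad=5.00, window_seconds=3600, clock=time.monotonic):
        self.limit_cad = limit_cad
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested
        self._charges = deque()  # (timestamp, amount_cad), oldest first

    def _drop_expired(self):
        # Charges older than the window no longer count toward the limit.
        cutoff = self.clock() - self.window_seconds
        while self._charges and self._charges[0][0] <= cutoff:
            self._charges.popleft()

    def spent(self):
        """Total recorded spending still inside the window."""
        self._drop_expired()
        return sum(amount for _, amount in self._charges)

    def can_spend(self, amount_cad):
        """True if spending amount_cad now would stay within the limit."""
        return self.spent() + amount_cad <= self.limit_cad

    def record(self, amount_cad):
        """Record a charge at the current time."""
        self._charges.append((self.clock(), amount_cad))
```

A caller would check `can_spend(estimated_cost)` before sending a request and call `record(actual_cost)` afterwards. Because the window rolls, capacity returns gradually as older charges expire rather than all at once on the hour.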

No hard rate limits

There is no hard limit on the number of requests per second; you pay for what you use. The network scales horizontally: additional requests are distributed across the available GPU nodes.

Request timeout

All requests have a 30-second timeout. If the hub does not receive a response from the executing node within 30 seconds, the request returns a 504 Gateway Timeout.

For long-running workloads (fine-tuning, batch processing), use the async job API instead of synchronous requests.

Retry behavior

The Ryvion API client retries only GET requests, and only when they fail with one of these transient errors:

Status code	Meaning	Retried
`429`	Spending velocity exceeded	Yes (GET only)
`502`	Bad gateway	Yes (GET only)
`503`	Service unavailable	Yes (GET only)
`504`	Gateway timeout	Yes (GET only)

POST, PUT, and DELETE requests are never retried. This prevents duplicate resource creation -- for example, creating two jobs or two API keys from a single user action.

Retries use exponential backoff, with delays of 1s, 2s, and 4s (up to 3 attempts).
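If you are writing your own client, the policy described in this section can be sketched roughly as follows. This is an illustration under stated assumptions, not the Ryvion client's actual code: `send` is a hypothetical callable that performs one HTTP request and returns `(status, body)`, and "up to 3 attempts" is read here as up to three retries after the initial request.

```python
import time

RETRYABLE_STATUSES = {429, 502, 503, 504}
BACKOFF_DELAYS = (1, 2, 4)  # seconds to wait before retry 1, 2, and 3


def should_retry(method, status):
    """Retry only GET requests that failed with a transient status."""
    return method.upper() == "GET" and status in RETRYABLE_STATUSES


def send_with_retries(send, method, url, sleep=time.sleep):
    """Call send(method, url), retrying according to the policy above.

    `send` is a hypothetical callable returning (status, body).
    `sleep` is injectable so the backoff can be tested without waiting.
    """
    status, body = send(method, url)
    for delay in BACKOFF_DELAYS:
        if not should_retry(method, status):
            break
        sleep(delay)
        status, body = send(method, url)
    return status, body
```

POST, PUT, and DELETE pass through `send_with_retries` exactly once, so a failed write is surfaced to the caller instead of being repeated.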

Handling 429 responses

A 429 response means your account has hit the spending velocity limit. The response body looks like this:

{
  "error": {
    "message": "Spending velocity limit exceeded. Current limit: $5.00 CAD/hour.",
    "type": "rate_limit_error",
    "code": "spending_velocity_exceeded"
  }
}

Options:

Wait -- the limit uses a rolling hourly window, so capacity frees up as earlier spending ages out
Reduce request volume -- batch requests or reduce prompt sizes
Request a limit increase -- contact support with your use case
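Before choosing one of these options, a client needs to recognize the error. A minimal sketch, assuming the response body has the shape shown above; the function name `is_spending_velocity_error` is hypothetical:

```python
import json


def is_spending_velocity_error(status, body_text):
    """Return True if the response is the spending-velocity 429 shown above."""
    if status != 429:
        return False
    try:
        error = json.loads(body_text).get("error", {})
    except (ValueError, AttributeError, TypeError):
        # Body was not JSON, or not a JSON object.
        return False
    return isinstance(error, dict) and error.get("code") == "spending_velocity_exceeded"
```

Matching on the `code` field rather than the `message` text keeps the check stable if the limit amount in the message changes.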

Best practices

Implement exponential backoff for GET retries in your client
Never retry POST, PUT, or DELETE requests automatically
Monitor your spending in the billing dashboard
Use streaming for chat completions to get partial results faster
Set max_tokens to avoid unexpectedly long (and expensive) completions
Cache responses when the same query will be repeated
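The caching suggestion can be sketched as a small in-memory cache keyed on the request payload. This is an illustration only: `ResponseCache` and the `fetch` callable are hypothetical, and the payload field names other than `max_tokens` are placeholders rather than documented Ryvion parameters.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache that calls fetch(payload) once per distinct payload."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: payload dict -> response
        self._store = {}

    @staticmethod
    def _key(payload):
        # Canonical JSON so key order in the payload does not change the key.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, payload):
        """Return the cached response, fetching it only on the first request."""
        key = self._key(payload)
        if key not in self._store:
            self._store[key] = self._fetch(payload)
        return self._store[key]
```

For example, `cache.get({"prompt": "hello", "max_tokens": 64})` triggers one paid request, and repeating the same payload returns the stored response. Caching only makes sense for queries where an identical answer is acceptable.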