Rate Limits

Ryvion uses a pay-per-use model with spending velocity controls rather than hard request rate limits.

Spending velocity

New accounts can spend at most $5 CAD per hour. This prevents runaway costs from misconfigured clients or compromised API keys. The limit is measured over a rolling one-hour window.

As your account history builds, the velocity limit increases automatically. Contact support to raise it immediately if needed.
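
A rolling window can be mirrored on the client to avoid sending requests that are likely to be rejected. The following is a minimal sketch, not part of the Ryvion client: the `SpendWindow` class and its method names are hypothetical, and it assumes your code can estimate the cost of each request in CAD. The server's own accounting remains authoritative.

```python
from collections import deque
import time


class SpendWindow:
    """Client-side estimate of spend inside a rolling window (hypothetical helper)."""

    def __init__(self, limit_cad=5.00, window_seconds=3600.0, clock=time.monotonic):
        self.limit_cad = limit_cad
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the window can be tested without waiting
        self._charges = deque()  # (timestamp, amount_cad), oldest first

    def _evict(self, now):
        # Drop charges that are at least one full window old.
        while self._charges and now - self._charges[0][0] >= self.window_seconds:
            self._charges.popleft()

    def spent(self):
        """Total recorded spend that still falls inside the window."""
        self._evict(self._clock())
        return sum(amount for _, amount in self._charges)

    def can_spend(self, amount_cad):
        """True if an extra charge would keep the estimate within the limit."""
        return self.spent() + amount_cad <= self.limit_cad

    def record(self, amount_cad):
        """Record a charge at the current time."""
        self._charges.append((self._clock(), amount_cad))
```

Because the window rolls, capacity returns gradually as individual charges age out rather than all at once at the top of the hour.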

No hard rate limits

There are no hard rate limits on the number of requests per second. You pay for what you use. The network scales horizontally -- additional requests are distributed across the available GPU nodes.

Request timeout

All requests have a 30-second timeout. If the hub does not receive a response from the executing node within 30 seconds, the request returns a 504 Gateway Timeout.

For long-running workloads (fine-tuning, batch processing), use the async job API instead of synchronous requests.

Retry behavior

The Ryvion API client retries only GET requests, and only when they fail with one of these transient errors:

Status code   Meaning                      Retried
429           Spending velocity exceeded   Yes (GET only)
502           Bad gateway                  Yes (GET only)
503           Service unavailable          Yes (GET only)
504           Gateway timeout              Yes (GET only)

POST, PUT, and DELETE requests are never retried. This prevents duplicate resource creation -- for example, creating two jobs or two API keys from a single user action.

Retries use exponential backoff: 1s, 2s, 4s (up to 3 attempts).
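
The retry policy described in this section can be sketched as follows. This is an illustration under stated assumptions, not the Ryvion client's actual implementation: `request_with_retry` and the `send` transport callable are hypothetical names, and the sketch reads "up to 3 attempts" as three retries after the initial request.

```python
import time

RETRYABLE_STATUSES = {429, 502, 503, 504}
BACKOFF_SECONDS = (1, 2, 4)  # delay before retry 1, 2, and 3


def request_with_retry(send, method, url, sleep=time.sleep):
    """Call send(method, url) -> (status, body), retrying GETs on transient errors.

    `send` is a hypothetical transport callable supplied by the caller.
    """
    status, body = send(method, url)
    if method.upper() != "GET":
        # POST, PUT, and DELETE are never retried, to avoid duplicate side effects.
        return status, body
    for delay in BACKOFF_SECONDS:
        if status not in RETRYABLE_STATUSES:
            break
        sleep(delay)
        status, body = send(method, url)
    return status, body
```

Passing `sleep` as a parameter keeps the function testable without real delays; production code would leave the default in place.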

Handling 429 responses

If you receive a 429 response, your account has hit the spending velocity limit:

{
  "error": {
    "message": "Spending velocity limit exceeded. Current limit: $5.00 CAD/hour.",
    "type": "rate_limit_error",
    "code": "spending_velocity_exceeded"
  }
}

Options:

  1. Wait -- the window is rolling, so capacity frees up as earlier charges pass the one-hour mark
  2. Reduce request volume -- batch requests or reduce prompt sizes
  3. Request a limit increase -- contact support with your use case
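
To choose between these options in code, a client first needs to recognize this specific error. The sketch below checks the status and the `code` field from the response body shown above; the function name `is_spending_velocity_error` is hypothetical, and the body is assumed to arrive as a JSON string.

```python
import json


def is_spending_velocity_error(status, body_text):
    """Return True if the response is a 429 with code spending_velocity_exceeded."""
    if status != 429:
        return False
    try:
        payload = json.loads(body_text)
    except json.JSONDecodeError:
        return False  # not JSON, so not the documented error shape
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return False
    return error.get("code") == "spending_velocity_exceeded"
```

Matching on `code` rather than on the human-readable `message` avoids breaking when the limit amount in the message changes.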

Best practices

  • Implement exponential backoff for GET retries in your client
  • Never retry POST, PUT, or DELETE requests automatically
  • Monitor your spending in the billing dashboard
  • Use streaming for chat completions to get partial results faster
  • Set max_tokens to avoid unexpectedly long (and expensive) completions
  • Cache responses when the same query will be repeated
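
As an illustration of the caching recommendation, here is a minimal in-memory sketch. `ResponseCache` and the `fetch` callable are hypothetical and not part of any Ryvion library; the sketch assumes request parameters are JSON-serializable and that identical parameters should return the stored response.

```python
import hashlib
import json


class ResponseCache:
    """In-memory cache keyed on a canonical hash of the request parameters."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable: params dict -> response
        self._store = {}

    @staticmethod
    def _key(params):
        # Sort keys so that dicts with the same content hash identically.
        canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def get(self, params):
        """Return the cached response, calling fetch only on a cache miss."""
        key = self._key(params)
        if key not in self._store:
            self._store[key] = self._fetch(params)
        return self._store[key]
```

Caching only makes sense when a repeated answer is acceptable, for example deterministic requests; entries here never expire, so a long-running process would need an eviction policy.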