Audio Transcription

Transcribe audio files with OpenAI Whisper models running on distributed GPU nodes. The endpoint uses the same request and response format as the OpenAI audio transcription API.

Endpoint

POST /v1/audio/transcriptions

Authentication

Send your API key either as a Bearer token in the Authorization header or in the X-API-Key header. See Authentication.
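
As a minimal sketch of the two header options, the helper below builds either form. The function name `auth_headers` and its `scheme` argument are hypothetical and exist only for this illustration; they are not part of the API.

```python
def auth_headers(api_key: str, scheme: str = "bearer") -> dict[str, str]:
    """Return the HTTP header that carries the API key.

    `scheme` is a hypothetical switch for this sketch, not an API parameter.
    """
    if scheme == "bearer":
        # Same form as the curl example: Authorization: Bearer <key>
        return {"Authorization": f"Bearer {api_key}"}
    if scheme == "x-api-key":
        return {"X-API-Key": api_key}
    raise ValueError(f"unknown scheme: {scheme!r}")


print(auth_headers("YOUR_API_KEY"))
print(auth_headers("YOUR_API_KEY", scheme="x-api-key"))
```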

Request body (multipart/form-data)

Parameter	Type	Required	Description
`file`	file	Yes	Audio file to transcribe
`model`	string	Yes	Model ID: `whisper-base`
`language`	string	No	Language code (e.g., `en`). Auto-detected if omitted.
`response_format`	string	No	Output format: `json`, `text`, `verbose_json`
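
The optional parameters can be assembled as ordinary form fields next to the file part. The sketch below is illustrative only: `build_form_fields` is a hypothetical helper, and rejecting unknown `response_format` values on the client side is an assumption about what is convenient, not documented server behavior.

```python
ALLOWED_RESPONSE_FORMATS = {"json", "text", "verbose_json"}


def build_form_fields(model="whisper-base", language=None, response_format=None):
    """Build the non-file multipart fields for a transcription request."""
    fields = {"model": model}
    if language is not None:
        # Omitting `language` leaves detection to the service.
        fields["language"] = language
    if response_format is not None:
        if response_format not in ALLOWED_RESPONSE_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        fields["response_format"] = response_format
    return fields


print(build_form_fields(language="en", response_format="verbose_json"))
```

With an HTTP library such as `requests`, a dict like this could be passed as `data=` alongside `files={"file": audio_file}` to produce the multipart body.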

Supported audio formats

mp3
wav
m4a
flac
webm
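
A quick client-side check against this list can catch an unsupported file before it is uploaded. This is a sketch that only looks at the file extension; how the service itself identifies formats is not documented here, and `is_supported_audio` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions taken from the supported formats list above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".webm"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension matches a supported audio format."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


print(is_supported_audio("recording.mp3"))
print(is_supported_audio("notes.txt"))
```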

Example

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_API_KEY",
)

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-base",
        file=audio_file,
    )
print(transcript.text)

curl

curl -X POST https://api.ryvion.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.mp3" \
  -F "model=whisper-base"

Response format

With response_format=json:

{
  "text": "Hello, this is a transcription of the audio file."
}

With response_format=verbose_json:

{
  "text": "Hello, this is a transcription of the audio file.",
  "language": "en",
  "duration": 12.5,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "Hello, this is a transcription"
    },
    {
      "start": 3.2,
      "end": 5.1,
      "text": "of the audio file."
    }
  ]
}
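
The `segments` array can be turned into timestamped lines with a few lines of Python. The sketch below works on the sample payload shown above; `format_timestamp` and `segment_lines` are hypothetical helper names, and the `MM:SS.t` layout is just one possible presentation.

```python
import json

# Sample verbose_json payload copied from the example above.
SAMPLE = """
{
  "text": "Hello, this is a transcription of the audio file.",
  "language": "en",
  "duration": 12.5,
  "segments": [
    {"start": 0.0, "end": 3.2, "text": "Hello, this is a transcription"},
    {"start": 3.2, "end": 5.1, "text": "of the audio file."}
  ]
}
"""


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS.t, rounded to tenths of a second."""
    total_tenths = round(seconds * 10)
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths // 10:02d}.{tenths % 10}"


def segment_lines(payload: dict) -> list[str]:
    """Return one '[start - end] text' line per segment."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] "
        f"{s['text'].strip()}"
        for s in payload.get("segments", [])
    ]


for line in segment_lines(json.loads(SAMPLE)):
    print(line)
# [00:00.0 - 00:03.2] Hello, this is a transcription
# [00:03.2 - 00:05.1] of the audio file.
```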

Available models

Model	Description
`whisper-base`	OpenAI Whisper base model

Pricing

$0.006 CAD per minute of audio.
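
For budgeting, the per-minute rate can be applied to an audio duration in seconds. This sketch assumes the charge is prorated linearly with no minimum and no rounding up to whole minutes; the page does not state how partial minutes are billed, so treat the result as an estimate.

```python
from decimal import Decimal

PRICE_PER_MINUTE_CAD = Decimal("0.006")


def estimate_cost_cad(duration_seconds: float) -> Decimal:
    """Estimate the cost in CAD, assuming linear per-second proration."""
    seconds = Decimal(str(duration_seconds))
    return seconds * PRICE_PER_MINUTE_CAD / Decimal(60)


# One hour of audio under the proration assumption above.
print(estimate_cost_cad(3600))
```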

Features

Automatic language detection
Segment-level timestamps (with response_format=verbose_json)
Support for mp3, wav, m4a, flac, and webm audio
Each transcription produces a cryptographic receipt