Audio Transcription

Transcribe audio files using OpenAI Whisper models running on distributed GPU nodes. The endpoint uses the same request and response format as the OpenAI audio transcription API.

Endpoint

POST /v1/audio/transcriptions

Authentication

Include your API key either as a Bearer token in the Authorization header or in the X-API-Key header. See Authentication.
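
As a minimal sketch of the two header options, the hypothetical helper below builds either form. The Bearer form matches the curl example on this page; sending the raw key as the X-API-Key value is an assumption, so check the Authentication page for the exact format.

```python
def auth_headers(api_key: str, use_x_api_key: bool = False) -> dict:
    """Return the HTTP header that carries the API key.

    auth_headers is a hypothetical helper, not part of any SDK.
    """
    if not api_key:
        raise ValueError("api_key must not be empty")
    if use_x_api_key:
        # Assumption: the header value is the raw key with no prefix.
        return {"X-API-Key": api_key}
    # Same form as the curl example: "Authorization: Bearer <key>".
    return {"Authorization": f"Bearer {api_key}"}
```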

Request body (multipart/form-data)

Parameter        Type    Required  Description
file             file    Yes       Audio file to transcribe
model            string  Yes       Model ID: whisper-base
language         string  No        Language code (e.g., en). Auto-detected if omitted.
response_format  string  No        Output format: json, text, verbose_json
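
The optional parameters can be collected in a small helper before the call. This is a sketch only: transcription_params is a hypothetical name, and the client-side check of response_format simply mirrors the three values listed in the table.

```python
from typing import Optional

# Output formats listed in the parameter table.
ALLOWED_FORMATS = ("json", "text", "verbose_json")


def transcription_params(language: Optional[str] = None,
                         response_format: Optional[str] = None,
                         model: str = "whisper-base") -> dict:
    """Build keyword arguments for a transcription request.

    Optional parameters are left out when not set, so the service
    applies its own behavior (for example, language auto-detection).
    """
    params = {"model": model}
    if language is not None:
        params["language"] = language
    if response_format is not None:
        if response_format not in ALLOWED_FORMATS:
            raise ValueError(f"unsupported response_format: {response_format!r}")
        params["response_format"] = response_format
    return params


# Intended use with the OpenAI Python client, given an open audio file:
#   client.audio.transcriptions.create(file=audio_file, **transcription_params(language="en"))
```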

Supported audio formats

  • mp3
  • wav
  • m4a
  • flac
  • webm
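
A client-side check can catch unsupported files before uploading. The sketch below assumes the format can be judged from the file extension alone, which may not match how the service detects formats; is_supported_audio is a hypothetical helper.

```python
from pathlib import Path

# Extensions corresponding to the formats listed above.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".webm"}


def is_supported_audio(path: str) -> bool:
    """Return True if the file extension is one of the listed formats.

    The comparison is case-insensitive; the file contents are not inspected.
    """
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```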

Example

Python

from openai import OpenAI

client = OpenAI(
    base_url="https://api.ryvion.ai/v1",
    api_key="YOUR_API_KEY",
)

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-base",
        file=audio_file,
    )
print(transcript.text)

curl

curl -X POST https://api.ryvion.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@recording.mp3" \
  -F "model=whisper-base"

Response format

With response_format=json:

{
  "text": "Hello, this is a transcription of the audio file."
}

With response_format=verbose_json:

{
  "text": "Hello, this is a transcription of the audio file.",
  "language": "en",
  "duration": 12.5,
  "segments": [
    {
      "start": 0.0,
      "end": 3.2,
      "text": "Hello, this is a transcription"
    },
    {
      "start": 3.2,
      "end": 5.1,
      "text": "of the audio file."
    }
  ]
}
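
The segments array can be turned into timestamped lines with a few lines of code. This sketch works on the parsed JSON body (a plain dict) shaped like the verbose_json sample above; format_timestamp and segment_lines are hypothetical helper names.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as MM:SS.s, rounded to tenths of a second."""
    total_tenths = round(seconds * 10)
    minutes, tenths = divmod(total_tenths, 600)
    return f"{minutes:02d}:{tenths / 10:04.1f}"


def segment_lines(response: dict) -> list:
    """Return one '[start - end] text' line per segment."""
    lines = []
    for seg in response.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return lines


# The sample response from this section.
sample = {
    "text": "Hello, this is a transcription of the audio file.",
    "language": "en",
    "duration": 12.5,
    "segments": [
        {"start": 0.0, "end": 3.2, "text": "Hello, this is a transcription"},
        {"start": 3.2, "end": 5.1, "text": "of the audio file."},
    ],
}

for line in segment_lines(sample):
    print(line)
```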

Available models

Model         Description
whisper-base  OpenAI Whisper base model

Pricing

$0.006 CAD per minute of audio.
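
For a rough estimate, the per-minute rate can be applied to the audio duration in seconds. The sketch assumes billing is strictly proportional to duration, with no rounding or minimum charge, which this page does not confirm; estimate_cost_cad is a hypothetical helper.

```python
PRICE_CAD_PER_MINUTE = 0.006  # rate stated above


def estimate_cost_cad(duration_seconds: float) -> float:
    """Estimate the cost in CAD for audio of the given length.

    Assumption: cost scales linearly with duration; actual billing
    may round up or apply a minimum.
    """
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    return duration_seconds / 60 * PRICE_CAD_PER_MINUTE
```

Under that assumption, a 10-minute recording comes to about $0.06 CAD.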

Features

  • Automatic language detection
  • Segment-level timestamps (with verbose_json)
  • Support for multiple audio formats (mp3, wav, m4a, flac, webm)
  • Cryptographic receipt for each transcription