
REST API Reference

Macaw implements the OpenAI Audio API contract. Existing OpenAI client libraries work without code changes once base_url points at the Macaw server.


Endpoints Overview

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/audio/transcriptions | Transcribe audio to text |
| POST | /v1/audio/translations | Translate audio to English |
| POST | /v1/audio/speech | Generate speech from text |
| GET | /health | Health check |

POST /v1/audio/transcriptions

Transcribe an audio file into text.

Request

| Field | Type | Required | Description |
|-------|------|----------|-------------|
| file | file | Yes | Audio file (WAV, MP3, FLAC, OGG, WebM) |
| model | string | Yes | Model ID (e.g., faster-whisper-large-v3) |
| language | string | No | ISO 639-1 language code |
| prompt | string | No | Context or hot words for the model |
| response_format | string | No | json (default), text, srt, vtt, verbose_json |
| temperature | float | No | Sampling temperature (0.0 to 1.0) |
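
The optional fields above are cheap to check on the client before uploading a large audio file. The following is a minimal sketch, assuming the allowed values are exactly those listed in the table; `build_transcription_fields` is a hypothetical helper, not part of Macaw or the OpenAI SDK, and the server remains the authority on what is valid.

```python
from typing import Optional

# Values copied from the request table above (assumed to be exhaustive).
RESPONSE_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_transcription_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: Optional[float] = None,
) -> dict:
    """Return the non-file form fields for POST /v1/audio/transcriptions."""
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {"model": model, "response_format": response_format}
    # Only include optional fields that the caller actually set.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields
```

The returned dictionary can be sent as multipart form fields alongside the `file` part by whichever HTTP client is in use.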

Examples

Basic transcription:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=faster-whisper-large-v3

With language and format options:

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=faster-whisper-large-v3 \
  -F language=en \
  -F response_format=verbose_json

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Open the file in a context manager so the handle is closed after the upload
with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="faster-whisper-large-v3",
        file=audio_file,
        language="en",
        response_format="verbose_json",
    )
print(result.text)

Response

json format (default):

{
  "text": "Hello, how can I help you today?"
}

verbose_json format:

{
  "task": "transcribe",
  "language": "en",
  "duration": 3.42,
  "text": "Hello, how can I help you today?",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.42,
      "text": "Hello, how can I help you today?"
    }
  ]
}
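
The `segments` array carries per-segment start and end times in seconds. As a rough sketch of consuming a `verbose_json` body, the code below prints one timestamped line per segment. It relies only on the fields shown in the sample response; `format_timestamp` and `format_segments` are hypothetical helper names.

```python
import json
from typing import List


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    minutes, rem_ms = divmod(total_ms, 60_000)
    secs, ms = divmod(rem_ms, 1000)
    return f"{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(body: str) -> List[str]:
    """Return one '[start -> end] text' line per segment in a verbose_json body."""
    payload = json.loads(body)
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text']}"
        for s in payload.get("segments", [])
    ]
```

A body without a `segments` key (for example the default `json` format) yields an empty list rather than an error.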

POST /v1/audio/translations

Translate audio from any supported language to English.

Request

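| Field | Type | Required | Description |
|-------|------|----------|-------------|
| file | file | Yes | Audio file |
| model | string | Yes | Model ID |
| prompt | string | No | Context for the model |
| response_format | string | No | Same options as transcriptions |
| temperature | float | No | Sampling temperature |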

Example

curl -X POST http://localhost:8000/v1/audio/translations \
  -F file=@audio_portuguese.wav \
  -F model=faster-whisper-large-v3

Response

{
  "text": "Hello, how can I help you today?"
}

Note: Translation always outputs English text, regardless of the source language.
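
Because the endpoint follows the OpenAI contract, the SDK's translations call should work the same way as the transcriptions call. The sketch below assumes `client` is an `OpenAI` instance configured with Macaw's base_url, as in the transcription example; `translate_file` is a hypothetical wrapper, not part of either library.

```python
def translate_file(client, path: str, model: str = "faster-whisper-large-v3") -> str:
    """Upload an audio file and return its English translation."""
    # The context manager closes the file handle once the upload finishes.
    with open(path, "rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text
```

Called as `translate_file(client, "audio_portuguese.wav")`, this returns the `text` field of the default json response.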


POST /v1/audio/speech

Generate speech audio from text.

Request

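| Field | Type | Required | Description |
|-------|------|----------|-------------|
| model | string | Yes | TTS model ID (e.g., kokoro-v1) |
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice identifier (e.g., default) |
| response_format | string | No | wav (default) or pcm |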

Examples

Generate WAV file:

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro-v1", "input": "Hello, welcome to Macaw!", "voice": "default"}' \
  --output speech.wav

Generate raw PCM:

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro-v1", "input": "Hello!", "voice": "default", "response_format": "pcm"}' \
  --output speech.pcm
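
Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Stream the audio to disk instead of buffering the whole response in memory
with client.audio.speech.with_streaming_response.create(
    model="kokoro-v1",
    input="Hello, welcome to Macaw!",
    voice="default",
) as response:
    response.stream_to_file("output.wav")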
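
Where the OpenAI SDK is not available, the same JSON request can be issued with the Python standard library. This is a sketch under the assumption that the server listens on localhost:8000 as in the curl examples; `build_speech_request` and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request


def build_speech_request(
    text: str,
    model: str = "kokoro-v1",
    voice: str = "default",
    response_format: str = "wav",
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the JSON POST for /v1/audio/speech without sending it."""
    body = json.dumps(
        {"model": model, "input": text, "voice": voice, "response_format": response_format}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text: str, out_path: str, **kwargs) -> None:
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text, **kwargs)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as out:
        out.write(audio)
```

For example, `synthesize("Hello, welcome to Macaw!", "speech.wav")` mirrors the first curl command. Keeping request construction separate from sending makes the payload easy to inspect without a running server.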

Response

The response body is the audio file in the requested format.

| Format | Content-Type | Description |
|--------|--------------|-------------|
| wav | audio/wav | WAV with headers (default) |
| pcm | audio/pcm | Raw PCM 16-bit, 16 kHz, mono |
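
Raw PCM has no header, so a player needs the sample parameters from the table to interpret it. The sketch below wraps a `pcm` response in a WAV container using Python's `wave` module and computes its duration. It assumes little-endian signed samples, which the table does not state; `pcm_to_wav` and `pcm_duration_seconds` are hypothetical helper names.

```python
import io
import wave

# Parameters from the format table: 16-bit samples, 16 kHz, mono.
SAMPLE_WIDTH_BYTES = 2
SAMPLE_RATE_HZ = 16_000
CHANNELS = 1


def pcm_duration_seconds(pcm: bytes) -> float:
    """Duration implied by the byte count and the fixed sample parameters."""
    return len(pcm) / (SAMPLE_RATE_HZ * SAMPLE_WIDTH_BYTES * CHANNELS)


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM samples in a WAV container (assumes little-endian samples)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(CHANNELS)
        wav_file.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(SAMPLE_RATE_HZ)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

One second of audio in this format is 32,000 bytes (16,000 samples of 2 bytes each), which is a quick sanity check on a downloaded `speech.pcm`.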

GET /health

Returns the runtime health status.

curl http://localhost:8000/health

{
  "status": "ok"
}
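
A common use of this endpoint is waiting for the server to come up before sending audio. The following is a sketch, assuming a healthy server returns the `{"status": "ok"}` body shown above; the attempt count, delay, and the names `fetch_health` and `wait_until_healthy` are illustrative choices, not Macaw settings.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_health(base_url: str) -> dict:
    """GET /health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.loads(resp.read())


def wait_until_healthy(
    base_url: str = "http://localhost:8000",
    attempts: int = 10,
    delay_seconds: float = 1.0,
    fetch: Callable[[str], dict] = fetch_health,
) -> bool:
    """Poll /health until it reports status "ok" or the attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch(base_url).get("status") == "ok":
                return True
        except (urllib.error.URLError, OSError, ValueError):
            # Server not reachable yet, or the body was not valid JSON: retry.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

The `fetch` parameter exists so the polling logic can be exercised without a running server.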

Error Responses

All endpoints return standard HTTP error codes with a JSON body:

{
  "error": {
    "message": "Model 'nonexistent' not found",
    "type": "model_not_found",
    "code": 404
  }
}

| Status | Meaning |
|--------|---------|
| 400 | Invalid request (missing fields, bad format) |
| 404 | Model not found |
| 422 | Validation error |
| 500 | Internal server error |
| 503 | Worker unavailable |
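
On the client side, the JSON error body can be turned into a typed exception. The sketch below parses the documented `error` object and flags which statuses might be worth retrying. Treating only 503 as retryable is an assumption made here, not documented Macaw behavior; `ApiError` and `parse_error` are hypothetical names.

```python
import json

# Assumption: only 503 (worker unavailable) is transient enough to retry.
RETRYABLE_STATUSES = {503}


class ApiError(Exception):
    """Error response from the API, carrying the documented error fields."""

    def __init__(self, status: int, error_type: str, message: str) -> None:
        super().__init__(f"{status} {error_type}: {message}")
        self.status = status
        self.error_type = error_type
        self.message = message

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: bytes) -> ApiError:
    """Build an ApiError from a response, tolerating non-JSON bodies."""
    try:
        error = json.loads(body)["error"]
        return ApiError(status, error.get("type", "unknown"), error.get("message", ""))
    except (ValueError, KeyError, TypeError, AttributeError):
        # Body was not the documented JSON shape; keep the raw text instead.
        return ApiError(status, "unknown", body.decode("utf-8", errors="replace"))
```

A caller would typically raise the returned `ApiError` for any non-2xx response and consult `retryable` before backing off and trying again.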