Macaw OpenVoice

REST API

Macaw implements the OpenAI Audio API contract, so existing OpenAI client libraries work without modification: point base_url at your Macaw server.


Endpoints Overview

| Method | Path | Description |
| --- | --- | --- |
| POST | /v1/audio/transcriptions | Transcribe audio to text |
| POST | /v1/audio/translations | Translate audio to English |
| POST | /v1/audio/speech | Generate speech from text |
| GET | /health | Health check |

POST /v1/audio/transcriptions

Transcribe an audio file into text.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | Audio file (WAV, MP3, FLAC, OGG, WebM) |
| model | string | Yes | Model ID (e.g., faster-whisper-large-v3) |
| language | string | No | ISO 639-1 language code |
| prompt | string | No | Context or hot words for the model |
| response_format | string | No | json (default), text, srt, vtt, verbose_json |
| temperature | float | No | Sampling temperature (0.0 - 1.0) |
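
The constraints in the table can be checked on the client before an upload is attempted. The sketch below is one possible way to do that; the helper name `build_transcription_fields` and the choice to reject out-of-range values locally are illustrative assumptions, not part of Macaw, and the server remains the authority on validation.

```python
from typing import Optional

# Allowed values taken from the request table above.
RESPONSE_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}


def build_transcription_fields(
    model: str,
    language: Optional[str] = None,
    prompt: Optional[str] = None,
    response_format: str = "json",
    temperature: Optional[float] = None,
) -> dict:
    """Return the non-file form fields for POST /v1/audio/transcriptions.

    Hypothetical helper: raises ValueError for values the table rules out.
    """
    if not model:
        raise ValueError("model is required")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    fields = {"model": model, "response_format": response_format}
    # Optional fields are only sent when the caller sets them.
    if language is not None:
        fields["language"] = language
    if prompt is not None:
        fields["prompt"] = prompt
    if temperature is not None:
        fields["temperature"] = str(temperature)
    return fields
```

The returned dictionary holds only the text fields; the audio itself would still be attached as the multipart `file` part by whichever HTTP client sends the request.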

Examples

Basic transcription
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=faster-whisper-large-v3

With language and format options
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=faster-whisper-large-v3 \
  -F language=en \
  -F response_format=verbose_json

Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

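# Open the file in a context manager so the handle is closed after the upload
with open("audio.wav", "rb") as audio_file:
    result = client.audio.transcriptions.create(
        model="faster-whisper-large-v3",
        file=audio_file,
        language="en",
        response_format="verbose_json",
    )
print(result.text)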

Response

json format (default)
{
  "text": "Hello, how can I help you today?"
}

verbose_json format
{
  "task": "transcribe",
  "language": "en",
  "duration": 3.42,
  "text": "Hello, how can I help you today?",
  "segments": [
    {
      "id": 0,
      "start": 0.0,
      "end": 3.42,
      "text": "Hello, how can I help you today?"
    }
  ]
}
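
The `start` and `end` values in each segment are offsets in seconds, which makes the verbose_json payload straightforward to post-process. As an illustration, the sketch below turns a `segments` list into SRT-style cues on the client. The function names are hypothetical, and requesting `response_format=srt` from the server is the more direct route when only subtitles are needed.

```python
def format_srt_timestamp(seconds: float) -> str:
    """Format a second offset as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list) -> str:
    """Render verbose_json segments as SRT cues, numbered from 1."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        text = segment["text"].strip()
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line.
    return "\n".join(cues)
```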

POST /v1/audio/translations

Translate audio from any supported language to English.

Request

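| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | Audio file |
| model | string | Yes | Model ID |
| prompt | string | No | Context for the model |
| response_format | string | No | Same options as transcriptions |
| temperature | float | No | Sampling temperature |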

Example

curl -X POST http://localhost:8000/v1/audio/translations \
  -F file=@audio_portuguese.wav \
  -F model=faster-whisper-large-v3

Response

{
  "text": "Hello, how can I help you today?"
}

Translation always outputs English text, regardless of the source language.
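
For Python callers, the same request can be made through the OpenAI SDK's `audio.translations.create` method. The sketch below wraps that call in a hypothetical helper, `translate_to_english`, which takes the client as an argument so that it is not tied to one configuration. It assumes the client behaves like `openai.OpenAI` and that the result exposes a `text` attribute, as in the default json format.

```python
def translate_to_english(client, path: str, model: str = "faster-whisper-large-v3") -> str:
    """Upload an audio file to the translations endpoint and return the English text.

    `client` is assumed to behave like openai.OpenAI, i.e. it exposes
    client.audio.translations.create(model=..., file=...).
    """
    # The context manager closes the file handle once the upload finishes.
    with open(path, "rb") as audio_file:
        result = client.audio.translations.create(model=model, file=audio_file)
    return result.text
```

With the SDK installed, a call might look like `translate_to_english(OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed"), "audio_portuguese.wav")`.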


POST /v1/audio/speech

Generate speech audio from text.

Request

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | TTS model ID (e.g., kokoro-v1) |
| input | string | Yes | Text to synthesize |
| voice | string | Yes | Voice identifier (e.g., default) |
| response_format | string | No | wav (default) or pcm |

Examples

Generate WAV file
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro-v1", "input": "Hello, welcome to Macaw!", "voice": "default"}' \
  --output speech.wav

Generate raw PCM
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro-v1", "input": "Hello!", "voice": "default", "response_format": "pcm"}' \
  --output speech.pcm

Python (OpenAI SDK)
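from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Stream the synthesized audio to disk instead of buffering it in memory
with client.audio.speech.with_streaming_response.create(
    model="kokoro-v1",
    input="Hello, welcome to Macaw!",
    voice="default",
) as response:
    response.stream_to_file("output.wav")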

Response

The response body is the audio file in the requested format.

| Format | Content-Type | Description |
| --- | --- | --- |
| wav | audio/wav | WAV with headers (default) |
| pcm | audio/pcm | Raw PCM 16-bit, 16 kHz, mono |
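
Because the pcm format carries no header, most players cannot open it directly. One way to make a saved PCM response playable is to wrap it in a WAV container with Python's standard `wave` module, as sketched below. The helper name `pcm_to_wav` is hypothetical, and its defaults simply restate the parameters from the table (16-bit, 16 kHz, mono).

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 16000, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container.

    Defaults mirror the table above: 16-bit (2 bytes per sample), 16 kHz, mono.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

For example, the bytes of speech.pcm from the curl command above could be passed to `pcm_to_wav` and the result written to a .wav file.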

GET /health

Returns the runtime health status.

curl http://localhost:8000/health

Response

{
  "status": "ok"
}
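
A common use of this endpoint is to block until the server is ready before sending audio. The sketch below polls the endpoint with the standard library and treats only a body of `{"status": "ok"}` as healthy. The function names, the retry count, and the delay are illustrative assumptions, not Macaw defaults.

```python
import json
import time
import urllib.request


def fetch_health(url: str):
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 1.0, fetch=fetch_health) -> bool:
    """Poll until the body reports {"status": "ok"} or the attempts run out."""
    for attempt in range(attempts):
        try:
            body = fetch(url)
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection errors, timeouts, HTTP errors, or a non-JSON body: try again.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

The `fetch` parameter exists so the polling logic can be exercised without a running server; in normal use it can be left at its default, e.g. `wait_until_healthy("http://localhost:8000/health")`.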

Error Responses

All endpoints return standard HTTP error codes with a JSON body:

{
  "error": {
    "message": "Model 'nonexistent' not found",
    "type": "model_not_found",
    "code": 404
  }
}

| Status | Meaning |
| --- | --- |
| 400 | Invalid request (missing fields, bad format) |
| 404 | Model not found |
| 422 | Validation error |
| 500 | Internal server error |
| 503 | Worker unavailable |
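
Clients that call the API without an SDK can decode this body into a small structure and decide whether a retry makes sense. The sketch below is one way to do so. The names `ErrorInfo` and `parse_error` are hypothetical, and treating only 503 as retryable is an assumption of this example, not documented Macaw behavior.

```python
import json
from dataclasses import dataclass

# Assumption for this sketch: only 503 (worker unavailable) is worth retrying.
RETRYABLE_STATUSES = frozenset({503})


@dataclass(frozen=True)
class ErrorInfo:
    status: int
    type: str
    message: str

    @property
    def retryable(self) -> bool:
        return self.status in RETRYABLE_STATUSES


def parse_error(status: int, body: bytes) -> ErrorInfo:
    """Extract the "error" object from a JSON error body.

    Falls back to placeholder values if the body is not in the documented shape.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        payload = None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        error = {}
    return ErrorInfo(
        status=status,
        type=str(error.get("type", "unknown")),
        message=str(error.get("message", "")),
    )
```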