MacawMacaw OpenVoice
Getting Started

Quickstart

Get from zero to your first transcription in under 5 minutes.


Step 1: Install

pip install macaw-openvoice[server,grpc,faster-whisper]

If you plan to use TTS as well, add the kokoro extra:

pip install macaw-openvoice[server,grpc,faster-whisper,kokoro]

Step 2: Start the Server

macaw serve

You should see output like this:

  ╔═══════════════════════════════════════╗
  ║       Macaw OpenVoice v1.0.0         ║
  ╚═══════════════════════════════════════╝

INFO  Scanning models in ~/.macaw/models
INFO  Found 2 model(s): faster-whisper-tiny (STT), kokoro-v1 (TTS)
INFO  Spawning STT worker   port=50051  engine=faster-whisper
INFO  Spawning TTS worker   port=50052  engine=kokoro
INFO  Scheduler started     aging=30.0s  batch_ms=75.0  batch_max=8
INFO  Uvicorn running on http://127.0.0.1:8000

The server is now ready to accept requests on port 8000.


Step 3: Transcribe Audio

Via REST API (curl)

curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F file=@audio.wav \
  -F model=faster-whisper-large-v3

Response:

{
  "text": "Hello, how can I help you today?"
}

Via CLI

macaw transcribe audio.wav --model faster-whisper-large-v3

Via OpenAI Python SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

result = client.audio.transcriptions.create(
    model="faster-whisper-large-v3",
    file=open("audio.wav", "rb"),
)
print(result.text)

Macaw implements the OpenAI Audio API contract, so any OpenAI-compatible client library works without modification. Just change the base_url.


Step 4: Try Real-Time Streaming

Connect via WebSocket for live transcription:

wscat -c "ws://localhost:8000/v1/realtime?model=faster-whisper-large-v3"

Send binary PCM audio frames and receive JSON transcript events:

{"type": "transcript.partial", "text": "Hello how"}
{"type": "transcript.final", "text": "Hello, how can I help you today?"}

See the Streaming STT guide for the full protocol.


Step 5: Text-to-Speech

Generate speech from text:

curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"model": "kokoro-v1", "input": "Hello, welcome to Macaw!", "voice": "default"}' \
  --output speech.wav

Or use the OpenAI SDK:

response = client.audio.speech.create(
    model="kokoro-v1",
    input="Hello, welcome to Macaw!",
    voice="default",
)
response.stream_to_file("output.wav")

What's Next?

Want to...Read...
Stream audio in real timeStreaming STT
Build a voice assistant with STT + TTSFull-Duplex Guide
Transcribe files in batchBatch Transcription
Understand the architectureArchitecture Overview
Add a new engineAdding an Engine
Use the CLICLI Reference