Open-Source Voice Runtime

Build voice apps in minutes, not months

Macaw OpenVoice is a production-ready runtime for real-time speech-to-text and text-to-speech. Drop-in OpenAI API compatibility, streaming WebSocket support, and multi-engine architecture — all in a single Python process.

Python 3.11+ · Apache 2.0 · 1600+ tests
$ pip install "macaw-openvoice[server,grpc,faster-whisper]"

$ macaw serve
  ╔══════════════════════════════════════╗
  ║        Macaw OpenVoice v1.0.0        ║
  ╚══════════════════════════════════════╝
  INFO  Found 2 model(s)
  INFO  STT worker ready   port=50051
  INFO  TTS worker ready   port=50052
  INFO  Uvicorn running on http://127.0.0.1:8000

$ curl -X POST localhost:8000/v1/audio/transcriptions \
    -F file=@audio.wav -F model=faster-whisper-tiny

{"text": "Hello, how can I help you today?"}

Everything you need for voice

A single runtime that handles the entire voice pipeline — from raw audio to structured text and back.

🎙️

Streaming STT

Real-time partial and final transcripts via WebSocket with sub-300ms TTFB and backpressure control.

🔊

Text-to-Speech

OpenAI-compatible speech endpoint with streaming PCM or WAV output and low time-to-first-byte.

🔁

Full-Duplex

Simultaneous STT and TTS on one WebSocket connection with automatic mute-on-speak safety.

🛡️

Session Manager

6-state machine with ring buffer, WAL-based crash recovery, and zero segment duplication.

⚙️

Multi-Engine

Faster-Whisper, WeNet, and Kokoro through a single interface. Add new engines in ~500 lines.

🔗

Voice Pipeline

Preprocessing, Silero VAD, ITN post-processing, and Prometheus metrics — all built in.
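Partial transcripts are interim and get superseded by the final transcript for the same segment, so a streaming client only accumulates finals. A minimal sketch of that folding step — the JSON event shape here (`type` and `text` fields) is an assumption for illustration, not Macaw's documented wire protocol:

```python
import json


def collect_final(messages) -> str:
    """Fold a stream of transcript events into the final text.

    Assumes each event is JSON like {"type": "partial"|"final", "text": ...}.
    Partials are display-only and superseded later, so only finals are kept.
    """
    finals = []
    for raw in messages:
        event = json.loads(raw)
        if event.get("type") == "final":
            finals.append(event["text"])
    return " ".join(finals)
```

In a real client this function would sit inside the WebSocket receive loop, appending each final segment as it arrives.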

OpenAI SDK compatible

Existing OpenAI client libraries work out of the box. Just point base_url to your Macaw server.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

result = client.audio.transcriptions.create(
    model="faster-whisper-tiny",
    file=open("audio.wav", "rb"),
)
print(result.text)
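The speech endpoint works the same way over plain HTTP. A dependency-free sketch using only the standard library — the `kokoro` model name and `default` voice are placeholders; use whatever models `macaw serve` reports for your install:

```python
import json
import urllib.request


def speech_request(text: str, base_url: str = "http://localhost:8000/v1") -> urllib.request.Request:
    """Build a POST /v1/audio/speech request (OpenAI-compatible shape).

    Model and voice values below are placeholders, not guaranteed names.
    """
    body = json.dumps({
        "model": "kokoro",            # placeholder model id
        "voice": "default",           # placeholder voice id
        "input": text,
        "response_format": "wav",
    }).encode()
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# With the server running:
# with urllib.request.urlopen(speech_request("Hello!")) as resp:
#     open("speech.wav", "wb").write(resp.read())
```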

Architecture at a glance

A single runtime orchestrates isolated gRPC workers per engine. Workers crash independently — the runtime recovers automatically.

              Clients (REST / WebSocket / CLI)
                          │
              ┌───────────┴───────────┐
              │     API Server        │
              │  (FastAPI + Uvicorn)  │
              └───────────┬───────────┘
                          │
              ┌───────────┴───────────┐
              │      Scheduler        │
              │  Priority · Batching  │
              │  Cancellation · TTFB  │
              └─────┬─────────┬───────┘
                    │         │
           ┌────────┴──┐  ┌───┴────────┐
           │ STT Worker │  │ TTS Worker │
           │  (gRPC)    │  │  (gRPC)    │
           ├────────────┤  ├────────────┤
           │ Faster-    │  │ Kokoro     │
           │ Whisper    │  └────────────┘
           │ WeNet      │
           └────────────┘