Open-Source Voice Runtime

Build voice apps in minutes, not months

Macaw OpenVoice is a production-ready runtime for real-time speech-to-text and text-to-speech. Drop-in OpenAI API compatibility, streaming WebSocket support, and multi-engine architecture — all in a single Python process.

Python 3.11+ · Apache 2.0 · 1600+ tests
$ pip install macaw-openvoice[server,grpc,faster-whisper]
Successfully installed macaw-openvoice-1.0.0
$ macaw serve
Macaw OpenVoice v1.0.0
INFO Found 2 model(s)
INFO STT worker ready  port=50051
INFO TTS worker ready  port=50052
INFO Uvicorn running on http://127.0.0.1:8000
$ curl -X POST localhost:8000/v1/audio/transcriptions \
-F file=@audio.wav -F model=faster-whisper-tiny
{"text": "Hello, how can I help you today?"}
$

Everything you need for voice

A single runtime that handles the entire voice pipeline — from raw audio to structured text and back.

Streaming STT

Real-time partial and final transcripts over WebSocket, with sub-300 ms time-to-first-byte (TTFB) and backpressure control.
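Before streaming, a client has to slice raw PCM into fixed-size frames. A minimal sketch of that step (the 16 kHz / 20 ms figures are common defaults, not documented Macaw requirements):

```python
# Illustrative only: split 16 kHz mono 16-bit PCM into 20 ms frames for
# streaming. Frame size and sample rate are typical choices, not Macaw
# requirements.

SAMPLE_RATE = 16_000      # samples per second
SAMPLE_WIDTH = 2          # bytes per 16-bit sample
FRAME_MS = 20             # frame duration in milliseconds
FRAME_BYTES = SAMPLE_RATE * SAMPLE_WIDTH * FRAME_MS // 1000  # 640 bytes

def frames(pcm: bytes):
    """Yield fixed-size frames; the final partial frame is zero-padded."""
    for off in range(0, len(pcm), FRAME_BYTES):
        chunk = pcm[off:off + FRAME_BYTES]
        if len(chunk) < FRAME_BYTES:
            chunk = chunk + b"\x00" * (FRAME_BYTES - len(chunk))
        yield chunk
```

One second of audio yields exactly 50 such frames, which makes backpressure accounting straightforward.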

Text-to-Speech

OpenAI-compatible speech endpoint with streaming PCM or WAV output and low time-to-first-byte.

Full-Duplex

Simultaneous STT and TTS on one WebSocket connection with automatic mute-on-speak safety.

Session Manager

6-state machine with ring buffer, WAL-based crash recovery, and zero segment duplication.
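A bounded ring buffer is the piece that keeps a session's memory flat while absorbing bursts. A minimal sketch (the capacity and drop-oldest policy here are assumptions, not Macaw's documented behavior):

```python
# Illustrative fixed-capacity ring buffer for audio frames: when full,
# the oldest frame is dropped so memory stays bounded under bursts.
from collections import deque

class FrameRing:
    def __init__(self, capacity: int):
        self._buf = deque(maxlen=capacity)  # drops oldest when full

    def push(self, frame: bytes):
        self._buf.append(frame)

    def drain(self) -> list[bytes]:
        """Return all buffered frames and reset the buffer."""
        frames, self._buf = list(self._buf), deque(maxlen=self._buf.maxlen)
        return frames
```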

Multi-Engine

Faster-Whisper, WeNet, and Kokoro through a single interface. Add new engines in ~500 lines.
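An engine abstraction along these lines is what makes a ~500-line integration plausible: implement a couple of methods and the runtime handles the rest. The names below (`STTEngine`, `transcribe`) are hypothetical, not Macaw's actual interface:

```python
# Hypothetical sketch of a minimal engine contract. Names and signatures
# are illustrative, not Macaw's real plugin API.
from abc import ABC, abstractmethod

class STTEngine(ABC):
    @abstractmethod
    def load(self, model_path: str) -> None: ...

    @abstractmethod
    def transcribe(self, pcm: bytes, sample_rate: int) -> str: ...

class EchoEngine(STTEngine):
    """Toy engine used only to show the shape of the contract."""
    def load(self, model_path: str) -> None:
        self.model_path = model_path

    def transcribe(self, pcm: bytes, sample_rate: int) -> str:
        return f"{len(pcm)} bytes at {sample_rate} Hz"
```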

Voice Pipeline

Preprocessing, Silero VAD, ITN post-processing, and Prometheus metrics — all built in.
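To make the VAD stage concrete, here is a toy energy-threshold detector. The runtime itself uses the neural Silero VAD; this stand-in only shows where a speech/non-speech gate sits in the pipeline:

```python
# Toy energy-based voice activity detection: a stand-in for the neural
# Silero VAD, shown only to illustrate the pipeline stage. The threshold
# is an arbitrary example value.
import math
import struct

def rms(frame: bytes) -> float:
    """Root-mean-square level of a 16-bit little-endian PCM frame."""
    samples = struct.unpack(f"<{len(frame) // 2}h", frame)
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    return rms(frame) >= threshold
```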

Drop-in Replacement

OpenAI SDK compatible

Existing OpenAI client libraries work out of the box. Change one line and your code talks to Macaw instead.

  • /v1/audio/transcriptions
  • /v1/audio/speech
  • /v1/audio/translations
API Reference
app.py
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

result = client.audio.transcriptions.create(
    model="faster-whisper-tiny",
    file=open("audio.wav", "rb"),
)
print(result.text)
# Only base_url changes — everything else stays the same

How It Works

Architecture at a glance

A single runtime orchestrates isolated gRPC workers per engine. Workers crash independently — the runtime recovers automatically.

  • Clients: REST, WebSocket, CLI
  • Orchestration: API Server (FastAPI + Uvicorn) and Scheduler (priority · batching · TTFB), running in-process
  • STT: Faster-Whisper and WeNet workers, connected over gRPC
  • TTS: Kokoro worker, connected over gRPC
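"Workers crash independently, the runtime recovers" boils down to a supervision loop around each worker process. A simplified sketch (the retry limit and backoff are assumptions, not documented Macaw settings):

```python
# Illustrative supervision loop: restart a crashed worker a bounded
# number of times. Retry limit and backoff are example values only.
import time

def supervise(start_worker, max_restarts: int = 3, backoff_s: float = 0.0) -> int:
    """Run start_worker(); on exception, restart up to max_restarts times.

    Returns the number of restarts that were needed; re-raises once the
    restart budget is exhausted.
    """
    restarts = 0
    while True:
        try:
            start_worker()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            restarts += 1
            time.sleep(backoff_s)
```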