Open-Source Voice Runtime
Build voice apps in minutes, not months
Macaw OpenVoice is a production-ready runtime for real-time speech-to-text and text-to-speech. Drop-in OpenAI API compatibility, streaming WebSocket support, and multi-engine architecture — all in a single Python process.
Everything you need for voice
A single runtime that handles the entire voice pipeline — from raw audio to structured text and back.
Streaming STT
Real-time partial and final transcripts via WebSocket with sub-300 ms time-to-first-byte and backpressure control.
Text-to-Speech
OpenAI-compatible speech endpoint with streaming PCM or WAV output and low time-to-first-byte.
Full-Duplex
Simultaneous STT and TTS on one WebSocket connection with automatic mute-on-speak safety.
Session Manager
6-state machine with ring buffer, WAL-based crash recovery, and zero segment duplication.
Multi-Engine
Faster-Whisper, WeNet, and Kokoro through a single interface. Add new engines in ~500 lines.
Voice Pipeline
Preprocessing, Silero VAD, ITN post-processing, and Prometheus metrics — all built in.
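The streaming STT and full-duplex cards above imply a small wire protocol of JSON frames over one WebSocket. As an illustration only, here is a minimal sketch of what the client-side framing might look like; the message shapes (`config`, `partial`, `final` and their fields) are assumptions for this sketch, not Macaw's documented protocol.

```python
import json

def build_config_frame(sample_rate: int = 16000, language: str = "en") -> str:
    """Hypothetical first frame telling the server how to decode raw audio."""
    return json.dumps({"type": "config", "sample_rate": sample_rate, "language": language})

def classify_transcript(frame: str) -> tuple[str, str]:
    """Split an incoming JSON frame into (kind, text).

    kind is 'partial' for interim hypotheses that may still change,
    'final' for committed segments that will not be revised.
    """
    msg = json.loads(frame)
    return msg["type"], msg.get("text", "")

# Example: an interim hypothesis arriving mid-utterance.
kind, text = classify_transcript('{"type": "partial", "text": "hello wor"}')
print(kind, text)  # partial hello wor
```

In a real client these helpers would sit inside an async receive loop; partial frames update the UI in place, while final frames are appended to the transcript.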
Drop-in Replacement
OpenAI SDK compatible
Existing OpenAI client libraries work out of the box. Change one line and your code talks to Macaw instead.
/v1/audio/transcriptions · /v1/audio/speech · /v1/audio/translations
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
)

with open("audio.wav", "rb") as audio:
    result = client.audio.transcriptions.create(
        model="faster-whisper-tiny",
        file=audio,
    )
print(result.text)
```
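The same base URL serves the speech endpoint. A standard-library sketch of the TTS direction, assuming the endpoint accepts an OpenAI-style JSON body (`model`, `voice`, `input`, `response_format`) and returns audio bytes; the model and voice names here are placeholders, not confirmed Macaw identifiers.

```python
import json
import urllib.request

def build_speech_request(base_url: str, text: str) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /audio/speech endpoint."""
    body = json.dumps({
        "model": "kokoro",        # placeholder engine name
        "voice": "default",       # placeholder voice id
        "input": text,
        "response_format": "wav",
    }).encode()
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    req = build_speech_request("http://localhost:8000/v1", "Hello from Macaw")
    # Sending it requires a running Macaw instance:
    # with urllib.request.urlopen(req) as resp, open("out.wav", "wb") as f:
    #     f.write(resp.read())
    print(req.full_url)
```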
How It Works
Architecture at a glance
A single runtime orchestrates isolated gRPC workers per engine. Workers crash independently — the runtime recovers automatically.
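The crash-recovery behavior described above follows a classic supervisor pattern: watch each worker process and restart it when it dies. A minimal sketch of that pattern, not Macaw's actual implementation; the restart limit and backoff are made-up parameters.

```python
import subprocess
import sys
import time

def supervise(cmd: list[str], max_restarts: int = 3, backoff: float = 0.0) -> int:
    """Run a worker process, restarting it whenever it exits non-zero.

    Returns the number of restarts performed (illustrative policy).
    """
    restarts = 0
    while True:
        proc = subprocess.run(cmd)
        if proc.returncode == 0:
            return restarts          # clean exit: stop supervising
        if restarts >= max_restarts:
            return restarts          # give up after the limit
        restarts += 1
        time.sleep(backoff)          # optional delay between restarts

# A worker that always crashes gets restarted up to the limit, then dropped.
crashing_worker = [sys.executable, "-c", "raise SystemExit(1)"]
print(supervise(crashing_worker, max_restarts=2))  # → 2
```

Because each engine runs in its own gRPC worker process, a crash in one engine never takes down the runtime or the other engines; the supervisor simply spawns a fresh worker.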