Configuration

Macaw OpenVoice uses a combination of model manifests, runtime defaults, and environment variables for configuration.


Model Manifests

Each engine model is described by a macaw.yaml manifest file. This file declares the model's capabilities and how the runtime should interact with it.

Example: macaw.yaml for Faster-Whisper
name: faster-whisper-large-v3
type: stt
engine: faster-whisper
architecture: encoder-decoder
languages:
  - en
  - pt
  - es
options:
  beam_size: 5
  vad_filter: false      # VAD is handled by the runtime, not the engine
  word_timestamps: false

Key Fields

| Field | Type | Description |
| --- | --- | --- |
| name | string | Unique model identifier |
| type | string | stt or tts |
| engine | string | Engine backend (faster-whisper, wenet, kokoro) |
| architecture | string | encoder-decoder, ctc, or streaming-native |
| languages | list | Supported language codes |
| options | dict | Engine-specific configuration |

Always set vad_filter: false in your manifest. The VAD is managed by the Macaw runtime -- enabling the engine's internal VAD would duplicate the work and cause unpredictable behavior.
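
The rules above can be checked mechanically before a model is deployed. The following is a minimal sketch, not Macaw's own validator: `check_manifest` is a hypothetical helper, the allowed values come from the Key Fields table, and treating every field except `options` as required is an assumption. The dictionary mirrors what a YAML parser such as PyYAML's `yaml.safe_load` would return for the Faster-Whisper example.

```python
# Allowed values are taken from the Key Fields table.
VALID_TYPES = {"stt", "tts"}
VALID_ARCHITECTURES = {"encoder-decoder", "ctc", "streaming-native"}
# Assumption: everything except `options` is required.
REQUIRED_FIELDS = ("name", "type", "engine", "architecture", "languages")


def check_manifest(manifest: dict) -> list:
    """Return a list of problems; an empty list means no issue was found."""
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in manifest]

    if "type" in manifest and manifest["type"] not in VALID_TYPES:
        problems.append(f"type must be one of {sorted(VALID_TYPES)}")

    if "architecture" in manifest and manifest["architecture"] not in VALID_ARCHITECTURES:
        problems.append(f"architecture must be one of {sorted(VALID_ARCHITECTURES)}")

    if "languages" in manifest and not isinstance(manifest["languages"], list):
        problems.append("languages must be a list of language codes")

    # The runtime owns VAD, so the engine's internal VAD must stay off.
    options = manifest.get("options") or {}
    if options.get("vad_filter", False):
        problems.append("options.vad_filter must be false")

    return problems


# Parsed form of the Faster-Whisper manifest shown earlier.
manifest = {
    "name": "faster-whisper-large-v3",
    "type": "stt",
    "engine": "faster-whisper",
    "architecture": "encoder-decoder",
    "languages": ["en", "pt", "es"],
    "options": {"beam_size": 5, "vad_filter": False, "word_timestamps": False},
}

for problem in check_manifest(manifest):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a manifest at once.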


Runtime Configuration

Runtime behavior is controlled through server startup options:

Start with custom settings
macaw serve --host 0.0.0.0 --port 8000

Server Options

| Option | Default | Description |
| --- | --- | --- |
| --host | 127.0.0.1 | Bind address |
| --port | 8000 | HTTP port |
| --workers | 1 | Uvicorn workers |

Scheduler Settings

The scheduler manages request prioritization and batching:

| Setting | Default | Description |
| --- | --- | --- |
| Aging timeout | 30.0s | Max time a request waits in queue |
| Batch window | 75ms | Time window to accumulate batch requests |
| Batch max size | 8 | Maximum requests per batch |

Streaming WebSocket requests bypass the scheduler entirely -- they use a direct gRPC streaming connection for minimum latency.
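
To make the interaction between the batch window and the batch max size concrete, here is a small illustrative sketch. It is not Macaw's scheduler: `group_into_batches` is a hypothetical function, and it assumes a batch opens with the first waiting request and closes once the window has elapsed or the batch is full. Aging and prioritization are left out.

```python
def group_into_batches(arrivals_ms, window_ms=75.0, max_size=8):
    """Group sorted arrival timestamps (in ms) into batches.

    A batch opens with the first waiting request. It closes when a new
    request arrives after the window has elapsed, or when it is full.
    """
    batches = []
    current = []
    for t in arrivals_ms:
        window_elapsed = bool(current) and t - current[0] >= window_ms
        batch_full = len(current) >= max_size
        if window_elapsed or batch_full:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Six requests arriving over 200 ms, using the default 75 ms window.
batches = group_into_batches([0, 10, 20, 80, 90, 200])
for batch in batches:
    print(batch)
```

With the defaults from the table, a burst of requests is split by size first (at most 8 per batch), while sparse traffic is split by time (a request arriving 75 ms or more after the batch opened starts a new one).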


Pipeline Configuration

Preprocessing

The audio preprocessing pipeline runs before VAD and is not configurable per-request -- it ensures all audio reaches the VAD and engine in a consistent format:

  1. Resample to 16 kHz mono
  2. DC removal (high-pass filter)
  3. Gain normalization
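
The three stages above can be sketched in plain Python. This is an illustration of the general techniques, not Macaw's implementation: the function names are hypothetical, the resampler is a naive linear interpolator without an anti-aliasing filter, the high-pass stage is a standard one-pole DC blocker, and the gain stage is simple peak normalization with an assumed target of 0.9.

```python
import math

TARGET_RATE = 16_000  # Hz, from step 1 of the pipeline


def to_mono(frames):
    """Average the channels of each frame (a tuple of floats) into one sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate          # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def remove_dc(samples, r=0.995):
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    out = []
    prev_x = prev_y = 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def normalize_peak(samples, target_peak=0.9):
    """Scale the signal so its largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(frames, src_rate):
    """Run the stages in the documented order."""
    mono = resample_linear(to_mono(frames), src_rate)
    return normalize_peak(remove_dc(mono))


# One second of 48 kHz stereo: a 440 Hz tone riding on a 0.3 DC offset.
frames = [
    (0.3 + 0.2 * math.sin(2 * math.pi * 440 * n / 48_000),) * 2
    for n in range(48_000)
]
audio = preprocess(frames, 48_000)
print(len(audio), "samples at", TARGET_RATE, "Hz")
```

Running the stages in a fixed order matters: removing the DC offset before normalizing keeps the offset from consuming headroom that should go to the speech signal.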

VAD Settings

VAD can be configured per WebSocket session via the session.configure command:

WebSocket session configuration
{
  "type": "session.configure",
  "vad": {
    "sensitivity": "normal"
  },
  "language": "en",
  "hot_words": ["Macaw", "OpenVoice"]
}

| VAD Setting | Options | Description |
| --- | --- | --- |
| sensitivity | high, normal, low | Controls speech detection threshold |
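
A client can build the session.configure message programmatically and reject an invalid sensitivity before anything is sent. The sketch below only constructs the JSON text; `build_session_configure` is a hypothetical helper, and treating `language` and `hot_words` as optional is an assumption. Sending the message depends on the WebSocket client library and endpoint in use, which are not covered here.

```python
import json

# Values from the VAD settings table.
VALID_SENSITIVITIES = ("high", "normal", "low")


def build_session_configure(sensitivity="normal", language=None, hot_words=None):
    """Return the session.configure command as a JSON string."""
    if sensitivity not in VALID_SENSITIVITIES:
        raise ValueError(
            f"sensitivity must be one of {VALID_SENSITIVITIES}, got {sensitivity!r}"
        )
    message = {"type": "session.configure", "vad": {"sensitivity": sensitivity}}
    # Assumption: these fields may be omitted when not needed.
    if language is not None:
        message["language"] = language
    if hot_words:
        message["hot_words"] = list(hot_words)
    return json.dumps(message)


payload = build_session_configure("normal", language="en", hot_words=["Macaw", "OpenVoice"])
print(payload)
```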

Post-Processing (ITN)

Inverse Text Normalization converts spoken numbers and patterns to their written form. It is applied only to final transcripts, never to partials.

| Input | Output |
| --- | --- |
| "two thousand twenty six" | "2026" |
| "ten dollars and fifty cents" | "$10.50" |
| "one two three four" | "1234" |

ITN requires the itn extra: pip install macaw-openvoice[itn]. If not installed, transcripts are returned as-is (fail-open behavior).
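
The two rules in this section (finals only, and fail-open when the extra is missing) follow a common optional-dependency pattern. The sketch below shows that pattern in isolation and makes no claim about Macaw's internals: `hypothetical_itn_backend` and its `normalize` function are placeholder names, not a real package.

```python
import importlib


def load_normalizer(module_name="hypothetical_itn_backend"):
    """Return the backend's normalize function, or None if it is not installed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # fail open: the caller falls back to raw text
    return getattr(module, "normalize", None)


def postprocess(text, is_final, normalizer):
    """Apply ITN to final transcripts only; partials pass through untouched."""
    if not is_final or normalizer is None:
        return text
    return normalizer(text)


normalizer = load_normalizer()
print(postprocess("two thousand twenty six", is_final=True, normalizer=normalizer))
```

Because the placeholder backend is not installed, `load_normalizer` returns `None` and the transcript is printed unchanged, which is the fail-open path.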


Environment Variables

All runtime settings can be set through MACAW_* environment variables: 66 variables across 12 groups, all validated at startup.

See the Environment Variables Reference for the complete list with defaults, types, ranges, and deployment examples.

Minimal .env for production
MACAW_HOST=0.0.0.0
MACAW_PORT=8000
MACAW_MODELS_DIR=/opt/macaw/models
MACAW_LOG_FORMAT=json
MACAW_CORS_ORIGINS=https://app.example.com
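
Validation at startup means a bad value stops the process before it serves traffic. The following sketch shows that idea for two of the variables above. It is illustrative only: `ServerSettings` and `load_settings` are hypothetical names, and the fallback defaults are assumed to match the server option defaults (127.0.0.1 and 8000).

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSettings:
    host: str
    port: int


def load_settings(env=None):
    """Read and validate MACAW_HOST and MACAW_PORT from a mapping."""
    env = os.environ if env is None else env
    host = env.get("MACAW_HOST", "127.0.0.1")  # assumed default
    raw_port = env.get("MACAW_PORT", "8000")   # assumed default
    try:
        port = int(raw_port)
    except ValueError:
        raise ValueError(f"MACAW_PORT must be an integer, got {raw_port!r}") from None
    if not 1 <= port <= 65535:
        raise ValueError(f"MACAW_PORT must be between 1 and 65535, got {port}")
    return ServerSettings(host=host, port=port)


settings = load_settings({"MACAW_HOST": "0.0.0.0", "MACAW_PORT": "8000"})
print(settings)
```

Accepting the environment as a parameter keeps the function easy to test without mutating the real process environment.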

Next Steps