Configuration

Macaw OpenVoice is configured through three mechanisms: model manifests, runtime settings (server startup options and built-in defaults), and environment variables.


Model Manifests

Each engine model is described by a macaw.yaml manifest file. This file declares the model's capabilities and how the runtime should interact with it.

Example: macaw.yaml for Faster-Whisper
```yaml
name: faster-whisper-large-v3
type: stt
engine: faster-whisper
architecture: encoder-decoder
languages:
  - en
  - pt
  - es
options:
  beam_size: 5
  vad_filter: false  # VAD is handled by the runtime, not the engine
  word_timestamps: false
```

Key Fields

| Field | Type | Description |
|---|---|---|
| `name` | string | Unique model identifier |
| `type` | string | `stt` or `tts` |
| `engine` | string | Engine backend (`faster-whisper`, `wenet`, `kokoro`) |
| `architecture` | string | `encoder-decoder`, `ctc`, or `streaming-native` |
| `languages` | list | Supported language codes |
| `options` | dict | Engine-specific configuration |
Warning: Always set `vad_filter: false` in your manifest. VAD is managed by the Macaw runtime; enabling the engine's internal VAD duplicates that work and can cause unpredictable behavior.
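The field rules in the table and the `vad_filter` warning can be checked before a model is registered. The sketch below validates an already-parsed manifest dict. `validate_manifest` is a hypothetical helper, not Macaw's own loader. Treating `name`, `type`, and `engine` as required and `architecture` as optional is an assumption, since the page does not say which fields are mandatory.

```python
"""Minimal macaw.yaml check (hypothetical helper, not Macaw's own loader)."""

ALLOWED_TYPES = {"stt", "tts"}
ALLOWED_ARCHITECTURES = {"encoder-decoder", "ctc", "streaming-native"}
REQUIRED_FIELDS = ("name", "type", "engine")  # assumed required; the page does not say


def validate_manifest(manifest: dict) -> list:
    """Return a list of problems found in an already-parsed manifest dict."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not isinstance(manifest.get(field), str):
            problems.append(f"'{field}' must be a string")

    if manifest.get("type") not in ALLOWED_TYPES:
        problems.append(f"'type' must be one of {sorted(ALLOWED_TYPES)}")

    arch = manifest.get("architecture")
    if arch is not None and arch not in ALLOWED_ARCHITECTURES:
        problems.append(f"unknown architecture {arch!r}")

    languages = manifest.get("languages", [])
    if not isinstance(languages, list) or not all(isinstance(c, str) for c in languages):
        problems.append("'languages' must be a list of language codes")

    options = manifest.get("options", {})
    if not isinstance(options, dict):
        problems.append("'options' must be a mapping")
    elif options.get("vad_filter", False):
        # The runtime owns VAD; an engine-side VAD would duplicate it.
        problems.append("options.vad_filter must be false: VAD is handled by the runtime")
    return problems


if __name__ == "__main__":
    manifest = {
        "name": "faster-whisper-large-v3",
        "type": "stt",
        "engine": "faster-whisper",
        "architecture": "encoder-decoder",
        "languages": ["en", "pt", "es"],
        "options": {"beam_size": 5, "vad_filter": False, "word_timestamps": False},
    }
    print(validate_manifest(manifest))  # []
```

To check a file on disk, parse it first, for example with PyYAML's `yaml.safe_load`, and pass the resulting dict to the function.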


Runtime Configuration

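Server behavior is set with `macaw serve` startup options:

Start with custom settings
```bash
# 0.0.0.0 listens on all network interfaces; the default 127.0.0.1 accepts local connections only
macaw serve --host 0.0.0.0 --port 8000
```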

Server Options

| Option | Default | Description |
|---|---|---|
| `--host` | `127.0.0.1` | Bind address |
| `--port` | `8000` | HTTP port |
| `--workers` | `1` | Number of Uvicorn worker processes |
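When Macaw is started from a process supervisor or a test harness, it can help to build the `macaw serve` command in Python. This sketch uses only the three flags from the Server Options table and omits any flag whose value matches the documented default, which keeps the generated command short. `serve_argv` is a hypothetical helper name.

```python
import shlex

# Documented defaults for `macaw serve` (Server Options table).
DEFAULTS = {"host": "127.0.0.1", "port": 8000, "workers": 1}


def serve_argv(host: str = DEFAULTS["host"], port: int = DEFAULTS["port"],
               workers: int = DEFAULTS["workers"]) -> list:
    """Build the argv for `macaw serve`, passing only non-default options."""
    argv = ["macaw", "serve"]
    for flag, value in (("host", host), ("port", port), ("workers", workers)):
        if value != DEFAULTS[flag]:
            argv += [f"--{flag}", str(value)]
    return argv


if __name__ == "__main__":
    cmd = serve_argv(host="0.0.0.0")
    print(shlex.join(cmd))  # macaw serve --host 0.0.0.0
```

To actually start the server, pass the list to `subprocess.run(cmd, check=True)`.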

Scheduler Settings

The scheduler manages request prioritization and batching:

| Setting | Default | Description |
|---|---|---|
| Aging timeout | 30.0 s | Maximum time a request waits in the queue |
| Batch window | 75 ms | Time window for accumulating requests into a batch |
| Batch max size | 8 | Maximum requests per batch |
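One plausible reading of how these three settings interact: a batch is dispatched once it holds 8 requests or once its oldest request has waited 75 ms, and requests queued for longer than the aging timeout are served ahead of newer work so they cannot starve. The sketch below illustrates that reading only. It is not Macaw's scheduler; the `priority` field, the "aged requests go first" rule, and all names are assumptions.

```python
from dataclasses import dataclass

# Values from the Scheduler Settings table.
AGING_TIMEOUT_S = 30.0
BATCH_WINDOW_S = 0.075
BATCH_MAX_SIZE = 8


@dataclass
class QueuedRequest:
    request_id: str
    priority: int        # assumed: lower number = more urgent
    enqueued_at: float   # monotonic timestamp in seconds


def should_dispatch(pending: list, now: float) -> bool:
    """Dispatch when the batch is full or the oldest request's window has closed."""
    if not pending:
        return False
    if len(pending) >= BATCH_MAX_SIZE:
        return True
    oldest = min(r.enqueued_at for r in pending)
    return now - oldest >= BATCH_WINDOW_S


def take_batch(pending: list, now: float):
    """Split pending requests into (batch, remainder).

    Requests that have waited at least AGING_TIMEOUT_S sort first, then by
    priority, then by arrival time, so low-priority work is not starved.
    """
    def sort_key(r: QueuedRequest):
        aged = now - r.enqueued_at >= AGING_TIMEOUT_S
        return (not aged, r.priority, r.enqueued_at)  # False sorts before True

    ordered = sorted(pending, key=sort_key)
    return ordered[:BATCH_MAX_SIZE], ordered[BATCH_MAX_SIZE:]


if __name__ == "__main__":
    reqs = [QueuedRequest(f"r{i}", priority=0, enqueued_at=100.0) for i in range(9)]
    reqs.append(QueuedRequest("old", priority=9, enqueued_at=60.0))
    print(should_dispatch(reqs, now=100.0))  # True
    batch, rest = take_batch(reqs, now=100.0)
    print(batch[0].request_id, len(batch), len(rest))  # old 8 2
```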
Tip: Streaming WebSocket requests bypass the scheduler entirely. They use a direct gRPC streaming connection for minimum latency.


Pipeline Configuration

Preprocessing

The audio preprocessing pipeline runs before VAD and cannot be configured per request. It converts all audio to a consistent format before it reaches the VAD and the engine:

  1. Resample to 16 kHz mono
  2. DC removal (high-pass filter)
  3. Gain normalization
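The three preprocessing steps can be sketched in plain Python. The page does not say which algorithms Macaw uses, so the choices here are assumptions made for illustration: linear-interpolation resampling (which skips the anti-aliasing filter a production resampler would apply), a first-order DC-blocking high-pass with coefficient `r = 0.995`, and peak normalization to 0.9.

```python
import math

TARGET_RATE = 16_000


def to_mono(frames):
    """Average each multi-channel frame, e.g. [(l, r), ...] -> [m, ...]."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (no anti-aliasing filter)."""
    if src_rate == dst_rate:
        return list(samples)
    step = src_rate / dst_rate
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * step
        j = int(pos)
        frac = pos - j
        nxt = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


def remove_dc(samples, r=0.995):
    """First-order DC-blocking high-pass: y[n] = x[n] - x[n-1] + r*y[n-1]."""
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def normalize_peak(samples, target_peak=0.9):
    """Scale so the largest absolute sample equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silence: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(frames, src_rate):
    """Resample to 16 kHz mono, remove DC, then normalize gain."""
    mono = to_mono(frames)
    return normalize_peak(remove_dc(resample_linear(mono, src_rate)))


if __name__ == "__main__":
    # 1 s of 48 kHz stereo: a 440 Hz tone riding on a +0.5 DC offset.
    frames = [(0.5 + 0.3 * math.sin(2 * math.pi * 440 * n / 48_000),) * 2
              for n in range(48_000)]
    out = preprocess(frames, 48_000)
    print(len(out), round(max(abs(s) for s in out), 6))  # 16000 0.9
```

Running DC removal before normalization matters: a DC offset would otherwise inflate the measured peak and leave less headroom for the speech itself.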

VAD Settings

VAD sensitivity, along with the session's language and hot words, can be configured per WebSocket session with the session.configure command:

WebSocket session configuration
```json
{
  "type": "session.configure",
  "vad": {
    "sensitivity": "normal"
  },
  "language": "en",
  "hot_words": ["Macaw", "OpenVoice"]
}
```
| VAD setting | Options | Description |
|---|---|---|
| `sensitivity` | `high`, `normal`, `low` | Controls the speech-detection threshold |
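A client can build and sanity-check the `session.configure` payload before sending it. This sketch only produces the JSON text; the WebSocket endpoint path and connection code are not covered on this page, so they are left out. The helper name `session_configure` and the choice to omit unset fields are assumptions.

```python
from __future__ import annotations

import json

VAD_SENSITIVITIES = ("high", "normal", "low")


def session_configure(sensitivity: str = "normal", language: str | None = None,
                      hot_words: list[str] | None = None) -> str:
    """Build a session.configure message as JSON text (hypothetical helper)."""
    if sensitivity not in VAD_SENSITIVITIES:
        raise ValueError(f"sensitivity must be one of {VAD_SENSITIVITIES}")
    message = {"type": "session.configure", "vad": {"sensitivity": sensitivity}}
    if language is not None:
        message["language"] = language
    if hot_words:
        message["hot_words"] = list(hot_words)
    return json.dumps(message)


if __name__ == "__main__":
    print(session_configure("normal", "en", ["Macaw", "OpenVoice"]))
    # {"type": "session.configure", "vad": {"sensitivity": "normal"}, "language": "en", "hot_words": ["Macaw", "OpenVoice"]}
```

With any WebSocket client library, the returned string would be sent as a text frame once the session is open. Validating `sensitivity` on the client catches typos before the server sees them.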

Post-Processing (ITN)

Inverse Text Normalization (ITN) converts spoken-form numbers and patterns, such as currency amounts and digit sequences, into their written form. It is applied only to final transcripts, never to partials.

| Input | Output |
|---|---|
| "two thousand twenty six" | "2026" |
| "ten dollars and fifty cents" | "$10.50" |
| "one two three four" | "1234" |
Note: ITN requires the `itn` extra: `pip install "macaw-openvoice[itn]"` (the quotes keep shells such as zsh from expanding the brackets). If the extra is not installed, transcripts are returned unchanged (fail-open behavior).
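The two ITN rules (only final transcripts are normalized, and a missing or failing normalizer leaves text unchanged) can be sketched as follows. `toy_itn` is a deliberately tiny, hypothetical normalizer that handles just the three patterns in the table; it is not the normalizer installed by the `itn` extra, whose API this page does not describe. `postprocess` is likewise a hypothetical name.

```python
import re

_UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
_TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
_NUMBER_WORDS = set(_UNITS) | set(_TENS) | {"hundred", "thousand"}


def _span_to_digits(words):
    """Turn a run of number words into digits."""
    if len(words) > 1 and all(_UNITS.get(w, 10) < 10 for w in words):
        return "".join(str(_UNITS[w]) for w in words)  # digit by digit: "one two" -> "12"
    total, current = 0, 0
    for w in words:
        if w == "hundred":
            current *= 100
        elif w == "thousand":
            total, current = total + current * 1000, 0
        else:
            current += _UNITS.get(w, _TENS.get(w, 0))
    return str(total + current)


def toy_itn(text: str) -> str:
    """Rewrite runs of number words as digits, then format dollar amounts."""
    out, span = [], []
    for word in text.split() + [None]:  # None flushes the final span
        if word in _NUMBER_WORDS:
            span.append(word)
            continue
        if span:
            out.append(_span_to_digits(span))
            span = []
        if word is not None:
            out.append(word)
    result = " ".join(out)
    # "10 dollars and 50 cents" -> "$10.50"
    return re.sub(r"\b(\d+) dollars and (\d{1,2}) cents\b",
                  lambda m: f"${m.group(1)}.{int(m.group(2)):02d}", result)


def postprocess(text: str, is_final: bool, normalizer=None) -> str:
    """Apply ITN to final transcripts only; fail open on any problem."""
    if not is_final or normalizer is None:
        return text  # partials, and "extra not installed", pass through unchanged
    try:
        return normalizer(text)
    except Exception:
        return text  # fail-open: never drop a transcript because ITN failed
```

Skipping partials keeps the text stable while a user is still speaking: a partial such as "two thousand" would otherwise flicker to "2000" and then to "2026" as more words arrive.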


Environment Variables

| Variable | Description | Default |
|---|---|---|
| `MACAW_MODELS_DIR` | Directory for model files | `~/.macaw/models` |
| `MACAW_LOG_LEVEL` | Logging level | `INFO` |
| `MACAW_STT_PORT` | gRPC port for the STT worker | `50051` |
| `MACAW_TTS_PORT` | gRPC port for the TTS worker | `50052` |
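Tooling that runs alongside Macaw (for example, a health check that needs the worker ports) can resolve these variables with the same defaults. This is a sketch: whether Macaw itself expands `~` or upper-cases the log level is not stated here, and `MacawEnv` and `load_env` are hypothetical names.

```python
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MacawEnv:
    models_dir: Path
    log_level: str
    stt_port: int
    tts_port: int


def load_env(environ=None) -> MacawEnv:
    """Read MACAW_* variables, falling back to the documented defaults."""
    env = os.environ if environ is None else environ
    return MacawEnv(
        # expanduser() turns "~" into the current user's home directory (assumed behavior)
        models_dir=Path(env.get("MACAW_MODELS_DIR", "~/.macaw/models")).expanduser(),
        log_level=env.get("MACAW_LOG_LEVEL", "INFO").upper(),
        stt_port=int(env.get("MACAW_STT_PORT", "50051")),
        tts_port=int(env.get("MACAW_TTS_PORT", "50052")),
    )
```

Accepting an explicit `environ` mapping keeps the function easy to test without touching the real process environment; calling `load_env()` with no argument reads `os.environ`.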

Next Steps