Supported Models

Macaw OpenVoice is engine-agnostic — it supports multiple STT and TTS engines through a unified backend interface. Each engine runs as an isolated gRPC subprocess, and the runtime adapts its pipeline automatically based on the model's architecture.
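To make the "unified backend interface" idea concrete, here is a minimal sketch of what such an interface could look like. This is illustrative only: the class and method names (`STTBackend`, `load`, `transcribe`) are assumptions, not Macaw's actual API.

```python
# Hypothetical sketch of an engine-agnostic STT backend contract.
# All names here are illustrative, not Macaw's real interface.
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Transcript:
    text: str
    is_final: bool


class STTBackend(ABC):
    """Each concrete engine (e.g. Faster-Whisper, WeNet) would implement
    this contract and run in its own isolated gRPC subprocess."""

    @abstractmethod
    def load(self, manifest: dict) -> None:
        """Load model weights using the engine_config from macaw.yaml."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> Transcript:
        """Run inference on a chunk of PCM audio."""


class EchoBackend(STTBackend):
    """Trivial stand-in backend, used only to show the contract."""

    def load(self, manifest: dict) -> None:
        self.name = manifest.get("name", "echo")

    def transcribe(self, audio: bytes) -> Transcript:
        # A real engine would return recognized text here.
        return Transcript(text=f"{len(audio)} bytes", is_final=True)
```

Because the runtime only talks to this interface, swapping engines does not require touching the pipeline code.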

Model Catalog

Macaw tracks models across 8 categories. The status column shows integration progress:

| Status | Meaning |
|---|---|
| Available | Integrated in Macaw, ready to use via `macaw pull` |
| Coming Soon | Engine adapter in active development |
| Planned | On the roadmap, not yet started |

Overview by Category

STT Highlights

| Status | Model | WER (%) | Languages | License |
|---|---|---|---|---|
| Available | faster-whisper-large-v3 | 7.4 | 100+ | MIT |
| Available | distil-whisper-large-v3 | ~7.5 | English | MIT |
| Planned | Canary-Qwen-2.5B | 5.63 | English | CC-BY-4.0 |
| Planned | Parakeet-TDT-1.1B | ~8.0 | English | CC-BY-4.0 |
| Planned | Qwen3-ASR-1.7B | | 30+ | Apache-2.0 |

See all 34 STT models →

TTS Highlights

| Status | Model | Parameters | Languages | License |
|---|---|---|---|---|
| Available | Kokoro-82M | 82M | 9 | Apache-2.0 |
| Planned | Qwen3-TTS-12Hz-0.6B | 0.6B | 10 | Apache-2.0 |
| Planned | parler-tts-mini-multilingual | 0.9B | 8 | Apache-2.0 |
| Planned | CSM-1B | ~1B | English | Apache-2.0 |

See all 17 TTS models →

VAD & Turn Detection Highlights

| Status | Model | Type | License |
|---|---|---|---|
| Available | Silero VAD | Energy + Neural VAD | MIT |
| Planned | smart-turn-v2 | Semantic End-of-Turn | BSD-2-Clause |

See all 4 VAD models →

Quick Install

```shell
# Install a model from the catalog
macaw pull faster-whisper-large-v3

# List installed models
macaw list

# Inspect model details
macaw inspect faster-whisper-large-v3

# Remove a model
macaw remove faster-whisper-large-v3
```

Models are downloaded from HuggingFace Hub and stored in ~/.macaw/models/ by default.
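As a sketch of how the local store could be enumerated, the snippet below scans a model directory for per-model `macaw.yaml` manifests. The layout (one subdirectory per model, each containing a `macaw.yaml`) is an assumption based on the manifest description later in this page, not a documented guarantee.

```python
# Sketch: enumerate locally installed models by scanning the model
# directory for per-model macaw.yaml manifests. The one-directory-
# per-model layout is an assumption, not Macaw's documented layout.
from pathlib import Path


def list_installed_models(root: Path = Path.home() / ".macaw" / "models") -> list[str]:
    if not root.is_dir():
        return []
    # Each model directory is expected to contain a macaw.yaml manifest.
    return sorted(p.parent.name for p in root.glob("*/macaw.yaml"))
```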

Engine Comparison

STT Engines

| Feature | Faster-Whisper | WeNet |
|---|---|---|
| Architecture | Encoder-decoder | CTC |
| Streaming partials | Via LocalAgreement | Native |
| Hot words | Via `initial_prompt` workaround | Native keyword boosting |
| Cross-segment context | Yes (224 tokens) | No |
| Language detection | Yes | No |
| Translation | Yes (to English) | No |
| Word timestamps | Yes | Yes |
| Batch inference | Yes | Yes |
| Best for | Accuracy, multilingual | Low latency, Chinese |

How Architecture Affects the Pipeline

The architecture field in the model manifest tells the runtime how to adapt its streaming pipeline:

| | Encoder-Decoder | CTC | Streaming-Native |
|---|---|---|---|
| LocalAgreement | Yes (confirms tokens across multiple inference passes) | No | No |
| Cross-segment context | Yes (224 tokens from previous final as `initial_prompt`) | No | No |
| Native partials | No (runtime generates partials via LocalAgreement) | Yes | Yes |
| Accumulation | 5 s chunks before inference | Frame-by-frame (160 ms minimum) | Frame-by-frame |
| Example | Faster-Whisper | WeNet | Paraformer (future) |
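The core of LocalAgreement can be sketched in a few lines: a token is confirmed (and emitted as a partial) once two consecutive inference passes over the growing audio buffer agree on it. This is a simplified illustration of the idea, not Macaw's implementation.

```python
# Minimal sketch of the LocalAgreement idea: confirm the longest common
# prefix of token sequences from two consecutive inference passes.
# Simplified for illustration; Macaw's actual implementation may differ.
def confirmed_prefix(prev: list[str], curr: list[str]) -> list[str]:
    agreed = []
    for a, b in zip(prev, curr):
        if a != b:
            break  # tokens diverge; everything after is unstable
        agreed.append(a)
    return agreed
```

For example, if pass 1 decodes `["the", "cat", "sat"]` and pass 2 decodes `["the", "cat", "sits", "down"]`, only `["the", "cat"]` is confirmed; the unstable tail waits for a later pass.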

Choosing a model

  • Best accuracy: faster-whisper-large-v3 — highest quality, 100+ languages
  • Best speed/accuracy trade-off: faster-whisper-small — runs on CPU, good quality
  • Fastest startup: faster-whisper-tiny — 256 MB, loads in ~2s
  • English only, fast: distil-whisper-large-v3 — 6x faster than large-v3, ~1% WER gap
  • Low-latency streaming: WeNet (CTC) — frame-by-frame native partials
  • Chinese focus: WeNet — optimized for Chinese with native hot word support

Model Manifest

Every model has a macaw.yaml manifest that describes its capabilities, resource requirements, and engine configuration. See Configuration for the full manifest format.

Example: macaw.yaml

```yaml
name: faster-whisper-large-v3
version: "1.0.0"
type: stt
engine: faster-whisper

capabilities:
  architecture: encoder-decoder
  streaming: true
  languages: ["auto", "en", "pt", "es", "ja", "zh"]
  word_timestamps: true
  translation: true
  partial_transcripts: true
  hot_words: false
  batch_inference: true
  language_detection: true
  initial_prompt: true

resources:
  memory_mb: 3072
  gpu_required: false
  gpu_recommended: true
  load_time_seconds: 8

engine_config:
  model_size: "large-v3"
  compute_type: "float16"
  device: "auto"
  beam_size: 5
  vad_filter: false
```
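Tying the manifest back to the pipeline-adaptation table above, a runtime could dispatch its partials strategy on the `architecture` field. The field name follows the manifest example; the dispatch function itself is an illustrative sketch, not Macaw's code.

```python
# Sketch: pick a partials strategy from the manifest's architecture
# field. The architecture values mirror the table above; the function
# and return values are illustrative, not Macaw's internals.
def partials_strategy(architecture: str) -> str:
    if architecture == "encoder-decoder":
        return "local-agreement"  # runtime generates partials itself
    if architecture in ("ctc", "streaming-native"):
        return "native"           # engine emits partials frame-by-frame
    raise ValueError(f"unknown architecture: {architecture}")
```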

Dependencies

Each engine has its own optional dependency group. Install only what you need:

| Extra | Command | What It Installs |
|---|---|---|
| faster-whisper | `pip install macaw-openvoice[faster-whisper]` | faster-whisper>=1.1,<2.0 |
| wenet | `pip install macaw-openvoice[wenet]` | wenet>=2.0,<3.0 |
| kokoro | `pip install macaw-openvoice[kokoro]` | kokoro>=0.1,<1.0 |
| huggingface | `pip install macaw-openvoice[huggingface]` | huggingface_hub>=0.20,<1.0 |
| itn | `pip install macaw-openvoice[itn]` | nemo_text_processing>=1.1,<2.0 |

```shell
# Install everything for a typical deployment
pip install macaw-openvoice[server,grpc,faster-whisper,kokoro,huggingface]
```
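Since each engine is an optional extra, a process can check which engine packages are importable before trying to spawn an engine subprocess. The sketch below uses the import names of the packages listed above; whether Macaw performs such a check, and how, is an assumption.

```python
# Sketch: detect which optional engine dependencies are importable,
# e.g. to produce a clear error before spawning an engine subprocess.
# The extra-name -> import-name mapping follows the table above.
import importlib.util


def available_engines() -> dict[str, bool]:
    modules = {
        "faster-whisper": "faster_whisper",
        "wenet": "wenet",
        "kokoro": "kokoro",
    }
    return {
        extra: importlib.util.find_spec(mod) is not None
        for extra, mod in modules.items()
    }
```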

Adding Your Own Engine

Macaw is designed to make adding new engines straightforward — approximately 400-700 lines of code with zero changes to the runtime core. See the Adding an Engine guide.