
Supported Models

Macaw OpenVoice is engine-agnostic — it supports multiple STT and TTS engines through a unified backend interface. Each engine runs as an isolated gRPC subprocess, and the runtime adapts its pipeline automatically based on the model's architecture.
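For intuition, the process boundary looks roughly like this. The worker command, port, and startup handshake below are hypothetical (Macaw manages all of this internally), but they illustrate the isolation pattern:

```python
import subprocess
import grpc

# Hypothetical illustration of the engine-isolation pattern; the worker
# module and port below are NOT Macaw's real internals.
PORT = 50151

# The engine runs as its own process, so a crash or memory leak inside
# the engine cannot take down the runtime.
worker = subprocess.Popen(
    ["python", "-m", "my_engine_worker", "--port", str(PORT)]  # hypothetical module
)

# The runtime only ever talks to the engine over this channel, which is
# what makes engines interchangeable behind one backend interface.
channel = grpc.insecure_channel(f"localhost:{PORT}")
grpc.channel_ready_future(channel).result(timeout=30)  # wait for the worker to come up
```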

Model Catalog

These are the official models available via `macaw pull`:

STT Models

| Model | Engine | Architecture | Memory | GPU | Languages | Translation |
|---|---|---|---|---|---|---|
| faster-whisper-large-v3 | Faster-Whisper | encoder-decoder | 3,072 MB | Recommended | 100+ (auto-detect) | Yes |
| faster-whisper-medium | Faster-Whisper | encoder-decoder | 1,536 MB | Recommended | 100+ (auto-detect) | Yes |
| faster-whisper-small | Faster-Whisper | encoder-decoder | 512 MB | Optional | 100+ (auto-detect) | Yes |
| faster-whisper-tiny | Faster-Whisper | encoder-decoder | 256 MB | Optional | 100+ (auto-detect) | Yes |
| distil-whisper-large-v3 | Faster-Whisper | encoder-decoder | 1,536 MB | Recommended | English only | No |

TTS Models

| Model | Engine | Memory | GPU | Languages | Default Voice |
|---|---|---|---|---|---|
| kokoro-v1 | Kokoro | 512 MB | Recommended | 9 languages | af_heart |

VAD (Internal)

| Model | Purpose | Memory | GPU | Cost |
|---|---|---|---|---|
| Silero VAD | Voice Activity Detection | ~50 MB | Not needed | ~2 ms/frame |
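To put the ~2 ms/frame figure in context, Silero VAD can be exercised standalone, independently of Macaw. The snippet below follows the model's published torch.hub usage:

```python
import torch

# Load Silero VAD from torch.hub (downloads the model on first use).
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
(get_speech_timestamps, _, read_audio, _, _) = utils

# Silero expects 16 kHz mono audio; read_audio resamples as needed.
wav = read_audio("sample.wav", sampling_rate=16000)

# Per-frame scoring: the model takes short chunks (512 samples at 16 kHz,
# i.e. 32 ms) and returns the probability that the chunk contains speech.
chunk = wav[:512]
speech_prob = model(chunk, 16000).item()
print(f"speech probability: {speech_prob:.2f}")

# Or get utterance-level timestamps in a single call.
timestamps = get_speech_timestamps(wav, model, sampling_rate=16000)
```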
WeNet: bring your own model

WeNet is a supported engine but has no pre-configured models in the catalog. You provide your own WeNet model and create a `macaw.yaml` manifest for it.
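As a rough illustration, a user-supplied WeNet manifest might look like the following. Every value here is a placeholder (model name, path, memory figures, and capability flags should come from your own model); the structure mirrors the manifest format shown later on this page:

```yaml
# Illustrative only: all values below are placeholders for your own WeNet model.
name: my-wenet-zh
version: "1.0.0"
type: stt
engine: wenet

capabilities:
  architecture: ctc
  streaming: true
  languages: ["zh"]
  word_timestamps: true
  translation: false
  partial_transcripts: true
  hot_words: true
  batch_inference: true
  language_detection: false
  initial_prompt: false

resources:
  memory_mb: 1024
  gpu_required: false
  gpu_recommended: true
  load_time_seconds: 5

engine_config:
  model_dir: "/path/to/your/wenet/model"
```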

Quick Install

```bash
# Install a model from the catalog
macaw pull faster-whisper-large-v3

# List installed models
macaw list

# Inspect model details
macaw inspect faster-whisper-large-v3

# Remove a model
macaw remove faster-whisper-large-v3
```

Models are downloaded from the Hugging Face Hub and stored in `~/.macaw/models/` by default.

Engine Comparison

STT Engines

| Feature | Faster-Whisper | WeNet |
|---|---|---|
| Architecture | Encoder-decoder | CTC |
| Streaming partials | Via LocalAgreement | Native |
| Hot words | Via initial_prompt workaround | Native keyword boosting |
| Cross-segment context | Yes (224 tokens) | No |
| Language detection | Yes | No |
| Translation | Yes (to English) | No |
| Word timestamps | Yes | Yes |
| Batch inference | Yes | Yes |
| Best for | Accuracy, multilingual | Low latency, Chinese |
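The initial_prompt workaround deserves a concrete sketch. Whisper-family models have no native keyword boosting, but faster-whisper's transcribe() accepts an initial_prompt string that biases decoding toward the vocabulary it contains. Whether Macaw wires this up exactly as shown is an assumption; the underlying engine call looks like this:

```python
from faster_whisper import WhisperModel

# compute_type="float16" assumes a GPU; use "int8" on CPU.
model = WhisperModel("small", device="auto", compute_type="float16")

# Domain terms are smuggled in via the decoder prompt: the model becomes
# more likely to emit tokens it has just "seen" in the prompt.
hot_words = ["Macaw", "Kokoro", "gRPC"]
segments, info = model.transcribe(
    "meeting.wav",
    initial_prompt="Glossary: " + ", ".join(hot_words),
)
for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")
```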

How Architecture Affects the Pipeline

The `architecture` field in the model manifest tells the runtime how to adapt its streaming pipeline:

| | Encoder-Decoder | CTC | Streaming-Native |
|---|---|---|---|
| LocalAgreement | Yes (confirms tokens across multiple inference passes) | No | No |
| Cross-segment context | Yes (224 tokens from previous final as initial_prompt) | No | No |
| Native partials | No (runtime generates partials via LocalAgreement) | Yes | Yes |
| Accumulation | 5 s chunks before inference | Frame-by-frame (160 ms minimum) | Frame-by-frame |
| Example | Faster-Whisper | WeNet | Paraformer (future) |
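LocalAgreement is easy to state in code: re-run inference as audio accumulates, and only confirm the prefix on which consecutive hypotheses agree. The sketch below is a generic, simplified version of that policy; the function and variable names are illustrative, not Macaw's implementation:

```python
def local_agreement(prev_hypothesis: list[str], curr_hypothesis: list[str]) -> list[str]:
    """Return the tokens confirmed by two consecutive inference passes.

    Illustrative sketch of the LocalAgreement policy: a token is only
    emitted as a stable partial once two successive hypotheses agree on
    it, which suppresses the flicker of a decoder revising its tail as
    more audio arrives.
    """
    confirmed: list[str] = []
    for prev_tok, curr_tok in zip(prev_hypothesis, curr_hypothesis):
        if prev_tok != curr_tok:
            break  # first disagreement: everything after this is unstable
        confirmed.append(curr_tok)
    return confirmed

# Two passes over a growing audio buffer might decode to:
pass_1 = ["the", "quick", "brown", "fox", "jumped"]
pass_2 = ["the", "quick", "brown", "fox", "jumps", "over"]
print(local_agreement(pass_1, pass_2))  # ['the', 'quick', 'brown', 'fox']
```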
Choosing a model
  • Best accuracy: faster-whisper-large-v3 — highest quality, 100+ languages
  • Best speed/accuracy trade-off: faster-whisper-small — runs on CPU, good quality
  • Fastest startup: faster-whisper-tiny — 256 MB, loads in ~2s
  • English only, fast: distil-whisper-large-v3 — 6x faster than large-v3, ~1% WER gap
  • Low-latency streaming: WeNet (CTC) — frame-by-frame native partials
  • Chinese focus: WeNet — optimized for Chinese with native hot word support

Model Manifest

Every model has a `macaw.yaml` manifest that describes its capabilities, resource requirements, and engine configuration. See Configuration for the full manifest format.

Example: `macaw.yaml`

```yaml
name: faster-whisper-large-v3
version: "1.0.0"
type: stt
engine: faster-whisper

capabilities:
  architecture: encoder-decoder
  streaming: true
  languages: ["auto", "en", "pt", "es", "ja", "zh"]
  word_timestamps: true
  translation: true
  partial_transcripts: true
  hot_words: false
  batch_inference: true
  language_detection: true
  initial_prompt: true

resources:
  memory_mb: 3072
  gpu_required: false
  gpu_recommended: true
  load_time_seconds: 8

engine_config:
  model_size: "large-v3"
  compute_type: "float16"
  device: "auto"
  beam_size: 5
  vad_filter: false
```
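To make the manifest's role concrete, here is a sketch of how a file like this could be read and validated with PyYAML. The field names mirror the example above, but the loader and dataclass are illustrative, not Macaw's actual code:

```python
from dataclasses import dataclass
import yaml  # PyYAML

@dataclass
class ModelManifest:
    """Subset of macaw.yaml fields; illustrative, not Macaw's real schema."""
    name: str
    type: str
    engine: str
    architecture: str
    memory_mb: int

def load_manifest(path: str) -> ModelManifest:
    with open(path) as f:
        raw = yaml.safe_load(f)
    return ModelManifest(
        name=raw["name"],
        type=raw["type"],
        engine=raw["engine"],
        # The architecture field drives pipeline selection (see the table above).
        architecture=raw["capabilities"]["architecture"],
        memory_mb=raw["resources"]["memory_mb"],
    )

manifest = load_manifest("macaw.yaml")
print(manifest.name, manifest.architecture)  # faster-whisper-large-v3 encoder-decoder
```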

Dependencies

Each engine has its own optional dependency group. Install only what you need:

| Extra | Command | What It Installs |
|---|---|---|
| faster-whisper | `pip install macaw-openvoice[faster-whisper]` | faster-whisper>=1.1,<2.0 |
| wenet | `pip install macaw-openvoice[wenet]` | wenet>=2.0,<3.0 |
| kokoro | `pip install macaw-openvoice[kokoro]` | kokoro>=0.1,<1.0 |
| huggingface | `pip install macaw-openvoice[huggingface]` | huggingface_hub>=0.20,<1.0 |
| itn | `pip install macaw-openvoice[itn]` | nemo_text_processing>=1.1,<2.0 |

```bash
# Install everything for a typical deployment
pip install macaw-openvoice[server,grpc,faster-whisper,kokoro,huggingface]
```

Adding Your Own Engine

Macaw is designed to make adding new engines straightforward — approximately 400-700 lines of code with zero changes to the runtime core. See the Adding an Engine guide.
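As a rough orientation, a new STT engine amounts to implementing a small load/transcribe surface. The base class and method names below are hypothetical; the real interface is documented in the Adding an Engine guide:

```python
# Hypothetical sketch of what an engine implementation looks like; the
# real base class and method names live in the Adding an Engine guide.
from abc import ABC, abstractmethod
from collections.abc import Iterator

class STTEngine(ABC):
    """Illustrative engine contract: load a model, turn audio into text."""

    @abstractmethod
    def load(self, model_path: str, engine_config: dict) -> None:
        """Load model weights using the manifest's engine_config block."""

    @abstractmethod
    def transcribe(self, audio: bytes) -> Iterator[str]:
        """Yield transcript segments for a chunk of PCM audio."""

class MyEngine(STTEngine):
    def load(self, model_path: str, engine_config: dict) -> None:
        self.model = ...  # initialize your inference library here

    def transcribe(self, audio: bytes) -> Iterator[str]:
        yield from self.model.decode(audio)  # adapt to your library's API
```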