Installation

Macaw OpenVoice requires Python 3.11+ and uses pip extras to install only the engines you need.


Prerequisites

Requirement  Minimum       Recommended
Python       3.11          3.12
pip          21.0+         latest
OS           Linux, macOS  Linux (for GPU support)
CUDA         Optional      12.x (for GPU inference)
Note: Macaw runs on CPU by default. GPU support depends on the engine. Faster-Whisper, for example, uses CTranslate2, which supports CUDA out of the box.
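
As a quick preflight, the Python minimum from the prerequisites can be checked from the interpreter itself. The following is a minimal sketch; the helper name `meets_minimum` is illustrative and not part of Macaw.

```python
import sys

MINIMUM = (3, 11)  # minimum Python version stated in the prerequisites


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if `version` (major, minor, ...) is at least `minimum`."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor); patch releases do not matter here.
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old, 3.11+ required"
    print(f"Python {current}: {status}")
```

Run it with the same interpreter you plan to install Macaw into, since systems often have several Python versions side by side.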


Install with pip

The simplest way to get started:

# Minimal install (STT only)
# The quotes stop shells such as zsh from treating the brackets as a glob pattern.
pip install "macaw-openvoice[server,grpc,faster-whisper]"

# Full install (STT + TTS + ITN)
pip install "macaw-openvoice[server,grpc,faster-whisper,kokoro,itn]"

Available Extras

Extra           What it adds                              Size
server          FastAPI + Uvicorn (required for serving)  ~20 MB
grpc            gRPC runtime for worker communication     ~15 MB
faster-whisper  Faster-Whisper STT engine                 ~100 MB
wenet           WeNet CTC STT engine                      ~80 MB
kokoro          Kokoro TTS engine                         ~50 MB
itn             NeMo Inverse Text Normalization           ~200 MB
stream          Microphone streaming via sounddevice      ~5 MB
dev             Development tools (ruff, mypy, pytest)    ~50 MB
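
To see how a choice of extras translates into an install target and a rough download size, the table can be encoded directly. This is only a sketch: the helper names `requirement` and `approx_size_mb` are hypothetical, and the sizes are the approximate figures listed for the extras, not counting the base package or any models downloaded later.

```python
# Approximate sizes in MB, copied from the extras table.
EXTRA_SIZES_MB = {
    "server": 20,
    "grpc": 15,
    "faster-whisper": 100,
    "wenet": 80,
    "kokoro": 50,
    "itn": 200,
    "stream": 5,
    "dev": 50,
}


def requirement(extras, package="macaw-openvoice"):
    """Build the 'package[extra1,extra2]' string to pass to pip."""
    unknown = [name for name in extras if name not in EXTRA_SIZES_MB]
    if unknown:
        raise ValueError(f"unknown extras: {', '.join(unknown)}")
    return f"{package}[{','.join(extras)}]"


def approx_size_mb(extras):
    """Sum the approximate sizes of the selected extras."""
    return sum(EXTRA_SIZES_MB[name] for name in extras)


if __name__ == "__main__":
    minimal = ["server", "grpc", "faster-whisper"]
    full = minimal + ["kokoro", "itn"]
    for label, extras in (("minimal", minimal), ("full", full)):
        print(f"{label}: pip install \"{requirement(extras)}\" (~{approx_size_mb(extras)} MB)")
```

By these figures the minimal install adds roughly 135 MB of extras and the full install roughly 385 MB.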

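Install with uv

uv is typically much faster than pip and manages the virtual environment for you. Because uv sync installs from the project's pyproject.toml, run these commands from a checkout of the Macaw repository: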

# Create a virtual environment and install
uv venv --python 3.12
uv sync --all-extras

# Activate the environment
source .venv/bin/activate
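
If it is unclear whether the environment was actually activated, Python can report it. The sketch below relies on the standard behaviour that `sys.prefix` differs from `sys.base_prefix` inside a virtual environment; the function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """Return True when the interpreter runs inside a virtual environment.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix points at the Python installation it was created from.
    """
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; packages would install globally.")
```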

GPU Setup

For GPU-accelerated inference with Faster-Whisper:

  1. Install CUDA drivers for your GPU
  2. Install the CUDA-enabled version of CTranslate2:
pip install ctranslate2
Warning: Ensure your installed CUDA version matches the one your CTranslate2 build targets. Check compatibility on the CTranslate2 releases page.
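
One way to confirm that CTranslate2 can see a GPU is to ask it for the CUDA device count. This is a hedged sketch: it uses `ctranslate2.get_cuda_device_count()`, and the wrapper name `cuda_device_count` is our own. It returns `None` when CTranslate2 is not installed, so it is safe to run before the GPU setup is complete.

```python
def cuda_device_count():
    """Return the number of CUDA devices CTranslate2 detects.

    Returns None if the ctranslate2 package is not installed.
    """
    try:
        import ctranslate2
    except ImportError:
        return None
    return ctranslate2.get_cuda_device_count()


if __name__ == "__main__":
    count = cuda_device_count()
    if count is None:
        print("ctranslate2 is not installed.")
    elif count == 0:
        print("No CUDA device detected; inference will run on CPU.")
    else:
        print(f"CUDA devices available: {count}")
```

A count of zero on a machine with a GPU usually points to a driver or CUDA version mismatch rather than a Macaw problem.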


Verify Installation

# Check that Macaw is installed correctly
macaw --help

You should see:

Usage: macaw [OPTIONS] COMMAND [ARGS]...

Macaw OpenVoice CLI

Commands:
  serve       Start the API server
  transcribe  Transcribe an audio file
  translate   Translate audio to English
  list        List installed models
  pull        Download a model
  inspect     Show model details
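
The same check can be scripted, for example in a CI job or a provisioning step. The sketch below only assumes that the `macaw` executable is on `PATH` after installation and that `macaw --help` exits with status 0, as shown above; the helper name `cli_help` is hypothetical.

```python
import shutil
import subprocess


def cli_help(command="macaw", timeout=30):
    """Return the --help output of `command`, or None if it is unavailable.

    None is returned when the executable is not on PATH or when it exits
    with a non-zero status.
    """
    path = shutil.which(command)
    if path is None:
        return None
    result = subprocess.run(
        [path, "--help"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )
    return result.stdout if result.returncode == 0 else None


if __name__ == "__main__":
    output = cli_help()
    if output is None:
        print("macaw not found on PATH; is the virtual environment active?")
    else:
        print(output)
```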

Next Steps