
Changelog

All notable changes to Macaw OpenVoice are documented here. This project follows Semantic Versioning and the Keep a Changelog format.

[Unreleased]

Added

  • Full-duplex STT + TTS on a single WebSocket connection (M9)
  • Mute-on-speak mechanism with guaranteed unmute via try/finally
  • TTS cancel and auto-cancel of previous synthesis
  • tts.speaking_start and tts.speaking_end WebSocket events
  • KokoroBackend with 9-language support and streaming synthesis
  • TTS gRPC worker on port 50052
  • POST /v1/audio/speech REST endpoint (OpenAI-compatible)
  • TTS metrics (TTFB, synthesis duration, requests, active sessions)
  • Session backpressure controller (rate limit at 1.2x real-time)
  • session.configure command for dynamic session settings
  • input_audio_buffer.commit command for manual buffer commit
  • macaw pull and macaw remove CLI commands
  • macaw ps command to list models on a running server
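The mute-on-speak entry above describes a pattern that can be sketched in a few lines. This is a minimal illustration, not Macaw's implementation: the `Session` class, the `speak` function, and the `send` callback are hypothetical names, and emitting `tts.speaking_end` from the `finally` block is an assumption about event ordering.

```python
import asyncio


class Session:
    """Hypothetical stand-in for a session object (not Macaw's actual class)."""

    def __init__(self):
        self.muted = False
        self.events = []

    def mute(self):
        self.muted = True

    def unmute(self):
        self.muted = False


async def speak(session, chunks, send):
    """Mute STT while TTS audio is sent; always unmute afterwards."""
    session.mute()
    session.events.append("tts.speaking_start")
    try:
        for chunk in chunks:
            await send(chunk)  # may raise or be cancelled mid-stream
    finally:
        # Runs on success, error, and cancellation alike, so the
        # session can never stay muted after synthesis stops.
        session.unmute()
        session.events.append("tts.speaking_end")  # assumed ordering


if __name__ == "__main__":
    async def demo():
        session = Session()

        async def send(chunk):
            pass  # a real server would write to the WebSocket here

        await speak(session, [b"audio-1", b"audio-2"], send)
        print(session.muted, session.events)

    asyncio.run(demo())
```

Putting the unmute in `finally` rather than after the loop is what makes it guaranteed: a failed send or a cancelled synthesis still restores STT input.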

Changed

  • WebSocket protocol extended with TTS commands and events
  • Session Manager now supports mute/unmute for full-duplex
  • Scheduler metrics made optional (graceful degradation without prometheus_client)

Milestone History

| Milestone | Description | Status |
| --- | --- | --- |
| M1 | FastAPI server + health endpoint | Done |
| M2 | Model Registry + manifests | Done |
| M3 | Scheduler + priority queue + cancellation | Done |
| M4 | gRPC STT workers (Faster-Whisper) | Done |
| M5 | Streaming STT via WebSocket | Done |
| M6 | Session Manager (state machine, ring buffer, WAL) | Done |
| M7 | WeNet CTC engine + pipeline adaptation | Done |
| M8 | TTS engine (Kokoro) + REST endpoint | Done |
| M9 | Full-duplex STT + TTS | Done |


For the complete commit history, see the GitHub repository.