Audio Codec Models

Neural audio codecs encode an audio signal into a compact sequence of tokens and decode those tokens back into a waveform. In voice AI, they serve as intermediate representations for TTS systems and voice conversion, and they support efficient audio streaming. The goal is high-quality audio reconstruction at a much lower bitrate than uncompressed audio.
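To make "compact" concrete, the sketch below compares the bitrate of uncompressed 16-bit mono PCM with the bitrate of a token stream. The token-stream settings (frames per second, number of codebooks, codebook size) are hypothetical values chosen only for illustration; they are not published parameters of the models listed on this page.

```python
import math


def pcm_bitrate(sample_rate_hz: float, bit_depth: int = 16, channels: int = 1) -> float:
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bit_depth * channels


def token_bitrate(frames_per_second: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second of a token stream.

    Each frame emits one token per codebook, and each token carries
    log2(codebook_size) bits.
    """
    return frames_per_second * num_codebooks * math.log2(codebook_size)


# Uncompressed reference: 44.1 kHz, 16-bit, mono.
pcm = pcm_bitrate(44_100)

# Hypothetical codec configuration (assumed for illustration only).
tokens = token_bitrate(frames_per_second=86.0, num_codebooks=8, codebook_size=1024)

ratio = pcm / tokens
print(f"PCM: {pcm:.0f} bit/s, tokens: {tokens:.0f} bit/s, ratio: {ratio:.1f}x")
```

Under these assumed settings the token stream needs roughly one hundredth of the PCM bitrate. The real ratio depends on each codec's actual configuration.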

Models

Status	Model	Sample Rate	Parameters	License	HuggingFace
Planned	NVIDIA Audio Codec 44kHz	44.1 kHz	—	—	Link
Planned	NVIDIA Audio Codec 22kHz	22.05 kHz	—	—	Link

Choosing a model

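High-fidelity applications: The 44kHz codec (44.1 kHz sample rate) can represent frequencies up to about 22 kHz, so it preserves more audio detail and suits music or high-quality voice synthesis.
Voice-focused applications: The 22kHz codec (22.05 kHz sample rate) represents frequencies up to about 11 kHz, which is sufficient for speech, and offers a more compact representation.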
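
One way to reason about this choice is the Nyquist limit: a signal sampled at a given rate can only represent frequencies up to half that rate. The sketch below picks the lowest-rate codec whose Nyquist limit covers a required bandwidth. The `pick_codec` helper and the bandwidth thresholds in the usage lines are illustrative assumptions, not part of any published API for these models.

```python
# Sample rates taken from the model table on this page.
CODEC_SAMPLE_RATES_HZ = {
    "NVIDIA Audio Codec 44kHz": 44_100,
    "NVIDIA Audio Codec 22kHz": 22_050,
}


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def pick_codec(required_bandwidth_hz: float) -> str:
    """Return the lowest-rate codec that covers the required bandwidth.

    Hypothetical helper: it considers only sample rate, not perceptual quality.
    """
    for name, rate in sorted(CODEC_SAMPLE_RATES_HZ.items(), key=lambda kv: kv[1]):
        if nyquist_hz(rate) >= required_bandwidth_hz:
            return name
    raise ValueError(
        f"No listed codec covers a bandwidth of {required_bandwidth_hz} Hz"
    )


# Assumed targets: about 8 kHz for speech, about 20 kHz for full-band music.
speech_choice = pick_codec(8_000)
music_choice = pick_codec(20_000)
print(speech_choice)
print(music_choice)
```

Bandwidth is only one factor. Reconstruction quality and token rate also matter, so treat this as a first-pass filter.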
