Macaw OpenVoice
Supported Models

Emotion Recognition Models

Emotion recognition models detect the emotional state of a speaker from their voice. This enables empathetic conversational AI, call center sentiment analysis, and adaptive voice experiences that respond to user mood. Models range from simple categorical classification (happy, sad, angry) to dimensional emotion analysis (valence, arousal, dominance).
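The relationship between the two output styles can be sketched with a toy mapping from dimensional scores to a categorical label. The thresholds and label names below are illustrative assumptions, not taken from any listed model:

```python
# Toy mapping from dimensional emotion scores to a categorical label.
# Valence runs from negative (-1.0) to positive (+1.0);
# arousal runs from calm (0.0) to excited (1.0).
# Thresholds and label names are illustrative only.

def to_category(valence: float, arousal: float) -> str:
    if arousal < 0.3:
        return "neutral" if valence >= 0 else "sad"
    return "happy" if valence >= 0 else "angry"

print(to_category(0.8, 0.9))   # high valence, high arousal -> "happy"
print(to_category(-0.7, 0.8))  # low valence, high arousal -> "angry"
print(to_category(-0.5, 0.1))  # low valence, low arousal -> "sad"
```

A real dimensional model such as the OmniDimen entry below would emit continuous scores rather than fixed labels, which is why it suits fine-grained analysis.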

Models

| Status  | Model                           | Type                   | Parameters | Emotions                          | Hardware        | License    | HuggingFace |
|---------|---------------------------------|------------------------|------------|-----------------------------------|-----------------|------------|-------------|
| Planned | emotion-diarization-wavlm-large | Emotion Diarization    | ~315M      | Anger, Happiness, Sadness, Neutral | GPU recommended | Apache-2.0 | Link        |
| Planned | emotion-recognition-wav2vec2    | Emotion Classification | —          | Angry, Happy, Sad, Neutral        | CPU / GPU       | Apache-2.0 | Link        |
| Planned | Audio2Emotion-v3.0              | Emotion Classification | —          | Multiple (NVIDIA Riva)            | GPU             | —          | Link        |
| Planned | OmniDimen-V1.2-4B-Emotion       | Text Emotion Analysis  | 4B         | Dimensional (valence, arousal)    | GPU             | MIT        | Link        |

Choosing a model

  • Emotion-aware diarization: emotion-diarization-wavlm-large combines speaker diarization with emotion detection — it tells you not just who spoke, but how they felt.
  • Simple emotion classification: wav2vec2 emotion recognition provides straightforward categorical emotion labels per audio segment.
  • NVIDIA ecosystem: Audio2Emotion-v3.0 integrates with NVIDIA Riva and Audio2Face pipelines for animated avatar experiences.
  • Fine-grained analysis: OmniDimen-V1.2-4B-Emotion provides dimensional emotion analysis (continuous valence and arousal scores) rather than discrete categories.
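For the categorical models above, inference typically reduces to a softmax over one logit per emotion class followed by an argmax. The sketch below assumes a four-class head matching the wav2vec2 row of the table and uses made-up logits; it is not the API of any listed model:

```python
import math

# Hypothetical class order mirroring the emotion-recognition-wav2vec2 row.
LABELS = ["angry", "happy", "sad", "neutral"]

def classify(logits: list[float]) -> tuple[str, float]:
    """Softmax the raw logits and return (predicted label, its probability)."""
    exps = [math.exp(x - max(logits)) for x in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

label, prob = classify([0.2, 2.1, -0.5, 0.9])  # made-up logits
print(label)  # -> "happy"
```

In practice a checkpoint's own label ordering and preprocessing (sample rate, segment length) come from its model card, so check the linked HuggingFace page before wiring a model in.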
