# Supported Models

## Emotion Recognition Models
Emotion recognition models detect the emotional state of a speaker from their voice. This enables empathetic conversational AI, call center sentiment analysis, and adaptive voice experiences that respond to user mood. Models range from simple categorical classification (happy, sad, angry) to dimensional emotion analysis (valence, arousal, dominance).
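The difference between the two output styles can be sketched as follows. This is an illustrative mapping, not any particular model's API: a dimensional model emits continuous valence/arousal scores, which can be coarsely projected onto categorical labels via quadrants of the valence-arousal plane.

```python
def valence_arousal_to_category(valence: float, arousal: float) -> str:
    """Map dimensional scores (each assumed in [-1, 1]) to a coarse
    categorical label using the four valence-arousal quadrants."""
    if valence >= 0:
        # Positive valence: pleasant emotions, split by energy level.
        return "happy" if arousal >= 0 else "calm"
    # Negative valence: unpleasant emotions, split by energy level.
    return "angry" if arousal >= 0 else "sad"

print(valence_arousal_to_category(0.7, 0.6))    # -> happy
print(valence_arousal_to_category(-0.5, -0.4))  # -> sad
```

Real dimensional models may also predict a third axis (dominance) and use finer-grained mappings; the quadrant split above is only the simplest possible bridge between the two paradigms.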
### Models
| Status | Model | Type | Parameters | Emotions | Hardware | License | HuggingFace |
|---|---|---|---|---|---|---|---|
| Planned | emotion-diarization-wavlm-large | Emotion Diarization | ~315M | Anger, Happiness, Sadness, Neutral | GPU recommended | Apache-2.0 | Link |
| Planned | emotion-recognition-wav2vec2 | Emotion Classification | — | Angry, Happy, Sad, Neutral | CPU / GPU | Apache-2.0 | Link |
| Planned | Audio2Emotion-v3.0 | Emotion Classification | — | Multiple (NVIDIA Riva) | GPU | — | Link |
| Planned | OmniDimen-V1.2-4B-Emotion | Text Emotion Analysis | 4B | Dimensional (valence, arousal) | GPU | MIT | Link |
### Choosing a model
- Emotion-aware diarization: emotion-diarization-wavlm-large combines speaker diarization with emotion detection — it tells you not just who spoke, but how they felt.
- Simple emotion classification: wav2vec2 emotion recognition provides straightforward categorical emotion labels per audio segment.
- NVIDIA ecosystem: Audio2Emotion-v3.0 integrates with NVIDIA Riva and Audio2Face pipelines for animated avatar experiences.
- Fine-grained analysis: OmniDimen-V1.2-4B-Emotion provides dimensional emotion analysis (continuous valence and arousal scores) rather than discrete categories.
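To make the emotion-aware diarization idea concrete, here is a minimal sketch of combining speaker turns with emotion segments by time overlap. The data structures and the `annotate_turns` helper are hypothetical, meant only to illustrate the kind of "who spoke, and how they felt" output such a pipeline could produce; the actual model's output format may differ.

```python
def annotate_turns(speaker_turns, emotion_segments):
    """Attach to each speaker turn (start, end, speaker) the emotion label
    whose segment (start, end, label) overlaps it the most in time."""
    annotated = []
    for s_start, s_end, speaker in speaker_turns:
        best_label, best_overlap = "neutral", 0.0
        for e_start, e_end, label in emotion_segments:
            # Overlap is the length of the intersection of the two intervals.
            overlap = min(s_end, e_end) - max(s_start, e_start)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        annotated.append((speaker, s_start, s_end, best_label))
    return annotated

turns = [(0.0, 4.0, "spk_0"), (4.0, 9.0, "spk_1")]
emotions = [(0.0, 3.5, "happiness"), (3.5, 9.0, "anger")]
print(annotate_turns(turns, emotions))
# -> [('spk_0', 0.0, 4.0, 'happiness'), ('spk_1', 4.0, 9.0, 'anger')]
```

In practice the speaker turns would come from a diarization pass and the emotion segments from the emotion model, so the merge step above is where the two streams of labels meet.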