Official Journal of AlNoor University

A Comprehensive Review of Speech Emotion Recognition: Advances, Challenges, and Future Directions

Document Type: Review Article

Authors

1 Ninevah University

2 University of Telafer

3 University of Mosul

4 Universiti Teknologi PETRONAS, Malaysia.

Abstract
Automated detection of human emotion from speech signals, known as speech emotion recognition (SER), is a relatively new area of artificial intelligence that aims to determine the emotions people express through speech. Traditionally, SER relied on handcrafted features and classical machine learning models such as support vector machines (SVMs) and hidden Markov models (HMMs). However, the richness of human emotion posed a challenge for these methods. The evolution of deep learning, in particular CNNs, RNNs, and Transformer-based architectures, has greatly improved the accuracy and robustness of SER systems. This work studies SER in depth, covering the most relevant classification and feature extraction methods and introducing the benchmark databases. It also covers data augmentation methods, evaluation measures, and the difficulties of real-time processing. Despite these advances, SER continues to face challenges, including dataset scarcity, class imbalance, domain adaptation, and high computational requirements. The review highlights open research questions and analyzes future directions, including multimodal fusion, self-supervised learning, and explainable AI.

Keywords

Subjects