Official Journal of AlNoor University

Integrating Deep Learning and Swarm Intelligence for Speech Recognition: A Review

Document Type : Review Article

Authors

1 University of Mosul, College of Computer Science and Mathematics, Computer Science Department

2 University of Mosul, College of Computer Science and Mathematics, Computer Science Department

Abstract
This paper reviews recent developments in speech and emotion recognition systems, with an emphasis on deep learning and bio-inspired optimization techniques. The models covered in the reviewed studies include advanced recurrent networks such as GRU and SVNN, attention-based encoder-decoder frameworks, and hybrid CNN-LSTM architectures. Feature extraction methods such as MFCC, PLPC, LPCC, and log Mel-filter banks are frequently combined with data augmentation techniques, including speed perturbation, noise injection, and pitch shifting, to increase robustness. To improve feature selection and classifier performance, the studies apply a range of optimization methods, including Particle Swarm Optimization (PSO), Cat Swarm Optimization (CSO), Glowworm Swarm Optimization (GSO), and novel hybrids such as MUPW and GREO. The reviewed works report state-of-the-art accuracy across a variety of tasks, including multimodal (audio-visual) recognition, Arabic dialect recognition, and emotional speech classification. The reported experimental results show significant improvements over standard models, with accuracy as high as 99.76% in some systems. The paper highlights the growing effectiveness of combining deep learning with intelligent optimization and recommends directions for future work, including transducer-based architectures, real-time adaptation, and domain-specific data augmentation.

Keywords

Subjects