Emotion Detection using Audio Data Samples

Ameya Ajit Mande, Shruti Telang, Zongru Shao, Sukrut Dani


A person’s speech can be altered by changes in the autonomic nervous system, and affective technologies can process this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes loud and fast, with a higher and wider pitch range, whereas emotions such as sadness or tiredness produce slow, low-pitched speech. Detecting human emotions through voice-pattern and speech-pattern analysis has many applications, such as improving human-machine interaction. This paper aims to detect emotions from audio. Several machine learning algorithms, including K-nearest neighbours (KNN) and decision trees, were implemented on acoustic features such as Mel-frequency cepstral coefficients (MFCCs). Our evaluation shows that the proposed approach yields accuracies of 98%, 92%, and 99% using KNN, decision tree, and extra-tree classifiers, respectively, for 7 emotions on the Toronto Emotional Speech Set (TESS) dataset.


emotion detection, acoustic features, machine learning, KNN, decision tree, extra-tree, MFCC
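
To make the KNN step concrete, the following is a minimal sketch of majority-vote classification over fixed-length feature vectors. It is not the authors' implementation: the function name `knn_predict`, the choice of Euclidean distance, the value k = 3, and the two-dimensional toy vectors (standing in for per-utterance MFCC summaries) are all assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training vectors nearest to query."""
    # Euclidean distance from the query to every training vector, nearest first
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the k nearest neighbours
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D vectors standing in for MFCC-derived features (not TESS data)
train_X = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (5.0, 5.1), (5.2, 4.9), (4.8, 5.3)]
train_y = ["sad", "sad", "sad", "angry", "angry", "angry"]

print(knn_predict(train_X, train_y, (1.0, 1.0)))  # prints "sad"
print(knn_predict(train_X, train_y, (5.0, 5.0)))  # prints "angry"
```

In practice the feature vectors would come from an audio front end that computes MFCCs for each recording, and k would be tuned on held-out data; neither detail is specified in the abstract.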





DOI: https://doi.org/10.26483/ijarcs.v10i6.6489



Copyright (c) 2019 International Journal of Advanced Research in Computer Science