Emotion Detection using Audio Data Samples

Ameya Ajit Mande; Shruti Telang; Zongru Shao; Sukrut Dani

doi:10.26483/ijarcs.v10i6.6489

Published: Dec 20, 2019

Keywords:

emotion detection, acoustic features, machine learning, KNN, decision tree, extra-tree, MFCC

Ameya Ajit Mande

Mechanical Engineering Department, Maharashtra Institute of Technology, Aurangabad

Shruti Telang

Computer Science Engineering, Fr. C. Rodrigues Institute of Technology, Navi Mumbai

Zongru Shao

Senior R&D Engineer, Spectronn, New Jersey, USA

Sukrut Dani

Information Technology, Marathwada Mitra Mandal College of Engineering, Pune. Email: sukrutdani@gmail.com

Abstract

A person's speech is altered by changes in the autonomic nervous system, and suitable technologies can process this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes loud and fast, with a higher and wider pitch range, whereas emotions such as sadness or tiredness produce slow, low-pitched speech. Detecting human emotions through voice-pattern and speech-pattern analysis has many applications, such as improving human-machine interaction. This paper aims to detect emotions from audio. Several machine learning algorithms, including K-nearest neighbours (KNN) and decision trees, were implemented on acoustic features such as Mel Frequency Cepstral Coefficients (MFCC). Our evaluation shows that the proposed approach yields accuracies of 98%, 92%, and 99% using KNN, decision tree, and extra-tree classifiers, respectively, for 7 emotions on the Toronto Emotional Speech Set (TESS) dataset.
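As a rough illustration of the KNN step named in the abstract, the sketch below classifies a feature vector by majority vote among its nearest training samples. It is not the authors' implementation: the three-dimensional vectors are made-up stand-ins for per-clip MFCC summaries rather than TESS data, only two of the seven emotions are shown, and the function names `euclidean` and `knn_predict` are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k=3):
    """Return the majority label among the k training samples closest to query."""
    # Sort training samples by distance to the query and keep the k closest.
    neighbours = sorted(
        zip(train_features, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Toy stand-ins for per-clip mean MFCC vectors (assumed values, not TESS data).
train_features = [
    [-200.0, 60.0, 10.0],
    [-210.0, 58.0, 12.0],
    [-195.0, 62.0, 9.0],
    [-320.0, 95.0, -5.0],
    [-330.0, 98.0, -4.0],
    [-315.0, 93.0, -6.0],
]
train_labels = ["angry", "angry", "angry", "sad", "sad", "sad"]

print(knn_predict(train_features, train_labels, [-205.0, 61.0, 11.0], k=3))
```

In practice the feature vectors would come from an MFCC extractor applied to each recording, and the value of `k` would be chosen by validation; both choices are left open here.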

Issue

Vol. 10 No. 6 (2019): November-December 2019

Section

Articles

COPYRIGHT

Submission of a manuscript implies that the work described has not been published before; that it is not under consideration for publication elsewhere; and that, if and when the manuscript is accepted for publication, the authors agree to automatic transfer of the copyright to the publisher.

Authors who publish with this journal agree to the following terms:

Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
The journal allows the author(s) to retain publishing rights without restrictions.
The journal allows the author(s) to hold the copyright without restrictions.
