Emotion Detection using Audio Data Samples
Abstract
A person’s speech is altered by various changes in the autonomic nervous system, and affective technologies can process this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes loud and fast, with a higher and wider pitch range, whereas emotions such as sadness or tiredness generate slow, low-pitched speech. Detecting human emotion from voice and speech patterns has many applications, such as improving human-machine interaction. This paper aims to detect emotion from audio. Several machine learning algorithms, including k-nearest neighbours (KNN) and decision trees, were implemented on acoustic features such as Mel-frequency cepstral coefficients (MFCCs). Our evaluation shows that the proposed approach yields accuracies of 98%, 92%, and 99% with KNN, decision tree, and Extra-Trees classifiers, respectively, for seven emotions on the Toronto Emotional Speech Set (TESS) dataset.
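The pipeline described above (MFCC features fed to KNN, decision tree, and Extra-Trees classifiers) can be illustrated with a minimal sketch. The snippet below assumes librosa for MFCC extraction and scikit-learn for the classifiers; the TESS directory layout, the filename-to-label rule, and all hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of MFCC-based speech emotion classification (librosa + scikit-learn).
# The "TESS/" path and emotion-from-filename rule are assumptions for illustration.
import glob
import os

import librosa
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def extract_mfcc(path, n_mfcc=40):
    """Load one clip and return its time-averaged MFCC vector."""
    signal, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)  # one fixed-length feature vector per clip


# Assumed layout: wav files named like "OAF_back_angry.wav", with the
# emotion label as the last underscore-separated token of the filename.
features, labels = [], []
for path in glob.glob("TESS/**/*.wav", recursive=True):
    features.append(extract_mfcc(path))
    labels.append(os.path.splitext(os.path.basename(path))[0].split("_")[-1])

X_train, X_test, y_train, y_test = train_test_split(
    np.array(features), labels, test_size=0.2, stratify=labels, random_state=42
)

# Compare the three classifier families mentioned in the abstract.
for clf in (KNeighborsClassifier(n_neighbors=5),
            DecisionTreeClassifier(random_state=42),
            ExtraTreesClassifier(n_estimators=100, random_state=42)):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```

Averaging the MFCCs over time is only one way to obtain a fixed-length clip-level feature vector; other statistics (e.g., standard deviation or deltas) could be appended under the same setup.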
Article Details
COPYRIGHT
Submission of a manuscript implies that the work described has not been published before, that it is not under consideration for publication elsewhere, and that, if and when the manuscript is accepted for publication, the authors agree to the automatic transfer of the copyright to the publisher.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work.
- The journal allows the author(s) to retain publishing rights without restrictions.
- The journal allows the author(s) to hold the copyright without restrictions.
References
. Anagnostopoulos, C. N., Iliou, T., & Giannoukos, I. (2015). Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011. Artificial Intelligence Review, 43(2), 155-177.
. Batliner, A., Schuller, B., Seppi, D., Steidl, S., Devillers, L., Vidrascu, L., & Amir, N. (2011). The automatic recognition of emotions in speech. In Emotion-oriented systems (pp. 71–99). Berlin Heidelberg: Springer.
. Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., & Rose, R. (2007). Automatic speech recognition and speech variability: A review. Speech Communication, 49(10), 763–786.
. El Ayadi, M., Kamel, M. S., & Karray, F. (2011). Survey on speech emotion recognition: Features, classification schemes, and datasets. Pattern Recognition, 44(3), 572–587.
Gajarla, V., & Gupta, A. (2015). Emotion detection and sentiment analysis of images. Georgia Institute of Technology.
Cowie, R., Douglas-Cowie, E., Savvidou*, S., McMahon, E., Sawey, M., & Schröder, M. (2000). 'FEELTRACE': An instrument for recording perceived emotion in real time. In ISCA tutorial and research workshop (ITRW) on speech and emotion.
Semwal, N., Kumar, A., & Narayanan, S. (2017, February). Automatic speech emotion detection system using multi-domain acoustic feature selection and classification models. In 2017 IEEE International Conference on Identity, Security and Behavior Analysis (ISBA) (pp. 1-6). IEEE.
Sundarprasad, N. (2018). Speech Emotion Detection Using Machine Learning Techniques.
Nicolaou, M. A., Gunes, H., & Pantic, M. (2011). Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space. IEEE Transactions on Affective Computing, 2(2), 92-105.
. Chauhan, P. M., & Desai, N. P. (2014, March). Mel frequency cepstral coefficients (mfcc) based speaker identification in noisy environment using wiener filter. In 2014 International Conference on Green Computing Communication and Electrical Engineering (ICGCCEE) (pp. 1-5). IEEE.
. Sudhakar, R. S., & Anil, M. C. (2015, February). Analysis of speech features for emotion detection: a review. In 2015 International Conference on Computing Communication Control and Automation (pp. 661-664). IEEE.
. Fawcett T(2006) An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861-874 – Fei.L and Perona,.P (2006) – A Bayesian Hierarchical Model for Learning Natural Scene Categories
. Zhang, X., Xu, C., Xue, W., Hu, J., He, Y., & Gao, M. (2018). Emotion Recognition Based on Multichannel Physiological Signals with Comprehensive Nonlinear Processing. Sensors, 18(11), 3886.
Ellis, Daniel PW. "Classifying music audio with timbral and chroma features." (2007): 339-340.
Soltani, K., & Ainon, R. N. (2007, February). Speech emotion detection based on neural networks. In 2007 9th international symposium on signal processing and its applications (pp. 1-3). IEEE.
Harari, Y. N. (2016). Homo Deus: A brief history of tomorrow. Random House.
Kozma, L. (2008). k Nearest Neighbors algorithm (kNN). Helsinki University of Technology.
Detection of Audio Emotional Intelligence Using Machine Learning Algorithms, Tejesh Batapati x17108811 MSc Research Project in Data Analytics(2018)
Crowder, J., and Shelli Friess. "Artificial psychology: The psychology of AI." In Proceedings of the 3rd Annual International Multi-Conference on Informatics and Cybernetics. Orlando, FL. 2012.
Karthik, R., Satapathy, P., Patnaik, S., Priyadarshi, S., Bharath, K. P., & Kumar, M. R. (2019). Automatic Phone Slip Detection System. In Microelectronics, Electromagnetics and Telecommunications (pp. 327-336). Springer, Singapore.
Salamon, J., & Bello, J. P. (2015, April). Unsupervised feature learning for urban sound classification. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 171-175). IEEE.
Anurag Kumar, Pranay Dighe, Rita Singh, Sourish Chaudhuri, Bhiksha Raj, "Audio event detection from acoustic unit occurrence patterns", Acoustics Speech and Signal Processing (ICASSP) 2012 IEEE International Conference on, pp. 489-492, 2012.
Liaw, A. and Wiener, M.,( 2002.) Classification and regression by randomForest. R news, 2(3), pp.18-22.