
This is a very interesting question and has an even more interesting solution.

Conceptual stuff:

Each voice signal has a feature representation known as the Mel-frequency cepstrum. Looking at its coefficients (MFCCs) is the key to classifying different emotions.

Plotted in that feature space, each emotion occupies its own region.

To read more about applying MFCCs to emotion recognition conceptually, see: Emotion Detection Using MFCC and Cepstrum Features

Now, on to the interesting part, Machine Learning:

First, extract the MFCC feature matrix for each voice sample and reduce it to a 1×13 vector per audio clip. [As per research papers, the first 13 coefficients primarily determine the emotion!]
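
Here's a minimal sketch of that extraction step, assuming librosa for the audio processing. The original code isn't posted yet, so the library choice, the file names, and the mean-over-frames reduction are my assumptions, not necessarily what the author did:

```python
import numpy as np
import librosa

def mfcc_vector(path, n_mfcc=13):
    """Load an audio clip and reduce its MFCC matrix to a 1x13 vector
    by averaging each coefficient over time (one common reduction)."""
    signal, sr = librosa.load(path, sr=None)  # keep the native sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)  # shape: (13, n_frames)
    return mfcc.mean(axis=1)  # shape: (13,)

# Hypothetical file names; the SAVEE .wav clips would go here.
files = ["anger_01.wav", "sad_01.wav", "neutral_01.wav"]
X = np.array([mfcc_vector(f) for f in files])  # shape: (n_samples, 13)
```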

Next, label each vector with its emotion.

Split the dataset into 75% training and 25% test data. Feed all the vectors into a feed-forward neural network trained with backpropagation. Use a hidden layer with about 20 nodes (13 × 1.5, rounded up) and a softmax output layer. Then, voila, you have yourself a voice-based emotion recognition system.
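
And a matching sketch of the training step, assuming scikit-learn (again my choice of library, not necessarily the author's). MLPClassifier is a feed-forward network trained with backprop, and for multiclass problems it applies a softmax over the output classes:

```python
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Hypothetical labels, one per clip in X from the snippet above.
y = ["anger", "sad", "neutral"]

# 75% training / 25% test split, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# One hidden layer of ~20 nodes (13 * 1.5, rounded up); softmax output.
clf = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```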

I did this project recently using the SAVEE database, classifying three emotions (anger, sadness, and neutral). I got pretty good accuracy and was able to predict the emotion of a brand-new test case. The project was done in Python; I will be uploading the code to GitHub soon, so watch out. :)
