Let’s detect sarcasm. Very simple problem, right? (I just went meta.)
Okay. Let’s look at a couple of sarcastic product reviews. Intuitively, if a review has a positive sentiment but a low rating, then it’s probably sarcastic. Examples:
- “I was tired of getting hit on by beautiful women. After I bought this jacket, problem solved!” (Rating: 0.5/5)
- “Great burrito, now actually try cooking the beans.” (Rating: 1/5)
You may have noticed that the sentiment of the reviews is positive (“problem solved”, “great”), but the ratings are low. That seems like a sign of sarcasm.
Now that we suspect there is some relationship between {sentiment, rating} and {sarcasm}, we write down some data points: Sentiment (+1 for positive, 0 for neutral, -1 for negative), Rating (0 to 5), Sarcasm (1 for Yes, 0 for No)
(Sentiment, Rating, Sarcasm)
(1, 0.5, 1)
(1, 1, 1)
(1, 5, 0)
(-1, 4, 1)
(-1, 1, 0)
... and a few thousand more.
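If you would rather follow along in code, here is how the first few data points might look as arrays. This is just a sketch; the names X and y are my own convention, not anything from the example above.

```python
import numpy as np

# The first few data points from the list above, as arrays.
# Columns of X: sentiment (+1 / 0 / -1), rating (0 to 5). y: sarcasm (1 = yes, 0 = no).
X = np.array([
    [ 1.0, 0.5],
    [ 1.0, 1.0],
    [ 1.0, 5.0],
    [-1.0, 4.0],
    [-1.0, 1.0],
])
y = np.array([1, 1, 0, 1, 0])
```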
So, to find the actual relationship, we want to start from the sentiment and rating values and somehow compute the value of sarcasm. We will use layers as steps to move from inputs to output. Let’s look at the first example (1, 0.5, 1):
Each line in that network has a weight. We will use those weights to calculate the values in the circles in the hidden layer and the output layer (which we hope will be ‘1’). Initially we assign weights randomly:
Now we have our initial stupid neural network. Let’s see what the output will be. At each circle (aka “neuron”) in the hidden and output layers, we multiply its inputs by the corresponding weights and sum up the results.
Hidden Layer 1st Neuron = [math](1 * 0.2) + (0.5 * 0.4) = 0.4[/math]
Hidden Layer 2nd Neuron =[math] (1 * 0.3) + (0.5 * 0.6) = 0.6 [/math]
Hidden Layer 3rd Neuron =[math] (1 * 0.4) + (0.5 * 0.7) = 0.75[/math]
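In code, each of those weighted sums is just a dot product. Here is a minimal sketch using the same weights as above; arranging them into a matrix is my own bookkeeping choice, not something from the diagram.

```python
import numpy as np

x = np.array([1.0, 0.5])        # first example: sentiment = 1, rating = 0.5

# The randomly assigned input-to-hidden weights used above.
# Column j holds the two weights feeding the j-th hidden neuron.
W_hidden = np.array([
    [0.2, 0.3, 0.4],            # weights leaving the sentiment input
    [0.4, 0.6, 0.7],            # weights leaving the rating input
])

hidden = x @ W_hidden           # weighted sums for the three hidden neurons
print(hidden)                   # [0.4  0.6  0.75]
```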
Also, we want the output (Sarcasm) to be a number between 0 and 1 (because nothing else makes sense). We do this by using a magic function on the output layer that squashes any given number into a number between 0 and 1. The function we apply at a neuron is called its activation function, and in this case we use the sigmoid function on the output layer.
Final Layer = [math](0.4 * 0.3) + (0.6 * 0.4) + (0.75 * 0.5) = 0.735[/math]
Output = [math]sigmoid(0.735) = 0.676[/math]
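Here is the same output step as a sketch, with the sigmoid written out explicitly:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

hidden = np.array([0.4, 0.6, 0.75])   # hidden-layer values computed above
w_out = np.array([0.3, 0.4, 0.5])     # hidden-to-output weights from above

output = sigmoid(hidden @ w_out)      # sigmoid(0.735)
print(output)                         # ~0.676
```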
So, we have an output of 0.676. But we were expecting 1! So, what do we do? We change the weights slightly to nudge the output towards the correct value. We do this using a method called backpropagation, which is explained in this blog.
We repeat this thousands of times covering all the training data, changing the weights slightly every time. Eventually, we’ll get the ‘right’ weights which will best predict sarcasm, given sentiment and rating.
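For the curious, here is a minimal sketch of what that repeated nudging can look like, assuming a squared-error loss and plain gradient descent; the blog linked above may derive the updates differently.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The same tiny toy dataset: (sentiment, rating) -> sarcasm.
X = np.array([[1.0, 0.5], [1.0, 1.0], [1.0, 5.0], [-1.0, 4.0], [-1.0, 1.0]])
y = np.array([1.0, 1.0, 0.0, 1.0, 0.0])

# Random initial weights: 2 inputs -> 3 hidden neurons -> 1 output.
W_hidden = rng.normal(size=(2, 3))
w_out = rng.normal(size=3)
lr = 0.1  # learning rate: how big each "slight nudge" is

for epoch in range(5000):
    for x, target in zip(X, y):
        # Forward pass (same steps as the worked example above).
        hidden = x @ W_hidden                # linear hidden layer
        pred = sigmoid(hidden @ w_out)       # sigmoid only on the output

        # Backward pass: squared-error loss, gradients via the chain rule.
        error = pred - target
        d_out = error * pred * (1.0 - pred)  # gradient at the output pre-activation
        grad_w_out = d_out * hidden
        grad_W_hidden = np.outer(x, d_out * w_out)

        # Nudge the weights slightly in the direction that reduces the error.
        w_out -= lr * grad_w_out
        W_hidden -= lr * grad_W_hidden
```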
That’s it! Most applications of neural networks that you see are variations of the above neural network, with changes in:
- The structure of inputs and outputs (duh).
- The number of hidden layers/neurons.
- How the neurons are connected.
- The training process.
- The activation function.
… and some other hyperparameters.
And in case you haven’t noticed, logistic regression is just a one-layer neural network. Whaaaaaa
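To see why: remove the hidden layer, and all that is left is a weighted sum of the inputs pushed through a sigmoid, which is exactly logistic regression. A quick sketch (the weights and bias here are made-up placeholders, just to show the shape of the computation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression: inputs go straight to one sigmoid output, no hidden layer.
def predict_sarcasm(x, weights, bias):
    return sigmoid(x @ weights + bias)

# Hypothetical weights and bias, for illustration only.
print(predict_sarcasm(np.array([1.0, 0.5]), np.array([0.8, -0.9]), 0.1))
```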
More importantly, can we all take a moment here to appreciate how perfectly circular those circles in my diagrams are? :)