HYBRID ARCHITECTURE FOR SENTIMENT ANALYSIS USING DEEP LEARNING

Abstract: Sentiment analysis involves classifying text into positive, negative and neutral classes according to the emotions expressed in the text. Extensive work has been carried out on sentiment analysis using the traditional ‘bag of words’ approach, which involves feature selection and feeds the resulting features to classifiers such as Naive Bayes and SVMs. A relatively new approach to sentiment analysis uses a deep learning model. In this approach, a relatively recent technique called word embedding is applied first, after which the input is fed into a deep neural network architecture. As sentiment analysis using deep learning is a relatively unexplored domain, we plan to perform an in-depth analysis of this field and implement a state-of-the-art model that aims for high accuracy. The proposed methodology uses a hybrid architecture, consisting of CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks), to implement the deep learning model on the SAR14 and Stanford Sentiment Treebank data sets.


I. INTRODUCTION
In today's world, reviews of a service provided by a business play an integral role in determining its success. From a restaurant on Zomato to a product on Amazon, reviews are invariably used by customers when deciding which products or services to purchase. Social media also plays a large role in providing a platform for people to express their opinions about businesses. Hence a technique that can classify these opinions and emotions into distinct classes or categories would be invaluable to any business, as it would no longer have to pore over millions of reviews manually to understand how its service is perceived by its customers. Sentiment analysis is such a technique, in which text is broadly classified into three classes: positive, negative and neutral. Extensive research has been done on sentiment analysis using a feature selection approach. The main focus of this paper is to discuss a proposed methodology for implementing sentiment analysis using deep learning techniques, which is a relatively unexplored domain. We will use the Stanford Sentiment Treebank data set and the SAR14 movie review data set to test our model and optimize accuracy. The Stanford Sentiment Treebank data set consists of 11855 sentences extracted from Rotten Tomatoes movie reviews and has fully labeled parse trees with five classes as labels (extremely negative to extremely positive). It has 8544 training examples, 1101 validation samples and 2210 test cases. The SAR14 data set is a data set of 233600 movie reviews. The input data will first be pre-processed, after which a technique called word embedding will be applied. The resulting output will then be fed into a deep neural network architecture. In order to achieve high accuracy, we propose a hybrid architecture made up of CNNs and RNNs.
Trees Approach and concludes that, although the deep learning approach is more complicated than simple methods such as "bag-of-words", it shows slightly better results [3]. Feature level Sentiment Analysis on Movie Reviews, written by Pallavi Sharma et al., classifies the polarity of movie reviews on the basis of features by handling negation, intensifiers, conjunctions and synonyms with appropriate pre-processing steps. The work proposed in that paper achieved an accuracy of 81%; performance was reduced because true negative reviews were not handled correctly [4]. Houshmand Shirani-Mehr, in his paper Applications of Deep Learning to Sentiment Analysis of Movie Reviews, compares the performance of different deep learning techniques, as well as the traditional Naïve Bayes classifier, on sentiment analysis of movie reviews obtained from the Stanford Sentiment Treebank [5].

III. TRADITIONAL APPROACH
The traditional sentiment analysis approach uses a bag-of-words representation of sentences and feeds the resulting fixed length vectors to a classifier such as Naive Bayes or an SVM. However, this approach does not take the order of the words into consideration and hence leads to a less accurate model. Another modelling technique is n-grams, but these have a fixed range and cannot capture long term dependencies. [6] These approaches also often fail to classify sentences with a negation, such as 'The movie was not amazing', as negative and often give them a positive label.
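The loss of word order described above can be illustrated with a minimal sketch. The function name `bag_of_words` and the two example sentences are illustrative assumptions, not part of the proposed system; the sketch only shows that two sentences with opposite meanings can share one bag-of-words representation.

```python
from collections import Counter


def bag_of_words(sentence):
    """Return word counts for a sentence; word order is discarded."""
    return Counter(sentence.lower().split())


# Two sentences with opposite sentiment but exactly the same words.
a = bag_of_words("the movie was good not bad")
b = bag_of_words("the movie was bad not good")

# The representations are identical, so a classifier that sees only
# these counts cannot tell the two sentences apart.
print(a == b)  # True
```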
A deep learning approach can overcome these problems. Word embedding provides contextual information about words and hence allows us to gain intuition about the sentences. Deep neural networks such as RNNs allow us to process sequential data, so information about the order of the words is retained, and CNNs allow us to include phrase information in a word-by-word representation.

A. Word Embedding
A neural network cannot operate directly on raw text, as operations such as multiplication and convolution are not defined on words. Hence, we use a technique called word embedding, where words are represented as vectors in space such that the distance between vectors depends on the semantic similarity between the words. Word embedding thus produces an embedding matrix of the input words which captures their meaning and their semantic and contextual relations. To perform word embedding on our input data we will use the word2vec model. Word2vec is a shallow two-layer neural network that takes a text data set as input and first constructs a vocabulary from the text data. [3] It then learns the vector representation of these words and produces the word vectors as output. The output generated by the word2vec model will be the input to the neural network architecture.
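The idea that distance between vectors reflects semantic similarity can be sketched with cosine similarity. The three-dimensional vectors below are hand-made, hypothetical values chosen for illustration; they are not output of word2vec, which would typically produce vectors with far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumed values, not learned ones).
embeddings = {
    "good": [0.9, 0.1, 0.3],
    "great": [0.8, 0.2, 0.4],
    "terrible": [-0.7, 0.9, 0.1],
}

sim_close = cosine_similarity(embeddings["good"], embeddings["great"])
sim_far = cosine_similarity(embeddings["good"], embeddings["terrible"])

# Words with similar meaning should score higher than dissimilar ones.
print(sim_close > sim_far)
```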

B. Hybrid Architecture
This is a novel approach which applies a CNN model followed by an RNN model. The model consists of an initial convolutional layer, a middle recurrent layer and a final fully connected layer. The CNN produces phrase-level features for each position in the word-by-word representation, and these features act as the time steps for the RNN. We will test this using a traditional RNN as well as an LSTM, to see whether the vanishing gradient problem affects our model and whether the LSTM improves the accuracy.
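The three-stage layout described above (convolution, recurrence, fully connected output) can be sketched in a few lines. This is an illustration under stated assumptions rather than the proposed implementation: all weights are arbitrary hand-picked numbers, bias terms are omitted, the activations (tanh for the convolution, logistic sigmoid for the recurrent layer) are assumptions, and the helper names `conv_layer`, `rnn_layer` and `dense_softmax` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def conv_layer(embeddings, filters, h):
    """Map every window of h words to one feature vector (one value per filter)."""
    steps = []
    for i in range(len(embeddings) - h + 1):
        window = [v for vec in embeddings[i:i + h] for v in vec]
        steps.append([math.tanh(dot(f, window)) for f in filters])
    return steps


def rnn_layer(steps, W_H, W_X):
    """Consume the convolution outputs as time steps; return the last hidden state."""
    h = [0.0] * len(W_H)
    for x in steps:
        h = [sigmoid(dot(wh, h) + dot(wx, x)) for wh, wx in zip(W_H, W_X)]
    return h


def dense_softmax(h, W_out):
    """Fully connected layer followed by softmax over the classes."""
    scores = [dot(row, h) for row in W_out]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy input: 4 words, embedding dimension 2 (assumed values).
sentence = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5], [-0.2, 0.1]]
filters = [[0.5, -0.1, 0.2, 0.4], [-0.3, 0.2, 0.1, 0.0], [0.1, 0.1, -0.2, 0.3]]
W_X = [[0.2, -0.4, 0.1], [0.3, 0.1, -0.2]]      # hidden size 2, input size 3
W_H = [[0.5, 0.0], [-0.1, 0.4]]                 # hidden size 2
W_out = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # 3 sentiment classes

steps = conv_layer(sentence, filters, h=2)
final_state = rnn_layer(steps, W_H, W_X)
probs = dense_softmax(final_state, W_out)
print(probs)
```

With 4 words and a window of 2, the convolution yields 3 time steps, so the recurrent layer sees phrase-level features rather than single words.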

1) CNN:
Convolutional Neural Networks (CNNs) are powerful deep models for understanding image content, and have recently started being used in numerous classification problems [7]. Given their strong performance, CNNs have also started being used in text classification problems.
Word2vec, proposed by Google, is a two-layer neural network model which transforms words into learned vectors, rather than sampling each value from a uniform distribution. These vectors are arranged in the vector space such that words with similar meanings lie close together. These vectors are fed to the CNN as input.

i) Convolution operation
Features obtained from n-grams of varying length play different roles in the final decision of the sentiment classification. Consider the sentence "An adaption of Ministry of Utmost Happiness would probably have been better as a movie than a TV show" as an example. In this case, 3-grams would probably detect "have been better" as a positive indicator, whereas 5-grams would mostly detect "would probably have been better" as a negative indicator.
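The two phrases mentioned in the example can be recovered by enumerating n-grams of the sentence. The helper `ngrams` below is an illustrative assumption; it only extracts the windows and does not assign any sentiment to them.

```python
def ngrams(tokens, n):
    """Return every contiguous window of n tokens, joined into a string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sentence = ("An adaption of Ministry of Utmost Happiness would probably "
            "have been better as a movie than a TV show")
tokens = sentence.lower().split()

three_grams = ngrams(tokens, 3)
five_grams = ngrams(tokens, 5)

# The shorter window isolates the seemingly positive phrase, while the
# longer window keeps the hedging words that change its meaning.
print("have been better" in three_grams)
print("would probably have been better" in five_grams)
```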
A convolution operation involves a filter $w$, which is applied to a window of $h$ word embeddings $x_{i:i+h-1}$ to produce a new feature $c_i$ [8]:
$c_i = f(w \cdot x_{i:i+h-1} + b)$ [8]
Here $b$ is the bias term and $f$ is a non-linear function such as the hyperbolic tangent. A feature map $c$ is constructed by applying this filter to each possible window of words in a sentence of $n$ words:
$c = [c_1, c_2, \ldots, c_{n-h+1}]$ [8]

ii) Max Pooling Operation
The feature maps obtained are then passed to a max-over-time pooling layer. This step selects the most important feature, the one with the highest value. The pooling operation is applied to all the feature maps to obtain a vector with the most important features:
$c_{max} = \max\{c\} = \max\{c_1, c_2, \ldots, c_{n-h+1}\}$ [8]
By using this scheme, we naturally deal with variable sentence lengths.
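The convolution and max-over-time pooling steps can be sketched for a single filter as follows. The filter weights are called `w` here, which is an assumed name, and all numeric values are arbitrary; the hyperbolic tangent is used as the non-linear function, as suggested in the text.

```python
import math


def conv_feature_map(embeddings, w, b, h):
    """Apply one filter to every window of h words and return the feature map."""
    n = len(embeddings)
    feature_map = []
    for i in range(n - h + 1):
        # Concatenate the h word vectors of the window into one flat vector.
        window = [value for vec in embeddings[i:i + h] for value in vec]
        z = sum(wj * xj for wj, xj in zip(w, window)) + b
        feature_map.append(math.tanh(z))
    return feature_map


def max_over_time(feature_map):
    """Keep only the largest feature, regardless of sentence length."""
    return max(feature_map)


# Toy sentence: n = 5 words, embedding dimension 2 (assumed values).
sentence = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5], [-0.2, 0.1], [0.3, 0.3]]
h = 3
w = [0.5, -0.1, 0.2, 0.4, -0.3, 0.1]  # length h * embedding dimension
b = 0.0

c = conv_feature_map(sentence, w, b, h)
c_max = max_over_time(c)
print(len(c), c_max)
```

Because pooling reduces each feature map to a single number, sentences of different lengths produce outputs of the same size.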

iii) Dropout and Softmax
To prevent overfitting of parameters and assigning too much weight to a particular node, we use dropout. Dropout randomly drops hidden units in the classifier during training. This sets a portion of the features pooled in the previous layer to zero, so that only the unaffected units play a role in calculating gradients when passed to the softmax layer [8].
The softmax layer uses the features which were regularized using dropout, and computes the probability distribution of the input over all the labels, in our case the sentiments. This layer squashes a K-dimensional vector containing real values into a vector containing values in the range [0,1] that add up to 1.
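Both operations can be sketched briefly. The dropout below simply zeroes each feature with a given probability and omits any rescaling, which is a simplifying assumption; the class scores passed to `softmax` are hypothetical values standing in for the output of a final linear layer.

```python
import math
import random


def dropout(features, rate, rng):
    """Zero each feature independently with probability `rate`."""
    return [0.0 if rng.random() < rate else x for x in features]


def softmax(scores):
    """Turn K real-valued scores into K probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)            # fixed seed so the mask is reproducible
pooled = [0.8, 0.1, 0.5, 0.9]     # assumed pooled features
dropped = dropout(pooled, rate=0.5, rng=rng)

probs = softmax([2.0, 1.0, 0.1])  # assumed scores for three sentiment classes
print(dropped, probs)
```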

2) RNN:
Recurrent Neural Networks (RNNs) are powerful deep neural networks. Unlike traditional feedforward neural networks, RNNs have memory to a limited capacity; that is, the output at a particular time step depends on the previous inputs as well as the current one. This makes RNNs an ideal tool for sequential data, or data in which the order matters. Sentiment analysis is such an application, as the order in which words appear often changes their contextual meaning. For example, in the sentence "The movie was not good", the word 'not' appearing before 'good' gives the sentence a negative sentiment.
RNNs have loops in them, which allow information to be carried across the network as input is processed [9]. An RNN can be depicted by figure --. To make it easier to understand, an RNN can be 'unrolled' into a series of connected feedforward networks, as shown in Figure 2.
Figure 2. RNN Unrolled [10]
$x_t$ refers to the current input at time step $t$, which in our case is the $t$-th word of a given sentence represented as a one hot vector. $A$ is the hidden layer, which is recurrent across the network, and $h_t$ refers to the output at time step $t$, which is a function of the current input as well as the output of the previous time step. $h_t$ is calculated as
$h_t = \sigma(W_H h_{t-1} + W_X x_t)$ [11]
where $\sigma$ is a non-linear activation function, $W_H$ is the hidden layer weight matrix and $W_X$ is the weight matrix that is multiplied with the input $x_t$.
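The recurrence for h_t can be sketched as a loop over time steps. The logistic sigmoid is assumed as the activation function, the initial hidden state is assumed to be zero, and the weight values and the three-word one hot vocabulary are arbitrary choices made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def rnn_step(h_prev, x_t, W_H, W_X):
    """One time step: combine the previous output with the current input."""
    recurrent = matvec(W_H, h_prev)
    current = matvec(W_X, x_t)
    return [sigmoid(r + c) for r, c in zip(recurrent, current)]


def run_rnn(inputs, W_H, W_X, hidden_size):
    """Unroll the network over the whole input sequence."""
    h = [0.0] * hidden_size  # assumed initial state
    states = []
    for x_t in inputs:
        h = rnn_step(h, x_t, W_H, W_X)  # the same weights are reused each step
        states.append(h)
    return states


W_H = [[0.5, -0.2], [0.1, 0.3]]             # hidden size 2
W_X = [[0.4, 0.0, -0.6], [-0.3, 0.8, 0.2]]  # vocabulary size 3
inputs = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # three one hot encoded words

states = run_rnn(inputs, W_H, W_X, hidden_size=2)
print(states[-1])
```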
At the end of the network is an activation function such as a softmax function, which produces the final output: numerical values between 0 and 1 which represent the sentiment of the given sentence.
RNNs, however, have an issue: they cannot handle long term dependencies well. Just as feedforward neural networks are trained with backpropagation, RNNs update their weight matrices using backpropagation through time (BPTT); that is, the error is calculated and sent backwards through every time step of the network, and the weight matrices are updated. Because the gradient is multiplied by a further factor at every step it travels back, it can shrink towards zero. This is the vanishing gradient problem, which results in a point at which the network stops learning.
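The shrinking of the gradient can be illustrated numerically with a scalar toy recurrence h_t = tanh(w * h_{t-1}). This is an assumed simplification, not full BPTT: it only tracks how strongly the state at step t still depends on the initial state, using the chain rule factor w * (1 - h_t^2) at each step.

```python
import math


def gradient_magnitudes(w, h0, steps):
    """Return |d h_t / d h_0| for t = 1..steps in h_t = tanh(w * h_{t-1})."""
    h = h0
    grad = 1.0
    magnitudes = []
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule factor for this time step
        magnitudes.append(abs(grad))
    return magnitudes


mags = gradient_magnitudes(w=0.5, h0=0.5, steps=20)

# Each factor is at most 0.5 here, so the influence of the first input
# decays geometrically with the number of time steps.
print(mags[0], mags[-1])
```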
Hence, we propose to use an LSTM (Long Short-Term Memory) network, a more complex and effective RNN which overcomes the vanishing gradient problem and is hence capable of learning long term dependencies. [6] An LSTM is also composed of a chain of repeating units; however, unlike in RNNs, each unit consists of four layers. The four components of an LSTM unit are an input gate, a forget gate, an output gate and a new memory container. Figure 3 depicts the internal components of an LSTM unit.
Figure 3. LSTM [6]
The input gate is used to give different amounts of emphasis to different inputs, the forget gate is used to decide which information will continue to persist and which information is unnecessary and can be thrown away, and the output gate is used to determine the final output $h_t$. With the help of these components, an LSTM can regulate which inputs are important and should impact the final output of a sentence, and can eliminate words such as 'the' and 'a', which have no bearing on the sentiment of a sentence.
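The interaction of the four components can be sketched with a scalar LSTM unit. This is a simplified illustration using the commonly used gate equations: a real LSTM uses weight matrices and vector states, and the parameter names and values in `params` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM unit; returns the new output and memory."""
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # new memory
    c = f * c_prev + i * g  # keep part of the old memory, add part of the new
    h = o * math.tanh(c)    # expose part of the memory as the output
    return h, c


# Hypothetical parameter values chosen only for illustration.
params = {
    "wi": 0.5, "ui": 0.1, "bi": 0.0,
    "wf": 0.4, "uf": 0.2, "bf": 0.0,
    "wo": 0.3, "uo": 0.1, "bo": 0.0,
    "wg": 0.6, "ug": 0.2, "bg": 0.0,
}

h, c = 0.0, 0.0
for x in [0.3, -0.1, 0.8]:
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

When the forget gate is close to 1 and the input gate close to 0, the memory passes through a step almost unchanged, which is how the unit can carry information across many time steps.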

V. CONCLUSION
Sentiment analysis is a very useful task for businesses to understand the emotions of their customers regarding their products and services. There are several methods of performing sentiment analysis; deep learning is one such method, and it is largely unexplored. This paper proposes a deep learning model to classify the Stanford Sentiment Treebank data set and the SAR14 movie review data set. The main steps involved in the proposed methodology are pre-processing, word embedding and feeding the input into a deep neural network architecture. A hybrid architecture of CNNs and RNNs is proposed to achieve high accuracy.
There are three types of sentiment analysis: document level, sentence level and aspect level. Document level sentiment analysis refers to labelling the sentiment of an entire piece of text, while sentence level sentiment analysis labels the sentiment of individual sentences. Aspect level sentiment analysis, on the other hand, identifies the main features in a document and labels the sentiment expressed regarding those features. For example, in a review of a smartphone, an aspect level sentiment analysis model will identify features such as the camera and battery life and assign a label to each feature.

VI. FUTURE SCOPE
The proposed model will perform sentence level sentiment analysis; however, we will modify the model in the future to perform aspect level sentiment analysis. We will also extend our model and improve its performance by implementing an architecture that uses boosted CNNs. A normal CNN, as proposed in the model, takes a fixed length n-gram input. A boosted CNN architecture, in contrast, is composed of a series of CNNs, each with different filters, and an AdaBoost stage which will filter the outputs and pick the best result as the final output. This is expected to improve the accuracy of the model.