Non-Neural Text Summarization
As described earlier, extractive summarization techniques choose a subset of the sentences in the source text to produce summaries. More specifically, all extractive summarizers follow these three steps (Nenkova and McKeown, 2011):
1. Build an intermediate representation of the main information of the input text.
2. Once the intermediate representation is generated, each sentence is assigned an importance score based on its representation.
3. Finally, to produce a summary, the summarizer selects the top k most important sentences, often with a greedy algorithm.
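The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the cited works: it assumes a word-frequency table as the intermediate representation, the average frequency of a sentence's content words as its importance score, and a plain top-k selection. The stop-word list and the function names (`split_sentences`, `tokenize`, `summarize`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list, kept tiny for the example.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "is", "are", "to", "it", "was"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased content words of a sentence (stop words removed)."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, k=2):
    sentences = split_sentences(text)

    # Step 1: intermediate representation = word frequencies over the whole text.
    freq = Counter(w for s in sentences for w in tokenize(s))

    # Step 2: importance score = average frequency of the sentence's content words.
    def score(sentence):
        words = tokenize(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # Step 3: keep the k best-scoring sentences, returned in document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = "Cats sleep a lot. Cats hunt mice. The weather was fine."
    print(summarize(doc, k=2))
```

Returning the selected sentences in their original order is a design choice of this sketch; it keeps the summary readable without modeling coherence explicitly.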
Other approaches cast the selection process as an optimization problem in which a subset of sentences is chosen to maximize overall importance and coherence while minimizing redundancy. Algorithms for building intermediate representations for extraction are numerous. Generally, they fall into two major types according to the representation: topic representation and indicator representation. Topic representation approaches transform the text into one or more topics to facilitate the interpretation of what the text discusses. The main techniques are topic word approaches, latent semantic analysis and Bayesian topic models (Nenkova and McKeown, 2011). Indicator representation approaches, on the other hand, describe each sentence by a set of features (indicators) of importance (e.g., sentence length, position in the document, presence of certain phrases) and use them to rank sentences directly. The models used include Naive Bayes, decision trees, support vector machines, hidden Markov models and conditional random fields. For more information on extractive techniques, we refer the reader to the excellent review by Allahyari et al. (2017).
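The view of selection as an optimization problem mentioned above can be approximated greedily. The sketch below is one simple heuristic in the spirit of maximal marginal relevance, not a method taken from the cited reviews: at each step it picks the sentence with the best trade-off between its importance score and its redundancy with the sentences already selected. The Jaccard overlap as redundancy measure, the `trade_off` parameter and the function names are assumptions made for illustration.

```python
def jaccard(tokens_a, tokens_b):
    """Word-overlap similarity between two tokenized sentences."""
    a, b = set(tokens_a), set(tokens_b)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def greedy_select(sentences, scores, k=2, trade_off=0.7):
    """Greedily pick k sentence indices.

    sentences: list of token lists; scores: one importance score per sentence.
    trade_off: weight of importance versus redundancy (hypothetical parameter).
    """
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def gain(i):
            # Redundancy = highest overlap with any sentence already in the summary.
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected), default=0.0
            )
            return trade_off * scores[i] - (1 - trade_off) * redundancy

        best = max(candidates, key=gain)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    sents = [["cats", "sleep"], ["cats", "sleep", "lot"], ["dogs", "bark"]]
    importance = [1.0, 0.9, 0.5]
    print(greedy_select(sents, importance, k=2, trade_off=0.5))
```

With `trade_off=1.0` the redundancy term vanishes and the procedure reduces to plain top-k selection by importance.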
Sentiment analysis in dialogues
Despite its popularity, the neural approach has not yet been well studied for sentiment analysis in the context of dialogues, even though the two aspects are closely related and strongly influence each other. This motivates us to fill the gap by extending our research to the interactions between sentiment and dialogue. We start by defining the basic concepts of dialog acts that we will use next.
Dialogue act. A dialog act (DA), a type of speech act (Austin, 1962), is an utterance that, broadly speaking, serves a function in the dialog. A typical dialog system relies on a taxonomy that classifies the different functions dialog acts can play. Stolcke et al. (2000) introduced more than 40 tags for dialog acts often seen in conversational speech. Examples of the types in this study are shown in the second column of Table 2.7, where each utterance is assigned a unique DA label in a conversation involving two speakers who talk about one of several general topics.
The ability to model and automatically recognize dialog acts is an important step towards understanding spoken language. The problem has therefore received a lot of attention from the scientific community. Traditionally, dialog act tagging has followed a supervised approach, which starts with developing annotation guidelines, followed by labeling corpora and training a tagger to classify dialog acts.
Stolcke et al. (2000) developed a probabilistic integration of speech recognition with dialog modeling based on a hidden Markov model (HMM), in which lexical, prosodic and acoustic features are employed to improve both speech recognition and dialog act classification accuracy. Forsyth and Martell (2007) built and studied a chat corpus that is more complex than ordinary spoken dialog, since multiple topics are discussed by multiple people simultaneously. Jeong, Lin, and Lee (2009) proposed a semi-supervised learning approach to transfer dialog acts from labeled speech corpora to forums and e-mail. They attempted to create domain-independent acts by restructuring the source act inventories. Vosoughi and Roy (2016) defined six speech acts for Twitter and cast dialog act discovery as a multi-class classification problem based on a set of semantic and syntactic features.
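As a rough illustration of HMM-based dialog act tagging, the sketch below treats dialog acts as hidden states and a single surface cue per utterance as the observation, then decodes the most likely act sequence with the Viterbi algorithm. It is far simpler than the model of Stolcke et al. (2000), which also integrates prosodic and acoustic evidence: the three-act inventory, the cue extractor and all probabilities are toy values chosen by hand for this example, not estimates from any corpus.

```python
import math

# Hypothetical dialog-act inventory (hidden states of the HMM).
ACTS = ["question", "answer", "statement"]

# Toy probabilities, chosen by hand for illustration only.
START = {"question": 0.4, "answer": 0.1, "statement": 0.5}
TRANS = {
    "question": {"question": 0.1, "answer": 0.7, "statement": 0.2},
    "answer": {"question": 0.3, "answer": 0.1, "statement": 0.6},
    "statement": {"question": 0.4, "answer": 0.1, "statement": 0.5},
}
EMIT = {
    "question": {"qmark": 0.8, "yes_no": 0.05, "other": 0.15},
    "answer": {"qmark": 0.05, "yes_no": 0.6, "other": 0.35},
    "statement": {"qmark": 0.05, "yes_no": 0.1, "other": 0.85},
}


def cue(utterance):
    """Map an utterance to one coarse lexical cue (hypothetical feature)."""
    text = utterance.strip().lower()
    if text.endswith("?"):
        return "qmark"
    words = text.split()
    first = words[0].strip(",.!") if words else ""
    return "yes_no" if first in {"yes", "no", "yeah", "nope"} else "other"


def viterbi(observations):
    """Most likely dialog-act sequence for a list of cues (log-space Viterbi)."""
    if not observations:
        return []
    # best[t][act] = (log-probability of the best path ending in act, previous act)
    best = [{a: (math.log(START[a]) + math.log(EMIT[a][observations[0]]), None) for a in ACTS}]
    for obs in observations[1:]:
        column = {}
        for act in ACTS:
            prev = max(ACTS, key=lambda p: best[-1][p][0] + math.log(TRANS[p][act]))
            score = best[-1][prev][0] + math.log(TRANS[prev][act]) + math.log(EMIT[act][obs])
            column[act] = (score, prev)
        best.append(column)
    # Backtrack from the best final state.
    act = max(ACTS, key=lambda a: best[-1][a][0])
    path = [act]
    for column in reversed(best[1:]):
        act = column[act][1]
        path.append(act)
    return path[::-1]


if __name__ == "__main__":
    dialog = ["Do you like jazz?", "Yes, a lot.", "I play the saxophone."]
    print(viterbi([cue(u) for u in dialog]))
    # ['question', 'answer', 'statement']
```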
This annotate-train-test paradigm has been successful, but at the cost of an expensive labeling process that limits the amount of training data. Thus, Woszczyna and Waibel (1994) proposed an unsupervised HMM for the dialog structure of meeting scheduling. Crook, Granell, and Pulman (2009) employed Dirichlet mixture models to group utterances into a number of common acts, ignoring the sequential structure of the dialogue. Ritter, Cherry, and Dolan (2010) presented various unsupervised approaches to cluster raw utterances and infer their corresponding dialog acts from noisy conversations on Twitter.
Sentiment Analysis and Dialogue act. Pluwak (2016) demonstrates that some expressions of sentiment might not be detected with traditional opinion mining methods, and that exploiting dialog acts may partly solve this problem. To give an example from the sports domain, "Bring the old goal-keeper back!" does not convey a direct sentiment polarity, but the act of demand and advice allows two inferences: 1) the player and fans' unhappiness with the current goal-keeper and 2) a preference for the previous one. Both indicate a negative judgement of the current goal-keeper. Analyzing the function of each utterance in this way allows such statements to be interpreted correctly. Therefore, opinions and expressions of attitude need to be understood in a broader sense than sentiment recognition based on lexicons or parts of speech. Indeed, researchers have long seen sentiment analysis and dialog act recognition as closely related. Clavel and Callejas (2016) studied the impact of both dialog act recognition and sentiment analysis in human-agent conversational platforms and affective conversational interfaces. An embodied conversational agent (ECA) (Figure 2.8), a virtual assistant interacting with humans, has to consider human emotional behaviors and attitudes in order to adapt its behavior accordingly.
Preliminary analysis of sentiment transfer by a Deep CNN
The recent successes of deep learning methods go along with a strong dependence on massive labeled data sets (Krizhevsky, Sutskever, and Hinton, 2012; Hinton et al., 2012). Meanwhile, there are many domains where collecting labeled data is difficult or simply very expensive. Twitter is such a case: although data is abundant, humans can never label all the new tweets posted every day. Distant supervision, a method that automatically collects samples from a larger pool of data using pre-defined rules and transfers them to a smaller, similar task, can tackle this issue. On Twitter, Go, Bhayani, and Huang (2009) showed that emoticons can be a reliable and effective cue for such rules in the sentiment analysis task. Tang et al. (2014a) and Deriu et al. (2016) followed this strategy and ended up winning SemEval competitions. These works only transferred a shallow word-level convolutional network (Kim, 2014) and already achieved remarkable results.
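A minimal sketch of such an emoticon rule is given below. The emoticon sets and the function name `distant_label` are hypothetical and are not the exact lists used by Go, Bhayani, and Huang (2009). The sketch labels a tweet from its emoticons, discards tweets with no emoticon or with conflicting ones, and strips the emoticons from the text so that a classifier trained on the result cannot simply memorize the labeling rule.

```python
# Hypothetical emoticon lists used as noisy sentiment labels.
POSITIVE = {":)", ":-)", ":D", "=)"}
NEGATIVE = {":(", ":-(", ":'("}
EMOTICONS = POSITIVE | NEGATIVE


def distant_label(tweet):
    """Return (text_without_emoticons, label), or None if the rule does not apply."""
    tokens = tweet.split()
    has_pos = any(t in POSITIVE for t in tokens)
    has_neg = any(t in NEGATIVE for t in tokens)
    if has_pos == has_neg:
        # Either no emoticon at all or conflicting emoticons: skip the tweet.
        return None
    label = "positive" if has_pos else "negative"
    text = " ".join(t for t in tokens if t not in EMOTICONS)
    return text, label


if __name__ == "__main__":
    pool = ["great game today :)", "missed my train :(", "just landed"]
    labeled = [pair for pair in map(distant_label, pool) if pair is not None]
    print(labeled)
```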
Moreover, neural networks are extremely good at learning input representations, but only when the network is deep and data is abundant (Yosinski et al., 2014). In previous work, Zhang, Zhao, and LeCun (2015) and Schwenk et al. (2017) have shown that it is possible to build such a deep network for text classification with character-level input. In the context of Twitter, this is more convenient as it allows the model to learn special tokens, slang, elongated words, contiguous sequences of exclamation marks, abbreviations, hashtags, etc. Motivated by these observations, we explore the capability of transferring knowledge learned from abundant Twitter data to the SemEval dataset with a deep convolutional network. The benefit of deep structures for neural networks has been demonstrated in many areas of computer vision and speech recognition, but no similarly firm claim has been made for natural language processing tasks so far. To investigate this question, we explore the structure of deep character-level models as described in Conneau et al. (2016). However, as sequences on Twitter are shorter than the Amazon and movie reviews used in Conneau et al. (2016), we do not perform pooling operations, and we use two models: a small model with (1-1-1) convolutional blocks and a bigger model with (2-2-1) convolutional blocks, using (256-256-128) features in each block respectively. Following Conneau et al. (2016), each convolutional block contains two consecutive convolutional layers, each followed by a temporal BatchNorm layer (Ioffe and Szegedy, 2015) and a ReLU activation. For the last fully connected layers, we use (5632-4096-2048-512-128) features respectively to better observe the transfer process. The remaining hyperparameters follow Conneau et al. (2016).
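To make the character-level input concrete, the following sketch maps a tweet to a fixed-length sequence of character indices. The alphabet, the padding and unknown indices and the maximum length are assumptions made for illustration; they are not the settings of Conneau et al. (2016) or of the models described here.

```python
import string

# Hypothetical alphabet: lowercase letters, digits, punctuation and space.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + " "
PAD, UNK = 0, 1  # reserved indices for padding and out-of-alphabet characters
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(ALPHABET)}


def encode(tweet, max_len=16):
    """Lowercase, truncate to max_len characters, map to indices, right-pad with PAD."""
    ids = [CHAR_TO_ID.get(ch, UNK) for ch in tweet.lower()[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


if __name__ == "__main__":
    # Elongated words and repeated punctuation stay visible as repeated indices,
    # whereas a word-level vocabulary would likely treat "soooo" as unknown.
    print(encode("soooo good!!!"))
```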
Convolutional Block and Transitional Layer
Following He et al. (2016), we define $F_l(\cdot)$ as a function of three consecutive operations: batch normalization (BN), rectified linear unit (ReLU) and a 1×3 convolution. To accommodate the changing dimension produced by the concatenation operation, we define a transition layer composed of a 1×3 convolution and a 1×2 local max-pooling between two dense blocks. Given a vector $c^{l-1}$ output by a convolutional layer $l-1$, the local max-pooling layer $l$ outputs a vector $c^{l}$:
$$c^{l}_{j} = \max_{k(j-1) \le i < kj} c^{l-1}_{i} \qquad (3.3)$$
where $1 \le i \le n$, $k$ is the pooling kernel size and $n$ is the sequence length. The word-level DenseNet model is the same as the character-level model shown in Figure 3.4, except for the last two layers, where the local max-pooling and the two fully connected layers are replaced by a single global average pooling layer. In the figure, "3, Temp Conv, 128" denotes a temporal convolution with kernel window size 3 and 128 filters; "pool/2" denotes local max-pooling with kernel size = stride = 2, which halves the length of the sequence. We empirically observed that better results are obtained with word tokens.
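The two pooling operations discussed above can be written out on a single feature channel as follows. This is a sketch in plain Python with 0-based indexing, so output position j covers input positions k·j to k·(j+1)−1; the function names are hypothetical, and an actual model would apply the same operation to every channel of a tensor.

```python
def local_max_pool(c_prev, k=2):
    """Local max-pooling with kernel size = stride = k, as in Equation 3.3.

    Trailing elements that do not fill a complete window are dropped.
    """
    n_out = len(c_prev) // k
    return [max(c_prev[k * j : k * (j + 1)]) for j in range(n_out)]


def global_average_pool(c_prev):
    """Collapse the whole sequence into a single value by averaging."""
    return sum(c_prev) / len(c_prev)


if __name__ == "__main__":
    channel = [1, 3, 2, 5, 4, 0]
    print(local_max_pool(channel, k=2))   # [3, 5, 4]: the sequence length is halved
    print(global_average_pool(channel))   # 2.5
```

Global average pooling yields one value per channel regardless of the sequence length, which is what lets it stand in for the last local max-pooling and the fully connected layers in the word-level model.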
Table of contents:
1.1 Sentiment Analysis
1.1.1 Non-Neural Sentiment Analysis
1.1.2 Neural Sentiment Analysis
1.2 Automatic summarization
1.2.1 Non-Neural Text Summarization
1.2.2 Neural Text Summarization
1.3 Thesis Outline
1.3.1 Thesis contributions
Part I. Sentiment Recognition
2.1 Neural Networks architectures for sentiment analysis
2.2 Sentiment analysis in dialogues
3 Impact of Neural Networks depth for sentiment analysis
3.2 Preliminary analysis of sentiment transfer by a Deep CNN
3.2.2 Twitter data
3.2.3 Transfer learning results
3.3.2 Dense Connectivity
3.3.3 Convolutional Block and Transitional Layer
4 Dialogue acts and sentiment analysis
4.2 Mastodon Corpus
4.3 Multi-task model
4.3.1 Model description
4.3.2 Training procedure
4.4.1 Multi-task experiments
4.4.2 Transfer between tasks
Part II. Sentence Compression
5 Related work
5.1 Neural Sentence Summarization
5.2 Neural Text-to-text Generation
6 RL sentence compression
6.2 Extraction of Dependency Subtrees
6.3.1 Extractor Network
6.3.2 Abstractor Network
6.3.3 Reinforce Extraction
6.4.1 Full Select-and-Paraphrase Model
6.4.2 Oracle Setting
7 Enriching summarization with syntactic knowledge
7.2.2 Integrating Syntax
7.2.3 Reinforcement Learning
8.1 Summary and Conclusions
8.2 Directions for Future Research