Paper (8 pages)
Author copy: PDF
English linguist John Rupert Firth has a famous saying "you shall know a word by the company it keeps." Most word representation learning models are based on this assumption that a word's semantic meaning can be learned from the context in which it resides. The context is defined as a small unordered number of words surrounding the target word. Research has shown that context alone provides limited information because the context contains only neighboring words. Thus only local information is learned in the word embeddings. Some research tries to improve this by utilizing outside information sources such as a knowledge base. We observe that the meaning of a word in a sentence can be better interpreted when the class information or label of the sentence is presented. We propose three approaches to train class-specific embeddings to encode class information by utilizing the linear compositionality property of word embeddings. We present a general framework consisting of a pair of convolutional neural networks for text classification tasks where the learned class-specific embeddings serve as features. We evaluate our approach and framework on topic classification of a disaster-focused Twitter dataset and a benchmark Twitter sentiment classification dataset from SemEval 2013. Our results show a potential relative accuracy improvement of more than 5% over a recent baseline.
To be published in the Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp), Shanghai, China, January 2018. An earlier version was presented at the International Workshop on Mining Actionable Insights from Social Networks (MAISoN'17), Cambridge, UK, February 2017.
Back to Brian Davison's publications