Paper (4 pages)
Modern embedding methods focus only on the words in a text: word and sentence embeddings are trained to represent the semantic meaning of the raw text. However, quantified attributes that accompany the text, such as the numeric attributes attached to Yelp reviews, are ignored when the vector representation is learned. These numeric attributes can provide important information that complements the text. For example, review stars, business stars, and the number of likes strongly influence how the semantic meaning of a review is interpreted. A numeric attribute associated with the text often reveals the quantity or significance of the object that the number modifies. We propose an algorithm that uses vector projection to generate numeric-attribute-powered sentence embeddings for multi-label text classification. We evaluate the algorithm on a public Yelp dataset and show that classification performance improves significantly when numeric attributes are incorporated effectively.
To be published in the Proceedings of the IEEE International Conference on Big Data and Smart Computing (BigComp), Shanghai, China, January 2018.