Weakly-Supervised Sentiment Analysis

Speaker: Dr. Yulan He, Knowledge Media Institute, The Open University, UK

Inviter: Dr. Wang Bin, Center for Advanced Computing Research, ICT

Time: 10:00am-12:00pm, December 13th, 2010 (Monday)

Place: Room 440, Institute of Computing Technology, Chinese Academy of Sciences

Abstract:

Sentiment analysis, or opinion mining, aims to use automated tools to detect subjective information such as opinions, attitudes, and feelings expressed in text. This talk presents some of our recently proposed weakly-supervised approaches, which learn sentiment classification models from polarity words without using labelled documents. The first is a joint sentiment/topic model (JST), which detects sentiment and topic simultaneously from text. Experimental results on datasets from five different domains show that the JST model outperforms existing semi-supervised approaches on some of the datasets despite using no labelled documents. Moreover, the topics and topic sentiment detected by JST are coherent and informative. We hypothesize that the JST model can readily meet the demand for large-scale, open-ended sentiment analysis of the Web.
In the second part of my talk, I will present a novel self-training framework. An initial classifier is learned by incorporating prior information extracted from an existing sentiment lexicon, with preferences on the expected sentiment labels of those lexicon words expressed using generalized expectation criteria. Documents classified with high confidence are then used as pseudo-labelled examples for automatic domain-specific feature acquisition. The word-class distributions of these self-learned features are estimated from the pseudo-labelled examples and are used to train another classifier by constraining the model’s predictions on unlabelled instances. Experiments on both the movie review data and the multi-domain sentiment dataset show that our approach attains comparable or better performance than existing weakly-supervised sentiment classification methods despite using no labelled documents.
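
The self-training loop described above can be illustrated with a minimal Python sketch. It is not the speaker's implementation: the toy lexicon, the function names (`lexicon_score`, `self_train`), and the confidence threshold are all hypothetical, and a simple lexicon vote plus add-one-smoothed word counts stand in for the generalized expectation criteria mentioned in the abstract. It only shows the overall flow: label documents from a lexicon, keep the confident ones as pseudo-labelled examples, then estimate word-class distributions from them.

```python
from collections import Counter
import math

# Hypothetical toy sentiment lexicon, for illustration only.
LEXICON = {"good": "pos", "great": "pos", "bad": "neg", "awful": "neg"}


def lexicon_score(tokens):
    """Vote with lexicon words; return (label, confidence) or (None, 0.0)."""
    counts = Counter(LEXICON[t] for t in tokens if t in LEXICON)
    total = sum(counts.values())
    if total == 0:
        return None, 0.0
    label, hits = counts.most_common(1)[0]
    return label, hits / total


def self_train(docs, threshold=0.8):
    """Build a second classifier from confidently pseudo-labelled documents."""
    # Step 1: keep only documents the lexicon classifier labels confidently.
    pseudo = []
    for doc in docs:
        tokens = doc.lower().split()
        label, conf = lexicon_score(tokens)
        if label is not None and conf >= threshold:
            pseudo.append((tokens, label))

    # Step 2: estimate word-class counts from the pseudo-labelled documents.
    # This is where domain-specific words outside the lexicon are picked up.
    word_counts = {"pos": Counter(), "neg": Counter()}
    for tokens, label in pseudo:
        word_counts[label].update(tokens)
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])

    # Step 3: classify new text with add-one-smoothed log-likelihoods
    # (an assumed stand-in, not the constraint-based training in the talk).
    def classify(doc):
        tokens = [t for t in doc.lower().split() if t in vocab]
        scores = {}
        for label in ("pos", "neg"):
            total = sum(word_counts[label].values())
            scores[label] = sum(
                math.log((word_counts[label][t] + 1) / (total + len(vocab)))
                for t in tokens
            )
        return max(scores, key=scores.get)

    return classify
```

In this sketch, a word such as "superb" that is absent from the lexicon can still acquire a polarity if it co-occurs with lexicon words in confidently labelled documents, which is the intuition behind the domain-specific feature acquisition step.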

Bio:

Yulan He is currently a Senior Lecturer at the Knowledge Media Institute of The Open University, UK. She previously held the positions of Lecturer at the University of Exeter, UK, Lecturer at the University of Reading, UK, and Assistant Professor at Nanyang Technological University, Singapore. She received the BASc (1st class Honors) and MEng degrees in Computer Engineering from Nanyang Technological University, Singapore, in 1997 and 2001, respectively, and the PhD degree in 2004 from the Cambridge University Engineering Department, where she worked on statistical models for spoken language understanding. Her early research focused on biomedical literature mining and microarray data analysis. Her current research interests lie in the integration of machine learning and natural language processing for sentiment analysis and information extraction from the Web.