Academic journal article American Academic & Scholarly Research Journal

Feature Selection for Opinion Polarity Detection by Machine Learning Method

Article excerpt

Abstract. Recent years have seen increasing interest in techniques for opinion mining and subjectivity analysis. In this article, we outline the results obtained with our approach to detecting features for classifying the polarity of opinions in French using machine learning techniques. In sentiment analysis, identifying the features associated with an opinion can help produce a finer-grained understanding of subjective reviews. The proposed system consists of three phases: preprocessing of the corpus, extraction of the features, and classification. The second phase incorporates co-occurrence analysis to better capture the intrinsic semantics of opinion-bearing words, and therefore to extract better features for classification.

Keywords: opinion mining, intrinsic semantics, polarity classification, textometry, co-occurrence, feature extraction.

(ProQuest: ... denotes formulae omitted.)


Opinion detection (also known as sentiment analysis) attracts considerable interest in both academia and industry. With the emergence of discussion groups, forums, blogs, and sites that compile consumer reviews, there is now a very large mass of documents expressing opinions, which constitutes a huge source of data for various monitoring applications (technological, marketing, competitive, societal). Much research at the crossroads of NLP and data mining addresses the problem of detecting opinions. The "bag of words" approach is one of the earliest models of textual representation and is still often used for sentiment analysis today. The text is represented as a set of n-grams, without regard to their order of appearance or their relationships in the text. Traditional machine learning approaches (Naive Bayes or SVM) then use this representation to build sentiment classification systems.
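As a minimal sketch of the representation just described, the following Python code builds a bag of word n-grams and feeds it to a small multinomial Naive Bayes classifier. It is an illustration only, not the system presented in this article: the function and class names, the add-one smoothing, the whitespace tokenization, and the four toy French reviews are all assumptions made for the example.

```python
import math
from collections import Counter


def bag_of_ngrams(text, n_max=2):
    """Count all word n-grams up to length n_max, ignoring where they occur."""
    tokens = text.lower().split()  # naive whitespace tokenization (assumption)
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = {label: Counter() for label in self.class_counts}
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(bag_of_ngrams(text))
        self.vocab = set()
        for counts in self.feature_counts.values():
            self.vocab.update(counts)
        return self

    def predict(self, text):
        bag = bag_of_ngrams(text)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feature, freq in bag.items():
                if feature in self.vocab:  # skip n-grams never seen in training
                    score += freq * math.log((counts[feature] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up reviews (illustrative data, not the article's corpus)
train_texts = [
    "un film excellent et touchant",
    "une histoire superbe , excellent jeu d'acteurs",
    "un film ennuyeux et mauvais",
    "une histoire mauvaise , jeu d'acteurs ennuyeux",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayes().fit(train_texts, train_labels)
prediction = model.predict("un film excellent")
```

Here `prediction` is the label with the highest smoothed log-probability for the new text. A real system would replace the whitespace tokenizer with proper French preprocessing and train on a far larger corpus.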

The accuracy of this kind of approach can be very high, especially when advanced feature selection techniques are used in conjunction with additional lexicons extracted from texts previously identified as carrying opinion. However, we believe that model properties can capture more complex expressions of sentiment, beyond the simple recognition of opinionated constructions, and should therefore yield better classification systems. One problem with the bag of words approach is the loss of information during the construction of the textual representation, which treats a text as a collection of separate terms. Yet the relationships between the words in a text are often very important when determining either the degree or the polarity of a sentiment.
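The information loss mentioned above can be illustrated with a short Python sketch. Two invented French sentences with opposite polarity produce exactly the same bag of unigrams, while counting ordered word pairs inside a small window tells them apart. The sentences, the helper names, and the window size of 2 are assumptions chosen for the example; this is not the textometric co-occurrence analysis used in this article.

```python
from collections import Counter


def unigram_bag(text):
    """Bag of words: token counts with all ordering discarded."""
    return Counter(text.lower().split())


def cooccurrence_pairs(text, window=2):
    """Count ordered pairs (left, right) where `right` follows `left`
    within `window` tokens. The window size is an illustrative choice."""
    tokens = text.lower().split()
    pairs = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 1 + window]:
            pairs[(left, right)] += 1
    return pairs


# Invented examples: same words, opposite polarity
sentence_a = "ce film est bon , pas mauvais"
sentence_b = "ce film est mauvais , pas bon"

same_bag = unigram_bag(sentence_a) == unigram_bag(sentence_b)
same_pairs = cooccurrence_pairs(sentence_a) == cooccurrence_pairs(sentence_b)

# The pair ("pas", "mauvais") only appears where "mauvais" is negated
negated_in_a = cooccurrence_pairs(sentence_a)[("pas", "mauvais")]
negated_in_b = cooccurrence_pairs(sentence_b)[("pas", "mauvais")]
```

With these inputs, `same_bag` is true while `same_pairs` is false: the unigram representation cannot distinguish the two sentences, whereas the windowed pairs retain the link between the negation and the word it modifies.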

In this article, we present the method used to select the textual features on which the machine learning methods rely. To select these features, we prepared the data with textometric analysis tools. Many researchers have worked on identifying features and sentiments. Hu & Liu (2004) and Liu et al. (2005) proposed several techniques for mining feature-associated opinions expressed in reviews. Su et al. (2008) present an unsupervised approach based on the mutual reinforcement principle, which clusters features and opinion words simultaneously using both content and link information. To identify feature-sentiment pairs in real-life reviews, Hai et al. (2010) proposed a statistical Natural Language Processing approach that combines syntactic and statistical knowledge. In our approach, the selected features serve as a vector representation of the texts so that they can be used for supervised learning. To make this selection, we adopt the following postulate: "to choose between positive or negative polarity of a text, we shall merely detect in it the indicators of opinions. Several methods can be used, for example the presence or absence of a set of determined words, the location of certain words (Hai et al. …
