Academic journal article Interdisciplinary Journal of Information, Knowledge and Management

Text Classification Techniques: A Literature Review

Introduction

Unstructured data remains a challenge in almost all data-intensive application fields, such as business, universities, research institutions, government funding agencies, and technology-intensive companies (Khan, Baharudin, Lee, & Khan, 2010). Eighty percent of data about an entity (person, place, or thing) are available only in unstructured form (Khan et al., 2010), such as reports, emails, views, and news. Text mining (or text analytics) uncovers previously hidden relationships between entities in a dataset to derive meaningful patterns that reflect the knowledge contained in it. This knowledge is then used in decision making (Brindha, Sukumaran, & Prabha, 2016).

Text analytics converts text into numbers, and these numbers in turn bring structure to the data and help to identify patterns. The more structured the data, the better the analysis and, ultimately, the better the decisions. Moreover, processing and classifying every piece of data manually is difficult. This has led to the emergence of intelligent text processing tools in the field of natural language processing that analyze lexical and linguistic patterns (Brindha et al., 2016).
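
As a minimal sketch of what "converting text into numbers" can mean, the Python snippet below builds a bag-of-words count matrix in which each document becomes a vector with one column per vocabulary term. Bag-of-words is only one possible representation and is assumed here purely for illustration; the helper names (tokenize, build_vocabulary, to_count_vector) and the two sample sentences are hypothetical, and the whitespace tokenizer is deliberately naive.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical, naive tokenizer: lowercase and split on whitespace.
    # Real pipelines would also strip punctuation, stop words, etc.
    return text.lower().split()


def build_vocabulary(docs: list[str]) -> list[str]:
    # Sorted list of every distinct token in the corpus;
    # its order fixes the column order of the vectors.
    return sorted({tok for doc in docs for tok in tokenize(doc)})


def to_count_vector(doc: str, vocab: list[str]) -> list[int]:
    # One integer per vocabulary term: how often that term occurs in the document.
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


# Hypothetical sample documents.
docs = ["the patient received treatment", "the treaty ended the war"]
vocab = build_vocabulary(docs)
matrix = [to_count_vector(d, vocab) for d in docs]
print(vocab)
print(matrix)
```

Each row of matrix is one document and each column one vocabulary term, so the second row records that "the" occurs twice in the second sentence. Once text is in this numeric form, clustering or classification algorithms can operate on it.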

Clustering, classification, and categorization are major techniques used in text analytics (Vasa, 2016). Text classification is the process of assigning, for example, a document to a particular class label (say "History") from among other available class labels such as "Education", "Medicine", and "Biology". Text classification is therefore an essential phase in knowledge discovery (Vasa, 2016). The aim of this article is to analyze the various text classification techniques employed in practice, their spread across application domains, their strengths and weaknesses, and current research trends, in order to provide improved awareness of knowledge extraction possibilities.
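
To make the labeling step concrete, here is a minimal sketch of a text classifier that assigns documents to the class labels named above. It assumes multinomial Naive Bayes with add-one (Laplace) smoothing as the technique; the passage itself does not prescribe any particular algorithm. The class name NaiveBayesTextClassifier, the tokenize helper, and the eight training sentences are hypothetical, and a real system would need far more labeled data and richer preprocessing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical, naive lowercase/whitespace tokenizer.
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_doc_counts = Counter(labels)   # documents per class, for priors
        self.word_counts = defaultdict(Counter)   # class -> token -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def predict(self, doc):
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_doc_counts.items():
            score = math.log(n_docs / self.total_docs)  # log P(class)
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(doc):
                if tok not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][tok]
                # Smoothed log P(word | class)
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set using the class labels mentioned in the text.
train_docs = [
    "the roman empire and medieval kings",
    "ancient war treaty and empire",
    "school curriculum and teacher training",
    "students exam and classroom teaching",
    "patient diagnosis and drug treatment",
    "hospital surgery and clinical treatment",
    "cell genetics and evolution of species",
    "dna protein and cell biology",
]
train_labels = ["History", "History", "Education", "Education",
                "Medicine", "Medicine", "Biology", "Biology"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("a treaty that ended the empire"))        # History
print(clf.predict("the patient needs hospital treatment"))  # Medicine
```

Naive Bayes is used here only because it fits in a few lines: it scores each class by its log prior plus the summed log probabilities of the document's words under that class, then picks the highest-scoring class. Other techniques reviewed in the literature would replace the scoring step while keeping the same "document in, label out" shape.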

Although there is a large body of literature on the capabilities of different types of text classification techniques, the spread of these techniques in advanced fields such as Artificial Intelligence (AI) and Machine Learning (ML) is seldom reported. Further, reviewing text classification approaches from an algorithmic point of view will benefit industry and academia alike.

The amount of corporate data is constantly increasing, and the growing need to automate complex data-intensive applications drives industry to look for better approaches to knowledge discovery. This knowledge can lead to insightful investments and increase the productivity of an organization. This article will also help researchers understand the capabilities of various text classification techniques, and their adaptability to AI procedures, before working with data-intensive applications.

The world requires more intelligent systems like "Siri"; therefore, developing AI-based text processing models is the need of the hour. In addition, the variety of available data, cheaper computational processing, and affordable data storage call for automated data models that can analyze complex data and deliver quick results. For this reason, this study analyzes text classification techniques with respect to AI/ML.

The rest of the paper is organized as follows. The literature review section describes current research trends in various text classification techniques. The methodology section describes the nature of the study undertaken for this article. The text classification techniques section describes the various approaches in detail. The findings section presents the results observed in the reviewed articles. The discussion section explains research gaps, and the conclusion section highlights some of the current trends and future research options in text classification techniques.

Literature Review

This article is a literature review of various studies related to text classification approaches; therefore, this section elucidates some of the research directions observed in this regard. …
