Academic journal article Journal of the Association for Information Systems

Graph-Based Cluster Analysis to Identify Similar Questions: A Design Science Approach

Article excerpt

1 Introduction

The advent of Web 2.0 led to the emergence of an evolving information infrastructure rich in user-generated content. The rapid growth of user-generated content has made it increasingly difficult for users to find content of interest. Arazy and Kopak (2011) highlight the sheer amount of information and its quality as major concerns today. Moreover, user-generated content can be inaccurate, misleading, or outdated, which researchers refer to as information waste (Amrit, Wijnhoven, & Beckers, 2015). To date, researchers have developed various analytical techniques for searching and recommending user-generated content (Adomavicius & Tuzhilin, 2005; Xu & Yin, 2015). While current search engines enjoy commercial success and perform well, their ability to find relevant information for hard questions, such as those asking for opinions or summaries, is far from satisfactory (Harper, Moy, & Konstan, 2009). Social question answering (SQA) services address these complex information needs. Instead of relying solely on keyword search in Web search engines, users now turn to SQA services, where they find like-minded individuals who share and meet their information needs. SQA services are dedicated platforms on which users can post their own questions and respond to other users' questions (Liu et al., 2008). For example, Apple customers use Apple Store Questions & Answers to ask, answer, and rate questions related to Apple's products. WebMD exemplifies an SQA service for healthcare, and Piazza exemplifies an SQA service for collaborative learning. Yahoo! Answers, another SQA service, covers a diverse range of topics.

SQA services are a collaborative endeavor that involves group effort and open participation (Shachaf, 2010). It is therefore worth examining how user-generated content in SQA services relates not only to other content but also to the users associated with it. Oh (2012) suggests that users provide answers in SQA services out of altruism or to establish their reputation as experts in a given area. Consequently, answers contributed to SQA services vary in depth depending on the answerer's technical expertise and motives. Personalized answers authored by other users can be useful, especially for advice and recommendations that a general Web search struggles to provide. To enhance the responsiveness of such services, one can identify similar questions already in the corpus and return their existing answers. Thus, SQA services need an efficient mechanism to identify similar questions. However, identifying similar questions is not trivial.

SQA services are rich in multiple-sentence questions, for example: "Is it possible to download anything from YouTube? Like, music onto an iPod or onto a blank CD? If so, how?" Existing techniques for identifying similar questions either do not apply to such complex questions or perform poorly on them (Tamura, Takamura, & Okumura, 2005). Further, identifying similar user-generated questions in SQA services remains a challenge because similar questions are often phrased in different words (the lexical mismatch problem). For example, the question "My computer keeps displaying a blue screen and it is stuck. What should I do?" could also be posed in entirely different words, as in "How to bring a frozen laptop back to life?"
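To make the lexical mismatch concrete, the following Python sketch computes the Jaccard similarity between the word sets of the two example questions. The simple regular-expression tokenizer and the choice of Jaccard similarity are illustrative assumptions, not a technique from the article; a surface word-overlap measure of this kind is exactly what lexical mismatch defeats.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens (letters, digits, apostrophes)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two questions' word sets: |A & B| / |A | B|."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty questions: define similarity as 0
    return len(ta & tb) / len(ta | tb)


q1 = "My computer keeps displaying a blue screen and it is stuck. What should I do?"
q2 = "How to bring a frozen laptop back to life?"

print(sorted(tokens(q1) & tokens(q2)))  # ['a']
print(round(jaccard(q1, q2), 3))        # 0.045
```

The two questions share only the article "a", giving a similarity of 1/22 (about 0.045), even though a human reader would judge them to describe the same problem.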

In SQA services, when an asker posts a question and receives an answer from an answerer, the question and its answer form dyadic content, and the asker and answerer form dyadic users (Bian, Liu, Zhou, Agichtein, & Zha, 2009). Dyadic content and users create interlinks and relationships between users and user-generated content. Hence, to overcome the lexical mismatch problem, we propose cluster analysis based on the content-user relationship, so that questions can be related through the users who ask and answer them rather than through shared words alone.
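Relating questions through the users who ask and answer them can be pictured as a bipartite graph with question nodes and user nodes. The following Python sketch is a deliberately simple stand-in, not the authors' method: it groups questions by the connected components of that graph. The toy data and the names `interactions` and `cluster_questions` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: (question_id, asker, answerer) triples.
interactions = [
    ("q1", "alice", "bob"),
    ("q2", "carol", "bob"),
    ("q3", "dave", "erin"),
    ("q4", "frank", "erin"),
    ("q5", "gina", "hank"),
]


def cluster_questions(triples):
    """Group questions connected through shared askers or answerers.

    Builds an undirected bipartite graph of question nodes and user nodes,
    then returns the question IDs of each connected component.
    """
    adj = defaultdict(set)
    questions = []
    for qid, asker, answerer in triples:
        q = ("Q", qid)  # tag node types so a question ID never collides with a username
        questions.append(q)
        for user in (asker, answerer):
            u = ("U", user)
            adj[q].add(u)
            adj[u].add(q)

    seen, clusters = set(), []
    for start in questions:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:  # iterative depth-first traversal of one component
            node = stack.pop()
            if node[0] == "Q":
                component.add(node[1])
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(component)
    return clusters


print([sorted(c) for c in cluster_questions(interactions)])
# [['q1', 'q2'], ['q3', 'q4'], ['q5']]
```

On the toy data, q1 and q2 fall together because bob answered both, q3 and q4 because erin answered both, and q5 stands alone. On real SQA data, a single prolific answerer would link most questions into one component, so a practical approach would need edge weighting or a finer-grained clustering step rather than plain connectivity.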

Cluster analysis is a procedure for extracting natural configurations from content and users (Balijepally, Mangalaraj, & Iyengar, 2011). …