Academic journal article Journal of Digital Information Management

Sina Weibo Incident Monitor and Chinese Disaster Microblogging Classification

1. Introduction

According to the latest annual report on humanitarian crises and assistance from the United Nations Office for the Coordination of Humanitarian Affairs [1], 97 million people worldwide were affected by natural disasters in 2013. China was the most affected country, with 27.5 million people impacted, followed in order by the Philippines, India, Vietnam and Thailand. In terms of cost, the biggest disaster events in China that year were earthquakes (US$6.8 billion), followed by typhoons (US$5.7 billion for Japan and China combined). Globally, the cost of disasters has risen steadily over the past 10 years [1].

Response and recovery activities for disaster events are typically performed by emergency services agencies that are specifically trained to deal with such situations. Large-scale disasters may involve the armed forces and, in some countries, international aid agencies. Coordinating the efforts of these multiple groups to achieve the best outcome in the shortest time is a challenging task. Central to this coordination is the effective and accurate sharing of information about the impact on the environment, the people affected and damage to infrastructure. This is referred to as situational awareness, and it is vital that everyone involved shares a common operating picture.

Social media has been recognized as an emerging source of information for emergency managers [2,3,4]. Twitter in particular is an important communication channel, both for sourcing content from people experiencing disasters and for emergency services agencies to inform the public about what is happening. For example, Olteanu et al. [4] found that, on average, 12% of tweets during natural disaster events came from eyewitnesses. After examining a sample of disaster-related tweets, they found that 15% of messages were from affected individuals, 14% offered caution and advice, and 9% reported information about affected infrastructure and utilities. Similarly, research from the American Red Cross [5] found that 28% of American citizens choose to use social media services to send messages after disaster events and that 20% obtained emergency information from a mobile application. It also found that 40% of citizens would use social media to tell their contacts they were safe if affected by an emergency event, and that if they were to send a request for help via social media, 70% expected help to arrive within three hours of posting.

Twitter has been widely investigated as a source of crowdsourced emergency event information [6,7,8,9,10]. However, this service is not available in China, so we wanted to explore how well techniques reported for Twitter work on publicly available messages from a Chinese microblogging platform. Sina Weibo was chosen because it is essentially the Chinese equivalent of Twitter and is the most influential Chinese microblogging service, with more than 156 million monthly active users and more than 69 million daily active users [11].

The task is to identify emergency events in China from the Sina Weibo messages of people experiencing them. Automatic message classification plays an important role in identifying relevant messages. This investigation is the first step towards a general alert and monitoring system for disaster events in China based on content published on a microblogging service.

This paper is organized as follows. Section 2 reviews related work on processing Chinese text and on microblog classification. Section 3 introduces our Sina Weibo Incident Monitor (SWIM) system, and Section 4 presents experimental evaluations of four classifiers using the new training data captured by SWIM. Section 5 gives our conclusions.

2. Related Works

2.1 Processing Chinese text

The lack of whitespace between words is the main difficulty in preprocessing Chinese text for classification. …
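To make the segmentation problem concrete, the sketch below implements forward maximum matching, a classic dictionary-based way of splitting unspaced Chinese text into words. It is only an illustration and is not necessarily the method used in this work. The function name `forward_max_match`, the `max_len` parameter and the toy dictionary are hypothetical; a real system would rely on a large lexicon or a statistical segmenter.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation (forward maximum matching).

    At each position, take the longest substring (up to max_len characters)
    found in the dictionary; if none matches, emit a single character.
    """
    words: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


if __name__ == "__main__":
    # Hypothetical toy lexicon: Chengdu, occur, earthquake
    lexicon = {"成都", "发生", "地震"}
    # "An earthquake occurred in Chengdu"
    print(forward_max_match("成都发生地震了", lexicon))  # → ['成都', '发生', '地震', '了']
```

The greedy longest-match rule is simple and fast, but it can mis-split ambiguous character sequences, which is one reason segmentation quality matters for downstream classification.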