Academic journal article Trends & Issues in Crime and Criminal Justice

Data Reduction and Data Mining Framework for Digital Forensic Evidence: Storage, Intelligence, Review and Archive

Article excerpt

The increase in digital evidence presented for analysis to digital forensic laboratories has been an issue for many years, leading to lengthy backlogs of work (Parsonage 2009). This is compounded by the growing size of storage devices (Garfinkel 2010). The increasing volume of data has been discussed by various digital forensic scholars and practitioners such as McKemmish (1999) and Raghaven (2013). While many of the challenges posed by the volume of data are addressed in part by new developments in technology, the underlying issue has not been adequately resolved. Over many years, a variety of ideas have been put forward to address the increasing volume of data, such as data mining (Beebe & Clark 2005; Brown, Pham & de Vel 2005; Huang, Yasinsac & Hayes 2010; Palmer 2001; Shannon 2004), data reduction (Beebe 2009; Garfinkel 2006; Greiner 2009; Keneally & Brown 2005; Raghaven 2013), triage (Garfinkel 2010; Parsonage 2009; Reyes et al. 2007), cross-drive analysis (Garfinkel 2010; Raghaven, Clark & Mohay 2009), user profiling (Abraham 2006; Garfinkel 2010), parallel and distributed processing (Lee, Un & Hong 2008; Nance, Hay & Bishop 2009; Roussev & Richard 2004), graphic processing units (Marziale, Richard & Roussev 2007), intelligence analysis techniques (Beebe 2009), artificial intelligence (Hoelz, Ralha & Geeverghese 2009; Sheldon 2005) and visualisation (Teelink & Erbacher 2006). Despite much discussion of the data volume challenge and many calls for research into applying data mining and other techniques to the problem, very little work has been published on a method or framework for applying such techniques to reduce and analyse the increasing volume of data.
In addition, the value of extracting or using intelligence from digital forensic data has not been discussed, nor has there been any research regarding the use of open, closed and confidential source information during digital forensic analysis.

The growth in the volume and number of devices affects forensic examinations in many ways, including longer times to create forensic copies and conduct analysis, which adds to the backlog of requests. Digital forensic practitioners, especially those in government and law enforcement agencies, will continue to be under pressure to deliver more with less, especially in today's economic landscape. This gives rise to a variety of needs, including:

* A more efficient method of collecting and preserving evidence.

* A capacity to triage evidence prior to conducting full analysis.

* Reduced data storage requirements.

* An ability to conduct a review of information in a timely manner for intelligence, research and evidential purposes.

* An ability to archive important data.

* An ability to quickly retrieve and review archived data.

* A source of data to enable a review of current and historical cases (intelligence, research and knowledge management).

In this paper, a data reduction and data mining framework is proposed that incorporates a process of reducing data volume by focusing on a subset of information. This process is not designed to replace full analysis, but to provide a method of focusing an investigation on items of importance, to reduce data storage requirements for archival and retrieval purposes, and to provide a capability to undertake intelligence analysis of digital forensic data. Full analysis of digital evidence may still be necessary, and the data reduction processes outlined in this paper serve to support analysis rather than replace it.

The contributions of the proposed framework are two-fold:

* a data reduction method to reduce storage demands, and

* a more efficient forensic data subset collection process.

The framework provides the capability to conduct a review of a subset of data as a triage process and to store subset data for intelligence analysis, research, archival and historical review purposes. …
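
As a rough illustration of what collecting a subset of data to reduce storage demands could look like in practice, the following minimal Python sketch walks a directory tree, keeps only files that match a selection rule, and reports how much volume the subset avoids storing. The excerpt does not state which items make up the subset, so the extension list `ASSUMED_KEEP_EXTENSIONS` and the function names `select_subset` and `reduction_ratio` are hypothetical and are not the authors' method.

```python
import os

# Hypothetical selection rule: the excerpt does not say which items
# form the subset, so these extensions are an assumption for illustration.
ASSUMED_KEEP_EXTENSIONS = {".log", ".txt", ".db"}


def select_subset(root, keep_extensions=ASSUMED_KEEP_EXTENSIONS):
    """Return (subset_paths, total_bytes, subset_bytes) for files under root."""
    subset, total_bytes, subset_bytes = [], 0, 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            size = os.path.getsize(path)
            total_bytes += size
            # Keep the file only if its extension matches the assumed rule.
            if os.path.splitext(name)[1].lower() in keep_extensions:
                subset.append(path)
                subset_bytes += size
    return sorted(subset), total_bytes, subset_bytes


def reduction_ratio(total_bytes, subset_bytes):
    """Fraction of the original volume that the subset avoids storing."""
    return 0.0 if total_bytes == 0 else 1 - subset_bytes / total_bytes
```

Calling `select_subset` on a mounted copy of the evidence and passing the two byte counts to `reduction_ratio` gives a quick estimate of the storage saved. A real workflow would also need to preserve hashes and metadata for each selected file, which this sketch omits.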
