Magazine article Information Management

Techniques for Making Molehills out of Unstructured Data Mountains

Article excerpt

Visual analytics, a new technology that graphically illustrates datasets, helps users quickly identify responsive documents for electronic discovery review.

The astounding volume of data produced, shared, and stored by organizations today is growing at a greater pace than ever before. Managing this information posed some challenges in the past, but what was once considered "a lot of data" is nothing compared to collections now measured in terabytes (1,000 gigabytes) or even petabytes (1 million gigabytes).

Before most data was created and stored electronically, business professionals would create a document, use it for its intended purpose, and then periodically decide whether to file the information. Organizations archived only what they deemed truly important because they had neither the time nor the money to invest in elaborate document storage systems.

With the adoption of and increased reliance on computers, the decision to retain information no longer revolves around manually filing a document; it focuses on actively deleting it. But with the availability of petabytes of computer storage, workers may not feel the need to delete or destroy files. Predictably, organizations have amassed huge volumes of archived materials, saved on hard drives or back-up tape media.

Over time, offsite storage of archived documents has become the norm. However, with the materials now stored remotely, an "out-of-sight, out-of-mind" approach to dealing with the data also has become common. As a result, organizations often find themselves overwhelmed when litigation or regulatory compliance activities require them to sort through the data pool and produce responsive documents for electronic discovery review. Collecting all of this data takes a great deal of time and requires a number of steps, often starting with tape restoration. A series of processing activities follows, including de-duplication, keyword searches, and data filtering, each of which takes time and may add thousands of dollars in associated expenses.
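To make two of these processing steps concrete, the following is a minimal sketch of de-duplication followed by a keyword search. It is an illustration under stated assumptions, not a description of any particular e-discovery product: the function names, the sample file names, and the choice of exact-match hashing (SHA-256 over the document text) are all hypothetical, and real tools typically also handle near-duplicates, metadata, and many file formats.

```python
import hashlib


def deduplicate(documents):
    """Keep the first copy of each document, comparing by SHA-256 of its text.

    `documents` is a list of (name, text) pairs; this layout is an assumption
    made for the example.
    """
    seen = set()
    unique = []
    for name, text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, text))
    return unique


def keyword_filter(documents, keywords):
    """Return documents whose text contains at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [
        (name, text)
        for name, text in documents
        if any(k in text.lower() for k in lowered)
    ]


# Hypothetical sample collection: one exact duplicate and one irrelevant file.
collection = [
    ("memo_a.txt", "Quarterly pricing agreement with the supplier."),
    ("memo_a_copy.txt", "Quarterly pricing agreement with the supplier."),
    ("lunch.txt", "Team lunch is moved to Friday."),
]

unique_docs = deduplicate(collection)
candidates = keyword_filter(unique_docs, ["pricing", "contract"])
print([name for name, _ in candidates])
```

Even in this toy collection, only one of the three files survives both steps, which mirrors the point that much of a collection is duplicated or irrelevant to the matter at hand.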

Adding to the frustration over these mountains of data that must be managed is the reality that only a small portion of each document collection is even responsive to the case at hand. So, after dedicating significant time and expense to collecting and processing vast amounts of data, much of the effort is inevitably for naught.

The good news is that today's technology offers some help in dealing with large sets of unstructured data, with some tools taking a very logical approach to leverage the strengths of both man and machine. One such advancement is the development of a visual method for analyzing and managing data collections.

Pictures Are Worth a Thousand Words

Research has shown that people tend to be visual in nature, and - given the choice - they prefer to view graphical or illustrative representations of material as opposed to text. Not only do they prefer to receive information in this format, but they also tend to process visually presented information faster and are more inclined to retain it.

Visual techniques for learning and processing a wide range of information are known to be quite effective with a vast majority of the general population. This is why teachers, for example, use a variety of visual aids in their classrooms and why lawyers use video and other graphical illustrations in their trial presentations.

But can images or any other kind of visual demonstration be effectively used to manage and analyze vast collections of data? The latest technology garnering significant attention in the legal industry leverages the visual nature of the human mind to do just that in a way that is revolutionizing how attorneys develop their discovery and case strategies.

Visual analytics is a new approach to reviewing large collections of data. In fact, this method of analyzing the contents of a dataset is among the best available for collections of significant size because its effectiveness is not compromised regardless of how many documents might be included. …
