Special Guest Blogger Pete Coons, D4, VP
This is the first of a multipart series that will help define some of the nifty, and often made-up, terms in the eDiscovery lexicon.
“Can’t you just DeNIST the data and get rid of all the junk files…?” This is a question I am often asked. It usually comes after an individual attends an eDiscovery conference where the magical phrase “DeNIST” was uttered at some point. The individual is led to believe, or rather wants to believe, that it’s a supernatural process that separates all the wheat from the chaff. Well, that’s only half the story…
Before we can define DeNIST we need to define NIST. NIST is an acronym for the National Institute of Standards and Technology (www.nist.gov). A direct quote from the website:
“Founded in 1901, NIST is a non-regulatory federal agency within the U.S. Department of Commerce. NIST’s mission is to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.”
Further, NIST has a sub-project called the National Software Reference Library, or NSRL. An excerpt from the website www.nsrl.nist.gov is below:
“The National Software Reference Library (NSRL) is designed to collect software from various sources and incorporate file profiles computed from this software into a Reference Data Set (RDS) of information. The RDS can be used by law enforcement, government, and industry organizations to review files on a computer by matching file profiles in the RDS. This will help alleviate much of the effort involved in determining which files are important as evidence on computers or file systems that have been seized as part of criminal investigations.
The RDS is a collection of digital signatures of known, traceable software applications.”
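The “digital signatures” in that excerpt are hash values: short fingerprints computed from a file’s contents. In theory, every distinct file has a unique hash value, so if two files have the same hash value they are considered duplicates.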
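To make the idea of a file signature concrete, here is a minimal sketch in Python. It assumes SHA-1 as the hash algorithm (this post does not name one), and the file contents and the file_signature helper are made up for illustration.

```python
import hashlib


def file_signature(data: bytes) -> str:
    """Return the SHA-1 hex digest of a file's contents (algorithm is an assumption)."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical contents: the same install file on two different laptops,
# plus a document that a user has edited.
copy_on_laptop_a = b"standard install file contents"
copy_on_laptop_b = b"standard install file contents"
user_memo = b"standard install file contents, edited by a user"

# Identical contents give identical signatures, wherever the file lives.
print(file_signature(copy_on_laptop_a) == file_signature(copy_on_laptop_b))  # True

# Any change to the contents gives a different signature.
print(file_signature(copy_on_laptop_a) == file_signature(user_memo))  # False
```

Note that the signature depends only on the bytes inside the file, not on its name, location, or date.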
It also may help to know that most software applications comprise dozens if not hundreds of files.
When Microsoft Word is installed on a laptop, hundreds of standard files are copied to that laptop’s hard drive. For a given version of the software, these standard install files are the same (identical hash value) no matter what computer they reside on.
Now imagine a typical computer with dozens of software applications. A typical computer hard drive contains tens of thousands of files. As you can well imagine, the vast majority are not user generated and hold little to no evidentiary value for litigation purposes.
This is where the NSRL’s Reference Data Set, commonly called the “NIST list,” comes in. It is used regularly by the FBI and other law enforcement agencies to identify files with no evidentiary value. Best of all, the list is free.
Many eDiscovery companies take advantage of this free list and incorporate it into their software.
The list, along with the file signatures, can be stored in a database and used to compare file signatures of data collected (hard drive, server share, etc.) for discovery purposes.
Any file whose signature matches one in the NIST list is DeNISTed (removed) from the collection and does not move further down the eDiscovery processing chain.
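The matching step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular eDiscovery product works: the known_signatures set is a tiny hypothetical stand-in for the NIST list, SHA-1 is an assumed hash algorithm, and the file names and contents are invented.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hex digest of a file's contents (algorithm is an assumption)."""
    return hashlib.sha1(data).hexdigest()


def denist(collection, known_signatures):
    """Split a collection into files to keep and files removed by signature match."""
    kept, removed = {}, {}
    for name, data in collection.items():
        if sha1_hex(data) in known_signatures:
            removed[name] = data  # signature is on the list: DeNISTed
        else:
            kept[name] = data  # not on the list: moves on to processing
    return kept, removed


# Hypothetical stand-in for the NIST list: signatures of two known install files.
known_signatures = {
    sha1_hex(b"winword.exe bytes"),
    sha1_hex(b"common.dll bytes"),
}

# Hypothetical collected data (file name -> contents).
collection = {
    "WINWORD.EXE": b"winword.exe bytes",
    "common.dll": b"common.dll bytes",
    "custom_tool.exe": b"in-house tool bytes",  # program file whose signature is not on the list
    "contract_draft.docx": b"user-written contract",
}

kept, removed = denist(collection, known_signatures)
print(sorted(removed))  # ['WINWORD.EXE', 'common.dll']
print(sorted(kept))     # ['contract_draft.docx', 'custom_tool.exe']
```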
And there you have it, that’s what DeNISTing means.
Here’s the rub: the NIST list does not contain every single “junk” or system file in the known universe.
Many attorneys and legal review teams expect the DeNIST process to get rid of every EXE and DLL on a hard drive or in a data collection. It doesn’t work that way. That’s the leftover chaff…
So while DeNISTing is a definite time and money saver and an important part of the eDiscovery process, it’s not the “one” process that will knock out all the junk.
Next week we will discuss “Load Files”…
Peter Coons is Vice President at D4 LLC and has over 15 years of experience in litigation and electronic discovery services.