To DeNIST or Not to DeNIST, that is the question!

Special Guest Blogger Pete Coons, D4, VP

This is the first of a multipart series that will help define some of the nifty, and often made-up, terms in the eDiscovery lexicon.

“Can’t you just DeNIST the data and get rid of all the junk files…?” This is a question I am often asked. It usually comes after someone attends an eDiscovery conference where the magical phrase “DeNIST” was uttered at some point. The individual is led to believe, or rather wants to believe, that it’s a supernatural process that separates all the wheat from the chaff. Well, that’s only half the story…

Before we can define DeNIST, we need to define NIST. NIST is an acronym for the National Institute of Standards and Technology (www.nist.gov). A direct quote from the website:

“Founded in 1901, NIST is a non-regulatory federal agency within the U.S. Department of Commerce.  NIST’s mission is to promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.” 

Further, NIST has a sub-project called the NSRL, or National Software Reference Library. An excerpt from the website, www.nsrl.nist.gov, is below:

“The National Software Reference Library (NSRL) is designed to collect software from various sources and incorporate file profiles computed from this software into a Reference Data Set (RDS) of information. The RDS can be used by law enforcement, government, and industry organizations to review files on a computer by matching file profiles in the RDS. This will help alleviate much of the effort involved in determining which files are important as evidence on computers or file systems that have been seized as part of criminal investigations.

The RDS is a collection of digital signatures of known, traceable software applications.”

A digital signature is akin to a digital fingerprint.  It is also referred to as a hash value. 

In theory, every distinct file has a unique hash value, because the value is computed from the file’s contents. If two files have the same hash value, they are considered duplicates.
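To make the fingerprint idea concrete, here is a minimal Python sketch. It assumes SHA-1 as the hash algorithm purely for illustration (this post does not say which algorithm a given tool uses), and the sample file contents and the `file_hash` helper are made up for the example.

```python
import hashlib


def file_hash(data: bytes) -> str:
    """Return the SHA-1 hex digest (the "fingerprint") of a file's contents."""
    return hashlib.sha1(data).hexdigest().upper()


# Hypothetical file contents, standing in for real files on a hard drive.
original = b"Quarterly report, draft 1"
exact_copy = b"Quarterly report, draft 1"
edited = b"Quarterly report, draft 2"

# Identical contents give identical hash values, so the files are duplicates.
print(file_hash(original) == file_hash(exact_copy))

# Changing even one character gives a different hash value.
print(file_hash(original) == file_hash(edited))
```

Note that the hash depends only on the contents, so renaming a file or moving it to another computer does not change its fingerprint.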

It also may help to know that most software applications comprise dozens if not hundreds of files. 

When Microsoft Word is installed on a laptop, hundreds of standard files are copied to the computer’s hard drive. For a given version of the software, these standard install files are the same (identical hash value) no matter what computer they reside on.

Now imagine a typical computer with dozens of software applications. A typical computer hard drive contains tens of thousands of files. As you can well imagine, the vast majority are not user generated and hold little to no evidentiary value for litigation purposes.

The NIST list, as it has been unofficially dubbed in the eDiscovery community, contains over 28 million file signatures.

It is used regularly by the FBI and other law enforcement agencies to identify files with no evidentiary value.  Best of all, the list is free. 

Many eDiscovery companies take advantage of this free list and incorporate it into their software. 

The list of file signatures can be stored in a database and compared against the signatures of the data collected (hard drive, server share, etc.) for discovery purposes.

Any file whose signature matches one in the NIST list is DeNISTed (removed) from the collection and does not move further down the eDiscovery processing chain.
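The matching step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `denist` function, the file names, and the tiny `known_hashes` set are hypothetical stand-ins for a real tool and the full NIST list, and SHA-1 is assumed as the hash algorithm.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 hex digest of a file's contents."""
    return hashlib.sha1(data).hexdigest().upper()


def denist(collection, known_hashes):
    """Split a collection into files to keep and files removed as known software.

    collection: dict mapping file name -> file contents (bytes)
    known_hashes: set of hash values taken from the reference list
    """
    kept, removed = {}, {}
    for name, data in collection.items():
        if sha1_hex(data) in known_hashes:
            removed[name] = data  # signature is on the list: DeNISTed
        else:
            kept[name] = data  # not on the list: stays in the collection
    return kept, removed


# Hypothetical stand-in for the NIST list (the real one holds millions of entries).
known_hashes = {sha1_hex(b"standard install file contents")}

# Hypothetical collected data.
collection = {
    "winword_helper.dll": b"standard install file contents",
    "custom_tool.exe": b"in-house binary that is not on the list",
    "memo.docx": b"user-authored memo",
}

kept, removed = denist(collection, known_hashes)
print(sorted(removed))  # files dropped from the collection
print(sorted(kept))     # files that move on to processing
```

The comparison looks only at hash values, never at file names or extensions. In this sketch the in-house EXE stays in the collection simply because its hash is not in the reference set.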

And there you have it, that’s what DeNISTing means.

Here’s the rub: the NIST list does not contain every single “junk” or system file in the known universe.

Many attorneys and legal review teams expect the DeNIST process to get rid of every EXE and DLL on a hard drive or in a data collection. It doesn’t work that way. That’s the leftover chaff…

So while DeNISTing is a definite time and money saver and an important part of the eDiscovery process, it’s not the “one” process that will knock out all the junk. 

Next week we will discuss “Load Files”…

Peter Coons is Vice President at D4 LLC and has over 15 years of experience in litigation and electronic discovery services.
