Special Guest Blogger Pete Coons, VP of D4.
This is part two of a discussion that attempts to explain some of the often used and often misunderstood eDiscovery terms.
Last week I discussed DeNIST’ing and now we will tackle “Load Files”. I am also going to squeeze in “Processing”.
The Sedona Conference Glossary (great reference document) defines a Load File as:
“Load file: A file that relates to a set of scanned images or electronically processed files, and indicates where individual pages or files belong together as documents, to include attachments, and where each document begins and ends. A load file may also contain data relevant to the individual documents, such as metadata, coded data, text, and the like. Load files must be obtained and provided in prearranged formats to ensure transfer of accurate and usable images and data.”
That pretty much covers it. Now let’s take a step back and attempt to break down exactly where a Load File fits into a typical eDiscovery process.
Let’s say Company XYZ is being sued by a former employee for discrimination. Company XYZ must now identify and preserve documents that may be relevant to the claim. Data is identified and collected by a qualified individual within the organization or by a third-party eDiscovery/forensic service provider. Typically, the collected data is then processed (another confusing term) so it can be placed into a database for review and eventual production to the opposing party.
The Sedona Conference Glossary defines Processing as:
“Image Processing: To capture an image or representation, usually from electronic data in native format, enter it in a computer system and process and manipulate it.”
Processing data usually involves ingesting the file or e-mail into eDiscovery software. The software then catalogs the file and extracts all available text. This text is usually placed into a separate text file that is associated with the native file or e-mail. Processing also extracts various metadata elements from the file and stores that information in a database.
Let’s take a Word document that contains the text “Hello World”. After the file is ingested into the eDiscovery software, a record is created in the database. That record will contain metadata elements about the file, such as Author, Date Created, Date Modified, and Date Last Printed.
As stated previously, the software will also create and store an accompanying text file that will have the text “Hello World” in addition to saving the original native file.
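To make the “Hello World” example concrete, here is a minimal Python sketch of what a processing step produces. It is only an illustration and not how any particular eDiscovery product works: the `ProcessedRecord` class, the `process_file` function, and the `DOC0000001` ID are all hypothetical names, the metadata is passed in by hand rather than read from the file, and a plain text file stands in for the Word document so no extraction library is needed.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ProcessedRecord:
    """One database-style record per processed file (hypothetical layout)."""
    doc_id: str        # ID assigned during processing
    native_path: str   # where the original native file is kept
    text_path: str     # where the extracted text file is kept
    metadata: dict     # e.g. Author, Date Created, Date Modified


def process_file(native: Path, doc_id: str, metadata: dict, out_dir: Path) -> ProcessedRecord:
    """Save the file's text to a separate text file and return its record."""
    # Assumption: the native file is already plain text. Real software would
    # use a format-specific extractor for Word documents, e-mail, and so on.
    extracted = native.read_text(encoding="utf-8")
    text_file = out_dir / f"{doc_id}.txt"
    text_file.write_text(extracted, encoding="utf-8")
    return ProcessedRecord(doc_id, str(native), str(text_file), dict(metadata))


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    native = work / "hello.txt"
    native.write_text("Hello World", encoding="utf-8")

    # Metadata supplied by hand here; real software reads it from the file.
    record = process_file(native, "DOC0000001", {"Author": "Jane Doe"}, work)
    extracted_text = Path(record.text_path).read_text(encoding="utf-8")

print(record.doc_id, record.metadata, extracted_text)
```

The point of the sketch is the three pieces that come out of processing: the untouched native file, a separate text file, and a record that ties them together with the metadata.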
We know that no case involves just one document, so let’s pretend we have 10,000 Word documents and 10,000 e-mails. The volume really doesn’t matter because the process is basically the same.
This is a very simple explanation of processing, and there are other steps that occur, like indexing or tiffing (imaging), but for all intents and purposes our data is now processed and can be prepared for loading into a review database. This is where the Load File comes into play.
We have to get the data OUT so we can put it back IN somewhere.
Load files are usually simple text files (some are a bit more complex), meaning they can be opened and viewed with Notepad or WordPad in Windows. To the uninitiated they may look daunting, but after seeing a few they begin to look the same.
Think of a load file as a transport file that is used to facilitate the transfer of data and its associated metadata from one database to another. We have to LOAD the data into one database from another. The load file contains information about each and every record (file) that was processed.
That information can include the original file name, the author, date created, beginning ID number, the number of attachments that exist in an e-mail, the parent ID of the attachment, etc. There are potentially dozens of metadata objects that can be provided in a load file (usually agreed upon by both parties prior to its creation). The load file will also contain a link to the native file and the accompanying extracted text file.
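For readers who have never opened one, the Python sketch below builds a tiny load file for an e-mail and its one attachment. Everything specific in it is an assumption made for illustration: the field names (`BegDoc`, `ParentID`, `AttachCount`, and so on), the pipe delimiter, the IDs, and the folder paths are all made up, and real load file formats use their own delimiters and field lists as agreed between the parties.

```python
import csv
import io

# Hypothetical field list; the actual fields are agreed upon by the parties.
FIELDS = ["BegDoc", "ParentID", "AttachCount", "FileName",
          "Author", "DateCreated", "NativeLink", "TextLink"]

# One e-mail and its single attachment (made-up sample data).
records = [
    {"BegDoc": "DOC0000001", "ParentID": "", "AttachCount": 1,
     "FileName": "status.msg", "Author": "Jane Doe", "DateCreated": "2011-03-01",
     "NativeLink": "NATIVES/DOC0000001.msg", "TextLink": "TEXT/DOC0000001.txt"},
    {"BegDoc": "DOC0000002", "ParentID": "DOC0000001", "AttachCount": 0,
     "FileName": "budget.docx", "Author": "Jane Doe", "DateCreated": "2011-02-27",
     "NativeLink": "NATIVES/DOC0000002.docx", "TextLink": "TEXT/DOC0000002.txt"},
]


def build_load_file(rows, fields=FIELDS, delimiter="|"):
    """Return the load file as text: a header row, then one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


load_file_text = build_load_file(records)
print(load_file_text)
```

Each row describes one record, the `ParentID` column ties the attachment back to its e-mail, and the last two columns are the links to the native file and the extracted text file.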
The load file along with the native file (or tiff) and its extracted text is the complete package for loading into another database.
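The “back IN” half of the trip can be sketched the same way. The Python example below reads a small pipe-delimited load file and loads it into an in-memory SQLite table standing in for the review database. The sample rows, the column names, the `documents` table, and the `load_into_database` function are all hypothetical, and a real review platform has its own import tools and field mappings.

```python
import csv
import io
import sqlite3

# A made-up, pipe-delimited load file: a header row, then one row per record.
LOAD_FILE = (
    "BegDoc|ParentID|FileName|Author|NativeLink|TextLink\n"
    "DOC0000001||status.msg|Jane Doe|NATIVES/DOC0000001.msg|TEXT/DOC0000001.txt\n"
    "DOC0000002|DOC0000001|budget.docx|Jane Doe|NATIVES/DOC0000002.docx|TEXT/DOC0000002.txt\n"
)


def load_into_database(load_file_text, conn):
    """Create a documents table and insert one row per load file record."""
    conn.execute(
        "CREATE TABLE documents ("
        "beg_doc TEXT PRIMARY KEY, parent_id TEXT, file_name TEXT, "
        "author TEXT, native_link TEXT, text_link TEXT)"
    )
    reader = csv.DictReader(io.StringIO(load_file_text), delimiter="|")
    rows = [
        # An empty ParentID means the record is not an attachment.
        (r["BegDoc"], r["ParentID"] or None, r["FileName"],
         r["Author"], r["NativeLink"], r["TextLink"])
        for r in reader
    ]
    conn.executemany("INSERT INTO documents VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_into_database(LOAD_FILE, conn)

# Find the attachments that belong to the first e-mail.
attachments = conn.execute(
    "SELECT beg_doc FROM documents WHERE parent_id = ?", ("DOC0000001",)
).fetchall()
print(loaded, attachments)
```

Note that the database only stores the links. The native files and text files themselves still have to travel alongside the load file, which is why the three together make up the complete package.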
And that’s processing and load files!