Why Deviate from Native Files in a Case Management Order?

There are Case Management Orders that show the parties spent a lot of time considering eDiscovery issues. There are ones that show a lack of thought. There are ones that are mixed.

This one shows a lot of forethought, but I am puzzled by the form of production.

Technology Assisted Review is Good for You and Me

There is nothing magical about using Technology Assisted Review. There is also no rule requiring specific technology to find responsive electronically stored information. The issue is always one of whether a production was adequate.

The Case Management Order in Green v. Am. Modern Home Ins. Co. states the following on Technology Assisted Review:

  1. Technology Assisted Review in Lieu of Search Terms. In lieu of identifying responsive ESI using the search terms and custodians/electronic systems as described in Sections II.C & II.D above, a party may use a technology assisted review platform to identify potentially relevant documents and ESI.

Green v. Am. Modern Home Ins. Co., 2014 U.S. Dist. LEXIS 165956, at *4 (W.D. Ark. Nov. 24, 2014).

I would argue such a decree in a Case Management Order is unnecessary under the Federal Rules of Civil Procedure and case law, but such a specific order should preemptively end any question about whether predictive coding, data analytics, “find similar,” conceptual search, and any other available search technology can be used in the case.

The Form of Production

I am not a fan of converting native files to TIFFs and running them through OCR, absent the need to redact confidential or privileged information. That is exactly what this order prescribed, except for spreadsheets:

  1. Format. All ESI, other than databases or spreadsheets, shall be produced in a single- or multi-page 300 dpi TIFF image with a Concordance DAT file with standard delimiters and OPT file for image loading. The documents shall also be processed through Optical Character Recognition (OCR) Software with OCR text files provided along with the production. Extracted Text shall be provided for all documents unless it cannot be obtained. To the extent a document is redacted, OCR text files for such document shall not contain text for the redacted portions of the document. Each TIFF image will be assigned a Bates number that: (1) is unique across the entire document production; (2) maintains a constant length across the entire production padded to the same number of characters; (3) contains no special characters or embedded spaces; and (4) is sequential within a given document. If a Bates number or set of Bates numbers is skipped in a production, the Producing Party will so note in a cover letter or production log accompanying the production. Each TIFF image file shall be named with the Bates Number corresponding to the number assigned to the document page contained in that image. In the event a party determines that it is unable to produce in the format specified in this section without incurring unreasonable expense, the parties shall meet and confer to agree upon an alternative format for production.
  1. Metadata. To the extent that any of the following metadata fields associated with all applicable documents are available, the Producing Party will produce those metadata fields to the Requesting Party: file name, file size, author, application date created, file system date created, application date last modified, file system date last modified, date last saved, original file path, subject line, date sent, time sent, sender/author, recipient(s), copyee(s), and blind copyee(s). For emails with attachments, the Producing Party will indicate when a parent-child relationship between the message and the attachment exists. A Producing Party shall also produce a load file with each production with the following fields: Starting Bates; Ending Bates; Begin Attach; End Attach; and Source (custodian/location from which document was collected). If any metadata described in this section does not exist, is not reasonably accessible, is not reasonably available, or would be unduly burdensome to collect or provide, nothing in this ESI Order shall require any party to extract, capture, collect or produce such metadata.

Green, at *4-7.
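For readers who handle productions, the four Bates numbering requirements quoted above are mechanical enough to check with a short script. What follows is only a sketch under stated assumptions: the function name check_bates, the input layout (one list of page-level Bates numbers per document), and the "letters followed by zero-padded digits" pattern are all my own illustration and are not drawn from the order or from any review platform.

```python
import re

# Assumed format for illustration only: a letter prefix followed by digits,
# e.g. "ABC000123". The order itself does not specify a prefix or a length.
BATES_PATTERN = re.compile(r"([A-Za-z]+)([0-9]+)")


def check_bates(documents):
    """Return a list of problems found in the Bates numbers.

    `documents` is a list of documents; each document is a list of the
    Bates numbers assigned to its pages, in page order.
    """
    problems = []
    seen = set()      # every Bates number encountered so far
    lengths = set()   # distinct string lengths encountered so far

    for doc_index, pages in enumerate(documents):
        previous = None
        for bates in pages:
            # (3) no special characters or embedded spaces
            match = BATES_PATTERN.fullmatch(bates)
            if match is None:
                problems.append(
                    f"doc {doc_index}: {bates!r} has special characters or spaces"
                )
                previous = None
                continue

            # (1) unique across the entire production
            if bates in seen:
                problems.append(f"doc {doc_index}: {bates!r} is duplicated")
            seen.add(bates)

            # (2) constant length across the production
            lengths.add(len(bates))

            # (4) sequential within a given document
            current = (match.group(1), int(match.group(2)))
            if previous is not None and (
                current[0] != previous[0] or current[1] != previous[1] + 1
            ):
                problems.append(f"doc {doc_index}: {bates!r} is not sequential")
            previous = current

    if len(lengths) > 1:
        problems.append(f"inconsistent Bates lengths: {sorted(lengths)}")
    return problems
```

Calling check_bates([["ABC000001", "ABC000002"], ["ABC000003"]]) returns an empty list, while a gap inside a document, an embedded space, a repeated number, or a shorter number each add an entry. Gaps between documents are deliberately not flagged, since the order only asks that skipped numbers be noted in a cover letter or production log.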

The order does include extracted text, but why go to the trouble of requiring production as TIFFs in the first place? The statement about OCR could be misconstrued as requiring OCR of the TIFFs even when searchable text is already available in the form of extracted text, which would make the OCR both redundant and an added cost. The only reason to OCR a TIFF is that it has been redacted, because producing the extracted text would inadvertently produce the redacted content.

Most review applications today do a great job of ingesting native files and allowing users to review in near-native. If the native file needs to be accessed, most applications allow for reviewing the native within the review application or a copy downloaded for review in the native application.

Requiring conversion to static images is not the default under Federal Rule of Civil Procedure 34. I do not recommend requiring conversion to TIFF for production unless a substantial number of redactions must take place.

There are many types of metadata, from embedded, to substantive, to system. The above order treats metadata as if it were objective coding, seeking specific fields of information. While all of that information is useful, I would encourage parties to think more in terms of types of metadata, in addition to how the information should appear in a review application.

Spreadsheets in Native File Format

The order stated the following on spreadsheets:

  1. Spreadsheets. Absent special circumstances, Excel files, .csv files and other similar spreadsheet files will be produced in native format (“Native Files”). Native Files will be provided in a self-identified “Natives” directory. Each Native File will be produced with a corresponding single-page TIFF placeholder image, which will contain language indicating that the document is being produced as a Native File. Native Files will be named with the beginning Bates number that is assigned to that specific record in the production. A “NativeLink” entry for each spreadsheet will be included in the .DAT load file indicating the relative file path to each native file on the Production Media. Native Files will be produced with extracted text and applicable metadata fields if possible and consistent with Section III.A.2 above. For documents that contain redacted text, the parties may either apply the redactions directly on the native file itself or produce TIFF image files with burned-in redactions in lieu of a Native File and TIFF placeholder image. Each Producing Party will make reasonable efforts to ensure that Native Files, prior to conversion to TIFF, reveal hidden data from redacted Native Files that are produced as TIFF image files and will be formatted so as to be readable. (For example, column widths should be formatted so that numbers do not appear as “#########”.) Under these circumstances, all single-page TIFF images shall include row and column headings.

Green, at *8-9.
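To make the renaming and "NativeLink" requirements concrete, here is a minimal sketch of how one load file row for a natively produced spreadsheet might be built. The assumptions are mine, not the order's: the delimiters are the commonly used Concordance defaults (ASCII 20 between fields and the þ character around each value), the column order is Starting Bates, Ending Bates, NativeLink, and the helper names dat_row and native_record are hypothetical.

```python
from pathlib import PureWindowsPath

# Commonly used Concordance defaults (an assumption; the order only says
# "standard delimiters"): ASCII 20 separates fields, þ wraps each value.
FIELD_SEP = "\x14"
QUOTE = "\u00fe"


def dat_row(values):
    """Join values into one DAT line using the assumed delimiters."""
    return FIELD_SEP.join(f"{QUOTE}{value}{QUOTE}" for value in values)


def native_record(beg_bates, end_bates, original_name):
    """Build the DAT row for a spreadsheet produced as a Native File.

    The native is renamed to its beginning Bates number, keeps its original
    extension, and is placed in a "Natives" directory, as the order describes.
    """
    extension = PureWindowsPath(original_name).suffix
    native_link = str(PureWindowsPath("Natives") / f"{beg_bates}{extension}")
    return dat_row([beg_bates, end_bates, native_link])


# Hypothetical header row matching the columns used above.
HEADER = dat_row(["Starting Bates", "Ending Bates", "NativeLink"])
```

For a file originally named "Q3 budget.xlsx" with the single Bates number ABC000010, native_record("ABC000010", "ABC000010", "Q3 budget.xlsx") yields a row whose NativeLink value is the relative path Natives\ABC000010.xlsx. The original file name is lost in the rename, which is one reason the metadata paragraph of the order separately asks for the file name field.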

I am glad the default for spreadsheets did not deviate from Rule 34. I am curious whether any of my case manager friends would agree with the order requiring TIFF placeholders and renaming the native files.

The past year has seen parties become more detailed in their case management orders regarding electronically stored information. This is a good thing. However, I strongly encourage parties not to deviate from the Federal Rules of Civil Procedure without reason, to leverage the search abilities of their review applications, and to make sure the case management order helps the case comply with Federal Rule of Civil Procedure 1.

The Empire State Strikes Back (On the Form of Production)

In an insurance dispute over coverage, a Plaintiff sought production of electronically stored information in native file AND TIFF format after the Defendant produced discovery in hard-copy format. The Defendant opposed re-producing in native file format and sought cost-shifting if required to produce natively. Mancino v Fingar Ins. Agency, 2014 N.Y. Misc. LEXIS 30 (N.Y. Misc. 2014).

New York law allows the “full disclosure of all matter material and necessary” in a lawsuit. Mancino, at *3 citing CPLR §3101(a).

The Plaintiff sought the ESI in native file format with TIFF images in order to view objective metadata on a key file, including the author(s), dates of creation, and dates of edits, to learn whether an “Activity Report” was changed after its initial creation or after the start of the lawsuit. Mancino, at *7.

The Defendant countered that issues of metadata were “not involved” in the lawsuit and such a production was unnecessary. Id. The Defendant further argued that the Plaintiff should bear the $3,500 native production costs and that the TIFFing would be a “laborious task.” Mancino, at *8.

Judge Rakower quickly listed the Zubulake cost-shifting factors (cited in U.S. Bank Nat. Ass’n v. GreenPoint Mortgage Funding, Inc., 94 A.D. 3d 58, 63-64 [1st Dept 2012]) and held that cost-shifting was not justified and that the producing party was to pay its own production costs. The Court clearly ordered the production of the ESI in both native file format and TIFF. Mancino, at *8-9.

Bow Tie Thoughts

State court litigation is often overlooked by eDiscovery commentators. Mancino is a very good reminder that over 90% of litigation in this country is in state court and involves regular people. The Plaintiffs in this case had their home burglarized, and the resulting litigation was over coverage to recover stolen property. The key discovery focused on who changed what, and when, in an insurance document. Few examples better highlight the need for metadata.

One big difference between this case and Federal Court is that in Federal Court a producing party need only produce in one form. A producing party would have to produce in native file format or as TIFFs with metadata, not both. That being said, a production cost of $3,500 on a case of this size might be on the high side (it is unclear how many computers were at issue, the number of hours spent, the cost of production media, etc.). Moreover, most processing software could do such a production with a few keystrokes (and I would bet at a lower cost than argued to the Court, depending on the volume of data to be collected pertaining to one insured party and other relevant files). There are of course other factors that could drive up costs, but I would need more information to understand why there was a $3,500 production cost estimate for the specific discovery sought.