What Did We Learn About eDiscovery in 2014?

As 2014 draws to a close, it is time to reflect on the cases from this year in eDiscovery. One of the biggest trends I took away from case law in 2014 is that more Judges have a greater understanding of eDiscovery, resulting in practical opinions.

Here are the practice areas I found to be the most interesting in 2014, which can be heard in full on my 2014 eDiscovery Year in Review on iTunes or Buzzsprout (Presented by Paragon):

Application of Proportionality Analysis

Judges Questioning Why The Court Was Asked Permission to Use Predictive Coding

We still have Form of Production issues eight years after the 2006 eDiscovery Amendments to the Federal Rules of Civil Procedure

The Importance of Documenting Services for Taxation of Costs

What will 2015 hold for us in the world of electronic discovery? I think we will see proportionality analysis focus on the value of the information sought in relation to the case and not solely on the cost of the discovery. Parties will have to explain how the information is useful, such as how it relates to a claim, as opposed to merely saying, “It is expensive.” This will require counsel to focus on the merits of the case and how the requested discovery will help advance the litigation.

I personally hope litigants stop asking Judges for permission to use predictive coding. No one asks, “Can I de-dup the data? Is it ok to use clustering? May I please use conceptual search in addition to keywords?”

The issue with all productions is whether or not the production is adequate. In my view, parties going to war over predictive coding as a means to review electronically stored information are asking the Court to issue an advisory opinion. The time to fight is when there actually is a dispute because a production is lacking, instead of engaging in arguments over how much a human being can read in an hour compared to a computer algorithm.

To learn more on the issues from the past year, please check out my 2014 eDiscovery Year in Review audio podcast on iTunes or Buzzsprout.

I want to thank Paragon for sponsoring the 2014 eDiscovery Year in Review. Please check out their website and recent blog post on the Convergence of eDiscovery and Information Security to learn more about their services.

Why Deviate from Native Files in a Case Management Order?

There are Case Management Orders that show parties spent a lot of time considering eDiscovery issues. There are ones that show a lack of thought. There are ones that are mixed.

This one shows a lot of forethought, but I am puzzled by the form of production.

Technology Assisted Review is Good for You and Me

There is nothing magical about using Technology Assisted Review. There is also no rule requiring specific technology to find responsive electronically stored information. The issue is always one of whether a production was adequate.

The Case Management Order in Green v. Am. Modern Home Ins. Co. states the following on Technology Assisted Review:

  1. Technology Assisted Review in Lieu of Search Terms. In lieu of identifying responsive ESI using the search terms and custodians/electronic systems as described in Sections II.C & II.D above, a party may use a technology assisted review platform to identify potentially relevant documents and ESI.

Green v. Am. Modern Home Ins. Co., 2014 U.S. Dist. LEXIS 165956, 4 (W.D. Ark. Nov. 24, 2014).

I would argue such a decree in a Case Management Order is unnecessary under the Federal Rules of Civil Procedure and case law, but such a specific order should preemptively end any question on whether predictive coding, data analytics, “find similar,” conceptual search, and any other available search technology can be used in the case.

The Form of Production

I am not a fan of converting native files to TIFFs and running OCR on them, absent the need to redact confidential or privileged information. That is exactly what this order prescribed, except for spreadsheets:

  1. Format. All ESI, other than databases or spreadsheets, shall be produced in a single- or multi-page 300 dpi TIFF image with a Concordance DAT file with standard delimiters and OPT file for image loading. The documents shall also be processed through Optical Character Recognition (OCR) Software with OCR text files provided along with the production. Extracted Text shall be provided for all documents unless it cannot be obtained. To the extent a document is redacted, OCR text files for such document shall not contain text for the redacted portions of the document. Each TIFF image will be assigned a Bates number that: (1) is unique across the entire document production; (2) maintains a constant length across the entire production padded to the same number of characters; (3) contains no special characters or embedded spaces; and (4) is sequential within a given document. If a Bates number or set of Bates numbers is skipped in a production, the Producing Party will so note in a cover letter or production log accompanying the production. Each TIFF image file shall be named with the Bates Number corresponding to the number assigned to the document page contained in that image. In the event a party determines that it is unable to produce in the format specified in this section without incurring unreasonable expense, the parties shall meet and confer to agree upon an alternative format for production.
  1. Metadata. To the extent that any of the following metadata fields associated with all applicable documents are available, the Producing Party will produce those metadata fields to the Requesting Party: file name, file size, author, application date created, file system date created, application date last modified, file system date last modified, date last saved, original file path, subject line, date sent, time sent, sender/author, recipient(s), copyee(s), and blind copyee(s). For emails with attachments, the Producing Party will indicate when a parent-child relationship between the message and the attachment exists. A Producing Party shall also produce a load file with each production with the following fields: Starting Bates; Ending Bates; Begin Attach; End Attach; and Source (custodian/location from which document was collected). If any metadata described in this section does not exist, is not reasonably accessible, is not reasonably available, or would be unduly burdensome to collect or provide, nothing in this ESI Order shall require any party to extract, capture, collect or produce such metadata.

Green, at *4-7.
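The four Bates numbering requirements in the Format paragraph are concrete enough to check mechanically before a production goes out the door. The following is only a minimal sketch in Python: the function name `check_bates`, the input shape (a mapping from a document ID to its ordered page numbers), and the reading of "no special characters" as "letters and digits only" are all assumptions made for illustration, not anything the order specifies.

```python
import re
from collections import Counter


def _numeric_suffix(bates):
    """Return the trailing digits of a Bates number as an int, or None."""
    match = re.search(r"\d+$", bates)
    return int(match.group()) if match else None


def check_bates(pages_by_doc):
    """Check Bates numbers against the four criteria quoted in the order.

    pages_by_doc is a hypothetical structure: document ID -> ordered list
    of the Bates numbers assigned to that document's pages.
    """
    problems = []
    all_numbers = [b for pages in pages_by_doc.values() for b in pages]

    # (1) unique across the entire production
    for bates, count in Counter(all_numbers).items():
        if count > 1:
            problems.append(f"duplicate: {bates}")

    # (2) constant length across the entire production
    if len({len(b) for b in all_numbers}) > 1:
        problems.append("inconsistent length")

    # (3) no special characters or embedded spaces
    #     (assumed here to mean letters and digits only)
    for bates in all_numbers:
        if not re.fullmatch(r"[A-Za-z0-9]+", bates):
            problems.append(f"bad characters: {bates!r}")

    # (4) sequential within a given document
    for doc_id, pages in pages_by_doc.items():
        nums = [_numeric_suffix(b) for b in pages]
        if None in nums or any(b - a != 1 for a, b in zip(nums, nums[1:])):
            problems.append(f"not sequential in {doc_id}")

    return problems


production = {
    "DOC1": ["ABC000001", "ABC000002"],
    "DOC2": ["ABC000003"],
}
print(check_bates(production))  # an empty list means no problems were found
```

A check like this only covers the numbering itself. Skipped numbers between documents are allowed by the order as long as they are noted in a cover letter or production log, so the sketch deliberately does not flag gaps across documents.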

The order does include extracted text, but why go to the trouble of requiring production as TIFFs in the first place? The statement about OCR could be misconstrued as requiring OCR of the TIFFs when the searchable text is already available in the form of extracted text, which would make the OCR redundant and add cost. The only reason to OCR a TIFF is that it has been redacted, because producing the extracted text would inadvertently produce the redacted content.

Most review applications today do a great job of ingesting native files and allowing users to review in near-native. If the native file needs to be accessed, most applications allow for reviewing the native within the review application or a copy downloaded for review in the native application.

Requiring conversion to static images is not the default of Federal Rule of Civil Procedure Rule 34. I do not recommend requiring conversion to TIFF for production, unless there is a substantial number of redactions that must be made.

There are many types of metadata, from embedded, to substantive, to system. The above order treats metadata as if it were objective coding, seeking specific fields. While all of that information is useful, I would encourage parties to think more in terms of types of metadata, in addition to how the information should appear in a review application.

Spreadsheets in Native File Format

The order stated the following on spreadsheets:

  1. Spreadsheets. Absent special circumstances, Excel files, .csv files and other similar spreadsheet files will be produced in native format (“Native Files”). Native Files will be provided in a self-identified “Natives” directory. Each Native File will be produced with a corresponding single-page TIFF placeholder image, which will contain language indicating that the document is being produced as a Native File. Native Files will be named with the beginning Bates number that is assigned to that specific record in the production. A “NativeLink” entry for each spreadsheet will be included in the .DAT load file indicating the relative file path to each native file on the Production Media. Native Files will be produced with extracted text and applicable metadata fields if possible and consistent with Section III.A.2 above. For documents that contain redacted text, the parties may either apply the redactions directly on the native file itself or produce TIFF image files with burned-in redactions in lieu of a Native File and TIFF placeholder image. Each Producing Party will make reasonable efforts to ensure that Native Files, prior to conversion to TIFF, reveal hidden data from redacted Native Files that are produced as TIFF image files and will be formatted so as to be readable. (For example, column widths should be formatted so that numbers do not appear as “#########”.) Under these circumstances, all single-page TIFF images shall include row and column headings.

Green, at *8-9.
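To make the spreadsheet paragraph concrete, here is a rough sketch of what renaming a native file to its beginning Bates number and writing a “NativeLink” load file entry could look like. It is an illustration under stated assumptions, not the parties' actual workflow: the helper names, the three-column layout, and the sample file name are hypothetical, and the delimiter values (ASCII 20 between fields, ASCII 254 around each value) are the defaults commonly associated with Concordance DAT files, while the order itself only says “standard delimiters.”

```python
from pathlib import PurePosixPath

# Assumed Concordance-style defaults; the order does not spell these out.
FIELD_SEP = "\x14"  # ASCII 20, separates fields
QUOTE = "\xfe"      # ASCII 254 ("þ"), wraps each field value


def native_link(begin_bates, original_name):
    """Relative path for a native file renamed to its beginning Bates number."""
    extension = PurePosixPath(original_name).suffix  # keep the original extension
    return str(PurePosixPath("Natives") / f"{begin_bates}{extension}")


def dat_row(values):
    """Join field values into one delimited load file line."""
    return FIELD_SEP.join(f"{QUOTE}{value}{QUOTE}" for value in values)


# Hypothetical record: a one-page TIFF placeholder, so begin and end Bates match.
link = native_link("ABC000123", "Q3 forecast.xlsx")
row = dat_row(["ABC000123", "ABC000123", link])

print(link)
print(repr(row))
```

The point of the sketch is that the renamed native loses its original file name on disk, which is why the original file name still needs to travel as a metadata field in the load file.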

I am glad the default for spreadsheets did not deviate from Rule 34. I am curious if any of my case manager friends would agree with the order requiring TIFF placeholders and renaming the native files.

The past year has seen parties become more detailed in their case management orders regarding electronically stored information. This is a good thing. However, I strongly encourage parties to not deviate from the Federal Rules of Civil Procedure without reason, leverage the search abilities of their review applications, and make sure the case management order helps the case comply with Federal Rule of Civil Procedure Rule 1.

Even a Judge Questioned Why Parties Ask for Permission to Use Predictive Coding

I do not normally want to high five Federal judges, but Judge Ronald Buch, a Tax Judge in Texas, sure deserved one after his Dynamo Holdings opinion.

The discovery dispute can be summed up as a battle over backup tapes that had confidential information. The Requesting Party wanted the tapes; the Producing Party wanted to use predictive coding to produce what was relevant, because reviewing the material for privilege and relevancy would cost $450,000 with manual review. Dynamo Holdings v. Comm’r, 2014 U.S. Tax Ct. LEXIS 40 (Docket Nos. 2685-11, 8393-12. Filed September 17, 2014.)

The Requesting Party wanted the backup tapes to analyze metadata on when ESI was created. Moreover, the Requesting Party called “Predictive Coding” an “unproven technology.” The Requesting Party attempted to address the Producing Party’s cost concern with a clawback agreement. Dynamo Holdings, at *3.

After an evidentiary hearing with experts on the use of predictive coding, the Court granted the Producing Party’s motion to use predictive coding. Judge Buch had a “dynamo” quote on the entire issue of asking to use predictive coding:

 “And although it is a proper role of the Court to supervise the discovery process and intervene when it is abused by the parties, the Court is not normally in the business of dictating to parties the process that they should use when responding to discovery. If our focus were on paper discovery, we would not (for example) be dictating to a party the manner in which it should review documents for responsiveness or privilege, such as whether that review should be done by a paralegal, a junior attorney, or a senior attorney. Yet that is, in essence, what the parties are asking the Court to consider–whether document review should be done by humans or with the assistance of computers. Respondent fears an incomplete response to his discovery. If respondent believes that the ultimate discovery response is incomplete and can support that belief, he can file another motion to compel at that time. Nonetheless, because we have not previously addressed the issue of computer-assisted review tools, we will address it here.”

Dynamo Holdings, at *10-11.

It is so refreshing to see a Judge address the issue of requesting to use a specific technology. No one brings a motion asking for permission on which lawyers should do document review. Moreover, no moving party asks permission to use visual analytics, de-duplication, or any of the other outstanding technology available to conduct eDiscovery.

The opinion ends by noting that if the Requesting Party believed the discovery response was incomplete, then a motion to compel could be filed, which is exactly the way the process should work. The issue should not be “can we use this technology,” but whether the production is adequate or not, which requires evidence of production gaps or other evidence that not all responsive information was produced.

Well done Judge Buch.

 

Nebraska, Where Proportionality is Alive and Well in Discovery

One lesson from United States v. Univ. of Neb. at Kearney is that maybe you should take depositions of key parties and use interrogatories to find out information relevant to your case before asking for over 40,000 records that contain the personal information of third parties unrelated to the lawsuit.

The case is a Fair Housing Act suit involving claims that students were prohibited or hindered from having “emotional assistance animals in university housing when such animals were needed to accommodate the requesting students’ mental disabilities.” United States v. Univ. of Neb. at Kearney, 2014 U.S. Dist. LEXIS 118073, 2 (D. Neb. Aug. 25, 2014).

A protracted battle over the scope of discovery broke out between the parties. The Defendants argued the search, retrieval, and review for responsive discovery was too expansive and would have been unduly burdensome. Kearney, at *5-6. As the Government’s search requests included “document* w/25 policy,” you can see the Defendants’ point on having broad hits to search terms. Kearney, at *20.

The Government’s revised search terms would have 51,131 record hits, which would have cost $155,574 for the Defendants to retrieve, review, and produce the responsive ESI. Kearney, at *5-6. This would have been on top of the $122,006 already spent for processing the Government’s requests for production. Kearney, at *7.

The Court noted that the Government’s search terms would have required production of ESI for every person with a disability, whether they were students or contractors. Kearney, at *6-7. The Government argued the information was necessary, and justified, in order to show discriminatory intent by the Defendants. Id.

The Defendants wanted the scope of the discovery requests narrowed to the “housing” or “residential” content, which would have resulted in 10,997 responsive records. Kearney, at *7.
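The numbers above lend themselves to a quick back-of-the-envelope comparison. The sketch below uses only the figures reported from the opinion, with one loudly labeled assumption: the estimate for the narrowed set supposes that cost scales linearly with the number of records, which the opinion does not say. The variable names are illustrative.

```python
# Figures reported from the opinion
proposed_hits = 51_131     # records hit by the Government's revised terms
proposed_cost = 155_574    # dollars to retrieve, review, and produce them
already_spent = 122_006    # dollars already spent processing earlier requests
narrowed_hits = 10_997     # records under the Defendants' narrowed scope

# Derived figures
cost_per_record = proposed_cost / proposed_hits
total_if_granted = already_spent + proposed_cost

# ASSUMPTION (not from the opinion): the same per-record cost applies
# to the narrowed set of records.
narrowed_estimate = narrowed_hits * cost_per_record

print(f"Cost per record under the revised terms: ${cost_per_record:,.2f}")
print(f"Total discovery spend if granted: ${total_if_granted:,}")
print(f"Rough estimate for the narrowed scope: ${narrowed_estimate:,.0f}")
```

Even on that simplified assumption, the narrowed request costs a fraction of the broader one, which is the kind of comparison a proportionality argument needs alongside an explanation of what the extra records would add to the case.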

The Government did not want to limit the scope of discovery and recommended producing all the ESI subject to a clawback agreement [notice, not a protective order] for the Government to search the ESI. The Defendants argued such an agreement would violate the Family Educational Rights and Privacy Act by disclosing students’ personally identifiable information without their notice and consent. Kearney, at *8.

Motion practice followed with the Defendants requesting cost shifting to the Government for conducting searches, the use of predictive coding software, and review hosting fees. Kearney, at *8-9.

The Court ordered the parties to answer specific discovery questions, which the Government did not answer, on “information comparing the cost of its proposed document retrieval method and amount at issue in the case, any cost/benefit analysis of the discovery methods proposed, or a statement of who should bear those costs.” Kearney, at *9.

The Court was not keen on the Government outright searching the personal data of others unrelated to the case. As the Court stated:

The public and the university’s student population may be understandably reluctant to request accommodations or voice their concerns about disparate or discriminatory treatment if, by doing so, their private files can be scoured through by the federal government for a wholly unrelated case. The government’s reach cannot extend that far under the auspices of civil discovery; at least not without first affording all nonparties impacted with an opportunity to consent or object to disclosure of information from or related to their files.

Kearney, at *18-19.

The Court stated it would not order the production of over 51,000 files with a clawback order. Moreover, the cost to review all of the ESI exceeded the value of the request. Kearney, at *19.

The Court did not accept the Government’s claim that it needed to conduct an expansive search. Kearney, at *19-20. The Court stated the following on the fundamentals of civil discovery:

Searching for ESI is only one discovery tool. It should not be deemed a replacement for interrogatories, production requests, requests for admissions and depositions, and it should not be ordered solely as a method to confirm the opposing party’s discovery is complete. For example, the government proposes search terms such as “document* w/25 policy.” The broadly used words “document” and “policy” will no doubt retrieve documents the government wants to see, along with thousands of documents that have no bearing on this case. And to what end? Through other discovery means, the government has already received copies of UNK’s policies for the claims at issue.

Kearney, at *20.

The Court further stated that “absent any evidence that the defendants hid or destroyed discovery and cannot be trusted to comply with written discovery requests, the court is convinced ESI is neither the only nor the best and most economical discovery method,” and that depositions “should suffice” with “far less cost and delay.” Kearney, at *21.

Bow Tie Thoughts

This case has significant privacy interests, but at its core the issue is one of proportionality. What was the cost of discovery and its benefit? In the end, the cost of expansive search terms that impacted the rights of third parties outweighed the benefit of the discovery to the case.

The fact we have amazing search technology that can search electronic information does not mean we can forget how to litigate. The use of “search terms” cannot swallow the actual claims of a case.

It is heartening to see a Court say no to the data of unrelated third parties being enveloped into a discovery production. While there are many ways to show discrimination, requesting the electronically stored information, protected by Federal and most likely state law, of third parties should give any Court pause.

Predictive coding to focus the scope of discovery, visual analytics to identify relevant information, and clustering to organize similar information are fantastic technologies to expedite review. However, the fact that the technology exists does not change that lawyers still have to use requests for admissions and interrogatories, and have requests narrowly tailored for responsive ESI.

 

Does Proportionality Disappear If a Lawyer Says “Predictive Coding” Three Times?

The In re Bridgepoint Education case is not one about the merits of predictive coding, but one of proportionality over expanding the scope of discovery by nine months. In re Bridgepoint Educ., 2014 U.S. Dist. LEXIS 108505, 10-11 (S.D. Cal. Aug. 6, 2014). While the cost of document review and the use of predictive coding have a starring role in the opinion, let’s not forget the second discovery dispute in the case ultimately is about proportionality.

The Defendants claimed that expanding the scope of discovery by nine months would increase their review costs by 26% or $390,000 (based on past review efforts in the case). In re Bridgepoint Educ., at *6-7.

The Plaintiffs countered that the review costs would more likely be $11,279, because of the predictive coding system the Defendants would use instead of manual review. In re Bridgepoint Educ., at *7.

The Defendants countered that “predictive coding” did not make “manual review” for relevance elective, because the predictive coding software assigned a percentage estimate to each record on the record’s probability of being relevant. Id. As such, attorney review is still required for relevance and privilege review. Id.
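The gap between the two cost estimates is easier to appreciate with the arithmetic written out. This is a small sketch using only the dollar figures and percentage quoted above. It assumes the 26% refers to review costs already incurred in the case, which is one reading of “based on past review efforts” and not something the opinion is quoted as stating.

```python
# Figures quoted from the opinion
defendants_estimate = 390_000   # added review cost claimed by the Defendants
increase_fraction = 0.26        # the "26%" increase
plaintiffs_estimate = 11_279    # Plaintiffs' estimate with predictive coding

# ASSUMPTION: the 26% is measured against review costs to date, so the
# implied baseline is the claimed increase divided by that fraction.
implied_baseline = defendants_estimate / increase_fraction

# How many times larger the Defendants' estimate is than the Plaintiffs'
ratio = defendants_estimate / plaintiffs_estimate

print(f"Implied review cost to date: ${implied_baseline:,.0f}")
print(f"Defendants' estimate is about {ratio:.1f}x the Plaintiffs' estimate")
```

The two sides are not arguing over a rounding error. The estimates differ by more than an order of magnitude, largely because one side assumes attorney review is still required after the software scores each record and the other does not.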

The Court denied expanding the scope of discovery by nine months based on the “proportionality” rule of Federal Rule of Civil Procedure Rule 26(b)(2)(C). The Rule states a Court can limit discovery if the “burden or expense of the proposed discovery outweighs the likely benefit.” In re Bridgepoint Educ., at *9-10.

The Court found expanding the scope to be unduly burdensome. Moreover, while there might have been relevant information in the expanded timeframe, the Court agreed with the Defendants that relevant information would be in the originally agreed timeframe. In re Bridgepoint Educ., at *10-11.

Predictive Coding was also at the center of the fourth discovery dispute. The Plaintiffs argued discovery produced from three Individual Defendants should be added to the Defendants’ predictive coding software. In re Bridgepoint Educ., at *12. According to the Plaintiffs, the Defendants used “unilaterally-selected search terms” to identify the original production. Id.

The Defendants argued their review process for the original production was reasonable. Moreover, adding the original production to the predictive coding process could “negatively impact the reliability of the predictive coding process.” In re Bridgepoint Educ., at *12-13. However, the Defendants were willing to run additional searches on the Individual Defendants’ production. In re Bridgepoint Educ., at *13.

The Court noted that the Defendants’ linear search methodology for the three Individual Defendants had been approved by the Court. As such, the Court ordered the parties to meet and confer on additional search terms on the original production for the Individual Defendants. Id.

Bow Tie Thoughts

In re Bridgepoint Education is an interesting spin on predictive coding cases, because effectively the REQUESTING party is arguing for the producing party to use predictive coding to reduce proportionality issues.

First things first: Saying “predictive coding” has no magical properties. Nor will review costs decline by getting the opposing party to say Rumpelstiltskin. Even if an attorney is somehow tricked into saying Mister Mxyzptlk’s name backwards in a hearing, a Court will always be concerned about proportionality before expanding the scope of discovery.

Proportionality will always be concerned with cost of review, but discovery review does not exist independent of the case. The scope of discovery should not be expanded solely because the cost of review can be reduced by leveraging advanced search capabilities. The issue is whether there are relevant records in the expanded universe and if the “burden or expense of the proposed discovery outweighs the likely benefit.”

Finally, just because one search methodology was used to identify records over another does not devalue the responsiveness of the production. If a requesting party is concerned with the adequacy of a production, challenge it accordingly by showing production gaps or other evidence to demonstrate the production is inadequate.

Stuck in the Predictive Coding Pipeline

ExxonMobil Pipeline had a problem in discovery: their discovery responses were overdue. The requests for production were served in November 2013 and, after one extension, were due in January 2014. The Plaintiffs rightly brought a motion to compel.

The Defendants had enough discovery to give most eDiscovery attorneys a migraine with a nosebleed: 16 separate lawsuits, with 165 discovery requests in one case, a total of 392 requests in all the related cases, and 83 custodians with approximately 2.7 million electronic documents. Other discovery going back to 1988 had over 63,000 paper documents that were scanned and to be searched with keywords. Additionally, there were approximately 630,000-800,000 documents that had to be reviewed for responsiveness, confidentiality, and privilege. The Defendants had produced 53,253 documents consisting of over 191,994 pages. United States v. ExxonMobil Pipeline Co., 2014 U.S. Dist. LEXIS 81607, 5-8 (E.D. Ark. June 9, 2014).

The Defendants suggested using predictive coding in light of the large volume of discovery, but the Plaintiff, the United States, did not agree to the use of predictive coding (at least since the filing of the motions). ExxonMobil Pipeline, at *6. Moreover, the parties did not seek relief from the Court on the use of predictive coding, other than to order the parties to meet and confer. ExxonMobil Pipeline, at *6-7.

The Defendants explained that, using traditional review with 50 attorneys, document review could be completed by the end of June 24 and production by the end of August 2014. ExxonMobil Pipeline, at *6.

The United States disagreed with the Defendants’ assumption of lawyers only reviewing 250 documents/files a day. Moreover, the Defendants did not raise concerns about document review when they entered an agreed upon scheduling order in October 2013. ExxonMobil Pipeline, at *6-7.
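The dispute over review speed comes down to simple division, which can be sketched as follows. The document counts, the 50 attorneys, and the 250 documents a day come from the figures above; applying them together to the 630,000-800,000 document pool is an assumption for illustration, and the 500 documents a day figure is purely hypothetical since the opinion is not quoted as giving the United States' preferred rate. The function name is made up for this example.

```python
import math


def review_days(documents, attorneys, docs_per_attorney_per_day):
    """Working days needed for a linear review, rounded up to whole days."""
    daily_capacity = attorneys * docs_per_attorney_per_day
    return math.ceil(documents / daily_capacity)


# Figures from the opinion: 630,000-800,000 documents, 50 attorneys,
# and the Defendants' assumption of 250 documents per attorney per day.
low = review_days(630_000, 50, 250)
high = review_days(800_000, 50, 250)
print(f"At 250 docs/day: {low} to {high} working days")

# HYPOTHETICAL rate, only to show how sensitive the schedule is to it.
print(f"At 500 docs/day: {review_days(630_000, 50, 500)} "
      f"to {review_days(800_000, 50, 500)} working days")
```

Doubling the assumed reading speed halves the schedule, which is exactly why a fight over review rates can swallow a motion without ever reaching whether the production is adequate.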

The Court acknowledged that the Defendants had a large volume of discovery to review. Moreover, it was unclear if the parties had agreed to a review methodology before the Court issued its order. Regardless, the Court ordered the Defendants to complete their review and production by July 10, 2014, absent good cause. ExxonMobil Pipeline, at *7-8.

Bow Tie Thoughts

Most attorneys do not think about document review strategies at the beginning of a case. They should. Discovery is the backbone of civil litigation. Unless you know the information you have to review, strategies to maximize efficiency, and reviewing for claims or defenses, document review can be a nightmare experience.

This case does not go into why the Defendants sought agreement from the Plaintiff on the use of predictive coding. I do not agree with that strategy, unless a specific review protocol was ordered at the Rule 16 conference that the producing party wanted to change.

The issue with a document production is whether or not the production is adequate. Lawyers should agree to the subject matter of the case, custodians, date ranges, and other objective information that goes to the merits of the lawsuit. When lawyers start asking each other for permission on whether they can use predictive coding, visual analytics, clustering, email threading, or any other technology, civil litigation becomes uncivil. Case in point: the Plaintiffs argued the Defendants could review more than 250 documents a day in this case. Such disputes turn into an academic fight over how much lawyers can read and analyze in a 9-hour workday. The end result of such motion practice would be a Judge ordering lawyers to read faster.

My advice is to focus on the merits and not derail the case with a fight over what review technology can be used. Fight over whether the production is adequate, not whether you can use predictive coding.

Guess What? Cooperation Does Not Mean Privilege or Relevancy Are Dead

Here is the big lesson from the latest Biomet opinion over predictive coding:

The Steering Committee wants the whole seed set Biomet used for the algorithm’s initial training. That request reaches well beyond the scope of any permissible discovery by seeking irrelevant or privileged documents used to tell the algorithm what not to find. That the Steering Committee has no right to discover irrelevant or privileged documents seems self-evident.

United States District Court Judge Robert Miller, In re Biomet M2a Magnum Hip Implant Prods. Liab. Litig., 2013 U.S. Dist. LEXIS 172570, at *3 (N.D. Ind. 2013).

One word: Good.

Cooperation does not mean attorney work product is eviscerated when discussing predictive coding. Moreover, if ESI is not relevant, why drive up discovery costs in reviewing it? Furthermore, Federal Rule of Civil Procedure Rule 26(b)(1) does not allow a requesting party to find out how the producing party used ESI before its production. Biomet, at *4.

The opinion goes on to discuss Biomet’s position that it had produced all discoverable documents to the Steering Committee. However, this is where Judge Miller issued a judicial warning: Biomet did not need to identify its seed set, but the “unexplained lack of cooperation in discovery can lead a court to question why the uncooperative party is hiding something, and such questions can affect the exercise of discretion.” Biomet, at *5-6.

The Court held it would not order Biomet to disclose its seed set, but did “urge” Biomet to “re-think its refusal.” Biomet, at *6.

Bow Tie Thoughts

There is no good answer to the issue in this case. Technology issues should be worked out by experts in a non-combative way when it comes to production formats, scope of data, date ranges, custodians and other objective factors in conducting a search. Courts really do not want to get sucked into it. However, one issue since Da Silva Moore v Publicis Groupe & MSL Group is the idea that parties need to have a transparent process that both sides agree to for predictive coding. I do not think the Federal Rules of Civil Procedure require such disclosures at all. Moreover, it intrudes into attorney work product.

What is the answer? I would suggest that a requesting party demonstrate there is a production gap or otherwise show how the production is deficient. This can easily escalate into a quagmire over discovery about discovery. Nobody wins when that happens.

As for a producing party, I would not take a position that could incur the wrath of a Court if the requesting party later demonstrates a production was deficient.