What is De-duplication and why should you care about it?
Sometimes when dealing with electronically stored information, your tech specialist or litigation support vendor might ask if you want de-duplication done on your data.
Say what now? De-who?
Yep, it's another one of those techie words that we throw around. I guess it would be easier to ask if you want to keep ALL the exact copies of data to review. That still might not be much better in explaining it.
De-duplication is the process of identifying duplicate data and removing the excess copies. Many litigation support applications identify duplicates through a document's digital fingerprint, which is a hash value calculated from the contents of the file. Two files with exactly the same contents get the same fingerprint, and any change to the contents changes the fingerprint.
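To make the fingerprint idea concrete, here is a minimal sketch in Python. It assumes SHA-256 as the hash (review platforms commonly use MD5 or SHA-1, and each vendor's exact method may differ), and the `fingerprint` function name and sample text are made up for illustration.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return a hex digest that identifies this exact sequence of bytes."""
    # Assumption: SHA-256 stands in for whatever hash a given tool uses.
    return hashlib.sha256(data).hexdigest()


original = b"Quarterly report, final version."
saved_copy = bytes(original)        # an untouched copy: same bytes
edited = original + b" (revised)"   # any change produces different bytes

print(fingerprint(original) == fingerprint(saved_copy))  # True
print(fingerprint(original) == fingerprint(edited))      # False
```

The untouched copy matches the original, while the edited version does not, which is why only exact copies are treated as duplicates.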
Why should you care? Well, you will care if you are looking through lots of data, because it will save you from reviewing the same documents over and over.
So how are duplicates created?
Let’s say that you send me an email with an attached Microsoft Word document. If I save the attachment, I now have a duplicate of that Word document. Then if I copy it to another folder, I have another duplicate as long as I don’t open the document and make any changes to it.
There are times you might want ALL duplicates and then sometimes you might not.
So what types of de-duplication are there?
- Global De-duplication Exact: All duplicates of a document across the entire data set, no matter whose files they came from, are suppressed (meaning held back and retained in an archive) and only a single copy is submitted for review.
- Global De-duplication Exact & Content: This also catches a document that has the same content but was saved in a different format. For instance, if you converted a Microsoft Word document to an Adobe PDF file, the two would essentially be the SAME data; the only difference is the format.
- Custodian De-duplication Exact: This option removes exact copies within a single custodian's set of data. So, if I had three copies of the same Microsoft Word document, two would be removed and you would only have to review one copy, since the others are exact copies.
- Custodian De-duplication Exact & Content: This is where you also remove duplicates that are in different formats, again within a single custodian's set of data. If I had three copies of the same Microsoft Word document and converted them to PDF format or another version, you would still only have to review one.
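The difference between the global and custodian options above can be sketched in a few lines of Python. This is only an illustration of the two "Exact" options under stated assumptions: the `Doc` record, the `dedupe` function, the custodian names, and the use of SHA-256 are all hypothetical and do not reflect any particular vendor's tool. The "Exact & Content" options would need an extra step that compares extracted text across formats, which is not shown here.

```python
import hashlib
from typing import NamedTuple


class Doc(NamedTuple):
    custodian: str
    name: str
    content: bytes


def fingerprint(data: bytes) -> str:
    # Assumption: SHA-256 stands in for whatever hash a given tool uses.
    return hashlib.sha256(data).hexdigest()


def dedupe(docs, per_custodian=False):
    """Split docs into (review, suppressed) lists by exact fingerprint.

    per_custodian=False compares every document against the whole set.
    per_custodian=True only compares documents held by the same custodian.
    """
    seen = set()
    review, suppressed = [], []
    for doc in docs:
        digest = fingerprint(doc.content)
        key = (doc.custodian, digest) if per_custodian else digest
        if key in seen:
            suppressed.append(doc)  # held back, not deleted
        else:
            seen.add(key)
            review.append(doc)
    return review, suppressed


docs = [
    Doc("alice", "memo.docx", b"memo text"),
    Doc("alice", "memo copy.docx", b"memo text"),
    Doc("bob", "memo.docx", b"memo text"),
    Doc("bob", "notes.docx", b"meeting notes"),
]

global_review, global_suppressed = dedupe(docs)
custodian_review, custodian_suppressed = dedupe(docs, per_custodian=True)

print(len(global_review), len(global_suppressed))        # 2 2
print(len(custodian_review), len(custodian_suppressed))  # 3 1
```

With the global option, the memo is reviewed once no matter who held it. With the custodian option, each custodian keeps one copy, so the same memo shows up once for each person who had it.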
The whole point of this de-duplication process is to save you time in reviewing documents.
I know this process makes some attorneys very nervous because they don't feel as if they have "everything" and fear the "what if" factor of not including the exact copies. We understand the nagging worry that data is missing, but it really isn't missing; only the exact copies have been held back. This would be the same as if you went to court and left a copy of a document on your desk. If it's the exact duplicate of what you have, you are not missing anything.
I have had this issue come up before, when opposing counsel accused our side of not turning over everything because there were Bates numbers missing from the productions. They were convinced that they didn't have "everything".
De-duplication can be tricky but it really doesn't have to be. Just remember that many eDiscovery vendors have applications which can identify these duplicates for you.