Many companies hope to benefit from amassing large amounts of data by mining it for market insights, building internal business models, and supporting strategic, data-driven decisions. But as companies collect and store ever-larger volumes of data, they may unknowingly take on significant legal risks, including potential violations of data privacy laws and increased exposure to U.S. litigation discovery obligations. One way businesses can mitigate these risks is to de-identify the data they collect and store.
Done properly, data de-identification can minimize risks to privacy interests (e.g., in the event of a data breach) and can help ensure that companies comply with many privacy laws. It can also reduce the likelihood that a company will need to collect, review, and produce such data in discovery as part of litigation or a government investigation.
This remains an emerging area, however, with no bright-line rules. Consequently, companies need to understand how de-identification changes data and what those changes may mean for a party's discovery obligations.
This article offers several guiding principles to assist companies in developing data de-identification practices that will reduce the risks associated with large-scale compilations of data.