The title of this article might seem contradictory, but it is not as conflicting as you might think. Sure, we all know that the General Data Protection Regulation (GDPR) requires us to protect personal data, wherever it may be. Production, development, testing, QA, training environments – data is stored everywhere. Most people assume that all of this data needs protection at all times, but that is not actually the case. Test data, for example, doesn’t need to be protected at all, provided it meets certain conditions.
General Data Protection Regulation
The General Data Protection Regulation (GDPR) states that personal data may only be processed for the purpose for which it was collected. So when a customer places an order and fills in their address, that address may only be used for the purpose of delivering the order. If you also want to use the address for commercial purposes, the customer needs to give specific permission for this. Little to no data is ever collected for testing or development purposes, so it cannot be used for those purposes. But does that mean you can’t use production-like data for testing and development? Luckily not!
Personal data versus privacy sensitive data
The GDPR covers personal data. According to the definitions in Article 4, personal data is “information relating to an identified or identifiable natural person. An identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.” But what if we could use this ‘personal’ data in a way that makes it no longer traceable to a natural person? One might still call it personal data, but it is no longer privacy sensitive.
Non-sensitive data is non-personal
Imagine you want to test a newly developed client portal. For this you’d need client data containing names, email addresses, phone numbers, etc. Let’s shuffle all the first names and last names: “Oliver” becomes “Harry”, “Smith” becomes “Jones”, and so on. Next we generate synthetic phone numbers. After that we build new email addresses (based on the new first name / last name combination), and finally we blank the fields we don’t need for testing purposes. The result is still realistic, production-like data, but with the use of a masking template (a list of masking rules combined with synthetic data generation) we have made sure the test data is anonymous: not traceable to a natural person and therefore no longer privacy sensitive.
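The masking steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production masking tool: the record layout and field names are hypothetical, and a real masking template would cover many more rules and data types.

```python
import random

# Hypothetical client records; the field names are illustrative only.
clients = [
    {"first_name": "Oliver", "last_name": "Smith",
     "email": "oliver.smith@example.com",
     "phone": "+44 20 7946 0001", "notes": "prefers morning calls"},
    {"first_name": "Harry", "last_name": "Jones",
     "email": "harry.jones@example.com",
     "phone": "+44 20 7946 0002", "notes": "VIP account"},
]

def mask_clients(records, seed=42):
    rng = random.Random(seed)

    # Rule 1: shuffle first and last names independently across the dataset,
    # breaking the link between a name and the rest of the record.
    firsts = [r["first_name"] for r in records]
    lasts = [r["last_name"] for r in records]
    rng.shuffle(firsts)
    rng.shuffle(lasts)

    masked = []
    for record, first, last in zip(records, firsts, lasts):
        m = dict(record)
        m["first_name"], m["last_name"] = first, last
        # Rule 2: replace the phone number with a synthetic one.
        m["phone"] = "+44 20 " + "".join(str(rng.randint(0, 9)) for _ in range(8))
        # Rule 3: rebuild the email address from the new name combination.
        m["email"] = f"{first}.{last}@example.com".lower()
        # Rule 4: blank fields that are not needed for testing.
        m["notes"] = ""
        masked.append(m)
    return masked

for row in mask_clients(clients):
    print(row["first_name"], row["last_name"], row["email"], row["phone"])
```

Note that the masked records remain structurally valid (names look like names, emails parse as emails), which is exactly what makes them useful as test data.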
No protection needed
Think about it: if the data in your test database is irreversibly shuffled, blanked, scrambled and partly overwritten with synthetic data – if none of the remaining information relates to an identified or identifiable natural person – why protect it as if it did?
According to GDPR Recital 26, the GDPR does not apply to anonymous data: “The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person.” From this we can conclude that masked or anonymized test data does not need to be protected. It falls outside the scope of the GDPR!
How do you know that your data is completely anonymous? Well, how are you sure you won’t get eaten by a bear today? You’re never 100% sure. But you can do everything within your power to eliminate all possible risks. The first step is to analyze the data. Profile your databases to discover which privacy-sensitive information is stored and where. Testers usually have a good understanding of the data model, so it’s wise to include them in the process as well. Based on the data analysis, the required masking rules can be determined and the masking template can be built. Note that this is not a quick and easy task – anyone who tells you otherwise has never had to do the job or didn’t do it thoroughly. It takes time and a high degree of diligence, since any omission is an information security event that must be remedied.
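The profiling step can be automated to a first approximation by scanning sample values against known patterns. The sketch below is a deliberately simple illustration with hypothetical table and column names; real profiling tools use far richer dictionaries, data-type heuristics, and full-table sampling.

```python
import re

# Simple patterns for spotting privacy-sensitive values.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s\-]{7,}\d"),
}

def profile_table(table_name, columns):
    """Flag columns whose sample values look privacy sensitive.

    `columns` maps column names to lists of sample values (a hypothetical
    shape; in practice you would sample rows from the database itself).
    """
    findings = []
    for col, samples in columns.items():
        for kind, pattern in PATTERNS.items():
            hits = sum(1 for v in samples
                       if isinstance(v, str) and pattern.search(v))
            # Flag the column if a majority of sampled values match.
            if samples and hits / len(samples) > 0.5:
                findings.append((table_name, col, kind))
    return findings

sample = {
    "email": ["oliver.smith@example.com", "harry.jones@example.com"],
    "order_total": ["19.99", "5.00"],
}
print(profile_table("clients", sample))
# → [('clients', 'email', 'email')]
```

The output of a scan like this is a starting point for the masking template, not a substitute for involving testers who know the data model: pattern matching will miss sensitive free-text fields that only a human would recognize.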
Contributed by Nynke Hogeveen, team member of DATPROF