What Are the Differences Between Anonymisation and Pseudonymisation?

March 6, 2023

I. Introduction

Pseudonymisation and anonymisation are often confused, and both techniques are relevant in the context of the GDPR. The confusion arises from the gap between the legal definition of personal data and the practical recognition that data which does not directly identify anyone can still be personal data.1 In contrast to the binary legal understanding of personal data, the data processed today is better placed on a spectrum that runs from clearly personal data to anonymous data, with much in between. The difficulty is that significant economic value can be derived from data that appears non-personal (anonymous) at first glance but that, with enough effort, can be linked back to individuals and so becomes personal data.

This is precisely where the confusion lies. Pseudonymisation processes personal data so that it can no longer be attributed to a specific person unless additional information is available, whereas anonymisation processes personal data so that re-identification is irreversibly prevented.2

Understanding the difference between these two techniques matters because it determines whether the GDPR applies to the processing at all.

Confusing pseudonymisation with anonymisation can create a false sense of security and put individuals' personal data at risk. If data is pseudonymised but not properly protected, it can still be re-identified by linking it with other information, and it is then exposed to the same privacy risks as personal data that has not been pseudonymised. Likewise, if pseudonymised data is caught up in a data breach, the individuals' personal data may still be exposed and identifiable. Conversely, if genuinely anonymised data is treated as though it were only pseudonymised, and therefore still personal, it may be subject to unnecessary restrictions on its use or retention, which can reduce its value for research or other purposes.3
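The linkage risk can be illustrated with a short Python sketch. It assumes two hypothetical datasets: a pseudonymised health dataset that still contains a postcode and year of birth, and a public register that lists the same attributes alongside names. All field names and records are invented for illustration, and real linkage attacks are usually far more sophisticated than an exact-match join.

```python
def link_records(pseudonymised, public, quasi_identifiers):
    """Join two datasets on shared quasi-identifiers (a simple linkage attack).

    Returns (pseudonymised record, public record) pairs for which the
    quasi-identifier values match exactly one person in the public data.
    """
    # Index the public data by its quasi-identifier values.
    index = {}
    for person in public:
        key = tuple(person[q] for q in quasi_identifiers)
        index.setdefault(key, []).append(person)

    matches = []
    for record in pseudonymised:
        key = tuple(record[q] for q in quasi_identifiers)
        candidates = index.get(key, [])
        if len(candidates) == 1:  # a unique match re-identifies the record
            matches.append((record, candidates[0]))
    return matches


# Hypothetical data: the health records carry codes instead of names.
health_records = [
    {"code": "7f3a", "postcode": "1011", "birth_year": 1980, "diagnosis": "asthma"},
    {"code": "9c2e", "postcode": "1011", "birth_year": 1975, "diagnosis": "diabetes"},
]
public_register = [
    {"name": "Alice Example", "postcode": "1011", "birth_year": 1980},
    {"name": "Bob Example", "postcode": "1011", "birth_year": 1975},
    {"name": "Carol Example", "postcode": "1011", "birth_year": 1975},
]

for record, person in link_records(health_records, public_register, ["postcode", "birth_year"]):
    print(person["name"], "->", record["diagnosis"])  # Alice Example -> asthma
```

Only the record whose postcode and birth year are unique in the register is re-identified; the other record matches two people and stays ambiguous. Even without names, the pseudonymised data therefore still needs to be protected as personal data.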

Therefore, it is important to distinguish between pseudonymisation and anonymisation, and to ensure that appropriate safeguards are in place for each type of data processing, as required by the GDPR.

II. Pseudonymisation

A. Definition of pseudonymisation

In simple terms, pseudonymisation is the processing of personal data so that it can no longer be attributed to a specific person without the use of additional information.4 This is usually done by replacing direct identifiers such as a name, Social Security number, or date of birth with a random code.5 However, because the data can still be linked back to the person indirectly, it remains personal data. The additional information that would allow re-identification, such as an encryption key or the list linking codes to names, must be kept separately and protected by technical and organisational measures.6 This separation can take place within the same organisation, but it must ensure that the data cannot be attributed to a person accidentally or without authorisation.
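To make the mechanism concrete, here is a minimal Python sketch of replacing a direct identifier with a random code. The function names, field names and sample records are hypothetical, and the sketch leaves out the technical and organisational measures (separate storage, access control) that the GDPR requires for the key table.

```python
import secrets


def pseudonymise(records, identifier_field):
    """Replace a direct identifier with a random code.

    Returns the pseudonymised records and a separate key table mapping each
    code back to the original identifier. The key table is the "additional
    information" and would have to be stored apart from the data.
    """
    key_table = {}  # code -> original identifier
    code_for = {}   # original identifier -> code, so a repeat subject keeps one code
    output = []
    for record in records:
        identity = record[identifier_field]
        if identity not in code_for:
            code = secrets.token_hex(8)  # random, not derived from the identity
            while code in key_table:     # guard against an (unlikely) collision
                code = secrets.token_hex(8)
            code_for[identity] = code
            key_table[code] = identity
        pseudonymised = {k: v for k, v in record.items() if k != identifier_field}
        pseudonymised["subject_code"] = code_for[identity]
        output.append(pseudonymised)
    return output, key_table


def reidentify(record, key_table):
    """Only someone holding the key table can link a record back to a person."""
    return key_table[record["subject_code"]]


# Hypothetical sample data for illustration only.
patients = [
    {"name": "Alice Example", "diagnosis": "asthma"},
    {"name": "Bob Example", "diagnosis": "diabetes"},
    {"name": "Alice Example", "diagnosis": "eczema"},
]
research_data, key_table = pseudonymise(patients, "name")
```

Because the codes are random rather than computed from the names, the pseudonymised records on their own reveal nothing about identity. The same person still receives the same code, however, so their records remain linkable, which is one reason the output is still personal data.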

A common example of pseudonymisation is key-coding, which is often used in medical research. Key-coded data is information about a person that has been labelled with a code, while the key linking each code to the person's identity is held separately. There are many other ways to pseudonymise data, including encryption with a secret key, hashing, and keyed hashing with a stored key.7
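The difference between a plain hash and a keyed hash matters in practice. The Python sketch below uses only the standard `hashlib` and `hmac` modules and assumes a hypothetical identifier format with just 10,000 possible values, so that an attacker can simply try every candidate. HMAC-SHA256 is used here as one common keyed-hash construction; it is not the only option.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


def plain_hash(identifier: str) -> str:
    # Unkeyed hash: anyone can recompute it for any candidate identifier.
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_hash(identifier: str, key: bytes) -> str:
    # Keyed hash (HMAC-SHA256): recomputing it requires the secret key,
    # which plays the role of the separately stored additional information.
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def brute_force(pseudonym: str, candidates: Iterable[str]) -> Optional[str]:
    # An attacker without the key hashes every plausible identifier.
    for candidate in candidates:
        if plain_hash(candidate) == pseudonym:
            return candidate
    return None


# Hypothetical identifier space: four-digit codes "0000" to "9999".
candidates = [f"{n:04d}" for n in range(10_000)]
secret_key = secrets.token_bytes(32)

print(brute_force(plain_hash("0427"), candidates))              # 0427
print(brute_force(keyed_hash("0427", secret_key), candidates))  # None
```

The plain hash is reversed almost instantly, while the keyed hash resists the same attack as long as the key stays secret. Note that the keyed hash is still deterministic: whoever holds the key can recompute pseudonyms and link records, which is why the key must be kept separately.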

B. Advantages and disadvantages of pseudonymisation

Advantages of pseudonymisation under the GDPR:

  1. Improved privacy: By removing or replacing direct identifiers, pseudonymisation can reduce the risk of personal data being misused or shared without consent.
  2. Increased data sharing: Pseudonymisation can make it easier for organisations to share personal data for research or business purposes, as long as appropriate measures are taken to protect the data.
  3. Better security: By separating data from direct identifiers, organisations can limit the impact of a data breach. As long as the additional information is stored separately and is not also compromised, attackers obtain only pseudonymised data, not directly identifiable data.

Disadvantages of pseudonymisation under the GDPR:

  1. Re-identification risk: Despite the measures in place, there is always a risk of re-identification, especially if the pseudonymisation method is not properly implemented.
  2. Data utility: Depending on the method used, pseudonymised data may have reduced utility, as some data elements may be lost or altered in the pseudonymisation process.
  3. Complexity: Implementing pseudonymisation can be complex and resource-intensive, requiring specialised expertise, technology, and processes.

III. Anonymisation

A. Definition of anonymisation

According to Recital 26 of the GDPR, data is anonymous when the person it relates to is not, or is no longer, identifiable, taking into account all the means reasonably likely to be used to identify them.8 In data protection and privacy terms, data that are not personal are typically referred to as ‘anonymous data’, and the process of rendering personal data non-personal is typically termed ‘anonymisation’. Non-personal data do not fall within the scope of application of the GDPR.

B. How anonymisation works

One common way to produce anonymous (non-personal) data is aggregation: information is combined to the point where specific events, such as a person's travel patterns, can no longer be linked to a specific individual. This type of data is commonly used in statistics or sales reports, for example to analyse product popularity and features. Other examples of non-personal data include information on high-frequency trading in the finance industry and data on precision farming, which helps to monitor and optimise the use of pesticides, nutrients, and water.9
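The aggregation idea can be illustrated with a short Python sketch. It uses hypothetical trip records and an arbitrary suppression threshold (`min_count`), and it shows only one simple technique: aggregation with small-count suppression does not by itself guarantee anonymity, because the means of identification reasonably likely to be used must still be assessed.

```python
from collections import Counter


def aggregate_trips(trips, min_count=10):
    """Count trips per (origin, destination) route, dropping the traveller ID.

    Routes travelled fewer than min_count times are suppressed, because a
    rare route (such as one person's daily commute) could single someone out.
    The threshold of 10 is an illustrative assumption, not a legal standard.
    """
    counts = Counter((origin, destination) for _traveller, origin, destination in trips)
    return {route: n for route, n in counts.items() if n >= min_count}


# Hypothetical data: twelve travellers on one route, one traveller on another.
trips = [(f"traveller{i}", "Station A", "Station B") for i in range(12)]
trips += [("traveller99", "Station A", "Station C")] * 3

print(aggregate_trips(trips))  # {('Station A', 'Station B'): 12}
```

The output contains no traveller identifiers, and the route used by a single person is dropped entirely rather than reported as a small count that could point to them.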

When deciding whether a person can be identified from their data, account should be taken of all the means reasonably likely to be used to identify them, whether by the data controller or by anyone else. To determine whether such means are reasonably likely to be used, all objective factors should be considered, such as the cost of identification and the time it would take, in light of the technology available at the time of the processing and technological developments.10

C. Advantages and disadvantages of anonymisation

Advantages:

  • Increased data privacy: Anonymization helps to protect the privacy of individuals by removing personal information that could be used to identify them. This reduces the risk of data breaches, unauthorized access, or misuse of personal data.
  • Facilitation of data sharing: Properly anonymised data can be shared more freely than personal data, because it falls outside the scope of the GDPR. This makes it easier for organisations to collaborate and share data for research, statistical analysis, and other purposes.
  • Compliance with regulations: Anonymisation can help organisations meet their obligations under the GDPR and other data protection regulations by reducing the risks associated with processing personal data, for example by allowing data to be retained in anonymised form once the personal data is no longer needed.

Disadvantages:

  • Information loss: Anonymization can result in a loss of information, as certain details about the data subjects may be removed. This can impact the accuracy and usefulness of the data for certain purposes.
  • Re-identification risk: While anonymization reduces the risk of identifying individuals, it does not eliminate it completely. With advances in technology and data analysis, it may still be possible to re-identify individuals from anonymized data.
  • Complexity: The process of anonymization can be complex and time-consuming, particularly when dealing with large amounts of data. There may also be a risk of human error in the process, which could result in personal data being inadvertently disclosed.

IV. Comparison of pseudonymisation and anonymisation

A. When to use pseudonymisation and when to use anonymisation

Pseudonymisation is not the same as anonymisation, and pseudonymised data is not anonymous data: it remains personal data under the GDPR. Data is considered anonymous only when the person it relates to can no longer be identified.11 Although the possibility of re-identification can never be completely excluded, EU law sets a very high standard for what counts as anonymous.

Pseudonymisation and anonymisation are both techniques used to protect the privacy of individuals in the processing of personal data. The choice between them depends on the specific situation and the level of privacy protection required. Note, however, that while the GDPR does not require pseudonymisation by default (Article 25(2)), some national data protection laws, such as the German GDPR Implementation Law, impose strict pseudonymisation requirements.12 The German law, for instance, states that personal data must be anonymised or pseudonymised as early as possible, in line with the purpose of processing.13

It is sometimes assumed that the GDPR regards pseudonymisation mainly as a data security measure, but this is inaccurate. Article 32(1)(a) does list pseudonymisation as a means of ensuring appropriate security, but the GDPR also links it to the broader obligation of data protection by design (Article 25(1)) and to the data minimisation safeguards that apply to processing for archiving purposes in the public interest, scientific or historical research purposes, or statistical purposes (Article 89(1)).

The following are typical situations in which each technique is appropriate.

Pseudonymization is best used when:

  • The data is being processed for a specific purpose, and some level of personal information is needed to achieve that purpose, but the information should not be directly linked to an individual's identity.
  • There is a need to retain some personal information in order to monitor or enforce compliance with data protection laws, but the data should not be directly linked to an individual's identity.  
  • The data is being processed for scientific or statistical purposes, and personal information is needed for research purposes, but the data should not be directly linked to an individual's identity.

Anonymization, on the other hand, is best used when:

  • The data is no longer needed for any specific purpose and the data controller has no intention of ever using it again.
  • The data is being processed for scientific or statistical purposes, and it is not necessary to retain any personal information.
  • The data is being shared with third parties and the data controller has no intention of using the data for any specific purpose.

In short, pseudonymisation is a good compromise between privacy protection and the need for personal information, while anonymisation offers the highest level of privacy protection and, because the data no longer qualifies as personal data, takes it outside the scope of the GDPR.

V. Conclusion

In conclusion, there is a significant difference between pseudonymisation and anonymisation, and understanding it is essential because it determines whether the GDPR applies. Pseudonymisation replaces identifying information with codes that can be linked back to the original person using additional information, so the data remains personal data; anonymisation irreversibly renders personal data non-personal, taking it outside the scope of the GDPR. The advantages of pseudonymisation include improved privacy, easier data sharing, and better security, while its disadvantages include a risk of re-identification, reduced data utility, and complexity. Understanding the differences between the two techniques is crucial to safeguarding individuals' personal data: confusing them can leave personal data under-protected or, conversely, subject anonymous data to unnecessary restrictions on use or retention, reducing its value for research or other purposes.

Footnotes

1 Nadezhda Purtova, ‘The Law of Everything. Broad Concept of Personal Data and Future of EU Data Protection Law’ (2018) 10 Law, Innovation and Technology 40.

2 Recital 26, GDPR.

3 Michèle Finck and Frank Pallas, ‘They who must not be identified—distinguishing personal from non-personal data under the GDPR’ (2020) 10(1) International Data Privacy Law 11–36.

4 Article 4(5), GDPR.

5 Article 29 Working Party, ‘Opinion 05/2014 on Anonymisation Techniques’ (WP216, 2014) p. 20.

6 Luca Tosoni, ‘Article 4(5). Pseudonymisation’ in Christopher Kuner and others (eds), The EU General Data Protection Regulation (GDPR): A Commentary (New York, 2020; online edn, Oxford Academic).

7 EC Guidance 2019, p. 5 (noting that ‘[f]or instance, a research study on the effects of a new medicine would qualify as pseudonymisation [within the meaning of Art. 4(5) GDPR], if the personal data of study participants would be replaced by unique attributes (e.g. number or code) in the research documentation and their personal data would be kept separately with the assigned unique attributes in a secured document (e.g. in a password protected database)’).

8 Recital 26 GDPR.

9 EC Guidance 2019, pp. 6–7.

10 Recital 26, GDPR.

11 Tosoni (n 6).

12 German GDPR Implementation Law.

13 Tosoni (n 6).
