On 23 May 2024, the European Data Protection Board (EDPB) published a report (Report) on the work undertaken by its ChatGPT Taskforce (Taskforce), comprising each EU supervisory authority (SA).
The Report is the first official document from the EDPB setting out its “preliminary views” regarding the interplay between the General Data Protection Regulation (GDPR) and artificial intelligence (AI), including the EU AI Act.
While the Report focuses on the various ongoing SA GDPR investigations into OpenAI’s ChatGPT large language model (LLM), it gives early indicators about the EDPB’s direction of regulation regarding the co-existence of these two areas of law. It offers valuable insights for businesses acting as controllers of personal data, whether as a deployer or developer of an AI system.
The Report also includes a questionnaire that such controllers may consider useful when ensuring compliance with the GDPR (e.g., as a checklist).
1. Background to the Report
In November 2022, OpenAI OpCo, LLC (OpenAI) launched its trailblazing ChatGPT LLM. ChatGPT grew rapidly, amassing a reported one million users in just five days from being made available, and now has around 180.5 million users.
However, at the time of launching, ChatGPT did not meet the requirements of the GDPR. This led the Italian SA and several other SAs to initiate their own investigations under Article 58(1)(a) GDPR.
The Taskforce was established to: (a) foster coordination; (b) exchange information; and (c) establish a common approach to the SAs’ investigations into ChatGPT. At the time, OpenAI had no establishment in the EU, meaning the GDPR’s “one-stop-shop” consistency mechanism did not apply.
On 15 February 2024, OpenAI established OpenAI Ireland Limited, making the Data Protection Commission of Ireland its lead SA for cross-border investigations and complaints under the GDPR. Nonetheless, several SAs have open investigations that pre-date February 2024 and have yet to conclude.
The Report covers the period of investigations between November 2022 and February 2024.
2. What’s in the Report? “Preliminary views”
The Report outlines the EDPB’s “preliminary views” on key areas of the GDPR, namely lawfulness, fairness, accuracy, transparency and data subject rights.
(a) Lawfulness
A fundamental aspect of the GDPR is that a controller must have a legal basis to process any personal data. This relates to the GDPR’s principle of lawfulness. The lawfulness principle is particularly complex when it comes to the AI and data protection interplay.
In assessing the legal basis for processing, the Report breaks down the different stages of processing by ChatGPT’s LLM during its lifecycle:
- Collection of training data; pre-processing of data; and training; and
- Input data/prompts and output data; and training ChatGPT with prompts.
Legitimate interests & appropriate safeguards
The Report focuses on OpenAI’s reliance on the legal basis that processing is necessary for the purposes of its legitimate interests (Article 6(1)(f) GDPR) at the various stages above. The EDPB emphasises the compliance obligations that attach to controllers relying on this legal basis, reiterating the requirement to conduct a legitimate interest assessment documenting (i) the identified legitimate interest(s); (ii) the necessity of pursuing the legitimate interest(s); and (iii) the balancing of individual rights with the interests of the controller.
Notably, the Report outlines that adequate safeguards play a “special role” in reducing the undue impact of processing on data subjects, thereby shifting the balancing test in favour of the controller.
The Report includes four examples of adequate safeguards, by way of privacy-enhancing techniques, for controllers to consider:
(i) defining precise collection criteria;
(ii) ensuring certain data categories are not collected;
(iii) ensuring certain sources are excluded from data collection (e.g. public social media profiles); and
(iv) measures to delete or anonymise personal data (including special category data) collected via web-scraping before and after the training stage.
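By way of illustration only, the four safeguards above could be applied as a filtering step before scraped data reaches a training set. The following Python sketch is not taken from the Report and does not describe OpenAI's practice. The record fields, excluded domains, excluded categories and the email-masking rule are all hypothetical assumptions, and masking email addresses alone is unlikely to amount to anonymisation under the GDPR.

```python
import re
from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical policy values, chosen only for illustration.
ALLOWED_LANGUAGES = {"en"}                    # (i) precise collection criteria
EXCLUDED_CATEGORIES = {"health", "religion"}  # (ii) categories not collected
EXCLUDED_DOMAINS = {"social.example"}         # (iii) sources excluded
# (iv) crude stand-in for deletion/anonymisation: mask email addresses only.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


@dataclass
class ScrapedRecord:
    """Hypothetical shape of one scraped item."""
    url: str
    language: str
    category: str
    text: str


def is_collectable(record: ScrapedRecord) -> bool:
    """Apply safeguards (i) to (iii): criteria, categories, sources."""
    host = urlparse(record.url).hostname or ""
    if host in EXCLUDED_DOMAINS:
        return False
    if record.category in EXCLUDED_CATEGORIES:
        return False
    return record.language in ALLOWED_LANGUAGES


def redact(text: str) -> str:
    """Apply safeguard (iv) in a very limited form: mask email addresses."""
    return EMAIL_PATTERN.sub("[REDACTED]", text)


def apply_safeguards(records: list[ScrapedRecord]) -> list[ScrapedRecord]:
    """Keep only collectable records and redact the text of those kept."""
    return [
        ScrapedRecord(r.url, r.language, r.category, redact(r.text))
        for r in records
        if is_collectable(r)
    ]
```

A controller relying on measures of this kind would still need to document them and evidence their effectiveness; the sketch shows only where such checks might sit in a collection pipeline.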
The Report acknowledges that where large amounts of personal data are being collected, conducting a “case-by-case examination” of each dataset is not possible. However, appropriate safeguards must be implemented to meet the requirements of the GDPR.
The Report notes that under the GDPR, the controller, here OpenAI, bears the burden of proof for demonstrating the effectiveness of the chosen measures.
(b) Data accuracy – hallucinations versus fact, users must understand the difference
The Report distinguishes between input data (prompts) and output data regarding the GDPR’s data accuracy principle. It acknowledges that the purpose of processing for ChatGPT is not to provide accurate information but to train the LLM. For the EDPB, the concern is that output data generated by ChatGPT may be biased or “made up” (e.g., AI hallucinations and deep fakes), yet end users may mistakenly treat it as factual.
To address this risk of misinterpretation, the Report indicates that ChatGPT is held to a particularly high standard and must provide ‘proper information’ on the probabilistic nature of the output data and its limited level of reliability. This includes informing individuals that output data may be biased or “made up”. However, according to the EDPB, providing this information is not in itself sufficient to comply with the data accuracy principle under the GDPR.
(c) Fairness – responsibility for ensuring GDPR compliance should not be transferred to data subjects
The principle of fairness is another crucial aspect of GDPR compliance regarding the interplay between AI and data protection. This is due to the potential for bias and discrimination. The Report emphasises that personal data must not be processed in a manner that is unjustifiably detrimental, unlawfully discriminatory, unexpected or misleading to individuals.
The EDPB outlines a “crucial aspect” of compliance with this principle: the responsibility for ensuring compliance with the GDPR should not be transferred from a controller to end users. The Report essentially prohibits a controller from including a clause in the terms and conditions of use that data subjects are responsible for their chat inputs. While OpenAI has implemented measures to address this issue, the Report clarifies that OpenAI remains ultimately responsible for complying with the GDPR.
(d) Transparency – information obligations must be followed, potential exemption for indirect collection of personal data
The Report gives limited attention to the GDPR’s principle of transparency. However, in the context of web-scraped data, it acknowledges that Article 14(5)(b) GDPR may be relied on by controllers subject to its requirements being met. This article is an exemption to providing transparency information (by way of a privacy notice) to individuals where personal data are indirectly collected from them, and the provision of transparency information ‘proves impossible or would involve a disproportionate effort’.
The Report further provides that where personal data collected via prompts will be used to train the LLM, individuals must be informed of such processing under Article 13 GDPR.
(e) Data Subject Rights – end users must be able to exercise their fundamental rights
The Report emphasises the importance of data subjects being able to exercise their rights under the GDPR. It acknowledges the methods by which data subjects can exercise their GDPR rights with OpenAI but states that OpenAI must continue to improve them. For example, OpenAI encourages end users to exercise their right of erasure rather than rectification due to the technical challenges associated with the development and operation of its LLMs. However, the Report merely scratches the surface of this complex area of GDPR obligations.
3. What happens next?
(a) Takeaways from the Report
Although the Report specifically relates to OpenAI’s ChatGPT, it gives early indicators about the EDPB’s approach to and expected compliance standards regarding the interplay between the GDPR and AI systems. As expected, full compliance with the GDPR is the regulatory approach the EDPB appears to be taking.
The overarching takeaway from the Report is that controllers, such as OpenAI, must comply with the GDPR and be able to demonstrate that compliance under its accountability principle. This means leveraging existing data protection governance frameworks to deploy AI systems.
While there is no acknowledgement regarding the opportunities LLMs and other types of AI present to society, nor the GDPR’s technology neutrality (Recital 15), expectations are high for these issues to be addressed in the EDPB’s imminent guidance on the interplay between the GDPR and the AI Act. In this regard, businesses will need more certainty when navigating compliance – especially regarding the issues highlighted in the Report at each stage of an AI system’s lifecycle.
The Report is not formal guidance but can be used as a starting framework for AI developers and deployers to comply with GDPR and prepare to comply with the AI Act. Of particular note is the questionnaire in the Annex of the Report.
(b) Expected developments
The EDPB must address many issues for businesses regarding the interplay between the GDPR and the AI Act and the coexistence of these legal regimes. From an Irish law perspective, the Data Protection Commission (DPC) is expected to issue guidance on this interplay, as other European SAs have done.
More recently, the DPC engaged with Meta regarding its plans to train its LLM using the personal data of Facebook and Instagram end users in the European Union. Meta also relies on the legal basis of legitimate interests for such processing activities. For now, Meta has paused these plans following consultation with the DPC.
Further developments are eagerly awaited regarding how the interplay between data protection and AI will be regulated, particularly as the AI Act will enter into force on 1 August 2024, with its first provisions (regarding Prohibited AI Systems) applying from 2 February 2025.
Contact Us and Follow Us for Updates
We will continue to post updates on these legal developments to help your business break down key issues and navigate this evolving area of law.
If you require further information on the issues raised or have any queries about data protection compliance when deploying AI, please contact Rachel Hayes, Barry Scannell, or your usual William Fry contact.
Contributed by Jamie Mac Uiginn.