The Information Commissioner’s Office (ICO) has launched a consultation series on how aspects of data protection law should apply to the development and use of generative AI. Generative AI is defined by the ICO as a type of artificial intelligence that can create new content such as text, computer code, audio, music, images, and videos. A well-known example of generative AI is ChatGPT.
The ICO has considered the risks posed by the development and use of generative AI and has put together a list of questions it feels need to be addressed in the upcoming consultations. The first of these questions is “what is the appropriate lawful basis for training generative AI models?” This is the focus of the ICO’s first consultation in the series.
To understand how data protection law applies to the development of generative AI, it is important to understand how generative AI is actually developed. Within the consultation, the ICO explains that there are several stages involved in developing generative AI. The first step is collecting training data. This data is usually obtained through web scraping, which is the use of automated software to go through web pages, extract and copy information from them, and store that information for further use.
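For readers less familiar with the technology, the extraction step can be illustrated with a minimal sketch. It is an assumption-laden illustration only, not a description of how any particular developer collects training data: the class name, the function name and the sample page are all hypothetical, and the page is hard-coded rather than fetched from the internet.

```python
from html.parser import HTMLParser


class TextAndLinkExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and link targets from one HTML page."""

    def __init__(self):
        super().__init__()
        self.text_chunks = []
        self.links = []
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        stripped = data.strip()
        # Keep only non-empty text that is not script or style content.
        if stripped and self._skip_depth == 0:
            self.text_chunks.append(stripped)


def extract(html):
    """Return the page's visible text and links as a record that could be stored."""
    parser = TextAndLinkExtractor()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.text_chunks), "links": parser.links}


# Invented sample page (assumption): a real scraper would download many pages.
SAMPLE_PAGE = """
<html><body>
  <h1>Staff directory</h1>
  <p>Jane Example, Sales Manager</p>
  <a href="/contact">Contact</a>
  <script>var x = 1;</script>
</body></html>
"""

record = extract(SAMPLE_PAGE)
print(record)
```

Even in this small example, the stored record contains a named individual and their job title, which is why the collection stage engages data protection law.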
When collecting training data, developers need to ensure that they have a valid lawful basis for processing any personal data and that they are not in breach of any other laws. There are six lawful bases for the processing of personal data. These are consent, contract, legal obligation, vital interests, public task, and legitimate interests. See our previous blog on the lawful bases.
Within the consultation, the ICO has set out its thoughts on the appropriate lawful basis and has also asked for input from developers. The deadline for responses is 1 March 2024. It will be interesting to see what feedback the ICO receives.
I have summarised the ICO’s analysis of the position below. The ICO has confirmed that the purpose of the analysis is to set out its thinking, and that it should not be treated as confirmation that web scraping is always legally compliant.
The ICO’s view is that, based on the current practices involved in the development of generative AI, the only lawful basis that could apply to the processing is legitimate interests. To rely on the legitimate interests basis, the controller must demonstrate the following:
- The purpose of the processing is legitimate;
- The processing is necessary for that purpose; and
- The individual’s interests do not override the interest being pursued.
Developers must therefore consider whether there is a valid interest for the processing of the web scraped data. The ICO suggested that a developer could rely on a business need to deploy its model, either on its own platform or by bringing it to market for a third party. To be able to rely on this, a developer would need to show the specific purpose and use of its model. The ICO went on to suggest that whilst a developer could theoretically rely on broad social interests, it would need to ensure that those interests were actually being realised rather than just assuming that they are.
After establishing a legitimate interest in the processing, a developer would then need to show that web scraping is necessary to achieve the purpose of the processing. The ICO has accepted that most generative AI development is only possible using a large volume of data, which is obtained through web scraping. Therefore, at present, web scraping seems to be the only way of achieving the purpose of the processing.
Finally, if a developer is able to establish that there is a legitimate purpose and that the processing is necessary for that purpose, they would then need to assess the impact of the processing on individuals. The ICO expressed concerns that web scraping involves invisible processing, whereby the person does not know that their personal data is being processed. This can cause harm to data subjects because they lose control over who is processing their data and are therefore prevented from exercising their information rights. Because of this, the ICO considers web scraping to be a high-risk activity. This does not mean that developers cannot process data by web scraping, but they do have to try to mitigate harm to data subjects.
Comment
Data protection laws will need to change and adapt as and when new technology is developed. It is good to see that the ICO is tackling this head-on. It will be interesting to see what response the ICO receives to this consultation and what other topics they cover in the future.
How can we help
Ruby Ashby is a Senior Associate in our expert Dispute Resolution team, specialising in data breach claims, inheritance and Trust disputes and defamation claims.
If you need any advice, please do not hesitate to contact Ruby or another member of the team in Derby, Leicester, or Nottingham on 0800 024 1976 or via our online enquiry form.