GPT models’ learning and disclosure of personal data: An experimental vulnerability analysis

Gladden, Matthew E. “GPT models’ learning and disclosure of personal data: An experimental vulnerability analysis.” April 10, 2023.

This article’s full text can be viewed on LinkedIn.

Summary. The possible gathering, retention, and later dissemination of individuals’ personal data by AI systems utilizing Generative Pretrained Transformers (GPTs) is an area that’s of growing concern from legal, ethical, and business perspectives. On March 31, 2023, for example, the Italian Data Protection Authority (GPDP) implemented temporary restrictions on the processing of ChatGPT users’ data by OpenAI, contending that “there appears to be no legal basis underpinning the massive collection and processing of personal data in order to ‘train’ the algorithms on which the platform relies” and that “the information made available by ChatGPT does not always match factual circumstances, so that inaccurate personal data are processed.”

To develop a better understanding of at least one aspect of the privacy risks involved with the rapidly expanding use of GPT-type systems and other large language models (LLMs) by the public, we conducted an experimental analysis in which we prepared a series of GPT models that were fine-tuned on a Wikipedia text corpus into which we had purposefully inserted personal data for hundreds of imaginary persons. (We refer to these as “GPT Personal Data Vulnerability Simulator” or “GPT-PDVS” models.) We then used customized input sequences (or prompts) to seek information about these individuals, in an attempt to ascertain how much of their personal data a model had absorbed and to what extent it was able to output that information without confusing or distorting it.
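The corpus-preparation step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the record format, the sentence template, the placeholder names, the 1935 to 1999 birth-year range, and the helper names (`make_person`, `person_sentence`, `insert_personal_data`) are all assumptions made for illustration.

```python
import random

def make_person(rng, idx):
    """Build one imaginary person record (all fields are hypothetical)."""
    first = rng.choice(["Alina", "Bertrand", "Chidi", "Dagmar"])
    last = f"Testperson{idx:03d}"  # obviously synthetic surname, unique per index
    return {"name": f"{first} {last}", "birth_year": rng.randint(1935, 1999)}

def person_sentence(person):
    """Render a record as a plain sentence (assumed template)."""
    return f"{person['name']} was born in {person['birth_year']}."

def insert_personal_data(paragraphs, people, rng):
    """Return a copy of the corpus with one sentence per person
    inserted at a random paragraph boundary."""
    result = list(paragraphs)
    for person in people:
        pos = rng.randint(0, len(result))  # inclusive; len(result) appends
        result.insert(pos, person_sentence(person))
    return result

rng = random.Random(0)  # fixed seed so the sketch is reproducible
corpus = ["Paragraph one of encyclopedia text.", "Paragraph two.", "Paragraph three."]
people = [make_person(rng, i) for i in range(5)]
augmented = insert_personal_data(corpus, people, rng)
```

Keeping the list of generated records alongside the augmented corpus is what would later allow a model’s outputs to be checked against known ground truth.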

The results of our analysis are described in this article. They suggest that – at least with regard to the class of models tested – it’s unlikely for personal data to be “inadvertently” learned by a model during its fine-tuning process in a way that makes the data available for extraction by system users, without a concentrated effort on the part of the model’s developers. In particular, the analysis found that:

  • In response to targeted requests, models whose fine-tuning corpus had included personal data for certain individuals came marginally closer to being able to output their personal data than models whose fine-tuning corpora had lacked the individuals’ data.
  • Models became slightly more adept at “interpreting” requests for individuals’ personal data if they had been fine-tuned on a corpus containing personal data relating to many persons than if they’d been fine-tuned on a corpus containing the same overall quantity of personal data, all of it relating to a single individual.
  • Even those models that came the “closest” to being able to learn and later output personal data still proved completely incapable of doing so in a way that would be of value to unauthorized parties seeking to acquire and exploit such information. For example, when asked 6,000 times to output the year of birth of individuals whose personal data had been inserted into their fine-tuning corpora, on four occasions various GPT-PDVS models generated the text string “19” for persons who had indeed been born in a year between 1935 and 1999; however, in no instance did a model manage to successfully generate an individual’s complete year of birth.
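The distinction drawn in the last finding, between generating a complete year of birth and generating only a fragment such as “19”, can be made concrete with a small scoring sketch. This is an illustrative assumption about how such outputs might be classified, not the scoring procedure used in the study; the function names and the three labels (`full`, `partial`, `miss`) are hypothetical.

```python
import re
from collections import Counter

def score_birth_year(output, true_year):
    """Classify one model output against a known year of birth.

    'full'    : the complete year appears as a standalone number
    'partial' : a number of at least two digits is a proper prefix
                of the year (for example "19" for 1962)
    'miss'    : neither of the above
    """
    year = str(true_year)
    digit_runs = re.findall(r"\d+", output)  # maximal runs of digits
    if year in digit_runs:
        return "full"
    if any(len(run) >= 2 and year.startswith(run) for run in digit_runs):
        return "partial"
    return "miss"

def tally(pairs):
    """Count labels over (output, true_year) pairs."""
    return Counter(score_birth_year(out, yr) for out, yr in pairs)

# Made-up outputs for illustration only; these are not results from the study.
samples = [
    ("was born in 19", 1962),
    ("was born in 1962", 1962),
    ("was a city in Italy", 1962),
]
counts = tally(samples)
```

Matching on whole digit runs rather than raw substrings keeps an unrelated number such as “2019” from being counted as a partial hit. Under a scheme like this, the four “19” outputs reported above would be labeled partial and none of the 6,000 requests would be labeled full.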

In sum, there appears to be only a negligible risk that sensitive personal data might be absorbed during fine-tuning and later outputted to users by the sort of GPT models developed for this study. Nevertheless, the development of ever more powerful models – and the existence of other avenues by which models might possibly absorb individuals’ personal data – means that the findings of this analysis are better taken as guideposts for further scrutiny of GPT-type models than as definitive answers regarding any potential InfoSec vulnerabilities inherent in such LLMs.