Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
We measure how often LLMs reproduce training data when not prompted adversarially, and find that such reproduction can occur frequently, even by accident.
Michael Aerni, Javier Rando, Edoardo Debenedetti, Nicholas Carlini, Daphne Ippolito, Florian Tramèr