privacy

Measuring Non-Adversarial Reproduction of Training Data in Large Language Models

We measure how often LLMs reproduce training data when not adversarially prompted to do so, and find that it can happen frequently, even by accident.

Michael Aerni, Javier Rando, Edoardo Debenedetti, Nicholas Carlini, Daphne Ippolito, Florian Tramèr