Publications
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models
We measure how often LLMs reproduce training data when not prompted to do so adversarially, and find that it can happen frequently even by accident.
Michael Aerni, Javier Rando, Edoardo Debenedetti, Nicholas Carlini, Daphne Ippolito, Florian Tramèr
PDF · Cite · Code · Blog post · Twitter
Evaluations of Machine Learning Privacy Defenses are Misleading
We find that empirical evaluations of heuristic privacy defenses can be highly misleading, and propose a new evaluation protocol that is reliable and efficient.
Michael Aerni, Jie Zhang, Florian Tramèr
PDF · Cite · Code · Blog post · Twitter · ACM DOI
Strong inductive biases provably prevent harmless interpolation
We show that the strength of a model’s inductive bias determines whether interpolation of noisy data is harmless or harmful.
Michael Aerni
,
Marco Milanta
,
Konstantin Donhauser
,
Fanny Yang
PDF · Cite · Code
Interpolation can hurt robust generalization even when there is no noise
We reveal unexpected benefits of regularization in the overparameterized regime by proving that, for both linear regression and classification, avoiding interpolation significantly improves generalization.
Konstantin Donhauser, Alexandru Țifrea, Michael Aerni, Reinhard Heckel, Fanny Yang
PDF · Cite · Code · Poster