ContinuousBench: Can Differentially Private Synthetic Text Improve Capabilities?

Posted Jun 1, 2026

By Peihan Liu, Lucas Rosenblatt, Weiwei Kong, Natalia Ponomareva, Gautam Kamath, Rachel Cummings, Roxana Geambasu, Yu Gan, Lillian Tsai, Alex Bie

TPDP 2026

We built ContinuousBench, a benchmark that tests whether differentially private (DP) synthetic text retains the corpus-specific knowledge that motivated using a sensitive dataset in the first place. Existing benchmarks like IMDB and OpenReview are saturated: training on real data beats no training by only a small margin (51% → 62% on OpenReview, 94% → 97% on IMDB), leaving little room to distinguish DP synthesis methods.

On ContinuousBench, training on the real corpus lifts a Gemma 3 4B model from ~1% to ~97% on Geminon and from ~14% to ~70% on News. Yet under differential privacy, even at ε = 100, state-of-the-art DP synthesis reaches only ~4% on Geminon and ~26% on News, nowhere near train-on-real.
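
To make the comparison concrete, here is a small sketch that turns the accuracies quoted above into two derived numbers: the headroom of each benchmark (train-on-real minus no-training) and the share of that headroom that DP synthesis recovers at ε = 100. Both helper names and the "gap recovered" ratio are illustrative choices for this post, not metrics defined in the paper, and the inputs are the rounded "~" figures from the text, so the outputs are approximate.

```python
# Approximate accuracies (percent) quoted in the post; "~" values are rounded.
RESULTS = {
    "OpenReview": {"no_training": 51, "train_on_real": 62},
    "IMDB": {"no_training": 94, "train_on_real": 97},
    "Geminon": {"no_training": 1, "train_on_real": 97, "dp_synthetic": 4},
    "News": {"no_training": 14, "train_on_real": 70, "dp_synthetic": 26},
}


def headroom(scores):
    """Points gained by training on the real corpus (hypothetical helper)."""
    return scores["train_on_real"] - scores["no_training"]


def gap_recovered(scores):
    """Fraction of the headroom that DP synthetic text recovers
    (an illustrative ratio, not a metric taken from the paper)."""
    return (scores["dp_synthetic"] - scores["no_training"]) / headroom(scores)


for name, scores in RESULTS.items():
    line = f"{name}: headroom {headroom(scores)} points"
    if "dp_synthetic" in scores:
        line += f", DP synthesis recovers {gap_recovered(scores):.0%} of it"
    print(line)
```

With these rounded inputs, the headroom is 11 points on OpenReview and 3 on IMDB, versus 96 on Geminon and 56 on News, and DP synthesis recovers roughly 3% of the gap on Geminon and 21% on News.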

paper

differential privacy ai benchmark