ContinuousBench: Can Differentially Private Synthetic Text Improve Capabilities?

TPDP 2026

We built ContinuousBench, a benchmark that tests whether differentially private (DP) synthetic text retains the corpus-specific knowledge that motivated using a sensitive dataset in the first place. Existing benchmarks such as IMDB and OpenReview are saturated: training on the real data beats no training by only 3 to 11 points (51% → 62% on OpenReview, 94% → 97% on IMDB), leaving little room to distinguish DP synthesis methods.

On ContinuousBench, training on the real corpus lifts a Gemma 3 4B model from ~1% to ~97% on Geminon and from ~14% to ~70% on News. Yet under differential privacy, even at ε = 100, state-of-the-art DP synthesis reaches only ~4% on Geminon and ~26% on News, nowhere near train-on-real.
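
To make the size of these gaps concrete, the short sketch below recomputes them from the approximate figures quoted above. The names `headroom` and `gap_recovered` are illustrative choices for this example and are not metrics defined by the benchmark. Because the inputs are rounded ("~") values, the outputs are rough as well.

```python
# Approximate scores (in percent) as quoted in the text above.
# Tuple layout: (no training, train-on-real).
baselines = {
    "OpenReview": (51.0, 62.0),
    "IMDB": (94.0, 97.0),
    "Geminon": (1.0, 97.0),
    "News": (14.0, 70.0),
}

# DP synthesis scores at epsilon = 100; only reported here for ContinuousBench.
dp_scores = {"Geminon": 4.0, "News": 26.0}


def headroom(base: float, real: float) -> float:
    """Points gained by training on the real corpus instead of not training."""
    return real - base


def gap_recovered(base: float, real: float, dp: float) -> float:
    """Fraction of the headroom that DP synthetic data recovers (hypothetical metric)."""
    return (dp - base) / (real - base)


for name, (base, real) in baselines.items():
    line = f"{name}: headroom {headroom(base, real):.0f} points"
    if name in dp_scores:
        line += f", DP recovers {gap_recovered(base, real, dp_scores[name]):.1%}"
    print(line)
```

Under these rounded inputs, the older benchmarks leave 3 to 11 points of headroom, while Geminon and News leave 96 and 56 points. DP synthesis at ε = 100 recovers roughly 3% of that headroom on Geminon and roughly 21% on News.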