Differential Privacy

Motivation & Historical Context

Privacy in statistical databases is the foundational problem: how to release useful aggregate information about a population without leaking information about individuals. Pre-DP approaches all failed under composition with auxiliary information.

Dalenius's Desideratum

"Anything that can be learned about a respondent from the statistical database should be learnable without access to the database." ~ Tore Dalenius, 1977

Impossibility result (Dwork 2006): This semantic notion is unachievable for any non-trivial database utility. The proof is essentially a Bayesian argument: if the database is informative, releasing it shifts posteriors over individuals, even for individuals not in the database. Example: a study showing "smoking causes cancer" reveals information about a smoker not in the study.

This forced the field to move from absolute privacy to relative privacy — i.e., the database's release should not significantly increase what an adversary learns about any individual compared to a counterfactual world where that individual opted out. This is the conceptual seed of DP.

[!note] The opt-out semantics DP guarantees: your privacy loss is bounded by the marginal effect of your data being in the database. It does not guarantee that an adversary cannot learn things about you — only that they learn essentially the same things whether you participated or not.

Why Anonymization Fails

k-anonymity (Samarati–Sweeney 1998): Each record indistinguishable from k−1 others on quasi-identifiers. ℓ-diversity (Machanavajjhala 2007): Each equivalence class has ℓ "well-represented" sensitive values. t-closeness (Li 2007): Distribution of sensitive attributes in each class close to global distribution.

All three fall to linkage attacks with auxiliary information:

  • GIC Massachusetts (1997): Latanya Sweeney re-identified Governor Weld from "anonymized" hospital records using voter rolls (ZIP + DOB + sex).
  • AOL search log release (2006): Searcher #4417749 → Thelma Arnold.
  • Netflix Prize (2008): Narayanan–Shmatikov linked Netflix ratings to IMDb profiles.
  • Genome-wide association studies: Homer et al. (2008) detected presence in pooled allele frequencies.

The deeper lesson: syntactic privacy guarantees (about the data format) cannot survive arbitrary auxiliary information; only semantic guarantees (about the mechanism) can.

Core Definitions

Neighboring Datasets

Two datasets are neighbors, written , if they differ in one record. Two conventions:

ConventionDefinitionNotes
Add/Remove (unbounded DP) obtained by adding or removing one record; Hides participation; standard in theory
Replace (bounded DP) same size, differ in one record's valueHides record value; assumed in many applied papers

These differ by a factor of 2 in sensitivity for many queries. Always check which convention a paper uses. The Census 2020 release uses bounded.

ε-Differential Privacy

A randomized mechanism is ε-differentially private if for all neighboring and all measurable :

This is the pure DP definition (Dwork–McSherry–Nissim–Smith 2006). The privacy budget is the only parameter. Smaller = more privacy.

Interpretation via likelihood ratio: bounds the log-likelihood ratio of the output under any two neighbor inputs:

This makes DP a hypothesis-testing statement: no test of " vs " can do significantly better than chance.

(ε,δ)-Differential Privacy

is (ε,δ)-DP if for all and all :

Semantic meaning of δ: with probability at most , the pure ε-DP guarantee fails catastrophically. Standard rule: (often or cryptographically small), otherwise "release a uniformly random record with probability δ" satisfies (0, δ)-DP — clearly not private.

[!tip] Picking δ A common choice in DP-ML is for users. The "δ should be sub-polynomial in n" heuristic is a safety check, not a theorem.

Privacy Loss Random Variable

For neighbors and output , the privacy loss:

ε-DP says pointwise. (ε,δ)-DP can be characterized as: in a certain sense (this is roughly the statement, modulo δ-on-both-sides subtleties).

This RV-based view is the gateway to Rényi DP, CDP, and f-DP.

Fundamental Properties

Post-Processing Immunity

If is (ε,δ)-DP and is any (possibly randomized) function on , then is (ε,δ)-DP.

This is the most useful property in practice: once data is privatized, any downstream computation preserves the guarantee. No need to track privacy through ML training, plotting, summary tables, etc., as long as the original release was DP and no fresh data is touched.

Proof sketch: For any , is measurable, then apply DP definition to that set.

Group Privacy

If is ε-DP and differ in records, then:

So DP degrades gracefully for correlated groups (families, social network ties) — but linearly in . For tightly correlated populations (e.g., genomes of identical twins), the effective privacy degrades by the correlation horizon.

This is the formal sense in which DP "fails" for correlated data — see Pufferfish Privacy for explicit correlation modeling.

Composition Theorems

[!note] Composition is the heart of DP accounting. Every released statistic, every gradient step, every query consumes privacy budget.

Basic (Sequential) Composition: If is -DP and is -DP (possibly adaptive on 's output), then their composition is -DP.

Advanced Composition (Dwork–Rothblum–Vadhan 2010): adaptive -DP mechanisms compose to:

The key gain: the -term scales as instead of — a CLT-style improvement. Proof uses Azuma/martingale concentration on the privacy-loss RV.

Optimal Composition (Kairouz–Oh–Viswanath 2017): There is an exact formula for the tightest valid after composing copies of -DP. The formula is #P-hard to compute exactly for heterogeneous compositions (Murtagh–Vadhan 2016), but tractable in homogeneous cases.

Parallel Composition: If acts on disjoint subsets of , the overall privacy is (no summation). Used in histogram releases.

[!tip] Why advanced composition matters for DP-SGD A training run of steps with per-step would naively cost — useless. Advanced composition gives — still too high. Modern accountants (RDP, GDP, PRV) push this much lower.

Convexity

The set of DP mechanisms is closed under convex combinations: if are ε-DP and , then is ε-DP. Useful for randomized mechanism selection but does NOT mean ε is preserved under data-dependent mixing.

Sensitivity & Basic Mechanisms

Global Sensitivity

For a query :

used for Laplace, for Gaussian. The sup is over the worst-case pair of neighbors — this is what makes DP a worst-case guarantee.

Examples:

Query
Count1
Sum of values in
Mean of values in
Histogram (add/remove)1
Histogram (replace)2
Medianunbounded in general (use smooth sensitivity)

Laplace Mechanism

where has density .

Theorem: is ε-DP.

Proof is one line: the ratio of Laplace densities at vs is at most .

Gaussian Mechanism

Classical Theorem: is -DP if

for . (The classical bound has a known gap; the Analytic Gaussian Mechanism of Balle–Wang 2018 gives tight for any .)

MechanismPrivacyNoise scaleUse when
LaplacePure ε-DPLow-dim, want pure DP
Gaussian(ε,δ)-DPHigh-dim, composes well

The Gaussian wins in high dimensions because — the sensitivity can be smaller.

Exponential Mechanism

For utility function with sensitivity :

Theorem: ε-DP. Used for selecting from a discrete set (e.g., picking a "best" model, releasing a median, mechanism design).

Utility guarantee: with probability ,

[!note] Connection to Boltzmann/Gibbs measures The exponential mechanism is just the Gibbs distribution at temperature over utilities. This is not coincidental: see McSherry–Talwar 2007 for the connection to mechanism design, where the exponential mechanism gives approximately truthful mechanisms because the player's gain from misreporting is bounded by their privacy loss.

This last point is exactly relevant to your work: DP gives a clean notion of "approximate truthfulness" for free, since changing one's report changes one's output by at most .

Randomized Response

The original DP mechanism (Warner 1965, predating the formalism by 40 years). For a yes/no question:

  • With probability , answer truthfully.
  • With probability , flip a fair coin and answer.

This is -DP at the individual level. Foundational for Local DP.

Sparse Vector Technique (SVT)

Answers a stream of threshold queries but only pays privacy for the YES answers.

Algorithm: Add noise to the threshold once, then add fresh noise to each . Output "above" if noisy exceeds noisy ; stop after "above"s.

Privacy: -DP with budget independent of the number of NO answers — pay only for above-threshold answers. Foundational for monitoring rare events.

[!question] Why is SVT subtle? The original "improvements" in literature were buggy: Lyu–Su–Li (2017) catalog six wrong variants in published papers, including the original textbook version. Always verify SVT proofs carefully.

Refined Privacy Notions

The classical definition has poor compositional properties (need messy advanced composition). Modern variants give tighter accounting.

Concentrated DP (CDP)

Dwork–Rothblum 2016: A mechanism is -CDP if the privacy loss RV has subgaussian tails. More precisely, for all :

Zero-Concentrated DP (zCDP) (Bun–Steinke 2016): Defined via Rényi divergence (see below). Equivalent characterizations.

Composition is additive in : mechanisms with compose to . No tricks needed.

Conversion: -zCDP -DP.

Rényi Differential Privacy (RDP)

Mironov 2017: Defined via Rényi divergence:

is -RDP if for all : .

Key properties:

  • : KL divergence
  • : max divergence = pure DP
  • Composition: additive in at fixed
  • Gaussian mechanism: -RDP for all

This last identity is huge: it gives closed-form RDP for Gaussian, which makes RDP the dominant accounting framework for DP-SGD.

Conversion: -RDP -DP for any . Optimize over .

f-Differential Privacy / Gaussian DP

Dong–Roth–Su 2019: Define DP directly via the hypothesis-testing tradeoff.

For trade-off function with optimal type-II error at type-I error , mechanism is -DP if its testing tradeoff curve dominates .

Gaussian DP: special case where is the tradeoff curve of vs . A mechanism is -GDP if its tradeoff dominates this Gaussian curve.

Central Limit Theorem for DP: Compositions converge to Gaussian DP — composition of mechanisms each -GDP gives -GDP. This is the cleanest composition theorem in DP.

NotionCompositionMechanism characterizationTightness
-DPAdvanced composition (lossy)OriginalLoose under composition
zCDP / RDPAdditiveClosed-form for GaussianTight for Gaussian
-DP / GDP"Hockey-stick" or CLTTight via tradeoff functionTightest known
PRV accountantNumericalPrivacy Loss DistributionNumerically tight

Privacy Loss Distribution (PLD) Accountants

Koskela–Jälkö–Honkela 2020, Gopi–Lee–Wutschitz 2021: Numerically track the entire distribution of , then compose via FFT-based convolution. Currently the tightest practical accountant, used in Opacus and TF-Privacy.

Local Differential Privacy

Local DP

Mechanism is ε-LDP if for all , all : .

Each user perturbs their own data before sending. No trusted aggregator needed.

Tradeoff: LDP requires more samples than central DP for the same accuracy (Duchi–Jordan–Wainwright 2013). The information-theoretic price of removing the trusted curator.

Implementations:

  • Google's RAPPOR (Erlingsson–Pihur–Korolova 2014): Chrome telemetry. Uses randomized response on Bloom filters.
  • Apple iOS: Emoji counts, Safari crash reports. Used ε per query of 1–4, with potentially thousands of queries — controversial.
  • Microsoft: Windows telemetry.

Shuffle DP

Cheu–Smith–Ullman–Zeber–Zhilyaev 2019: Intermediate trust model — users add LDP noise, a shuffler anonymizes, then aggregation.

Amplification by shuffling: A user's becomes in the central view. Closes much of the LDP–central gap without a fully trusted curator.

DP in Machine Learning

DP-SGD

Abadi–Chu–Goodfellow–McMahan–Mironov–Talwar–Zhang 2016: The dominant approach to training DP models.

for batch in batches:
    grads = []
    for x in batch:
        g = compute_per_example_gradient(x)
        g = g * min(1, C / ||g||_2)   # per-example clipping
        grads.append(g)
    g_sum = sum(grads)
    g_sum = g_sum + N(0, sigma^2 * C^2 * I)   # Gaussian noise
    theta = theta - lr * g_sum / batch_size

Key ingredients:

  1. Per-example gradient clipping to bound to
  2. Gaussian noise scaled to for -DP
  3. Privacy amplification by subsampling: a -DP mechanism applied to a -Poisson-subsample of is -DP for small (Beimel et al. 2014, Balle et al. 2018)
  4. RDP accounting across steps

Practical reality: Strong DP guarantees () cost ~10-20% accuracy on ImageNet-scale tasks (De et al. 2022, Berrada et al. 2023). Recent work (Sander et al., Bu et al.) closes much of this gap via large batches, foundation-model fine-tuning, and architectural choices (group normalization beats batch normalization).

Privacy Amplification

TechniqueAmplificationReference
Subsampling at rate Beimel et al. 2014
ShufflingErlingsson et al. 2019
Iteration (PABI)Decaying under strong convexityFeldman–Mironov–Talwar–Thakurta 2018

PATE

Papernot et al. 2017: Privacy-preserving aggregation of teachers. Train models on disjoint partitions, aggregate predictions via noisy voting, distill into a student. Achieves strong privacy because individual records affect only one teacher, and the noisy aggregation provides DP.

Theoretical Foundations

Reconstruction Attacks

Dinur–Nissim 2003 (pre-DP, motivating the field): Releasing subset-sum queries with noise allows reconstructing fraction of the database. This is the fundamental lower bound that forces noise per query in any private mechanism.

Generalized to: Bun–Ullman–Vadhan 2014, Dwork–Smith–Steinke–Ullman 2017 — modern "fingerprinting codes" give tight lower bounds.

The Census 2020 redistricting controversy was downstream of this: the Census reconstruction attack by Garfinkel et al. showed that the 2010 SF1 summary file allowed reconstruction of microdata for of US population — driving the move to DP for 2020.

Sample Complexity Lower Bounds

For ε-DP estimation of a -dim parameter:

  • Mean estimation: for -DP
  • Histogram release: for -DP
  • PAC learning: for proper learning

Pure ε-DP often costs an extra vs — this is one of the main reasons to admit small δ.

DP and Generalization

Dwork–Feldman–Hardt–Pitassi–Reingold–Roth 2015: If a model is selected from a -DP procedure, then its empirical and population losses are close with high probability. This is the foundation of adaptive data analysis:

[!note] DP Generalization If a hypothesis is output by a -DP algorithm with , , then w.p. .

DP gives a stability-based generalization bound that survives adversarial query reuse — this is what makes DP relevant outside privacy proper, e.g., for safely re-using validation sets.

This is also why DP is sometimes proposed as a route to robust statistics and even safety: DP-trained models are by construction less sensitive to individual training points.

Variants & Extensions

Pufferfish Privacy

Kifer–Machanavajjhala 2014: Generalizes DP to handle:

  1. Secrets : discriminating pairs of facts to hide
  2. Discriminative pairs : which secrets are sensitive
  3. Data evolution model : distribution over inputs (handles correlations)

A mechanism is -Pufferfish if for all in and all priors :

DP is recovered as the Pufferfish instance where secrets = "record takes value " and the data model is i.i.d. Pufferfish gives the correct notion for correlated/sequential data.

Geo-Indistinguishability

Andrés–Bordenabe–Chatzikokolakis–Palamidessi 2013: DP variant for location data, where neighboring datasets are replaced by neighboring points in . A mechanism is -geo-indistinguishable if for all locations with :

Implemented via planar Laplace noise. Used in Apple Maps and several research systems.

Distributed/Federated DP

Federated learning naturally pairs with DP to give two layers of protection. Two variants:

  • Central DP under FL: Server adds noise to aggregated gradients (still needs trusted server).
  • Local DP per client: Each client adds noise; needs aggregation tricks (secure aggregation + DP) to recover utility.

Combination with secure aggregation (cryptographic MPC) lets the server see only the noisy sum, getting central-DP utility with local-DP trust.

DP, Mechanism Design & Game Theory

[!note] Direct link to your research DP is mechanism design under a specific objective (privacy as a property of the mechanism), and the proof techniques migrate in both directions.

Privacy as Approximate Truthfulness

McSherry–Talwar 2007: If is -DP and an agent's utility is bounded in , then for any agent and any misreport :

So any ε-DP mechanism is approximately dominant-strategy truthful with multiplicative slack . Exponential mechanism thereby gives a generic approach to approximately truthful mechanism design.

This is significantly weaker than VCG (it's -truthful, not exactly truthful), but it gives truthfulness in settings VCG cannot handle — e.g., when payments cannot be assumed or there are no transferable utilities.

Joint Differential Privacy

Kearns–Pai–Roth–Ullman 2014: Allows agent 's own output to depend strongly on their input (e.g., they receive their allocation), but the joint output to others is DP in . This is the right notion for matching markets, congestion games, and large-game equilibrium computation.

Application: JDP mechanisms compute approximate correlated equilibria of large anonymous games. Connection point with your mechanism-design-for-AI-cooperation work: in multi-agent settings with strategic LLM agents, JDP might be the right notion of "private mechanism" since each agent observes its own action recommendation.

Mechanism Design via DP

Nissim–Smorodinsky–Tennenholtz 2012: Use DP to compute approximate equilibria in digital goods auctions, matching with constraints, and facility location when exact truthful mechanisms don't exist.

The pattern: DP gives mechanisms that are robust to a single agent's misreport, which is essentially the definition of approximate strategyproofness. This trade — privacy for truthfulness — is a candidate framework for incentive-compatible AI agent protocols under your Cooperation Gap framework, where contractual incompleteness already forces approximate guarantees.

[!tip] Speculative connection to your work Your Irreducibility Theorem shows that mechanisms cannot fully substitute for -cooperative preferences under contractual incompleteness. DP-style approximate truthfulness might be exactly the right quantitative refinement: ask whether -DP mechanisms close the gap up to utility loss. This would parallel the way "smoothed analysis" gave a quantitative refinement to worst-case complexity.

Practical Realities & Critiques

What does ε mean in practice?

ε has no universally agreed semantics. Common heuristics:

  • ε ≤ 1: Strong privacy, recommended by Dwork.
  • ε ∈ [1, 10]: "Moderate," used in DP-SGD ML papers.
  • ε > 10: Effectively no meaningful pure-DP guarantee, though some hypothesis-testing interpretation remains.

Apple's per-query ε of 4–8: Defensible only because of (a) bounded query rate and (b) the use of LDP, where one bad query cannot fully leak any user.

Census 2020 PLB: Privacy Loss Budget per geographic level — different for total, P1, P2, … (P1 = sex by age, P2 = race, etc.) Total ε across all queries is small per person but compositional accounting is fragile.

Critiques

  • Tao Xu, Lin Liu et al.: DP guarantees rely on the worst-case adversary, but real adversaries may be weaker — DP overcorrects in some settings.
  • Cynthia Dwork on δ: "δ is a probability of completely embarrassing failure" — needs to be cryptographically small.
  • Kohli–Laskowski 2018: Survey participants don't understand ε; "differential privacy" becomes marketing.
  • Bagdasaryan–Poursaeed–Shmatikov 2019: DP-SGD has disparate impact — minorities suffer larger accuracy loss because their data is more "atypical" and gets clipped/noised more.

[!question] Open question Is there a notion of group-fair DP that controls per-subgroup accuracy loss? Recent work by Yaghini, Mironov et al. on "individual privacy accounting" partially addresses this.

Drawbacks

  • Worst-case sensitivity often dominates noise scale; smooth/local sensitivity helps but is harder to use.
  • Strong assumption: single-record sensitivity is the right unit. Households, time-series, social networks break this.
  • Hard to set ε from first principles (privacy budget allocation is an art).
  • Composition can quickly exhaust budget; once is spent, you cannot re-query.

Connections & Further Reading

  • Robust Statistics: DP estimators are automatically robust to single-point contamination; explicit connection via the "propose-test-release" framework (Dwork–Lei 2009).
  • Stability and Generalization: DP "perfect stability" in the sense of Bousquet–Elisseeff.
  • Mechanism Design: Approximate truthfulness via exponential mechanism; JDP for large games.
  • Information Theory: Hypothesis-testing tradeoff function -DP makes the connection to Le Cam–style le bounds explicit.
  • Cryptography: DP secure computation. MPC + DP gives strongest joint guarantee. The "shuffle model" is essentially a weak MPC primitive.
  • Algorithmic Fairness: Some tensions (disparate impact of DP-SGD) and some synergies (DP as a form of stability that helps generalize fairness constraints).
  • Pufferfish Privacy: The right generalization for correlated data (genomes, time series, networks).
  • Your Cooperation Gap framework: DP-style -truthfulness may give a quantitative version of "how close can mechanisms get to substituting cooperation" under contractual incompleteness.

Key References

  • Dwork–Roth 2014: The Algorithmic Foundations of Differential Privacy — the canonical monograph.
  • Vadhan 2017: The Complexity of Differential Privacy — theoretical survey.
  • Near–Abuah 2021: Programming Differential Privacy — applied/computational perspective.
  • Mironov 2017 (RDP), Bun–Steinke 2016 (zCDP), Dong–Roth–Su 2019 (f-DP) — modern accounting trio.
  • Abadi et al. 2016 — DP-SGD.
  • McSherry–Talwar 2007 — DP in mechanism design.