Benchmarking differentially private synthetic data generation algorithms

Which synthetic data generation algorithms for tabular datasets offer the best privacy/utility trade-offs? Tumult Labs did the research. Read the results below.

Research

Michael Hay

Summary:

A synthetic dataset is a collection of records generated to match the properties of an original data source. Synthetic data is an appealing option for a range of data-sharing settings, and differential privacy is the best way to guarantee that synthetically generated data protects the privacy of the original source data.

In this paper, we present a systematic benchmark of twelve published methods for generating differentially private tabular synthetic data to see which most accurately preserve the properties of the source data. We evaluate the utility of the synthetic data by measuring how well it preserves the distributions of individual attributes and of pairs of attributes, how well it preserves pairwise correlations, and how accurate an ML classification model is. In a comprehensive empirical evaluation, we identify the top-performing algorithms and those that consistently fail to beat baseline approaches.
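As a rough illustration of what "preserving the distribution of individual attributes and pairs of attributes" can mean in practice, the sketch below compares one-way and two-way marginals of an original and a synthetic table using total variation distance. The choice of distance, the helper names (`marginal`, `total_variation_distance`, `marginal_errors`), and the toy records are all assumptions made for this example; the paper's exact metrics may differ.

```python
from collections import Counter
from itertools import combinations


def marginal(records, columns):
    """Empirical distribution of the given columns over a list of dict records."""
    counts = Counter(tuple(r[c] for c in columns) for r in records)
    total = len(records)
    return {key: n / total for key, n in counts.items()}


def total_variation_distance(p, q):
    """Half the L1 distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def marginal_errors(original, synthetic, columns, order):
    """Distance between original and synthetic marginals for each `order`-way column subset."""
    return {
        subset: total_variation_distance(
            marginal(original, subset), marginal(synthetic, subset)
        )
        for subset in combinations(columns, order)
    }


# Toy records, invented purely for illustration (not data from the paper).
original = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "high"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "high"},
]
synthetic = [
    {"age": "young", "income": "low"},
    {"age": "young", "income": "low"},
    {"age": "old", "income": "high"},
    {"age": "old", "income": "high"},
]

one_way = marginal_errors(original, synthetic, ["age", "income"], order=1)
two_way = marginal_errors(original, synthetic, ["age", "income"], order=2)
print(one_way)
print(two_way)
```

Lower values mean the synthetic table tracks the original more closely on that attribute or attribute pair. In this toy case the `age` marginal is matched exactly, while the `income` marginal and the joint `(age, income)` marginal are not.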


