Understanding Guarantees and Pitfalls of Differential Privacy
Description
Many popular applications, such as recommender systems, data mining, or machine learning, require the collection and processing of large datasets. In view of this ubiquitous data collection, users demand privacy, but guaranteeing privacy while providing useful data is challenging. Differential Privacy (DP) [1] has become a popular notion of data privacy; it bounds the ability of a curious data analyst to infer the value of individual sensitive data points. To use DP in practical systems, it is important to understand the fundamental guarantees of a system that claims to ensure DP.
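For reference, the standard definition from [1]: a randomized mechanism M is ε-differentially private if, for every pair of databases D and D' that differ in a single entry and every set S of possible outputs, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S]. A smaller ε means the output distribution reveals less about any single entry.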
While it is sometimes believed that DP guarantees hold unconditionally, even in the presence of arbitrary side information, it has been shown that it is impossible to provide both privacy and utility without making assumptions about how the data are generated [2]. In particular, dependence (correlation) between different database entries can be exploited to break the alleged privacy guarantees [3].
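To make this issue concrete, the following minimal Python sketch (all names and parameter values are illustrative, not taken from the references) simulates a counting query with Laplace noise calibrated to a sensitivity of one record. When k records are perfectly correlated with the target user's entry, an adversary who knows this correlation recovers the user's bit almost perfectly, even though the mechanism formally satisfies ε-DP with respect to a single record change.

    import numpy as np

    rng = np.random.default_rng(0)

    def laplace_count(db, eps):
        # Release a count with Laplace noise calibrated to sensitivity 1,
        # i.e., to a single changed record (the standard eps-DP count query).
        return db.sum() + rng.laplace(scale=1.0 / eps)

    eps = 0.5   # privacy parameter
    k = 50      # records perfectly correlated with the target user's entry
    n = 1000    # database size

    # Two "worlds" that differ in the target user's bit; the k correlated
    # records copy that bit, so the true counts differ by k + 1, not by 1.
    rest = rng.integers(0, 2, size=n - k - 1)
    world = {0: np.concatenate(([0], np.zeros(k, dtype=int), rest)),
             1: np.concatenate(([1], np.ones(k, dtype=int), rest))}

    # An adversary who knows the correlation thresholds the noisy count.
    threshold = rest.sum() + (k + 1) / 2
    trials, correct = 20000, 0
    for _ in range(trials):
        secret = int(rng.random() < 0.5)
        guess = int(laplace_count(world[secret], eps) > threshold)
        correct += (guess == secret)
    print(f"adversary accuracy: {correct / trials:.2f}")  # near 1, not 1/2

Since k + 1 entries change together, the effective privacy guarantee degrades from ε to (k + 1)·ε, which offers essentially no protection here; this is the phenomenon analyzed in [3].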
In this seminar topic, the student will familiarize herself with the definition and formal guarantees of DP and study the issues and pitfalls of DP, with a particular focus on dependent data distributions. The student will summarize her findings in the form of a scientific presentation and a scientific article, based on her own reading of scientific papers. These can include, but are not necessarily limited to, the recommended references [1-3].
[1] C. Dwork and A. Roth, “The Algorithmic Foundations of Differential Privacy,” Foundations and Trends in Theoretical Computer Science, 2014.
[2] D. Kifer and A. Machanavajjhala, “No Free Lunch in Data Privacy,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data, 2011.
[3] C. Liu, S. Chakraborty, and P. Mittal, “Dependence Makes You Vulnerable: Differential Privacy Under Dependent Tuples,” in Proceedings of the Network and Distributed System Security Symposium (NDSS), 2016.
Contact
Luis Maßny (luis.massny@tum.de)