Understanding Guarantees and Pitfalls of Differential Privacy
Description
Many data-driven applications can be modeled as a communication between a data curator and a data analyst, who queries a database for particular population statistics. When the individual database entries are considered sensitive information, the data curator can take additional measures to ensure the privacy of these entries.
Differential Privacy (DP) [1] has become a popular notion of data privacy, measuring the ability of a curious data analyst to discriminate between the values of different sensitive database entries. To use DP in practical systems, it is important to understand the fundamental guarantees of a system that claims to ensure DP.
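For concreteness, the standard definition from [1] reads: a randomized mechanism M is ε-differentially private if, for every pair of databases D and D' that differ in a single entry and every set S of possible outputs,

    \Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S],

so the smaller ε is, the less the analyst can infer about any single entry from the output of M.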
While it is sometimes believed that DP guarantees hold unconditionally and even in the presence of arbitrary side information, it has been shown that it is not possible to provide privacy and utility without making assumptions about how the data are generated [2]. In particular, dependence (correlation) between different database entries can be exploited to break the alleged privacy guarantees [3].
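The following minimal Python sketch illustrates this effect (it is not taken from [2] or [3]; the scenario, query, and parameter values are hypothetical). The Laplace mechanism calibrated to a per-entry sensitivity of 1 satisfies ε-DP for each entry, yet fails to hide an individual whose value determines k correlated entries.

    import numpy as np

    rng = np.random.default_rng(0)

    def laplace_count(db, eps):
        # Counting query with Laplace noise calibrated to sensitivity 1:
        # changing ONE entry shifts the true count by at most 1.
        return np.sum(db) + rng.laplace(scale=1.0 / eps)

    # Hypothetical scenario: k database entries are perfectly correlated
    # (e.g., relatives sharing one genetic trait), so the status of a
    # single individual flips all k entries at once -- the effective
    # sensitivity with respect to that individual is k, not 1.
    k, eps = 10, 1.0
    db_trait = np.ones(k)      # the individual has the trait
    db_no_trait = np.zeros(k)  # the individual does not

    # A single release already reveals the individual: the correlated gap
    # of k = 10 dwarfs the Laplace noise (scale 1/eps = 1), even though
    # the release is 1-DP with respect to any single entry.
    print(laplace_count(db_trait, eps))     # typically close to 10
    print(laplace_count(db_no_trait, eps))  # typically close to 0

In DP terms, once the k entries are fully dependent, the guarantee for the affected individual degrades from ε to kε.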
In this seminar topic, the student will become familiar with the definition and formal guarantees of DP and study the issues and pitfalls of DP, with a particular focus on dependent data distributions. The student will summarize their findings in the form of a scientific presentation and a scientific article, based on their own reading of scientific papers. These include, but are not necessarily limited to, the recommended references [1-3].
[1] C. Dwork and A. Roth, “The Algorithmic Foundations of Differential Privacy,” Foundations and Trends in Theoretical Computer Science, 2014.
[2] D. Kifer and A. Machanavajjhala, “No Free Lunch in Data Privacy,” in Proceedings of the 2011 ACM SIGMOD International Conference on Management of Data (SIGMOD ’11), 2011.
[3] C. Liu, S. Chakraborty, and P. Mittal, “Dependence Makes You Vulnerable: Differential Privacy Under Dependent Tuples,” in Proceedings of the Network and Distributed System Security Symposium, 2016.
Contact
Luis Maßny (luis.massny@tum.de)
Differentially-Private and Robust Federated Learning
Description
Federated learning (FL) is a machine learning paradigm that aims to learn collaboratively from decentralized private data owned by entities referred to as clients. However, due to its decentralized nature, FL is susceptible to poisoning attacks, in which malicious clients try to corrupt the learning process by modifying their data or their local model updates. Moreover, the updates sent by the clients might leak information about the private data used in the learning. This thesis aims to investigate existing robust aggregation techniques in FL and to combine them with differential privacy techniques.
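As a toy illustration of the direction of the thesis, one possible combination is sketched below in NumPy. The sketch is not taken from the references; the function name, clipping threshold, and noise scale are illustrative assumptions, and a rigorous privacy and robustness analysis of such an aggregate is exactly what the thesis should investigate.

    import numpy as np

    def private_robust_aggregate(updates, clip_norm=1.0, sigma=0.5, rng=None):
        # Sketch of one possible combination: clip each client update to
        # bound its influence (and the DP sensitivity), aggregate with the
        # coordinate-wise median to resist poisoned updates, then add
        # Gaussian noise as a differential privacy layer.
        rng = rng or np.random.default_rng()
        clipped = [u * min(1.0, clip_norm / max(np.linalg.norm(u), 1e-12))
                   for u in updates]
        agg = np.median(np.stack(clipped), axis=0)  # robust to outlier clients
        return agg + rng.normal(scale=sigma * clip_norm, size=agg.shape)

    # Toy usage: 8 honest clients and 2 poisoning clients sending huge updates.
    rng = np.random.default_rng(0)
    honest = [rng.normal(0.1, 0.01, size=5) for _ in range(8)]
    poisoned = [np.full(5, 100.0) for _ in range(2)]
    print(private_robust_aggregate(honest + poisoned, rng=rng))

Note that calibrating noise for a non-linear aggregate such as the median requires its own sensitivity analysis; the naive calibration above is illustrative only.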
References:
[1] https://arxiv.org/pdf/2304.09762.pdf
[2] https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9757841
[3] https://dl.acm.org/doi/abs/10.1145/3465084.3467919
Prerequisites
- Knowledge about machine learning and gradient descent optimization
- Proficiency in Python and PyTorch
- Undergraduate statistics courses
- Prior knowledge about differential privacy is a plus
Contact
marvin.xhemrishi@tum.de
luis.massny@tum.de