Seminars
Geno-Weaving: Capacity-Achieving Coding for DNA Data Storage
Description
Background:
DNA data storage has emerged as a compelling medium for archival data, offering extraordinary density and durability. In practice, data is encoded into millions of short synthetic DNA strands stored in a liquid container. When retrieved, the strands are read by nanopore sequencers in random order and a random number of times, and are corrupted by substitution and deletion errors. Weinberger and Merhav [4] formally characterized the capacity of this channel, accounting for the Poisson-distributed read counts and the permutation of strands.
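The channel just described can be sketched in a few lines of code. This is a toy simulation under assumed parameters; the coverage, error rates, strand count, and strand length are illustrative and not taken from [4].

```python
import math
import random

# Toy simulation of the DNA storage channel: each strand is read a
# Poisson-distributed number of times, the reads come back in random order,
# and each base can be substituted or deleted. All parameters are
# illustrative, not drawn from the cited works.
random.seed(0)
ALPHABET = "ACGT"

def poisson(lam):
    # Knuth's inversion method; adequate for small lam.
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= random.random()
        if prod <= threshold:
            return k
        k += 1

def noisy_read(strand, p_sub=0.05, p_del=0.02):
    out = []
    for base in strand:
        if random.random() < p_del:
            continue                                          # base deleted
        if random.random() < p_sub:
            base = random.choice(ALPHABET.replace(base, ""))  # substituted
        out.append(base)
    return "".join(out)

def channel(strands, coverage=5.0):
    reads = [noisy_read(s) for s in strands for _ in range(poisson(coverage))]
    random.shuffle(reads)                     # permutation of the reads
    return reads

strands = ["".join(random.choice(ALPHABET) for _ in range(20)) for _ in range(4)]
reads = channel(strands)
print(len(reads), "unordered noisy reads of", len(strands), "strands")
```

The three challenge factors are visible directly: the Poisson draw models sampling, the shuffle models permutation, and the per-base corruption models noise.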
The classical approach to protect data in this setting is a concatenation scheme: a per-strand inner code handles sequence errors, and an outer Reed–Solomon code handles strand-level erasures. A fundamental limitation of this approach is that current DNA synthesis technology restricts strand lengths to roughly 200–300 nucleotides—a regime in which inner codes suffer a significant finite-block-length rate penalty.
Modern Relevance:
Wang and Guruswami [1] proposed Geno-Weaving to overcome this bottleneck. Rather than coding along each strand, the key idea is to code across strands: a single block code protects the same nucleotide position across all strands simultaneously. Since the number of strands (millions) is orders of magnitude larger than the strand length, this orthogonal direction provides a vastly larger effective block length, eliminating the finite-length penalty. The scheme uses polar codes and provably achieves the capacity of the DNA storage channel. Lin, Wang, and Guruswami [2] extended this framework to deletion errors, demonstrating—theoretically and through simulations—that a Geno-Weaving design for substitution channels already performs well on deletion channels at realistic rates of 0.1%–10%, outperforming even optimal concatenation schemes.
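The cross-strand orientation can be made concrete with a toy single-parity column code. This stands in for the polar-code construction of [1], and the sizes are purely illustrative; the point is that losing an entire strand becomes one erasure in each column codeword, which the column code repairs.

```python
import random

# Toy cross-strand code: one parity strand makes every column sum to 0 mod 4,
# so a single lost strand is one erasure per column and can be recomputed.
# (A stand-in for the polar-code construction of [1]; sizes are illustrative.)
random.seed(1)
M, L = 6, 8                                       # strands x nucleotides (toy)
data = [[random.randint(0, 3) for _ in range(L)] for _ in range(M)]

parity = [(-sum(col)) % 4 for col in zip(*data)]  # column sums become 0 mod 4
pool = data + [parity]

lost = 2                                          # one strand is never read
received = [row for i, row in enumerate(pool) if i != lost]

recovered = [(-sum(col)) % 4 for col in zip(*received)]
assert recovered == data[lost]
print("lost strand recovered from the column code")
```

The toy also shows where the block-length gain comes from: the column code's block length is the number of strands, millions in practice, while the inner code of a concatenation scheme is capped at the strand length.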
Seminar Focus:
This seminar explores the Geno-Weaving framework and its extension to deletion errors. The goal is to develop a deep understanding of the channel model, the capacity-achieving code construction, and the block-length advantage over concatenation. The student will work through the encoding and decoding procedures in detail, develop illustrative examples, and carry out a quantitative comparison of Geno-Weaving's code rates against those of concatenation schemes for different deletion probabilities.
Expected Outcomes:
- Channel Model & Capacity: A clear explanation of the DNA storage channel, including Poissonization and the three challenge factors (noise, permutation, and sampling), with worked examples.
- Code Construction: Step-by-step illustrations of the Geno-Weaving encoding and decoding, including the push/pull synchronization mechanism for deletion errors.
- Rate Comparison: Plots comparing Geno-Weaving's achievable rates against concatenation schemes across a range of deletion probabilities, with a discussion of the block-length gain.
References:
[1] H.-P. Wang and V. Guruswami, "Geno-Weaving: A Framework for Low-Complexity Capacity-Achieving DNA Data Storage," *IEEE Journal on Selected Areas in Information Theory*, vol. 6, pp. 383–393, 2025, doi: [10.1109/JSAIT.2025.3610643](https://doi.org/10.1109/JSAIT.2025.3610643).
[2] Y.-T. Lin, H.-P. Wang, and V. Guruswami, "Block Length Gain for Nanopore Channels," *arXiv preprint arXiv:2511.18027*, Nov. 2025.
[3] L. Organick, S. D. Ang, Y.-J. Chen, R. Lopez, S. Yekhanin, K. Makarychev, et al., "Random Access in Large-Scale DNA Data Storage," *Nature Biotechnology*, vol. 36, no. 3, pp. 242–248, Mar. 2018, doi: [10.1038/nbt.4079](https://doi.org/10.1038/nbt.4079).
[4] N. Weinberger and N. Merhav, "The DNA Storage Channel: Capacity and Error Probability Bounds," *IEEE Transactions on Information Theory*, vol. 68, no. 9, pp. 5657–5700, Sep. 2022, doi: [10.1109/TIT.2022.3176371](https://doi.org/10.1109/TIT.2022.3176371).
Prerequisites
- Channel Coding
Supervisor:
Solvers for Matrix Code Equivalence
complexity cryptography matrix-codes
Description
A matrix code is a subspace of Fq^{m x n}. Given two K-dimensional matrix codes C and D, the Matrix Code Equivalence (MCE) problem asks whether there exist invertible matrices P \in Fq^{m x m} and Q \in Fq^{n x n} such that C = PDQ, where the latter notation means {PAQ : A \in D}. In other words, we want to determine whether C and D are equivalent up to left- and right-scrambling by invertible matrices. Many solvers exist for this problem, though all of them run in exponential time.
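A yes-instance of MCE and the span check behind its verification can be written out directly. The field size, dimensions, and helper names below are illustrative; real instances use parameters far beyond any such brute treatment.

```python
import random

# Build a yes-instance of MCE over F_p and verify a candidate (P, Q): map a
# basis of D through (P, Q) and check that the two spans of mn-vectors agree.
# All sizes and helper names are illustrative.
p = 5
m, n, K = 3, 4, 2

def rank_mod_p(rows):
    # Gaussian elimination over F_p on a copy of the row list.
    rows = [r[:] for r in rows]
    rank = 0
    for c in range(len(rows[0])):
        piv = next((i for i in range(rank, len(rows)) if rows[i][c] % p), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        inv = pow(rows[rank][c], p - 2, p)          # inverse via Fermat
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for i in range(len(rows)):
            if i != rank and rows[i][c] % p:
                f = rows[i][c]
                rows[i] = [(a - f * b) % p for a, b in zip(rows[i], rows[rank])]
        rank += 1
    return rank

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)]
            for row in A]

def rand_invertible(k):
    while True:
        M = [[random.randrange(p) for _ in range(k)] for _ in range(k)]
        if rank_mod_p(M) == k:
            return M

random.seed(2)
D_basis = [[[random.randrange(p) for _ in range(n)] for _ in range(m)]
           for _ in range(K)]
P, Q = rand_invertible(m), rand_invertible(n)
C_basis = [matmul(matmul(P, A), Q) for A in D_basis]   # C := P D Q

flat = lambda A: [x for row in A for x in row]
r_C = rank_mod_p([flat(A) for A in C_basis])
r_joint = rank_mod_p([flat(A) for A in C_basis]
                     + [flat(matmul(matmul(P, A), Q)) for A in D_basis])
print("spans agree:", r_C == r_joint)   # True: (P, Q) solves this instance
```

Verifying a candidate (P, Q) is easy, as above; the hard direction, and the subject of the solvers below, is finding one.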
The security of certain cryptographic schemes in the current NIST competition relies on the MCE problem, and it is therefore important to understand the true complexity of solving it. The task of the student is to understand and explain how the existing solvers for MCE work. In particular, they should do so for at least the following solvers from the literature.
[1] A. Couvreur and C. Levrat, “Highway to Hull: An Algorithm for Solving the General Matrix Code Equivalence Problem,” Advances in Cryptology – CRYPTO 2025, pp. 253–283, 2025, doi: 10.1007/978-3-032-01855-7_9.
[2] A. K. Narayanan, Y. Qiao, and G. Tang, “Algorithms for Matrix Code and Alternating Trilinear Form Equivalences via New Isomorphism Invariants,” Advances in Cryptology – EUROCRYPT 2024, pp. 160–187, 2024, doi: 10.1007/978-3-031-58734-4_6.
[3] W. Beullens, “Graph-Theoretic Algorithms for the Alternating Trilinear Form Equivalence Problem,” Advances in Cryptology – CRYPTO 2023, pp. 101–126, 2023, doi: 10.1007/978-3-031-38548-3_4.
Prerequisites
- Linear Algebra
- Familiarity with finite fields and basic coding theory
Supervisor:
Codes for Substring Edits
file synchronization, coding theory
Description
The document exchange problem considers the scenario where two parties have slightly different versions of the same file, and one of them wants to update their version to match the other. Instead of sending the whole file, the idea is to share only a small description of the changes. Substring edits, where changes happen locally within the file, are a natural way to model these differences. This seminar will examine how coding theory can be applied to enhance the efficiency of document exchange under substring edits.
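For a single substring edit, a minimal "description of the changes" is just the replaced window. The helper names below are hypothetical, and this sketch ignores the coding-theoretic substance of the seminar (noisy channels, unknown edit locations, worst-case guarantees); it only illustrates why a local edit admits a short description.

```python
# Describe a single substring edit as (start, end, replacement), a far
# smaller message than the whole file. Helper names are hypothetical.
def describe_edit(old, new):
    # Strip the longest common prefix and suffix; what remains is the edit.
    i = 0
    while i < min(len(old), len(new)) and old[i] == new[i]:
        i += 1
    j = 0
    while j < min(len(old), len(new)) - i and old[-1 - j] == new[-1 - j]:
        j += 1
    # Replace old[i : len(old)-j] with new[i : len(new)-j].
    return i, len(old) - j, new[i:len(new) - j]

def apply_edit(old, edit):
    start, end, replacement = edit
    return old[:start] + replacement + old[end:]

old = "the quick brown fox jumps over the lazy dog"
new = "the quick red fox jumps over the lazy dog"
edit = describe_edit(old, new)
assert apply_edit(old, edit) == new
print(edit)   # -> (10, 15, 'red')
```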
Supervisor:
Post-Quantum Key Exchange
Description
Post-quantum cryptography (PQC) has been an active area of research since the seminal work of Shor.
Indeed, most public-key cryptography currently deployed (such as RSA and elliptic-curve-based schemes) is vulnerable to quantum adversaries.
This applies in particular to public-key encryption (PKE) schemes.
An attacker could record encrypted traffic and later decrypt it once a sufficiently capable quantum computer becomes available - a strategy known as "harvest now, decrypt later."
Post-quantum secure alternatives can be constructed from hard problems on codes and lattices. The recently standardized Kyber and HQC follow an encryption-based approach, while, e.g., NewHope is based on a key reconciliation mechanism:
Alice and Bob obtain noisy variants of a common secret, and error correction removes this noise, allowing them to agree on the same shared key.
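The reconciliation idea admits a tiny self-contained sketch via the code-offset construction with a repetition code. This is a pedagogical stand-in, not NewHope's actual lattice-based rounding mechanism, and all parameters are illustrative.

```python
import random

# Code-offset reconciliation sketch: Alice masks a random codeword with her
# view of the secret; Bob unmasks with his noisy view and decodes. A
# pedagogical stand-in for real reconciliation; parameters are illustrative.
random.seed(3)
R, KEY_BITS = 5, 4                       # repetition factor, key length

secret = [random.randint(0, 1) for _ in range(KEY_BITS * R)]
noise_positions = {1, 7, 14}             # Bob's view differs here (<=2/block)
bob_view = [b ^ (i in noise_positions) for i, b in enumerate(secret)]

# Alice: random key -> repetition codeword -> mask sent over the public channel.
key = [random.randint(0, 1) for _ in range(KEY_BITS)]
codeword = [bit for bit in key for _ in range(R)]
mask = [c ^ s for c, s in zip(codeword, secret)]

# Bob: unmasking leaves codeword XOR (view difference); majority-decode.
noisy_codeword = [x ^ w for x, w in zip(mask, bob_view)]
bob_key = [int(sum(noisy_codeword[i * R:(i + 1) * R]) > R // 2)
           for i in range(KEY_BITS)]
assert bob_key == key                    # each block had <= 2 flipped bits
print("reconciled key:", bob_key)
```

The repetition code here plays the role that structured lattice or algebraic codes play in the real schemes: it trades bandwidth for tolerance to the discrepancy between the two views.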
This project will survey constructions for exchanging a key in code-based and lattice-based cryptography. These constructions are to be categorized based on key properties, such as the underlying metric, bandwidth requirements, and hardness assumptions.
An overview of the key techniques is to be developed, with particular focus on the question of whether they can be transferred from codes to lattices and vice versa. The project requires reading and understanding several references; a good starting point can be the following works:
Aguilar-Melchor, Carlos, et al. "Efficient encryption from random quasi-cyclic codes." IEEE Transactions on Information Theory 64.5 (2018): 3927-3943.
Bos, Joppe, et al. "CRYSTALS-Kyber: a CCA-secure module-lattice-based KEM." 2018 IEEE European symposium on security and privacy (EuroS&P). IEEE, 2018.
Alkim, Erdem, et al. "Post-quantum Key Exchange — A new hope." 25th USENIX security symposium (USENIX Security 16). 2016.
Supervisor:
Graph Entropy in Combinatorics
Description
Information theory and combinatorics are deeply intertwined. Beyond the use of combinatorics in coding theory and compression, there are many, sometimes surprising, connections.
One such connection is the use of graph entropy in combinatorial existence proofs.
This seminar topic is about explaining the proof technique introduced in [1] and [2] and applied in [3]. The goal is a tutorial-style paper with a focus on clear exposition through well-chosen worked examples and visualizations.
[1] M. Fredman and J. Komlós, On the Size of Separating Systems and Perfect Hash Functions, SIAM J. Alg. Disc. Meth., 5 (1984), pp. 61–68.
[2] J. Körner, Fredman–Komlós bounds and information theory, SIAM J. Alg. Disc. Meth., 7(4) (1986), pp. 560–570.
[3] N. Alon, E. Fachini, and J. Körner, Locally Thin Set Families, Combinatorics, Probability and Computing, vol. 9 (Nov. 2000), pp. 481–488.