M.Sc. Frederik Walter
Technical University of Munich
Associate Professorship of Coding and Cryptography (Prof. Wachter-Zeh)
Postal address
Theresienstr. 90
80333 München
- Phone: +49 (89) 289 - 23492
- Room: 0104.04.403
- frederik.walter@tum.de
Biography
Frederik is currently a doctoral candidate in the Associate Professorship of Coding and Cryptography (Prof. Wachter-Zeh) with the Institute for Communications Engineering, School of Computation, Information and Technology, TUM, Munich, Germany.
Before joining the institute, he was part of a medical technology startup that was funded by the eXist program of the German Ministry of Economic Affairs. From 2017 to 2020, he worked in a management consultancy, advising clients on complex strategic transformations.
Frederik received his M.Sc degree with high distinction (1.0) in Electrical Engineering and Information Technology from TUM in 2016. During this time, he spent a semester at the National University of Singapore. He received his B.Sc. also with high distinction (1.0) in Electrical Engineering from Ulm University in 2014.
Research Interests
Frederik's research interests focus on coding and information theory applied to the area of DNA data storage within the DiDAX project. In particular, he studies improvements in the synthesis of DNA strands.
One field is the use of composite DNA as a tool to increase the alphabet size and, therefore, the information density. Instead of synthesizing only one nucleotide in each cycle, mixing several nucleotides and encoding information in the occurring ratio is possible. Therefore, many interesting coding-related challenges arise when the strands are exposed to substitution, deletion, and/or insertion errors.
Another area in his research is the efficient synthesis for gene expression analysis. He applies coding theoretic concepts to minimize the synthesis time while ensuring all genes are uniquely represented on a testing array.
Teaching
Teaching Assistant at TUM
Teaching Assistant at Ulm University
- Fundamentals of Electrical Engineering I
- Fundamentals of Electrical Engineering II
- Signals and Systems
Available Theses
Theses in Progress
Two-Deletion Correcting Codes
Description
Background:
This seminar explores the design of error-correcting codes capable of correcting two deletions—a natural yet significantly challenging progression from single-deletion codes. The study of deletion (and insertion) correcting codes dates back to the early 1960s. Levenshtein’s work [3] showed that VT codes [5], originally designed for asymmetric error correction, also handle single deletion/insertion errors with near-optimal redundancy of at most log?(n) bits. This elegant construction spurred extensive research to extend these ideas to correct multiple deletions. However, extending VT codes to the two-deletion scenario has proven difficult; natural augmentations with additional check equations have not achieved the desired performance.
Modern Relevance & DNA Data Storage:
Deletion-correcting codes have gained practical importance in emerging fields like DNA data storage, where sequencing errors often manifest as deletions. In these applications, efficient and low-redundancy codes are crucial for reliable data retrieval, highlighting the importance of advancements in two-deletion correcting codes.
Seminar Focus:
Recent years have seen renewed interest in this topic, with several works proposing two-deletion correcting codes that achieve good asymptotic redundancy [1][2][4]. The goal of this seminar is to deeply understand two of these approaches. The student will develop detailed examples and illustrations to explain the code constructions, followed by an analysis comparing the redundancy of the two codes for various code lengths
Expected Outcomes:
- Detailed Examples: Step-by-step encoding and decoding demonstrations for two codes, illustrating the underlying principles.
- Visual Illustrations: Graphics that clearly depict the error-correcting capabilities.
- Comparative Analysis: A discussion comparing the two approaches, supported by quantitative analysis and visual data showing redundancy for different code lengths.
References:
[1] R. Gabrys and F. Sala, “Codes Correcting Two Deletions,” IEEE Transactions on Information Theory, vol. 65, no. 2, pp. 965–974, Feb. 2019, doi: 10.1109/TIT.2018.2876281.
[2] V. Guruswami and J. Håstad, “Explicit Two-Deletion Codes With Redundancy Matching the Existential Bound,” IEEE Transactions on Information Theory, vol. 67, no. 10, pp. 6384–6394, Oct. 2021, doi: 10.1109/TIT.2021.3069446.
[3] V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals,” in Soviet physics doklady, Soviet Union, 1966, pp. 707–710.
[4] J. Sima, N. Raviv, and J. Bruck, “Two Deletion Correcting Codes From Indicator Vectors,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2375–2391, Apr. 2020, doi: 10.1109/TIT.2019.2950290.
[5] R. R. Varshamov and G. M. Tenengolts, “Codes which correct single asymmetric errors (in Russian),” Automatika i Telemkhanika, vol. 161, no. 3, pp. 288–292, 1965.
Prerequisites
- Channel Coding course completed
- Very good understanding of linear algebra and combinatorics
Supervisor:
Marker Codes for Strand Breaks in Composite DNA
Description
DNA Data Storage
Data storage on DNA molecules is a promising approach for archiving massive data.
In classical DNA storage systems, binary information is encoded into sequences consisting of the four DNA bases {A, C, G, T}. The encoded sequences are used to generate DNA molecules called strands using the biochemical process of DNA synthesis. The synthesized strands are stored together in a tube. To retrieve the binary information, the strand must be read via DNA sequencing and decoded back into the binary representation.
The synthesis and the sequencing procedures are error-prone, and with the natural degradation of DNA they introduce errors to the DNA strands. To ensure data reliability, the errors have to be corrected by algorithms and error-correcting codes (ECCs).
A 5min video with an overview of DNA storage: https://youtu.be/r8qWc9X4f6k?si=Yzm5sOW-a6VDnBu3
Composite DNA
Recently, to allow higher potential information capacity, [1,2] introduced the DNA composite synthesis method. In this method, the multiple copies created by the standard DNA synthesis method are utilized to create composite DNA symbols, defined by a mixture of DNA bases and their ratios in a specific position of the strands. By defining different mixtures and ratios, the alphabet can be extended to have more than 4 symbols. More formally, a composite DNA symbol in a specific position can be abstracted as a quartet of probabilities {p_A, p_C, p_G, p_T}, in which p_X, 0 ≤ p_X ≤ 1, is the fraction of the base X in {A, C, G, T} in the mixture and p_A+p_C+ p_G+ p_T =1. Thus, to identify composite symbols it is required to sequence multiple reads and then to estimate p_A, p_C, p_G, p_T in each position.
Problem description
ECCs for DNA data storage differ in many aspects from classical error correction codes. In this model, new error type gain relevance, like deletions and insertions which affect the synchronization of the sequences.
Another important error model, especially for long-term storage, is dealing with breaks in the DNA sequence. The problem of identifying fragments caused by a single break is addressed using marker sequences, which help determine if a resulting DNA fragment is a prefix or suffix of the original sequence. Marker sequences are inserted at the beginning and end of DNA strands to locate breaks and reconstruct the original sequence. To prevent marker sequences from appearing within the data, Run-Length Limited (RLL) coding is used, which limits consecutive identical symbols and ensures marker sequences do not occur in the data part. The goal is to reconstruct the original random composite matrix from individual fragments after a single break, restoring the stored information by identifying the position of each fragment and capturing base frequencies to restore the original distribution for each column of the DNA sequence.
References
[1] L. Anavy, I. Vaknin, O. Atar, R. Amit, and Z. Yakhini, “Data storage in DNA with fewer synthesis cycles using composite DNA letters,” Nat Biotechnol, vol. 37, no. 10, pp. 1229–1236, Oct. 2019, doi: 10.1038/s41587-019-0240-x.
[2] Y. Choi et al., “High information capacity DNA-based data storage with augmented encoding characters using degenerate bases,” Sci Rep, vol. 9, no. 1, Art. no. 1, Apr. 2019, doi: 10.1038/s41598-019-43105-w.
[3] I. Shomorony and A. Vahid, “Torn-Paper Coding,” IEEE Transactions on Information Theory, vol. 67, no. 12, pp. 7904–7913, Dec. 2021, doi: 10.1109/TIT.2021.3120920.
[4] D. Bar-Lev, E. Yaakobi, Y. Yehezkeally, and S. Marcovich, “Adversarial Torn-paper Codes,” IEEE Trans. Inform. Theory, pp. 1–1, 2023, doi: 10.1109/TIT.2023.3292895.
[5] W. Zhang, Z. Chen, and Z. Wang, “Limited-Magnitude Error Correction for Probability Vectors in DNA Storage,” in ICC 2022 - IEEE International Conference on Communications, Seoul, Korea, Republic of: IEEE, May 2022, pp. 3460–3465. doi: 10.1109/ICC45855.2022.9838471.
Supervisor:
Coding for Composite DNA
Description
DNA Data Storage
Data storage on DNA molecules is a promising approach for archiving massive data.
In classical DNA storage systems, binary information is encoded into sequences consisting of the four DNA bases {A, C, G, T}. The encoded sequences are used to generate DNA molecules called strands using the biochemical process of DNA synthesis. The synthesized strands are stored together in a tube. To retrieve the binary information, the strand must be read via DNA sequencing and decoded back into the binary representation.
The synthesis and the sequencing procedures are error-prone, and with the natural degradation of DNA they introduce errors to the DNA strands. To ensure data reliability, the errors have to be corrected by algorithms and error-correcting codes (ECCs).
A 5min video with an overview of DNA storage: https://youtu.be/r8qWc9X4f6k?si=Yzm5sOW-a6VDnBu3
Composite DNA
Recently, to allow higher potential information capacity, [1,2] introduced the DNA composite synthesis method. In this method, the multiple copies created by the standard DNA synthesis method are utilized to create composite DNA symbols, defined by a mixture of DNA bases and their ratios in a specific position of the strands. By defining different mixtures and ratios, the alphabet can be extended to have more than 4 symbols. More formally, a composite DNA symbol in a specific position can be abstracted as a quartet of probabilities {p_A, p_C, p_G, p_T}, in which p_X, 0 ≤ p_X ≤ 1, is the fraction of the base X in {A, C, G, T} in the mixture and p_A+p_C+ p_G+ p_T =1. Thus, to identify composite symbols it is required to sequence multiple reads and then to estimate p_A, p_C, p_G, p_T in each position.
Problem description
ECCs for DNA data storage differ in many aspects from classical error correction codes. In this model, new error type gain relevance, like deletions and insertions which affect the synchronization of the sequences. Especially for composite DNA data storage, these error types received only little attention.
The most related work to this problem was recently studied by Zhang et al. in [6]. The authors initiated the study of error-correcting codes for DNA composite. They considered an error model for composite symbols, which assumes that errors occur in at most t symbols, and their magnitude is limited by l. They presented several code constructions as well as bounds for this model. In this thesis, we want to analyse a different way to model the composite synthesis method and studies additional error models. We already have some results for substitution and single deletion errors. This thesis should focus on evaluating more error models in the channel model.
This should only roughly introduce the problem. No need to review all references. If you are interested, please reach out to me, and we can discuss a suitable direction for you.
References
[1] L. Anavy, I. Vaknin, O. Atar, R. Amit, and Z. Yakhini, “Data storage in DNA with fewer synthesis cycles using composite DNA letters,” Nat Biotechnol, vol. 37, no. 10, pp. 1229–1236, Oct. 2019, doi: 10.1038/s41587-019-0240-x.
[2] Y. Choi et al., “High information capacity DNA-based data storage with augmented encoding characters using degenerate bases,” Sci Rep, vol. 9, no. 1, Art. no. 1, Apr. 2019, doi: 10.1038/s41598-019-43105-w.
[3] V. Guruswami and J. Håstad, “Explicit Two-Deletion Codes With Redundancy Matching the Existential Bound,” IEEE Transactions on Information Theory, vol. 67, no. 10, pp. 6384–6394, Oct. 2021, doi: 10.1109/TIT.2021.3069446.
[4] J. Sima, N. Raviv, and J. Bruck, “Two Deletion Correcting Codes From Indicator Vectors,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2375–2391, Apr. 2020, doi: 10.1109/TIT.2019.2950290.
[5] I. Smagloy, L. Welter, A. Wachter-Zeh, and E. Yaakobi, “Single-Deletion Single-Substitution Correcting Codes,” IEEE Transactions on Information Theory, pp. 1–1, 2023, doi: 10.1109/TIT.2023.3319088.
[6] W. Zhang, Z. Chen, and Z. Wang, “Limited-Magnitude Error Correction for Probability Vectors in DNA Storage,” in ICC 2022 - IEEE International Conference on Communications, Seoul, Korea, Republic of: IEEE, May 2022, pp. 3460–3465. doi: 10.1109/ICC45855.2022.9838471.
Supervisor:
Publications
- Coding for Composite DNA to Correct Substitutions, Strand Losses, and Deletions. 2024 IEEE International Symposium on Information Theory (ISIT), IEEE, 2024, 97-102 more… Full text ( DOI )
- Coding for Composite DNA to Correct Substitutions, Strand Losses and Deletions. Summer Doctoral Seminar 2024, 2024 more…
- In-Product Authentication. Dagstuhl Seminar 24511, 2024 more…
- Towards a Cryptographic Framework for DNA Data Storage. Coding Theory and Algorithms for DNA-based Data Storage - ISIT2024 Satellite Workshop, 2024 more…
- Coding for Synthesis of Composite DNA. Meeting on DNA Data Applications, 2023 more…