Walter, Frederik

M.Sc. Frederik Walter

Technische Universität München

Associate Professorship of Coding and Cryptography (Prof. Wachter-Zeh)

Postal address

Theresienstr. 90
80333 München

Tel.: +49 (89) 289 - 23492
Room: 0104.04.403
frederik.walter@tum.de

Biography

Frederik is currently a doctoral candidate in the Associate Professorship of Coding and Cryptography (Prof. Wachter-Zeh) with the Institute for Communications Engineering, School of Computation, Information and Technology, TUM, Munich, Germany.

Before joining the institute, he was part of a medical technology startup funded by the EXIST program of the German Ministry of Economic Affairs. From 2017 to 2020, he worked at a management consultancy, advising clients on complex strategic transformations.

Frederik received his M.Sc. degree in Electrical Engineering and Information Technology from TUM in 2016 with high distinction (1.0). During his studies, he spent a semester at the National University of Singapore. He received his B.Sc. degree in Electrical Engineering from Ulm University in 2014, also with high distinction (1.0).

Research Interests

Frederik's research interests focus on coding and information theory applied to the area of DNA data storage within the DiDAX project. In particular, he studies improvements in the synthesis of DNA strands.

One focus is the use of composite DNA to increase the alphabet size and, therefore, the information density. Instead of synthesizing a single nucleotide in each cycle, several nucleotides can be mixed and information encoded in their ratio. This gives rise to many interesting coding challenges, particularly when the strands are subject to substitution, deletion, and/or insertion errors.

Another area of his research is efficient DNA synthesis for gene expression analysis. He applies coding-theoretic concepts to minimize the synthesis time while ensuring that all genes are uniquely represented on a test array.

Teaching

Teaching Assistant at TUM

Teaching Assistant at Ulm University

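Fundamentals of Electrical Engineering I (Grundlagen der Elektrotechnik I)
Fundamentals of Electrical Engineering II (Grundlagen der Elektrotechnik II)
Signals and Systems (Signale und Systeme)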

Available Theses

FP: LDPC Codes for Deletion Correction

Description

Overview:
Traditional error-correcting codes such as LDPC (low-density parity-check) codes are highly effective against substitution noise but struggle when applied to deletion channels, where bits are removed and the alignment of the sequence shifts. This project explores an experimental, graph-based decoding approach for handling such synchronization errors, inspired by methods in the references below.

You will design and simulate a novel decoder that tracks and corrects drift, i.e., the cumulative effect of deletions, using a factor graph with locally connected variable nodes and drift constraints. A central idea is to model decoding as inference on this structured graph via probabilistic message passing.
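To make the notion of drift concrete, the following minimal Python sketch assumes an i.i.d. binary deletion channel in which each transmitted bit is deleted independently with probability `p_del` (a hypothetical parameter; the channel model used in the project may differ). For every transmitted position i it records the number of deletions d[i] that occurred before it, so a surviving bit x[i] appears at received index i - d[i]. This drift trace is the hidden state a drift-aware decoder has to infer; the function name is illustrative only.

```python
import random


def deletion_channel_with_drift(bits, p_del, rng=None):
    """Delete each bit independently with probability p_del.

    Returns the received sequence and the drift d[i], i.e. the number of
    deletions that occurred strictly before transmitted position i.
    A surviving bit x[i] therefore appears at received index i - d[i].
    """
    rng = rng or random.Random()
    received, drift = [], []
    deletions = 0
    for bit in bits:
        drift.append(deletions)
        if rng.random() < p_del:
            deletions += 1  # bit is lost, all later bits shift left by one
        else:
            received.append(bit)
    return received, drift


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.randint(0, 1) for _ in range(20)]
    y, d = deletion_channel_with_drift(x, p_del=0.1, rng=rng)
    # Each surviving bit x[i] sits d[i] positions earlier in y.
    print(len(x) - len(y), "deletions; drift trace:", d)
```

Setting `p_del=0` returns the input with zero drift, while `p_del=1` deletes everything and the drift simply counts up 0, 1, 2, ...; intermediate values produce the staircase-shaped drift that the factor graph is meant to track.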

What You'll Learn:

How deletion errors affect classical coding theory
Principles of LDPC codes and belief propagation
Techniques for modeling time-varying constraints (drift) in a graphical setting
Implementation of custom decoders and simulators in Python
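As a warm-up for the belief-propagation part, the sketch below shows iterative decoding on a Tanner graph for the simplest case, the binary erasure channel, where message passing reduces to the well-known peeling decoder. It is not a deletion-channel decoder. The parity-check matrix `H` of the (7,4) Hamming code is an illustrative assumption chosen only because it is small; a real LDPC code would be long and sparse.

```python
# Parity-check matrix of the (7,4) Hamming code: rows are check nodes,
# columns are variable nodes. Small and illustrative, not a practical LDPC code.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]

ERASED = None  # marker for an erased bit


def peel_decode(H, y, max_iter=100):
    """Message passing on the binary erasure channel (peeling decoder).

    A check node with exactly one erased neighbour determines that bit as the
    mod-2 sum of its known neighbours; repeat until nothing changes.
    Bits that remain ERASED could not be recovered (a stopping set).
    """
    x = list(y)
    for _ in range(max_iter):
        progress = False
        for row in H:
            nbrs = [j for j, h in enumerate(row) if h]
            erased = [j for j in nbrs if x[j] is ERASED]
            if len(erased) == 1:
                x[erased[0]] = sum(x[j] for j in nbrs if x[j] is not ERASED) % 2
                progress = True
        if not progress:
            break
    return x


def syndrome(H, x):
    """H x^T over GF(2); all zeros for a valid codeword."""
    return [sum(h * b for h, b in zip(row, x)) % 2 for row in H]
```

For the codeword [1, 0, 1, 1, 0, 1, 0], erasing bits 0 and 6 is repaired in a single pass, whereas erasing bits 0, 1 and 2 leaves every check with two unknowns, so the decoder stops without recovering them.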

References:

R. Shibata, G. Hosoya, and H. Yashima, “Design of Irregular LDPC Codes without Markers for Insertion/Deletion Channels,” IEEE GLOBECOM, 2019. doi:
F. Wang, D. Fertonani, and T. M. Duman, “Symbol-Level Synchronization and LDPC Code Design for Insertion/Deletion Channels,” IEEE Trans. Commun., vol. 59, no. 5, 2011. doi:

Prerequisites

Channel Coding
Codes on Graphs or Channel Codes for Iterative Decoding (Understanding of LDPC codes and factor graphs)
Good understanding of probability and linear algebra
Familiarity with Python programming

Supervisor:

Frederik Walter

MA, FP: Coding for Composite DNA

Description

DNA Data Storage

Data storage on DNA molecules is a promising approach for archiving massive amounts of data.

In classical DNA storage systems, binary information is encoded into sequences over the four DNA bases {A, C, G, T}. The encoded sequences are used to generate DNA molecules, called strands, through the biochemical process of DNA synthesis. The synthesized strands are stored together in a tube. To retrieve the binary information, the strands are read via DNA sequencing and decoded back into the binary representation.
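The encoding step can be illustrated with the simplest possible mapping of two bits per base. This is a minimal sketch only: the assignment 00→A, 01→C, 10→G, 11→T is an arbitrary illustrative choice, and practical systems add constraints (e.g. on GC content or homopolymer runs) as well as error-correcting redundancy on top.

```python
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(bits: str) -> str:
    """Map an even-length bit string to a DNA strand, 2 bits per base."""
    if len(bits) % 2:
        raise ValueError("bit string length must be even")
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> str:
    """Inverse mapping from a DNA strand back to bits."""
    return "".join(BASE_TO_BITS[base] for base in strand)
```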

The synthesis and sequencing procedures are error-prone and, together with the natural degradation of DNA, introduce errors into the DNA strands. To ensure data reliability, these errors have to be corrected by algorithms and error-correcting codes (ECCs).
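For simulations, the errors mentioned above are often modelled as a channel that independently substitutes, inserts and deletes bases. The sketch below assumes such an i.i.d. model with hypothetical per-base probabilities `p_sub`, `p_ins` and `p_del` (with p_sub + p_del ≤ 1); real synthesis and sequencing errors depend on position and sequence context, so this is only a starting point.

```python
import random

BASES = "ACGT"


def ids_channel(strand, p_sub, p_ins, p_del, rng=None):
    """Pass a DNA strand through an i.i.d. insertion/deletion/substitution channel.

    For each base: with probability p_ins a random base is inserted before it.
    The base itself is then deleted with probability p_del, replaced by a
    different base with probability p_sub, and copied unchanged otherwise.
    """
    rng = rng or random.Random()
    out = []
    for base in strand:
        if rng.random() < p_ins:
            out.append(rng.choice(BASES))  # insertion
        r = rng.random()
        if r < p_del:
            continue  # deletion
        if r < p_del + p_sub:
            out.append(rng.choice([b for b in BASES if b != base]))  # substitution
        else:
            out.append(base)  # correct copy
    return "".join(out)
```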

A 5-minute video with an overview of DNA storage: https://youtu.be/r8qWc9X4f6k?si=Yzm5sOW-a6VDnBu3

Composite DNA

To allow a higher potential information capacity, [1,2] recently introduced the composite DNA synthesis method. This method exploits the many copies of each strand that standard DNA synthesis creates: a composite DNA symbol is defined by a mixture of DNA bases, and their ratios, at a specific position of these copies. By defining different mixtures and ratios, the alphabet can be extended beyond 4 symbols. More formally, a composite DNA symbol at a specific position can be abstracted as a quartet of probabilities (p_A, p_C, p_G, p_T), where p_X, 0 ≤ p_X ≤ 1, is the fraction of base X ∈ {A, C, G, T} in the mixture and p_A + p_C + p_G + p_T = 1. Identifying composite symbols therefore requires sequencing multiple reads and estimating p_A, p_C, p_G, p_T at each position.
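As a small illustration of how the alphabet grows, assume (for this sketch only) that each p_X must be a multiple of 1/k for some resolution k. A composite symbol then corresponds to a vector of non-negative integers (k·p_A, k·p_C, k·p_G, k·p_T) summing to k, and a stars-and-bars argument gives C(k+3, 3) symbols, i.e. log2 C(k+3, 3) bits per position instead of 2. The code enumerates these quartets and checks the count; k = 1 recovers the standard four-letter alphabet. The resolution parameter is hypothetical and not taken from [1,2].

```python
from itertools import product
from math import comb, log2


def composite_alphabet(k):
    """All quartets (p_A, p_C, p_G, p_T) with entries in {0, 1/k, ..., 1}
    that sum to 1, represented by integer counts (n_A, n_C, n_G, n_T)."""
    return [c for c in product(range(k + 1), repeat=4) if sum(c) == k]


for k in (1, 2, 4, 6):
    size = len(composite_alphabet(k))
    assert size == comb(k + 3, 3)  # stars and bars
    print(f"k={k}: {size} symbols, {log2(size):.2f} bits per position")
```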

Problem description

ECCs for DNA data storage differ in many aspects from classical error-correcting codes. In this setting, new error types gain relevance, such as deletions and insertions, which affect the synchronization of the sequences. For composite DNA data storage in particular, these error types have so far received little attention.

The work most closely related to this problem is the recent study by Zhang et al. [6]. The authors initiated the study of error-correcting codes for composite DNA. They considered an error model for composite symbols which assumes that errors occur in at most t symbols and that their magnitude is limited by l. They presented several code constructions as well as bounds for this model. In this thesis, we want to analyse a different way of modelling the composite synthesis method and study additional error models. We already have some results for substitution and single-deletion errors; this thesis should focus on evaluating further error models within this channel model.
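The (t, l) error model can be made concrete with a small membership check. This is a sketch under assumptions that may not match [6] exactly: symbols are represented by integer count vectors (n_A, n_C, n_G, n_T) as in a fixed-resolution composite alphabet, and the magnitude of an error in a symbol is taken to be the largest change of any single base count. The function name is hypothetical.

```python
def within_error_model(sent, received, t, l):
    """Check whether `received` can result from `sent` under an assumed
    (t, l) model: at most t composite symbols differ, and in each differing
    symbol no base count changes by more than l.

    Symbols are integer count vectors such as (n_A, n_C, n_G, n_T).
    """
    if len(sent) != len(received):
        return False
    erroneous = 0
    for x, y in zip(sent, received):
        diffs = [abs(a - b) for a, b in zip(x, y)]
        if any(diffs):
            erroneous += 1
            if max(diffs) > l:  # magnitude limit violated
                return False
    return erroneous <= t
```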

This description only roughly introduces the problem, and there is no need to review all references. If you are interested, please reach out to me, and we can discuss a suitable direction for you.

References

[1] L. Anavy, I. Vaknin, O. Atar, R. Amit, and Z. Yakhini, “Data storage in DNA with fewer synthesis cycles using composite DNA letters,” Nat Biotechnol, vol. 37, no. 10, pp. 1229–1236, Oct. 2019, doi: 10.1038/s41587-019-0240-x.

[2] Y. Choi et al., “High information capacity DNA-based data storage with augmented encoding characters using degenerate bases,” Sci Rep, vol. 9, no. 1, Art. no. 1, Apr. 2019, doi: 10.1038/s41598-019-43105-w.

[3] V. Guruswami and J. Håstad, “Explicit Two-Deletion Codes With Redundancy Matching the Existential Bound,” IEEE Transactions on Information Theory, vol. 67, no. 10, pp. 6384–6394, Oct. 2021, doi: 10.1109/TIT.2021.3069446.

[4] J. Sima, N. Raviv, and J. Bruck, “Two Deletion Correcting Codes From Indicator Vectors,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2375–2391, Apr. 2020, doi: 10.1109/TIT.2019.2950290.

[5] I. Smagloy, L. Welter, A. Wachter-Zeh, and E. Yaakobi, “Single-Deletion Single-Substitution Correcting Codes,” IEEE Transactions on Information Theory, pp. 1–1, 2023, doi: 10.1109/TIT.2023.3319088.

[6] W. Zhang, Z. Chen, and Z. Wang, “Limited-Magnitude Error Correction for Probability Vectors in DNA Storage,” in ICC 2022 - IEEE International Conference on Communications, Seoul, Republic of Korea: IEEE, May 2022, pp. 3460–3465, doi: 10.1109/ICC45855.2022.9838471.

Prerequisites

- Channel Coding

Supervisor:

Frederik Walter

Theses in Progress

BA: Extraction Algorithm for CRISPR-based DNA Authentication

Description

Developing a Robust Extraction Algorithm for CRISPR-Cas Authentication

This project bridges molecular biology and cryptography. We have developed a novel authentication method using CRISPR-Cas to create unique DNA "molecular fingerprints" on a gel. However, this analog output is noisy and not directly usable for secure digital applications.

Your goal is to transform this system into a robust Chemical Function System (CFS) by developing a digital extraction algorithm.

Your tasks will involve:

Developing an image processing pipeline to analyze the noisy gel electrophoresis patterns.
Designing and implementing an algorithm, based on fuzzy extractor principles, to convert the noisy data into a stable, reproducible digital signature.
Evaluating the robustness of your algorithm using real experimental data to ensure high reliability.
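The fuzzy-extractor principle can be illustrated with the classic code-offset construction: at enrollment, a random secret is encoded with an error-correcting code and XORed with the binarized fingerprint to form public helper data; at verification, XORing a fresh, noisy reading with the helper data and decoding recovers the secret, as long as the two readings differ in few enough bits. This minimal sketch assumes the gel pattern has already been turned into a bit vector, uses a repetition code with majority decoding, and hashes the secret with SHA-256 as a stand-in for a proper randomness extractor. The function names and the parameter `REP` are hypothetical.

```python
import hashlib
import secrets

REP = 5  # repetition factor; majority voting fixes up to 2 flipped bits per block


def _xor(a, b):
    return [x ^ y for x, y in zip(a, b)]


def _key_from(secret_bits):
    # Hash the secret to a fixed-length key (stand-in for a strong extractor).
    return hashlib.sha256(bytes(secret_bits)).hexdigest()


def enroll(fingerprint):
    """Enrollment: return (key, helper) for a binary fingerprint.

    The fingerprint length must be a multiple of REP. The helper data is public.
    """
    if len(fingerprint) % REP:
        raise ValueError("fingerprint length must be a multiple of REP")
    secret = [secrets.randbelow(2) for _ in range(len(fingerprint) // REP)]
    codeword = [bit for bit in secret for _ in range(REP)]  # repetition encoding
    helper = _xor(fingerprint, codeword)
    return _key_from(secret), helper


def reproduce(noisy_fingerprint, helper):
    """Reproduction: recover the key from a noisy re-measurement and the helper data."""
    noisy_codeword = _xor(noisy_fingerprint, helper)
    secret = [
        int(sum(noisy_codeword[i:i + REP]) > REP // 2)  # majority vote per block
        for i in range(0, len(noisy_codeword), REP)
    ]
    return _key_from(secret)
```

The helper data of a repetition code reveals which fingerprint bits within a block are equal, so it leaks information about the fingerprint; this is one reason practical constructions use stronger codes.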

Outcome: A functional software prototype that turns a cutting-edge molecular authentication technique into a complete, cryptographically secure system. This project is ideal for students interested in bioinformatics, data science, and applied cryptography.

Supervisor:

Frederik Walter

SEM: Coding for Composite DNA

Description

Modern information-based society relies on trusted information storage, from long-term digital archiving to embedding information into products. DNA is a promising future storage medium, as it offers sustainable and robust long-term information storage at an extraordinary information density. One approach to increasing this density is composite DNA [1], which uses mixtures of nucleotides at each position instead of a single A, C, G, or T, thereby increasing the "logical density" (bits per synthesis cycle). This concept has been extended to combinatorial composite DNA [2], where alphabet symbols are composed of sets of short DNA fragments, known as shortmers.
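To see where the gain in logical density comes from, assume (for this sketch only) that a combinatorial composite symbol is a set of exactly K distinct shortmers chosen from a library of N shortmers. The alphabet then has C(N, K) symbols, and restricting it to a power-of-two subset lets each symbol carry floor(log2 C(N, K)) bits. The (N, K) pairs below are hypothetical examples, not parameters from [2].

```python
from math import comb, floor, log2


def bits_per_symbol(n_shortmers, k_per_symbol):
    """Bits carried by one combinatorial composite symbol when the alphabet
    is all K-subsets of an N-shortmer library, truncated to a power of two."""
    return floor(log2(comb(n_shortmers, k_per_symbol)))


for n, k in [(16, 4), (32, 8), (96, 16)]:
    print(f"N={n}, K={k}: {comb(n, k)} symbols, {bits_per_symbol(n, k)} bits per symbol")
```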

The primary challenge in combinatorial composite DNA is a unique error model that arises during the reading process: some of the shortmers that make up a symbol may be missed during sequencing, causing a composite asymmetric error. This seminar will focus on the specialized coding techniques designed to address this. You will explore how Sabary et al. [2] model these errors and construct error-correcting codes to overcome them. Furthermore, we will analyze the trade-off between sequencing cost and reliability by studying the coverage depth problem [3]: determining the expected number of reads needed to successfully decode a symbol. The goal is to understand these novel challenges and the specific coding and optimization methods developed to handle them.
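A simple way to build intuition for the coverage depth question is the coupon-collector model: assume (hypothetically) that each read of a position returns one of the symbol's K shortmers uniformly at random. A shortmer that never shows up in the reads is exactly the kind of missing component that causes a composite asymmetric error. Under this model, the expected number of reads until all K shortmers have been seen is K·H_K, where H_K is the K-th harmonic number; the sketch computes this exactly and checks it by simulation. The models and results in [2, 3] are more refined than this, and the function names are illustrative.

```python
import random
from fractions import Fraction


def expected_reads_to_see_all(k):
    """Coupon collector: E[reads] = k * (1 + 1/2 + ... + 1/k)."""
    return k * sum(Fraction(1, i) for i in range(1, k + 1))


def observed_shortmers(symbol, n_reads, rng):
    """Shortmers seen after n_reads uniform draws: always a subset of `symbol`;
    missing members model a composite asymmetric error."""
    return {rng.choice(symbol) for _ in range(n_reads)}


def simulate_reads_to_see_all(symbol, rng, trials=20000):
    """Monte Carlo estimate of the number of reads until every shortmer is seen."""
    total = 0
    for _ in range(trials):
        seen, reads = set(), 0
        while len(seen) < len(symbol):
            seen.add(rng.choice(symbol))
            reads += 1
        total += reads
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    symbol = ["s1", "s2", "s3", "s4"]  # hypothetical shortmer labels
    print(float(expected_reads_to_see_all(4)))  # 25/3, about 8.33
    print(simulate_reads_to_see_all(symbol, rng))
    print(observed_shortmers(symbol, 3, rng))  # 3 reads can never reveal all 4 shortmers
```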

References

A 5-minute video with an overview of DNA storage: https://youtu.be/r8qWc9X4f6k?si=Yzm5sOW-a6VDnBu3

[1] L. Anavy et al., "Data storage in DNA with fewer synthesis cycles using composite DNA letters," Nature Biotechnology, vol. 37, no. 10, pp. 1229-1236, 2019.

[2] O. Sabary et al., "Error-Correcting Codes for Combinatorial Composite DNA," in 2024 IEEE International Symposium on Information Theory (ISIT), 2024.

[3] T. Cohen and E. Yaakobi, "Optimizing the Decoding Probability and Coverage Ratio of Composite DNA," IEEE Journal on Selected Areas in Information Theory, 2025 (accepted for publication).

Prerequisites

Channel Coding course completed
Good understanding of linear algebra, probability theory and combinatorics

Supervisor:

Frederik Walter

FP: Marker Codes for Composite DNA

Description

DNA Data Storage

Data storage on DNA molecules is a promising approach for archiving massive amounts of data.

In classical DNA storage systems, binary information is encoded into sequences over the four DNA bases {A, C, G, T}. The encoded sequences are used to generate DNA molecules, called strands, through the biochemical process of DNA synthesis. The synthesized strands are stored together in a tube. To retrieve the binary information, the strands are read via DNA sequencing and decoded back into the binary representation.

The synthesis and sequencing procedures are error-prone and, together with the natural degradation of DNA, introduce errors into the DNA strands. To ensure data reliability, these errors have to be corrected by algorithms and error-correcting codes (ECCs).

A 5-minute video with an overview of DNA storage: https://youtu.be/r8qWc9X4f6k?si=Yzm5sOW-a6VDnBu3

Composite DNA

To allow a higher potential information capacity, [1,2] recently introduced the composite DNA synthesis method. This method exploits the many copies of each strand that standard DNA synthesis creates: a composite DNA symbol is defined by a mixture of DNA bases, and their ratios, at a specific position of these copies. By defining different mixtures and ratios, the alphabet can be extended beyond 4 symbols. More formally, a composite DNA symbol at a specific position can be abstracted as a quartet of probabilities (p_A, p_C, p_G, p_T), where p_X, 0 ≤ p_X ≤ 1, is the fraction of base X ∈ {A, C, G, T} in the mixture and p_A + p_C + p_G + p_T = 1. Identifying composite symbols therefore requires sequencing multiple reads and estimating p_A, p_C, p_G, p_T at each position.
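The estimation step can be sketched as follows, assuming that each read independently shows base X at a given position with probability p_X, that all reads are aligned (no insertions or deletions), and that the allowed quartets are the multiples of 1/k (a hypothetical resolution). The empirical base frequencies are mapped to the closest allowed quartet in Euclidean distance; this nearest-point rule is a simple heuristic, and a practical decoder would more likely use a likelihood-based rule and would have to cope with misaligned reads. All function names are illustrative.

```python
import random
from collections import Counter
from itertools import product

BASES = "ACGT"


def estimate_ratios(reads, position):
    """Empirical fractions of A, C, G, T at `position` across all reads."""
    counts = Counter(read[position] for read in reads)
    n = len(reads)
    return tuple(counts[b] / n for b in BASES)


def nearest_symbol(p_hat, k):
    """Closest quartet with entries in multiples of 1/k (Euclidean distance)."""
    grid = [c for c in product(range(k + 1), repeat=4) if sum(c) == k]
    best = min(grid, key=lambda c: sum((ci / k - pi) ** 2 for ci, pi in zip(c, p_hat)))
    return tuple(ci / k for ci in best)


def sample_reads(quartets, n_reads, rng):
    """Draw aligned reads of a composite strand: each base follows its quartet."""
    return ["".join(rng.choices(BASES, weights=q)[0] for q in quartets)
            for _ in range(n_reads)]


if __name__ == "__main__":
    rng = random.Random(0)
    strand = [(0.5, 0.5, 0.0, 0.0), (0.25, 0.25, 0.25, 0.25), (0.0, 0.0, 0.0, 1.0)]
    reads = sample_reads(strand, 200, rng)
    for i in range(len(strand)):
        p_hat = estimate_ratios(reads, i)
        print(i, p_hat, "->", nearest_symbol(p_hat, k=4))
```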

Problem description

ECCs for DNA data storage differ in many aspects from classical error-correcting codes. In this setting, new error types gain relevance, such as deletions and insertions, which affect the synchronization of the sequences. For composite DNA data storage in particular, these error types have so far received little attention.

The work most closely related to this problem is the recent study by Zhang et al. [6]. The authors initiated the study of error-correcting codes for composite DNA. They considered an error model for composite symbols which assumes that errors occur in at most t symbols and that their magnitude is limited by l. They presented several code constructions as well as bounds for this model. In this thesis, we want to analyse how marker codes can protect against deletion errors in this setting. We aim to characterize the trade-off between accepting the errors as part of this probabilistic setting and correcting them with coding strategies.
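To illustrate the basic mechanism of marker codes, the sketch below inserts a fixed marker after every block of m symbols and, on the receiving side, locates a single deletion by finding the first block whose marker is no longer at its expected position. It is a hard-decision sketch on a standard (non-composite) strand: the marker "AC", the block length and the function names are hypothetical, and the marker's first two symbols are deliberately different so that a shifted marker is always detected. In the composite setting of this thesis, markers and decoding would instead have to operate on estimated probability vectors.

```python
def add_markers(data, m, marker="AC"):
    """Append `marker` after every block of m data symbols (len(data) % m == 0)."""
    if len(data) % m:
        raise ValueError("data length must be a multiple of the block length m")
    return "".join(data[i:i + m] + marker for i in range(0, len(data), m))


def localize_single_deletion(received, n_blocks, m, marker="AC"):
    """Assuming at most one deletion, return the first block whose marker is not
    at its expected position (None if every marker is in place).

    With a marker whose first two symbols differ, a deletion in the data of
    block j is reported as j, and a deletion inside marker j as j or j + 1;
    in every case the data symbols of all earlier blocks are intact.
    """
    period = m + len(marker)
    for j in range(n_blocks):
        start = j * period + m  # expected marker position without deletions
        if received[start:start + len(marker)] != marker:
            return j
    return None
```

Once the affected block is known, a decoder can read all later blocks with an offset of one and concentrate its correction effort on a single block.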

This description only roughly introduces the problem, and there is no need to review all references. If you are interested, please reach out to me, and we can discuss a suitable direction for you.

References

[1] L. Anavy, I. Vaknin, O. Atar, R. Amit, and Z. Yakhini, “Data storage in DNA with fewer synthesis cycles using composite DNA letters,” Nat Biotechnol, vol. 37, no. 10, pp. 1229–1236, Oct. 2019, doi: 10.1038/s41587-019-0240-x.

[2] Y. Choi et al., “High information capacity DNA-based data storage with augmented encoding characters using degenerate bases,” Sci Rep, vol. 9, no. 1, Art. no. 1, Apr. 2019, doi: 10.1038/s41598-019-43105-w.

[3] V. Guruswami and J. Håstad, “Explicit Two-Deletion Codes With Redundancy Matching the Existential Bound,” IEEE Transactions on Information Theory, vol. 67, no. 10, pp. 6384–6394, Oct. 2021, doi: 10.1109/TIT.2021.3069446.

[4] J. Sima, N. Raviv, and J. Bruck, “Two Deletion Correcting Codes From Indicator Vectors,” IEEE Trans. Inform. Theory, vol. 66, no. 4, pp. 2375–2391, Apr. 2020, doi: 10.1109/TIT.2019.2950290.

[5] I. Smagloy, L. Welter, A. Wachter-Zeh, and E. Yaakobi, “Single-Deletion Single-Substitution Correcting Codes,” IEEE Transactions on Information Theory, pp. 1–1, 2023, doi: 10.1109/TIT.2023.3319088.

[6] W. Zhang, Z. Chen, and Z. Wang, “Limited-Magnitude Error Correction for Probability Vectors in DNA Storage,” in ICC 2022 - IEEE International Conference on Communications, Seoul, Republic of Korea: IEEE, May 2022, pp. 3460–3465, doi: 10.1109/ICC45855.2022.9838471.

Prerequisites

- Channel Coding

Supervisor:

Frederik Walter

Publications

Walter, Frederik: In-Product Authentication. 2025 6G-life and 6G-RIC Workshop on Post-Shannon Theory and Molecular Communication, 2025
Walter, Frederik: Properties of Chemical Functions. TUM ICE Coding and Hiking, 2025
Walter, Frederik; Yehezkeally, Yonatan: Coding for Strand Breaks in Composite DNA. 2025 IEEE International Symposium on Information Theory (ISIT), IEEE, 2025, 1-6
Walter, Frederik; Lüscher, Anne; Groen, Jasper; Banerjee, Anisha; Bariffi, Jessica; Wachter-Zeh, Antonia; Yaakobi, Eitan; Somoza, Mark; Grass, Robert; Yakhini, Zohar: State of the art and commercial needs for authentication and in-product documentation. 2024
Walter, Frederik; Sabary, Omer; Wachter-Zeh, Antonia; Yaakobi, Eitan: Coding for Composite DNA to Correct Substitutions, Strand Losses and Deletions. Summer Doctoral Seminar 2024, 2024
Walter, Frederik; Banerjee, Anisha; Wachter-Zeh, Antonia; Das, Arya; Lüscher, Anne; Bar-Lev, Daniella; Yaakobi, Eitan; Granito, Francesca; Sabzalipoor, Hamed; Bariffi, Jessica; Somoza, Mark; Sabary, Omer; Grass, Robert; Istvánffy, Sharon; Yehezkeally, Yonatan; Yakhini, Zohar: Towards a Cryptographic Framework for DNA Data Storage. Coding Theory and Algorithms for DNA-based Data Storage - ISIT2024 Satellite Workshop, 2024
