Wissenschaftliches Seminar VLSI-Entwurfsverfahren

Vortragende/r (Mitwirkende/r)	Conrad Foik Philipp Fengler Ulf Schlichtmann
Nummer	0820073263
Art	Seminar
Umfang	3 SWS
Semester	Sommersemester 2025
Unterrichtssprache	Deutsch
Stellung in Studienplänen	Siehe TUMonline

Termine

28.04.2025 16:45-18:15 2999, Seminarraum
19.05.2025 16:45-18:15 2999, Seminarraum
26.05.2025 16:45-18:15 N1005ZG, Seminarraum
02.06.2025 16:45-18:15 2999, Seminarraum
18.07.2025 08:00-17:00 4905, Mehrzweck-U-Raum

Teilnahmekriterien

Anmerkung: Die Studierenden wählen VOR der Einführungsveranstaltung ein Thema aus. Dazu setzen sie sich mit dem entsprechenden Betreuer in Verbindung. Themen werden nach dem Prinzip "first come, first serve" verteilt. Erst wenn der Betreuer das gewählte Thema bestätigt hat, gilt der/die Studierende als registriert. Eine Liste von Themen ist unter folgendem Link zu finden: https://www.ce.cit.tum.de/eda/lehrveranstaltungen/seminare/wissenschaftliches-seminar-vlsi-entwurfsverfahren/

Lernziele

Nach erfolgreichem Abschluss des Seminares sind die Studierenden in der Lage, eine neue Idee oder einen bestehenden Ansatz auf dem Gebiet des rechnergestützten Schaltungs- und Systementwurfs in verständlicher und überzeugender Weise zu präsentieren.
Zu diesem Zwecke werden im Einzelnen folgende Fähigkeiten erworben:
• Die teilnehmende Person kann sich selbstständig ein wissenschaftliches Thema aus dem Bereich des rechnergestützten Schaltungs- und Systementwurfs aneignen.
• Die teilnehmende Person ist fähig, ein Thema strukturiert nach Problemstellung, Stand der Technik, Ziele, Methoden und Ergebnisse darzustellen.
• Die teilnehmende Person ist in der Lage, ein Thema in der genannten Strukturierung mündlich zu präsentieren, in einem Foliensatz zu visualisieren, und in einem wissenschaftlichen Bericht schriftlich darzustellen.
• Die teilnehmende Person ist mit den Grundlagen einer konstruktiven Begutachtung vertraut und kann diese auf eine fremde Arbeit anwenden.

Beschreibung

Spezifische Seminarthemen aus dem Bereich der Entwurfsautomatisierung für elektronische Schaltungen und Systeme werden angeboten. Beispiele sind Analogentwurfsmethodik, Entwurfsmethodik für digitale Schaltungen, Layoutsynthese, und Entwurfsmethodik auf der Systemebene.
Teilnehmende arbeiten eigenständig auf einem wissenschaftlichen Thema, schreiben ein Paper von 4 Seiten. Außerdem fertigen die Teilnehmenden ein Gutachten über die schriftliche Ausarbeitung anderer Teilnehmender in einem Peer-Review Verfahren an. Abschließend präsentieren die Teilnehmenden ihr Thema in einem Vortrag. In einer anschließenden Diskussion wird ihr Thema detailliert behandelt.

Inhaltliche Voraussetzungen

Keine spezifischen Voraussetzungen.

Lehr- und Lernmethoden

Lernmethode:
Die Studierenden arbeiten eigenständig und unter Beratung durch einen wissenschaftlichen Assistenten ein wissenschaftliches Thema aus.
Lehrmethode:
In Einführungsveranstaltungen werden den Teilnehmenden Hinweise zur fachlichen Arbeit, schriftlichen Ausarbeitung sowie zur Erstellung der Präsentation und zum mündlichen Vortrag gegeben. Während eines zusätzlichen interaktiven Präsentationtrainings können Techniken für einen gelungenen Vortrag von den Studierenden erlernt und geprobt werden.
Weitere Details werden zwischen Studierenden und wissenschaftlichen Assistenten auf individueller Basis diskutiert.

Alle geläufigen Techniken zur Vorbereitung und Präsentation von Papern und Vorträgen werden angewendet, z. B.:
- Klassische Tafel, Weißwandtafel
- Elektronische Folien, Beamer
- Elektronische Textverarbeitung
- Elektronische Folienbearbeitung

Studien-, Prüfungsleistung

Die Prüfung wird in Form einer wissenschaftlichen Ausarbeitung vorgenommen. Sie besteht zum einen aus einem schriftlichen Teil (50%), welcher sich aus einem Paper (4 Seiten) und einem Gutachten (ca. 2000-3000 Zeichen), das im Rahmen einer Peer-Review erarbeitet wird, zusammensetzt. Zum anderen besteht sie aus einem mündlichen Teil (50%) in Form einer ca. 30-minütigen Präsentation (inklusive nachfolgender Diskussion). Mit der wissenschaftlichen Ausarbeitung weisen die Studierenden nach, dass sie z. B. den wissenschaftlichen Stand der Technik, eine neue Idee oder einen bestehenden Ansatz auf dem Gebiet des rechnergestützten Schaltungs- und Systementwurfs für ein Fachpublikum aufbereiten, strukturiert darstellen und präsentieren können.

Empfohlene Literatur

Ein Satz an Themen und zugehöriger Literatur wird am Anfang des Kurses bereitgestellt. Die Studierenden wählen ihr Thema selbst aus.

Themenwahl - offen

Die Themenliste für das Sommersemester 25 finden Sie unten.

Themen werden im FCFS Verfahren vergegeben. Bitte kontaktieren Sie dann direkt den Betreuer per E-Mail. Bitte versichern Sie sich, dass Sie eine Bestätigung Ihres Betreues erhalten, wenn Sie sich für ein Thema entschieden haben.

Seminare

Download Arbeit als PDF

From High-Level to Low-Level: Exploring Intermediate Representations in ML Compiler

Beschreibung

Intermediate Representations (IRs) play a crucial role in machine learning compilers, enabling optimization and code generation for various hardware platforms. Machine learning compilers translate an ML model into multi-level IRs in upper and lower layers. The upper layer is focused on hardware-independent but framework-related transformations and optimizations, while the lower layer is responsible for hardware-related optimizations, code generation, and compilation. To support cross-device and cross-framework optimization and code generation, the lower layer may not optimize hardware-specific but still hardware-related low-level IRs, and the upper layer may not be limited to a specific framework but still focus on framework-related high-level IRs.High-level IRs, in particular, provide an abstract representation of machine learning models that can be transformed and optimized before being compiled into executable code. Unlike low-level compiler IRs, which are closer to the hardware, high-level IRs are hardware-agnostic and provide a much-needed abstraction required for common transformations like kernel specialization, layout transformations, operator fusion, shape inference, and quantization.

Task:

The seminar should cover the following topics:

Benefits and challenges of using IRs in machine learning compilers and how they impact the development of machine learning models and applications.
Elaborate on how the abstractions at each level enable the transformation of the machine learning model into executable code and how they impact the performance and efficiency of the compiled code.
What are high-level IRs, and how do they differ from low-level IRs?
Examples of how multi-level IRs are used in machine learning compilers

References:

1. Li, Mingzhen & Liu, Yi & Liu et al., The Deep Learning Compiler: A Comprehensive Survey. IEEE Transactions on Parallel and Distributed Systems. 32. 708-727. 10.1109/TPDS.2020.3030548.

2. Chris Lattner, et al., - In proceedings of 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) - pp. 2-14

3. Jared Roesch et al., Relay: a new IR for machine learning frameworks. In Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine Learning and Programming Languages (MAPL 2018). Association for Computing Machinery, New York, NY, USA, 58–68. https://doi.org/10.1145/3211346.3211348

Kontakt

mayuri.bhadra@infineon.com ; Wolfgang.ecker@tum.de

Betreuer:

Conrad Foik - Mayuri Bhadra (Infineon Technologies )

Download Arbeit als PDF

Exploring Sparse Tensor Computations in Different ML Frameworks and Compilers

Beschreibung

Sparse tensor computations are a crucial aspect of many machine learning and data analytics applications, enabling the efficient processing of large datasets with sparse structures. However, developing and maintaining sparse software by hand is a complex and error-prone task, requiring a deep understanding of the underlying algorithms and data structures. The complexity is further exacerbated by the numerous possible combinations of storage formats, nonzero structures, operations, and target architectures, making it challenging to write effective sparse code. As a result, programmers often resort to hand-optimizing a small set of library methods for specific operations and storage formats and then building larger sparse programs by composing these available library methods. This seminar will delve into the various approaches to sparse tensor computations in different machine learning frameworks and compilers, examining the current state-of-the-art in sparse tensor computation libraries and compilers and investigating the challenges and opportunities of exploiting sparsity in tensor computations.

Task:

Research and present on the current state-of-the-art in sparse tensor computation libraries and compilers.
Investigate the challenges and opportunities of exploiting sparsity in tensor computations
Analyze the performance benefits and limitations of different sparse tensor computation approaches

References:

1. Fredrik Kjolstad. 2020. Sparse Tensor Algebra Compilation. Ph.D. Dissertation. Massachusetts Institute of Technology, Cambridge, MA.

2. Maxim Naumov, L. Chien, Philippe Vandermersch, and Ujval Kapasi. 2010. Cusparse library. In GPU Technology Conference

3. Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, and Gokcen Kestor. 2021. A high-performance sparse tensor Algebra compiler in MLIR. In 2021 IEEE/ACM 7th Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC’21), 27–38.

Kontakt

mayuri.bhadra@infineon.com ; Wolfgang.ecker@tum.de

Betreuer:

Conrad Foik - Mayuri Bhadra (Infineon Technologies )

Download Arbeit als PDF

Redistribution Layer Routing Algorithms

Beschreibung

The Redistribution Layer (RDL) is a critical component of advanced packaging technology in integrated circuits (ICs). It is an additional metal interconnect layer located between the chip and the package, allowing signals to be redistributed from their original pad locations to new positions to accommodate different packaging formats. Redistribution layers are commonly used for signal transmission among chips, and vias are used for communication among different layers. In this seminar topic, a survey on the RDL routing algorithms from the initial single-layer structure to the more advanced multi-layer structure should be made.

Kontakt

jiahuipeng@tum.de

Betreuer:

Jiahui Peng

Download Arbeit als PDF

Exploring RTL Coding Impacts and PPA Correlations Across ASIC and FPGA Targets

Beschreibung

Abstract:
The increasing reliance on generated RTL (Register-Transfer Level) designs, crafted in languages like SystemVerilog and VHDL, has streamlined development for both ASIC and FPGA targets. However, the physical design automation (PDA) process—spanning synthesis, placement, and routing—must adapt to ensure these designs meet power, performance, and area (PPA) goals across both platforms. This seminar proposes a research investigation into how RTL coding styles in SystemVerilog and VHDL influence physical design outcomes, and whether meaningful PPA correlations exist between FPGA and ASIC implementations of the same RTL. By analyzing automation workflows and leveraging experimental data, the study aims to uncover actionable insights for optimizing generated RTL designs, enhancing their portability and efficiency across target technologies.

Research Motivation:
Generated RTL, often produced via high-level synthesis or IP tools, is widely used for both ASICs and FPGAs, yet the impact of coding practices on physical design remains poorly understood. Differences in FPGA and ASIC toolchains and architectures suggest potential PPA discrepancies, raising questions about design portability and optimization strategies. This research tackles these gaps, offering a student the chance to explore a practical, industry-relevant problem at the intersection of RTL coding and physical design automation.
Research Objectives:
RTL Coding Analysis: Investigate how SystemVerilog and VHDL coding styles (e.g., modularity, hierarchy, constraints) affect synthesis and physical design outcomes for generated RTL.
PDA Workflow Evaluation: Assess the effectiveness of automation tools in translating RTL to physical layouts for ASIC and FPGA targets.
PPA Comparison: Measure and compare power, performance, and area metrics between FPGA and ASIC implementations of identical RTL designs.
Correlation Study: Identify if consistent PPA relationships exist across platforms and propose guidelines for RTL optimization based on findings.

References:
•
RTL Architect: Physically-Aware RTL Analysis | Synopsys. (n.d.). https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/rtl-architect.html
•
X. Yao, Y. Wang, X. Li, Y. Lian, R. Chen, L. Chen, M. Yuan, H. Xu, and B. Yu, RTLRewriter: Methodologies for Large Models aided RTL Code Optimization, 2024. arXiv: 2409.11414 [cs.AR]. [Online]. Available: https://arxiv.org/abs/2409.11414.
•
S. Y. Neyaz, I. Saxena, N. Alam and S. A. Rahman, "FPGA and ASIC Implementation and Comparison of Multipliers," 2020 International Symposium on Devices, Circuits and Systems (ISDCS), Howrah, India, 2020, pp. 1-4, doi: 10.1109/ISDCS49393.2020.9263027.
•
Heyden, Malin. (2023). High Level Synthesis for ASIC and FPGA.

Kontakt

mohamed.badawy@infineon.com

Betreuer:

Conrad Foik - (Infineon Technologies )

Download Arbeit als PDF

Equality Saturation for Tensor Graph Optimization

Stichworte:
Equality Saturation, Graph Optimization, Intermediate Representation

Beschreibung

For deep neural networks have a low latency and high accuracy are of big importance during inference. To achieve that ML compilers can apply certain transformations on the networks graph. Traditionally, such transformations are applied sequentially, which introduces the phase-ordering problem, in which certain transformations may yield a better result if they are applied in a later stage. Equality saturation aims at tackling this issue by first creating an Intermediate Representation of the network and storing different optimized versions in its first phase. In the second phase, it chooses the best solution. For this topic, the student will familiarize themselves wiith Equality Saturation and different techniques that can lead to optimized Neural Network compilation.

Voraussetzungen

- Interest in ML Compilers

- Good understanding of ML architectures

- Very good math skills

Kontakt

Leonidas Kontopoulos, M.Sc.

leonidas.kontopoulos@tum.de

Betreuer:

Leonidas Kontopoulos

Download Arbeit als PDF

Modeling and Simulation of Silicon Photonics Systems in SystemVerilog/XMODEL

Beschreibung

Silicon photonics integrates both photonic and electronic components on the same silicon chip and promises ultra-dense, high-bandwidth interconnects via wavelength division multiplexing (WDM). However, when verifying such silicon photonic systems, the existing IC simulators face challenges due to the WDM signals containing multiple frequency tones at ~200-THz with ~50-GHz spacing. In this seminar, the student will investigate the modeling approach for the silicon photonic elements and devices as equivalent multi-port transmission lines using XMODEL primitives and simulating the WDM link models in an efficient, event-driven fashion in SystemVerilog.

Kontakt

liaoyuan.cheng@tum.de

Betreuer:

Liaoyuan Cheng

Download Arbeit als PDF

SPICE-Compatible Modeling and Design for Electronic-Photonic Integrated Circuits

Beschreibung

Electronic-photonic integrated circuit (EPIC) technologies are revolutionizing computing systems by improving their performance and energy efficiency. However, simulating EPIC is challenging and time-consuming. In this seminar, the student will investigate the modeling method for EPIC.

Kontakt

liaoyuan.cheng@tum.de

Betreuer:

Liaoyuan Cheng

Download Arbeit als PDF

Dynamic Neural Networks

Beschreibung

Deep Neural Networks (DNNs) have shown high predictive performance on various tasks. However, the large compute requirements of DNNs restrict their potential deployment on embedded devices with limited resources.

Dynamic Neural Networks (DyNNs) are a class of neural networks that can adapt their structure, parameters, or computation graph based on input data. Unlike the conventional DNNs, which have a fixed architecture once trained, DyNNs offer greater efficiency and adaptability. In particular, DyNNs can reduce latency, memory usage, and energy consumption during inference by activating only the necessary subset of its structure based on the difficulties of the input data.

The most recent survey paper in [1] provides an overview of DyNNs methods until the year 2021. This seminar topic covers a literature research on the more recent DyNNs methods with the focus on DyNNs for computer vision tasks (cf. Section 2 & 3 in [1]) and their training methodologies (cf. Section 5 in [1]). You are expected to find 3-4 more recent papers on this topic, and review and compare their methods including their advantages and drawbacks.

[1] Han, Yizeng, et al. "Dynamic neural networks: A survey." IEEE transactions on pattern analysis and machine intelligence 44.11 (2021): 7436-7456.

Kontakt

mikhael.djajapermana@tum.de

Betreuer:

Mikhael Djajapermana

Download Arbeit als PDF

The Dark Art of Embedded Unsafe Code: Arguments in Favor of Breaking Rusts Safety Requirements

Beschreibung

The development of efficient and reliable code for embedded systems necessitates a delicate balance between speed and safety. Programming languages like Rust have been designed to facilitate the achievement of these objectives, with a particular emphasis on memory safety. However, certain interactions with the underlying system may require the use of unsafe code, which can compromise the integrity of the system.

Research has shown that the judicious use of unsafe code blocks is crucial in writing high-quality code [1, 2]. A comprehensive understanding of unsafe Rust patterns can therefore enable developers to properly handled these unsafe segments, thereby mitigating the associated risks. The Rustonomicon [3] serves as a testament to the complexity of unsafe Rust programming, highlighting the need for a nuanced approach to dealing with such code.

This seminar aims to investigate the current perceptions surrounding the use of unsafe code in embedded systems. Through a critical analysis of existing literature and case studies, participants will explore the scenarios in which unsafe code is unavoidable and those in which it is employed as a workaround for performance limitations. By examining the intricacies of unsafe Rust programming, students will gain a deeper understanding of the Rust language and the challenges associated with ensuring code safety and performance in embedded systems.

Bibliography

[1] ASTRAUSKAS, Vytautas, et al. How do programmers use unsafe rust?. Proceedings of the ACM on Programming Languages, 2020, 4. Jg., Nr. OOPSLA, S. 1-27. https://dl.acm.org/doi/abs/10.1145/3428204

[2] ZHANG, Yuchen, et al. On the dual nature of necessity in use of Rust unsafe code. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2023. S. 2032-2037.ZHANG, Yuchen, et al. On the dual nature of necessity in use of Rust unsafe code. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2023. S. 2032-2037. https://dl.acm.org/doi/pdf/10.1145/3611643.3613878

[3] The Rustonomicon, https://doc.rust-lang.org/nomicon/

Kontakt

Raphael.Kunz@infineon.com, Wolfgang.Ecker@tum.de

Betreuer:

Conrad Foik - (Infineon Technologies )

Download Arbeit als PDF

Safe, Safer, Rust? Safe Cross Compilation of Embedded C code to Rust

Beschreibung

Embedded software development poses significant challenges in software engineering, particularly in ensuring the reliability and safety of complex systems. To address these challenges, various tooling solutions have been developed to facilitate the creation of code that meets stringent requirements. A key concern in this context is memory safety, as memory access violations can lead to unpredictable and undesirable behavior.

Since quite a while, embedded code was written in C, which means that a lot of legacy code exists in C but not in Rust. While one can assume this C code by now to be memory safe, due to continuous efforts over the years, new code might introduce new issues into legacy software. For this purpose, the tendency shifted towards coding in Rust to achieve memory safe code. The Rust programming language and its ecosystem have been designed to mitigate this challenge by enforcing strict memory safety guarantees. To achieve this, Rust imposes limitations on its feature set and requires developers to explicitly define and manage concepts such as lifetime and ownership.

However, the transpiling of existing legacy C to Rust code poses significant challenges due to the differences in language design and safety guarantees. C code often employs pointer manipulation and direct memory access without additional checks, which can lead to safety violations. To overcome these challenges, it is essential to understand the limitations and mitigation strategies required for transcompiling C code to Rust.

This seminar aims to conduct a comprehensive literature review to identify and evaluate existing research on safe subsets of common language features in Rust and C. Students will analyze and extract the common safe features of both languages and investigate techniques for transforming potentially unsafe code into safe code. By exploring the intersection of Rust and C, this seminar seeks to contribute to the development of more reliable and secure embedded software systems.

Bibliography

[1] LING, Michael, et al. In Rust we trust: a transpiler from unsafe C to safer Rust. In: Proceedings of the ACM/IEEE 44th international conference on software engineering: companion proceedings. 2022. S. 354-355. https://dl.acm.org/doi/pdf/10.1145/3510454.3528640

[2] FROMHERZ, Aymeric; PROTZENKO, Jonathan. Compiling C to Safe Rust, Formalized. arXiv preprint arXiv:2412.15042, 2024. https://arxiv.org/pdf/2412.15042

[3] ZHANG, Hanliang, et al. Ownership guided C to Rust translation. In: International Conference on Computer Aided Verification. Cham: Springer Nature Switzerland, 2023. S. 459-482. https://komaec.github.io/files/ownership.pdf

Kontakt

Raphael.Kunz@infineon.com, Wolfgang.Ecker@tum.de

Betreuer:

Conrad Foik - (Infineon Technologies )

Download Arbeit als PDF

Placement of Systolic Arrays for Neural Network Accelerators

Beschreibung

Systolic arrays are a proven architecture for parallel processing across various applications, offering design flexibility, scalability, and high efficiency. With the growing importance of neural networks in many areas, there is a need for efficient processing of the underlying computations, such as matrix multiplications and convolutions. These computations can be executed with a high degree of parallelism on neural network accelerators utilizing systolic arrays.

Just as any application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) design, neural network accelerators go through the standard phases of chip design, however, treating systolic array hardware designs the same way as any other design may lead to suboptimal results, as utilizing the regular structure of systolic arrays can lead to better solution quality[1].

Relevant works for this seminar topic include the work of Fang et al. [2], where a regular placement is used as an initial solution and then iteratively improved using the RePlAce[3] placement algorithm. The placement of systolic arrays on FPGAs is discussed by Hu et al., where the processing elements of the systolic array are placed on the DSP columns in a manner that is more efficient than the default placement of commercial placement tools[4].

In this seminar, you will investigate different macro and cell placement approaches, focusing on methods that specifically consider systolic array placement. If you have questions regarding this topic, please feel free to contact me.

[1] S. I. Ward et al., "Structure-Aware Placement Techniques for Designs With Datapaths," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 2, pp. 228-241, Feb. 2013, doi: https://doi.org/10.1109/TCAD.2012.2233862

[2] D. Fang, B. Zhang, H. Hu, W. Li, B. Yuan and J. Hu, "Global Placement Exploiting Soft 2D Regularity". in ACM Transactions on Design Automation of Electronic Systems, vol. 30, no. 2, pp. 1-21, Jan. 2025, doi: https://doi.org/10.1145/3705729

[3] C. -K. Cheng, A. B. Kahng, I. Kang and L. Wang, "RePlAce: Advancing Solution Quality and Routability Validation in Global Placement," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 9, pp. 1717-1730, Sept. 2019, doi: https://doi.org/10.1109/TCAD.2018.2859220

[4] H. Hu, D. Fang, W. Li, B. Yuan and J. Hu, "Systolic Array Placement on FPGAs," 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA, 2023, pp. 1-9, doi: https://doi.org/10.1109/ICCAD57390.2023.10323742

Kontakt

benedikt.schaible@tum.de

Betreuer:

Benedikt Schaible

Download Arbeit als PDF

Thermal-aware Optical-electrical Routing Codesign for On-chip Signal Communications

Beschreibung

Abstract - The optical interconnection is a promising solution for on-chip signal communication in modern system-on-chip (SoC) and heterogeneous integration designs, providing large bandwidth and high-speed transmission with low power consumption. Previous works do not handle two main issues for on-chip optical-electrical (O-E) co-design: the thermal impact during O-E routing and the trade-offs among power consumption, wirelength, and congestion. As a result, the thermal-induced band shift might incur transmission malfunction; the power consumption estimation is inaccurate; thus, only suboptimal results are obtained. To remedy these disadvantages, we present a thermal-aware optical-electrical routing co-design flow to minimize power consumption, thermal impact, and wirelength. Experimental results based on the ISPD 2019 contest benchmarks show that our co-design flow significantly outperforms state-of-the-art works in power consumption, thermal impact, and wirelength.

Kontakt

alex.truppel@tum.de

Betreuer:

Alexandre Truppel

Download Arbeit als PDF

Lithium tantalate photonic integrated circuits for volume manufacturing

Beschreibung

Electro-optical photonic integrated circuits (PICs) based on lithium niobate (LiNbO3) have demonstrated the vast capabilities of materials with a high Pockels coefficient1,2. They enable linear and high-speed modulators operating at complementary metal–oxide–semiconductor voltage levels3 to be used in applications including data-centre communications4, high-performance computing and photonic accelerators for AI5. However, industrial use of this technology is hindered by the high cost per wafer and the limited wafer size. The high cost results from the lack of existing high-volume applications in other domains of the sort that accelerated the adoption of silicon-on-insulator (SOI) photonics, which was driven by vast investment in microelectronics. Here we report low-loss PICs made of lithium tantalate (LiTaO3), a material that has already been adopted commercially for 5G radiofrequency filters6 and therefore enables scalable manufacturing at low cost, and it has equal, and in some cases superior, properties to LiNbO3. We show that LiTaO3 can be etched to create low-loss (5.6 dB m−1) PICs using a deep ultraviolet (DUV) stepper-based manufacturing process7. We demonstrate a LiTaO3 Mach–Zehnder modulator (MZM) with a half-wave voltage–length product of 1.9 V cm and an electro-optic bandwidth of up to 40 GHz. In comparison with LiNbO3, LiTaO3 exhibits a much lower birefringence, enabling high-density circuits and broadband operation over all telecommunication bands. Moreover, the platform supports the generation of soliton microcombs. Our work paves the way for the scalable manufacture of low-cost and large-volume next-generation electro-optical PICs.

Kontakt

zhidan.zheng@tum.de

Betreuer:

Zhidan Zheng

Download Arbeit als PDF

On Memory Optimization of Tensor Programs

Kurzbeschreibung:
In this seminar the student will review state-of-the art memory-aware optimization techniques applied to tensor-level AI programs.

Beschreibung

Compact electronic edge devices have limited memory resources. As AI models can require large amounts of memory, running AI models on edge devices becomes challenging. Thus, optimizing AI programs that can be deployed on edge devices is necessary while saving costly memory transfers.

This need has motivated current works exploring different memory-aware optimization techniques that reduce memory utilization but do not modify the DNN parameters (as during compression or network architecture search (NAS)), such as fused tiling, memory-aware scheduling, and memory layout planning [1]. For instance, DORY (Deployment Oriented to memoRY) is an automated tool designed for deploying deep neural networks (DNNs) on low-cost microcontroller units with less than 1MB of on-chip SRAM memory. It tackles the challenge of tiling by framing it as a Constraint Programming (CP) problem, aiming to maximize the utilization of L1 memory while adhering to the topological constraints of each DNN layer. DORY then generates ANSI C code to manage the transfers between off-chip and on-chip memory and the computation phases [2]. DORY has been integrated with TVM to ease the support for heterogeneous compilation and offloading operations not supported by the accelerator to a regular host CPU [3].

This seminar topic reviews state-of-the-art approaches for memory-aware optimization techniques of ML tensor programs targeting constrained edge devices. The different methods and results shall be reviewed and compared.

References:

[1] Rafael Christopher Stahl.Code Optimization and Generation of Machine Learning and Driver Software for Memory-Constrained Edge Devices. 2024. Technical University of Munich, PhD Thesis. URL: https://mediatum.ub.tum.de/doc/1730282/1730282.pdf

[2] A. Burrello, A. Garofalo, N. Bruschi, G. Tagliavini, D. Rossi and F. Conti, "DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs," in IEEE Transactions on Computers, vol. 70, no. 8, pp. 1253-1268, 2021, https://doi.org/10.1109/TC.2021.3066883

[3] Van Delm, Josse, et al. "HTVM: Efficient neural network deployment on heterogeneous TinyML platforms." 2023 60th ACM/IEEE Design Automation Conference (DAC). IEEE, 2023. https://doi.org/10.1109/DAC56929.2023.10247664

Kontakt

Andrew.stevens@infineon.com

Daniela.sanchezlopera@infineon.com

Betreuer:

Daniela Sanchez Lopera - Andrew Stevens (Infineon Technologies )

Download Arbeit als PDF

On-device learning

Kurzbeschreibung:
In this seminar the student will review state-of-the art contributions to the on-device learning research area.

Beschreibung

TinyML is a research area aiming to bring machine learning models to resource-constrained IoT devices and microcontrollers. Current research mainly focuses on enabling inference on such devices, tackling challenges such as limited memory and computation resources available. But for specific sensing and IoT applications, on-device learning would allow retraining and refining ML models directly on small and low-power devices. However, on-device learning on edge devices is much more challenging than inference due to larger memory footprints and increased computing operations to store intermediate activations and gradients [1].

To tackle those challenges, different strategies involving, among others, quantization, sparse backpropagation, or new layer types have been proposed and summarized [1, 2]. This seminar will review state-of-the-art approaches for on-device learning techniques targeting constrained edge devices. The different methods and results shall be reviewed and compared.

References:

[1] J. Lin, L. Zhu, W. -M. Chen, W. -C. Wang and S. Han, "Tiny Machine Learning: Progress and Futures [Feature]," in IEEE Circuits and Systems Magazine, vol. 23, no. 3, pp. 8-34, 2023, https://doi.org/10.1109/MCAS.2023.3302182

[2] Shuai Zhu, Thiemo Voigt, Fatemeh Rahimian, and JeongGil Ko. 2024. On-device Training: A First Overview on Existing Systems. ACM Trans. Sen. Netw. Just Accepted (September 2024). https://doi.org/10.1145/3696003

Kontakt

Andrew.stevens@infineon.com

Daniela.sanchezlopera@infineon.com

Betreuer:

Daniela Sanchez Lopera - Andrew Stevens (Infineon Technologies )

Download Arbeit als PDF

Innovative Memory Architectures in DNN Accelerators

Beschreibung

With the growing complexity of neural networks, more efficient and faster processing solutions are vital to enable the widespread use of artificial intelligence. Systolic arrays are among the most popular architectures for energy-efficient and high-throughput DNN hardware accelerators.

While many works implement DNN accelerators using systolic arrays on FPGAs, several (ASIC) designs from industry and academia have been presented [1-3]. To fulfill the requirements that such accelerators place on memory accesses, both in terms of data availability and latency hiding, innovative memory architectures can enable more efficient data access, reducing latency and bridging the gap towards even more powerful DNN accelerators.

One example is the Eyeriss v2 ASIC [1], which uses a distributed Global Buffer (GB) layout tailored to the demands of their row-stationary systolic array dataflow.

In this seminar, a survey of state-of-the-art DNN accelerator designs and design frameworks shall be created, focusing on their memory hierarchy.

References and Further Resources:

[1] Y. -H. Chen, T. -J. Yang, J. Emer and V. Sze. 2019 "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292-308, June 2019, doi: https://doi.org/10.1109/JETCAS.2019.2910232

[2] Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2016. "DianNao family: energy-efficient hardware accelerators for machine learning." In Commun. ACM 59, 11 (November 2016), 105–112. https://doi.org/10.1145/2996864

[3] Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al. 2017. "In-Datacenter Performance Analysis of a Tensor Processing Unit." In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3079856.3080246

[4] Rui Xu, Sheng Ma, Yang Guo, and Dongsheng Li. 2023. A Survey of Design and Optimization for Systolic Array-based DNN Accelerators. ACM Comput. Surv. 56, 1, Article 20 (January 2024), 37 pages. https://doi.org/10.1145/3604802

[5] Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, and Tiejun Li. 2024. "SparGD: A Sparse GEMM Accelerator with Dynamic Dataflow." ACM Trans. Des. Autom. Electron. Syst. 29, 2, Article 26 (March 2024), 32 pages. https://doi.org/10.1145/3634703

Kontakt

benedikt.schaible@tum.de

Betreuer:

Benedikt Schaible

Download Arbeit als PDF

Post-processing Flow-Layer Routing with Length-Matching Constraint for Flow-Based Microfluidic Biochips

Beschreibung

Here's a consolidated project description based on your provided information:

This project addresses the challenges in the current process of synthesizing microfluidic chips, particularly focusing on the gap in the complete synthesis flow which can lead to reduced performance, resource wastage, or infeasible designs. The general synthesis process typically involves three stages: high-level synthesis, followed by the design of the flow layer, and finally, the design of the control layer.

Current state-of-the-art synthesis methods, primarily operating at the operation- and device-level, make assumptions regarding the availability of fluid transportation paths. They often overlook the physical layout of control and flow channels and neglect the flow rate. This oversight can lead to biased scheduling of fluid transportation time during synthesis.

Our project proposes an innovative approach to bridge this gap. By considering the known physical design of microfluidic chips and the desired experiments, represented as sequence graphs, we aim to improve the physical design. The approach involves adjusting the lengths of the channels according to the required fluid volume. This adjustment is expected to reduce the number of valves and control ports in the original physical design, thereby enhancing the efficiency and feasibility of microfluidic chip designs.

Kontakt

m.lian@tum.de

Betreuer:

Meng Lian

Download Arbeit als PDF

Reliability-Aware Design Flow for Silicon Photonics On-Chip Interconnect

Beschreibung

Intercore communication in many-core processors presently faces scalability issues similar to those that plagued intracity telecommunications in the 1960s. Optical communication promises to address these challenges now, as then, by providing low latency, high bandwidth, and low power communication. Silicon photonic devices presently are vulnerable to fabrication and temperature-induced variability. Our fabrication and measurement results indicate that such variations degrade interconnection performance and, in extreme cases, the interconnection may fail to function at all. In this paper, we propose a reliability-aware design flow to address variation-induced reliability issues. To mitigate effects of variations, limits of device design techniques are analyzed and requirements from architecture-level design are revealed. Based on this flow, a multilevel reliability management solution is proposed, which includes athermal coating at fabrication-level, voltage tuning at device-level, as well as channel hopping at architecture-level. Simulation results indicate that our solution can fully compensate variations thereby sustaining reliable on-chip optical communication with power efficiency.

Kontakt

zhidan.zheng@tum.de

Betreuer:

Zhidan Zheng

Download Arbeit als PDF

Percolation on complex networks: Theory and application

Beschreibung

In the last two decades, network science has blossomed and influenced various fields, such as statistical physics, computer science, biology and sociology, from the perspective of the heterogeneous interaction patterns of components composing the complex systems. As a paradigm for random and semi-random connectivity, percolation model plays a key role in the development of network science and its applications. On the one hand, the concepts and analytical methods, such as the emergence of the giant cluster, the finite-size scaling, and the mean-field method, which are intimately related to the percolation theory, are employed to quantify and solve some core problems of networks. On the other hand, the insights into the percolation theory also facilitate the understanding of networked systems, such as robustness, epidemic spreading, vital node identification, and community detection. Meanwhile, network science also brings some new issues to the percolation theory itself, such as percolation of strong heterogeneous systems, topological transition of networks beyond pairwise interactions, and emergence of a giant cluster with mutual connections. So far, the percolation theory has already percolated into the researches of structure analysis and dynamic modeling in network science. Understanding the percolation theory should help the study of many fields in network science, including the still opening questions in the frontiers of networks, such as networks beyond pairwise interactions, temporal networks, and network of networks. The intention of this paper is to offer an overview of these applications, as well as the basic theory of percolation transition on network systems.

Kontakt

m.lian@tum.de

Betreuer:

Meng Lian

Download Arbeit als PDF

Physically Aware Wavelength-Routed Optical NoC Design for Customized Topologies with Parallel Switching Elements and Sequence-Based Models

Beschreibung

Abstract - The wavelength-routed optical network-on-chip (WRONoC) is a promising solution for system-on-chip designs. Recent work in the WRONoC topology designs mainly utilizes crossing switching elements (CSEs) as switching mechanisms on predefined templates. However, using CSEs incurs more microring resonator (MRR) usage and waveguide crossings than parallel switching elements (PSEs), and their predefined templates constrain the solution spaces. To remedy these disadvantages, we propose a fully automated topology design flow that utilizes PSE structures to reduce MRR usage and waveguide crossings. Our add-drop filter sequence model expands the solution space and leverages the advantage of the crossing-free PSE structure. Our fixed-node crossing-aware edge routing effectively minimizes the waveguide crossings, and our A*-search preserves the admissibility property and guarantees an optimal routing solution. Besides, our design flow thoroughly considers the physical layout information. Experimental results show that our design substantially outperforms state-of-the-art works on customized designs.

Kontakt

alex.truppel@tum.de

Betreuer:

Alexandre Truppel

To top

Lehrstuhl für Entwurfs- automatisierung

Prof. Dr.-Ing. Ulf Schlichtmann

Technische Universität München
Arcisstr. 21
80333 München

Tel: +49.89.289.23666
Fax: +49.89.289.63666
office.eda(at)xcit.tum.de