Seminare
Redistribution Layer Routing Algorithms
Beschreibung
The Redistribution Layer (RDL) is a critical component of advanced packaging technology in integrated circuits (ICs). It is an additional metal interconnect layer located between the chip and the package, allowing signals to be redistributed from their original pad locations to new positions to accommodate different packaging formats. Redistribution layers are commonly used for signal transmission among chips, and vias are used for communication among different layers. In this seminar topic, a survey of RDL routing algorithms, from the initial single-layer structure to the more advanced multi-layer structures, should be conducted.
Kontakt
jiahuipeng@tum.de
Betreuer:
Exploring RTL Coding Impacts and PPA Correlations Across ASIC and FPGA Targets
Beschreibung
Abstract:
The increasing reliance on generated RTL (Register-Transfer Level) designs, crafted in languages like SystemVerilog and VHDL, has streamlined development for both ASIC and FPGA targets. However, the physical design automation (PDA) process—spanning synthesis, placement, and routing—must adapt to ensure these designs meet power, performance, and area (PPA) goals across both platforms. This seminar proposes a research investigation into how RTL coding styles in SystemVerilog and VHDL influence physical design outcomes, and whether meaningful PPA correlations exist between FPGA and ASIC implementations of the same RTL. By analyzing automation workflows and leveraging experimental data, the study aims to uncover actionable insights for optimizing generated RTL designs, enhancing their portability and efficiency across target technologies.
Research Motivation:
Generated RTL, often produced via high-level synthesis or IP tools, is widely used for both ASICs and FPGAs, yet the impact of coding practices on physical design remains poorly understood. Differences in FPGA and ASIC toolchains and architectures suggest potential PPA discrepancies, raising questions about design portability and optimization strategies. This research tackles these gaps, offering a student the chance to explore a practical, industry-relevant problem at the intersection of RTL coding and physical design automation.
Research Objectives:
RTL Coding Analysis: Investigate how SystemVerilog and VHDL coding styles (e.g., modularity, hierarchy, constraints) affect synthesis and physical design outcomes for generated RTL.
PDA Workflow Evaluation: Assess the effectiveness of automation tools in translating RTL to physical layouts for ASIC and FPGA targets.
PPA Comparison: Measure and compare power, performance, and area metrics between FPGA and ASIC implementations of identical RTL designs.
Correlation Study: Identify if consistent PPA relationships exist across platforms and propose guidelines for RTL optimization based on findings.
References:
• RTL Architect: Physically-Aware RTL Analysis | Synopsys. (n.d.). https://www.synopsys.com/implementation-and-signoff/rtl-synthesis-test/rtl-architect.html
• X. Yao, Y. Wang, X. Li, Y. Lian, R. Chen, L. Chen, M. Yuan, H. Xu, and B. Yu, RTLRewriter: Methodologies for Large Models aided RTL Code Optimization, 2024. arXiv: 2409.11414 [cs.AR]. [Online]. Available: https://arxiv.org/abs/2409.11414.
• S. Y. Neyaz, I. Saxena, N. Alam and S. A. Rahman, "FPGA and ASIC Implementation and Comparison of Multipliers," 2020 International Symposium on Devices, Circuits and Systems (ISDCS), Howrah, India, 2020, pp. 1-4, doi: 10.1109/ISDCS49393.2020.9263027.
• Heyden, Malin. (2023). High Level Synthesis for ASIC and FPGA.
Kontakt
mohamed.badawy@infineon.com
Betreuer:
Modeling and Simulation of Silicon Photonics Systems in SystemVerilog/XMODEL
Beschreibung
Silicon photonics integrates both photonic and electronic components on the same silicon chip and promises ultra-dense, high-bandwidth interconnects via wavelength division multiplexing (WDM). However, when verifying such silicon photonic systems, existing IC simulators face challenges because the WDM signals contain multiple frequency tones at ~200 THz with ~50 GHz spacing. In this seminar, the student will investigate how to model silicon photonic elements and devices as equivalent multi-port transmission lines using XMODEL primitives and how to simulate the WDM link models in an efficient, event-driven fashion in SystemVerilog.
Kontakt
liaoyuan.cheng@tum.de
Betreuer:
SPICE-Compatible Modeling and Design for Electronic-Photonic Integrated Circuits
Beschreibung
Electronic-photonic integrated circuit (EPIC) technologies are revolutionizing computing systems by improving their performance and energy efficiency. However, simulating EPIC is challenging and time-consuming. In this seminar, the student will investigate the modeling method for EPIC.
Kontakt
liaoyuan.cheng@tum.de
Betreuer:
Dynamic Neural Networks
Beschreibung
Deep Neural Networks (DNNs) have shown high predictive performance on various tasks. However, the large compute requirements of DNNs restrict their potential deployment on embedded devices with limited resources.
Dynamic Neural Networks (DyNNs) are a class of neural networks that can adapt their structure, parameters, or computation graph based on input data. Unlike conventional DNNs, which have a fixed architecture once trained, DyNNs offer greater efficiency and adaptability. In particular, DyNNs can reduce latency, memory usage, and energy consumption during inference by activating only the necessary subset of their structure based on the difficulty of the input data.
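A minimal sketch of one common DyNN mechanism, the early exit: a cheap stage answers confident ("easy") inputs directly, and the expensive stage runs only when needed. The two-stage linear "network", the confidence threshold, and the toy data are illustrative assumptions, not taken from any specific paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "network": a cheap first stage and an expensive second stage,
# both mapping an 8-dim feature vector to 3 class scores.
W_cheap = rng.standard_normal((3, 8))
W_expensive = rng.standard_normal((3, 8))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict_with_early_exit(x, confidence_threshold=0.8):
    """Run the cheap stage first; invoke the expensive stage only
    if the cheap prediction is not confident enough."""
    p = softmax(W_cheap @ x)
    if p.max() >= confidence_threshold:
        return int(p.argmax()), "early exit"     # easy input: stop here
    p = softmax(W_cheap @ x + W_expensive @ x)   # hard input: full compute
    return int(p.argmax()), "full network"

label, path = predict_with_early_exit(rng.standard_normal(8))
print(label, path)
```

On average, the more inputs take the early exit, the fewer multiply-accumulates are spent per inference, which is exactly the latency/energy saving the paragraph above describes.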
The most recent survey paper [1] provides an overview of DyNN methods up to the year 2021. This seminar topic covers a literature review of more recent DyNN methods, with a focus on DyNNs for computer vision tasks (cf. Sections 2 & 3 in [1]) and their training methodologies (cf. Section 5 in [1]). You are expected to find 3-4 more recent papers on this topic and to review and compare their methods, including their advantages and drawbacks.
[1] Han, Yizeng, et al. "Dynamic neural networks: A survey." IEEE transactions on pattern analysis and machine intelligence 44.11 (2021): 7436-7456.
Kontakt
mikhael.djajapermana@tum.de
Betreuer:
Programmable Shape Memory Polymers and Their Integration into 4D-Printed Microfluidic Systems
Shape Memory Polymer, Microfluidic, Design Automation, 4D Printing
This seminar topic will explore the principles of programmable SMPs, their synthesis and characterization, the methodologies for integrating them into 4D-printed microfluidic systems, and, more importantly, how to consider the additional 4th dimension in the design automation process for printed microfluidic devices.
Beschreibung
Shape memory polymers (SMPs) are a class of smart materials capable of returning from a deformed state to their original shape upon exposure to specific stimuli, such as temperature changes. The paper "Shape memory polymer with programmable recovery onset" introduces an innovative SMP with a tunable recovery onset temperature, enabling precise control over the activation conditions of the shape recovery process. This advancement significantly broadens the potential applications of SMPs in various fields.
In the realm of microfluidics, the integration of SMPs offers promising opportunities for the development of adaptive and responsive devices. 3D printing technologies have revolutionized the fabrication of microfluidic systems, allowing for rapid prototyping and complex geometries that were previously challenging to achieve. The combination of programmable SMPs with 3D-printed microfluidic devices paves the way for novel 4D-printed microfluidics, creating components such as valves, pumps, and actuators that can dynamically respond to environmental stimuli.
The integration of SMPs into 4D-printed microfluidic devices enhances design by introducing elements that can change shape or function in response to specific triggers, thereby reducing the need for external controls and simplifying device architectures. For instance, SMP-based valves can be designed to open or close at predetermined temperatures, enabling automated flow regulation within microfluidic channels. This self-actuating behavior can be leveraged to design more efficient and autonomous lab-on-a-chip systems.
Furthermore, the use of 3D printing in fabricating these SMP-integrated microfluidic devices offers unparalleled design flexibility and rapid prototyping capabilities. Techniques such as stereolithography (SLA) have been employed to create intricate microfluidic components with integrated functionalities. For example, SLA has been used to print fluidic valves and pumps in optically clear, biocompatible plastics, facilitating the development of user-friendly fluid automation devices that can replace costly robotic pipettors or manual pipetting processes.
Kontakt
Yushen.Zhang+Seminar@tum.de
Betreuer:
The Dark Art of Embedded Unsafe Code: Arguments in Favor of Breaking Rust's Safety Requirements
Beschreibung
The development of efficient and reliable code for embedded systems necessitates a delicate balance between speed and safety. Programming languages like Rust have been designed to facilitate the achievement of these objectives, with a particular emphasis on memory safety. However, certain interactions with the underlying system may require the use of unsafe code, which can compromise the integrity of the system.
Research has shown that the judicious use of unsafe code blocks is crucial in writing high-quality code [1, 2]. A comprehensive understanding of unsafe Rust patterns can therefore enable developers to properly handle these unsafe segments, thereby mitigating the associated risks. The Rustonomicon [3] serves as a testament to the complexity of unsafe Rust programming, highlighting the need for a nuanced approach to dealing with such code.
This seminar aims to investigate the current perceptions surrounding the use of unsafe code in embedded systems. Through a critical analysis of existing literature and case studies, participants will explore the scenarios in which unsafe code is unavoidable and those in which it is employed as a workaround for performance limitations. By examining the intricacies of unsafe Rust programming, students will gain a deeper understanding of the Rust language and the challenges associated with ensuring code safety and performance in embedded systems.
Bibliography
[1] ASTRAUSKAS, Vytautas, et al. How do programmers use unsafe rust?. Proceedings of the ACM on Programming Languages, 2020, 4. Jg., Nr. OOPSLA, S. 1-27. https://dl.acm.org/doi/abs/10.1145/3428204
[2] ZHANG, Yuchen, et al. On the dual nature of necessity in use of Rust unsafe code. In: Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering. 2023. S. 2032-2037. https://dl.acm.org/doi/pdf/10.1145/3611643.3613878
[3] The Rustonomicon, https://doc.rust-lang.org/nomicon/
Kontakt
Betreuer:
Safe, Safer, Rust? Safe Cross Compilation of Embedded C code to Rust
Beschreibung
Embedded software development poses significant challenges in software engineering, particularly in ensuring the reliability and safety of complex systems. To address these challenges, various tooling solutions have been developed to facilitate the creation of code that meets stringent requirements. A key concern in this context is memory safety, as memory access violations can lead to unpredictable and undesirable behavior.
Embedded code has traditionally been written in C, which means that a large body of legacy code exists in C but not in Rust. While this C code can by now be assumed to be memory safe, thanks to continuous hardening efforts over the years, new code might introduce new issues into legacy software. For this reason, the trend has shifted towards writing embedded code in Rust to achieve memory safety. The Rust programming language and its ecosystem have been designed to meet this challenge by enforcing strict memory safety guarantees. To achieve this, Rust imposes limitations on its feature set and requires developers to explicitly define and manage concepts such as lifetimes and ownership.
However, transpiling existing legacy C code to Rust poses significant challenges due to the differences in language design and safety guarantees. C code often employs pointer manipulation and direct memory access without additional checks, which can lead to safety violations. To overcome these challenges, it is essential to understand the limitations and mitigation strategies required for transpiling C code to Rust.
This seminar aims to conduct a comprehensive literature review to identify and evaluate existing research on safe subsets of common language features in Rust and C. Students will analyze and extract the common safe features of both languages and investigate techniques for transforming potentially unsafe code into safe code. By exploring the intersection of Rust and C, this seminar seeks to contribute to the development of more reliable and secure embedded software systems.
Bibliography
[1] LING, Michael, et al. In Rust we trust: a transpiler from unsafe C to safer Rust. In: Proceedings of the ACM/IEEE 44th international conference on software engineering: companion proceedings. 2022. S. 354-355. https://dl.acm.org/doi/pdf/10.1145/3510454.3528640
[2] FROMHERZ, Aymeric; PROTZENKO, Jonathan. Compiling C to Safe Rust, Formalized. arXiv preprint arXiv:2412.15042, 2024. https://arxiv.org/pdf/2412.15042
[3] ZHANG, Hanliang, et al. Ownership guided C to Rust translation. In: International Conference on Computer Aided Verification. Cham: Springer Nature Switzerland, 2023. S. 459-482. https://komaec.github.io/files/ownership.pdf
Kontakt
Betreuer:
Power Estimation for DRAM
Beschreibung
Precise power estimation is getting increasingly important, not only to optimize battery usage, but also for cooling large computing clusters. Along all integrated circuit design stages, power modeling is required. There are many factors that influence power: circuit structure, workload, technology node. Especially in early design stages, a precise power estimation would allow for more efficient design space exploration.
While many methods focus on the power dissipation of combinational logic or on the system level, memory structures need further consideration. In system-on-chip designs, the majority of energy consumption is due to memory accesses.
In this project, a survey of current methodologies for DRAM power modeling should be conducted. A focus should be set on the DRAMPower framework of RPTU Kaiserslautern [1].
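The basic idea behind command-level DRAM power models such as DRAMPower can be sketched in a few lines: energy is accumulated per DRAM command from datasheet current and timing values. The voltage, currents, durations, and the trace below are illustrative assumptions, not real datasheet figures and not DRAMPower's actual model:

```python
# Back-of-the-envelope DRAM energy estimate in the spirit of
# command-level models: E = V * I * t, summed over every command
# in a trace. All numbers are assumed toy values.

VDD = 1.2            # supply voltage [V]
T_CK = 1.25e-9       # clock period [s]

# (current [A], duration in clock cycles) per command type
command_model = {
    "ACT": (0.060, 14),   # row activate
    "PRE": (0.050, 14),   # precharge
    "RD":  (0.180, 4),    # burst read
    "WR":  (0.185, 4),    # burst write
}

def trace_energy(trace):
    """Sum V * I * t over all commands in a command trace."""
    return sum(VDD * command_model[cmd][0] * command_model[cmd][1] * T_CK
               for cmd in trace)

trace = ["ACT", "RD", "RD", "PRE", "ACT", "WR", "PRE"]
print(f"{trace_energy(trace) * 1e9:.3f} nJ")   # → 7.890 nJ
```

Real models additionally account for background power, state-dependent currents, and bank-level parallelism, which is where the surveyed methodologies differ.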
[1] DRAMPower: Open-source DRAM Power & Energy Estimation Tool
Karthik Chandrasekar, Christian Weis, Yonghui Li, Sven Goossens, Matthias Jung, Omar Naji, Benny Akesson, Norbert Wehn, and Kees Goossens
URL: http://www.drampower.info
Kontakt
If you are interested in this topic, send me an email to: philipp.fengler@tum.de
Betreuer:
Placement of Systolic Arrays for Neural Network Accelerators
Beschreibung
Systolic arrays are a proven architecture for parallel processing across various applications, offering design flexibility, scalability, and high efficiency. With the growing importance of neural networks in many areas, there is a need for efficient processing of the underlying computations, such as matrix multiplications and convolutions. These computations can be executed with a high degree of parallelism on neural network accelerators utilizing systolic arrays.
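To make the dataflow concrete, here is a small functional simulation of an output-stationary systolic array computing a matrix product: each PE owns one output element, and operands arrive with the diagonal skew used in real arrays. This is an illustrative sketch of the computation pattern, not a cycle-accurate hardware model:

```python
import numpy as np

def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing C = A @ B.
    PE (i, j) accumulates C[i, j]; at cycle t it sees the operands for
    reduction index s = t - i - j, reflecting the skewed operand feed."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))
    for t in range(n + m + k - 2):        # total cycles until the last PE finishes
        for i in range(n):
            for j in range(m):
                s = t - i - j             # which operand pair arrives this cycle
                if 0 <= s < k:
                    C[i, j] += A[i, s] * B[s, j]
    return C

A = np.arange(6).reshape(2, 3)
B = np.arange(6).reshape(3, 2)
print(systolic_matmul(A, B))              # matches A @ B
```

The nested loops make the regularity visible: every PE performs the same multiply-accumulate each cycle, which is exactly the structure that placement algorithms can exploit.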
Just like any application-specific integrated circuit (ASIC) or field-programmable gate array (FPGA) design, neural network accelerators go through the standard phases of chip design. However, treating systolic array hardware designs the same way as any other design may lead to suboptimal results, as exploiting the regular structure of systolic arrays can yield better solution quality [1].
Relevant works for this seminar topic include the work of Fang et al. [2], where a regular placement is used as an initial solution and then iteratively improved using the RePlAce [3] placement algorithm. The placement of systolic arrays on FPGAs is discussed by Hu et al. [4], where the processing elements of the systolic array are placed on the DSP columns in a manner that is more efficient than the default placement of commercial tools.
In this seminar, you will investigate different macro and cell placement approaches, focusing on methods that specifically consider systolic array placement. If you have questions regarding this topic, please feel free to contact me.
[1] S. I. Ward et al., "Structure-Aware Placement Techniques for Designs With Datapaths," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 2, pp. 228-241, Feb. 2013, doi: https://doi.org/10.1109/TCAD.2012.2233862
[2] D. Fang, B. Zhang, H. Hu, W. Li, B. Yuan and J. Hu, "Global Placement Exploiting Soft 2D Regularity". in ACM Transactions on Design Automation of Electronic Systems, vol. 30, no. 2, pp. 1-21, Jan. 2025, doi: https://doi.org/10.1145/3705729
[3] C. -K. Cheng, A. B. Kahng, I. Kang and L. Wang, "RePlAce: Advancing Solution Quality and Routability Validation in Global Placement," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 9, pp. 1717-1730, Sept. 2019, doi: https://doi.org/10.1109/TCAD.2018.2859220
[4] H. Hu, D. Fang, W. Li, B. Yuan and J. Hu, "Systolic Array Placement on FPGAs," 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, CA, USA, 2023, pp. 1-9, doi: https://doi.org/10.1109/ICCAD57390.2023.10323742
Kontakt
benedikt.schaible@tum.de
Betreuer:
Sub-Gate-Level Power Estimation
Beschreibung
Precise power estimation is getting increasingly important, not only to optimize battery usage, but also for cooling large computing clusters. Along all integrated circuit design stages, power modeling is required. There are many factors that influence power: circuit structure, workload, technology node. Especially in early design stages, a precise power estimation would allow for more efficient design space exploration.
Artificial intelligence for electronic design automation has been getting more and more attention over the last decade, and power modeling is affected by this trend as well. However, AI is mainly based on data-driven modeling approaches, which raises the question: what is a precise ground truth for power modeling? In many papers, commercial power estimation tools fed with data from gate-level, timing-annotated simulation are named as the "gold standard". However, it is questionable whether effects that are only observable at sub-gate levels, as in SPICE simulations at the physical layer, are captured by this ground truth.
In this seminar topic, a survey on power estimation with data from sub-gate design levels should be made. The obstacles and the gain in precision of power estimates should be discussed.
Voraussetzungen
- Interest in integrated circuit design flow (especially low design levels)
- Knowledge on power dissipation of CMOS transistors
- Familiar with SPICE simulation
Kontakt
If you are interested in this seminar topic, write a mail to: philipp.fengler@tum.de
Betreuer:
Thermal-aware Optical-electrical Routing Codesign for On-chip Signal Communications
Beschreibung
Abstract - The optical interconnection is a promising solution for on-chip signal communication in modern system-on-chip (SoC) and heterogeneous integration designs, providing large bandwidth and high-speed transmission with low power consumption. Previous works do not handle two main issues for on-chip optical-electrical (O-E) co-design: the thermal impact during O-E routing and the trade-offs among power consumption, wirelength, and congestion. As a result, the thermal-induced band shift might incur transmission malfunction; the power consumption estimation is inaccurate; thus, only suboptimal results are obtained. To remedy these disadvantages, we present a thermal-aware optical-electrical routing co-design flow to minimize power consumption, thermal impact, and wirelength. Experimental results based on the ISPD 2019 contest benchmarks show that our co-design flow significantly outperforms state-of-the-art works in power consumption, thermal impact, and wirelength.
Kontakt
alex.truppel@tum.de
Betreuer:
Lithium tantalate photonic integrated circuits for volume manufacturing
Beschreibung
Electro-optical photonic integrated circuits (PICs) based on lithium niobate (LiNbO3) have demonstrated the vast capabilities of materials with a high Pockels coefficient [1,2]. They enable linear and high-speed modulators operating at complementary metal–oxide–semiconductor voltage levels [3] to be used in applications including data-centre communications [4], high-performance computing and photonic accelerators for AI [5]. However, industrial use of this technology is hindered by the high cost per wafer and the limited wafer size. The high cost results from the lack of existing high-volume applications in other domains of the sort that accelerated the adoption of silicon-on-insulator (SOI) photonics, which was driven by vast investment in microelectronics. Here we report low-loss PICs made of lithium tantalate (LiTaO3), a material that has already been adopted commercially for 5G radiofrequency filters [6] and therefore enables scalable manufacturing at low cost, and it has equal, and in some cases superior, properties to LiNbO3. We show that LiTaO3 can be etched to create low-loss (5.6 dB m−1) PICs using a deep ultraviolet (DUV) stepper-based manufacturing process [7]. We demonstrate a LiTaO3 Mach–Zehnder modulator (MZM) with a half-wave voltage–length product of 1.9 V cm and an electro-optic bandwidth of up to 40 GHz. In comparison with LiNbO3, LiTaO3 exhibits a much lower birefringence, enabling high-density circuits and broadband operation over all telecommunication bands. Moreover, the platform supports the generation of soliton microcombs. Our work paves the way for the scalable manufacture of low-cost and large-volume next-generation electro-optical PICs.
Kontakt
zhidan.zheng@tum.de
Betreuer:
On Memory Optimization of Tensor Programs
In this seminar the student will review state-of-the-art memory-aware optimization techniques applied to tensor-level AI programs.
Beschreibung
Compact electronic edge devices have limited memory resources. As AI models can require large amounts of memory, running AI models on edge devices becomes challenging. Thus, optimizing AI programs that can be deployed on edge devices is necessary while saving costly memory transfers.
This need has motivated current works exploring different memory-aware optimization techniques that reduce memory utilization but do not modify the DNN parameters (as during compression or network architecture search (NAS)), such as fused tiling, memory-aware scheduling, and memory layout planning [1]. For instance, DORY (Deployment Oriented to memoRY) is an automated tool designed for deploying deep neural networks (DNNs) on low-cost microcontroller units with less than 1MB of on-chip SRAM memory. It tackles the challenge of tiling by framing it as a Constraint Programming (CP) problem, aiming to maximize the utilization of L1 memory while adhering to the topological constraints of each DNN layer. DORY then generates ANSI C code to manage the transfers between off-chip and on-chip memory and the computation phases [2]. DORY has been integrated with TVM to ease the support for heterogeneous compilation and offloading operations not supported by the accelerator to a regular host CPU [3].
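The tiling question that DORY solves with constraint programming can be illustrated with a brute-force toy version: pick tile sizes for a matmul layer so that the input, weight, and output tiles together fit into L1, while using as much of L1 as possible. The layer dimensions, int8 data type, and the 64 kB budget are assumed values, and real tools add per-layer topological constraints on top:

```python
# Toy tiling search: maximize L1 utilization subject to the tiles fitting.
L1_BYTES = 64 * 1024
M, K, N = 128, 256, 128          # layer: (M x K) @ (K x N), int8 operands

def tile_footprint(tm, tn):
    # input tile (tm x K) + weight tile (K x tn) + output tile (tm x tn)
    return tm * K + K * tn + tm * tn

def best_tiling():
    best = None
    for tm in range(1, M + 1):
        for tn in range(1, N + 1):
            size = tile_footprint(tm, tn)
            if size <= L1_BYTES and (best is None or size > best[0]):
                best = (size, tm, tn)
    return best

size, tm, tn = best_tiling()
print(f"tile {tm}x{tn}, L1 use {100 * size / L1_BYTES:.1f}%")
```

A constraint solver replaces this exhaustive loop when the space grows (K tiling, multiple memory levels, DMA double buffering), but the objective is the same: keep the tiles resident in L1 to avoid costly off-chip transfers.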
This seminar topic reviews state-of-the-art approaches for memory-aware optimization techniques of ML tensor programs targeting constrained edge devices. The different methods and results shall be reviewed and compared.
References:
[1] Rafael Christopher Stahl. Code Optimization and Generation of Machine Learning and Driver Software for Memory-Constrained Edge Devices. 2024. Technical University of Munich, PhD Thesis. URL: https://mediatum.ub.tum.de/doc/1730282/1730282.pdf
[2] A. Burrello, A. Garofalo, N. Bruschi, G. Tagliavini, D. Rossi and F. Conti, "DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs," in IEEE Transactions on Computers, vol. 70, no. 8, pp. 1253-1268, 2021, https://doi.org/10.1109/TC.2021.3066883
[3] Van Delm, Josse, et al. "HTVM: Efficient neural network deployment on heterogeneous TinyML platforms." 2023 60th ACM/IEEE Design Automation Conference (DAC). IEEE, 2023. https://doi.org/10.1109/DAC56929.2023.10247664
Kontakt
Andrew.stevens@infineon.com
Daniela.sanchezlopera@infineon.com
Betreuer:
On-device learning
In this seminar the student will review state-of-the-art contributions to the on-device learning research area.
Beschreibung
TinyML is a research area aiming to bring machine learning models to resource-constrained IoT devices and microcontrollers. Current research mainly focuses on enabling inference on such devices, tackling challenges such as limited memory and computation resources available. But for specific sensing and IoT applications, on-device learning would allow retraining and refining ML models directly on small and low-power devices. However, on-device learning on edge devices is much more challenging than inference due to larger memory footprints and increased computing operations to store intermediate activations and gradients [1].
To tackle those challenges, different strategies involving, among others, quantization, sparse backpropagation, or new layer types have been proposed and summarized [1, 2]. This seminar will review state-of-the-art approaches for on-device learning techniques targeting constrained edge devices. The different methods and results shall be reviewed and compared.
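One source of the memory gap between inference and training can be shown with a simple activation-memory count: full backpropagation must keep the input activation of every layer, while a common on-device strategy, updating only the last layer(s), needs far fewer. The layer sizes, batch size, and the simplification of counting only stored activations (ignoring gradients and optimizer state) are assumptions for illustration:

```python
# Compare stored-activation memory for full backprop vs. training only
# the last layers of a small MLP. Toy layer sizes, float32, batch of 32.
layer_sizes = [(784, 256), (256, 128), (128, 10)]   # (in_dim, out_dim) per layer
BATCH = 32
BYTES = 4  # float32

def activation_memory(train_layers):
    """Bytes of layer-input activations that must be kept for backprop
    when only the last `train_layers` layers are updated."""
    needed = layer_sizes[len(layer_sizes) - train_layers:]
    return sum(BATCH * in_dim * BYTES for in_dim, _ in needed)

full = activation_memory(len(layer_sizes))
last_only = activation_memory(1)
print(f"full backprop: {full / 1024:.0f} KiB, last layer only: {last_only / 1024:.0f} KiB")
```

Techniques such as sparse backpropagation generalize this idea by selecting which tensors (or parts of tensors) are worth the memory cost of being trainable.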
References:
[1] J. Lin, L. Zhu, W. -M. Chen, W. -C. Wang and S. Han, "Tiny Machine Learning: Progress and Futures [Feature]," in IEEE Circuits and Systems Magazine, vol. 23, no. 3, pp. 8-34, 2023, https://doi.org/10.1109/MCAS.2023.3302182
[2] Shuai Zhu, Thiemo Voigt, Fatemeh Rahimian, and JeongGil Ko. 2024. On-device Training: A First Overview on Existing Systems. ACM Trans. Sen. Netw. Just Accepted (September 2024). https://doi.org/10.1145/3696003
Kontakt
Betreuer:
Innovative Memory Architectures in DNN Accelerators
Beschreibung
With the growing complexity of neural networks, more efficient and faster processing solutions are vital to enable the widespread use of artificial intelligence. Systolic arrays are among the most popular architectures for energy-efficient and high-throughput DNN hardware accelerators.
While many works implement DNN accelerators using systolic arrays on FPGAs, several (ASIC) designs from industry and academia have been presented [1-3]. To fulfill the requirements that such accelerators place on memory accesses, both in terms of data availability and latency hiding, innovative memory architectures can enable more efficient data access, reducing latency and bridging the gap towards even more powerful DNN accelerators.
One example is the Eyeriss v2 ASIC [1], which uses a distributed Global Buffer (GB) layout tailored to the demands of their row-stationary systolic array dataflow.
In this seminar, a survey of state-of-the-art DNN accelerator designs and design frameworks shall be created, focusing on their memory hierarchy.
References and Further Resources:
[1] Y. -H. Chen, T. -J. Yang, J. Emer and V. Sze, "Eyeriss v2: A Flexible Accelerator for Emerging Deep Neural Networks on Mobile Devices," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 9, no. 2, pp. 292-308, June 2019, doi: https://doi.org/10.1109/JETCAS.2019.2910232
[2] Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2016. "DianNao family: energy-efficient hardware accelerators for machine learning." In Commun. ACM 59, 11 (November 2016), 105–112. https://doi.org/10.1145/2996864
[3] Norman P. Jouppi, Cliff Young, Nishant Patil, David Patterson, et al. 2017. "In-Datacenter Performance Analysis of a Tensor Processing Unit." In Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17). Association for Computing Machinery, New York, NY, USA, 1–12. https://doi.org/10.1145/3079856.3080246
[4] Rui Xu, Sheng Ma, Yang Guo, and Dongsheng Li. 2023. A Survey of Design and Optimization for Systolic Array-based DNN Accelerators. ACM Comput. Surv. 56, 1, Article 20 (January 2024), 37 pages. https://doi.org/10.1145/3604802
[5] Bo Wang, Sheng Ma, Shengbai Luo, Lizhou Wu, Jianmin Zhang, Chunyuan Zhang, and Tiejun Li. 2024. "SparGD: A Sparse GEMM Accelerator with Dynamic Dataflow." ACM Trans. Des. Autom. Electron. Syst. 29, 2, Article 26 (March 2024), 32 pages. https://doi.org/10.1145/3634703
Kontakt
benedikt.schaible@tum.de
Betreuer:
Design Space Exploration Methods for Neural Network Accelerators
Beschreibung
The efficiency of an accelerator depends on three factors—mapping, deep neural network (DNN) layers, and hardware. The process of hardware design space exploration requires both hardware parameters and mappings from the algorithm onto the target hardware to be discovered and optimized.
This project aims to identify the most prominent approaches and compare them.
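In its simplest form, hardware design space exploration is an enumeration of hardware parameters and mapping choices scored by an analytical cost model. The sketch below shows that skeleton; the parameter ranges, the utilization figures, and the cost model itself are invented illustrative values, whereas real DSE frameworks use validated models and smarter search than exhaustive enumeration:

```python
import itertools

MACS = 1_000_000                 # workload size of one DNN layer (multiply-accumulates)
AREA_BUDGET = 12.0               # arbitrary area units

def cost_model(pes, buf_kb, mapping):
    """Toy analytical model: latency from PE count and utilization,
    area from PEs and buffers. All coefficients are assumptions."""
    util = {"weight_stationary": 0.85, "output_stationary": 0.75}[mapping]
    if buf_kb < 32:              # small buffers stall the array
        util *= 0.6
    latency = MACS / (pes * util)
    area = pes * 0.01 + buf_kb * 0.05
    return latency, area

feasible = []
for pes, buf_kb, mapping in itertools.product(
        [64, 256, 1024],                               # PE counts
        [16, 32, 64],                                  # on-chip buffer sizes [kB]
        ["weight_stationary", "output_stationary"]):   # mapping choice
    latency, area = cost_model(pes, buf_kb, mapping)
    if area <= AREA_BUDGET:                            # keep only feasible points
        feasible.append((latency, (pes, buf_kb, mapping)))

best_latency, best_cfg = min(feasible)
print(best_cfg, round(best_latency))
```

The approaches to be surveyed differ mainly in how they replace the exhaustive loop (random search, evolutionary algorithms, reinforcement learning, gradient-based methods) and in the fidelity of the cost model.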
Kontakt
samira.ahmadifarsani@tum.de
Betreuer:
Comparative Study of Hardware Architectures for Neural Network Accelerators
Beschreibung
This literature review will focus on exploring and comparing different hardware architectures designed specifically for neural network accelerators, examining how each architecture is optimized for specific neural network tasks (e.g., convolutional neural networks (CNNs)).
The study could highlight the trade-offs between various design choices, such as parallelism, memory hierarchy, dataflow, flexibility, and integration with CPUs.
Kontakt
samira.ahmadifarsani@tum.de
Betreuer:
Post-processing Flow-Layer Routing with Length-Matching Constraint for Flow-Based Microfluidic Biochips
Beschreibung
This project addresses the challenges in the current process of synthesizing microfluidic chips, particularly focusing on the gap in the complete synthesis flow which can lead to reduced performance, resource wastage, or infeasible designs. The general synthesis process typically involves three stages: high-level synthesis, followed by the design of the flow layer, and finally, the design of the control layer.
Current state-of-the-art synthesis methods, primarily operating at the operation- and device-level, make assumptions regarding the availability of fluid transportation paths. They often overlook the physical layout of control and flow channels and neglect the flow rate. This oversight can lead to biased scheduling of fluid transportation time during synthesis.
Our project proposes an innovative approach to bridge this gap. By considering the known physical design of microfluidic chips and the desired experiments, represented as sequence graphs, we aim to improve the physical design. The approach involves adjusting the lengths of the channels according to the required fluid volume. This adjustment is expected to reduce the number of valves and control ports in the original physical design, thereby enhancing the efficiency and feasibility of microfluidic chip designs.
Kontakt
m.lian@tum.de
Betreuer:
Pre-training Network Pruning
Beschreibung
In this seminar, the student will review state-of-the-art pruning techniques applied before training, such as SNIP.
“Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. Conventionally, pruning is done within an iterative optimization procedure, either with heuristically designed pruning schedules or additional hyperparameters during training, or using statistical heuristics after training. However, using suitable heuristic criteria inspired by the “Lottery Ticket” hypothesis, networks can also be pruned before training. This eliminates the need for both pretraining and complex pruning schedules, and is well suited for use in combination with neural architecture search, making it robust to architecture variations. The canonical method SNIP [1] introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. These methods can obtain extremely sparse networks and are claimed to retain the same accuracy as the reference network on benchmark classification tasks.” As such, pre-training pruning methods are a potentially highly attractive alternative to post-training and training-time co-optimization methods for use in automated industrial machine-learning deployment toolchains.
References:
[1] Lee, Namhoon, Thalaiyasingam Ajanthan, and Philip H. S. Torr. “SNIP: Single-shot network pruning based on connection sensitivity.” arXiv 2018. https://arxiv.org/abs/1810.02340
[2] Artem Vysogorets and Julia Kempe. “Connectivity Matters: Neural Network Pruning Through the Lens of Effective Sparsity.” https://www.jmlr.org/papers/volume24/22-0415/22-0415.pdf
[3] Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, and Michael Carbin. “Pruning Neural Networks at Initialization: Why Are We Missing the Mark?” ICLR 2021. https://arxiv.org/abs/2009.08576
[4] Pau de Jorge, Amartya Sanyal, Harkirat S. Behl, Philip H. S. Torr, Gregory Rogez, and Puneet K. Dokania. “Progressive Skeletonization: Trimming More Fat from a Network at Initialization.” https://arxiv.org/abs/2006.09081
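SNIP's connection-sensitivity criterion [1] can be sketched in a few lines. The following is a simplified, framework-free illustration (the helper name `snip_mask` and the toy gradient are our own); a real implementation obtains the gradients via automatic differentiation on one or a few mini-batches of the task's data:

```python
import numpy as np

def snip_mask(weights, grads, sparsity):
    """Single-shot pruning mask in the spirit of SNIP [1]: the saliency
    of a connection is |weight * gradient|, i.e., the magnitude of the
    loss gradient w.r.t. a multiplicative gate on that connection.
    The top (1 - sparsity) fraction of connections is kept."""
    saliency = np.abs(weights * grads)
    flat = np.sort(saliency.ravel())[::-1]           # scores, descending
    keep = max(1, int(round((1.0 - sparsity) * flat.size)))
    threshold = flat[keep - 1]                       # k-th largest score
    return saliency >= threshold                     # boolean keep-mask
```

The mask is computed once, before training, and the surviving connections are then trained as usual; no pruning schedule or sensitivity re-estimation is needed.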
Kontakt
Andrew.stevens@infineon.com
Daniela.sanchezlopera@infineon.com
Betreuer:
Checksum-based Error Detection for Reliable Computing
Beschreibung
In safety-critical systems, random hardware faults, such as transient soft errors (e.g., due to radiation) or permanent circuit faults, can lead to disastrous failures. Detecting these errors is therefore a major design goal. The state-of-the-art solution is redundancy: a computation is performed multiple times and the respective results are compared. This can be done either sequentially (temporal redundancy) or simultaneously, e.g., through lock-stepped computational units (spatial redundancy). The underlying assumption is that a second fault does not occur in close vicinity to the first.
However, this redundancy introduces significant overhead: the computation requires a multiple of the original resources, whether execution time or processing nodes. Checksum-based computation aims to reduce this overhead by introducing redundancy into the algorithms themselves, e.g., filter and input checksums for convolution algorithms.
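The idea can be illustrated with the classic checksum-augmented matrix multiplication from algorithm-based fault tolerance, a close relative of the convolution checksums mentioned above. This is a minimal sketch (function name and tolerance are our own); it detects a fault in the product but does not localize or correct it:

```python
import numpy as np

def checked_matmul(A, B, tol=1e-9):
    """Checksum-protected matrix product: augment A with a
    column-checksum row and B with a row-checksum column. If the
    multiplication is fault-free, the checksums of the result must
    match the sums of its rows and columns."""
    Ac = np.vstack([A, A.sum(axis=0)])                  # column checksums
    Br = np.hstack([B, B.sum(axis=1, keepdims=True)])   # row checksums
    C = Ac @ Br                                         # one augmented product
    body = C[:-1, :-1]                                  # the actual result A @ B
    ok = (np.allclose(C[-1, :-1], body.sum(axis=0), atol=tol)
          and np.allclose(C[:-1, -1], body.sum(axis=1), atol=tol))
    return body, ok
```

A single corrupted element of the product breaks both its row and its column checksum, so the error is caught at the cost of one extra row and column rather than a full duplicate computation.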
Betreuer:
Reliability-Aware Design Flow for Silicon Photonics On-Chip Interconnect
Beschreibung
Intercore communication in many-core processors presently faces scalability issues similar to those that plagued intracity telecommunications in the 1960s. Optical communication promises to address these challenges now, as then, by providing low-latency, high-bandwidth, and low-power communication. Silicon photonic devices, however, are presently vulnerable to fabrication- and temperature-induced variability. Our fabrication and measurement results indicate that such variations degrade interconnect performance and, in extreme cases, the interconnect may fail to function at all. In this paper, we propose a reliability-aware design flow to address variation-induced reliability issues. To mitigate the effects of variations, the limits of device-level design techniques are analyzed and the requirements on architecture-level design are derived. Based on this flow, a multilevel reliability management solution is proposed, which includes athermal coating at the fabrication level, voltage tuning at the device level, and channel hopping at the architecture level. Simulation results indicate that our solution can fully compensate for these variations, thereby sustaining reliable on-chip optical communication in a power-efficient manner.
Kontakt
zhidan.zheng@tum.de
Betreuer:
Percolation on complex networks: Theory and application
Beschreibung
In the last two decades, network science has blossomed and influenced various fields, such as statistical physics, computer science, biology, and sociology, from the perspective of the heterogeneous interaction patterns of the components composing complex systems. As a paradigm for random and semi-random connectivity, the percolation model plays a key role in the development of network science and its applications.

On the one hand, concepts and analytical methods intimately related to percolation theory, such as the emergence of the giant cluster, finite-size scaling, and the mean-field method, are employed to quantify and solve core problems of networks. On the other hand, insights from percolation theory also facilitate the understanding of networked systems, including robustness, epidemic spreading, vital-node identification, and community detection. Meanwhile, network science also poses new questions to percolation theory itself, such as percolation in strongly heterogeneous systems, topological transitions of networks beyond pairwise interactions, and the emergence of a giant cluster with mutual connections.

By now, percolation theory has percolated into research on both structure analysis and dynamic modeling in network science. Understanding percolation theory should therefore aid the study of many fields in network science, including the still-open questions at its frontiers, such as networks beyond pairwise interactions, temporal networks, and networks of networks. The intention of this paper is to offer an overview of these applications, as well as the basic theory of percolation transitions on network systems.
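The emergence of the giant cluster mentioned above can be reproduced with a small simulation (an illustrative sketch of our own, not from the paper): in an Erdős–Rényi random graph G(n, p), a giant component appears once the mean degree c ≈ p·n exceeds 1.

```python
import random

def giant_component_fraction(n, p, seed=0):
    """Fraction of nodes in the largest connected cluster of an
    Erdős–Rényi graph G(n, p), computed with union-find.
    Percolation theory predicts a giant cluster once p*(n-1) > 1."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):                      # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):                # each possible edge exists w.p. p
        for j in range(i + 1, n):
            if rng.random() < p:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / n
```

Below the threshold (mean degree 0.5) the largest cluster is a vanishing fraction of the network; above it (mean degree 3) it spans most of the nodes, which is the phase transition at the heart of the robustness and epidemic-spreading applications listed above.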
Kontakt
m.lian@tum.de
Betreuer:
Physically Aware Wavelength-Routed Optical NoC Design for Customized Topologies with Parallel Switching Elements and Sequence-Based Models
Beschreibung
Abstract - The wavelength-routed optical network-on-chip (WRONoC) is a promising solution for system-on-chip designs. Recent work on WRONoC topology design mainly utilizes crossing switching elements (CSEs) as the switching mechanism on predefined templates. However, using CSEs incurs higher microring resonator (MRR) usage and more waveguide crossings than parallel switching elements (PSEs), and the predefined templates constrain the solution space. To remedy these disadvantages, we propose a fully automated topology design flow that utilizes PSE structures to reduce MRR usage and waveguide crossings. Our add-drop filter sequence model expands the solution space and leverages the advantage of the crossing-free PSE structure. Our fixed-node crossing-aware edge routing effectively minimizes waveguide crossings, and our A*-search preserves the admissibility property, guaranteeing an optimal routing solution. Moreover, our design flow thoroughly considers the physical-layout information. Experimental results show that our design substantially outperforms state-of-the-art works on customized designs.
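The role of admissibility in the A*-search can be illustrated with a generic grid-routing sketch (ours, not the authors' implementation; it uses plain Manhattan distance with unit step costs and ignores MRR and crossing costs): because the heuristic never overestimates the remaining cost, the first time the goal is expanded the path found is guaranteed optimal.

```python
import heapq

def astar_route(start, goal, blocked, width, height):
    """A* shortest path on a unit-cost grid. Manhattan distance is
    admissible here (each step costs at least 1), so the returned
    path has minimum length, or None if no path exists."""
    def h(p):
        return abs(p[0] - goal[0]) + abs(p[1] - goal[1])

    frontier = [(h(start), 0, start, None)]   # (f = g + h, g, node, prev)
    came = {}                                 # node -> predecessor, once expanded
    best_g = {start: 0}
    while frontier:
        f, g, node, prev = heapq.heappop(frontier)
        if node in came:                      # already expanded with smaller g
            continue
        came[node] = prev
        if node == goal:                      # first expansion of goal => optimal
            path = [node]
            while came[path[-1]] is not None:
                path.append(came[path[-1]])
            return path[::-1]
        x, y = node
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nxt[0] < width and 0 <= nxt[1] < height
                    and nxt not in blocked):
                ng = g + 1
                if ng < best_g.get(nxt, float("inf")):
                    best_g[nxt] = ng
                    heapq.heappush(frontier, (ng + h(nxt), ng, nxt, node))
    return None
```

A crossing-aware variant, as in the paper, would add a penalty to the step cost for cells occupied by existing waveguides; admissibility is preserved as long as the heuristic still lower-bounds the true remaining cost.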
Kontakt
alex.truppel@tum.de