Seminar on Topics in Integrated Systems
Lecturer (assistant) | |
---|---|
Type | Seminar |
Duration | 3 SWS |
Term | Winter semester 2024/25 |
Language of instruction | English |
Dates
- 25.10.2024 14:00-15:30 N2128, seminar room
- 04.11.2024 15:00-16:30 2999, seminar room, joint workshop for STISD and STEDA
- 18.11.2024 15:00-16:30 2999, seminar room, joint workshop for STISD and STEDA
Admission information
Note: Registration (via TUMonline from September 23rd, 2024 to October 20th, 2024) is required. Limited number of participants! Students have to choose a seminar topic before the introduction lesson. Therefore, you need to contact the supervisor of the topic you are interested in. Topics are assigned on a first-come, first-served basis. Topics will be published on October 7th, 2024 at https://www.ce.cit.tum.de/en/lis/teaching/seminars/seminar-on-topics-in-integrated-systems/
Assigned Topics
Seminars
Cache Coherence Protocols for Multiprocessors
Description
Many-core architectures improve the parallel execution of applications and thereby enable better performance and efficiency. The shared-memory programming model, which is the predominant paradigm for parallel programming, interprets the distributed memory within many-core systems as a Distributed Shared Memory (DSM) architecture. This model necessitates a coherent view of the data across all memory components, including local caches, within a shared memory region, so that the various processors can inherently communicate via loads and stores.
To ensure cache coherence, hardware-based protocols are employed, coordinating cache operations to maintain consistent data access across the system. More scalable and high-performance cache coherence protocols are essential to address the growing demands of high-performance many-core architectures.
For this topic, the student will first gain a quick understanding of the classic directory-based and snoopy cache coherence protocols. More importantly, they will then explore state-of-the-art cache coherence protocols and examine how these are evaluated. A starting point of literature will be provided.
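As a concrete illustration of what such a protocol specifies, the following minimal C++ sketch models the per-line state machine of a classic snoopy MSI protocol. This is a hypothetical example, not part of the provided literature; directory-based protocols track the same states but record sharers in a directory instead of relying on a broadcast bus.

```cpp
// Minimal sketch of an MSI cache coherence state machine (illustrative only).
#include <cstdio>

enum class State { Invalid, Shared, Modified };

// Events seen by a single cache controller.
enum class Event {
    ProcRead,   // local processor load
    ProcWrite,  // local processor store
    BusRead,    // another cache reads the line (snooped on the bus)
    BusReadX    // another cache requests exclusive ownership
};

// Next-state function of a snoopy MSI controller for one cache line.
State next_state(State s, Event e) {
    switch (s) {
    case State::Invalid:
        if (e == Event::ProcRead)  return State::Shared;    // fetch line, read-only
        if (e == Event::ProcWrite) return State::Modified;  // fetch line exclusively
        return State::Invalid;                              // remote traffic: ignore
    case State::Shared:
        if (e == Event::ProcWrite) return State::Modified;  // upgrade, invalidate others
        if (e == Event::BusReadX)  return State::Invalid;   // remote write: invalidate
        return State::Shared;
    case State::Modified:
        if (e == Event::BusRead)   return State::Shared;    // write back, keep a copy
        if (e == Event::BusReadX)  return State::Invalid;   // write back, give up line
        return State::Modified;
    }
    return s;
}

int main() {
    const char* names[] = {"Invalid", "Shared", "Modified"};
    State s = State::Invalid;
    Event trace[] = {Event::ProcRead, Event::ProcWrite, Event::BusRead, Event::BusReadX};
    for (Event e : trace) {
        s = next_state(s, e);
        std::printf("-> %s\n", names[static_cast<int>(s)]);
    }
    return 0;
}
```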
Prerequisites
Have a fundamental understanding of memory hierarchies
Contact
Supervisor:
A Comparison of Recent Memory Prefetching Techniques
Description
DRAM modules are indispensable for modern computer architectures. Their main advantages are a simple cell design with only one transistor per bit and a high memory density.
However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles.
In order to reduce the DRAM access latency, the cache hierarchy can be extended by dedicated hardware access predictors that preload certain data into the caches before it is actually accessed.
The goal of this seminar is to study and compare prefetching mechanisms and access predictors at the cache level, together with several optimizations, and to present their benefits and use cases. A starting point of literature will be provided.
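To make the idea tangible, here is a minimal, hypothetical sketch (not taken from the provided literature) of a stride-based access predictor: it observes the addresses touched by a load instruction and, once a constant stride has been confirmed, issues a prefetch for the next expected address.

```cpp
// Hypothetical sketch of a simple per-PC stride prefetcher (illustration only).
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct StrideEntry {
    uint64_t last_addr = 0;    // last address seen for this load PC
    int64_t  stride    = 0;    // last observed stride
    int      confidence = 0;   // how often the stride repeated
};

class StridePrefetcher {
public:
    // Called on every demand access; returns a prefetch address or 0 (none).
    uint64_t access(uint64_t pc, uint64_t addr) {
        StrideEntry& e = table_[pc];
        int64_t new_stride = static_cast<int64_t>(addr) - static_cast<int64_t>(e.last_addr);
        if (e.last_addr != 0 && new_stride == e.stride && new_stride != 0) {
            if (e.confidence < 3) ++e.confidence;
        } else {
            e.stride = new_stride;
            e.confidence = 0;
        }
        e.last_addr = addr;
        // Only prefetch once the stride has been confirmed a few times.
        return (e.confidence >= 2) ? addr + e.stride : 0;
    }
private:
    std::unordered_map<uint64_t, StrideEntry> table_;  // indexed by load PC
};

int main() {
    StridePrefetcher pf;
    // A load at PC 0x400 streaming through an array with a 64-byte stride.
    for (uint64_t addr = 0x1000; addr < 0x1200; addr += 64) {
        uint64_t p = pf.access(0x400, addr);
        if (p) std::printf("access 0x%llx -> prefetch 0x%llx\n",
                           (unsigned long long)addr, (unsigned long long)p);
    }
    return 0;
}
```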
Prerequisites
B.Sc. in Electrical Engineering or a similar degree
Contact
Oliver Lenke
o.lenke@tum.de
Supervisor:
A Survey of Recent Prefetching Techniques for Processor Caches
Description
Caches suffer by design from compulsory cache misses, i.e., the first access to a certain cache line will typically result in a cache miss, since the data is not yet present in the cache hierarchy.
To reduce these misses, caches can be extended with prefetching mechanisms that speculatively fetch cache lines before they are first accessed.
The goal of this seminar is to study and compare different cache prefetcher designs and to present their benefits and use cases. A starting point of literature will be provided.
Prerequisites
B.Sc. in Electrical Engineering or a similar degree
Contact
Oliver Lenke
o.lenke@tum.de
Supervisor:
Asynchronous Design Using Standard EDA Tools
Description
Asynchronous logic has several advantages over conventional, clocked circuits, which makes it of interest for certain application areas, such as networks-on-chip, mixed-mode electronics, and arithmetic processors. Furthermore, a properly designed asynchronous circuit may offer both better performance and significantly lower power consumption than a synchronous equivalent.
Modern EDA tools, however, are not optimised for asynchronous design. This unfortunately complicates everything from architectural description to synthesis and implementation, to verification and testing. A major concern is that most tools rely on global clocks for optimisation as well as for timing checks. For asynchronous circuits, where all functional blocks are self-timed, this means that EDA tools cannot properly use clock constraints to optimise the critical path, thereby nullifying any speed advantages. Critically, EDA tools are not even guaranteed to produce functioning netlists. As such, in order to produce and test asynchronous circuits of non-trivial complexity, the standard design flow must be modified to take the characteristics of asynchronous logic into account.
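For intuition on what "self-timed" means at the gate level, the following small C++ model (a hypothetical illustration, not part of the seminar material) captures the behaviour of a Muller C-element, the state-holding primitive behind most asynchronous handshake pipelines: its output changes only when both inputs agree, which is exactly the kind of clockless sequential behaviour that clock-centric EDA tools struggle to constrain.

```cpp
// Hypothetical behavioural model of a Muller C-element, the basic storage
// primitive used in self-timed handshake circuits (illustration only).
#include <cstdio>

class CElement {
public:
    // Output changes only when both inputs agree; otherwise it holds its state.
    bool update(bool a, bool b) {
        if (a == b) state_ = a;
        return state_;
    }
private:
    bool state_ = false;
};

int main() {
    CElement c;
    // Request/acknowledge style waveform: the output follows only when a and b agree.
    bool a_vals[] = {false, true, true, true, false, false};
    bool b_vals[] = {false, false, true, false, false, true};
    for (int i = 0; i < 6; ++i)
        std::printf("a=%d b=%d -> c=%d\n", (int)a_vals[i], (int)b_vals[i],
                    (int)c.update(a_vals[i], b_vals[i]));
    return 0;
}
```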
For this seminar, the student should research the state-of-the-art for asynchronous logic design and testing with current industry standard EDA tools and what design flow modifications are required for producing robust and efficient asynchronous circuits.
Supervisor:
FPGA Implementations of RNNs: A Survey
Description
Field-programmable gate array (FPGA) implementations of recurrent neural networks (RNNs) are crucial because they provide high performance with low power consumption, making them ideal for real-time applications and embedded systems. Recent advances have shown that FPGAs can outperform traditional platforms such as GPUs in terms of energy efficiency while maintaining comparable accuracy.
In this seminar topic, your task is to introduce and summarize recent approaches to FPGA-based RNN accelerators. Furthermore, you should compose a comparison of different implementations concerning resource usage (lookup tables (LUTs), registers, digital signal processing (DSP) blocks, and power dissipation) and performance (predictions per second, real-time capability).
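As a reference for the computation such accelerators have to map onto LUTs and DSP blocks, here is a minimal, hypothetical C++ sketch of a vanilla RNN cell update (an assumption for illustration, not tied to any surveyed design); the multiply-accumulate loops below are what FPGA implementations parallelise and quantise.

```cpp
// Hypothetical sketch of a vanilla RNN cell update (illustration only):
// h_t = tanh(Wx * x_t + Wh * h_{t-1} + b)
#include <cmath>
#include <cstdio>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>;

// One recurrent time step; the two matrix-vector products dominate the
// multiply-accumulate (DSP) cost of an FPGA implementation.
Vec rnn_step(const Mat& Wx, const Mat& Wh, const Vec& b,
             const Vec& x, const Vec& h_prev) {
    Vec h(b.size());
    for (size_t i = 0; i < h.size(); ++i) {
        float acc = b[i];
        for (size_t j = 0; j < x.size(); ++j)      acc += Wx[i][j] * x[j];
        for (size_t j = 0; j < h_prev.size(); ++j) acc += Wh[i][j] * h_prev[j];
        h[i] = std::tanh(acc);
    }
    return h;
}

int main() {
    // Toy dimensions: 3 inputs, 2 hidden units.
    Mat Wx = {{0.1f, 0.2f, 0.3f}, {0.0f, -0.1f, 0.2f}};
    Mat Wh = {{0.5f, -0.4f}, {0.3f, 0.1f}};
    Vec b  = {0.0f, 0.1f};
    Vec h  = {0.0f, 0.0f};
    Vec xs[] = {{1.0f, 0.0f, -1.0f}, {0.5f, 0.5f, 0.5f}};
    for (const Vec& x : xs) {
        h = rnn_step(Wx, Wh, b, x, h);
        std::printf("h = [%f, %f]\n", h[0], h[1]);
    }
    return 0;
}
```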
Outline:
- Literature Review: Get an overview of recent advances in FPGA implementations of RNNs
- Comparative Analysis: Summarize and compare the concepts of the most important implementations concerning resource usage and performance
- Scientific Writing: Compose your findings in a paper, resulting in a concise overview and comparison
- Presentation: Present your findings to other members of the seminar.
Prerequisites
- Be familiar with deep learning, especially recurrent neural network architectures
- Be familiar with FPGAs
Supervisor:
Simulation of Chiplet-based Systems
Description
With technology nodes approaching their physical limits, it becomes continually more difficult to keep up with Moore's law. As a strategy to allow further scaling, chiplet-based architectures will likely become more prevalent, as they offer benefits regarding development effort and manufacturing yield.
Even while reusing IP, creating an entire multi-chiplet system is still a complicated task. Following a top-down approach, a high-level simulation can help design the system architecture before going to the register transfer level. As most available simulators cater to classical SoCs, setting up a simulation for chiplet-based systems might require special attention in selecting a framework and effort in its adaptation.
This seminar work should investigate what needs to be considered when simulating chiplet-based systems compared to SoCs, what simulation frameworks are viable, and what challenges simulation for chiplets and especially their interconnect brings.
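As a flavour of what a high-level model can capture before any RTL exists, the following hypothetical C++ sketch (not tied to any specific framework from the literature; all parameter values are assumptions) estimates the end-to-end latency of an access that crosses a die-to-die link, combining on-chiplet NoC hops with the serialization and link latency of the die-to-die interface.

```cpp
// Hypothetical first-order latency model of a chiplet-to-chiplet access
// (illustration only; real simulators model this at much finer granularity).
#include <cstdio>

struct D2DLink {
    double bandwidth_gbps;   // die-to-die link bandwidth in Gbit/s
    double link_latency_ns;  // fixed latency of the PHY and protocol layers
};

struct NoC {
    double hop_latency_ns;   // per-router latency inside a chiplet
};

// Latency of transferring one packet from a core on chiplet A to memory on
// chiplet B: NoC hops on both dies plus the die-to-die crossing.
double access_latency_ns(const NoC& noc, const D2DLink& d2d,
                         int hops_src, int hops_dst, int packet_bytes) {
    double serialization_ns = (packet_bytes * 8) / d2d.bandwidth_gbps;  // bits / (Gbit/s) = ns
    return hops_src * noc.hop_latency_ns
         + d2d.link_latency_ns + serialization_ns
         + hops_dst * noc.hop_latency_ns;
}

int main() {
    NoC noc{1.5};            // assumed 1.5 ns per router hop
    D2DLink d2d{64.0, 8.0};  // assumed 64 Gbit/s, 8 ns link latency
    double lat = access_latency_ns(noc, d2d, /*hops_src=*/3, /*hops_dst=*/2,
                                   /*packet_bytes=*/64);
    std::printf("estimated access latency: %.1f ns\n", lat);
    return 0;
}
```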
A starting point for literature could be the following paper:
https://dl.acm.org/doi/abs/10.1145/3477206.3477459
Contact
michael.meidinger@tum.de
Supervisor:
An Overview of Service Migration in Modern Edge Computer Networks
Description
In modern Edge computer networks, applications and services should adhere to service-level agreements (SLAs) such as low latency or minimum throughput. Depending on demand and resource availability, these services have to be migrated between compute nodes to ensure that these SLAs are met.
Service migration is a critical aspect of Edge computing, enabling the movement of services closer to the data source or end users for improved performance and reduced latency. However, it comes with its own set of challenges, such as maintaining service continuity and managing resource constraints. Migration involves checkpointing and restarting the applications (potentially running in containers), as well as moving the data from one compute node to the other. This data movement could be further improved with RDMA technology.
This seminar should provide a background overview of the required technologies for service migration and explore recent improvements for low-latency service migration in both hardware and software.
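To illustrate the kind of decision that precedes the checkpoint/transfer/restore sequence, here is a small, hypothetical C++ sketch of an SLA-driven migration trigger. All names, thresholds, and steps are assumptions for illustration, not a real orchestrator API.

```cpp
// Hypothetical sketch of an SLA-driven migration decision (illustration only).
#include <cstdio>
#include <string>
#include <vector>

struct Node {
    std::string name;
    double latency_to_user_ms;  // network latency from this node to the user
    double free_cpu;            // fraction of CPU currently available
};

struct Sla {
    double max_latency_ms;      // latency bound the service must meet
};

// Pick the first node that can satisfy the SLA and has enough free capacity;
// returns nullptr if the current placement is already fine or nothing fits.
const Node* choose_migration_target(const Node& current, const Sla& sla,
                                    const std::vector<Node>& candidates,
                                    double required_cpu) {
    if (current.latency_to_user_ms <= sla.max_latency_ms) return nullptr;
    for (const Node& n : candidates)
        if (n.latency_to_user_ms <= sla.max_latency_ms && n.free_cpu >= required_cpu)
            return &n;
    return nullptr;
}

int main() {
    Node current{"edge-far", 35.0, 0.5};
    std::vector<Node> candidates = {{"edge-busy", 8.0, 0.05}, {"edge-near", 9.0, 0.4}};
    Sla sla{10.0};
    if (const Node* target = choose_migration_target(current, sla, candidates, 0.2)) {
        // In a real system this would trigger checkpointing the container,
        // transferring its state (e.g. via RDMA), and restoring it on the target.
        std::printf("migrate service to %s\n", target->name.c_str());
    } else {
        std::printf("no migration needed or possible\n");
    }
    return 0;
}
```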
These papers are an interesting starting point for your literature research:
- https://ieeexplore-ieee-org.eaccess.tum.edu/abstract/document/10643902
- https://www.usenix.org/conference/atc21/presentation/planeta
Contact
marco.liess@tum.de
Supervisor:
Exploration of Deadlock-Avoidance Algorithms for FPGA-Based Network-on-Chips
Description
Network-on-chip (NoC) is a communication architecture used in multi-core and many-core systems to interconnect processing elements (PEs), such as CPUs, GPUs, accelerators, and memory controllers, using packet-switched networks similar to those found in computer networks. It replaces traditional bus-based interconnects with a scalable and modular network infrastructure, offering higher performance, lower latency, and improved scalability. In a NoC, PEs are connected through a network of routers and links, forming a mesh, torus, or other topologies. Each router is responsible for forwarding packets between neighboring PEs using routing algorithms. NoC architectures can vary greatly in terms of topology, routing algorithms, flow control mechanisms, and other parameters, depending on the specific application requirements and design constraints.
Field-Programmable Gate Arrays (FPGAs) are integrated circuits that contain an array of configurable logic blocks interconnected through programmable routing resources. They provide a versatile and powerful platform for implementing digital circuits and systems, offering flexibility, reconfigurability, parallelism, and hardware acceleration capabilities. Hence, they are well-suited for a wide range of applications across various domains, including telecommunications, networking, automotive, aerospace, consumer electronics, and industrial automation.
FPGA-optimized NoCs are tailored to exploit the unique features and capabilities of FPGAs while addressing the challenges of communication and interconnection in FPGA-based systems. They play a crucial role in enabling efficient and scalable communication infrastructure for FPGA-based applications across a wide range of domains. The goal of this seminar work is to investigate state-of-the-art deadlock-avoidance algorithms for FPGA-based NoCs.
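As a simple point of reference, dimension-ordered XY routing is the textbook deadlock-avoidance technique for 2D meshes: by always routing along the X dimension before the Y dimension, cyclic channel dependencies cannot form. The hypothetical C++ sketch below (not taken from the referenced papers) shows the per-router routing decision and traces a packet hop by hop.

```cpp
// Hypothetical sketch of deterministic XY routing on a 2D mesh NoC
// (illustration only). Routing X before Y rules out cyclic channel
// dependencies and therefore routing-induced deadlock.
#include <cstdio>

enum class Port { Local, East, West, North, South };

struct Coord { int x, y; };

// Routing decision taken at router `here` for a packet destined to `dst`.
Port route_xy(Coord here, Coord dst) {
    if (dst.x > here.x) return Port::East;
    if (dst.x < here.x) return Port::West;
    if (dst.y > here.y) return Port::North;
    if (dst.y < here.y) return Port::South;
    return Port::Local;  // arrived: deliver to the attached PE
}

int main() {
    const char* names[] = {"Local", "East", "West", "North", "South"};
    Coord dst{2, 1}, cur{0, 0};
    // Follow the packet hop by hop through the mesh.
    while (true) {
        Port p = route_xy(cur, dst);
        std::printf("(%d,%d) -> %s\n", cur.x, cur.y, names[static_cast<int>(p)]);
        if (p == Port::Local) break;
        if (p == Port::East)  ++cur.x;
        if (p == Port::West)  --cur.x;
        if (p == Port::North) ++cur.y;
        if (p == Port::South) --cur.y;
    }
    return 0;
}
```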
Relevant literature
[1] Monemi, Alireza, et al. "ProNoC: A low latency network-on-chip based many-core system-on-chip prototyping platform." Microprocessors and Microsystems 54 (2017): 60-74.
[2] Becker, Daniel U. Efficient microarchitecture for network-on-chip routers. Stanford University, 2012.
[3] Xu, Yi, et al. "Simple virtual channel allocation for high throughput and high frequency on-chip routers." HPCA-16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture. IEEE, 2010.