Seminar on Topics in Integrated Systems

Lecturer (assistant)
Type: Seminar
Duration: 3 SWS
Term: Winter semester 2024/25
Language of instruction: English

Dates

Admission information

See TUMonline
Note: Registration via TUMonline is required from September 23rd 2024 to October 20th 2024. Limited number of participants! Students have to choose a seminar topic before the introduction lesson; to do so, contact the supervisor of the topic you are interested in. Topics are assigned on a first-come, first-served basis and will be published on October 7th 2024 at https://www.ce.cit.tum.de/en/lis/teaching/seminars/seminar-on-topics-in-integrated-systems/

Objectives

At the end of the seminar, the student is able to present a state-of-the-art literature review in the area of integrated systems building blocks and architectures in an understandable and convincing manner. The following competencies will be acquired:

  • The student is able to independently analyze state-of-the-art concepts in the field of integrated systems.
  • The student is able to present a topic in a structured way according to problem formulation, state of the art, goals, methods, and results.
  • The student can present a topic according to the structure given above orally with a set of slides, and with a written report.

Description

Specific topics in the area of integrated circuits and systems will be offered. The participants independently work on a current scientific topic, write a paper and present their topic in a talk. In the subsequent discussion, the topic will be treated in-depth.

Prerequisites

Basic knowledge of integrated circuits and systems and their applications.

Teaching and learning methods

Learning method: Students elaborate a given scientific topic by themselves in coordination with the respective research assistant. Teaching method: Introductory lessons are given by the course coordinator; further details are discussed between research assistant and student on an individual basis. Presentation skills are taught by a professional trainer.

Examination

Examination with the following elements:

  • a paper of 4 pages in IEEE format
  • a talk of 20 minutes with subsequent questions

Recommended literature

A set of topics and related literature is provided at the start of the course. Each participant selects their topic.

Seminars

Cache Coherence Protocols for Multiprocessors

Description

Many-core architectures improve the parallel execution of applications, yielding better performance and efficiency. The shared-memory programming model, the predominant paradigm for parallel programming, treats the distributed memory within many-core systems as a Distributed Shared Memory (DSM) architecture. This model necessitates a coherent view of data across all memory components, including local caches, within a shared memory region, so that the processors can communicate implicitly via loads and stores.

To ensure cache coherence, hardware-based protocols are employed, coordinating cache operations to maintain consistent data access across the system. More scalable and high-performance cache coherence protocols are essential to address the growing demands of high-performance many-core architectures.

For this topic, the student will first gain a quick understanding of classic directory-based and snoopy cache coherence protocols. More importantly, they will then explore state-of-the-art cache coherence protocols and examine how these are evaluated. A starting point of literature will be provided.
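
The invalidate-based coordination behind such protocols can be made concrete with a toy model. The Python sketch below is a hypothetical illustration (not part of the seminar material) of a directory-based MSI protocol: the directory tracks sharers per block, downgrades a Modified copy elsewhere on a read, and invalidates all other copies on a write.

```python
# Minimal sketch of a directory-based MSI cache-coherence protocol.
# Illustrative only; real protocols add transient states and an
# interconnect between the directory and the caches.

class Directory:
    """Tracks, per memory block, which caches hold it and in what state."""

    def __init__(self, num_caches):
        self.caches = [dict() for _ in range(num_caches)]  # block -> 'M' or 'S'
        self.sharers = {}                                  # block -> set of cache ids

    def read(self, cache_id, block):
        # A read downgrades any Modified copy elsewhere to Shared.
        for other in self.sharers.get(block, set()):
            if self.caches[other].get(block) == 'M':
                self.caches[other][block] = 'S'
        self.sharers.setdefault(block, set()).add(cache_id)
        self.caches[cache_id][block] = 'S'

    def write(self, cache_id, block):
        # A write invalidates all other copies, then takes Modified.
        for other in self.sharers.get(block, set()):
            if other != cache_id:
                del self.caches[other][block]
        self.sharers[block] = {cache_id}
        self.caches[cache_id][block] = 'M'

    def state(self, cache_id, block):
        return self.caches[cache_id].get(block, 'I')  # 'I' = Invalid

d = Directory(num_caches=2)
d.read(0, block=0x40)    # cache 0 loads the block -> Shared
d.write(1, block=0x40)   # cache 1 stores -> cache 0's copy is invalidated
print(d.state(0, 0x40), d.state(1, 0x40))  # I M
```
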

Prerequisites

Have a fundamental understanding of memory hierarchies

Contact

Supervisor:

Shichen Huang

A Comparison of Recent Memory Prefetching Techniques

Description

DRAM modules are indispensable in modern computer architectures. Their main advantages are a simple design with only one transistor per bit and a high memory density.

However, DRAM accesses are rather slow and require a dedicated DRAM controller that coordinates the read and write accesses to the DRAM as well as the refresh cycles.

To reduce the DRAM access latency, the cache hierarchy can be extended by dedicated hardware access predictors that preload data into the caches before it is actually accessed.

The goal of this seminar is to study and compare prefetching mechanisms and access predictors at the cache level, including several optimizations, and to present their benefits and use cases. A starting point of literature will be provided.
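
As an illustration of such an access predictor, the following hypothetical Python sketch models a single-stream stride prefetcher: once the same address stride has been observed twice, it predicts the next access. Real hardware tracks many streams, typically tagged by the program counter; this is only a minimal sketch.

```python
# Sketch of a stride-based access predictor, a classic hardware
# prefetcher for regular (e.g. array-walking) access patterns.
# Illustrative only, not a description of any specific design.

class StridePrefetcher:
    def __init__(self):
        self.last_addr = None
        self.stride = None

    def access(self, addr):
        """Observe a demand access; return a prefetch address or None."""
        prefetch = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            if stride == self.stride and stride != 0:
                # Stride confirmed twice -> predict the next access.
                prefetch = addr + stride
            self.stride = stride
        self.last_addr = addr
        return prefetch

pf = StridePrefetcher()
for a in [0x100, 0x140, 0x180, 0x1C0]:
    hint = pf.access(a)
# After the stride of 0x40 repeats, the prefetcher predicts 0x200.
print(hex(hint))  # 0x200
```
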

Prerequisites

B.Sc. in Electrical Engineering or a similar degree

Contact

Oliver Lenke

o.lenke@tum.de

Supervisor:

Oliver Lenke

A Survey of Recent Prefetching Techniques for Processor Caches

Description

Caches by design suffer from compulsory misses: the first access to a given cache line typically results in a cache miss, since the data is not yet present in the cache hierarchy.

To reduce these misses, caches can be extended by prefetching mechanisms that speculatively fetch cache lines before they are first accessed.

The goal of this seminar is to study and compare different cache prefetcher designs and to present their benefits and use cases. A starting point of literature will be provided.
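
A toy simulation can make the effect of such a prefetcher visible. The hypothetical Python sketch below (illustrative only, not from the seminar literature) compares the miss count of a cache with and without a simple next-line prefetcher on a sequential access stream; the compulsory first-touch misses are exactly what the prefetcher hides.

```python
# Toy cache model contrasting miss counts with and without a
# next-line prefetcher on a sequential access stream. Illustrative
# only; real caches have finite capacity and replacement policies.

LINE = 64  # assumed cache-line size in bytes

def run(addresses, prefetch_next_line):
    cache, misses = set(), 0
    for addr in addresses:
        line = addr // LINE
        if line not in cache:
            misses += 1       # compulsory miss on first touch
            cache.add(line)
        if prefetch_next_line:
            cache.add(line + 1)  # speculatively fetch the next line
    return misses

stream = range(0, 4096, 8)  # sequential 8-byte accesses over 64 lines
print(run(stream, False))   # 64 misses: one per line
print(run(stream, True))    # 1 miss: only the very first line
```
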

Prerequisites

B.Sc. in Electrical Engineering or a similar degree

Contact

Oliver Lenke

o.lenke@tum.de

Supervisor:

Oliver Lenke

Asynchronous Design Using Standard EDA Tools

Description

Asynchronous logic has several advantages over conventional, clocked circuits, which makes it of interest for certain application areas, such as networks-on-chip, mixed-mode electronics, and arithmetic processors. Furthermore, a properly designed asynchronous circuit may offer both better performance and significantly lower power consumption than a synchronous equivalent.

Modern EDA tools, however, are not optimised for asynchronous design. This unfortunately complicates everything from architectural descriptions to synthesis and implementation, to verification and testing. A major concern is that most tools rely on global clocks for optimisation as well as for timing checks. For asynchronous circuits, where all functional blocks are self-timed, this means that EDA tools cannot use clock constraints to optimise the critical path, thereby nullifying any speed advantages. Critically, EDA tools are not even guaranteed to produce functioning netlists. As such, in order to produce and test asynchronous circuits of non-trivial complexity, the standard design flow must be modified to take the characteristics of asynchronous logic into account.
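
To make "self-timed" concrete: asynchronous handshakes are commonly built from Muller C-elements, whose output changes only when all inputs agree; this state-holding behaviour is exactly what clock-based timing checks do not model. A minimal behavioural sketch in Python (an illustrative model, not a netlist or seminar material):

```python
# Behavioural model of a two-input Muller C-element, the basic
# building block of self-timed handshake circuits. Illustrative only.

def c_element(a, b, prev_out):
    """Output follows the inputs when they agree, otherwise holds state."""
    if a == b:
        return a
    return prev_out

# Typical sequence: the output rises only once both inputs have risen,
# and falls only once both have fallen (hysteresis).
out = 0
trace = []
for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    out = c_element(a, b, out)
    trace.append(out)
print(trace)  # [0, 0, 1, 1, 0]
```
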

For this seminar, the student should research the state of the art in asynchronous logic design and testing with current industry-standard EDA tools, and determine which design-flow modifications are required to produce robust and efficient asynchronous circuits.

Supervisor:

William Wulff

FPGA Implementations of RNNs: A Survey

Description

Field-programmable gate array (FPGA) implementations of recurrent neural networks (RNNs) are crucial because they provide high performance with low power consumption, making them ideal for real-time applications and embedded systems. Recent advances have shown that FPGAs can outperform traditional platforms like GPUs in terms of energy efficiency while maintaining comparable accuracy.

In this seminar topic, your task is to introduce and summarize recent approaches for FPGA-based RNN accelerators. Furthermore, you should compile a comparison of different implementations with respect to resource usage (lookup tables (LUTs), registers, digital signal processing (DSP) blocks, power dissipation) and performance (predictions per second, real-time capability).

Outline:

  • Literature Review: Get an overview of recent advances in FPGA implementations of RNNs
  • Comparative Analysis: Summarize and compare the concepts of the most important implementations concerning resource usage and performance
  • Scientific Writing: Compose your findings in a paper, resulting in a concise overview and comparison
  • Presentation: Present your findings to other members of the seminar.

Prerequisites

  • Be familiar with deep learning, especially recurrent neural network architectures
  • Be familiar with FPGAs

Supervisor:

Simulation of Chiplet-based Systems

Description

With technology nodes approaching their physical limit, Moore’s law becomes continually more difficult to keep up with. As a strategy to allow further scaling, chiplet-based architectures will likely become more prevalent as they offer benefits regarding development effort and manufacturing yield.

Even while reusing IP, creating an entire multi-chiplet system is still a complicated task. Following a top-down approach, a high-level simulation can help design the system architecture before going to the register transfer level. As most available simulators cater to classical SoCs, setting up a simulation for chiplet-based systems might require special attention in selecting a framework and effort in its adaptation.

This seminar work should investigate what needs to be considered when simulating chiplet-based systems compared to SoCs, what simulation frameworks are viable, and what challenges simulation for chiplets and especially their interconnect brings.

A starting point for literature could be the following paper:
https://dl.acm.org/doi/abs/10.1145/3477206.3477459

Contact

michael.meidinger@tum.de

Supervisor:

Michael Meidinger

An Overview of Service Migration in Modern Edge Computer Networks

Description

In modern edge computer networks, applications and services should adhere to service-level agreements (SLAs) such as low latency or minimum throughput. Depending on demand and resource availability, these services have to be migrated between compute nodes to ensure these SLAs are met.

Service migration is a critical aspect of Edge computing, enabling the movement of services closer to the data source or end-users for improved performance and reduced latency. However, it comes with its own set of challenges, such as maintaining service continuity and managing resource constraints. This involves checkpointing and restarting of the applications (potentially in containers), as well as moving the data from one compute node to the other. This data movement could be further improved with RDMA technology.
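
The checkpoint/restart flow mentioned above can be sketched at a very high level. The following hypothetical Python example is illustrative only: real systems such as CRIU capture full process and container state, and RDMA would accelerate the byte transfer, but the serialize-transfer-resume pattern is the same.

```python
# Minimal sketch of checkpoint/restart-style service migration:
# freeze the service state to bytes on the source node, move the
# bytes, and resume on the target. Illustrative only; the service
# class and its state are hypothetical stand-ins.

import pickle

class CounterService:
    """Stand-in for a stateful edge service."""
    def __init__(self):
        self.requests_served = 0

    def handle_request(self):
        self.requests_served += 1

def checkpoint(service):
    return pickle.dumps(service.__dict__)        # snapshot state to bytes

def restore(blob):
    service = CounterService()
    service.__dict__.update(pickle.loads(blob))  # resume from the snapshot
    return service

node_a = CounterService()
for _ in range(3):
    node_a.handle_request()

blob = checkpoint(node_a)      # 1. snapshot on the source node
node_b = restore(blob)         # 2. transfer bytes, restart on the target
node_b.handle_request()        # 3. service continues where it left off
print(node_b.requests_served)  # 4
```
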

This seminar should provide a background overview of the required technologies for service migration and explore recent improvements for low-latency service migration in both hardware and software.

These papers are an interesting starting point for your literature research:
- https://ieeexplore-ieee-org.eaccess.tum.edu/abstract/document/10643902
- https://www.usenix.org/conference/atc21/presentation/planeta

Contact

marco.liess@tum.de

Supervisor:

Marco Liess

Exploration of Deadlock-Avoidance Algorithms for FPGA-Based Network-on-Chips

Description

A network-on-chip (NoC) is a communication architecture used in multi-core and many-core systems to interconnect processing elements (PEs), such as CPUs, GPUs, accelerators, and memory controllers, using packet-switched networks similar to those found in computer networks. It replaces traditional bus-based interconnects with a scalable and modular network infrastructure, offering higher performance, lower latency, and improved scalability. In a NoC, PEs are connected through a network of routers and links, forming a mesh, torus, or other topology. Each router is responsible for forwarding packets between neighboring PEs using routing algorithms. NoC architectures can vary greatly in terms of topology, routing algorithms, flow control mechanisms, and other parameters, depending on the specific application requirements and design constraints.

Field-Programmable Gate Arrays (FPGAs) are integrated circuits that contain an array of configurable logic blocks interconnected through programmable routing resources. They provide a versatile and powerful platform for implementing digital circuits and systems, offering flexibility, reconfigurability, parallelism, and hardware acceleration capabilities. Hence, they are well-suited for a wide range of applications across various domains, including telecommunications, networking, automotive, aerospace, consumer electronics, and industrial automation.

FPGA-optimized NoCs are tailored to exploit the unique features and capabilities of FPGAs while addressing the challenges of communication and interconnection in FPGA-based systems. They play a crucial role in enabling efficient and scalable communication infrastructure for FPGA-based applications across a wide range of domains. The goal of this seminar work is to investigate state-of-the-art deadlock-avoidance algorithms for FPGA-based NoCs.
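
As background for the state-of-the-art schemes to be surveyed, the classic baseline for deadlock avoidance in mesh NoCs is dimension-ordered (XY) routing: a packet fully traverses the X dimension before turning into Y, which rules out the cyclic turn dependencies that cause deadlock. A minimal, hypothetical Python sketch of the per-router decision:

```python
# Dimension-ordered (XY) routing for a 2D-mesh NoC: route in X first,
# then in Y. Forbidding Y-to-X turns breaks all channel-dependency
# cycles, making the routing deadlock-free. Illustrative sketch only;
# virtual-channel and turn-model schemes build on this idea.

def xy_next_hop(current, dest):
    """Return the next router on an XY-routed path in a 2D mesh."""
    (x, y), (dx, dy) = current, dest
    if x != dx:                      # route in the X dimension first
        return (x + (1 if dx > x else -1), y)
    if y != dy:                      # then in the Y dimension
        return (x, y + (1 if dy > y else -1))
    return current                   # arrived

# Trace a packet from router (0, 0) to router (2, 1).
hop, path = (0, 0), [(0, 0)]
while hop != (2, 1):
    hop = xy_next_hop(hop, (2, 1))
    path.append(hop)
print(path)  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```
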

Relevant literature
[1] Monemi, Alireza, et al. "ProNoC: A low latency network-on-chip based many-core system-on-chip prototyping platform." Microprocessors and Microsystems 54 (2017): 60-74.
[2] Becker, Daniel U. Efficient microarchitecture for network-on-chip routers. Stanford University, 2012.
[3] Xu, Yi, et al. "Simple virtual channel allocation for high throughput and high frequency on-chip routers." HPCA-16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture. IEEE, 2010.

Supervisor:

Klajd Zyla