Accelerating Convolutional Neural Networks using Programmable Logic

Dates:	Tuesday 10:00-12:00 (Lecture, Room: 01.06.020); Thursday 16:00-18:00 (Lab/Question Session)
First meeting:	Tuesday 29.04 at 10:00-12:00 (Room: 01.06.020)
Additional Information:	slides
ECTS:	10
Language:	English
Type:	Bachelor/Master lab course (IN0012, IN2106)
Moodle course:	t.b.d
Registration:	Registration is through the matching system
Questions?	Contact dirk.stober(at)tum.de
Warning	This course requires installing AMD Vitis, which needs about 100 GB of free disk space and runs on x86 systems with Linux or Windows (macOS is not supported). Please make sure you have a supported x86 personal computer with sufficient disk space before signing up for this course.

This course is part of the BB-KI (Brandenburg / Bayern Aktion für KI-Hardware) chips project, which aims to offer practical courses in the area of dedicated AI hardware.

Content

The course consists of a weekly lecture that teaches the required concepts, introduces the practical exercises, and hosts student presentations. In addition, a weekly lab slot is offered where students can ask questions and get help with the practical exercises. The course covers the following:

Introduction to Convolutional Neural Networks (CNNs) and implementation of CNN inference
Understanding of the building blocks of FPGAs and their purpose using SystemVerilog
Project in simulation and synthesis: co-design of your own CNN accelerator using HLS
Implementation of the accelerator on an FPGA and integration with a CPU using the Pynq Z2 board
Evaluation of key performance metrics and comparison of SW/HW implementations
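
To give a rough idea of the kind of software baseline such an SW/HW comparison starts from, here is a minimal C++ sketch of a direct 2D convolution, the core operation of CNN inference. It is not taken from the course material: the function name conv2d_valid, the single-channel row-major layout, and the stride of 1 without padding are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the course material): direct 2D convolution
// of a single-channel, row-major image with one k x k kernel.
// Assumptions: stride 1, no padding ("valid" output), and no kernel flip,
// i.e. cross-correlation as commonly used in CNNs.
std::vector<float> conv2d_valid(const std::vector<float>& input,
                                std::size_t height, std::size_t width,
                                const std::vector<float>& kernel,
                                std::size_t k) {
    assert(k >= 1 && k <= height && k <= width);
    assert(input.size() == height * width);
    assert(kernel.size() == k * k);

    const std::size_t out_h = height - k + 1;
    const std::size_t out_w = width - k + 1;
    std::vector<float> output(out_h * out_w, 0.0f);

    for (std::size_t y = 0; y < out_h; ++y) {
        for (std::size_t x = 0; x < out_w; ++x) {
            float acc = 0.0f;
            // Multiply-accumulate over the k x k window anchored at (y, x).
            for (std::size_t ky = 0; ky < k; ++ky) {
                for (std::size_t kx = 0; kx < k; ++kx) {
                    acc += input[(y + ky) * width + (x + kx)] *
                           kernel[ky * k + kx];
                }
            }
            output[y * out_w + x] = acc;
        }
    }
    return output;
}
```

Each output element costs k·k multiply-accumulate operations, so the whole call performs (height − k + 1)·(width − k + 1)·k·k of them. Nested loops of this shape are the typical targets of pipelining and unrolling when a kernel is moved to programmable logic.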

The main focus of the course is on accelerating algorithms using FPGAs, not on AI!

Grading

The lab will be done in small groups (max. 3 students) and consists of minor ungraded labs as well as a mid-term report. The final grade will be based on a project (HW/SW co-design of CNN inference), including a report, a presentation, and an individual discussion of the implementation.

Learning Outcomes

Basic understanding of Convolutional Neural Networks (mainly inference)
Basic knowledge of existing AI accelerators
Understanding of the challenges of using programmable logic (PL) to accelerate workloads
Ability to design simple digital circuits using RTL and HLS languages
Implementation and integration of both SW and PL on an SoC platform (Pynq Z2)
Co-design of SW and HW
Ability to reason about the performance of different implementations

Prerequisites

Experience programming in C/C++ required
Basic knowledge of microcontrollers recommended
Basic knowledge of an RTL language (Verilog/VHDL) recommended, or willingness to learn one on your own
Knowledge of machine learning not required

Informatik 10 - Lehrstuhl für Rechnerarchitektur & Parallele Systeme

Prof. Dr. Martin Schulz
schulzm(at)in.tum.de

Prof. Dr. Michael Gerndt
gerndt(at)in.tum.de

Prof. Dr.-Ing. Carsten Trinitis
Carsten.Trinitis(at)tum.de

Address:
Technische Universität München
Boltzmannstraße 3
85748 Garching
Germany

Office:
Room 01.04.40
Tel.: +49 89 289-17659
Fax: +49 89 289-17662