Accelerating Convolutional Neural Networks using Programmable Logic
Dates: | Tuesday 10:00-12:00 (Lecture, Room: 01.06.020); Thursday 16:00-18:00 (Lab/Question Session) |
First meeting: | Tuesday 16.04 at 10:00-12:00 (Room: 01.06.020) |
Additional Information: | slides |
ECTS: | 10 |
Language: | English |
Type: | Bachelor/Master lab course (IN0012, IN2106) |
Moodle course: | t.b.d |
Registration: | Registration is through the matching system |
Questions? | Contact dirk.stober(at)tum.de |
Warning | This course requires installing AMD Vitis, which can take up to about 100 GB of free disk space and runs only on x86 systems with Linux or Windows (macOS is not supported). Please make sure you have a supported x86 personal computer with sufficient disk space when signing up for this course. |
This course is part of the BB-KI (Brandenburg / Bayern Aktion für KI-Hardware) chips project, which aims to offer practical courses in the area of dedicated AI hardware.
Content
The course consists of a weekly lecture that teaches the required concepts, introduces the practical exercises, and hosts the student presentations. In addition, a weekly lab slot is offered where students can ask questions and get help with the practical exercises. The course will cover the following:
- Introduction to Convolutional Neural Networks (CNNs) and implementation of CNN inference
- Understanding of the building blocks of FPGAs and their purpose using SystemVerilog
- Project in simulation and synthesis, co-designing your own CNN accelerator using HLS
- Implementation of the accelerator on an FPGA and its integration with a CPU using the Pynq Z2 board
- Evaluation of key performance metrics and comparison of SW/HW implementations
The main focus of the course is on accelerating algorithms using FPGAs, not on AI!
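To give a rough idea of the kind of software baseline that CNN inference starts from, the sketch below shows a naive single-channel 2D convolution in C++. It is not taken from the course material: the function name `conv2d_valid`, the row-major data layout, and the restriction to stride 1 without padding are all assumptions made for illustration only.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration, not part of the course material.
// Computes a "valid" 2D convolution (cross-correlation, as is usual in CNNs)
// of a single-channel, row-major height x width input with a k x k kernel,
// using stride 1 and no padding. Returns an empty vector if the kernel
// does not fit into the input.
std::vector<float> conv2d_valid(const std::vector<float>& input,
                                std::size_t height, std::size_t width,
                                const std::vector<float>& kernel,
                                std::size_t k) {
    if (k == 0 || k > height || k > width) {
        return {};
    }
    const std::size_t out_h = height - k + 1;
    const std::size_t out_w = width - k + 1;
    std::vector<float> output(out_h * out_w, 0.0f);

    for (std::size_t oy = 0; oy < out_h; ++oy) {
        for (std::size_t ox = 0; ox < out_w; ++ox) {
            float acc = 0.0f;
            // Multiply-accumulate over the k x k window.
            for (std::size_t ky = 0; ky < k; ++ky) {
                for (std::size_t kx = 0; kx < k; ++kx) {
                    acc += input[(oy + ky) * width + (ox + kx)] *
                           kernel[ky * k + kx];
                }
            }
            output[oy * out_w + ox] = acc;
        }
    }
    return output;
}
```

The nested multiply-accumulate loops are the part that a hardware accelerator would typically parallelize or pipeline, which is why loops of this shape are a common starting point for SW/HW comparisons.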
Grading
The lab will be done in small groups (max. 3 students) and consists of minor ungraded labs as well as a mid-term report. The final grade will be based on a project (HW/SW co-design of CNN inference), including a report, a presentation, and an individual discussion of the implementation.
Learning Outcomes
- Basic understanding of Convolutional Neural Networks (mainly inference)
- Basic knowledge of existing AI accelerators
- Understanding the challenges of using programmable logic (PL) to accelerate workloads
- Ability to design simple digital circuits using RTL and HLS languages
- Implementation and integration of both SW and PL on an SoC platform (Pynq Z2)
- Co-design of SW and HW
- Ability to reason about the performance of different implementations
Prerequisites
- Programming experience in C/C++ required
- Basic knowledge of microcontrollers recommended
- Basic knowledge of an RTL language (Verilog/VHDL) recommended, or willingness to learn one on your own
- Knowledge of Machine Learning not required