Collaborative Robotic Grasping during Teleoperation Tasks
Description
This topic focuses on improving robotic grasping networks and advancing embodied intelligence. Grasping is a fundamental capability in robotic manipulation and often plays a decisive role in the overall success of a task. Despite significant progress in learning-based grasping, current models still struggle with generalization and robustness in unstructured environments. Our goal is to enhance the success rate of existing grasping models and deploy them in real-world scenarios, where they can provide intelligent assistance during teleoperation tasks. By leveraging pre-trained grasping networks, we aim to reduce the human operator's workload, increase autonomy, and improve manipulation efficiency in complex and dynamic settings. This topic offers a unique opportunity to work at the intersection of perception, control, and learning, pushing the boundaries of what robots can achieve through smarter, more adaptive grasping.
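To make the intended assistance concrete, below is a minimal Python sketch of a shared-autonomy arbitration step. It assumes a hypothetical pre-trained network object grasp_net that returns scored 6-DoF grasp candidates; all names and the blending scheme are illustrative, not a fixed design.

```python
import numpy as np

def assist_grasp(rgbd_frame, operator_pose, grasp_net, blend=0.5):
    """Blend the operator's commanded gripper pose with the best
    nearby grasp proposal from a pre-trained network (illustrative)."""
    # grasp_net is assumed (hypothetical API) to return a list of
    # (6-DoF pose as np.ndarray, score) pairs for the current frame.
    candidates = grasp_net.predict(rgbd_frame)
    if not candidates:
        return operator_pose  # no assistance available, pass through

    # Prefer high-scoring candidates close to the operator's intent.
    def cost(candidate):
        pose, score = candidate
        return np.linalg.norm(pose[:3] - operator_pose[:3]) - score

    best_pose, _ = min(candidates, key=cost)
    # Simple linear arbitration; a real system would blend orientation
    # with slerp rather than linearly, and adapt `blend` to confidence.
    return (1.0 - blend) * operator_pose + blend * best_pose
```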
Prerequisites
- Good Programming Skills (Python, C++)
- Knowledge of Ubuntu/Linux and ROS
- Motivation to learn and conduct research
Contact
dong.yang@tum.de
(Please attach your CV and transcript)
Supervisor:
(Stereo) Depth Estimation in Challenging Conditions on Edge Devices
Description
This research focuses on enhancing stereo depth estimation techniques to operate effectively under challenging conditions on edge devices. The project aims to develop robust algorithms that can accurately estimate depth information in environments with varying lighting and weather conditions. By optimizing these algorithms for edge devices, the research ensures real-time processing and low-latency responses, which are crucial for portable navigation aids. The effectiveness of these improvements will be validated through a series of experiments, evaluating their performance in real-world scenarios.
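As a reference point, the classical pipeline this project would improve on can be sketched with OpenCV's semi-global matching; the calibration values below are placeholders, and a learned, edge-optimized model would replace the matcher in the actual project.

```python
import cv2
import numpy as np

# Placeholder calibration; a real setup uses the rectified camera's
# focal length (in pixels) and the stereo baseline (in metres).
FOCAL_PX = 700.0
BASELINE_M = 0.12

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classical semi-global block matching as a baseline for learned models.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

# Depth from disparity: Z = f * B / d, valid only where d > 0.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
```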
Supervisor:
Obstacle Detection and Avoidance Systems Using Meta Aria Smart Glasses
Description
This research focuses on testing and evaluating obstacle detection and avoidance solutions using Meta Aria smart glasses (and other available smart glasses technologies). The project will explore the integration of various detection algorithms and avoidance strategies with these wearable devices to assess their effectiveness in real-world environments.
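A minimal sketch of the kind of obstacle-alert logic to be evaluated, assuming a metric depth map is already available from the glasses' sensing pipeline; the sector layout and warning distance are illustrative assumptions.

```python
import numpy as np

def obstacle_alert(depth_m, warn_dist=1.5):
    """Split the depth map into left/centre/right sectors and report
    the nearest obstacle per sector (illustrative thresholds)."""
    h, w = depth_m.shape
    sectors = {"left": depth_m[:, : w // 3],
               "centre": depth_m[:, w // 3 : 2 * w // 3],
               "right": depth_m[:, 2 * w // 3 :]}
    alerts = {}
    for name, sector in sectors.items():
        valid = sector[sector > 0]          # drop invalid (zero) readings
        nearest = valid.min() if valid.size else np.inf
        if nearest < warn_dist:
            alerts[name] = float(nearest)   # e.g. trigger an audio cue here
    return alerts
```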
Supervisor:
Adaptive Visual Frame Rate Adjustment in VI-SLAM
Visual-Inertial SLAM, Robot Navigation
Description
We are looking for a motivated student to join our research on adaptive visual processing in SLAM systems. Building on our recent AFDI-SLAM framework, this project aims to develop more fine-grained frame rate modulation strategies based on motion dynamics, with a focus on decoupling translational and rotational cues, real-time performance, and intelligent scheduling.
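As a toy illustration of the decoupling idea (not the AFDI-SLAM method itself), a scheduler might weight rotational and translational motion separately when choosing the visual frame rate; all constants below are hypothetical.

```python
import numpy as np

def target_frame_rate(gyro, vel, fps_min=5.0, fps_max=30.0,
                      rot_scale=2.0, trans_scale=1.0):
    """Map rotational and translational motion to a visual frame rate.
    Rotation is weighted more heavily because it changes the view
    faster than translation of equal magnitude (toy heuristic)."""
    rot_speed = np.linalg.norm(gyro)    # rad/s from the IMU gyroscope
    trans_speed = np.linalg.norm(vel)   # m/s from the state estimator
    activity = rot_scale * rot_speed + trans_scale * trans_speed
    # Saturate the activity score into the allowed frame-rate band.
    return float(np.clip(fps_min + activity * (fps_max - fps_min),
                         fps_min, fps_max))
```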
Ideal candidates should have experience in computer vision, robotics, or deep learning. Familiarity with SLAM, PyTorch, or CUDA is a plus. This is a great opportunity to contribute to a practical, high-impact system and publish in top multimedia or robotics venues.
Prerequisites
- C++ background
- Familiarity with SLAM and PyTorch is a plus
- Strong motivation
Contact
xin.su@tum.de
Supervisor:
HDR Gain Map Implementation in Python
HDR, gain map
Description
HDR gain map technology stores an SDR rendition together with a gain map in a single image file; the gain map is used to scale the SDR image and partially or fully recover the HDR rendition for viewing on displays with different HDR headrooms.
This topic covers an implementation of gain map technology in Python: encoding gain maps and storing them with their metadata, as well as decoding gain maps and recovering the HDR rendition.
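A minimal NumPy sketch of the underlying math, in the spirit of the gain map approach: the gain map stores the per-pixel log2 ratio between HDR and SDR, and decoding weights that gain by the target display's headroom. Normalization and metadata handling are deliberately simplified relative to real file formats.

```python
import numpy as np

EPS = 1e-6

def encode_gain_map(hdr_lin, sdr_lin):
    """Gain map as the per-pixel log2 ratio of HDR to SDR (linear light),
    normalized to [0, 1]; min/max travel alongside as metadata."""
    g = np.log2((hdr_lin + EPS) / (sdr_lin + EPS))
    g_min, g_max = float(g.min()), float(g.max())
    gain_map = (g - g_min) / max(g_max - g_min, EPS)
    return gain_map, {"g_min": g_min, "g_max": g_max}

def decode_hdr(sdr_lin, gain_map, meta, display_headroom_stops):
    """Recover an HDR rendition, scaling the gain by how much headroom
    (in stops) the target display actually offers (weight in [0, 1])."""
    g = gain_map * (meta["g_max"] - meta["g_min"]) + meta["g_min"]
    w = np.clip(display_headroom_stops / max(meta["g_max"], EPS), 0.0, 1.0)
    return sdr_lin * np.exp2(g * w)
```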
This topic is suitable for an Ingenieurpraxis (IP) or Forschungspraxis (FP) for a duration of 9 weeks.
Prerequisites
- Coding skills in Python and C++
- Knowledge of image processing and file formats
- Availability for 9 weeks
Contact
Hongjie You (hongjie.you@tum.de)
Supervisor:
Hand-Object Interaction Reconstruction via Diffusion Model
Diffusion Model; Computer Vision
Description
This topic explores the use of diffusion models—an advanced generative AI technique—to reconstruct hand-object interactions from RGB images or videos. By learning the complex dynamics of hand movements and object manipulation, the model generates accurate 3D representations, benefiting applications in augmented reality, robotics, and human-computer interaction.
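For orientation, a minimal PyTorch sketch of the standard denoising-diffusion training objective (DDPM-style noise prediction), here applied to a flattened hand-object parameter vector conditioned on image features; the state parameterization and conditioning are placeholder assumptions, not the project's fixed design.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, img_feat, alphas_cumprod):
    """One DDPM training step: noise the clean hand-object parameters x0
    to a random timestep t and train the model to predict that noise."""
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1)                   # cumulative alpha per sample
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward noising
    eps_pred = model(x_t, t, img_feat)                     # image-conditioned denoiser
    return F.mse_loss(eps_pred, eps)
```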
Prerequisites
- Programming in Python
- Knowledge of Deep Learning
- Knowledge of PyTorch
Contact
xinguo.he@tum.de
Supervisor:
Optimizing Multimodal Tactile Codecs with Cross-Modal Vector Quantization
Description
To achieve better user immersion and interaction fidelity, developing a multimodal tactile codec is necessary. Exploiting cross-modal correlation to compress multimodal signals into compact latent representations is a key challenge in multimodal codecs. VQ-VAE introduces a discrete latent variable space to achieve efficient coding and is a promising candidate for extension to multimodal scenarios. This project aims to use multimodal vector quantization to encode multiple tactile signals into a shared latent space. This unified representation will reduce redundancy while preserving the information needed for reconstruction.
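A minimal PyTorch sketch of the vector-quantization step at the core of VQ-VAE, with a single codebook that the encoders of all tactile modalities would map into; the codebook size, latent dimension, and commitment weight are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedVectorQuantizer(nn.Module):
    """One codebook shared across modalities (VQ-VAE style)."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                      # z: (batch, dim) encoder output
        # Nearest codebook entry per latent vector.
        dists = torch.cdist(z, self.codebook.weight)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx)
        # Codebook + commitment losses; straight-through gradient to encoder.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss
```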
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Multimodal Tactile Data Compression through Shared-Private Representations
Description
The Tactile Internet relies on real-time transmission of multimodal tactile data to enhance user immersion and fidelity. However, most existing tactile codecs are limited to vibrotactile data and cannot transmit richer multimodal signals.
This project aims to develop a novel tactile codec that supports multimodal data with a shared-private representation framework. A shared network will extract common semantic information from two modalities, while private networks capture modality-specific features. By sharing the common representations during reconstruction, the codec is expected to reduce the volume of data that needs to be transmitted.
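A schematic PyTorch sketch of the shared-private split for two hypothetical tactile modalities; the linear layers are placeholders for whatever encoder and decoder architectures the project settles on.

```python
import torch
import torch.nn as nn

class SharedPrivateCodec(nn.Module):
    """Each modality gets a private encoder; a shared encoder captures
    the common semantics, so only one shared code is transmitted."""
    def __init__(self, in_dim=128, shared_dim=32, private_dim=16):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(2 * in_dim, 64), nn.ReLU(),
                                    nn.Linear(64, shared_dim))
        self.private_a = nn.Linear(in_dim, private_dim)
        self.private_b = nn.Linear(in_dim, private_dim)
        self.dec_a = nn.Linear(shared_dim + private_dim, in_dim)
        self.dec_b = nn.Linear(shared_dim + private_dim, in_dim)

    def forward(self, mod_a, mod_b):
        s = self.shared(torch.cat([mod_a, mod_b], dim=-1))  # common code
        rec_a = self.dec_a(torch.cat([s, self.private_a(mod_a)], dim=-1))
        rec_b = self.dec_b(torch.cat([s, self.private_b(mod_b)], dim=-1))
        return rec_a, rec_b
```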
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Self-Supervised IMU Denoising for Visual-Inertial SLAM
Self-Supervised Learning, IMU Denoising
Description
In Visual-Inertial SLAM (Simultaneous Localization and Mapping), inertial measurement units (IMUs) are crucial for estimating motion. However, IMU data contains noise that accumulates over time and degrades SLAM performance. Self-supervised machine learning techniques can denoise IMU data automatically, without requiring labeled datasets. By leveraging self-supervised training, this project explores how neural networks can distinguish useful IMU signal patterns from noise, improving the accuracy of motion estimation and the robustness of Visual-Inertial SLAM systems.
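One common self-supervised recipe that could serve as a starting point is masked reconstruction, sketched below for a raw 6-axis IMU stream; the masking ratio and the denoiser architecture are assumptions, not the project's fixed design.

```python
import torch
import torch.nn.functional as F

def masked_imu_loss(model, imu_seq, mask_ratio=0.3):
    """Self-supervised objective: hide random timesteps of the raw
    6-axis IMU stream and train the network to reconstruct them."""
    # imu_seq: (batch, time, 6) -- accelerometer + gyroscope channels
    b, t, c = imu_seq.shape
    mask = torch.rand(b, t, 1, device=imu_seq.device) < mask_ratio
    corrupted = imu_seq.masked_fill(mask, 0.0)
    recon = model(corrupted)                  # e.g. a temporal transformer
    # Loss only on the masked positions; the clean signal is its own label.
    sel = mask.expand_as(imu_seq)
    return F.mse_loss(recon[sel], imu_seq[sel])
```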
Prerequisites
- Knowledge of Machine Learning and Transformer architectures.
- Motivation to learn and research.
- Good coding skills in C++ and Python.
- Project experience in Machine Learning (PyTorch) is a plus.
Contact
xin.su@tum.de