Collaborative Robotic Grasping during Teleoperation Tasks
Description
This topic focuses on improving robotic grasping networks and advancing embodied intelligence. Grasping is a fundamental capability in robotic manipulation and often plays a decisive role in the overall success of a task. Despite significant progress in learning-based grasping, current models still struggle with generalization and robustness in unstructured environments. Our goal is to enhance the success rate of existing grasping models and deploy them in real-world scenarios, where they can provide intelligent assistance during teleoperation tasks. By leveraging pre-trained grasping networks, we aim to reduce the human operator's workload, increase autonomy, and improve manipulation efficiency in complex and dynamic settings. This topic offers a unique opportunity to work at the intersection of perception, control, and learning, pushing the boundaries of what robots can achieve through smarter, more adaptive grasping.
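To make the intended assistance concrete, below is a minimal Python sketch of a shared-autonomy arbitration step. It assumes a hypothetical pre-trained network object grasp_net that returns scored 6-DoF grasp candidates; all names and the blending scheme are illustrative, not a fixed design.

```python
import numpy as np

def assist_grasp(rgbd_frame, operator_pose, grasp_net, blend=0.5):
    """Blend the operator's commanded gripper pose with the best
    nearby grasp proposal from a pre-trained network (illustrative)."""
    # grasp_net is assumed (hypothetical API) to return a list of
    # (6-DoF pose as np.ndarray, score) pairs for the current frame.
    candidates = grasp_net.predict(rgbd_frame)
    if not candidates:
        return operator_pose  # no assistance available, pass through

    # Prefer high-scoring candidates close to the operator's intent.
    def cost(candidate):
        pose, score = candidate
        return np.linalg.norm(pose[:3] - operator_pose[:3]) - score

    best_pose, _ = min(candidates, key=cost)
    # Simple linear arbitration; a real system would blend orientation
    # with slerp rather than linearly, and adapt `blend` to confidence.
    return (1.0 - blend) * operator_pose + blend * best_pose
```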
Prerequisites
- Good Programming Skills (Python, C++)
- Knowledge of Ubuntu/Linux and ROS
- Motivation to learn and conduct research
Contact
dong.yang@tum.de
(Please attach your CV and transcript)
Supervisor:
(Stereo) Depth Estimation in Challenging Conditions on Edge Devices
Description
This research focuses on enhancing stereo depth estimation techniques to operate effectively under challenging conditions on edge devices. The project aims to develop robust algorithms that can accurately estimate depth information in environments with varying lighting and weather conditions. By optimizing these algorithms for edge devices, the research ensures real-time processing and low-latency responses, which are crucial for portable navigation aids. The effectiveness of these improvements will be validated through a series of experiments, evaluating their performance in real-world scenarios.
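As a reference point, the classical pipeline this project would improve on can be sketched with OpenCV's semi-global matching; the calibration values below are placeholders, and a learned, edge-optimized model would replace the matcher in the actual project.

```python
import cv2
import numpy as np

# Placeholder calibration; a real setup uses the rectified camera's
# focal length (in pixels) and the stereo baseline (in metres).
FOCAL_PX = 700.0
BASELINE_M = 0.12

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Classical semi-global block matching as a baseline for learned models.
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128, blockSize=5)
disparity = sgbm.compute(left, right).astype(np.float32) / 16.0  # SGBM output is fixed-point

# Depth from disparity: Z = f * B / d, valid only where d > 0.
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = FOCAL_PX * BASELINE_M / disparity[valid]
```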
Supervisor:
Obstacle Detection and Avoidance Systems Using Meta Aria Smart Glasses
Description
This research focuses on testing and evaluating obstacle detection and avoidance solutions using Meta Aria smart glasses (and other available smart glasses technologies). The project will explore the integration of various detection algorithms and avoidance strategies with these wearable devices to assess their effectiveness in real-world environments.
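A minimal sketch of the kind of obstacle-alert logic to be evaluated, assuming a metric depth map is already available from the glasses' sensing pipeline; the sector layout and warning distance are illustrative assumptions.

```python
import numpy as np

def obstacle_alert(depth_m, warn_dist=1.5):
    """Split the depth map into left/centre/right sectors and report
    the nearest obstacle per sector (illustrative thresholds)."""
    h, w = depth_m.shape
    sectors = {"left": depth_m[:, : w // 3],
               "centre": depth_m[:, w // 3 : 2 * w // 3],
               "right": depth_m[:, 2 * w // 3 :]}
    alerts = {}
    for name, sector in sectors.items():
        valid = sector[sector > 0]          # drop invalid (zero) readings
        nearest = valid.min() if valid.size else np.inf
        if nearest < warn_dist:
            alerts[name] = float(nearest)   # e.g. trigger an audio cue here
    return alerts
```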
Supervisor:
Adaptive Visual Frame Rate Adjustment in VI-SLAM
Visual-Inertial SLAM, Robot Navigation
Description
We are looking for a motivated student to join our research on adaptive visual processing in SLAM systems. Building on our recent AFDI-SLAM framework, this project aims to develop more fine-grained frame rate modulation strategies based on motion dynamics, with a focus on decoupling translational and rotational cues, real-time performance, and intelligent scheduling.
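As a toy illustration of the decoupling idea (not the AFDI-SLAM method itself), a scheduler might weight rotational and translational motion separately when choosing the visual frame rate; all constants below are hypothetical.

```python
import numpy as np

def target_frame_rate(gyro, vel, fps_min=5.0, fps_max=30.0,
                      rot_scale=2.0, trans_scale=1.0):
    """Map rotational and translational motion to a visual frame rate.
    Rotation is weighted more heavily because it changes the view
    faster than translation of equal magnitude (toy heuristic)."""
    rot_speed = np.linalg.norm(gyro)    # rad/s from the IMU gyroscope
    trans_speed = np.linalg.norm(vel)   # m/s from the state estimator
    activity = rot_scale * rot_speed + trans_scale * trans_speed
    # Saturate the activity score into the allowed frame-rate band.
    return float(np.clip(fps_min + activity * (fps_max - fps_min),
                         fps_min, fps_max))
```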
Ideal candidates should have experience in computer vision, robotics, or deep learning. Familiarity with SLAM, PyTorch, or CUDA is a plus. This is a great opportunity to contribute to a practical, high-impact system and publish in top multimedia or robotics venues.
Prerequisites
- C++ background
- Familiarity with SLAM and PyTorch is a plus
- Strong motivation
Contact
xin.su@tum.de
Supervisor:
HDR Gain Map Implementation in Python
HDR, gain map
Description
HDR gain map technology stores an SDR rendition together with a gain map in a single image file; the gain map is used to scale the SDR image and partially or fully recover the HDR rendition for viewing on displays with different HDR headrooms.
This topic covers an implementation of gain map technology in Python: encoding gain maps and storing them with their metadata, as well as decoding gain maps and recovering the HDR rendition.
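A minimal NumPy sketch of the underlying math, in the spirit of the gain map approach: the gain map stores the per-pixel log2 ratio between HDR and SDR, and decoding weights that gain by the target display's headroom. Normalization and metadata handling are deliberately simplified relative to real file formats.

```python
import numpy as np

EPS = 1e-6

def encode_gain_map(hdr_lin, sdr_lin):
    """Gain map as the per-pixel log2 ratio of HDR to SDR (linear light),
    normalized to [0, 1]; min/max travel alongside as metadata."""
    g = np.log2((hdr_lin + EPS) / (sdr_lin + EPS))
    g_min, g_max = float(g.min()), float(g.max())
    gain_map = (g - g_min) / max(g_max - g_min, EPS)
    return gain_map, {"g_min": g_min, "g_max": g_max}

def decode_hdr(sdr_lin, gain_map, meta, display_headroom_stops):
    """Recover an HDR rendition, scaling the gain by how much headroom
    (in stops) the target display actually offers (weight in [0, 1])."""
    g = gain_map * (meta["g_max"] - meta["g_min"]) + meta["g_min"]
    w = np.clip(display_headroom_stops / max(meta["g_max"], EPS), 0.0, 1.0)
    return sdr_lin * np.exp2(g * w)
```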
This topic is suitable for an Ingenieurpraxis (IP) or Forschungspraxis (FP) for a duration of 9 weeks.
Prerequisites
- Coding skills in Python and C++
- Knowledge of image processing and file formats
- Availability for 9 weeks
Contact
Hongjie You (hongjie.you@tum.de)
Supervisor:
Hand-Object Interaction Reconstruction via Diffusion Model
Diffusion Model; Computer Vision
Description
This topic explores the use of diffusion models—an advanced generative AI technique—to reconstruct hand-object interactions from RGB images or videos. By learning the complex dynamics of hand movements and object manipulation, the model generates accurate 3D representations, benefiting applications in augmented reality, robotics, and human-computer interaction.
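For orientation, a minimal PyTorch sketch of the standard denoising-diffusion training objective (DDPM-style noise prediction), here applied to a flattened hand-object parameter vector conditioned on image features; the state parameterization and conditioning are placeholder assumptions, not the project's fixed design.

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, img_feat, alphas_cumprod):
    """One DDPM training step: noise the clean hand-object parameters x0
    to a random timestep t and train the model to predict that noise."""
    b = x0.shape[0]
    t = torch.randint(0, alphas_cumprod.shape[0], (b,), device=x0.device)
    a_bar = alphas_cumprod[t].view(b, 1)                   # cumulative alpha per sample
    eps = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward noising
    eps_pred = model(x_t, t, img_feat)                     # image-conditioned denoiser
    return F.mse_loss(eps_pred, eps)
```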
Prerequisites
- Programming in Python
- Knowledge of Deep Learning
- Knowledge of PyTorch
Contact
xinguo.he@tum.de
Supervisor:
Optimizing Multimodal Tactile Codecs with Cross-Modal Vector Quantization
Description
To achieve better user immersion and interaction fidelity, developing a multimodal tactile codec is necessary. Exploiting cross-modal correlation to compress multimodal signals into compact latent representations is a key challenge in multimodal codecs. VQ-VAE introduces a discrete latent variable space to achieve efficient coding and is a promising candidate for extension to multimodal scenarios. This project aims to use multimodal vector quantization to encode multiple tactile signals into a shared latent space. This unified representation will reduce redundancy while preserving the information needed for reconstruction.
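A minimal PyTorch sketch of the vector-quantization step at the core of VQ-VAE, with a single codebook that the encoders of all tactile modalities would map into; the codebook size, latent dimension, and commitment weight are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedVectorQuantizer(nn.Module):
    """One codebook shared across modalities (VQ-VAE style)."""
    def __init__(self, num_codes=512, dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.beta = beta

    def forward(self, z):                      # z: (batch, dim) encoder output
        # Nearest codebook entry per latent vector.
        dists = torch.cdist(z, self.codebook.weight)
        idx = dists.argmin(dim=1)
        z_q = self.codebook(idx)
        # Codebook + commitment losses; straight-through gradient to encoder.
        loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        z_q = z + (z_q - z).detach()
        return z_q, idx, loss
```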
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Multimodal Tactile Data Compression through Shared-Private Representations
Description
The Tactile Internet relies on real-time transmission of multimodal tactile data to enhance user immersion and fidelity. However, most existing tactile codecs are limited to vibrotactile data and cannot transmit richer multimodal signals.
This project aims to develop a novel tactile codec that supports multimodal data with a shared-private representation framework. A shared network will extract common semantic information from two modalities, while private networks capture modality-specific features. By sharing the common representations during reconstruction, the codec is expected to reduce the volume of data that needs to be transmitted.
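A schematic PyTorch sketch of the shared-private split for two hypothetical tactile modalities; the linear layers are placeholders for whatever encoder and decoder architectures the project settles on.

```python
import torch
import torch.nn as nn

class SharedPrivateCodec(nn.Module):
    """Each modality gets a private encoder; a shared encoder captures
    the common semantics, so only one shared code is transmitted."""
    def __init__(self, in_dim=128, shared_dim=32, private_dim=16):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(2 * in_dim, 64), nn.ReLU(),
                                    nn.Linear(64, shared_dim))
        self.private_a = nn.Linear(in_dim, private_dim)
        self.private_b = nn.Linear(in_dim, private_dim)
        self.dec_a = nn.Linear(shared_dim + private_dim, in_dim)
        self.dec_b = nn.Linear(shared_dim + private_dim, in_dim)

    def forward(self, mod_a, mod_b):
        s = self.shared(torch.cat([mod_a, mod_b], dim=-1))  # common code
        rec_a = self.dec_a(torch.cat([s, self.private_a(mod_a)], dim=-1))
        rec_b = self.dec_b(torch.cat([s, self.private_b(mod_b)], dim=-1))
        return rec_a, rec_b
```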
Prerequisites
- Knowledge of deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Self-Supervised IMU Denoising for Visual-Inertial SLAM
Self-Supervised Learning, IMU Denoising
Description
In Visual-Inertial SLAM (Simultaneous Localization and Mapping), inertial measurement units (IMUs) are crucial for estimating motion. However, IMU data contains noise that accumulates over time and degrades SLAM performance. Self-supervised machine learning techniques can denoise IMU data automatically, without requiring labeled datasets. By leveraging self-supervised training, this project explores how neural networks can distinguish useful IMU signal patterns from noise, improving the accuracy of motion estimation and the robustness of Visual-Inertial SLAM systems.
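One common self-supervised recipe that could serve as a starting point is masked reconstruction, sketched below for a raw 6-axis IMU stream; the masking ratio and the denoiser architecture are assumptions, not the project's fixed design.

```python
import torch
import torch.nn.functional as F

def masked_imu_loss(model, imu_seq, mask_ratio=0.3):
    """Self-supervised objective: hide random timesteps of the raw
    6-axis IMU stream and train the network to reconstruct them."""
    # imu_seq: (batch, time, 6) -- accelerometer + gyroscope channels
    b, t, c = imu_seq.shape
    mask = torch.rand(b, t, 1, device=imu_seq.device) < mask_ratio
    corrupted = imu_seq.masked_fill(mask, 0.0)
    recon = model(corrupted)                  # e.g. a temporal transformer
    # Loss only on the masked positions; the clean signal is its own label.
    sel = mask.expand_as(imu_seq)
    return F.mse_loss(recon[sel], imu_seq[sel])
```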
Prerequisites
- Knowledge of Machine Learning and Transformer architectures.
- Motivation to learn and research.
- Good coding skills in C++ and Python.
- Project experience in Machine Learning (PyTorch) is a plus.
Contact
xin.su@tum.de