Optimizing Multimodal Tactile Codecs with Cross-Modal Vector Quantization
Description
To achieve better user immersion and interaction fidelity, developing a multimodal tactile codec is necessary. Exploiting cross-modal correlation to compress multimodal signals into compact latent representations is a key challenge in such codecs. VQ-VAE introduces a discrete latent variable space for efficient coding and is a promising candidate for extension to multimodal scenarios. This project aims to use multimodal vector quantization to encode multiple tactile signals into a shared latent space. This unified representation will reduce redundancy while preserving the information needed for reconstruction.
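A minimal sketch (PyTorch) of the core vector-quantization step such a codec could build on, with hypothetical modality encoders and dimensions: features from two tactile modalities are fused and snapped to the nearest codebook entry, following the standard VQ-VAE formulation with a straight-through gradient.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalVQ(nn.Module):
    """Toy cross-modal vector quantizer (VQ-VAE style); for illustration only."""
    def __init__(self, dim_a=32, dim_b=32, latent_dim=64, codebook_size=256, beta=0.25):
        super().__init__()
        self.enc_a = nn.Linear(dim_a, latent_dim)   # hypothetical encoder, modality A (e.g. vibrotactile)
        self.enc_b = nn.Linear(dim_b, latent_dim)   # hypothetical encoder, modality B (e.g. force)
        self.fuse = nn.Linear(2 * latent_dim, latent_dim)
        self.codebook = nn.Embedding(codebook_size, latent_dim)
        self.beta = beta

    def forward(self, x_a, x_b):
        z = self.fuse(torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1))
        # Nearest codebook entry for each fused latent vector.
        dists = torch.cdist(z, self.codebook.weight)   # (batch, codebook_size)
        idx = dists.argmin(dim=-1)                     # discrete codes to transmit
        z_q = self.codebook(idx)
        # VQ-VAE losses: codebook loss + commitment loss.
        vq_loss = F.mse_loss(z_q, z.detach()) + self.beta * F.mse_loss(z, z_q.detach())
        # Straight-through estimator so gradients reach the encoders.
        z_q = z + (z_q - z).detach()
        return z_q, idx, vq_loss
```

Only the discrete indices `idx` would need to be transmitted; modality-specific decoders (not shown) would then reconstruct both signals from the quantized latent `z_q`.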
Prerequisites
- Knowledge in deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Multimodal Tactile Data Compression through Shared-Private Representations
Description
The Tactile Internet relies on the real-time transmission of multimodal tactile data to enhance user immersion and interaction fidelity. However, most existing tactile codecs are limited to vibrotactile data and cannot transmit richer multimodal signals.
This project aims to develop a novel tactile codec that supports multimodal data with a shared-private representation framework. A shared network will extract common semantic information from two modalities, while private networks capture modality-specific features. By sharing the common representations during reconstruction, the codec is expected to reduce the volume of data that needs to be transmitted.
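A rough sketch (PyTorch, with invented layer sizes) of how the shared-private split could be organized: a shared encoder extracts one common code for both modalities, small private encoders keep modality-specific residuals, and each decoder reconstructs its modality from the shared code plus its private code.

```python
import torch
import torch.nn as nn

class SharedPrivateCodec(nn.Module):
    """Illustrative shared-private encoder/decoder skeleton, not the project's actual design."""
    def __init__(self, dim_a=64, dim_b=64, shared_dim=32, private_dim=8):
        super().__init__()
        self.shared_enc = nn.Sequential(nn.Linear(dim_a + dim_b, 128), nn.ReLU(),
                                        nn.Linear(128, shared_dim))
        self.private_enc_a = nn.Linear(dim_a, private_dim)   # modality-specific details, modality A
        self.private_enc_b = nn.Linear(dim_b, private_dim)   # modality-specific details, modality B
        self.dec_a = nn.Linear(shared_dim + private_dim, dim_a)
        self.dec_b = nn.Linear(shared_dim + private_dim, dim_b)

    def forward(self, x_a, x_b):
        s = self.shared_enc(torch.cat([x_a, x_b], dim=-1))   # common semantic code (sent once)
        p_a, p_b = self.private_enc_a(x_a), self.private_enc_b(x_b)
        rec_a = self.dec_a(torch.cat([s, p_a], dim=-1))
        rec_b = self.dec_b(torch.cat([s, p_b], dim=-1))
        return rec_a, rec_b, (s, p_a, p_b)
```

Since the shared code is transmitted only once for both modalities, the overall payload shrinks as long as the private codes stay small.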
Prerequisites
- Knowledge in deep learning
- Programming skills (Python)
- Motivation for research
Contact
wenxuan.wei@tum.de
Supervisor:
Context-based 3D Animations of Vehicles, Human and Animal Figurines
Description
The goal of this thesis is to animate 3D objects such as vehicles, humans, and animals based on multimodal contextual information.
A simple example: real-world 3D trajectory data of the object can be used to classify whether a given object is moving or idle. Based on the classification result, the corresponding animation is played on the object -- a breathing animation if the object is idle, and a walking/running animation if the object is in motion.
This idea can be extended further to produce more complex animations. For example, if a dog gets wet due to rain in an evolving story, the subsequent animation produced should be "shaking off water from the body".
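A minimal sketch (plain Python, with an illustrative speed threshold that would need tuning) of the moving-vs-idle decision described in the simple example above:

```python
import numpy as np

def choose_animation(trajectory, dt=1.0 / 30.0, speed_threshold=0.05):
    """Pick an animation name from a 3D trajectory (N x 3 array of positions).

    speed_threshold (m/s) is an illustrative value, not a tuned one.
    """
    traj = np.asarray(trajectory, dtype=float)
    if len(traj) < 2:
        return "breathing"                      # not enough data: treat as idle
    speeds = np.linalg.norm(np.diff(traj, axis=0), axis=1) / dt
    mean_speed = speeds.mean()
    if mean_speed < speed_threshold:
        return "breathing"                      # idle
    return "running" if mean_speed > 1.5 else "walking"

# Example: a slowly drifting object is classified as walking.
print(choose_animation([[0, 0, 0], [0.02, 0, 0], [0.04, 0, 0]]))
```

A learned model over trajectories, scene-graph context, user input, and story state would eventually replace such a hand-tuned rule.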
Possible steps include:
- Using our Large Language Model based system to generate novel animations.
- Designing and evaluating a novel Machine Learning model that decides which animation to play based on the 3D trajectory of the objects, the semantic and geometric configuration of the 3D scene graph, user input, and the context of an evolving story. The 3D trajectory can be obtained from our already operational pose tracking system.
Prerequisites
- Working knowledge of Blender
- Python
- Initial experience in training and evaluation of Machine Learning models
Supervisor:
3D Scene Navigation Using Free-hand Gestures
3D, Blender, Python, hand tracking, gesture recognition
Description
The goal of this bachelor thesis project is to design and evaluate a 3D scene navigation system based on free-hand gestures.
Possible steps include:
- Modeling a 3D world in Blender (an existing pre-designed world may also be used, e.g., from Sketchfab)
- Designing a distinct set of hand gestures that allows comprehensive navigation of the 3D world (i.e. to control camera translation and rotation based on hand gestures). It should be possible for the user to navigate to any place in the 3D world quickly, efficiently, and intuitively.
- The Google MediaPipe framework can be used to detect and track hand keypoints (a minimal tracking sketch follows this list). On top of that, a novel gesture recognition model should be trained and evaluated.
- Comparing, contrasting, and benchmarking the performance of this system against the standard keyboard+mouse-based navigation capabilities offered by Blender.
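A minimal sketch of the keypoint-tracking step using the legacy MediaPipe Hands solution API and OpenCV; the gesture recognizer and the actual Blender camera control are left out, and the left/right mapping on the wrist position is purely illustrative.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def run(camera_index=0):
    cap = cv2.VideoCapture(camera_index)
    with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                lm = results.multi_hand_landmarks[0].landmark
                # Normalized wrist x position (0..1) mapped to a coarse camera command.
                x = lm[mp_hands.HandLandmark.WRIST].x
                command = "pan_left" if x < 0.4 else "pan_right" if x > 0.6 else "hold"
                print(command)   # in the thesis, this would drive the Blender camera instead
            cv2.imshow("hand tracking", frame)
            if cv2.waitKey(1) & 0xFF == 27:   # ESC to quit
                break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    run()
```

In the thesis, the printed command would instead control camera translation and rotation in the Blender scene, e.g. through Blender's Python API.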
Start date: 01.04.2025
Prerequisites
- Working knowledge of Blender and Python
- Interest in 3D worlds and human-computer interaction
Supervisor:
Self-supervised IMU Denoising for Visual-Inertial SLAM
Self-supervised Learning, IMU denoising
Description
In Visual-Inertial SLAM (Simultaneous Localization and Mapping), inertial measurement units (IMUs) are crucial for estimating motion. However, IMU data contains noise that accumulates over time and degrades SLAM performance. Self-supervised machine learning techniques can denoise IMU data automatically, without requiring labeled datasets. By leveraging self-supervised training, this project aims to explore how neural networks can distinguish useful IMU signal patterns from noise, improving the accuracy of motion estimation and the robustness of Visual-Inertial SLAM systems.
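As a very rough illustration (PyTorch, synthetic data, invented architecture and hyperparameters), one possible self-supervised formulation is masked reconstruction: random time steps of an IMU window are zeroed out and a small temporal network learns to predict them from context, without any clean ground-truth signal.

```python
import torch
import torch.nn as nn

class IMUDenoiser(nn.Module):
    """Tiny 1D-conv denoiser over 6-channel IMU streams (accel + gyro); illustrative only."""
    def __init__(self, channels=6, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(hidden, channels, kernel_size=5, padding=2),
        )

    def forward(self, x):            # x: (batch, channels, time)
        return self.net(x)

def self_supervised_step(model, imu_seq, optimizer, mask_ratio=0.25):
    """One masked-reconstruction training step; no clean reference signal is needed."""
    mask = (torch.rand(imu_seq.shape[0], 1, imu_seq.shape[2]) < mask_ratio).float()
    corrupted = imu_seq * (1.0 - mask)                  # zero out random time steps
    pred = model(corrupted)
    loss = ((pred - imu_seq) ** 2 * mask).sum() / mask.sum().clamp(min=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example with synthetic data: 8 windows, 6 channels, 200 samples each.
model = IMUDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(self_supervised_step(model, torch.randn(8, 6, 200), opt))
```

This is only one of several possible self-supervised objectives; the thesis would evaluate such approaches on real Visual-Inertial SLAM data.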
Prerequisites
- Knowledge in Machine Learning and Transformers.
- Motivation to learn and research.
- Good coding skills in C++ and Python.
- Project experience in Machine Learning (PyTorch) is a plus.
Contact
xin.su@tum.de
Supervisor:
Scene Graph-based Real-time Scene Understanding for Assistive Robot Manipulation Tasks
Description
With the rapid development of embodied intelligent robots, real-time and accurate scene understanding is crucial for robots to complete tasks efficiently and effectively. Scene graphs represent objects and their relations in a scene via a graph structure. Previous studies have generated scene graphs from images or 3D scenes, often with the assistance of large language models (LLMs).
In this work, we investigate the application of scene graphs to assist the human operator during teleoperated manipulation tasks. Leveraging scene graphs generated in real time, the robot system can obtain a comprehensive understanding of the scene and reason about the best way to complete the manipulation task given the current robot state.
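A toy sketch (plain Python, with invented objects and relations) of the kind of scene-graph structure such a system could query in real time:

```python
from dataclasses import dataclass, field

@dataclass
class SceneGraph:
    """Minimal scene graph: objects as nodes, (subject, relation, object) triples as edges."""
    objects: set = field(default_factory=set)
    relations: list = field(default_factory=list)   # e.g. ("cup", "on", "table")

    def add(self, subj, rel, obj):
        self.objects.update([subj, obj])
        self.relations.append((subj, rel, obj))

    def query(self, rel=None, obj=None):
        """Return subjects matching a relation/object pattern, e.g. everything 'on' the 'table'."""
        return [s for (s, r, o) in self.relations
                if (rel is None or r == rel) and (obj is None or o == obj)]

# Example: find what the gripper could pick up from the table.
g = SceneGraph()
g.add("cup", "on", "table")
g.add("spoon", "in", "cup")
g.add("robot_gripper", "near", "cup")
print(g.query(rel="on", obj="table"))   # -> ['cup']
```

In the actual project, the graph would be produced by perception (and possibly LLM-based reasoning) rather than hand-coded, and updated continuously as the teleoperated scene changes.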
Prerequisites
- Good Programming Skills (Python, C++)
- Knowledge about Ubuntu/Linux/ROS
- Motivation to learn and conduct research
Contact
dong.yang@tum.de
(Please attach your CV and transcript)