Scene Graph-based Real-time Scene Understanding for Assistive Robot Manipulation Tasks
Description
With the rapid development of embodied intelligent robots, real-time and accurate scene understanding is crucial for robots to complete tasks efficiently and effectively. Scene graphs represent the objects in a scene and the relations between them as a graph structure. Previous studies have generated scene graphs from images or 3D scenes, in some cases with the assistance of large language models (LLMs).
In this work, we investigate how scene graphs can assist the human operator during teleoperated manipulation tasks. By leveraging scene graphs generated in real time, the robot system obtains a comprehensive understanding of the scene and can reason about the best way to complete the manipulation task given the current robot state.
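To make the representation concrete, the following is a minimal sketch (not the project's actual implementation) of how a scene graph could be stored and queried in Python; the object labels, attributes, and relation names are illustrative assumptions.

```python
import networkx as nx

# A scene graph: nodes are detected objects, directed edges are spatial/semantic relations.
scene = nx.DiGraph()

# Nodes carry object attributes (category, estimated 3D position in metres).
scene.add_node("cup_1", category="cup", position=(0.42, 0.10, 0.80))
scene.add_node("table_1", category="table", position=(0.50, 0.00, 0.75))
scene.add_node("robot_gripper", category="gripper", position=(0.10, 0.30, 1.00))

# Edges carry the relation label, e.g. "on", "graspable_by".
scene.add_edge("cup_1", "table_1", relation="on")
scene.add_edge("cup_1", "robot_gripper", relation="graspable_by")

def relations_of(graph, node):
    """Return (subject, relation, object) triples involving a given node."""
    triples = []
    for u, v, data in graph.edges(data=True):
        if node in (u, v):
            triples.append((u, data["relation"], v))
    return triples

# Example query a task planner might issue before grasping:
print(relations_of(scene, "cup_1"))
# [('cup_1', 'on', 'table_1'), ('cup_1', 'graspable_by', 'robot_gripper')]
```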
Prerequisites
- Good Programming Skills (Python, C++)
- Knowledge about Ubuntu/Linux/ROS
- Motivation to learn and conduct research
Contact
dong.yang@tum.de
(Please attach your CV and transcript)
Supervisor:
DT-based Human-robot Teleoperation with Haptic Codecs Standard
Digital Twin, Teleoperation, Haptic Codecs Standard
Our project aims to build a DT-based human-robot teleoperation system that complies with the haptic codecs standard and integrates multiple sensors under a Linux system.
Description
The main objectives for the system are:
1. A complete human-in-the-loop haptic teleoperation system: You should port the teleoperation system, which currently interacts with Unity on Windows, to a Linux system using the Robot Operating System (ROS). You can use Gazebo to create the remote environment; it should contain a robotic arm (the follower device) and an operational platform that simulates a real remote scene. A Phantom device serves as the leader device for manipulating the virtual robot arm; the information gathered during this interaction is used to detect environment updates, such as a newly added object, and thus to build a Digital Twin (DT) of the remote environment on the leader side (a minimal sketch of the leader-follower data flow is given after this list).
2. Multiple sensors for data collection on the follower side: You should use visual and haptic devices to collect environment-update data and reconstruct the remote environment. Visual information is typically captured with 2D and depth cameras, while haptic information is expressed by the remote position and force feedback.
3. Haptic codecs for data transmission: The transmission of velocity, position, visual, and haptic information needs to follow the Haptic Codecs Standard.
4. Optional function, Plug-and-Play: When a haptic device is temporarily disconnected and then reconnected, the teleoperation system should automatically restore normal operation and resume synchronization between both sides. Disconnection detection and reconnection handling should be designed for both the leader side and the follower side.
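As a starting point for item 1, the sketch below shows a minimal rospy node that relays the leader (Phantom) pose to the simulated follower arm and keeps a timestamp that could support the Plug-and-Play check in item 4. The topic names, message types on these topics, and timeout value are assumptions and would have to be adapted to the actual devices.

```python
#!/usr/bin/env python
# Minimal leader-to-follower pose relay in ROS 1 (rospy).
# Assumed topics: /phantom/pose (leader) and /follower_arm/target_pose (follower).
import rospy
from geometry_msgs.msg import PoseStamped


class LeaderFollowerRelay:
    def __init__(self):
        # Publisher towards the simulated follower arm in Gazebo.
        self.cmd_pub = rospy.Publisher("/follower_arm/target_pose",
                                       PoseStamped, queue_size=10)
        # Subscriber for the Phantom (leader) pose.
        rospy.Subscriber("/phantom/pose", PoseStamped, self.on_leader_pose)
        self.last_msg_time = rospy.Time.now()

    def on_leader_pose(self, msg):
        # Record when the leader was last heard from (used for disconnection detection).
        self.last_msg_time = rospy.Time.now()
        # Forward the leader pose as the follower's target pose.
        self.cmd_pub.publish(msg)

    def leader_connected(self, timeout=0.5):
        # Simple plug-and-play check: the leader counts as disconnected
        # if no pose has arrived within `timeout` seconds.
        return (rospy.Time.now() - self.last_msg_time).to_sec() < timeout


if __name__ == "__main__":
    rospy.init_node("leader_follower_relay")
    relay = LeaderFollowerRelay()
    rospy.spin()
```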
Prerequisites
Preferred qualifications:
Familiarity with teleoperation systems, Linux systems, and visual and haptic sensors.
A good understanding of ROS (Robot Operating System).
Supervisor:
Radar-based Material Classification
signal processing, machine learning, material classification
Description
This work focuses on radar-based material classification. With the rapid development of autonomous driving, drones, home robots, and various smart devices in recent years, material sensing has received increasing attention. Millimeter-wave radar is widely deployed on these platforms because of its low cost and robustness in harsh environments. In this work, we therefore study methods for classifying common indoor materials from millimeter-wave radar signals.
We will collect radar signals from common indoor materials such as wood, metal, and glass. After extracting suitable features with radar signal processing methods, we will apply machine learning algorithms to classify the materials.
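To illustrate the intended pipeline, the sketch below computes a simple FFT-based range-profile feature from raw chirp samples and feeds it to an off-the-shelf classifier. The array shapes, placeholder data, material labels, and choice of an SVM are assumptions for illustration only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def range_profile(chirp_samples):
    """Magnitude spectrum of one chirp (a basic range-profile feature)."""
    spectrum = np.fft.rfft(chirp_samples * np.hanning(len(chirp_samples)))
    return np.abs(spectrum)

# Assumed data layout: raw_chirps has shape (num_measurements, samples_per_chirp),
# labels holds the material class of each measurement (e.g. 0=wood, 1=metal, 2=glass).
rng = np.random.default_rng(0)
raw_chirps = rng.standard_normal((300, 256))   # placeholder for real radar data
labels = rng.integers(0, 3, size=300)          # placeholder labels

features = np.array([range_profile(c) for c in raw_chirps])

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.3, random_state=0)

clf = SVC(kernel="rbf")                        # any standard classifier could be used here
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```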
Prerequisites
Programming in Python
Knowledge about Machine Learning
Knowledge about signal processing, particularly on radar signal processing
Contact
mengchen.xiong@tum.de
(Please attach your CV and transcript)
Supervisor:
Equivariant 3D Object Detection
3D Object Detection, Computer Vision, Deep Learning, Indoor Environments
Description
The thesis focuses on the application of equivariant deep learning techniques for 3D object detection in indoor scenes. Indoor environments, such as homes, offices, and industrial settings, present unique challenges for 3D object detection due to diverse object arrangements, varying lighting conditions, and occlusions. Traditional methods often struggle with these complexities, leading to suboptimal performance. The motivation for this research is to enhance the robustness and accuracy of 3D object detection in these environments, leveraging the inherent advantages of equivariant deep learning. This approach aims to improve the model's ability to recognize objects regardless of their orientation and position in the scene, which is crucial for applications in robotics or augmented reality.
The thesis proposes the development of a deep learning model that incorporates equivariant neural networks for 3D object detection, such as the equivariant framework proposed in [1]. The proposed model will be evaluated on a benchmark indoor 3D dataset, such as the Stanford 3D Indoor Spaces Dataset (S3DIS), ScanNet, or Replica [2, 3].
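To give a flavour of what equivariance means in practice, the sketch below implements a rotation-equivariant linear layer loosely following the vector-neuron idea of [1]: each feature channel is a 3D vector, and mixing channels with a learned matrix commutes with any rotation applied to those vectors. This is an illustrative sketch under those assumptions, not the layer from [1] verbatim.

```python
import math
import torch
import torch.nn as nn

class VNLinear(nn.Module):
    """Rotation-equivariant linear layer: mixes vector-valued channels
    without touching the 3D coordinate axis (cf. vector neurons [1])."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_channels, in_channels) * 0.1)

    def forward(self, x):
        # x: (batch, in_channels, 3, num_points); mix only the channel dimension.
        return torch.einsum("oc,bcdn->bodn", self.weight, x)

# Quick equivariance check: rotating the input equals rotating the output.
torch.manual_seed(0)
layer = VNLinear(8, 16)
x = torch.randn(2, 8, 3, 128)

# A rotation about the z-axis.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)
R = torch.tensor([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])

out_then_rotate = torch.einsum("ij,bcjn->bcin", R, layer(x))
rotate_then_out = layer(torch.einsum("ij,bcjn->bcin", R, x))
print(torch.allclose(out_then_rotate, rotate_then_out, atol=1e-5))  # True
```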
References
[1] Deng, Congyue, et al. "Vector neurons: A general framework for SO(3)-equivariant networks." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[2] Dai, Angela, et al. "ScanNet: Richly-annotated 3D reconstructions of indoor scenes." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
[3] Straub, Julian, et al. "The Replica dataset: A digital replica of indoor spaces." arXiv preprint arXiv:1906.05797 (2019).
Prerequisites
- Python and Git
- Experience with a deep learning framework (PyTorch, TensorFlow)
- Interest in Computer Vision and Machine Learning
Supervisor:
3D Hand-Object Reconstruction from Monocular RGB Images
Computer Vision, Hand-Object Interaction
Description
Understanding human hand and object interaction is fundamental for meaningfully interpreting human action and behavior.
With the advent of deep learning and RGB-D sensors, pose estimation of isolated hands or objects has made significant progress.
However, despite a strong link to real applications such as augmented and virtual reality, joint reconstruction of hand and object has received relatively less attention.
This task focuses on accurately reconstructing hand-object interactions in three-dimensional space, given a single RGB image.
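As a sense of scale for the task, a minimal single-image baseline could look like the sketch below: a shared CNN backbone with one head regressing hand parameters (e.g. MANO pose/shape) and one head regressing the object pose. The architecture, output dimensions, and use of ResNet-18 are assumptions for illustration, not the approach prescribed by this topic.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class HandObjectNet(nn.Module):
    """Single-image baseline: a shared CNN backbone with two regression heads,
    one for hand pose/shape parameters and one for the object's 6D pose."""
    def __init__(self, hand_dim=61, obj_dim=9):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                 # keep the 512-d global feature
        self.backbone = backbone
        self.hand_head = nn.Linear(512, hand_dim)   # e.g. MANO pose + shape + translation
        self.obj_head = nn.Linear(512, obj_dim)     # e.g. 6D rotation + 3D translation

    def forward(self, image):
        feat = self.backbone(image)                 # (B, 512)
        return self.hand_head(feat), self.obj_head(feat)

model = HandObjectNet()
dummy = torch.randn(1, 3, 224, 224)                 # a single RGB image
hand_params, obj_pose = model(dummy)
print(hand_params.shape, obj_pose.shape)             # torch.Size([1, 61]) torch.Size([1, 9])
```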
Prerequisites
- Programming in Python
- Knowledge about Deep Learning
- Knowledge about PyTorch
Contact
xinguo.he@tum.de