Open Thesis

Master's Theses

Real-time registration of noisy, incomplete and partially-occluded 3D pointclouds

Description

This topic is about the registration of 3D pointclouds belonging to certain objects in the scene, rather than about registering different pointclouds of the scene itself.

State-of-the-art (SOTA) pointcloud registration models/algorithms should first be reviewed, and promising candidates should be selected for evaluation based on the criteria listed below.

  • The method must work in real-time (at least 25 frames per second) for at least 5 different objects at the same time.
  • The method must be robust to noise in the pointclouds, which come from an Intel RealSense D435 RGB+Depth camera.
  • The method must be able to robustly track the objects of interest even if they are occluded partially by other objects.

The best-suited method must then be extended or improved in a novel way, or a completely novel method should be developed.

Both classical and Deep Learning based methods must be considered.
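As a point of reference for the classical family, the rigid alignment step at the core of ICP-style methods can be sketched as below. This is only a minimal illustration, not a method prescribed by the topic: it assumes NumPy is available, that point correspondences are already known (row i of the source matches row i of the target), and that the data contains no outliers. The function name rigid_align is hypothetical.

```python
import numpy as np


def rigid_align(source, target):
    """Least-squares rigid transform (R, t) that maps source onto target.

    source, target: (N, 3) arrays; row i of source corresponds to row i of target.
    Returns a 3x3 rotation matrix R and a translation vector t of shape (3,).
    """
    src_centroid = source.mean(axis=0)
    tgt_centroid = target.mean(axis=0)

    # 3x3 cross-covariance of the centred point sets
    H = (source - src_centroid).T @ (target - tgt_centroid)
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    t = tgt_centroid - R @ src_centroid
    return R, t
```

A full ICP loop would alternate this step with a nearest-neighbour correspondence search. Noise, missing regions, and occlusion, as named in the criteria above, are exactly where this plain formulation tends to break down.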

Related work:

  • DeepGMR: https://github.com/wentaoyuan/deepgmr
  • 3D Object Tracking with Transformer: https://github.com/3bobo/lttr

 

Prerequisites

  • First experiences with 3D data processing / Computer Vision
  • Python programming, ideally also familiarity with C++
  • Familiarity with Linux and the command line

Supervisor:

Rahul Chaudhari

Learning 3D skeleton animations of animals from videos

Description

Under this topic, the student should investigate how to learn 3D animations of animal skeletons from videos. The 2D skeleton should first be extracted automatically from a video. A state-of-the-art 3D animal shape and pose model (SMAL, see references below) should then be fitted to the skeleton.
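One common ingredient of fitting a 3D model to 2D detections is a keypoint reprojection term, sketched below. This is a generic illustration under stated assumptions, not the SMAL or SMALify API: it assumes NumPy, a simple pinhole camera with a single focal length, 3D joints already expressed in the camera frame with positive depth, and per-keypoint detection confidences. The function names project_points and reprojection_error are hypothetical.

```python
import numpy as np


def project_points(joints_3d, focal_length, principal_point):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixel coordinates."""
    x, y, z = joints_3d[:, 0], joints_3d[:, 1], joints_3d[:, 2]
    u = focal_length * x / z + principal_point[0]
    v = focal_length * y / z + principal_point[1]
    return np.stack([u, v], axis=1)


def reprojection_error(joints_3d, keypoints_2d, confidence,
                       focal_length, principal_point):
    """Confidence-weighted mean squared pixel distance between the
    projected model joints and the detected 2D keypoints."""
    projected = project_points(joints_3d, focal_length, principal_point)
    sq_dist = np.sum((projected - keypoints_2d) ** 2, axis=1)
    return float(np.sum(confidence * sq_dist) / np.sum(confidence))
```

In a fitting loop, the 3D joints would be a function of the model's shape and pose parameters, and this error would be minimised over those parameters, typically together with pose and shape priors.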

References

  • https://smal.is.tue.mpg.de/index.html
  • https://smalr.is.tue.mpg.de/
  • https://github.com/silviazuffi/smalr_online
  • https://github.com/silviazuffi/gloss_skeleton
  • https://github.com/silviazuffi/smalst
  • https://github.com/benjiebob/SMALify
  • https://github.com/benjiebob/SMALViewer
  • https://bmvc2022.mpi-inf.mpg.de/0848.pdf

Dataset

  • https://research.google.com/youtube8m/explore.html
  • https://youtube-vos.org/dataset/vos/
  • https://data.vision.ee.ethz.ch/cvl/youtube-objects/
  • https://blog.roboflow.com/youtube-video-computer-vision/
  • https://github.com/gtoderici/sports-1m-dataset/ (this dataset seems to provide raw videos from YT)
  • https://github.com/pandorgan/APT-36K
  • https://calvin-vision.net/datasets/tigdog/: contains all the videos, the behavior labels, the landmarks, and the segmentation masks for all three object classes (dog, horse, tiger)
  • https://github.com/hellock/WLD (raw videos)
  • https://sutdcv.github.io/Animal-Kingdom/
  • https://sites.google.com/view/animal-pose/

Prerequisites

  • Background in Computer Vision, Optimization techniques, and Deep Learning
  • Python programming

Supervisor:

Rahul Chaudhari

Interactive story generation with visual input

Description

Conventional stories for children aged 3 to 6 are static, regardless of the medium (text, video, audio). We aim to make stories interactive by giving the user control over characters, objects, scenes, and timing. This will lead to the construction of novel, unique, and personalized stories situated in (partially) familiar environments. We restrict this objective to specific domains consisting of a coherent body of works, such as the children’s book series “Meine Freundin Conni”. The challenges in this thesis include finding a suitable knowledge representation for the domain, learning that representation automatically, and inferring a novel storyline over that representation with active user interaction. To this end, both neural and symbolic approaches should be explored.

So far we have implemented a text-based interactive story generation system based on Large Language Models. In this thesis, the text input modality should be replaced by visual input. In particular, the story should be driven by real-world motion of figurines and objects, rather than an abstract textual description of the scene and its dynamics.

 

Prerequisites

  • First experiences with 2D/3D Computer Vision and Computer Graphics
  • Familiarity with AI incl. Deep Learning (university courses / practical experience)
  • Programming in Python

 

Supervisor:

Rahul Chaudhari

Deep Learning models for zero-shot object detection and segmentation

Description

In the world of computer vision, data labeling holds immense significance for training powerful machine learning models. Accurate annotations provide the foundation for teaching algorithms to understand visual information effectively. However, data labeling in computer vision poses unique challenges, including the complexity of visual data, the need for precise annotations, and handling large-scale datasets. Overcoming these challenges is crucial for enabling computer vision systems to extract valuable insights, identify objects, and revolutionize a wide range of industries.

Therefore, the development of automatic annotation pipelines for 2D and 3D labeling in various tasks is crucial, leveraging recent advancements in computer vision to enable automatic, efficient and accurate labeling of visual data.

This master's thesis will focus on automatically labeling images and videos, specifically on generating 2D/3D labels (i.e., 2D/3D bounding boxes and segmentation masks). The automatic labeling pipeline has to generalize to any type of image or video, such as household objects, toys, indoor/outdoor environments, etc.

The automatic labeling pipeline will be developed based on zero-shot detection and segmentation models such as GroundingDINO and segment-anything, in addition to similar methods (see Awesome Segment Anything). Additionally, the labeling pipeline, including the models used, will be implemented in the autodistill code base, and its performance will be tested by training and evaluating some smaller target models for specific tasks.

Sub-tasks:

  • Automatic generation of 2D labels for images and videos, such as 2D bounding boxes and segmentation masks (see Grounded-Segment-Anything, segment-any-moving, and Segment-and-Track-Anything).
  • Automatic generation of 3D labels for images and videos, such as 3D bounding boxes and segmentation masks (see 3D-Box-Segment-Anything, SegmentAnything3D, segment-any-moving, and Segment-and-Track-Anything).
  • Implementation of a 2D/3D labeling tool to modify and improve the automatic 2D/3D labels (see DLTA-AI).
  • Implementation of the automatic labeling pipeline, the base models used, and some target models in the autodistill code base to enable easy end-to-end labeling, training, and deployment for various tasks such as 2D/3D object detection and segmentation.
  • Comprehensive overview of the performance and limitations of current zero-shot models when used for automatic labeling in tasks such as 2D/3D object detection and segmentation.
  • Suggestions for future work to overcome the limitations of the methods used.

Bonus tasks:

  • Adding image augmentation and editing methods to the labeling pipeline and tool to generate more data (see EditAnything).
  • Implementing one-shot labeling methods to generate labels for unique objects (see Personalize-SAM and Matcher).

Prerequisites

Interest and first experiences in Computer Vision, Deep Learning, Python programming, 3D data.

Supervisor:

Rahul Chaudhari

VR-based 3D synthetic data generation for interactive Computer Vision tasks

Description

Under this topic, the student will extend our existing VR-based synthetic data generation tool for Hand-Object interactions. Furthermore, the student will generate synthetic data using this tool and evaluate state-of-the-art Computer Vision and Deep Learning models for tracking Hand-Object Interactions in 3D.

Prerequisites

  • Strong familiarity with Python programming
  • Interest and first experiences in Computer Graphics, VR, Computer Vision, and Deep Learning.
  • Ideally also interest and experience in Blender 3D software

Supervisor:

Rahul Chaudhari

iOS app for tracking objects using RGB and depth data

Description

This topic is about the development of an iPhone app for tracking objects in the environment using data from the device's RGB and depth sensors.

Prerequisites

  • Good programming experience with C++ and Python
  • Ideally, experience building iOS apps with Swift and/or Unity ARFoundation
  • This topic is only suitable for you if you have a recent personal Mac development device (ideally at least a MacBook Pro with Apple Silicon M1) and at least an iPhone 12 Pro with a LiDAR depth sensor

Supervisor:

Rahul Chaudhari

Student Assistant Jobs

HiWi / Working Student for Blender tasks

Keywords:
3D, blender, python
Short Description:
This is a working student position for a variety of tasks in the Blender environment: 3D modelling of characters / objects, character rigging, animation, interactive rendering, etc. Part of the job is to automate certain workflows or tasks in Blender using the Blender Python API.

Description

This is a working student position for a variety of tasks in the Blender environment:

  • 3D modelling of characters / objects,
  • character rigging,
  • animation,
  • interactive rendering, etc.
  • Part of the job is to automate certain workflows or tasks in Blender using the Blender Python API.

 

Prerequisites

  • Strong interest in 3D Computer Graphics and Gaming.
  • Very strong familiarity with Blender
  • Comfortable programming in Python
  • Ideally: also familiarity with development environments on Linux and Windows.

Please send a description of your interest and experience regarding the above points together with your application.

Contact

https://www.ce.cit.tum.de/lmt/team/mitarbeiter/chaudhari-rahul/

Supervisor:

Rahul Chaudhari