Some machine learning approaches to automated video analysis, such as action recognition, work entirely or partially in the compressed domain. Instead of decoded RGB images, they directly use data from the encoded video, such as motion vectors or DCT coefficients. Because the encoded video is a raw bitstream, it must be parsed to extract the desired data. However, even recently published work in this area often relies on legacy video codecs. This can presumably be explained partly by their widespread use and lower complexity, but also by the fact that few, if any, freely available parsers exist for modern codecs.
In this IDP, you will investigate which parsers, if any, already exist for modern video codecs (preferably AV1) and whether they already extract data that can be used for machine analysis of videos.
Depending on the status and availability of implementations, an efficient parser should be written or extended.
Since the parser will be used in a deep learning context, you will need to provide a Python interface. Optionally, you will also create a dataloader class for PyTorch so that the extracted data can be conveniently loaded for training neural networks.
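To illustrate the kind of interface meant here, the following is a minimal sketch of a map-style dataset wrapped around a parser binding. All names are hypothetical: `dummy_parse` merely stands in for the parser to be developed, and the motion vector layout `(block_x, block_y, dx, dy)` is an assumption for illustration, not a fixed format.

```python
from typing import Callable, List, Sequence, Tuple

# Assumed layout of one motion vector: (block_x, block_y, dx, dy).
MotionVector = Tuple[int, int, int, int]
FrameData = List[MotionVector]


def dummy_parse(path: str) -> List[FrameData]:
    """Hypothetical stand-in for the real parser binding.

    Ignores the file and returns three frames with two made-up
    motion vectors each, so the sketch runs without a bitstream.
    """
    return [[(0, 0, f, -f), (16, 0, 2 * f, 0)] for f in range(3)]


class CompressedDomainDataset:
    """Map-style dataset: one item is (per-frame motion vectors, label)."""

    def __init__(
        self,
        samples: Sequence[Tuple[str, int]],
        parse_fn: Callable[[str], List[FrameData]],
    ) -> None:
        self.samples = list(samples)  # (video path, class label) pairs
        self.parse_fn = parse_fn

    def __len__(self) -> int:
        return len(self.samples)

    def __getitem__(self, index: int) -> Tuple[List[FrameData], int]:
        path, label = self.samples[index]
        # Parse lazily so that only the requested video is read.
        return self.parse_fn(path), label


dataset = CompressedDomainDataset([("a.ivf", 0), ("b.ivf", 1)], dummy_parse)
frames, label = dataset[1]
```

In an actual PyTorch setup, such a class would typically subclass `torch.utils.data.Dataset`, convert the extracted data to tensors in `__getitem__`, and be passed to a `torch.utils.data.DataLoader`. Videos of different lengths would likely require a custom `collate_fn`.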
Requirements:
- Enrolled in a master's degree program in Computer Science
- Strong ability to work autonomously
- Very good C++, C, and Python skills
- Sound knowledge of software development
- Comfortable working with SDKs and third-party code
- Proficient with Git and Doxygen
- Solid understanding of the concepts of video coding
- Basic understanding of machine learning and deep learning
The IDP can be carried out in a team, preferably of two students, and can start as soon as possible.
To apply, please send an email to p.paukner(at)tum.de with a short statement of motivation (~3 sentences) explaining why the topic excites you personally, along with a list of practical projects that you have (co-)implemented during your studies, at work, or privately.