Equivariant 3D Object Detection
3D Object Detection, Computer Vision, Deep Learning, Indoor Environments
Description
The thesis focuses on the application of equivariant deep learning techniques for 3D object detection in indoor scenes. Indoor environments, such as homes, offices, and industrial settings, present unique challenges for 3D object detection due to diverse object arrangements, varying lighting conditions, and occlusions. Traditional methods often struggle with these complexities, leading to suboptimal performance. The motivation for this research is to enhance the robustness and accuracy of 3D object detection in these environments, leveraging the inherent advantages of equivariant deep learning. This approach aims to improve the model's ability to recognize objects regardless of their orientation and position in the scene, which is crucial for applications in robotics, or augmented reality.
The thesis proposes the development of a deep learning model that incorporates equivariant neural networks for 3D object detection, such as the equivariant framework proposed in [1]. The proposed model will be evaluated on a benchmark 3D indoor dataset, such as the Stanford 3D Indoor Spaces Dataset (S3DIS) or the ScanNet dataset [2, 3].
References
[1] Deng, Congyue, et al. "Vector neurons: A general framework for so (3)-equivariant networks." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
[2] Dai, Angela, et al. "Scannet: Richly-annotated 3d reconstructions of indoor scenes." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
[3] Straub, Julian, et al. "The Replica dataset: A digital replica of indoor spaces." arXiv preprint arXiv:1906.05797 (2019).
Prerequisites
- Python and Git
- Experience with a deep learning framework (Pytorch, Tensorflow)
- Interest in Computer Vision and Machine Learning