Information Retrieval in High Dimensional Data
Lecturer: PD Dr. Martin Kleinsteuber
Assistant: Rayyan Khan
Audience: Master
ECTS: 6
Extent: 2/2 (SWS Lecture/Tutorial)
Cycle: Summer semester
Application: Elective course
Prerequisites for admission: -
Time & Place: TUMonline
Start:  

Access

Announcement: Admission Limitation for the Upcoming Lecture „Information Retrieval in High Dimensional Data“ WS 2024/25

Dear Students,

Please note that admission to the upcoming lecture is limited to 30 students. Admission will be granted to the top 30 students who pass a multiple-choice test. The test will be held online via Moodle during the first tutorial time slot on October 17th.

Test Details:

  • Duration: 60 minutes
  • Content: The test will assess the prerequisites necessary for the lecture, including:
      • Fundamentals of Statistics
      • Linear Algebra
      • Calculus

Only students who demonstrate a solid understanding of these areas will be considered for admission.

We appreciate your understanding and wish you the best of luck in the test.

Contents

From face recognition to gene data analysis, from the problem of analyzing motor sensor data to a concise description of human body motion: engineers are often faced with the problem of analyzing high-dimensional data, i.e. data acquired from many sensors. The crucial step in retrieving information from this huge amount of data is to reduce its dimension in an intelligent way, which is also important for visualizing high-dimensional data.
Starting with an overview of applications and a very basic method of dimensionality reduction, linear principal component analysis, we investigate modern methods and their fields of application.

  • Decisions from Data
  • Curse of Dimensionality
  • From Phenomena to Data
  • Logistic Regression
  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis
  • Support Vector Machines
  • Kernel PCA
  • Feedforward Neural Networks

By the end of the lecture, students will understand several state-of-the-art dimensionality reduction and data analysis techniques and be able to implement them in Python.
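
To give a flavor of the kind of implementation meant here, the following is a minimal sketch of linear PCA computed via the singular value decomposition in NumPy. It is an illustration only, not course material: the function name pca_project, the synthetic data, and all parameter values are assumptions made for this example.

```python
import numpy as np


def pca_project(X, k):
    """Project the rows of X (n_samples x n_features) onto the top-k principal components.

    Hypothetical helper for illustration; not taken from the course material.
    """
    # Center the data so the components describe variance around the mean.
    mean = X.mean(axis=0)
    Xc = X - mean
    # Thin SVD: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    # Fraction of the total variance captured by each kept component.
    explained = (s[:k] ** 2) / np.sum(s ** 2)
    # Coordinates of each sample in the k-dimensional subspace.
    return Xc @ components.T, components, explained


# Synthetic data (made up for this sketch): 200 samples in 50 dimensions
# that lie close to a 2-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

Z, components, explained = pca_project(X, k=2)
print(Z.shape)          # shape of the reduced data
print(explained.sum())  # share of variance kept by two components
```

Because the synthetic data are almost exactly two-dimensional, two components retain nearly all of the variance; real image or text data would typically need more components and a look at the explained-variance curve.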

The application examples will mainly be based on either image processing or natural language processing tasks, since they provide a paradigm for analyzing high-dimensional data.

Moreover, during the lab course, students will have the opportunity to improve their presentation and teamwork skills. This includes the design of a poster or PowerPoint presentation.

Prerequisite: Basic knowledge of linear algebra and statistics as well as basic knowledge of Python (or the motivation to learn it).

Teaching Format

The course consists partly of lectures using blackboard and projector slides, and partly of discussions and buzz groups in which new definitions and concepts are learned by means of simple examples.
In the tutorials, the exercises and programming tasks are discussed and students are supported in solving them. Supplementary presentations on mathematical questions are provided as required.

Recommended Literature

  • C.C. Aggarwal: Data Mining: The Textbook. Springer, 2015.
  • C.M. Bishop: Pattern Recognition and Machine Learning. Springer, 2006.
  • A.J. Izenman: Modern Multivariate Statistical Techniques. Springer, 2008.
  • J.A. Lee, M. Verleysen: Nonlinear Dimensionality Reduction. Springer, 2007.
  • T. Hastie, R. Tibshirani, J. Friedman: The Elements of Statistical Learning. Springer, 2009.

Exam

  • Homework (33%)
  • Written exam (66%)