Decentralized Federated Learning on Constrained IoT Devices
Description
The Internet of Things (IoT) is an increasingly prominent aspect of our daily lives, with connected devices offering unprecedented convenience and efficiency. As we move towards a more interconnected world, ensuring the privacy and security of data generated by these devices is paramount. That is where decentralized federated learning comes in.
Federated Learning (FL) is a machine-learning paradigm that enables multiple parties to collaboratively train a model without sharing their data directly. This thesis focuses on taking FL one step further by removing the need for a central server, allowing IoT devices to directly collaborate in a peer-to-peer manner.
In this project, you will explore and develop decentralized federated learning frameworks specifically tailored for constrained IoT devices with limited computational power, memory, and energy resources. The aim is to design and implement efficient algorithms that can harness the collective power of these devices while ensuring data privacy and device autonomy. This involves tackling challenges related to resource-constrained environments, heterogeneous device capabilities, and maintaining security and privacy guarantees.
The project offers a unique opportunity to contribute to cutting-edge research with real-world impact. Successful outcomes will enable secure and private machine learning on IoT devices, fostering new applications in areas such as smart homes, industrial automation, and wearable health monitoring.
Responsibilities:
- Literature review on decentralized federated learning, especially in relation to IoT and decentralized systems.
- Design and development of decentralized FL frameworks suitable for constrained IoT devices.
- Implementation and evaluation of the proposed framework using real-world datasets and testbeds.
- Analysis of security and privacy aspects, along with resource utilization.
- Documentation and presentation of findings in a thesis report, possibly leading to publications in top venues.
Requirements:
- Enrollment in a Master's program in Computer Engineering, Computer Science, Electrical Engineering or related fields
- Solid understanding of machine learning algorithms and frameworks (e.g., TensorFlow, PyTorch)
- Proficiency in C and Python programming language
- Experience with IoT devices and embedded systems development
- Excellent analytical skills and a systematic problem-solving approach
Nice to Have:
- Knowledge of cybersecurity and privacy principles
- Familiarity with blockchain or other decentralized technologies
- Interest in distributed computing and edge computing paradigms
Contact
Email: navid.asadi@tum.de
Supervisor:
Attacks on Cloud Autoscaling Mechanisms
Cloud Computing, Kubernetes, autoscaling, low and slow attacks, Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), cloud security, container orches
Description
In the era of cloud-native computing, Kubernetes has emerged as a leading container orchestration platform, enabling seamless scalability and reliability for modern applications.
However, with its widespread adoption comes a new frontier in cybersecurity challenges, particularly low and slow attacks that exploit autoscaling features to disrupt services subtly yet effectively.
This project aims to delve into the intricacies of these attacks, examining their impact on Kubernetes' Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA), and proposing mitigation strategies for more resilient systems.
Responsibilities:
- Conduct a thorough literature review to identify existing knowledge gaps and research on similar attacks.
- Develop methodologies to simulate low and slow attack scenarios on Kubernetes clusters with varying configurations of autoscaling mechanisms.
- Analyze the impact of these attacks on resource utilization, service availability, and overall system performance.
- Evaluate current defense mechanisms and propose novel strategies to enhance the resilience of Kubernetes' autoscaling features.
- Implement and test selected mitigation approaches in a controlled environment.
- Document findings, present a comparative analysis of effectiveness, and discuss implications for future development in cloud security practices.
Requirements:
- A strong background in computer engineering, computer science or a related field.
- Familiarity with Kubernetes architecture and container orchestration concepts.
- Experience in deploying and managing applications on Kubernetes clusters.
- Proficiency in at least one scripting/programming language (e.g., Python, Go).
- Understanding of cloud computing and cybersecurity fundamentals.
Nice to Have:
- Prior research or hands-on experience in cloud security, particularly in the context of Kubernetes.
- Knowledge of network protocols and low-level system interactions.
- Experience with DevOps tools and practices.
Contact
Email: navid.asasdi@tum.de
Supervisor:
Working Student/Research Internship - On-Device Training on Microcontrollers
Description
We are seeking a highly motivated and skilled student to replicate a research paper that explores the application of pruning techniques for on-device training on microcontrollers. The original paper demonstrated the feasibility of deploying deep neural networks on resource-constrained devices, and achieved significant reductions in model size and computational requirements while maintaining acceptable accuracy.
Responsibilities:
- Extend our existing framework by implementing the pruning techniques on a microcontroller-based platform (e.g., Arduino, ESP32)
- Replicate the experiments described in the original paper to validate the results
- Evaluate the performance of the pruned models on various benchmark datasets
- Compare the results with the original paper and identify areas for improvement
- Document the replication process, results, and findings in a clear and concise manner
Requirements:
- Strong programming skills in C and Python
- Experience with deep learning frameworks (e.g., TensorFlow, PyTorch) and microcontroller-based platforms
- Familiarity with pruning techniques for neural networks is a plus
- Excellent analytical and problem-solving skills
- Ability to work independently and manage time effectively
- Strong communication and documentation skills
Contact
Email: navid.asadi@tum.de
Supervisor:
Working Student - Machine Learning Serving on Kubernetes
Machine Learning, Kubernetes, Containerization, Docker, Orchestration, Cloud Computing, MLOps, Machine Learning Operations, DevOps, Microservices Architecture,
Description
We are seeking an ambitious and forward-thinking working student to join our dynamic team working at the intersection of Machine Learning (ML) and Kubernetes. In this exciting role, you will be immersed in a cutting-edge environment where advanced ML models meet the power of container orchestration through Kubernetes. Your contributions will directly impact the development and optimization of scalable and robust ML serving systems leveraging the benefits of Kubernetes.
If you are a student passionate about both Machine Learning and Kubernetes, we invite you to join us on this exciting journey! We offer the chance to pioneer cutting-edge solutions that leverage the power of these two transformative technologies.
Responsibilities:
- Collaborate with a cross-functional team to design and implement ML workflows on Kubernetes.
- Assist in packaging and deploying ML models as microservices using containers (Docker) and managing them effectively through Kubernetes.
- Optimize resource allocation, scheduling, and scaling strategies for efficient model serving at varying workloads.
- Implement monitoring solutions specific to ML inference tasks within the Kubernetes cluster.
- Troubleshoot and debug issues related to containerized ML applications
- Document best practices, tutorials, and guides on leveraging Kubernetes for ML serving
Requirements:
- Currently enrolled in a Bachelor's or Master's program in School of CIT
- Strong programming skills in Python with experience in software development lifecycle methodologies.
- Familiarity with machine learning frameworks such as TensorFlow and PyTorch.
- Proficiency in container technologies. Docker and Kubernetes certification would be a plus but not mandatory.
- Experience with cloud computing platforms; e.g., AWS, GCP or Azure.
- Demonstrated ability to work independently with effective time management and strong problem-solving analytical skills.
- Excellent communication and teamwork capabilities.
Nice to Have:
- Kubernetes Certification: Having a valid Kubernetes certification (CKA, CKAD, or CKE) demonstrates your expertise in container orchestration and can be a significant advantage.
- Experience with DevOps and/or MLOps Tools: Familiarity with MLOps tools such as MLflow, Kubeflow, or TensorFlow Extended (TFX) can help you streamline the machine learning workflow and improve collaboration. Experience with OpenTelemetry, Jaeger, Istio, and monitoring tools is a plus.
- Knowledge of Distributed Systems: Understanding distributed systems architecture and design patterns can help you optimize the performance and scalability of your machine learning models.
- Contributions to Open-Source Projects: Having contributed to open-source projects related to Kubernetes, machine learning, or MLOps demonstrates your ability to collaborate with others and adapt to new technologies.
- Familiarity with Agile Methodologies: Knowledge of agile development methodologies such as Scrum or Kanban can help you work efficiently in a fast-paced environment and deliver results quickly.
- Cloud-Native Application Development: Experience with cloud-native application development using frameworks like Cloud Foundry or AWS Cloud Development Kit (CDK) can be beneficial in designing scalable and efficient machine learning workflows.
Contact
Email: navid.asadi@tum.de
Supervisor:
Working Student for the Edge AI Testbed
IoT, Edge Computing, Machine Learning, Measurement, Power Characterization
Description
We are seeking a highly motivated and enthusiastic Working Student to join our team as part of the Edge AI Testbed project. As a Working Student, a key member of our research team, you will contribute to the development and testing of cutting-edge Artificial Intelligence (AI) systems at the edge of the network. You will work closely with our researchers and engineers to design, implement, and evaluate innovative AI solutions that can operate efficiently on resource-constrained edge devices.
Responsibilities:
- Assist in designing and implementing AI models for edge computing
- Develop and test software components for the Edge AI Testbed
- Collaborate with team members to integrate AI models with edge hardware platforms
- Participate in performance optimization and evaluation of AI systems on edge devices
- Contribute to the development of tools and scripts for automated testing and deployment
- Document and report on project progress, results, and findings
If you are a motivated and talented student looking to gain hands-on experience in Edge AI, we encourage you to apply for this exciting opportunity!
Requirements:
- Currently enrolled in a Bachelor's or Master's program in School of CIT
- Strong programming skills in languages such as Python and C++
- Experience with AI frameworks such as TensorFlow, PyTorch, or Keras
- Familiarity with edge computing platforms and devices (e.g., Raspberry Pi, NVIDIA Jetson)
- Basic knowledge of Linux operating systems and shell scripting
- Excellent problem-solving skills and ability to work independently
- Strong communication and teamwork skills
Nice to Have:
- Experience with containerization using Docker
- Familiarity with cloud computing platforms (e.g., Kubernetes)
- Experience with Apache Ray
- Knowledge of computer vision or natural language processing
- Participation in open-source projects or personal projects related to AI and edge computing
Contact
Email: navid.asadi@tum.de
Supervisor:
An AI Benchmarking Suite for Microservices-Based Applications
Kubernetes, Deep Learning, Video Analytics, Microservices
Description
In the realm of AI applications, the deployment strategy significantly impacts performance metrics.
This research internship aims to investigate and benchmark AI applications in two predominant deployment configurations: monolithic and microservices-based, specifically within Kubernetes environments.
The central question revolves around understanding how these deployment strategies affect various performance metrics and determining the more efficient configuration. This inquiry is crucial as the deployment strategy plays a pivotal role in the operational efficiency of AI applications.
Currently, the field lacks a comprehensive benchmarking suite that evaluates AI applications from an end-to-end deployment perspective. Our approach includes the development of a benchmarking suite tailored for microservice-based AI applications.
This suite will capture metrics such as CPU/GPU/Memory utilization, interservice communication, end-to-end and per-service latency, and cache misses.
Requirements:
- Familiarity with Kubernetes
- Familiarity with Deep Learning frameworks (e.g., PyTorch or TensorFlow)
- Basics of computer networking
Contact
Email: navid.asadi@tum.de
Supervisor:
Performance Evaluation of Serverless Frameworks
Serverless, Function as a Service, Machine Learning, Distributed ML
Description
Serverless computing is a cloud computing paradigm that separates infrastructure management from software development and deployment. It offers advantages such as low development overhead, fine-grained unmanaged autoscaling, and reduced customer billing. From the cloud provider's perspective, serverless reduces operational costs through multi-tenant resource multiplexing and infrastructure heterogeneity.
However, the serverless paradigm also comes with its challenges. First, a systematic methodology is needed to assess the performance of heterogeneous open-source serverless solutions. To our knowledge, existing surveys need a thorough comparison between these frameworks. Second, there are inherent challenges associated with the serverless architecture, specifically due to its short-lived and stateless nature.
Requirements:
- Familiarity with Kubernetes
- Basics of computer networking
Contact
Email: navid.asadi@tum.de
Supervisor:
Distributed Deep Learning for Video Analytics
Distributed Deep Learning, Distributed Computing, Video Analytics, Edge Computing, Edge AI
Description
In recent years, deep learning-based algorithms have demonstrated superior accuracy in video analysis tasks, and scaling up such models; i.e., designing and training larger models with more parameters, can improve their accuracy even more.
On the other hand, due to strict latency requirements as well as privacy concerns, there is a tendency towards deploying video analysis tasks close to data sources; i.e., at the edge. However, compared to dedicated cloud infrastructures, edge devices (e.g., smartphones and IoT devices) as well as edge clouds are constrained in terms of compute, memory and storage resources, which consequently leads to a trade-off between response time and accuracy.
Considering video analysis tasks such as image classification and object detection as the application at the heart of this project, the goal is to evaluate different deep learning model distribution techniques for a scenario of interest.
Contact
Email: navid.asadi@tum.de
Supervisor:
Edge AI in Adversarial Environment: A Simplistic Byzantine Scenario
Distributed Deep Learning, Distributed Computing, Byzantine Attack, Adversarial Inference
Description
This project considers an environment consisting of several low performance machines which are connected together across a network.
Edge AI has drawn the attention of both academia and industry as a way to bring intelligence to edge devices to enhance data privacy as well as latency.
Prior works investigated on improving accuracy-latency trade-off of Edge AI by distributing a model into multiple available and idle machines. Building on top of those works, this project adds one more dimension: a scenario where $f$ out of $n$ contributing nodes are adversary.
Therefore, for each data sample an adversary (1) may not provide an output (can also be considered as a faulty node.) or (2) may provide an arbitrary (i.e., randomly generated) output.
The goal is to evaluate robustness of different parallelism techniques in terms of achievable accuracy in presence of malicious contributors and/or faulty nodes.
Note that contrary to the mainstream existing literature, this project mainly focuses on the inference (i.e., serving) phase of deep learning algorithms, and although robustness of the training phase can be considered as well, it has a much lower priority.
Contact
Email: navid.asadi@tum.de
Supervisor:
On the Efficiency of Deep Learning Parallelism Schemes
Distributed Deep Learning, Parallel Computing, Inference, AI Serving
Description
Deep Learning models are becoming increasingly larger so that most of the state-of-the-art model architectures are either too big to be deployed on a single machine or cause performance issues such as undesired delays.
This is not only true for the largest models being deployed in high performance cloud infrastructures but also for smaller and more efficient models that are designed to have fewer parameters (and hence, lower accuracy) to be deployed on edge devices.
That said, this project considers the second environment where there are multiple resource constrained machines connected through a network.
Continuing the research towards distributing deep learning models into multiple machines, the objective is to generate more efficient variants/submodels compared to existing deep learning parallelism algorithms.
Note that this project mainly focuses on the inference (i.e., serving) phase of deep learning algorithms, and although efficiency of the training phase can be considered as well, it has a much lower priority.
Contact
Email: navid.asadi@tum.de
Supervisor:
Optimizing Distributed Deep Learning Inference
deep learning, distributed systems, parallel computing, model parallelism, communication overhead reduction, performance evaluation, edge devices
Description
The rapid growth in size and complexity of deep learning models has led to significant challenges in deploying these architectures across resource-constrained machines interconnected through a network. This research project focuses on optimizing the deployment of deep learning models at the edge, where limited computational resources and high-latency networks hinder performance. The main objective is to develop efficient distributed inference techniques that can overcome the limitations of edge devices, ensuring real-time processing and decision-making.
The successful candidate will work on addressing the following challenges:
- Employing model parallelism techniques to distribute workload across compute nodes while minimizing communication overhead associated with exchanging intermediate tensors between nodes.
- Reducing inter-operator blocking to improve overall system throughput.
- Developing efficient compression techniques tailored for deep learning data exchanges to minimize network latency.
- Evaluating the performance of proposed modifications using standard deep learning benchmarks and real-world datasets.
Responsibilities:
- Implement and evaluate various parallelism techniques, such as model parallelism and variant parallelism, from a communication efficiency perspective.
- Identify and implement mechanisms to minimize the exchange of intermediate tensors between compute nodes, potentially using advanced compression techniques tailored for deep learning data exchanges.
- Conduct comprehensive performance evaluations of proposed modifications using standard deep learning benchmarks and real-world datasets. Assess improvements in latency, resource efficiency, and overall system throughput compared to baseline configurations.
- Write technical reports and publications detailing the research findings.
Requirements:
- Pursuing a Master's degree in School of CIT
- Strong background in deep learning, distributed systems, and parallel computing.
- Proficiency in Python and experience with deep learning frameworks (e.g., TensorFlow, PyTorch).
- Excellent problem-solving skills and the ability to work independently and collaboratively as part of a team.
- Strong communication and writing skills for technical reports and publications.
Contact
Email: navid.asadi@tum.de