IPF1: Information Processing Factory
About
Autonomous systems design must deal with several impairments that arise in designing and deploying complex autonomous systems. Chief among these impairments are: (1) the unexplainability conundrum of AI/ML, leading to unbounded behavior; (2) the intractability of verifying the system under all possible use cases, leading to the inability to deal with unexpected situations (e.g., emergent behavior and unknown unknowns, or black swan events); (3) the inability to fully predict the behavior of humans entangled with such systems (e.g., self-driving cars interacting with human-driven ones); (4) aging and other physical-world interaction mechanisms that ultimately affect the system's operational parameters (e.g., energy, performance, reliability, safety, and security) over time; (5) the inaccuracy of the models used at different levels of the hierarchy to design the system; and (6) the need to reconcile conflicting operational parameters.

The Information Processing Factory (IPF) project is a collaboration between research teams in the US (UC Irvine) and Germany (TU Munich and TU Braunschweig) applying self-awareness to the self-management of MPSoCs. This includes (a) self-reflection, i.e., awareness of the MPSoC's own hardware/software architecture, operational goals, and dynamic changes once deployed; (b) self-prediction of dynamic changes; and (c) self-adaptation to environment changes, optimizing operational parameters and protecting against unexpected situations with increased risk. The IPF paradigm applies principles inspired by factory management to the continuous operation and optimization of highly integrated embedded systems, as shown in the corresponding figure. A general objective is to identify a sweet spot between maximized autonomy among IPF constituent components and a minimum of centralized control to ensure guaranteed service even under strict safety and availability requirements. Emphasis is on self-diagnosis for early detection of degradation and imminent failures, combined with unsupervised self-adaptation to meet performance and safety targets in a mixed-critical environment.
Our Work
A mixed-critical system consists of safety-critical tasks and best-effort tasks co-existing in the same system. The safety-critical tasks have strict requirements with respect to response time, execution deadlines, and resource usage, whereas the best-effort tasks must execute without affecting the safety-critical tasks by adhering to system constraints. Our work at TUM focuses on optimizing the workload execution of best-effort tasks in the mixed-critical domain. In the IPF project, we developed Learning Classifier Tables (LCTs): lightweight, classifier-based, human-interpretable, hardware-implemented machine-learning building blocks that inherit the concepts of learning classifier systems. LCTs are used as low-level controllers for the processing cores (executing best-effort tasks) in a system-on-chip executing mixed-critical tasks. Each LCT learns to adapt and optimize the operating point of its core at runtime to provide the performance targets required by the best-effort tasks (e.g., IPS, FPS, response time) while adhering to system constraints (e.g., power budget, temperature). The performance targets and system constraints are dynamically changed at runtime by the best-effort controller (BEC) and are reflected in the learning process of the LCT via objective and reward functions. Our main research focuses on the following topics related to LCTs:
1. Design of objective and reward functions to reflect the performance targets and system constraints.
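To make this concrete, the following minimal Python sketch shows one way such a reward could be scalarized from an IPS target and a power budget. The function name, signals, weighting, and numbers are illustrative assumptions for this page, not the actual LCT hardware reward.

# Hypothetical sketch of a scalarized LCT reward: highest when the measured
# IPS meets its target and the power stays within budget. All constants and
# signal names are illustrative assumptions.

def lct_reward(ips, ips_target, power, power_budget, penalty_weight=4.0):
    """Return a reward in [0, 1] for one control interval."""
    # Performance term: how close the core is to its IPS target (capped at 1).
    perf = min(ips / ips_target, 1.0)
    # Constraint term: penalize power above the budget, scaled by the
    # relative overshoot so larger violations hurt more.
    overshoot = max(power - power_budget, 0.0) / power_budget
    return max(perf - penalty_weight * overshoot, 0.0)

# On target and within budget -> reward 1.0
print(lct_reward(ips=2.0e9, ips_target=2.0e9, power=0.9, power_budget=1.0))
# 10% over budget -> heavily penalized (0.6)
print(lct_reward(ips=2.0e9, ips_target=2.0e9, power=1.1, power_budget=1.0))
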
2. Enabling coordination among the LCTs to improve performance.
Different LCTs control the different cores in a multi-core SoC. Each LCT learns to achieve its given goals within specified constraints. In certain situations, however, it is beneficial for multiple LCTs to share a common, unified goal and constraint. In such situations, the LCTs must learn to achieve their common goal via coordination. More details are available in the following article.
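As one illustration of what coordination could look like, the sketch below redistributes a shared chip-level power budget among per-core controllers: headroom unused by lightly loaded cores is handed to cores that are over their local budget. This specific scheme and all numbers are assumptions for illustration; the actual IPF coordination mechanism is described in the article.

# Minimal sketch of one possible coordination scheme among per-core LCTs
# sharing a chip-level power budget. Scheme and numbers are illustrative.

def redistribute_budget(measured_power, local_budgets, total_budget):
    """Return new per-core budgets that still sum to at most total_budget."""
    slack = [b - p for p, b in zip(measured_power, local_budgets)]
    headroom = sum(s for s in slack if s > 0)          # unused power
    needers = [i for i, s in enumerate(slack) if s <= 0]
    new_budgets = list(local_budgets)
    if headroom > 0 and needers:
        share = headroom / len(needers)
        for i, s in enumerate(slack):
            if s > 0:
                new_budgets[i] = measured_power[i]     # donor keeps what it uses
            else:
                new_budgets[i] = local_budgets[i] + share
    # Clamp so the sum never exceeds the chip-level budget.
    scale = min(1.0, total_budget / sum(new_budgets))
    return [b * scale for b in new_budgets]

print(redistribute_budget(measured_power=[0.4, 0.9, 1.0],
                          local_budgets=[0.8, 0.8, 0.8],
                          total_budget=2.4))           # -> [0.4, 1.0, 1.0]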

3. Classifier generation and evolution using genetic algorithms. (Article)
LCTs are explainable and interpretable classifier-based systems. The knowledge learned by an LCT is stored as a population of classifiers. An initial population of classifiers can be generated at design time using simulation tools. However, LCTs operate at runtime under changing goals and constraints, and such a static initial population leads to inefficient learning and control. The classifiers therefore have to be updated at runtime. In this work, we explore using genetic algorithms to modify and update the classifier population at runtime.
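A minimal sketch of the idea follows: classifiers are condition/action rules with a fitness, and a genetic step replaces the weakest rule with an offspring of fitter ones. The rule representation, actions, and operators here are illustrative assumptions, not the hardware encoding used by LCTs.

import random

# Classifier = (condition_low, condition_high, action, fitness); a rule fires
# when the sensed signal falls inside [condition_low, condition_high].

def evolve(population, mutation_rate=0.2):
    ranked = sorted(population, key=lambda c: c[3], reverse=True)
    parents, worst = ranked[:2], ranked[-1]
    # "Crossover": condition from one parent, action from the other,
    # fitness reset so the child must prove itself.
    child = [parents[0][0], parents[0][1], parents[1][2], 0.0]
    if random.random() < mutation_rate:
        child[2] = random.choice(["freq_up", "freq_down", "hold"])  # mutate action
    # Replace the weakest classifier with the new child.
    population[population.index(worst)] = tuple(child)
    return population

pop = [(0.0, 0.5, "freq_up", 0.9),
       (0.5, 1.0, "hold", 0.7),
       (0.2, 0.8, "freq_down", 0.1)]
print(evolve(pop))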

4. Experience replay for LCTs. (Article)
Experience replay (ER) is a popular strategy in machine learning used to accelerate and improve the learning process. In this work, we propose to extend LCTs with experience replay. Since LCTs are implemented in hardware, a dedicated ER buffer would require additional memory and is expensive. We observe that, depending on learning progress, the LCT classifier table is not always fully occupied. We therefore reuse the unused slots in the LCT table as an ER buffer, improving the performance of the LCTs while requiring no additional memory.
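The sketch below captures the slot-reuse idea: a fixed-capacity table where slots not occupied by classifiers hold (state, action, reward) experiences, and classifiers reclaim slots from the ER buffer as learning progresses. The data layout is an illustrative assumption, not the actual LCT hardware organization.

# Illustrative sketch: free classifier slots double as an experience-replay
# ring buffer; classifiers have priority over stored experiences.

class LCTTable:
    def __init__(self, capacity):
        self.capacity = capacity
        self.classifiers = []   # learned rules
        self.replay = []        # experiences stored in the unused slots

    def store_experience(self, exp):
        # Experiences may only use slots not occupied by classifiers.
        if len(self.classifiers) + len(self.replay) >= self.capacity:
            if not self.replay:
                return          # table full of classifiers: ER disabled
            self.replay.pop(0)  # ring-buffer behavior: drop the oldest
        self.replay.append(exp)

    def add_classifier(self, rule):
        # Classifiers have priority: evict an experience if the table is full.
        if len(self.classifiers) + len(self.replay) >= self.capacity and self.replay:
            self.replay.pop(0)
        if len(self.classifiers) < self.capacity:
            self.classifiers.append(rule)

table = LCTTable(capacity=4)
for t in range(3):
    table.store_experience((f"state{t}", "freq_up", 0.1 * t))
table.add_classifier(("low_load", "freq_down"))   # fits in the last free slot
table.add_classifier(("high_load", "freq_up"))    # evicts the oldest experience
print(len(table.classifiers), len(table.replay))  # 2 classifiers, 2 experiences
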

5. Transfer learning in LCTs for faster adaptation to dynamically changing performance targets and constraints.
LCTs learn and adapt to changing goals and constraints via trial and error. Different goals and constraints correspond to different reward functions, and a policy learned by an LCT for one reward function may perform poorly under another. Because the knowledge in an LCT is explainable and interpretable, we have the opportunity to selectively transfer parts of it when the reward function changes.
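One way selective transfer could work is sketched below: classifiers judged still useful under the new objective keep their learned fitness, while the rest have their fitness reset so they are relearned. The usefulness test, rule format, and values are stand-in assumptions, not the criterion used in our work.

# Illustrative sketch of selective knowledge transfer after a reward change.

def transfer(population, still_useful, reset_fitness=0.5):
    """population: list of dicts with 'condition', 'action', 'fitness'."""
    transferred = []
    for clf in population:
        if still_useful(clf):
            transferred.append(clf)                               # keep as-is
        else:
            transferred.append({**clf, "fitness": reset_fitness})  # relearn
    return transferred

old_pop = [
    {"condition": "low_load", "action": "freq_down", "fitness": 0.9},
    {"condition": "high_load", "action": "freq_up", "fitness": 0.8},
]
# Hypothetical new objective: power saving matters more, so only
# frequency-reducing actions are assumed to stay directly useful.
print(transfer(old_pop, lambda c: c["action"] == "freq_down"))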
6. Archive-based safety mechanism for safe runtime learning. (Article)
LCTs are reinforcement learning agents that learn by trial and error to achieve an IPS target within a power budget. Violating the power budget can lead to thermal violations, which are detrimental to the chip. In this work, we propose a margin zone and an archive-based safety mechanism to ensure safe runtime learning by the LCTs. The margin zone is a pre-designed zone within the power budget which the LCTs learn not to enter. Any violation of the margin zone triggers the archive mechanism, which restores the operating point of the LCT to the last safe point. More details are available in the article.
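The core mechanism can be summarized in a few lines: as long as the measured power stays below the margin zone, the current operating point is archived as safe; once the margin zone is violated, the archived point is restored. The thresholds and the operating-point encoding below are illustrative assumptions.

# Minimal sketch of the margin-zone + archive safety mechanism.

POWER_BUDGET = 1.0
MARGIN = 0.1                      # margin zone: power in (0.9, 1.0]
SAFE_LIMIT = POWER_BUDGET - MARGIN

archive = None                    # last known safe operating point

def supervise(operating_point, measured_power):
    """Return the operating point to apply in the next control interval."""
    global archive
    if measured_power <= SAFE_LIMIT:
        archive = operating_point          # safe: update the archive
        return operating_point
    # Margin zone violated: roll back to the archived safe point (if any).
    return archive if archive is not None else operating_point

print(supervise({"freq": 1.2e9, "volt": 0.9}, measured_power=0.85))  # archived
print(supervise({"freq": 1.6e9, "volt": 1.0}, measured_power=0.95))  # rollback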