Imagine someone hands you a stick and asks you to balance it upright on the flat of your hand, without gripping the stick itself. Most humans manage this after a short period of practice. Robots, in contrast, have a hard time learning this seemingly simple balancing task unless they are given a precise mathematical model of the stick dynamics.
In this work, a cart controller is to be taught, via Reinforcement Learning, to balance a pole mounted on top of the cart. The experiments are to be carried out in a simulation environment.
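To give an impression of the kind of simulation involved, the sketch below implements the standard cart-pole dynamics with a simple Euler integration step. The physical constants, the time step, and the bang-bang test controller are illustrative assumptions only; the actual simulation environment used in the thesis is not prescribed here.

```python
import math

# Assumed cart-pole parameters; the thesis may use a different simulator
# or different constants.
GRAVITY = 9.81          # m/s^2
MASS_CART = 1.0         # kg
MASS_POLE = 0.1         # kg
POLE_HALF_LENGTH = 0.5  # m, pivot to the pole's centre of mass
DT = 0.02               # integration time step in seconds


def cartpole_step(state, force):
    """Advance the cart-pole state (x, x_dot, theta, theta_dot) by one
    Euler step under a horizontal force applied to the cart."""
    x, x_dot, theta, theta_dot = state
    total_mass = MASS_CART + MASS_POLE
    pole_mass_length = MASS_POLE * POLE_HALF_LENGTH

    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / total_mass)
    )
    x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

    # Semi-implicit Euler integration.
    x_dot += DT * x_acc
    x += DT * x_dot
    theta_dot += DT * theta_acc
    theta += DT * theta_dot
    return (x, x_dot, theta, theta_dot)


if __name__ == "__main__":
    state = (0.0, 0.0, 0.05, 0.0)  # pole starts slightly off vertical
    for _ in range(100):
        force = 10.0 if state[2] > 0 else -10.0  # naive bang-bang control
        state = cartpole_step(state, force)
    print(state)
```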
In Reinforcement Learning (RL) applications, one often has to deal with high-dimensional state or feature spaces. In continuous environments, for example, it is no longer possible to represent the value function explicitly in tabular form. Value function approximation helps to cope with this problem.
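The following minimal sketch illustrates the general idea of value function approximation: instead of a value table, the value is a linear function of a fixed feature vector, and the weights are updated with TD(0). The feature map, step size, and discount factor are illustrative assumptions and not part of this project description.

```python
import numpy as np


def features(state):
    """Hypothetical feature map for a cart-pole state
    (x, x_dot, theta, theta_dot); any fixed feature vector would do."""
    x, x_dot, theta, theta_dot = state
    return np.array([1.0, x, x_dot, theta, theta_dot, theta * theta_dot])


def td0_update(w, state, reward, next_state, alpha=0.01, gamma=0.99):
    """One TD(0) step for a linear value approximation V(s) = w . phi(s)."""
    phi, phi_next = features(state), features(next_state)
    td_error = reward + gamma * np.dot(w, phi_next) - np.dot(w, phi)
    return w + alpha * td_error * phi
```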
Common forms of such approximation are linear approximation and approximation with Gaussian Processes (GP), leading to algorithms such as Gaussian Process Temporal Difference (GPTD) learning. In environments where a lot of data is available, computational feasibility can be achieved by imposing sparsity on the data. One such approach is the Relevance Vector Machine (RVM), which has sparsity built into the model.
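One sparsification criterion that appears in the online kernel and GPTD literature is the approximate linear dependence (ALD) test: a visited state is admitted to a small dictionary of representative states only if it cannot be well approximated, in feature space, by the states already in the dictionary. The sketch below shows this test; the Gaussian kernel, its width, and the threshold are assumptions made for illustration, not choices fixed by this project.

```python
import numpy as np


def gaussian_kernel(a, b, width=1.0):
    """Hypothetical Gaussian (RBF) kernel; the actual kernel is a design choice."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * width ** 2))


class SparseDictionary:
    """Keeps a small set of representative states via the approximate
    linear dependence (ALD) test used in online sparse kernel methods."""

    def __init__(self, threshold=0.1, kernel=gaussian_kernel):
        self.threshold = threshold
        self.kernel = kernel
        self.points = []

    def consider(self, x):
        """Add x to the dictionary if its feature-space projection error
        onto the current dictionary exceeds the threshold."""
        if not self.points:
            self.points.append(x)
            return True
        K = np.array([[self.kernel(p, q) for q in self.points]
                      for p in self.points])
        k_vec = np.array([self.kernel(p, x) for p in self.points])
        # Small jitter keeps the kernel matrix numerically invertible.
        a = np.linalg.solve(K + 1e-8 * np.eye(len(K)), k_vec)
        delta = self.kernel(x, x) - k_vec @ a
        if delta > self.threshold:
            self.points.append(x)
            return True
        return False
```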
The focus of this work is to implement GPTD and RVM-TD in simulated environments. Furthermore, different techniques for imposing sparsity on the model are to be implemented and compared.
Further Information:
The work can be done as Bachelor-, Master-Thesis or Interdisciplinary Project (IDP). If interested, please contact Dominik Meyer.