University of Twente Student Theses

Unifying state representation learning with intrinsic motivations in reinforcement learning

Wittenstrom, B.K. (2021) Unifying state representation learning with intrinsic motivations in reinforcement learning.

Full text:PDF (4MB)
Abstract:Robots are in demand to work in unknown or unpredictable environments, such as navigating roads or picking objects from a random pile. Model-based control methods cannot operate in these environments. Learning-based control methods, such as reinforcement learning, do not need accurate models to find good control policies in unknown environments. Reinforcement learning, however, suffers from the curse of dimensionality: the computational power needed increases exponentially with the size of the robot's observation. This problem can be mitigated by filtering the important features of the observation into a low-dimensional synthetic state representation. Currently, synthetic state representations are trained on a history of observations and actions gathered with a random action policy. A random action policy does not use the information gathered to adjust which states it samples, so there is an opportunity to improve the training of synthetic state representations by improving the choice of actions used to collect the training samples. We trained state representations for one environment with simple, consistent visual features and one with complex distractor features. For each environment, we tested four training policies: a random action policy, entropy maximization, prediction error maximization, and uniform sampling. We found that different sampling methods can lead to different sampling distributions, depending on the training parameters and the environment. Uniformity of coverage is important for complex environments where distractor features in one part of the environment do not generalize to other areas. Uniformity is not important for learning a good structure when the environment's features are consistent enough that state representation learning (SRL) can generalize from one area of the environment to another. The relation between the structure of the state representation and the performance of the RL policy is complex: in the simple environment, better structural scores seem to improve RL performance, while in the complex environment the opposite is the case, possibly because clustering of states caused by the distractor features is disruptive to policy learning. Sampling methods that lead to a more uniform sampling distribution may improve the structural quality of the learned state representation, but only in complex environments where generalization is impossible. Finally, what constitutes a good structure for a synthetic state representation is still unknown, since an "improved" structure does not necessarily lead to higher RL performance. More research is therefore needed into how the structure of state representations can facilitate RL performance, and how sampling methods can support the learning of those structures.
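The abstract names prediction error maximization as one intrinsic-motivation policy for gathering SRL training data. As a rough illustration only (a minimal sketch, not the thesis implementation: the toy environment, network sizes, and the bandit-style action heuristic below are all assumptions introduced here), an agent can train a small encoder and latent forward model online and bias its action choice toward the transitions the forward model currently predicts worst:

# Minimal sketch (NOT the thesis code) of prediction-error-driven sampling
# for state representation learning; all names and sizes are illustrative.
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, LATENT_DIM, N_ACTIONS = 16, 2, 4

class ToyEnv:
    """Stand-in environment: observations are noisy linear embeddings of a 2-D position."""
    def __init__(self):
        self.proj = np.random.randn(OBS_DIM, 2).astype(np.float32)
        self.pos = np.zeros(2, dtype=np.float32)
    def reset(self):
        self.pos = np.zeros(2, dtype=np.float32)
        return self._obs()
    def step(self, a):
        moves = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], dtype=np.float32)
        self.pos = np.clip(self.pos + 0.1 * moves[a], -1.0, 1.0)
        return self._obs()
    def _obs(self):
        return self.proj @ self.pos + 0.01 * np.random.randn(OBS_DIM).astype(np.float32)

encoder = nn.Sequential(nn.Linear(OBS_DIM, 32), nn.ReLU(), nn.Linear(32, LATENT_DIM))
fwd = nn.Sequential(nn.Linear(LATENT_DIM + N_ACTIONS, 32), nn.ReLU(), nn.Linear(32, LATENT_DIM))
opt = torch.optim.Adam([*encoder.parameters(), *fwd.parameters()], lr=1e-3)

# Bandit-style stand-in for "prediction error maximization": track a running
# average of the forward-model error per action and usually pick the action
# whose transitions are currently hardest to predict.
err_avg = np.ones(N_ACTIONS, dtype=np.float32)

env = ToyEnv()
obs = env.reset()
for step in range(2000):
    a = np.random.randint(N_ACTIONS) if np.random.rand() < 0.1 else int(err_avg.argmax())
    next_obs = env.step(a)

    o = torch.from_numpy(obs).unsqueeze(0)
    o2 = torch.from_numpy(next_obs).unsqueeze(0)
    a_onehot = F.one_hot(torch.tensor([a]), N_ACTIONS).float()

    z, z2 = encoder(o), encoder(o2)
    z2_pred = fwd(torch.cat([z, a_onehot], dim=-1))
    # NOTE: a pure latent-prediction loss can collapse the encoder; real SRL
    # objectives add e.g. reconstruction or contrastive terms.
    loss = F.mse_loss(z2_pred, z2.detach())

    opt.zero_grad()
    loss.backward()
    opt.step()

    err_avg[a] = 0.95 * err_avg[a] + 0.05 * loss.item()  # intrinsic signal per action
    obs = next_obs

In a full curiosity-style setup the prediction error would feed an RL policy as an intrinsic reward rather than a per-action running average, and the SRL objective would include additional terms to keep the latent space from collapsing.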
Item Type:Essay (Master)
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject:52 mechanical engineering
Programme:Electrical Engineering MSc (60353)
Link to this item:https://purl.utwente.nl/essays/88676