University of Twente Student Theses


State representation learning using a graph neural network in a 2D grid world

Wettum, Y.C. van (2021) State representation learning using a graph neural network in a 2D grid world.

Abstract: Reinforcement learning algorithms have shown great success in solving complicated robotics tasks. These tasks often involve multiple sensors which generate a high-dimensional sensory input, i.e. observation. Learning the optimal policy directly from the high-dimensional observation often requires processing large amounts of data. State representation learning aims to map the high-dimensional observation to a lower-dimensional state space, in order to reduce training time and the required amount of data. The ability to reason in terms of objects, relations and actions is an important aspect of human cognition (Spelke and Kinzler, 2007). This ability serves as a core motivation behind recent works that aim to incorporate relational reasoning in machine learning models (Battaglia et al., 2016; Kipf et al., 2018; Xu et al., 2019). A graph neural network has been shown to be a powerful general framework for reasoning about objects and relations.

In this work, the goal is to effectively use a graph neural network in state representation learning. The focus is on a navigation task in a 2D deterministic grid world environment. This environment has clear objects which can be encoded as a graph. Moreover, it is clear what the learned state representation should look like in terms of objects (a 2D grid structure for each moving object). To learn a state representation, an observation and the next observation are encoded into state representations by an encoder network, and a neural network (i.e. a transition model) is trained to predict the transition between these state representations. By simultaneously training the encoder network and the transition model in latent space, a state representation is learned. The transition model is implemented using a graph neural network and compared with a conventional neural network.

First, the problem is formulated as a supervised learning problem, where the states are manually encoded. This excludes the encoding step and puts the focus on learning the transition model. The learned transition and reward models could be used to plan the optimal policy for maze configurations seen during training. However, training a transition and reward model on randomized maze configurations proved problematic: the graph neural network was not able to learn the reward function, and neither transition model was precise for unseen maze configurations. Nevertheless, the graph neural network handled transition predictions for unseen maze configurations slightly better than the conventional neural network. Neither learned environment model was sufficiently accurate to be used for planning in unseen maze configurations.

Secondly, the transition and reward model are trained in an unsupervised setting. A contrastive loss function with negative sampling is added in order to learn expressive state representations (Kipf et al., 2020b; van der Pol et al., 2020). Furthermore, a dense reward function is used to make the reward function more informative. The graph neural network was able to decouple the objects in the scene, while the conventional neural network was not able to do this. The state representation of the conventional neural network still adhered to the grid-like structure of the problem.
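A minimal sketch of this joint training setup is given below, loosely in the style of the contrastive world models of Kipf et al. (2020b): an encoder maps observations to latent states, a transition model predicts the latent transition, and a hinge loss with negative sampling pulls the predicted next state towards the encoded next observation. All module and variable names here are illustrative assumptions, not the thesis implementation.

```python
# Sketch only (assumed names/shapes): joint encoder + transition training with a
# contrastive hinge loss and negative sampling, in the spirit of Kipf et al. (2020b).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps a flattened grid-world observation to a low-dimensional latent state."""
    def __init__(self, obs_dim, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                                 nn.Linear(128, state_dim))

    def forward(self, obs):
        return self.net(obs)

class TransitionModel(nn.Module):
    """Predicts the change in latent state caused by a (one-hot encoded) action."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + num_actions, 128), nn.ReLU(),
                                 nn.Linear(128, state_dim))

    def forward(self, state, action_onehot):
        return self.net(torch.cat([state, action_onehot], dim=-1))

def contrastive_loss(encoder, transition, obs, action_onehot, next_obs, margin=1.0):
    """Hinge loss: the predicted next state should be close to the encoded next
    observation, while a randomly drawn negative state should stay at least
    `margin` away from it."""
    z, z_next = encoder(obs), encoder(next_obs)
    z_pred = z + transition(z, action_onehot)          # transition predicted in latent space
    neg = z[torch.randperm(z.size(0))]                 # negative samples: other states in the batch
    pos_energy = ((z_pred - z_next) ** 2).sum(dim=-1)  # predicted vs. actual next state
    neg_energy = ((neg - z_next) ** 2).sum(dim=-1)     # random state vs. actual next state
    return (pos_energy + F.relu(margin - neg_energy)).mean()

# Typical usage (shapes assumed): optimize encoder and transition model jointly
# on batches of (obs, action_onehot, next_obs) transitions, e.g.
#   loss = contrastive_loss(encoder, transition, obs, action_onehot, next_obs)
#   loss.backward(); optimizer.step()
```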
For both the graph neural network and the conventional neural network, the learned state representation and environment model were used to successfully plan the optimal policy in latent space. The graph neural network makes it possible to learn an expressive representation for each object in the scene, which is not possible with a conventional neural network. For simple environments, however, there will be no benefit in using a graph neural network as opposed to a conventional neural network to learn a state representation from raw observations.
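To illustrate the per-object transition model discussed above, the sketch below shows a simple message-passing graph neural network over object slots, in the spirit of Battaglia et al. (2016) and Kipf et al. (2020b): edge messages are computed for every sender/receiver pair and aggregated at each receiver, after which a node update produces a per-object latent transition. All names and tensor shapes are assumptions for illustration, not the thesis code.

```python
# Sketch only (assumed names/shapes): message-passing transition model over
# factored object states, producing a per-object latent transition.
import torch
import torch.nn as nn

class GNNTransition(nn.Module):
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        # Edge model: message from sender state to receiver state.
        self.edge_mlp = nn.Sequential(nn.Linear(2 * state_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, hidden))
        # Node model: update each object from its state, aggregated messages and action.
        self.node_mlp = nn.Sequential(nn.Linear(state_dim + hidden + num_actions, hidden),
                                      nn.ReLU(), nn.Linear(hidden, state_dim))

    def forward(self, states, action_onehot):
        # states:        (batch, num_objects, state_dim)
        # action_onehot: (batch, num_objects, num_actions), the one-hot action
        #                assigned to (or copied for) each object slot
        B, N, D = states.shape
        senders = states.unsqueeze(2).expand(B, N, N, D)    # [b, i, j] = state of sender i
        receivers = states.unsqueeze(1).expand(B, N, N, D)  # [b, i, j] = state of receiver j
        edges = self.edge_mlp(torch.cat([senders, receivers], dim=-1))
        msgs = edges.sum(dim=1)                              # aggregate incoming messages per receiver
        delta = self.node_mlp(torch.cat([states, msgs, action_onehot], dim=-1))
        return states + delta                                # per-object latent transition
```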
Item Type: Essay (Master)
Faculty: EEMCS: Electrical Engineering, Mathematics and Computer Science
Subject: 54 computer science
Programme: Electrical Engineering MSc (60353)
Link to this item: https://purl.utwente.nl/essays/86335