University of Twente Student Theses

Login

Approximate dynamic programming with adaptive multivariate simplex splines

Jongh, Maike de (2021) Approximate dynamic programming with adaptive multivariate simplex splines.

[img] PDF
2MB
Abstract:Large-scale Markov decision processes (MDPs) typically suffer from the curse of dimensionality, which renders exact solution methods intractable. For such problems, we rely upon approximate dynamic programming (ADP) techniques. A fundamental element of approximate dynamic programming is value function approximation. This research presents a value function approximation architecture that is based on adaptive multivariate simplex splines. Their linearity in the parameters and the fact that they are easily evaluated and adapted on a local basis renders these functions suitable candidates for this purpose. The approximation power of a multivariate simplex spline depends to a great extent on the triangulation on which it is defined. Our main contribution is a procedure that adaptively refines this triangulation in regions of the state space that require a more accurate value function approximation. This procedure is integrated into an ADP framework. The result is a method that can be applied to any MDP with a continuous (or large discrete) state space and finite discrete action space without the need of any preadjustment of the splines based on problem-specific knowledge. The method is tested on a problem of balancing an inverted pendulum. For this problem, the performance of our algorithm is not far behind that of a problem-specific method. The results indicate that multivariate simplex splines have high potential as value function approximators in an ADP context.
Item Type:Essay (Master)
Faculty:EEMCS: Electrical Engineering, Mathematics and Computer Science
Programme:Applied Mathematics MSc (60348)
Link to this item:https://purl.utwente.nl/essays/86074
Export this item as:BibTeX
EndNote
HTML Citation
Reference Manager

 

Repository Staff Only: item control page