- Generalization strategies in reinforcement learning
- Award date
- 20 April 2018
- Number of pages
- Document type
- PhD thesis
- Faculty of Science (FNWI)
A reinforcement-learning agent learns by trying actions in each state of an MDP (Markov Decision Process) and observing the resulting rewards. Value functions are MDP solutions that map state-action pairs to their expected return (the discounted sum of future rewards). This work investigates two classes of strategies for generalizing from solved MDPs to new but similar MDPs that the agent might encounter in the future.
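To make "expected return" concrete, here is a minimal Python sketch (not code from the thesis). It computes the discounted return of a reward sequence and forms a simple every-visit Monte Carlo estimate of a state-action value function by averaging the returns observed after each state-action pair. The function names and the episode format (lists of `(state, action, reward)` tuples) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Step = Tuple[Hashable, Hashable, float]  # (state, action, reward)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma**t * r_t for a finite reward sequence."""
    g = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def mc_q_estimate(episodes: List[List[Step]],
                  gamma: float) -> Dict[Tuple[Hashable, Hashable], float]:
    """Every-visit Monte Carlo estimate of Q(s, a): the mean return observed after (s, a)."""
    returns: Dict[Tuple[Hashable, Hashable], List[float]] = defaultdict(list)
    for episode in episodes:
        g = 0.0
        # Walk the episode backwards so g is the return from each step onward.
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            returns[(state, action)].append(g)
    return {sa: sum(gs) / len(gs) for sa, gs in returns.items()}


print(discounted_return([1.0, 0.0, 2.0], gamma=0.5))  # 1 + 0.5*0 + 0.25*2 = 1.5
```

A value function learned this way is specific to one MDP, which is exactly why the strategies below look for something that transfers across MDPs.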
The first strategy learns a single cross-MDP function from the value functions of past MDPs the agent has solved. In new MDPs, this function is employed as a shaping function, which provides the agent with informative reward on top of the ground-truth reward. We propose and evaluate three different types of value functions to use as targets for learning the shaping function. In addition, we introduce FS-TEK, a novel feature selection algorithm that selects relevant state features by observing how their covariance with return changes as more MDPs are solved.
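The abstract does not say exactly how the shaping reward is formed. One common choice, assumed here, is potential-based shaping, where the agent receives r + γΦ(s') - Φ(s) instead of r. In the Python sketch below, the potential Φ stands in for the cross-MDP function and is a hypothetical linear function of state features; its weights would be fit to value functions of previously solved MDPs, a step this sketch omits. All names (`linear_potential`, `shaped_reward`) are illustrative, not taken from the thesis.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def linear_potential(weights: Sequence[float]) -> Callable[[Features], float]:
    """Hypothetical cross-MDP potential: a linear function of state features.

    The weights would be fit to value functions of previously solved MDPs;
    that fitting step is omitted here.
    """
    def phi(x: Features) -> float:
        return sum(w * xi for w, xi in zip(weights, x))
    return phi


def shaped_reward(r: float, x: Features, x_next: Features,
                  phi: Callable[[Features], float], gamma: float,
                  terminal: bool = False) -> float:
    """Ground-truth reward r plus the potential-based term gamma * phi(x') - phi(x)."""
    # Treat the potential of a terminal state as zero.
    phi_next = 0.0 if terminal else phi(x_next)
    return r + gamma * phi_next - phi(x)


phi = linear_potential([1.0, 0.0])
print(shaped_reward(1.0, [2.0, 5.0], [3.0, 1.0], phi, gamma=0.5))  # 1 + 0.5*3 - 2 = 0.5
```

Potential-based shaping is a convenient assumption because it is known to leave the optimal policy of the original MDP unchanged, so an imperfect learned potential changes how quickly the agent learns rather than what it converges to.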
The second strategy investigates neural controllers that exhibit a degree of robustness to changes in the MDP. It empirically evaluates five recurrent neural net (RNN) architectures, as well as a deep and a shallow feedforward net (FNN), on a set of simulated locomotion tasks, and subjects them to two types of perturbation: sensor noise and a change in terrain. Results show that the FNNs and a continuous-time RNN (CTRNN) are the most robust to task changes on average, with the CTRNN significantly outperforming the others under noise perturbation.
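For readers unfamiliar with CTRNNs, the Python sketch below assumes the commonly used form τ_i dy_i/dt = -y_i + Σ_j w_ij σ(y_j + θ_j) + I_i, integrated with a forward Euler step, and models the sensor-noise perturbation as zero-mean Gaussian noise added to each sensor reading. The abstract does not give the thesis's exact architecture, noise model, or how sensors map to neuron inputs; those choices, the noise level, and the function names here are assumptions for illustration only.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def ctrnn_step(y: Sequence[float], weights: Sequence[Sequence[float]],
               tau: Sequence[float], theta: Sequence[float],
               inputs: Sequence[float], dt: float) -> List[float]:
    """One forward-Euler step of tau_i dy_i/dt = -y_i + sum_j w_ij * sigmoid(y_j + theta_j) + I_i.

    weights[i][j] is the connection from neuron j to neuron i.
    """
    n = len(y)
    firing = [sigmoid(y[j] + theta[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        drive = sum(weights[i][j] * firing[j] for j in range(n)) + inputs[i]
        new_y.append(y[i] + dt * (-y[i] + drive) / tau[i])
    return new_y


def add_sensor_noise(sensors: Sequence[float], sigma: float,
                     rng: random.Random) -> List[float]:
    """Hypothetical noise perturbation: zero-mean Gaussian noise on each sensor reading."""
    return [s + rng.gauss(0.0, sigma) for s in sensors]


# Assumption: each noisy sensor reading is fed directly to one neuron as input I_i.
rng = random.Random(0)
y = [0.0, 0.0]
w = [[0.0, 1.0], [-1.0, 0.0]]
for _ in range(100):
    sensed = add_sensor_noise([0.5, 0.0], sigma=0.05, rng=rng)
    y = ctrnn_step(y, w, tau=[1.0, 1.0], theta=[0.0, 0.0], inputs=sensed, dt=0.1)
print(y)
```

The leaky term -y_i / τ_i makes each neuron a low-pass filter of its inputs, which offers one plausible intuition for why a CTRNN might tolerate noisy sensors better than a controller that reacts to each reading instantaneously; the abstract itself reports the result, not the cause.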