Generalization strategies in reinforcement learning

Open Access
Award date 20-04-2018
Number of pages 113
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
A reinforcement-learning agent learns by trying actions and observing the resulting reward in each state of a Markov Decision Process (MDP). Value functions are MDP solutions that map state-action pairs to their expected return (the discounted sum of future rewards). This work investigates two classes of strategies for generalizing from solved MDPs to new, but similar, MDPs the agent might encounter in the future.
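As a minimal illustration of the return that a value function estimates, the sketch below computes the discounted sum of rewards for a single episode. The function name `discounted_return` and the discount factor `gamma` are illustrative assumptions, not notation taken from the thesis.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum over t of gamma**t * rewards[t] for one episode.

    `rewards` is the reward sequence observed from some state onward;
    `gamma` is an assumed discount factor in [0, 1].
    """
    g = 0.0
    # Accumulate from the last reward backwards: g = r + gamma * g.
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

A value function maps each state-action pair to the expectation of this quantity over the trajectories that follow it.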
The first strategy learns a single cross-MDP function based on the value functions of past MDPs the agent has solved. This function is employed in new MDPs as a shaping function, which provides the agent with informative reward in addition to the ground-truth reward. We propose and evaluate three different types of value functions to use as targets for learning the shaping function. In addition, we introduce FS-TEK, a novel feature selection algorithm that selects relevant state features by observing how their covariance with return changes as more MDPs are solved.
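One common way to add a shaping signal to the ground-truth reward is potential-based shaping (Ng, Harada and Russell, 1999). The sketch below assumes that form; the abstract does not state which form the thesis uses, and the names `shaped_reward`, `phi_s` and `phi_next` are hypothetical. Here `phi_s` and `phi_next` stand for the output of a learned cross-MDP function at the current and next state.

```python
def shaped_reward(reward, phi_s, phi_next, gamma=0.99):
    """Ground-truth reward plus a potential-based shaping term.

    Assumed form: r + gamma * phi(s') - phi(s), where phi is a
    potential over states (hypothetically, the learned shaping function).
    """
    shaping_term = gamma * phi_next - phi_s
    return reward + shaping_term
```

With this form, moving to a state of higher potential yields extra reward, while the optimal policy of the original MDP is preserved.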
The second strategy investigates neural controllers that exhibit a degree of robustness to changes in the MDP. It empirically evaluates five recurrent neural net (RNN) architectures, as well as a deep and a shallow feedforward neural net (FNN), on a set of simulated locomotion tasks, and subjects them to two types of perturbation: sensor noise and a change in terrain. Results show that the FNNs and a continuous-time RNN (CTRNN) are most robust to task changes on average, with the CTRNN significantly outperforming the others under noise perturbation.
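For readers unfamiliar with continuous-time RNNs, the sketch below shows one forward-Euler update of the standard CTRNN dynamics, tau_i * dy_i/dt = -y_i + sum_j w_ji * sigmoid(y_j + theta_j) + I_i. This is the textbook formulation, assumed here for illustration; the exact architecture, integration scheme and parameter names used in the thesis are not given in the abstract.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ctrnn_step(y, inputs, weights, biases, taus, dt=0.01):
    """One forward-Euler step of a CTRNN (illustrative, pure Python).

    y       : current neuron states
    inputs  : external input to each neuron (e.g. sensor readings)
    weights : weights[j][i] is the connection from neuron j to neuron i
    biases  : per-neuron bias terms (theta)
    taus    : per-neuron time constants
    """
    n = len(y)
    # Firing rate of every neuron at the current state.
    out = [sigmoid(y[j] + biases[j]) for j in range(n)]
    new_y = []
    for i in range(n):
        recurrent = sum(weights[j][i] * out[j] for j in range(n))
        dy = (-y[i] + recurrent + inputs[i]) / taus[i]
        new_y.append(y[i] + dt * dy)
    return new_y
```

The time constants act as a per-neuron low-pass filter on the input. This property may be relevant to behavior under sensor noise, although the abstract does not attribute the CTRNN result to it.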
Document type PhD thesis
Language English