Generalization strategies in reinforcement learning

M. Snel

Authors	M. Snel
Supervisors	B.J.A. Kröse
Co-supervisors	F.C.A. Groen, S.A. Whiteson
Award date	20-04-2018
Number of pages	113
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	A reinforcement-learning agent learns by trying actions and observing the resulting reward in each state of an MDP (Markov Decision Process). Value functions are MDP solutions that map state-action pairs to their expected return (the discounted sum of rewards). This work investigates two classes of strategies for generalizing from solved MDPs to new but similar MDPs that the agent might encounter in the future. The first strategy learns a single cross-MDP function from the value functions of past MDPs the agent has solved. This function is employed in new MDPs as a shaping function, which provides the agent with informative reward on top of the ground-truth reward. We propose and evaluate three different types of value functions to use as targets for learning the shaping function. In addition, we introduce FS-TEK, a novel feature-selection algorithm that selects relevant state features by observing how their covariance with return changes as more MDPs are solved. The second strategy investigates neural controllers that exhibit a degree of robustness to changes in the MDP. It empirically evaluates five recurrent neural network (RNN) architectures, as well as a deep and a shallow feedforward network (FNN), on a set of simulated locomotion tasks, and subjects them to two types of perturbation: sensor noise and a change in terrain. Results show that the FNNs and a continuous-time RNN (CTRNN) are the most robust to task changes on average, with the CTRNN significantly outperforming the others under noise perturbation.
Document type	PhD thesis
Language	English
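The abstract defines return as a discounted sum of rewards and describes a shaping function that adds informative reward on top of the ground-truth reward. The Python sketch below illustrates both ideas under explicit assumptions. It uses potential-based shaping, r + γΦ(s') − Φ(s), a standard form that the abstract itself does not name. The potential comes from a hypothetical helper `make_potential` that simply averages tabular state values V(s) from previously solved MDPs; this is a simplification (the abstract speaks of state-action value functions and of learning the cross-MDP function), and it is not the thesis's actual method, nor FS-TEK.

```python
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence

State = Hashable  # any hashable state label (assumption: small tabular MDPs)


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of rewards G = sum_t gamma**t * r_t, accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def make_potential(past_values: List[Dict[State, float]]) -> Callable[[State], float]:
    """Hypothetical cross-MDP potential: average V(s) over previously solved MDPs.

    States that no past MDP contains get potential 0.
    """
    def phi(s: State) -> float:
        seen = [v[s] for v in past_values if s in v]
        return mean(seen) if seen else 0.0
    return phi


def shaped_reward(r: float, s: State, s_next: State,
                  phi: Callable[[State], float], gamma: float,
                  terminal: bool = False) -> float:
    """Potential-based shaping: ground-truth reward plus gamma*phi(s') - phi(s).

    The potential of a terminal next state is fixed at 0.
    """
    phi_next = 0.0 if terminal else phi(s_next)
    return r + gamma * phi_next - phi(s)


if __name__ == "__main__":
    # Two hypothetical solved MDPs sharing the states "A" and "B".
    phi = make_potential([{"A": 2.0, "B": 4.0}, {"A": 4.0, "B": 6.0}])
    states = ["A", "B", "A", "B"]  # s_0 .. s_3; s_3 is terminal
    rewards = [0.0, 1.0, 0.0]      # r_0 .. r_2
    gamma = 0.9
    last = len(rewards) - 1
    shaped = [
        shaped_reward(r, s, s_next, phi, gamma, terminal=(t == last))
        for t, (r, s, s_next) in enumerate(zip(rewards, states, states[1:]))
    ]
    original = discounted_return(rewards, gamma)
    # Shaping terms telescope, so the two numbers agree up to rounding.
    print(original, discounted_return(shaped, gamma) + phi(states[0]))
```

The potential-based form is used here because its extra rewards telescope along a trajectory: with a terminal potential of 0, the shaped discounted return equals the original return minus Φ(s₀). A standard consequence is that this kind of shaping can speed up learning without changing which policy is optimal.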

UvA-DARE (Digital Academic Repository)