Conference paper, 2010

Managing Uncertainty within Value Function Approximation in Reinforcement Learning

Matthieu Geist
Olivier Pietquin

Abstract

The dilemma between exploration and exploitation is an important topic in reinforcement learning (RL). Most successful approaches to this problem use some uncertainty information about the values estimated during learning. On the other hand, scalability is a known weakness of RL algorithms, and value function approximation has become a major topic of research. Both problems arise in real-world applications; however, few approaches allow approximating the value function while maintaining uncertainty information about the estimates, and even fewer use this information to address the exploration/exploitation dilemma. In this paper, we show how such uncertainty information can be derived from a Kalman-based Temporal Differences (KTD) framework. An active learning scheme for a second-order value-iteration-like algorithm (named KTD-Q) is proposed. We also suggest adaptations of several existing exploration/exploitation schemes. This is a first step towards a global handling of continuous state and action spaces and of the exploration/exploitation dilemma.
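To illustrate the general idea (not the authors' KTD-Q algorithm itself), the sketch below assumes a linear Q-function approximation Q(s, a) = phi(s, a)·theta, where some Kalman-style filter maintains a Gaussian belief over the parameters with mean theta and covariance P. The propagated value uncertainty phi·P·phi can then drive exploration, here through a hypothetical optimistic bonus on the estimated mean; all names and the bonus scheme are illustrative assumptions.

```python
import numpy as np

# Minimal sketch, assuming a linear Q-function Q(s, a) = phi(s, a) . theta
# with a Gaussian parameter belief (mean `theta`, covariance `P`) maintained
# elsewhere by a Kalman-style filter. Not the authors' KTD-Q update.

def q_mean_and_std(phi, theta, P):
    """Mean and standard deviation of Q(s, a) for a feature vector phi."""
    mean = phi @ theta
    var = phi @ P @ phi          # parameter uncertainty propagated to the value
    return mean, np.sqrt(max(var, 0.0))

def uncertain_greedy_action(features_per_action, theta, P, bonus=1.0):
    """Pick the action maximizing mean + bonus * std (hypothetical scheme)."""
    scores = []
    for phi in features_per_action:
        mean, std = q_mean_and_std(phi, theta, P)
        scores.append(mean + bonus * std)
    return int(np.argmax(scores))

# Toy usage: 3 actions, 4 features, arbitrary parameter belief.
rng = np.random.default_rng(0)
theta = rng.normal(size=4)
P = 0.1 * np.eye(4)
features = [rng.normal(size=4) for _ in range(3)]
print(uncertain_greedy_action(features, theta, P))
```

With a larger bonus the agent favors actions whose value estimates are still uncertain, which is the kind of uncertainty-driven exploration the paper's abstract refers to.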
No file deposited

Dates and versions

hal-00554398, version 1 (10-01-2011)

Identifiers

  • HAL Id: hal-00554398, version 1

Cite

Matthieu Geist, Olivier Pietquin. Managing Uncertainty within Value Function Approximation in Reinforcement Learning. Active Learning and Experimental Design workshop (collocated with AISTATS 2010), May 2010, Sardinia, Italy. ⟨hal-00554398⟩