A Q-learning Based Evolutionary Algorithm for Sequential Decision Making Problems

Haobo Fu, Peter R. Lewis and Xin Yao
In Proceedings of the Workshop "In Search of Synergies between Reinforcement Learning and Evolutionary Computation" at the 13th International Conference on Parallel Problem Solving from Nature (PPSN). VUB AI Lab, https://ai.vub.ac.be/haobo_fu_ppsn_2014

Both Evolutionary Dynamic Optimization (EDO) methods and Reinforcement Learning (RL) methods tackle forms of Sequential Decision Making Problems (SDMPs), yet with different key assumptions. In this paper, we combine the strengths of EDO and RL methods to develop a new algorithm for SDMPs. Assuming that the environmental state is observable and that a computational model of the reward function is available, the key idea in our algorithm is to employ an evolutionary algorithm to search on the reward function at each time step, the outcome of which is exploited to speed up convergence to optimal policies in RL methods. Preliminary experimental studies suggest that our algorithm is a promising approach for SDMPs.

@inproceedings{fu_et_al_qbea,
author = {Haobo Fu and Peter R. Lewis and Xin Yao},
title = {{A Q-learning Based Evolutionary Algorithm for Sequential Decision Making Problems}},
booktitle = {Proceedings of the Workshop ``In Search of Synergies between Reinforcement Learning and Evolutionary Computation'' at the 13th International Conference on Parallel Problem Solving from Nature (PPSN)},
publisher = {VUB Artificial Intelligence Lab},
year = {2014},
url = {https://ai.vub.ac.be/haobo_fu_ppsn_2014}
}