Demystifying the Mechanisms Behind Emergent Exploration in Goal-conditioned RL
Description
This paper examines emergent exploration in reinforcement learning, focusing on a goal-conditioned contrastive learning algorithm called SGCRL. The authors employ methodologies inspired by cognitive science, such as rational analysis and controlled intervention experiments, to analyze the implicit drivers of agent behavior in this reward-free setting. They show both theoretically and empirically that SGCRL's exploration is driven by an intrinsic reward signal based on representational similarity ($\psi$-similarity) to the goal: as states are visited, their learned representations become less similar to the goal's, which steers the agent toward novel regions. Experiments on mazes and the Tower of Hanoi, including stress tests such as the noisy-TV problem, confirm that the single-goal data collection strategy is crucial for producing these exploration-encouraging representations and that the mechanism extends to multi-goal tasks.
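To make the similarity-driven signal concrete, here is a minimal sketch of the idea, not the paper's implementation: the inner product of learned state and goal embeddings acts as an implicit reward, so states whose embeddings have drifted away from the goal's score lower and the policy is pushed toward novel ones. All names (`psi_similarity`, `implicit_rewards`, the toy linear encoders) are illustrative placeholders, not SGCRL's actual interfaces.

```python
import numpy as np

def psi_similarity(state_repr: np.ndarray, goal_repr: np.ndarray) -> float:
    # Inner product of learned embeddings; contrastive critics often use
    # this (or a monotone function of it) as an un-normalized score for
    # "how reachable is the goal from this state".
    return float(np.dot(state_repr, goal_repr))

def implicit_rewards(states, goal, encode_state, encode_goal):
    # The agent acts to maximize similarity to the goal. If embeddings of
    # frequently visited states drift away from the goal embedding during
    # training, those states score lower here, implicitly steering the
    # policy toward less-explored regions.
    g = encode_goal(goal)
    return np.array([psi_similarity(encode_state(s), g) for s in states])

if __name__ == "__main__":
    # Toy linear encoders standing in for the learned state/goal networks.
    rng = np.random.default_rng(0)
    W_s, W_g = rng.normal(size=(8, 4)), rng.normal(size=(8, 4))
    states = [rng.normal(size=4) for _ in range(5)]
    goal = rng.normal(size=4)
    print(implicit_rewards(states, goal, lambda s: W_s @ s, lambda g: W_g @ g))
```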