The zero-shot RL challenge is to discover agents that solve new tasks without task-specific training. We approach this problem by:

1. Learning a latent space of reward functions, and
2. Training a generalist agent on *random reward functions*, such that new reward functions can be solved in a zero-shot manner.
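
To make "random reward functions" concrete, here is a minimal sketch of how such functions might be sampled. The two families shown (a sparse goal-reaching reward and a random linear reward), the 0.1 goal radius, the 50/50 mixing ratio, and all function names are illustrative assumptions, not details taken from the text above.

```python
import random

def sample_goal_reward(states, rng):
    """Hypothetical sparse goal-reaching reward: 0 near a goal, -1 elsewhere."""
    goal = rng.choice(states)  # the goal is drawn from the offline states
    def reward(s):
        dist = sum((a - b) ** 2 for a, b in zip(s, goal)) ** 0.5
        return 0.0 if dist < 0.1 else -1.0  # 0.1 radius is an assumption
    return reward

def sample_linear_reward(state_dim, rng):
    """Hypothetical random linear reward: r(s) = w . s with random weights w."""
    w = [rng.uniform(-1.0, 1.0) for _ in range(state_dim)]
    def reward(s):
        return sum(wi * si for wi, si in zip(w, s))
    return reward

def sample_random_reward(states, rng):
    """Draw one reward function from a mixture of the two families above."""
    if rng.random() < 0.5:  # mixing ratio is an assumption
        return sample_goal_reward(states, rng)
    return sample_linear_reward(len(states[0]), rng)

# Toy stand-in for states taken from an offline dataset (2-D points).
rng = random.Random(0)
states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(100)]
reward_fn = sample_random_reward(states, rng)
```

Each call to `sample_random_reward` yields a new task; a generalist agent would be trained across many such draws rather than on any single one.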

We encode reward functions by considering their *functional behavior* over states from an offline dataset. We learn a latent vector that is maximally informative about the reward function, and RL networks are conditioned on this latent vector.
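
As a rough illustration of "functional behavior over states", the sketch below evaluates a reward function on states sampled from an offline dataset and packs the resulting state-reward pairs into one vector per pair, which is the kind of set an encoder could consume. The helper names, the sample count of 32, and the toy reward are assumptions made for illustration; the learned encoder that maps this set to a latent vector is not shown.

```python
import random

def sample_state_reward_pairs(reward_fn, dataset_states, num_samples, rng):
    """Describe reward_fn by its values on states sampled from the dataset."""
    picked = rng.sample(dataset_states, num_samples)
    return [(s, reward_fn(s)) for s in picked]

def to_tokens(pairs):
    """Flatten each (state, reward) pair into one vector: one token per pair."""
    return [list(s) + [r] for s, r in pairs]

# Toy stand-in for states from an offline dataset (2-D points).
rng = random.Random(0)
dataset_states = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(200)]

def example_reward(s):
    """Hypothetical task: stay close to the line x = 0."""
    return -abs(s[0])

pairs = sample_state_reward_pairs(example_reward, dataset_states, 32, rng)
tokens = to_tokens(pairs)  # 32 tokens, each [x, y, reward]
```

Because the pairs form an unordered set, an encoder over `tokens` should not depend on their order. At test time, the same construction would apply to a small number of reward-annotated samples from a new task.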

Abstract

Can we pre-train a generalist agent from a large amount of unlabeled offline trajectories such that it can be
immediately adapted to any new downstream task in a zero-shot manner? In this work, we present functional reward
encoding (FRE) as a general, scalable solution to this zero-shot RL problem. Our main idea is to learn functional
representations of arbitrary tasks by encoding their state-reward samples using a transformer-based variational
auto-encoder. This functional encoding not only enables pre-training an agent on a wide diversity of general
unsupervised reward functions, but also provides a way to solve any new downstream task in a zero-shot manner, given a
small number of reward-annotated samples. We empirically show that FRE agents trained on diverse random unsupervised
reward functions can generalize to solve novel tasks in a range of simulated robotic benchmarks, often outperforming
previous zero-shot RL and offline RL methods.