Meta-Learning Shared Hierarchies

Kevin Frans, Jonathan Ho, Xi Chen, Pieter Abbeel, John Schulman

We develop a meta-learning approach for learning hierarchically structured policies, improving sample efficiency on unseen tasks through the use of shared primitives: policies that are executed for large numbers of timesteps. Specifically, a set of primitives is shared across a distribution of tasks, and task-specific policies switch between them. We provide a concrete metric for measuring the strength of such hierarchies, leading to an optimization problem for quickly reaching high reward on unseen tasks. We then present an algorithm that solves this problem end-to-end using any off-the-shelf reinforcement learning method, by repeatedly sampling new tasks and resetting the task-specific policies. We successfully discover meaningful motor primitives for the directional movement of four-legged robots, solely by interacting with distributions of mazes. We also demonstrate that these primitives transfer to solve long-timescale sparse-reward obstacle courses, and we enable 3D humanoid robots to robustly walk and crawl with the same policy.
https://arxiv.org/abs/1710.09767
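
Below is a minimal sketch of the training loop described in the abstract, on a toy corridor task: two shared primitives persist across sampled tasks, while a task-specific master policy, reset for every new task, picks a primitive every K timesteps. The environment, the softmax parameterization, and the REINFORCE updates are illustrative stand-ins, not the paper's exact implementation.

import numpy as np

rng = np.random.default_rng(0)

N_PRIMITIVES = 2      # shared sub-policies, persist across tasks
N_ACTIONS = 2         # move left (-1) or right (+1)
K = 5                 # master policy acts every K timesteps
EP_LEN = 20
CORRIDOR = 10         # positions 0..10, start in the middle

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Shared primitive parameters: per-primitive action preferences.
primitive_logits = np.zeros((N_PRIMITIVES, N_ACTIONS))

def run_episode(goal, master_logits):
    """Roll out one episode; the master picks a primitive every K steps."""
    pos = CORRIDOR // 2
    master_choices, action_choices, reward = [], [], 0.0
    for t in range(EP_LEN):
        if t % K == 0:
            prim = rng.choice(N_PRIMITIVES, p=softmax(master_logits))
            master_choices.append(prim)
        a = rng.choice(N_ACTIONS, p=softmax(primitive_logits[prim]))
        action_choices.append((prim, a))
        pos = int(np.clip(pos + (1 if a else -1), 0, CORRIDOR))
        if pos == goal:
            reward = 1.0
            break
    return master_choices, action_choices, reward

def reinforce_update(logits, choices, advantage, lr):
    """One-sample REINFORCE step on a vector of softmax logits."""
    for idx in choices:
        grad = -softmax(logits)
        grad[idx] += 1.0
        logits += lr * advantage * grad   # in-place numpy update

for task in range(200):
    goal = rng.choice([0, CORRIDOR])        # sample a new task
    master_logits = np.zeros(N_PRIMITIVES)  # reset the task-specific policy
    baseline = 0.0
    for episode in range(50):
        m_choices, a_choices, R = run_episode(goal, master_logits)
        adv = R - baseline
        baseline += 0.1 * (R - baseline)
        # Update the task-specific master and the shared primitives.
        reinforce_update(master_logits, m_choices, adv, lr=0.5)
        for prim, a in a_choices:
            reinforce_update(primitive_logits[prim], [a], adv, lr=0.1)

Resetting only the master forces the shared primitives to absorb whatever structure is common across the task distribution, which is what makes them useful on unseen tasks.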

Outline Colorization through Tandem Adversarial Networks

Kevin Frans

When creating digital art, coloring and shading are often time-consuming tasks that follow the same general patterns. A system that automatically colorizes raw line art would have many practical applications. We propose a setup utilizing two networks in tandem: a color prediction network based only on outlines, and a shading network conditioned on both outlines and a color scheme. We present processing methods that limit the information passed in the color scheme, improving generalization. Finally, we demonstrate natural-looking results when colorizing outlines from scratch, as well as from a messy, user-defined color scheme.
https://arxiv.org/abs/1704.08834
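
Here is a rough sketch of the tandem generator pair in PyTorch. The layer sizes, function names, and the blur used to limit information in the color-scheme hint are assumptions for illustration; the paper's full setup also trains the networks adversarially against discriminators, which is omitted here.

import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(c_in, c_out):
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU())

class ColorPredictionNet(nn.Module):
    """Predicts a rough color scheme from the outline alone."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(1, 32), conv_block(32, 32),
                                 nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, outline):
        return self.net(outline)

class ShadingNet(nn.Module):
    """Produces the final image from the outline plus a color scheme."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(4, 64), conv_block(64, 64),
                                 nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())
    def forward(self, outline, color_scheme):
        return self.net(torch.cat([outline, color_scheme], dim=1))

def make_color_scheme(image, kernel=21):
    """Limit the information in the hint by heavy blurring (one plausible
    choice of the 'processing methods' mentioned in the abstract)."""
    return F.avg_pool2d(image, kernel, stride=1, padding=kernel // 2)

color_net, shading_net = ColorPredictionNet(), ShadingNet()
outline = torch.rand(1, 1, 128, 128)   # stand-in line art
scheme = color_net(outline)            # colorize from scratch...
final = shading_net(outline, scheme)   # ...then shade in tandem
print(final.shape)                     # torch.Size([1, 3, 128, 128])

At inference time the hint can come from the color prediction network (as above) or from a blurred, user-defined scheme passed through make_color_scheme, so messy hints still carry only coarse color information.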

Speeding Up TRPO Through Parallelization and Parameter Adaptation

Kevin Frans, Danijar Hafner

I never actually finished a paper for this, but there is an old draft at /static/trpo.pdf. A couple of slides from a talk I gave during my OpenAI interview are more up to date.