Reinforcement Learning Basics

Traditionally, there have been two main kinds of machine learning. In supervised learning, the computer is given both data and labels, and is tasked with finding patterns that map each input to its corresponding label. In unsupervised learning, there are no labels, and the computer is tasked with finding patterns that group the input data into clusters for categorization.

In comes reinforcement learning. It's a bit of a mix between the two: there are no labels, but the environment does give feedback in the form of rewards. Say we've got an unknown environment, and all the computer can do is take an action from a given set and see what happens.

For example, let's say we're trying to build an AI for the classic Atari game Space Invaders. The AI has no idea how the mechanics of the game work. All it's given is the positions of each enemy and the player, plus the score. However, through experimentation, it can find patterns and figure out which actions lead to the highest possible score from each state.
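Here's a rough sketch of that trial-and-error loop. It is not Space Invaders: the ToyEnvironment class, its three action names, and the reward numbers are all made up for illustration. The point is only that the agent never reads the rules. It calls step, sees the reward, and keeps a running average per action.

```python
import random


class ToyEnvironment:
    """Hypothetical stand-in for a game. The agent cannot see these rules."""

    def __init__(self):
        # Made-up rewards: only shooting ever scores a point.
        self._hidden_rewards = {"left": 0.0, "right": 0.0, "shoot": 1.0}
        self.actions = list(self._hidden_rewards)

    def step(self, action):
        """Take an action and return the reward the environment gives back."""
        return self._hidden_rewards[action]


def run_experiment(env, steps=300, seed=0):
    """Try random actions and record the average reward seen for each one."""
    rng = random.Random(seed)
    totals = {a: 0.0 for a in env.actions}
    counts = {a: 0 for a in env.actions}
    for _ in range(steps):
        action = rng.choice(env.actions)
        reward = env.step(action)
        totals[action] += reward
        counts[action] += 1
    # Only report actions that were actually tried at least once.
    return {a: totals[a] / counts[a] for a in env.actions if counts[a] > 0}


averages = run_experiment(ToyEnvironment())
best_action = max(averages, key=averages.get)
print(averages, best_action)
```

A real game also has state that changes after each action, which this toy leaves out to keep the loop visible.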

In reinforcement learning algorithms, there are two components that need to be learned.

First, the value of each state in the environment. In the case of Space Invaders, how many points can the AI expect to score from a certain state onwards?

Next is the policy, or what action to take from each state to maximize reward. This is what most people recognize as the actual AI portion, and it involves deciding how to prioritize reward now compared to potential reward later on.
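One common way to express "reward now versus reward later" is a discount factor, usually written gamma. The name gamma and the example reward sequence below are assumptions brought in for illustration, not something from the game itself. This sketch adds up a sequence of rewards, shrinking each later reward by another factor of gamma.

```python
def discounted_return(rewards, gamma):
    """Sum rewards, weighting the reward at step t by gamma ** t."""
    total = 0.0
    # Work backwards so each step is just: reward + gamma * (everything after).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical rewards: 10 points now, then nothing, then 30 points later.
rewards = [10, 0, 0, 30]

print(discounted_return(rewards, gamma=1.0))  # later rewards count fully
print(discounted_return(rewards, gamma=0.5))  # later rewards count for less
print(discounted_return(rewards, gamma=0.0))  # only the immediate reward counts
```

With gamma close to 1 the AI is patient, and with gamma close to 0 it only cares about the very next reward.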

There are also two main problems which need to be solved.

The first is reinforcement. Given an unknown environment, the AI must learn by interacting with it to discover the value of each state. There's a tradeoff here: should the AI explore different kinds of actions to gain a wider understanding of the environment, or stick with the best action it has found so far to refine its understanding of the path that currently looks correct?
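A simple and widely used way to handle this tradeoff is called epsilon-greedy: with a small probability epsilon, pick a random action (explore), and otherwise pick the action that looks best so far (exploit). This is just one option, and the value estimates below are made-up numbers for illustration.

```python
import random


def epsilon_greedy(value_estimates, epsilon, rng):
    """Pick a random action with probability epsilon, else the best-looking one."""
    if rng.random() < epsilon:
        return rng.choice(list(value_estimates))  # explore
    return max(value_estimates, key=value_estimates.get)  # exploit


# Hypothetical estimates of how good each action currently looks.
estimates = {"left": 0.2, "right": 0.1, "shoot": 0.9}
rng = random.Random(0)

always_greedy = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
mostly_greedy = [epsilon_greedy(estimates, epsilon=0.1, rng=rng) for _ in range(10)]
print(always_greedy, mostly_greedy)
```

Setting epsilon to 0 means the AI never explores, and setting it to 1 means it ignores everything it has learned.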

Second, we have planning. Once the model for the environment is known, what policy will lead to the maximum rewards from any given state?
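To make planning concrete, here is a minimal sketch using value iteration, one standard planning method, on a tiny made-up environment. The three states, two actions, rewards, and the discount factor gamma are all assumptions for illustration. Because the model is fully known, the AI can work out state values and a policy without playing at all.

```python
def value_iteration(model, gamma, sweeps=50):
    """Repeatedly back up each state's value from the best action's outcome."""
    values = {state: 0.0 for state in model}
    for _ in range(sweeps):
        values = {
            state: max(
                (reward + gamma * values[nxt] for nxt, reward in actions.values()),
                default=0.0,  # terminal states have no actions and are worth 0
            )
            for state, actions in model.items()
        }
    return values


def greedy_policy(model, values, gamma):
    """For each non-terminal state, pick the action with the best backed-up value."""
    return {
        state: max(
            actions,
            key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
        )
        for state, actions in model.items()
        if actions
    }


# Hypothetical known model: model[state][action] = (next_state, reward).
model = {
    "start": {"stay": ("start", 0.0), "advance": ("middle", 0.0)},
    "middle": {"stay": ("middle", 0.0), "advance": ("goal", 1.0)},
    "goal": {},  # terminal state
}

values = value_iteration(model, gamma=0.5)
policy = greedy_policy(model, values, gamma=0.5)
print(values)
print(policy)
```

This toy model is deterministic. In general an action can lead to several next states with different probabilities, and the backup then averages over them.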

These two problems make up the majority of reinforcement learning. The next post will go over some of the mathematical methods for solving them.

Note: most of this information comes from David Silver's RL Course.