This Curious AI Beats Many Games…and Gets Addicted to the TV

Dear Fellow Scholars, this is Two Minute Papers
with Károly Zsolnai-Fehér. Reinforcement learning is a learning algorithm
that chooses a set of actions in an environment to maximize a score. This class of techniques enables us to train
an AI to master video games, avoid obstacles with a drone, clean up a table with a robot
arm, and has many more really cool applications. We use the words score and reward interchangeably,
and the goal is that over time, the agent has to learn to maximize a prescribed reward. So where should the rewards come from? Most techniques work by using extrinsic rewards. Extrinsic rewards are only a half-solution
as they need to come from somewhere, typically from the game in the form of a game score,
which simply isn’t present in every game. And even if it is present in a game, it is
very different for Atari Breakout and, for instance, a strategy game. Intrinsic rewards are designed to come to
the rescue, so the AI would be able to completely ignore the in-game score and somehow have
some sort of inner motivation that drives it to complete a level. But what could possibly be a good intrinsic
reward that would work well on a variety of tasks? Shouldn’t this be different from problem to
problem? If so, we are back to square one. If we are to call our learner intelligent,
then we need one algorithm that is able to solve a large number of different problems. If we need to reprogram it for every game,
that’s just a narrow intelligence. So, a key finding of this paper is that we
can endow the AI with a very human-like property – curiosity. Human babies also explore the world out of
curiosity and as a happy side-effect, learn a lot of useful skills to navigate in this
world later. However, as in our everyday speech, the definition
of curiosity is a little nebulous, so we have to provide a mathematical definition for it. In this work, it is defined as trying to
maximize the number of surprises. This will drive the learner to favor actions
that lead to unexplored regions and complex dynamics in a game. So, how do these curious agents fare? Well, quite well! In Pong, when the agent plays against itself,
it will end up in long matches passing the ball between the two paddles. How about bowling? Well, I cannot resist quoting the authors
for this one. “The agent learned to play the game better
than agents trained to maximize the (clipped) extrinsic reward directly. We think this is because the agent gets attracted
to the difficult-to-predict flashing of the scoreboard occurring after the strikes.” With a little stretch one could perhaps say
that this AI is showing signs of addiction. I wonder how it would do with modern mobile
games with loot boxes? But, we’ll leave that for future work now. How about Super Mario? Well, the agent is very curious to see how
the levels continue, so it learns all the necessary skills to beat the game. Incredible. However, the more seasoned Fellow Scholars
immediately find that there is a catch. What if we sit the AI down in front of a TV
that constantly plays new material? You may think this is some kind of a joke,
but it’s not. It is a perfectly valid issue, because due
to its curiosity, the AI would have to stay there forever and not start exploring the
level. This is the good old definition of TV addiction. Talk about humanlike properties. And sure enough, as soon as we turn off the
TV, the agent gets to work immediately. Who would have thought! The paper notes that this challenge needs
to be dealt with over time; however, the algorithm was tested on a large variety of problems,
and it did not come up in practice. And the key insight is that curiosity is not
only a great replacement for extrinsic rewards, as the two are often aligned, but curiosity,
in some cases, is even superior to them. That is an amazing value proposition for something
that we can run on any problem without any additional work. So, curious agents that are addicted to flashing
score screens and TVs. What a time to be alive! And, if you enjoyed this episode and you wish
to help us on our quest to inform even more people about these amazing stories, please
consider supporting us on You can pick up cool perks there to keep your
papers addiction in check. As always, there is a link to it and to the
paper in the video description. Thanks for watching and for your generous
support, and I’ll see you next time!
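
For Fellow Scholars who would like to see the "maximize the number of surprises" idea in code, here is a minimal sketch. It assumes that surprise is measured as the squared error of a small model that tries to guess the next observation, and that this error is handed to the agent as its intrinsic reward. The names `ForwardModel`, `surprise`, and `average_surprise` are hypothetical, and the running-average predictor is a toy stand-in, not the method used in the paper. It compares a fully predictable corridor with a TV that shows something new at every step.

```python
import random


class ForwardModel:
    """Toy predictor (an assumption for illustration): keeps a running
    estimate of the next observation for each (observation, action) pair."""

    def __init__(self, lr=0.5):
        self.lr = lr
        self.pred = {}

    def surprise(self, obs, action, next_obs):
        """Return the squared prediction error, then update the estimate."""
        key = (obs, action)
        guess = self.pred.get(key, 0.0)
        error = (next_obs - guess) ** 2
        # Move the estimate toward what was actually observed.
        self.pred[key] = guess + self.lr * (next_obs - guess)
        return error


def average_surprise(next_obs_fn, steps=200, seed=0):
    """Average intrinsic reward over the last 50 steps of one situation."""
    rng = random.Random(seed)
    model = ForwardModel()
    rewards = [model.surprise(0, 0, next_obs_fn(rng)) for _ in range(steps)]
    return sum(rewards[-50:]) / 50


# A predictable corridor: the same action always leads to the same observation.
corridor = average_surprise(lambda rng: 1.0)

# A "TV" that constantly plays new material: the next observation is random.
noisy_tv = average_surprise(lambda rng: rng.random())

print(f"corridor surprise: {corridor:.6f}")
print(f"noisy TV surprise: {noisy_tv:.6f}")
```

In this toy setting the corridor stops being surprising once the model has learned it, while the TV never does, which is one way to picture why a purely curious agent might prefer to keep watching.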
