Artificial intelligence company DeepMind Technologies recently announced the creation of an AI that has not only learned to play 57 classic Atari games but can reportedly do so “better than any human.”
Agent57, as they call it, is a deep reinforcement learning algorithm (or RL algorithm) that has been tasked with getting the high score in all 57 games of Atari57, a benchmark suite drawn from the Arcade Learning Environment.
We’re talking games like Space Invaders, Asteroids, Pong, and Ms. Pac-Man (the Atari57 research paper, published back in 2013, contains the full list of games).
But why use video games to train an AI, you may ask? As they explain:
“…[games] provide a rich suite of tasks which players must develop sophisticated behavioural strategies to master, but they also provide an easy progress metric – game score – to optimise against.”
Their goal, as they say, is not to train the world’s greatest AI video game player, as important as that sounds. Instead, they’re using video games to develop “systems that learn to excel at a broad set of challenges.”
Montezuma’s Revenge, Solaris, Skiing and Pitfall! also appear in the Atari57 suite, and these four in particular represent the most challenging games for the AI to master.
As DeepMind explains in an official blog post, some games are just more complicated than others. Montezuma’s Revenge and Pitfall!, for example, involve what they call the “exploration-exploitation problem.” Here, the AI has to decide whether to keep doing what it already knows works (exploitation) or to try something new that might pay off better (exploration).
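To make that trade-off concrete, here is a minimal Python sketch of epsilon-greedy action selection, a textbook way of balancing the two. It is an illustration only, not DeepMind's method for Agent57; the function name `epsilon_greedy` and the example score estimates are assumptions made up for this sketch.

```python
import random


def epsilon_greedy(estimates, epsilon, rng):
    """Return an action index (hypothetical helper, not Agent57's code).

    With probability epsilon, pick a random action (exploration);
    otherwise pick the action with the highest estimated score (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))  # explore: try anything
    # exploit: choose the best-known action
    return max(range(len(estimates)), key=lambda a: estimates[a])


# Made-up score estimates for three possible actions.
estimates = [0.2, 0.8, 0.5]
rng = random.Random(0)

always_exploit = epsilon_greedy(estimates, 0.0, rng)  # never explores
sometimes_explore = [epsilon_greedy(estimates, 0.3, rng) for _ in range(10)]
```

With `epsilon` set to 0 the agent always repeats its best-known move; raising it makes the agent gamble on untested moves more often, which is what games with hidden rewards demand.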
Solaris and Skiing, on the other hand, involve “long-term credit assignment problems,” meaning that the rewards for specific actions don’t come immediately, so the AI is required to “collect information over long time scales.”
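One common way to picture credit assignment is the discounted return, which spreads a late reward back over the earlier steps that led to it. The Python sketch below is a generic illustration under that assumption, not Agent57's implementation; the name `discounted_returns`, the discount factor `gamma`, and the toy reward list are all hypothetical.

```python
def discounted_returns(rewards, gamma):
    """Compute the discounted return at each time step (illustrative helper).

    The return at step t is rewards[t] + gamma * (return at step t + 1),
    so a reward that arrives late still gives some credit to earlier steps.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Toy episode: no reward until the very last step.
credit = discounted_returns([0, 0, 1], gamma=0.5)
```

The smaller `gamma` is, the faster that credit fades as you move back in time, which hints at why games whose payoff arrives only after a long delay are so hard to learn.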
While previous AI “agents” have done well on some of the easier games, that hasn’t been the case for the harder ones. However, according to DeepMind, Agent57 “finally obtains above human-level performance on the very hardest games in the benchmark set, as well as the easiest ones.”