Thoughts on Esport AIs

Dota 2 and Starcraft 2 are two of the most popular and complex esports; professional gamers train for years to master them. But how hard are they for AIs? DeepMind is working on the latter. OpenAI has already created a system (OpenAI Five) which is able to beat strong amateurs at a simplified version of the former, and they'll be challenging professionals in August. By their own account, they were surprised that their standard reinforcement learning approach worked so well, because of the game's complexity:
  • Counting every fourth frame, as OpenAI Five does, Dota games last on average 20,000 "moves", compared with 40 for chess and 150 for go.
  • Unlike chess and go, Dota is a partial-information game in which players need to infer what their enemies are doing.
  • The range of actions in Dota is very large, with many of them varying continuously. Apparently, around 1000 are valid at each timestep (although I'm not quite sure what OpenAI are referring to as an action, to get that figure).
  • The information which can be observed at any given time is represented by 20,000 numbers; and unlike go or chess, many of these are floating-point numbers which need to be interpreted visually to detect a range of features.
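The "20,000 moves" figure is easy to sanity-check. A back-of-the-envelope sketch, assuming Dota's commonly cited 30 ticks per second (the tick rate is my assumption, not stated in OpenAI's post):

```python
# Sanity check on the "20,000 moves per game" figure: observing every
# fourth frame at 30 frames per second gives 7.5 decision points per second.
FRAMES_PER_SECOND = 30
FRAME_SKIP = 4
MOVES_PER_GAME = 20_000

decisions_per_second = FRAMES_PER_SECOND / FRAME_SKIP        # 7.5
game_length_minutes = MOVES_PER_GAME / decisions_per_second / 60

print(f"{game_length_minutes:.0f} minutes")  # → 44 minutes
```

That works out to roughly a 44-minute game, which matches typical Dota game lengths, so the numbers are consistent.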

Technical considerations

In Starcraft, a crucial factor is how quickly players can perform actions; top professionals average 150-300 actions per minute. In Dota, this isn't such a big concern, since players only control one hero; and Five averages only about 1/3 as many actions as it theoretically could (150 vs 450 per minute). However, timing and accuracy are crucial, and here Five has a major advantage: not only can it select actions arbitrarily precisely, it can also consistently achieve frame-perfect timing.
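The 450-per-minute ceiling falls straight out of the frame-skip arithmetic above (again assuming 30 frames per second, which is my assumption rather than a figure from OpenAI's post):

```python
# Acting on every fourth frame caps the achievable actions per minute.
FRAMES_PER_SECOND = 30
FRAME_SKIP = 4

max_apm = FRAMES_PER_SECOND / FRAME_SKIP * 60    # 450 actions per minute
observed_apm = 150
utilisation = observed_apm / max_apm             # about 1/3
```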

A second factor is coordination. Dota teams consist of five players, each with slightly different roles; teams communicate with each other mostly via audio. Five actually comprises five separate neural networks, each controlling one hero. Although there are no explicit communication channels between them, each network can instantly see everything its teammates can see, and act accordingly. Given that they trained together, this should give them an advantage over humans (roughly equivalent to giving pros a splitscreen view of their team, although taking full advantage of that would be a difficult multitasking challenge).
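OpenAI's post describes one mechanism for coordination without communication: a "team spirit" weight that blends each hero's own reward with the team's. The function below is my own hypothetical sketch of that idea, not their implementation:

```python
# Hypothetical sketch of reward blending between five independent agents.
# With team_spirit=0 each hero is selfish; with team_spirit=1, each hero
# cares only about the team's average reward.

def blended_rewards(individual_rewards, team_spirit):
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [(1 - team_spirit) * r + team_spirit * team_mean
            for r in individual_rewards]

# One hero gets a kill; with team_spirit=0.5 the credit is shared.
print(blended_rewards([1.0, 0.0, 0.0, 0.0, 0.0], 0.5))
# → [0.6, 0.1, 0.1, 0.1, 0.1]
```

Coordination then emerges because all five networks are optimising partially shared objectives, rather than because they exchange messages.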

Thirdly, there's the question of APIs - that is, what inputs Five actually receives. It'd be most impressive if it were given only pixels, as humans are. However, that would apparently push the task beyond the limits of feasibility. Instead, "OpenAI Five is given access to the same information as humans, but instantly sees data like positions, healths, and item inventories that humans have to check manually". While this is a slightly confusing way to frame it, it seems like this means that the agent doesn't get access to pixels at all, only the API. That's a little disappointing, but understandable.

It's difficult to predict how the upcoming match against professionals will go, because unlike the previous matches, this one will be under far fewer restrictions, requiring a version of Five retrained from scratch. And presumably the aspects which had previously been excluded - including warding, Roshan, invisibility and summons/illusions - were chosen because they were expected to be difficult to deal with. (Update: OpenAI recently announced the rules for the upcoming match: while there are still a few restrictions left, the most important ones have been removed. I'm most impressed by the fact that they're expanding the hero pool to 18, which indicates either that they've figured out some way of improving generalisation, or more likely that they're using absolutely enormous amounts of compute for training.) But even if the new version of Five struggles, there are three notable reasons to be impressed by its achievements so far. The first is that AlphaGo and AlphaGo Zero relied on Monte Carlo Tree Search to explore many possible lines of play in great depth. OpenAI Five doesn't - the neural network itself contains all the information required to execute long-term plans, without explicitly modelling how the game will unfold.

The second is that, because Dota players have limited access to information, they need to remember many facts about previous states. This requires adding an LSTM on top of the network's observation processing; while recurrent policies of this kind have been around for a few years, I think the Dota result is their most prominent success so far.

The third is that there are many different tactics and strategies which need to be explored to learn to play well; and when they need to be chained together, the number of possibilities increases exponentially. Even in Atari games like Montezuma's Revenge, neural networks struggle to discover strategies to get keys. OpenAI addressed this issue through a combination of shaped rewards, randomisation of starting states, and hardcoded combinations of item and skill builds. This is a bit hacky, but you can't blame them for doing what works.
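Reward shaping here means supplementing the sparse win/loss signal with dense intermediate feedback. A minimal sketch of the idea; the particular signals and weights below are my own illustrative choices, not OpenAI's:

```python
# Hypothetical shaped reward: dense signals (gold, last hits, health) give
# the agent feedback every few seconds, instead of only one win/loss signal
# at the end of a ~45-minute game.

def shaped_reward(delta_gold, delta_last_hits, delta_health, won=None):
    r = 0.002 * delta_gold + 0.16 * delta_last_hits + 2.0 * delta_health
    if won is not None:                  # sparse terminal signal
        r += 5.0 if won else -5.0
    return r
```

The dense terms make exploration tractable, at the cost of baking in human judgements about what intermediate progress looks like - which is exactly the "hacky" trade-off described above.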

Having said that, we should also note that the overall result continues a trend where progress requires massively increasing amounts of computational power. Five trained on 180 years of Dota per day for what looks like around 18 days, i.e. 3 millennia overall. When you take into account that this training consisted of running five networks at once, that running Dota itself takes a fair bit of processing, and that they probably trained quite a few different versions of Five, we're looking at ludicrous amounts of compute. I wouldn't be surprised if the amount of money researchers are willing to spend on AI compute has been increasing even faster than processing power itself over the last year or two.
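The millennia figure is just the stated numbers multiplied out:

```python
# 180 years of Dota experience per day of wall-clock training,
# for roughly 18 days of training.
YEARS_PER_DAY = 180
TRAINING_DAYS = 18

total_years = YEARS_PER_DAY * TRAINING_DAYS
print(total_years)  # → 3240, i.e. a bit over 3 millennia
```

And that 3,240-year total is per training run, before multiplying by the five networks, the cost of simulating Dota itself, and however many discarded experimental versions preceded the final one.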

Speculations

My current thinking is that, while this result is impressive, maybe it tells us that Dota just isn't as hard as we thought it was. People keep talking about the complex long-term strategic skills required to win at the professional level. But even if human professionals learn the nuances of each other's playing styles and predict what their opponents predict that they will predict, plausibly this is all unnecessary if you're good enough at the micro game and can recognise and respond well to a few dozen tactical and strategic patterns. And we know that neural networks can be excellent at the micro game, because in 1v1 play (which has much less strategy) OpenAI's bots already beat professionals.

An interesting point from OpenAI's blog post: Five learned - from self-play alone - a lane-switching strategy which it took humans years to come up with, and which professionals only started adopting recently. So there's no denying that Five uses complex strategy. But the distinction I want to make is between plans and planning. The agent is intelligent enough to follow plans. But it's the training process that is the planner. In his latest book, Daniel Dennett makes a very similar argument: that animals are competent without comprehension of what they're doing - because they're not doing the planning, evolution is. A resulting prediction: even if Five beats a team of professionals, if it's made publicly available then someone will find some strange strategy to beat it. (Update: after writing this post, I found out that while the 1v1 Dota bot was undefeated in official matches, when it was made publicly available amateur players found several strategies to reliably defeat it, including unusual item builds and "creep pulling".)

If I'm right, I think that's good news for AI safety in the short and medium term. It implies that for most tasks, even cognitively complex ones, we'll be able to create agents which are much better than humans, but which are still unable to succeed in novel environments. Those agents will be able to handle novelty within environments that they're trained for, but only because their training process was "smart" enough to equip them with powerful and broad strategies and heuristics. These agents would likely still fail on adversarial examples optimised very hard to fool them, but I think humans would too if there were a way of generating such images. (Here's the best adversarial example for humans I've seen so far, which I and many others originally took to be a crowd at a concert).

On the other hand, my analysis also means that when we figure out how to train an agent to plan and strategise, it might quickly become very good at it. But "planning" is such a vague and general skill that the bottleneck for learning it is probably our ability to improve our training methods or task representations. It's also possible that current neural network architectures are just not powerful enough to do arbitrarily complex planning - for example, because feed-forward networks like CNNs have no backwards connections, and recurrent networks have them only in a very constrained way. Either way, since much of our recent progress has been made using algorithms and architectures that have been around for decades, my guess is that if improvements are necessary, they won't arise for decades more.
