Geert Pingen @gpgn

Cloud & Machine Learning Engineer, Research Scientist
StarCraft Bot

Introduction

StarCraft: Brood War (1999) is the expansion set of StarCraft (1998), developed by Blizzard Entertainment. It is a Real-Time Strategy (RTS) game, and considered a major influence on gaming well beyond its own genre. The franchise is listed twice (StarCraft, including the expansion, and its successor StarCraft II, released in 2010) among the top-selling PC games of all time, underlining its relevance even today.

In multiplayer mode, players are required to build a base using harvested resources, build an army, and destroy the opposing player's base with that army. A player is victorious when there are no enemy buildings left in the game. Being an old but well-known strategy game, it lends itself exceptionally well to research purposes. Not only is there a vast number of replays to be analyzed, the game also runs smoothly on any modern computer setup. This allows automated tournaments to be held online between different automated agents.

The focus of this project was to implement such an automated agent, using a model for strategy prediction built on data mined from replays. The main heuristic used to select counter-strategies was a time-optimal counter score calculated from a number of offensive and defensive features.

Competitive StarCraft

Although the best StarCraft game AIs cannot yet outperform expert human players, the competition grows stronger every year. One of the main stages for these game AIs (or bots) are automated tournaments. Currently, the three largest tournaments of this kind are the Artificial Intelligence and Interactive Digital Entertainment (AIIDE), the IEEE Computational Intelligence and Games (CIG), and the Student StarCraft AI (SSCAI) competitions. In these tournaments competitors can submit their bots, after which a set number of games is played, pairing the bots against each other via LAN.

Research

Weber and Mateas (2009) used data mined from replays to predict strategies and develop an opponent model. Their results show that machine learning algorithms can successfully predict strategies with 69% precision 5 minutes into the game, growing to 86% at 10 minutes (using NNge, or non-nested generalized exemplars). They labeled their replay data based on domain knowledge, using a rule set to classify tier 2 (mid-game) strategies.

Synnaeve and Bessiere (2011) have also made progress in applying Bayesian networks to build-order prediction and reactive build-order construction. In one study they computed the distribution of build orders and applied a Bayesian network to estimate a player's build order from the observed game state. Predictions were accurate as far as 4 buildings ahead of the current game state, dropping to 3 when a 0.3 random noise ratio was applied to simulate a competitive setup.

Bot Overview

The bot is built in a highly modular fashion, as is common for these types of bots (although holistic approaches have been attempted in RTS games, none have tackled a game as complex as StarCraft). This modularity breaks the problem of winning a game down into separate subproblems that can be dealt with individually. To achieve this, a number of managers were implemented, each handling a different aspect of the game.
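As a rough sketch of this decomposition, each manager can be driven from a single game loop, as below. The interface and class names are illustrative, not the bot's actual API.

```java
import java.util.List;

// Illustrative sketch of the manager decomposition; names are hypothetical.
interface Manager {
    void onFrame();                                 // called once per game frame
}

class ScoutManager implements Manager {
    public void onFrame() { /* pick scouting timing and targets */ }
}

class StrategyManager implements Manager {
    public void onFrame() { /* match observations against the Strategy Tree */ }
}

public class Bot {
    private final List<Manager> managers =
            List.of(new ScoutManager(), new StrategyManager());

    public void onFrame() {
        managers.forEach(Manager::onFrame);         // each manager owns one subproblem
    }
}
```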

Figure 1. A modular overview of the managers in the bot. Solid arrows indicate control, whereas dashed lines indicate data flow. Managers within a column also have data-flow connections, though they are not shown here.

The most important managers are the Build Manager, the Macro Manager, the Micro Manager, the Scout Manager, and the Strategy Manager.

Features

Various features were implemented for each of these managers: an extensive build and economy planner for the Build/Macro Manager (responsible for base and worker construction, and resource harvesting); a squad system to optimize army movement and unit positioning, together with a potential-field implementation of target acquisition, for the Micro Manager; a system to indicate the optimal time and location for scouting for the Scout Manager; and finally the neural-network-based model for counter-strategy selection for the Strategy Manager.

Figure 2. Overview of the three different potential field implementations. The Global Field is the top layer and represents the difference between walkable and unwalkable tiles, shown here in green and black respectively. The Squad Field shows a group of Dragoons attacking some Drones, trying to move into attack range (green) while being slightly repelled by each other (red area). The bottom image shows a set of Personal Fields, two of which have been reversed because the units' weapon cooldowns are active.
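To make the field mechanics concrete, the toy sketch below sums a global, squad, and personal field per tile; a unit then moves to the neighbouring tile with the highest total potential. The constants and field shapes are illustrative guesses, not the bot's actual tuning.

```java
// Toy potential-field evaluation; constants and shapes are illustrative.
public class PotentialFields {
    static final double ATTACK_RANGE = 4.0; // tiles; e.g. a Dragoon's range

    // Global field: walkable tiles are neutral, unwalkable tiles forbidden.
    static double global(boolean walkable) {
        return walkable ? 0.0 : Double.NEGATIVE_INFINITY;
    }

    // Squad field: attract units toward the edge of their attack range,
    // peaking where the distance to the target equals ATTACK_RANGE.
    static double squad(double distToTarget) {
        return -Math.abs(distToTarget - ATTACK_RANGE);
    }

    // Personal field, one plausible reading of the reversal in Figure 2:
    // pull toward the enemy while the weapon is ready, push away (kite)
    // while it is on cooldown.
    static double personal(double distToEnemy, boolean onCooldown) {
        double pull = 1.0 / (distToEnemy + 0.1);
        return onCooldown ? -pull : pull;
    }

    // Total potential of a candidate tile for one unit.
    static double total(boolean walkable, double distToTarget,
                        double distToEnemy, boolean onCooldown) {
        return global(walkable) + squad(distToTarget)
                + personal(distToEnemy, onCooldown);
    }
}
```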

Data

A set of 24552 Protoss versus Protoss (PvP) replays was gathered from the website bwreplays, which was reduced to 24485 (99.727% of the original dataset) after pruning faulty or otherwise unusable replays. To enable modeling of the various strategies (or build orders) that exist in the game, only PvP matchups were considered. The reason for this is twofold. Firstly, the bot was created to play the Protoss race, so replays had to include Protoss players. Secondly, considering only mirror matchups means that a separate Strategy Tree for the opponent race does not have to be created. Results from this project are expected to apply to the other races as well, provided that separate Strategy Trees are produced from replays of those races.

The dataset contains replays by 11984 unique players, from both individual games and tournament settings. Spanning 1999 to 2013, and versions 1.08 through 1.16.1 (the current patch) of the game, it covers the game's entire lifetime. This enabled modeling of older, less frequently used strategies as well.

Alongside the Strategy Tree, a database was created that logs, per replay, the offensive and defensive power of each player according to the feature set, together with the ID of the winning player. These values are also stored in the Strategy Tree, but they were kept in a separate database because they are used separately when training the neural network.

Since the offensive and defensive values were used to determine a counter-strategy for the current meta-game (unit statistics may have been altered by patches over the years), only replays from the current patch (1.16.1) were taken into account. The training set therefore contained a total of 10889 entries (44.472% of the original pruned dataset).

Strategy Tree

The Strategy Tree is a hierarchical tree ADT in which nodes represent a distinct choice in build order. This could be the construction of a Forge (a Protoss building that unlocks the construction of static defences), or the researching of an upgrade. The race's main building is represented as the root node, since this is the building every player starts with and it is always present when the game begins.

From there the tree branches into the different build-order decisions a player has made. Each path in the tree then represents a strategy for that race, and the entire tree is a model of all the strategies observed in the replays.

For every replay parsed, basic information is stored: the player names and IDs, the winning player, and the offensive and defensive power of both players. The replay is then fed into a tree-generating module, which gathers data on the build orders of both players. It traverses the existing tree, starting at the root node, and goes down a path as long as the actions in the replay's build order match the ones found in the tree. If it cannot find an existing path, it creates a new branch. The likelihood of each branch taken is also updated; this value represents the ratio of visits to a node compared to its siblings.
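In code, the insertion step might look like the following sketch, assuming each node tracks a visit count from which branch likelihoods are derived; the names are illustrative, not the bot's actual implementation.

```java
import java.util.*;

// Sketch of Strategy Tree insertion with per-node visit counts.
class StrategyNode {
    final String action;                            // e.g. "Gateway", "Forge"
    final Map<String, StrategyNode> children = new HashMap<>();
    int visits = 0;

    StrategyNode(String action) { this.action = action; }

    // Likelihood of a child branch relative to its siblings.
    double likelihood(String childAction) {
        StrategyNode child = children.get(childAction);
        if (child == null) return 0.0;
        int total = children.values().stream().mapToInt(c -> c.visits).sum();
        return (double) child.visits / total;
    }
}

class StrategyTree {
    final StrategyNode root = new StrategyNode("Nexus"); // race's main building

    // Walk the build order, following existing branches and creating
    // new ones where no matching child exists.
    void insert(List<String> buildOrder) {
        StrategyNode node = root;
        node.visits++;
        for (String action : buildOrder) {
            node = node.children.computeIfAbsent(action, StrategyNode::new);
            node.visits++;
        }
    }
}
```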

In theory, the Strategy Tree could grow infinitely large. In practice it does not, since the tree is generated from real-life replays: every path in the tree represents a build order executed in a live match, or a prefix of one. A number of game decisions are not included in the tree as nodes, such as Pylons, Assimilators, and Photon Cannons.

The final element of the Strategy Tree is the set of values for offensive and defensive power stored in each node. These values represent the ability of the strategy to combat its opponent, either by attacking or by defending against an attack. Offensive power depends on the combined raw attack power of the units present, the number of attack upgrades researched, and the number of unit-producing structures. Defensive power depends on the combined raw armour of the units present, the number of defence upgrades researched, the number of static defences, and the number of worker units (as an indication of resource income).
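As an illustration, the power values could be combined along these lines; the exact weighting used by the bot is not given here, so the weights below are placeholders.

```java
// Hypothetical combination of the power features described above;
// the weights are placeholders, not the bot's actual values.
public class PowerFeatures {
    static final double W_UPGRADE = 1.0, W_PRODUCTION = 1.0;
    static final double W_STATIC_DEF = 1.0, W_WORKER = 0.5;

    static double offensive(double rawAttack, int attackUpgrades, int producers) {
        return rawAttack + W_UPGRADE * attackUpgrades + W_PRODUCTION * producers;
    }

    static double defensive(double rawArmour, int defenceUpgrades,
                            int staticDefences, int workers) {
        return rawArmour + W_UPGRADE * defenceUpgrades
                + W_STATIC_DEF * staticDefences + W_WORKER * workers;
    }
}
```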

Figure 3. Building the Strategy Tree. A build order gathered from the parsed replay is added to the existing tree by traversing its nodes until no existing path can be found. The existing branch is then continued until the entire build order is represented in the Strategy Tree.

Strategy Recognition

Once the Strategy Tree has been generated, it can be serialized and used by the bot in the Strategy Manager. Whenever the Scout Manager gains new information on the status of the opponent, the Strategy Tree is used to assess what strategy the opponent is pursuing. A list of currently observed opponent units and upgrades is fed into the Strategy Tree model, which returns the set of paths that contain the observed build order. For example, when a Nexus and a Gateway are observed, a list of paths containing this subset is returned. From that list the shortest option is selected: a shorter path that still contains the observed subset is preferred over a longer one. An observation can match several returned paths at the same time, and the recognition of the enemy's position in the tree is of course dependent on the validity of the scouted information.
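Building on the StrategyNode sketch above, recognition can be implemented as a depth-first search that stops descending as soon as a path covers everything scouted, which naturally prefers the shortest match. Again, this is illustrative, not the bot's exact code.

```java
import java.util.*;

// Return the shortest root-to-node path whose actions cover all
// observed opponent units and upgrades (illustrative sketch).
class StrategyRecognizer {
    static Optional<List<String>> recognize(StrategyNode root, Set<String> observed) {
        List<List<String>> matches = new ArrayList<>();
        collect(root, new ArrayDeque<>(), observed, matches);
        return matches.stream().min(Comparator.comparingInt(List::size));
    }

    private static void collect(StrategyNode node, Deque<String> path,
                                Set<String> observed, List<List<String>> out) {
        path.addLast(node.action);
        if (path.containsAll(observed)) {
            out.add(new ArrayList<>(path));  // candidate strategy prefix found
        } else {
            for (StrategyNode child : node.children.values()) {
                collect(child, path, observed, out);
            }
        }
        path.removeLast();
    }
}
```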

Strategy Generation

A neural network was trained in a supervised fashion on these values, with the winning player as the target, using the Encog framework (allowing it to train in the same environment as the bot and strategy model), and tested with 5-fold cross-validation. The network consists of 15 input neurons (the 7 offensive/defensive features listed above for each player, plus the total iteration count), 2 hidden neurons, 2 output neurons, and 2 bias neurons. The bias neurons are set to random values in the range [0.3, 0.7].
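A minimal Encog sketch of this topology might look as follows. The training rule (resilient propagation) and the stopping criterion are assumptions beyond what is stated above, and Encog's reset() randomizes all weights rather than reproducing the [0.3, 0.7] bias initialisation.

```java
import org.encog.engine.network.activation.ActivationSigmoid;
import org.encog.ml.data.MLDataSet;
import org.encog.ml.data.basic.BasicMLDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

// Sketch of the 15-2-2 network described above, using the Encog framework.
public class WinnerNet {
    public static BasicNetwork build() {
        BasicNetwork net = new BasicNetwork();
        net.addLayer(new BasicLayer(null, true, 15));                   // inputs + bias
        net.addLayer(new BasicLayer(new ActivationSigmoid(), true, 2)); // hidden + bias
        net.addLayer(new BasicLayer(new ActivationSigmoid(), false, 2)); // win / loss
        net.getStructure().finalizeStructure();
        net.reset();                                                    // random weights
        return net;
    }

    public static void train(BasicNetwork net, double[][] x, double[][] y) {
        MLDataSet data = new BasicMLDataSet(x, y);
        ResilientPropagation trainer = new ResilientPropagation(net, data);
        do {
            trainer.iteration();
        } while (trainer.getError() > 0.21);  // roughly the reported error plateau
        trainer.finishTraining();
    }
}
```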

The neural network was able to predict the winner of a match given these input features, with a validation error stabilizing at a mean of 0.2115 over the last 2000 iterations, corresponding to classifying about 78% of the replays with the correct output (win or loss condition).

The weights learned by the network were then applied to the bot and utilized in its strategy selection model.

Figure 4. Error plotted against the number of epochs of the 5-fold cross-validation testing, stabilising at a mean of 0.2115 over the last 2000 epochs. The light blue plot shows the raw error per epoch; the dark blue plot averages every 10 epochs; the red plot averages every 100 epochs.

References

[1] Weber, B. G. and Mateas, M. (2009). A data mining approach to strategy prediction. In Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG), pp. 140-147.

[2] Synnaeve, G. and Bessiere, P. (2011). A Bayesian model for opening prediction in RTS games with application to StarCraft. In Proceedings of the IEEE Conference on Computational Intelligence and Games (CIG), pp. 281-288.