2 Background: A Review of RTS Game AI
This chapter is based on the following publication:
Robertson, G. and Watson, I. (2014b). A review of real-time strategy game AI. AI Magazine, 35(4):75–104
This chapter provides a general background on Real-Time Strategy (RTS) game Artificial Intelligence (AI) concepts, approaches, and current directions. As a published work, it functions as an introduction for people new to the field of RTS game AI, helping them gain an overall understanding of work that has been done. Although the approaches used in this thesis are outlined in this chapter, they are expanded upon with further detail and related work in chapter 3.
The field of game Artificial Intelligence (AI) and the Real-Time Strategy (RTS) video game StarCraft were introduced in chapter 1. StarCraft was identified as an ideal domain for exploring and improving AI approaches to a range of challenges, with rapidly growing interest from the academic community.
This chapter presents a comprehensive review of the literature on AI techniques used for StarCraft, supplemented with research based on other RTS games where significant StarCraft-based literature is not yet available in an area. It begins by outlining the different AI techniques used, grouped by the area in which they are primarily applied. These areas are tactical decision-making, strategic decision-making, plan recognition, and learning (table 2.1). This is followed by a comparison of the way game AI is used in academia and the game industry, which outlines the differences in goals and discusses the low adoption of academic research in the industry.
Finally, some areas are identified in which there does not seem to be sufficient research on topics that are well-suited to study in the context of RTS game AI. This last section also proposes standardisation of the evaluation methods used in StarCraft AI research in order to enable better comparison between papers.
Tactical and micromanagement decisions – controlling individual units or groups of units over a short period of time – often make use of a different technique from the AI making strategic decisions. These tactical decisions can follow a relatively simple metric, such as attempting to maximise the amount of enemy firepower which can be removed from the playing field in the shortest time (Davis, 1999). In the video game industry, it is common for simple techniques, such as finite state machines, to be used to make these decisions (Buckland, 2005). However, even in these small-scale decisions, many factors can be considered to attempt to make the best decisions possible, particularly when using units with varied abilities (figure 2.1). The problem space is not nearly as large as that of the full game, making exploratory approaches to learning domain knowledge feasible (Weber and Mateas, 2009). There appears to be less research interest in this aspect of RTS game AI than in the area of large-scale, long-term strategic decision-making and learning.
Figure 2.1: A battle in StarCraft – intense micromanagement is required to maximise the effectiveness of individual units, especially "spellcaster" units like the Protoss Arbiter
Reinforcement Learning (RL) is an area of machine learning in which an agent must learn, by trial and error, optimal actions to take in particular situations in order to maximise an overall reward value (Sutton and Barto, 1998). Through many iterations of weakly supervised learning, RL can discover new solutions which are better than previously known solutions. It is relatively simple to apply to a new domain, as it requires only a description of the situation and possible actions, and a reward metric (Manslow, 2004). However, in a domain as complex as an RTS game – even just for tactical decision-making – RL often requires clever state abstraction mechanisms in order to learn effectively. This technique is often applied to tactical decision-making in RTS games but is not commonly used for large-scale strategic decision-making, likely because the huge problem space and delayed reward inherent in strategic decisions make RL difficult.
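As an illustration of the reward-driven update such systems rely on, the following is a minimal tabular Sarsa sketch; the action set, state encoding, and parameter values are assumptions made purely for this example, not the setup of any cited system.

```python
import random

# Minimal tabular Sarsa sketch. The actions, states, and parameters are
# illustrative assumptions, not taken from any cited StarCraft system.
ACTIONS = ["attack", "flee"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = {}  # maps (state, action) -> learned expected reward


def q(state, action):
    return Q.get((state, action), 0.0)


def choose_action(state):
    # epsilon-greedy exploration: mostly exploit, occasionally explore
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q(state, a))


def sarsa_update(state, action, reward, next_state, next_action):
    # Sarsa is on-policy: it bootstraps from the action actually taken next
    target = reward + GAMMA * q(next_state, next_action)
    Q[(state, action)] = q(state, action) + ALPHA * (target - q(state, action))
```

After enough such updates, the greedy action in each state approximates the best-known response for that situation.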
RL has been applied to StarCraft by Shantia et al. (2011), where Sarsa, an algorithm for solving RL problems, is used to learn to control units in small skirmishes. They made use of artificial neural networks to learn the expected reward for attacking or fleeing with a particular unit in a given state (figure 2.2), and chose the action with the highest expected reward when in-game. The system learned to beat the inbuilt StarCraft AI scripting on average, but only in small three-unit skirmishes; none of the variations learned to beat the inbuilt scripting on average in six-unit skirmishes (Shantia et al., 2011).
Figure 2.2: Game state information fed into a neural network to produce an expected reward value for a particular action. Adapted from Shantia et al. (2011)
RL techniques have also been applied to other RTS games. Sharma et al. (2007) and Molineaux et al. (2008) combine Case-Based Reasoning (CBR) and RL for learning tactical-level unit control in MadRTS¹ (for a description of CBR see section 2.4.4). Sharma et al. (2007) were able to increase the learning speed of the RL agent by beginning learning in a simple situation and then gradually increasing the complexity of the situation. The resulting performance of the agent was the same or better than that of an agent trained in the complex situation directly. Their system stores its knowledge in cases which pertain to situations it has encountered before, as in CBR. However, each case stores the expected utility for every possible action in that situation as well as the contribution of that case to a reward value, allowing the system to learn desirable actions and situations. It remains to be seen how well it would work in a more complex domain. Molineaux et al. (2008) describe a system for RL with non-discrete actions. Their system retrieves similar cases from past experience and estimates the result of applying each case's actions to the current state. It then uses a separate case base to estimate the value of each estimated resulting state, and extrapolates around, or interpolates between, the actions to choose one which is estimated to provide the maximum value state. This technique results in a significant increase in performance when compared with one using discrete actions (Molineaux et al., 2008).

¹ Mad Doc Software. Website no longer available.
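A minimal sketch of the kind of case structure described for Sharma et al. (2007) – cases indexed by situation features, each storing per-action expected utilities – might look as follows; all field names and the similarity metric are illustrative assumptions rather than the actual representation used in that work.

```python
from dataclasses import dataclass, field

# Sketch of a CBR/RL hybrid case: as in standard CBR, the most similar
# stored case is retrieved, but each case also carries RL-style expected
# utilities for every action. Structures here are illustrative only.


@dataclass
class Case:
    features: tuple  # abstract description of the situation
    utilities: dict = field(default_factory=dict)  # action -> expected utility


def similarity(a, b):
    # simple illustrative metric: negative Euclidean distance over features
    return -sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def retrieve(case_base, features):
    # reuse the most similar stored case, as in standard CBR
    return max(case_base, key=lambda c: similarity(c.features, features))


def best_action(case):
    # exploit the learned per-action utilities of the retrieved case
    return max(case.utilities, key=case.utilities.get)
```

The RL component would then update the retrieved case's utilities from observed rewards, so desirable actions accumulate higher values over time.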
Human critique is added to RL by Judah et al. (2010) in order to learn tactical decision-making for controlling a small group of units in combat in Wargus. By interleaving sessions of autonomous state space exploration and human critique of the agent’s actions, the system was able to learn a better policy in a fraction of the training iterations compared with using RL alone. However, slightly better overall results were achieved using human critique only to train the agent, possibly due to humans giving better feedback when they can see an immediate result (Judah et al., 2010).
Marthi et al. (2005) argue that it is preferable to decrease the apparent complexity of RTS games, and potentially increase the effectiveness of RL or other techniques, by decomposing the game into a hierarchy of interacting parts. Using this method, instead of coordinating a group of units by learning the correct combination of unit actions, each unit can be controlled individually, with a higher-level group control affecting each individual's decision. Similar hierarchical decomposition appears in many RTS AI approaches because it reduces complexity from a combinatorial combination of possibilities – in this case, possible actions for each unit – down to a multiplicative combination.
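The scale of this reduction can be illustrated with a simple count; the unit, action, and directive numbers below are invented for illustration.

```python
# Illustrative comparison of decision-space sizes under joint versus
# hierarchical control. All numbers below are invented for this example:
# 8 units, 4 actions per unit, 3 possible group-level directives.
units, actions, group_modes = 8, 4, 3

# Joint control: every combination of per-unit actions must be considered.
joint = actions ** units

# Hierarchical control: each unit chooses independently, once per
# group-level directive, so the counts add up multiplicatively instead
# of compounding exponentially.
hierarchical = group_modes * units * actions

print(joint, hierarchical)  # 65536 96
```

Even at this toy scale the joint space is several orders of magnitude larger, which is why the decomposition makes learning tractable.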
Search-based techniques have so far been unable to deal with the complexity of the long-term strategic aspects of RTS games, but they have been successfully applied to smaller-scale or abstracted versions of RTS combat. To apply these search methods, a simulator is usually required to allow the AI system to evaluate the results of actions very rapidly in order to explore the game tree.
Sailer et al. (2007) take a game-theoretic approach by searching for the Nash equilibrium strategy among a set of known strategies in a simplified RTS. Their simplified RTS retains just the tactics aspect of RTS games by concentrating on unit group movements, so it does not require long-term planning for building infrastructure and also excludes micromanagement for controlling individual units. They use a simulation to compare the expected outcome of using each of the strategies against their opponent, for each of the strategies the opponent could be using (drawn from the same set), and select the Nash-optimal strategy. The simulation can avoid simulating every time-step, skipping instead to just the states in which something "interesting" happens, such as a player making a decision or units coming into firing range of opponents. Through this combination of abstraction, state skipping, and needing to examine only the possible moves prescribed by a pair of known strategies at a time, it is usually possible to search all the way to an end-game state very rapidly, which in turn means a simple evaluation function can be used. The resulting Nash player was able to defeat each of the scripted strategies, as long as the set included a viable counter-strategy for each strategy, and it also produced better results than the max-min and min-max players (Sailer et al., 2007).
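The strategy-selection step can be sketched as follows. For simplicity this sketch makes a pure-strategy maximin choice over a hand-invented payoff matrix; Sailer et al. (2007) instead solve for a (possibly mixed) Nash equilibrium, and each payoff entry in their system comes from a fast game simulation rather than a fixed table.

```python
# Sketch of selecting among a fixed set of strategies using simulated
# payoffs. The strategy names and payoff values are invented; in the
# actual system each entry is produced by simulating the two strategies
# against each other, and a (possibly mixed) Nash equilibrium is computed
# rather than this simpler pure-strategy maximin choice.
STRATEGIES = ["rush", "defend", "expand"]

payoff = {  # payoff[(mine, theirs)] = expected outcome for me
    ("rush", "rush"): 0,    ("rush", "defend"): -1,  ("rush", "expand"): 1,
    ("defend", "rush"): 1,  ("defend", "defend"): 0, ("defend", "expand"): 0,
    ("expand", "rush"): -1, ("expand", "defend"): 0, ("expand", "expand"): 0,
}


def maximin(strategies, payoff):
    # choose the strategy with the best worst-case simulated outcome
    return max(strategies,
               key=lambda s: min(payoff[(s, o)] for o in strategies))
```

Because only pairs of known strategies need simulating, the payoff table stays small even when each simulation runs to an end-game state.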
Search-based techniques are particularly difficult to use in StarCraft because of the closed-source nature of the game and the inability to arbitrarily manipulate the game state. This means that the precise mechanics of the game rules are unclear, and the game cannot easily be set up to run from a particular state for use as a simulator. Furthermore, the game must carry out expensive calculations such as unit vision and collisions, and cannot be forced to skip ahead to just the "interesting" states, making it too slow for the purpose of search (Churchill et al., 2012). In order to overcome these problems, Churchill et al. (2012) created a simulator called "SparCraft" which models StarCraft and approximates the rules, but allows the state to be arbitrarily manipulated and unnecessary expensive calculations to be ignored (including skipping uninteresting states). Using this simulator and a modified version of alpha-beta search, which takes into consideration actions of differing duration, they could find effective moves for a given configuration of units. Search time was limited to approximate real-time conditions, so the moves found were not optimal. This search allowed them to win an average of 92% of randomised balanced scenarios against all of the standard scripted strategies they tested against within their simulator (Churchill et al., 2012).
Despite working very well in simulation, the results do not translate perfectly back to the actual game of StarCraft, due to simplifications such as the lack of unit collisions and acceleration affecting the outcome (Churchill and Buro, 2012; Churchill et al., 2012). The system was able to win only 84% of scenarios against the built-in StarCraft AI despite the simulation predicting 100%, faring worst in scenarios which were set up to require hit-and-run behaviour (Churchill and Buro, 2012). The main limitation of this system is the combinatorial explosion of possible actions and states as the number of units increases: under a time constraint of 5 ms per logical game frame, the search allows only up to eight units per side in a two-player battle before it becomes too slow. On the other hand, better results may be achieved through opponent modelling, because the search can incorporate known opponent actions instead of searching through all possible opponent actions. When this was tested on the scripted strategies with a perfect model of each opponent (the scripts themselves), the search was able to achieve at least a 95% win rate against each of the scripts in simulation (Churchill et al., 2012).
Monte Carlo Planning
Monte Carlo planning has received significant attention recently in the field of computer Go, but seems to be almost absent from RTS AI, and (to the author's knowledge) completely untested in the domain of StarCraft. It involves sampling the decision space using randomly generated plans in order to find out which plans tend to lead to more successful outcomes. It may be very suitable for RTS games because it can deal with uncertainty, randomness, large decision spaces, and opponent actions through its sampling mechanism. Monte Carlo planning has likely not yet been applied to StarCraft due to the unavailability of an effective simulator, as was the case with the search methods above, as well as the complexity of the domain. However, it has been applied to some very restricted versions of RTS games. Although both of the examples seen here consider tactical- and unit-level decisions, given a suitable abstraction and simulation, Monte Carlo Tree Search (MCTS) may also be effective at strategic-level decision-making in a domain as complex as StarCraft.
Chung et al. (2005) created a capture-the-flag game in which each player needed to control a group of units to navigate through obstacles to the opposite side of a map and retrieve the opponent's flag. They created a generalised Monte Carlo planning framework and then applied it to their game, producing positive results. Unfortunately, they lacked a strong scripted opponent to test against, and their system was also very reliant on heuristic evaluations of intermediate states in order to make planning decisions. Later, Balla and Fern (2009) applied the more recent technique of Upper Confidence Bounds applied to Trees (UCT) to a simplified Wargus scenario. A major benefit of their approach is that it does not require a heuristic evaluation function for intermediate states, and instead plays a game randomly out to a terminal state in order to evaluate a plan. The system was evaluated by playing against a range of scripts and a human player in a scenario involving multiple friendly and enemy groups of the basic footman unit placed around an empty map. In these experiments, the UCT system made decisions at the tactical level for moving groups of units while micromanagement was controlled by the inbuilt Wargus AI, and UCT evaluated terminal states based on either unit hit points remaining or time taken. The system was able to win all of the scenarios, unlike any of the scripts, and overall it outperformed all of the scripts and the human player on the particular metric (either hit points or time) that it was using.
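The selection rule at the heart of UCT can be sketched with the standard UCB1 formula, which trades off a move's average reward against how rarely it has been tried; the child statistics and exploration constant below are illustrative values, not drawn from Balla and Fern (2009).

```python
import math

# Sketch of UCB1 child selection, the core of UCT. The statistics below
# are invented; in UCT they come from random playouts to terminal states.


def ucb1(mean_reward, visits, parent_visits, c=1.4):
    # exploitation term (mean reward) plus exploration bonus that grows
    # for children tried rarely relative to their parent
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children):
    # children: list of (move, mean_reward, visits); parent visits = sum
    total = sum(visits for _, _, visits in children)
    return max(children, key=lambda ch: ucb1(ch[1], ch[2], total))[0]
```

With these values a slightly weaker but barely explored move is selected over a stronger, well-explored one, which is exactly the behaviour that lets UCT avoid committing to a plan prematurely.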
Various other AI techniques have been applied to tactical decision-making in StarCraft. Synnaeve and Bessière (2011b) combine unit objectives, opportunities, and threats using a Bayesian model to decide which direction to move units in a battle. The model treats each of its sensory inputs as part of a probability equation which can be solved, given data (potentially learned through RL) about the distributions of the inputs with respect to the direction moved, to find the probability that a unit should move in each possible direction. The best direction can be selected, or the direction probabilities can be sampled from to avoid having two units choose to move into the same location. Their Bayesian model is paired with a hierarchical finite state machine to choose different sets of behaviour for when units are engaging or avoiding enemy forces, or scouting. The agent produced was very effective against the built-in StarCraft AI as well as its own ablated versions (Synnaeve and Bessière, 2011b).
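The fusion step of such a model can be sketched as a naive-Bayes-style product over per-input distributions; the direction set and probability tables below are invented for illustration and are far simpler than the actual sensory model of Synnaeve and Bessière (2011b).

```python
# Sketch of fusing sensory inputs into a distribution over movement
# directions. Each input supplies P(input value | direction); assuming
# conditional independence, the products are normalised to give
# P(direction | inputs). Directions and distributions are illustrative.
DIRECTIONS = ["north", "south", "east", "west"]


def direction_scores(input_distributions):
    scores = {}
    for d in DIRECTIONS:
        p = 1.0
        for dist in input_distributions:
            p *= dist[d]  # multiply evidence from each sensory input
        scores[d] = p
    total = sum(scores.values())
    # normalise so the result can be sampled from, as well as maximised
    return {d: p / total for d, p in scores.items()}
```

Returning a full distribution rather than a single direction is what allows the sampling behaviour described above, where units avoid all picking the same destination.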
CBR, although usually used for strategic reasoning in RTS AI (see section 2.4.4), has also been applied to tactical decision-making in Warcraft III, a game which has a greater focus on micromanagement than StarCraft (Szczepański and Aamodt, 2009). CBR generally selects the most similar case for reuse, but Szczepański and Aamodt (2009) added a conditional check to each case so that it could be selected only when its action could be executed. They also added reactionary cases which would be executed as soon as certain conditions were met. The resulting agent was able to beat the built-in AI of Warcraft III in a micromanagement battle using only a small number of cases, and was able to assist human players by micromanaging battles to let the human focus on higher-level strategy.
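The retrieval modification can be sketched as follows; the case representation and similarity function are illustrative assumptions, not the actual structures used by Szczepański and Aamodt (2009).

```python
# Sketch of CBR retrieval with per-case precondition checks: the most
# similar case is reused only if its action is currently executable,
# otherwise retrieval falls back to the next most similar case.
# Case structure and similarity function are illustrative assumptions.


def retrieve_executable(cases, state, similarity):
    # consider cases from most to least similar; skip inapplicable ones
    for case in sorted(cases, key=lambda c: similarity(c, state),
                       reverse=True):
        if case["precondition"](state):
            return case
    return None
```

The precondition check prevents the common CBR failure mode of retrieving a highly similar case whose action (for example, casting a spell without enough mana) cannot actually be carried out.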
Neuroevolution is a technique that uses an evolutionary algorithm to create or train an artificial neural network. Gabriel et al. (2012) use a neuroevolution approach called rtNEAT to evolve both the topology and connection weights of neural networks for individual unit control in StarCraft. In their approach, each unit has its own neural network that receives input from environmental sources (such as nearby units or obstacles) and hand-defined abstractions (such as the number, type, and "quality" of nearby units), and outputs whether to attack, retreat, or move left or right. During a game, the performance of the units is evaluated using a hand-crafted fitness function, and poorly performing unit agents are replaced by combinations of the best-performing agents. The approach was tested in very simple scenarios of 12 versus 12 units in a square arena, where all units on each side were either a hand-to-hand or a ranged type of unit. In these situations, it learned to beat the built-in StarCraft AI and some other agents. However, it remains unclear how well it would cope with more units or mixes of different unit types (Gabriel et al., 2012).
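The evaluate-and-replace cycle can be sketched as a simple generational step; note that rtNEAT actually performs replacement continuously during play and evolves network topology as well as weights, so the genome representation, fitness function, and crossover operator below are purely illustrative assumptions.

```python
import random

# Sketch of an evaluate-and-replace step in a neuroevolution loop:
# poorly performing agents are replaced by combinations of the best.
# rtNEAT itself replaces agents continuously and evolves topology too;
# genomes, fitness, and crossover here are illustrative stand-ins.


def evolve_step(population, fitness, crossover, replace_fraction=0.25):
    ranked = sorted(population, key=fitness, reverse=True)
    n_replace = int(len(ranked) * replace_fraction)
    survivors = ranked[:len(ranked) - n_replace]  # drop the worst performers
    parents = ranked[:max(2, n_replace)]          # breed from the best
    children = [crossover(random.choice(parents), random.choice(parents))
                for _ in range(n_replace)]
    return survivors + children
```

Keeping the population size constant while steadily replacing the weakest agents is what allows this kind of system to improve during a single ongoing game.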