designates my notes. / designates important. / designates very important.
Chapter 2 talks about neurology. This is used as an analogy for how signals propagate within the body and how they can propagate within your game's AI. Interesting to note that the speed of signals within the body - your reflexes - is actually WAY slower than you might think (compared to, for example, internet speeds).
The book touches on many topics but doesn't go deeply into any. That isn't a bad thing. It is actually a good introduction to many AI concepts and can act as a sort of idea catalyst, giving you a jumping-off point for deciding which AI branch you want to learn more about (or which AI might work in your game).
Some of the info is a little dated - they talk about hundreds of pipes in a GPU as a big deal - but nothing bad and nothing that makes the concepts any less relevant.
Part 1 - General Wisdom
Part 2 - Architecture
Part 3 - Movement and Pathfinding
Part 4 - Strategy and Tactics
Part 5 - Agent Awareness and Knowledge Representation
Part 6 - Racing
Part 7 - Odds and Ends
Brian Kernighan, codeveloper of Unix and the C programming language, is believed to have said, “Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it”
Apparently the scenario AI for the original Warcraft would simply wait a fixed amount of time, and then start spawning a wave of units to attack you. It would spawn these units at the edge of the gray fog—which is to say, just outside of the visibility range of your units. It would continue spawning units until your defenses were nearly overwhelmed—and then it would stop, leaving you to mop up those units that remained and allowing you to win the fight.
This approach seems to cross the line fairly cleanly into the realm of “cheating.” The AI doesn’t have to worry about building buildings, or saving money, or recruiting units—it just spawns whatever it needs. On the other hand, think about the experience that results. No matter how good or bad you are at the game, it will create an epic battle for you — one which pushes you to the absolute limit of your ability, but one in which you will ultimately, against all odds, be victorious.
An illustration of action potential propagation through nerve fibers over time. The resulting swing in repolarization and the delay in membrane channels reopening prevent previously excited sites from firing again.
Schmitt trigger and hysteresis
$$y = \begin{cases} 1 & \text{if } v_i \cdot v_w > 0 \\ 0 & \text{otherwise} \end{cases}$$
Rosenblatt’s perceptron has two phases, a learning pass and a prediction pass. In the learning pass, a definition of $v_i$ is passed in as well as a desired output, $y_{ideal}$ , normalized to a range of $0 \dots 1$. On each presentation of $v_i$ and $y_{ideal}$ a learning rule is applied in a similar form to Hebb’s.
Here, each weight is adjusted by the difference between the overall output and the expected output $y_{ideal}$, multiplied by the contributing input and the learning rate.
$$\omega_{i,t+1} = \omega_{i,t} + \mu(y_{ideal} - y)\,i_i$$
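As a minimal sketch of this update rule (my own illustration, not the book's code; `predict` and `learn` are assumed names, with the threshold activation from the equation above):

```cpp
#include <vector>

// Threshold activation: output 1 if the weighted sum of the inputs is positive.
float predict(const std::vector<float>& weights, const std::vector<float>& inputs)
{
    float sum = 0.0f;
    for (size_t i = 0; i < weights.size(); ++i)
        sum += weights[i] * inputs[i];
    return sum > 0.0f ? 1.0f : 0.0f;
}

// One learning presentation: nudge each weight by
// learningRate * (yIdeal - y) * input, as in the rule above.
void learn(std::vector<float>& weights, const std::vector<float>& inputs,
           float yIdeal, float learningRate)
{
    const float y = predict(weights, inputs);
    for (size_t i = 0; i < weights.size(); ++i)
        weights[i] += learningRate * (yIdeal - y) * inputs[i];
}
```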
For example, here is a utility equation from a recent talk [Mark 10]:
$$coverChance = 0.2 + reloadNeed \times 1.0 + healNeed \times 1.5 + threatRating \times 1.3$$
$$coverChance = 1 \times w_0 + reloadNeed \times w_1 + healNeed \times w_2 + threatRating \times w_3$$
If we present the player with a number of scenarios and ask them whether they would take cover, we can easily gather values that, after enough questions, remain easy to interpret and therefore let us make better initial guesses at the equation we might want driving our AI.
A final nice property is that we don't necessarily need to answer the false side of the equation if we don't want to; we could just supply random values to represent false. Imagine that each time the player goes into cover we measure these values and record true, following it with random values that record false. Even if we happen to get a lucky random true cover condition, over time it will just represent noise. If we then clamp all weights to sensible ranges, potentially found by training in test circumstances, we now have a quick-to-compute runtime predictor of when the player may think of going into cover!
Normal distributions (also known as Gaussian distributions or bell curves)
There is randomness in these distributions, but they are not uniformly random. For example, the chance of a man growing to be 6 feet tall is not the same as the chance of him growing to a final height of 5 feet tall or 7 feet tall. If the chance were the same, then the distribution would be uniformly random.
Some random things in life do show a uniform distribution, such as the chance of giving birth to a boy or a girl. However, the large majority of distributions in life are closer to a normal distribution than a uniform distribution. But why?
The answer is quite simple and is explained by the central limit theorem. Basically, when many random variables are added together, the resulting sum will follow a normal distribution. This can be seen when you roll three 6-sided dice. While there is a uniform chance of a single die landing on any face, the chance of rolling three dice and their sum equaling the maximum of 18 is not uniform with regard to other outcomes. For example, the odds of three dice adding up to 18 is 0.5% while the odds of three dice adding up to 10 is 12.5%. Figure 3.1 shows that the sum of rolling three dice actually follows a normal distribution.
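A quick sketch of this (mine, not from the book): roll 3d6 many times and the histogram of sums comes out roughly bell-shaped, with 3 and 18 rare and 10–11 most common.

```cpp
#include <array>
#include <cstdio>
#include <random>

int main()
{
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<int> die(1, 6);   // each individual die is uniform
    std::array<int, 19> counts{};                   // sums 3..18

    const int trials = 100000;
    for (int i = 0; i < trials; ++i)
        counts[die(rng) + die(rng) + die(rng)]++;

    // The printed percentages approximate a normal distribution.
    for (int sum = 3; sum <= 18; ++sum)
        std::printf("%2d: %5.2f%%\n", sum, 100.0 * counts[sum] / trials);
    return 0;
}
```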
Randomness is too random (for many uses in games).
small runs of randomness don’t look random
In one-dimension, Perlin noise is constructed by first deciding how many octaves to use. Each octave contributes to the signal detail at a particular scale, with higher octaves adding more fine-grained detail. Each octave is computed individually and then they are added to each other to produce the final signal. Figure 3.5 shows one-dimensional Perlin noise, constructed with four octaves.
In order to explain how each octave signal is produced, let’s start with the first one. The first octave is computed by starting and ending the interval with two different uniform random numbers, in the range [0, 1]. The signal in the middle is computed by applying a mathematical function that interpolates between the two. The ideal function to use is the S-curve function $6t^5 - 15t^4 + 10t^3$ because it has many nice mathematical properties, such as being smooth in the first and second derivatives [Perlin 02]. This is desirable so that the signal contained within higher octaves is smooth.
For the second octave, we choose three uniform random numbers, place them equidistant from each other, and then interpolate between them using the same S-curve function. Similarly, for the third octave, we choose five uniform random numbers, place them equidistant from each other, and then interpolate between them. The number of uniform random numbers for a given octave $n$ is equal to $2^{n-1} + 1$. Figure 3.5 shows four octaves with randomly chosen numbers within each octave.
Once we have the octaves, the next step is to scale each octave with an amplitude. This will cause the higher octaves to progressively contribute to the fine-grained variance in the final signal. Starting with the first octave, we multiply the signal by an amplitude of 0.5, as shown in Figure 3.5. The second octave is multiplied by an amplitude of 0.25, and the third octave is multiplied by an amplitude of 0.125, and so on. The formula for the amplitude at a given octave is $p^i$, where $p$ is the persistence value and $i$ is the octave (our example used a persistence value of 0.5). The persistence value controls how much influence higher octaves have, with high values of persistence giving more weight to higher octaves (producing more high-frequency noise in the final signal).
Now that the octaves have been appropriately scaled, we can add them together to get our final one-dimensional Perlin noise signal, as shown at the bottom right of Figure 3.5. While this is all fine and good, it is important to realize that for the purposes of game AI, you are not going to compute and store the entire final signal, since there is no need to have the whole thing at once. Instead, given a particular time along the signal, in the range $[0, 1]$ along the x-axis, you’ll just compute that particular point as needed for your simulation. So if you want the point in the middle of the final signal, you would compute the individual signal in each octave at time 0.5, scale each octave value with their correct amplitude, and add them together to get a single value. You can then run your simulation at any rate by requesting the next point at 0.500001, 0.51, or 0.6, for example.
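Here is a simplified sketch of sampling such a signal at a single time t (my own approximation of the approach described above; the hash-based LatticeValue is a stand-in for the per-octave uniform random numbers):

```cpp
#include <cstdint>

// Deterministic stand-in for the per-octave uniform random lattice values.
static float LatticeValue(int octave, int index)
{
    uint32_t h = 73856093u * static_cast<uint32_t>(octave) ^ 19349663u * static_cast<uint32_t>(index);
    h ^= h >> 13; h *= 0x85ebca6bu; h ^= h >> 16;
    return (h & 0xFFFFFF) / static_cast<float>(0xFFFFFF);   // map to [0, 1]
}

// Sample the summed signal at time t in [0, 1]. Octave i has 2^(i-1)+1 lattice
// points, is interpolated with the 6t^5 - 15t^4 + 10t^3 S-curve, and is scaled
// by persistence^i.
float PerlinNoise1D(float t, int octaves, float persistence)
{
    float total = 0.0f;
    float amplitude = persistence;                 // persistence^1 for the first octave
    for (int i = 1; i <= octaves; ++i) {
        int segments = 1 << (i - 1);               // segments between lattice points
        float x = t * segments;
        int left = static_cast<int>(x);
        float frac = x - left;
        float s = frac * frac * frac * (frac * (frac * 6 - 15) + 10);   // S-curve
        float a = LatticeValue(i, left);
        float b = LatticeValue(i, left + 1);
        total += (a + (b - a) * s) * amplitude;
        amplitude *= persistence;
    }
    return total;
}
```

With four octaves and a persistence of 0.5, the amplitudes come out to 0.5, 0.25, 0.125, and 0.0625, matching the example above.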
Controlling Perlin Noise
As alluded to in the previous section, there are several controls that will allow you to customize the randomness of the noise. The following list is a summary.
Number of octaves: Lower octaves offer larger swings in the signal while higher octaves offer more fine-grained noise. This can be randomized within a population as well, so that some individuals have more octaves than others when generating a particular behavior trait.
Range of octaves: You can have any range, for example octaves 4 through 8. You do not have to start with octave 1. Again, the ranges can be randomized within a population.
Amplitude at each octave: The choice of amplitude at each octave can be used to control the final signal. The higher the amplitude, the more that octave will influence the final signal. Simply ensure that the sum of amplitudes across all octaves does not exceed 1.0 if you don’t want the final signal to exceed 1.0.
Choice of interpolation: The S-curve function is commonly used in Perlin noise, with original Perlin noise using $3t^2 - 2t^3$ [Perlin 85] and improved Perlin noise using $6t^5 - 15t^4 + 10t^3$ (smooth in the second derivative) [Perlin 02]. However, you might be able to get other interesting effects by choosing a different formula [Komppa 10].
#include <list>

class FSMTransition;   // forward declaration (defined below)

class FSMState
{
public:
    virtual void onEnter();
    virtual void onUpdate();
    virtual void onExit();
    std::list<FSMTransition> transitions;
};

class FSMTransition
{
public:
    virtual bool isValid();
    virtual FSMState* getNextState();
    virtual void onTransition();
};

class FiniteStateMachine
{
public:
    void update();
    std::list<FSMState> states;
    FSMState* initialState;
    FSMState* activeState;
};
The FiniteStateMachine class contains a list of all states in our FSM, as well as the initial state and the current active state. It also contains the central update() function, which is called each tick and is responsible for running our behavioral algorithm as follows:
Call isValid() on each transition in activeState.transitions until isValid() returns true or there are no more transitions.
If a valid transition is found, then:
Call activeState.onExit()
Set activeState to validTransition.getNextState()
Call activeState.onEnter()
If a valid transition is not found, then call activeState.onUpdate()

With this structure in place, it's a matter of setting up transitions and filling out the onEnter(), onUpdate(), onExit(), and onTransition() functions to produce the desired AI behavior.
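A minimal sketch of what update() might look like, following the steps above (my own filling-in using the classes from the earlier listing, not the book's code):

```cpp
void FiniteStateMachine::update()
{
    // Check each outgoing transition of the active state for the first valid one.
    FSMTransition* validTransition = nullptr;
    for (FSMTransition& transition : activeState->transitions) {
        if (transition.isValid()) {
            validTransition = &transition;
            break;
        }
    }

    if (validTransition) {
        // Leave the old state, perform the transition, and enter the new state.
        activeState->onExit();
        validTransition->onTransition();
        activeState = validTransition->getNextState();
        activeState->onEnter();
    } else {
        // No transition fired this tick; keep running the current state.
        activeState->onUpdate();
    }
}
```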
Hierarchical Finite-State Machines
Adding the second, third, or fourth state to an NPC’s FSM is usually structurally trivial, as all that’s needed is to hook up transitions to the few existing required states. However, if you’re nearing the end of development and your FSM is already complicated with 10, 20, or 30 existing states, then fitting your new state into the existing structure can be extremely difficult and error-prone.
You want to add a conversation state, but want to return to the correct patrol "direction" after the conversation? You'll need two distinct conversation states, one per direction.
The reason this works is that the HFSM structure adds additional hysteresis that isn’t present in an FSM. With a standard FSM, we can always assume that the state machine starts off in its initial state, but this is not the case with a nested state machine in an HFSM. Note the circled “H” in Figure 4.4, which points to the “history state.” The first time we enter the nested Watch Building state machine, the history state indicates the initial state, but from then on it indicates the most recent active state of that state machine.
Our example HFSM starts out in Watch Building (indicated by the solid circle and arrow as before), which chooses Patrol to Safe as the initial state. If our NPC reaches the safe and transitions into Patrol to Door, then the history state switches to Patrol to Door. If the NPC’s phone rings at this point, then our HFSM exits Patrol to Door and Watch Building, transitioning to the Conversation state. After Conversation ends, the HFSM will transition back to Watch Building which resumes in Patrol to Door (the history state), not Patrol to Safe (the initial state).
For a solid detailed implementation, check out Section 5.3.9 in the book Artificial Intelligence for Games by Ian Millington and John Funge [Millington and Funge 09].
The algorithm to execute a behavior tree is as follows:
Make root node the current node
While current node exists,
  Run current node's precondition
  If precondition returns true,
    Add current node to the execute list
    Make current node's first child the current node
  Else,
    Make current node's next sibling the current node
Run all behaviors on the execute list
Since trees are stateless, the algorithm doesn't need to remember what behaviors were previously running in order to determine what behaviors should execute on a given frame. Further, behaviors can (and should) be written to be completely unaware of each other, so adding or removing behaviors from a character's behavior tree does not affect the running of the rest of the tree. This alleviates the problem common with FSMs, where every state must know the transition criteria for every other state.
Extensibility is also an advantage with behavior trees. It is easy to start from the base algorithm as described and start adding extra functionality. Common additions are behavior on_start/on_finish functions that are run the first time a behavior begins and when it completes. Different behavior selectors can be implemented as well. For example, a parent behavior could specify that instead of choosing one of its children to run, each of its children should be run once in turn, or that one of its children should be chosen randomly to run. Indeed, a child behavior could be run based on a utility system-type selector (see below) if desired. Preconditions can be written to fire in response to events as well, giving the tree flexibility to respond to agent stimuli. Another popular extension is to specify individual behaviors as nonexclusive, meaning that if their precondition is run, the behavior tree should keep checking siblings at that level.
Since behaviors themselves are stateless, care must be taken when creating behaviors that appear to apply memory. For example, imagine a citizen running away from a battle. Once well away from the area, the “run away” behavior may stop executing, and the highest-priority behavior that takes over could take the citizen back into the combat area, making the citizen continually loop between two behaviors. While steps can be taken to prevent this sort of problem, traditional planners can tend to deal with the situation more easily.
Another caveat to using utility-based architecture is that all the subtlety and responsiveness that you gain often comes at a price. While the core architecture is often relatively simple to set up, and new behaviors can be added simply, they can be somewhat challenging to tune. Rarely does a behavior sit in isolation in a utility-based system. Instead, it is added to the pile of all the other potential behaviors with the idea that the associated mathematical models will encourage the appropriate behaviors to “bubble to the top.” The trick is to juggle all the models to encourage the most reasonable behaviors to shine when it is most appropriate. This is often more art than science. As with art, however, the results that are produced are often far more engaging than those generated by using simple science alone.
For more on utility-based systems, see the article in this book, An Introduction to Utility Theory [Graham 13] and the book Behavioral Mathematics for Game AI [Mark 09].
Goal-Oriented Action Planning (GOAP) is a technique pioneered by Monolith’s Jeff Orkin for the game F.E.A.R. in 2005, and has been used in a number of games since, most recently for titles such as Just Cause 2 and Deus Ex: Human Revolution. GOAP is derived from the Stanford Research Institute Problem Solver (STRIPS) approach to AI which was first developed in the early 1970s. In general terms, STRIPS (and GOAP) allows an AI system to create its own approaches to solving problems by being provided with a description of how the game world works—that is, a list of the actions that are possible, the requirements before each action can be used (called “preconditions”), and the effects of the action.
Backwards chaining search works in the following manner:
Add the goal to the outstanding facts list
For each outstanding fact
  Remove this outstanding fact
  Find the actions that have the fact as an effect
  If the precondition of the action is satisfied,
    Add the action to the plan
  Otherwise,
    Add the action's preconditions to the outstanding facts list
As opposed to backward planners like GOAP, which start with a desired world state and move backwards until they reach the current world state, HTN is a forward planner, meaning that it will start with the current world state and work towards a desired solution.
The following pseudocode shows how a plan is built.
Add the root compound task to our decomposing list
For each task in our decomposing list
  Remove task
  If task is compound
    Find a method whose conditions are satisfied by the current world state
    If a method is found, add its subtasks to the decomposing list
    If no method is found, roll back to the last decomposed compound task
  If task is primitive
    Apply the task's effects to the working world state
    Add the task to the final plan
HTN planners start with a very high-level root task and continuously decompose it into smaller and smaller tasks.
Because of increasing complexity as you add more configuration, break the decision making up hierarchically. That is, have a high-level reasoner that makes the big, overarching decisions, and then one or more lower-level reasoners that handle implementation of the higher-level reasoner's decisions.
The advantage here comes from the fact that the cost of AI configuration scales worse than linearly with the number of options in a particular reasoner. To give a sense of the relevance, imagine that the cost of configuring the AI is $O(n^2)$ on the number of options (as it is for FSMs). If we have 25 options, then the cost of configuring the AI is on the order of $25^2 = 625$. On the other hand, if we have five reasoners, each with five options, then the cost of configuring the AI is only $5 \times 5^2 = 125$.
One handy trick is to use option stacks to handle your hit reaction. If a character is hit by an enemy attack (e.g., a bullet), we typically want them to play a visible reaction. We also want the character to stop whatever it was doing while it reacts. For instance, if an enemy is firing their weapon when we hit them, they should not fire any shots while the hit reaction plays. It just looks wrong if they do. Thus, we push an Is Hit option onto the option stack, which suspends all previously running options while the reaction plays, and then pop it back off when the reaction is done.
In the academic AI community, blackboard architectures typically refer to a specific approach in which multiple reasoners propose potential solutions (or partial solutions) to a problem, and then share that information on a blackboard [Wikipedia 12, Isla et al. 02]. Within the game community, however, the term is often used simply to refer to a shared memory space which various AI components can use to store knowledge that may be of use to more than one of them, or may be needed multiple times.
Line-of-sight and path are examples that could be placed on a blackboard.
One trick is to put the intelligence in the world, rather than in the character. This technique was popularized by The Sims, though earlier examples exist. In The Sims (and its sequels), objects in the world not only advertise the benefits that they offer (for example, a TV might advertise that it’s entertaining, or a bed might advertise that you can rest there), they also contain information about how to go about performing the associated actions [Forbus et al. 01].
Another advantage of this approach is that it greatly decreases the cost of expansion packs. In the Zoo Tycoon 2 franchise, for example, every other expansion pack was “ content only.” Because much of the intelligence was built into the objects, we could create new objects that would be used by existing animals, and even entirely new animals, without having to make any changes to the source code.
process_behavior_node(node)
if (node.precondition returns true) {
node.action()
if (node.child exists)
process_behavior_node(node.child)
} else {
if (node.sibling exists)
process_behavior_node(node.sibling)
}
While the flexibility of planners is a great strength, it can also be a great weakness. Often, designers will want to have more control over the sequences of actions that an AI can perform. While it is cool that your AI can create a jump kick plan on its own, it could also create a sequence of 27 consecutive jumps. This breaks the illusion of intelligence our AI should produce
This is the classic tradeoff that AI designers and programmers have to deal with constantly: the choice between the fully designed (brittle) AI that behavior trees provide and the fully autonomous (unpredictable) AI that planners provide.
Rather than add these special cases to each Action, we added a special Action called a FakeSim. The FakeSim Action is a special type of Composite Action called a decorator, which wraps another Action to add extra functionality to it. The FakeSim was responsible for adding incorrect information to the wrapped Action’s simulation step by modifying the world state directly. For example, there are some enemies that have a shield which makes them invulnerable to lightsaber attacks. If we want a Jedi to attack the enemy to demonstrate that the enemy is invulnerable while the shield is up, we can wrap the SwingSaber Action with a FakeSim Decorator which lowers the victim’s shield during the simulation step. Then, the SwingSaber simulation will think that the Jedi can damage the enemy and give it a good simulation result. This would allow SwingSaber to be chosen, even though it won’t actually be beneficial.
It’s important to note that utility is not the same as value. Value is a measurable quantity (such as the prices above). Utility measures how much we desire something. This can change based on personality or the context of the situation.
Using normalized scores (values that go from 0 to 1) provides a reasonable starting point.
It’s important to note that any value range will work, as long as there is consistency across the different variables. If an AI agent scores an action with a value of 15, you should know immediately what that means in the context of the whole system. For instance, does that 15 mean 15 out of 25 or 15%?
The key to decision making using utility-based AI is to calculate a utility score (sometimes called a weight) for every action the AI agent can take and then choose the action with the highest score.
$$EU = \sum_{i=1}^n D_i P_i$$
$D$ is the desire for that outcome (i.e., the utility), and $P$ is the probability that the outcome will occur. This probability is normalized so that the sum of all the probabilities is 1. This is applied to every possible action that can be chosen, and the action with the highest expected utility is chosen. This is called the principle of maximum expected utility [Russell et al. 09].
For example, an enemy AI in an RPG attacking the player has two possible outcomes— either the AI hits the player or it misses. If the AI has an 85% chance to hit the player, and successfully hitting the player has a calculated utility score of 0.6, the adjusted utility would be 0.85 × 0.6 = 0.51. (Note that, in this case, missing the player has a utility of zero, so there’s no need to factor it in.) Taking this further, if this attack were to be compared to attacking with a different weapon, for example, with a 60% chance of hitting but a utility score of 0.9 if successful, the adjusted utility would be 0.60 × 0.9 = 0.54. Despite having a lesser chance of hitting, the second option provides a greater overall expected utility.
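A small sketch of maximum expected utility selection (my own illustration; Outcome, Action, and ChooseAction are assumed names):

```cpp
#include <vector>

struct Outcome { float desire; float probability; };   // D_i and P_i from the equation
struct Action  { std::vector<Outcome> outcomes; };

// Principle of maximum expected utility: score each action as the sum of
// desire * probability over its outcomes and return the best-scoring index.
int ChooseAction(const std::vector<Action>& actions)
{
    int best = -1;
    float bestEU = -1.0f;
    for (size_t i = 0; i < actions.size(); ++i) {
        float eu = 0.0f;
        for (const Outcome& o : actions[i].outcomes)
            eu += o.desire * o.probability;
        if (eu > bestEU) { bestEU = eu; best = static_cast<int>(i); }
    }
    return best;
}
```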
Let’s say we have an ant simulation game where the AI must determine whether to expand the colony or whether to breed. There are three different factors we want to consider for these decisions. The first is the overall crowdedness of the colony. If there are too many ants, we need to expand to make room for more. The second is the health of the colony, which we’ll say is based on how full the food stores are. Ant eggs need to be kept at a specific temperature; so there are specially built nurseries that house the eggs where they are taken care of. The amount of room in these nurseries is the third decision factor. These decision factors are based on game statistics that determine the score for each factor. The population and max population determine how many ants are in the colony and how many can exist based on the current colony size. The food stat represents how full the food stores are and is measured as a number from 0 to 100. The nursery space stat is also measured from 0 to 100 and represents how much space there is in the nursery. You can think of these last two stats as percentages.
In the ant example above, we chose to represent health as a linear ratio by dividing the current amount of food by the maximum amount of food. This probably isn’t a very realistic calculation, since the colony shouldn’t care about food when the stores are mostly full. Some kind of quadratic curve is closer to what we want.
The key to utility theory is to understand the relationship between the input and the output, and being able to describe that resulting curve [Mark 09]. This can be thought of as a conversion process, where you are converting one or more values from the game to utility. Coming up with the proper function is really more art than science and is usually where you’ll spend most of your time. There are a huge number of different formulas you could use to generate reasonable utility curves, but a few of them crop up often enough that they warrant some discussion.
A linear curve forms a straight line with a constant slope. The utility value is simply a multiplier of the input. Equation 9.2 shows the formula for calculating a normalized utility score for a given value and Figure 9.2 shows the resulting curve.
$$U =\frac{x}{m}$$
In Equation 9.2, $x$ is the input value and $m$ is the maximum value for that input. This is really just a normalization function, which is all we need for a linear output.
A quadratic function is one that forms a parabolic curve, causing it to start slow and then curve upwards very quickly. The simplest way to achieve this is to add an exponent to Equation 9.2. Equation 9.3 shows an example of this.
$$U = \left(\frac{x}{m}\right)^k$$
The logistic function is another common formula for creating utility curves. It’s one of several sigmoid functions that place the largest rate of change in the center of the input range, trailing off at both ends as they approach 0 and 1. The input range for the logistic function can be just about anything, but it is effectively limited to $[-10 \dots 10]$. There really isn’t much point generating a curve larger than that, and the range is often clamped down even further. For example, when $x$ is 6, $U$ is already 0.9975.
Equation 9.4 shows the formula for the logistic function and Figure 9.5 shows the resulting curve. Note the use of the constant $e$. This is Euler’s number—the base of the natural logarithm—which is approximately 2.718281828. This value can be adjusted to affect the shape of the curve. As the number goes up, the curve will sharpen and begin to resemble a square wave. As the number goes down, it will soften.
$$U=\frac{1}{1+e^{-x}}$$
A piecewise linear curve is just a custom-built curve. The idea is that you hand-tune a bunch of 2D points that represent the thresholds you want.
You might want hunger to NEVER be selectable when it is below a certain threshold.
There are many other types of custom curves. For example, the curve in Figure 9.6 could be changed so that the values from 15 to 60 are calculated with a quadratic curve, while the rest are linear. There’s no limit to the number of combinations you can have.
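For reference, here is a sketch of the curve types above as plain functions (my own; all names are assumptions, not from the book):

```cpp
#include <cmath>
#include <utility>
#include <vector>

// Normalized utility curves discussed above; x is the raw input, m its maximum.
float LinearUtility(float x, float m)             { return x / m; }
float QuadraticUtility(float x, float m, float k) { return std::pow(x / m, k); }
float LogisticUtility(float x)                    { return 1.0f / (1.0f + std::exp(-x)); } // x roughly in [-10, 10]

// Piecewise linear curve: hand-tuned (input, utility) points in ascending input
// order, linearly interpolated between neighbors and clamped at the ends.
float PiecewiseUtility(const std::vector<std::pair<float, float>>& points, float x)
{
    if (x <= points.front().first) return points.front().second;
    for (size_t i = 1; i < points.size(); ++i) {
        if (x <= points[i].first) {
            float t = (x - points[i - 1].first) / (points[i].first - points[i - 1].first);
            return points[i - 1].second + t * (points[i].second - points[i - 1].second);
        }
    }
    return points.back().second;
}
```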
Once the utility has been calculated for each action, the next step is to choose one of those actions. There are a number of ways you can do this. The simplest is to just choose the highest scoring option. For some games, this may be exactly what you want. A chess AI should definitely choose the highest scoring move. A strategy game might do the same. For some games (like The Sims), choosing the absolute best action can feel very robotic, due to the likelihood that the same action will always be selected in that situation. Another solution is to use the utility scores as weights and randomly choose one of the actions based on those weights. This can be accomplished by dividing each score by the sum of all scores to get the percentage chance that the action will be chosen. Then you generate a random number and select the action that number corresponds to. This tends to have the opposite problem, however. Your AI agents will behave reasonably well most of the time, but every now and then, they’ll choose something utterly stupid.
You can get the best of both worlds by taking a subset of the highest scoring actions and choosing one of those with a weighted random. This can either be a tuned value, such as choosing from among the top five scoring actions, or it can be percentile based where you take the highest score and also consider things that scored within, say, 10% of it.
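A sketch of that "weighted random among the top scorers" idea (mine, not the book's code; the tolerance parameter is just an example knob):

```cpp
#include <algorithm>
#include <random>
#include <vector>

// Weighted random pick among the best-scoring actions: keep anything within
// 'tolerance' (e.g., 0.1 for 10%) of the top score, then choose proportionally.
// Assumes 'scores' is non-empty.
int ChooseWeighted(const std::vector<float>& scores, float tolerance, std::mt19937& rng)
{
    float best = *std::max_element(scores.begin(), scores.end());
    float cutoff = best * (1.0f - tolerance);

    std::vector<int> candidates;
    std::vector<float> weights;
    for (size_t i = 0; i < scores.size(); ++i) {
        if (scores[i] >= cutoff) {
            candidates.push_back(static_cast<int>(i));
            weights.push_back(scores[i]);
        }
    }

    std::discrete_distribution<int> pick(weights.begin(), weights.end());
    return candidates[pick(rng)];   // index of the chosen action
}
```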
There could also be times when some set of actions is just completely inappropriate. You may not even want to score them. For example, say you’re making an FPS and have a guard AI. You might have some set of actions for him to consider, like getting some coffee, chatting with his fellow guard, checking for lint, etc. If the player shoots at him, he shouldn’t even consider any of those actions.
The most straightforward way to solve this is with bucketing, also known as dual utility AI [Dill 11]. All actions are categorized into buckets and each bucket is given a weight. The higher priority buckets are always processed first.
In Figure 9.7, you can see that there are two buckets, one for Hunger and one for Fun. Hunger has scored 0.8 while Fun has scored 0.4. The Sim will walk through all possible actions in the Hunger bucket and, assuming any of those actions are valid, will choose one. The Sim will not consider anything in the Fun bucket
The buckets themselves are scored based on a response curve created by designers.
One issue that’s worth bringing up in any AI system is the concept of inertia. If your AI agent is attempting to decide something every frame, it’s possible to run into oscillation issues, especially if you have two things that are scored similarly. For example, say you have an FPS where the AI realizes it’s in a bad spot. The enemy soldier starts scoring both “attack the player” and “run away” at 0.5. If the AI was making a new decision every frame, it could start appearing very frantic. The AI might shoot the player a couple times, start to run away, then shoot again, then repeat. Oscillations in behavior such as this look very bad.
One solution is to add a weight to any action that you are already currently engaged in. This will cause the AI to tend to remain committed until something truly better comes along. Another solution is to use cooldowns. Once an AI agent makes a decision, they enter a cooldown stage where the weighting for remaining in that action is extremely high. This weight can revert at the end of the cooldown period, or it can gradually drop as well. Another solution is to stall making another decision—either for a period of time or until such time as the current action is finished. This really depends on the type of game you’re making and how your decision/action process works, however. On The Sims Medieval, a Sim would only attempt to make a decision when their interaction queue was empty. Once they chose an action, they would commit to performing that action. Once the Sim completed (or failed to complete) their action, they would choose a new action.
$$U = \max\left( \min\left( \left( 1-\frac{hp-minDmg}{maxDmg-minDmg} \right) \times (1-a) + a,\ 1 \right),\ 0 \right)$$
Figure 9.8 shows the resulting curve from Equation 9.5 where a is set to 0.6.
The second decision factor is the threat. This is a curve that measures what percentage of the actor’s current hit points will be taken away if the player hits for maximum damage. It has a shape similar to a quadratic curve and is generated with Equation 9.6.
$$U = \min\left( \frac{maxDmg}{hp},\ 1 \right)$$
Figure 9.9 shows the resulting curve for Threat.
The third decision factor is the actor’s desire for health. This uses a variation of the logistic function in Equation 9.4. As the actor’s hit points are reduced, its desire to heal will rise. Equation 9.7 shows the formula for this decision factor.
$$U = 1 - \frac{1}{1+(e\times0.68)^{-\left(\frac{hp}{maxHp}\times12\right)+6}}$$
$$U=1-\left(\frac{hp}{maxHp}\right)^{\frac{1}{(p+1)^4} \times 0.25}$$
References
[Dill 11] K. Dill. “A game AI approach to autonomous control of virtual characters.” Interservice/ Industry Training, Simulation, and Education Conference, 2011, pp. 4–5. Available online (http://www.iitsec.org/about/PublicationsProceedings/Documents/11136_Paper.pdf).
[Mark 09] D. Mark. Behavioral Mathematics for Game AI. Reading, MA: Charles River Media, 2009, pp 229–240.
[Russell et al. 09] S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Reading, MA: Prentice Hall, 2009, pp. 480–509.
In a standard behavior tree, priority is static. It is baked right into the tree. The simplicity is welcome, but in practice it can be frustratingly limiting. The same behavior may require different relative priorities, depending on the context. Ensuring our Monster Hunter’s primary weapon has a full clip should always be a consideration, even if we’re casually patrolling the jungle. But if we’re engaged with a savage monster, it’s absolutely necessary that we continue to deal damage. Behavior tree authors often deal with this conundrum by duplicating sections of the tree at different branches, with different conditions and/or priorities. Even with slick sub-tree instancing or referencing, this still becomes inefficient, verbose, and potentially fragile.
Decisions are rarely binary, and many behaviors simply do not have priorities we can comfortably establish offline. Let’s start with a simple example behavior tree (Figure 10.1). Having no ability to shoot is a precondition for the Seek Medic behavior, forcing us to duplicate the behavior, as seen in Figure 10.2. We could start by giving Seek Medic stricter conditions and prioritizing it over Shoot, but this will likely create the opposite problem where the Monster Hunter immediately takes the Seek Medic action the instant conditions pass. This is the sort of fundamental problem we want to address with the integration of utility.
$$RawUtility = HealthGained – HealthLost$$
$$Value = HealthGained – (HealthLost × 2.0)$$
$$Value = HealthGained – (pow(HealthLost,1.2))$$
$$Utility = \frac{Heal\times HealPower - Delay \times DelayPower}{HealPower + DelayPower}$$
The tree already features a component for selecting which branches are taken during execution, namely the selector. To introduce utility-based selection, we’ll simply create a new specialized type of selector that considers not just the binary validity of its children, but their relative utility as well. We’ll cleverly dub the new node type the utility selector.
we can address our problem with Seek Medic by switching Combat to a utility selector, as we’ve done in Figure 10.3.
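As a sketch of what such a utility selector node might look like (my own; BehaviorNode, isValid(), utility(), and run() are assumed names, not the chapter's actual interface):

```cpp
#include <algorithm>
#include <vector>

// Assumed node interface; a real tree would share this with regular selectors.
struct BehaviorNode {
    virtual ~BehaviorNode() = default;
    virtual bool  isValid() = 0;
    virtual float utility() = 0;
    virtual void  run()     = 0;
};

// Utility selector: instead of running the first valid child, run the valid
// child with the highest utility.
struct UtilitySelector : BehaviorNode {
    std::vector<BehaviorNode*> children;

    bool isValid() override {
        for (BehaviorNode* c : children)
            if (c->isValid()) return true;
        return false;
    }
    float utility() override {
        float best = 0.0f;
        for (BehaviorNode* c : children)
            if (c->isValid()) best = std::max(best, c->utility());
        return best;   // report the best child's utility upward
    }
    void run() override {
        BehaviorNode* best = nullptr;
        for (BehaviorNode* c : children)
            if (c->isValid() && (best == nullptr || c->utility() > best->utility()))
                best = c;
        if (best) best->run();
    }
};
```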
Compound Task [AttackEnemy]
Method 0 [WsHasTreeTrunk == true]
Subtasks [NavigateTo(EnemyLoc), DoTrunkSlam()]
Method 1 [WsHasTreeTrunk == false]
Subtasks [LiftBoulderFromGround(), ThrowBoulderAt(EnemyLoc)]
By understanding how compound tasks work, it’s easy to imagine how we could have a large hierarchy that may start with a BeTrunkThumper compound task that is broken down into sets of smaller tasks—each of which are then broken into smaller tasks, and so on. This is how HTN forms a hierarchy that describes how our troll NPC is going to behave. It’s important to understand that compound tasks are really just containers for a set of methods that represent different ways to accomplish some high level task. There is no compound task code running during plan execution.
A domain is the term used to describe the entire task hierarchy.
We start with a compound task called BeTrunkThumper. This root task encapsulates the “main idea” of what it means to be a Trunk Thumper.
Compound Task [BeTrunkThumper]
Method [WsCanSeeEnemy == true]
Subtasks [NavigateToEnemy(), DoTrunkSlam()]
Method [true]
Subtasks [ChooseBridgeToCheck(), NavigateToBridge(), CheckBridge()]
Primitive Task [DoTrunkSlam]
Operator [AnimatedAttackOperator(TrunkSlamAnimName)]
Primitive Task [NavigateToEnemy]
Operator [NavigateToOperator(EnemyLocRef)]
Effects [WsLocation = EnemyLocRef]
Primitive Task [ChooseBridgeToCheck]
Operator [ChooseBridgeToCheckOperator]
Primitive Task [NavigateToBridge]
Operator [NavigateToOperator(NextBridgeLocRef)]
Effects [WsLocation = NextBridgeLocRef]
Primitive Task [CheckBridge]
Operator [CheckBridgeOperator(SearchAnimName)]
With a domain made up of compound and primitive tasks, we are starting to form an image of how these are put together to represent an NPC. Combine that with the world state and we can talk about the work horse of our HTN, the planner.
There are three conditions that will force the planner to find a new plan: the NPC finishes or fails the current plan, the NPC does not have a plan, or the NPC’s world state changes via a sensor.
To do this, the planner starts with a root compound task that represents the problem domain we are trying to plan for. Using our earlier example, this root task would be the BeTrunkThumper task. This root task is pushed onto the TasksToProcess stack. Next, the planner creates a copy of the world state. The planner will be modifying this working world state to “simulate” what will happen as tasks are executed. After these initialization steps are taken, the planner begins to iterate on the tasks to process. On each iteration, the planner pops the next task off the TasksToProcess stack. If it is a compound task, the planner tries to decompose it, first by searching through its methods looking for the first set of conditions that are valid. If a method is found, that method’s subtasks are added onto the TasksToProcess stack. If a valid method is not found, the planner’s state is rolled back to the last compound task that was decomposed.
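A compact sketch of that decomposition (mine; it uses recursion instead of the explicit TasksToProcess stack described above, and all type names are assumptions):

```cpp
#include <functional>
#include <vector>

struct WorldState { /* game facts such as WsCanSeeEnemy, WsTrunkHealth, ... */ };

struct Task;
struct Method {
    std::function<bool(const WorldState&)> conditions;    // e.g., WsCanSeeEnemy == true
    std::vector<Task*> subtasks;
};

struct Task {
    bool isCompound = false;
    std::vector<Method> methods;                           // compound tasks only
    std::function<bool(const WorldState&)> preconditions;  // primitive tasks only (optional)
    std::function<void(WorldState&)> effects;              // primitive tasks only (optional)
};

// Decompose 'task' against the working world state, appending primitive tasks
// to 'plan'. On failure inside a method, roll the working state and the plan
// back and try the task's next method.
bool Decompose(Task* task, WorldState& working, std::vector<Task*>& plan)
{
    if (!task->isCompound) {
        if (task->preconditions && !task->preconditions(working)) return false;
        if (task->effects) task->effects(working);         // simulate the effects
        plan.push_back(task);
        return true;
    }
    for (Method& method : task->methods) {
        if (method.conditions && !method.conditions(working)) continue;
        WorldState savedState = working;                   // snapshot for rollback
        std::vector<Task*> savedPlan = plan;
        bool ok = true;
        for (Task* sub : method.subtasks)
            if (!Decompose(sub, working, plan)) { ok = false; break; }
        if (ok) return true;
        working = savedState;                              // roll back, try next method
        plan = savedPlan;
    }
    return false;
}
```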
Compound Task [BeTrunkThumper]
Method [ WsCanSeeEnemy == true]
Subtasks [AttackEnemy()]// using the new compound task
Method [true]
Subtasks [ChooseBridgeToCheck(), NavigateToBridge(), CheckBridge()]
Compound Task [AttackEnemy]//new compound task
Method [WsTrunkHealth > 0]
Subtasks [NavigateToEnemy(), DoTrunkSlam()]
Method [true]
Subtasks [FindTrunk(), NavigateToTrunk(), UprootTrunk(), AttackEnemy()]
Primitive Task [DoTrunkSlam]
Operator [DoTrunkSlamOperator]
Effects [WsTrunkHealth += -1]
Primitive Task [UprootTrunk]
Operator [UprootTrunkOperator]
Effects [WsTrunkHealth = 3]
Primitive Task [NavigateToTrunk]
Operator [NavigateToOperator(FoundTrunk)]
Effects [WsLocation = FoundTrunk]
Compound Task [BeTrunkThumper]
Method [ WsCanSeeEnemy == true]
Subtasks [AttackEnemy()]
Method [ WsHasSeenEnemyRecently == true]//New method
Subtasks [NavToLastEnemyLoc(), RegainLOSRoar()]
Method [true]
Subtasks [ChooseBridgeToCheck(), NavigateToBridge(), CheckBridge()]
Primitive Task [NavToLastEnemyLoc]
Operator [NavigateToOperator(LastEnemyLocation)]
Effects [WsLocation = LastEnemyLocation]
Primitive Task [RegainLOSRoar]
Preconditions[WsCanSeeEnemy == true]
Operator [RegainLOSRoar()]
Primitive Task [NavToLastEnemyLoc]
Operator [NavigateToOperator(LastEnemyLocation)]
Effects [WsLocation = LastEnemyLocation]
ExpectedEffects [WsCanSeeEnemy = true]
Compound Task [AttackEnemy]
Method [WsTrunkHealth > 0, AttackedRecently == false, CanNavigateToEnemy == true]
Subtasks [NavigateToEnemy(), DoTrunkSlam(), RecoveryRoar()]
Method [WsTrunkHealth == 0]
Subtasks [FindTrunk(), NavigateToTrunk(), UprootTrunk(), AttackEnemy()]
Method [true]
Subtasks [PickupBoulder(), ThrowBoulder()]
Primitive Task [DoTrunkSlam]
Operator [DoTrunkSlamOperator]
Effects [WsTrunkHealth += -1, AttackedRecently = true]
Primitive Task [RecoveryRoar]
Operator [PlayAnimation(TrunkSlamRecoverAnim)]
Primitive Task [PickupBoulder]
Operator [PickupBoulder()]
Primitive Task [ThrowBoulder]
Operator [ThrowBoulder()]
Partial planning is one of the most powerful features of HTN. In simplest terms, it allows the planner the ability to not fully decompose a complete plan. HTN is able to do this because it uses forward decomposition or forward search to find plans. That is, the planner starts with the current world state and plans forward in time from that. This allows the planner to only plan ahead a few steps.
GOAP and STRIPS planner variants, on the other hand, use a backward search [Orkin 04]. This means the search makes its way from a desired goal state toward the current world state. Searching this way means the planner has to complete the entire search in order to know what first step to take.
BIR = Break in Reality
An unrealistic state (US) BIR is the most immediate and obvious type of BIR, where a character’s immediately observable simulation is wrong. A character eating from an empty plate, or running in place against a wall
A fundamental discontinuity (FD) BIR is a little more subtle, but not by much: it occurs when a character’s current state is incompatible with the player’s memory of his past state. A character disappearing while momentarily around a corner, or having been frozen in place for hours while the player was away
An unrealistic long-term behavior (ULTB) BIR is the subtlest: It occurs only when an extended period of observation reveals problems with a character’s behavior. A character wandering randomly instead of having goal-driven behaviors
a tool which will be used in a lot of them: the exponential moving average (EMA). The EMA is a method for smoothing and averaging an ongoing sequence of measurements. Given an input function $F(t)$ we produce the output function $G(t)$. We initialize $G(0) = F(0)$, and then at each time $t$ we update $G(t)$ as $G(t) = (1 - \alpha)F(t)+ \alpha G(t- \Delta t)$, where $\Delta t$ is the timestep since the last measurement. The $\alpha$ in that equation is calculated as $\alpha = e^{-k \Delta t}$, where $k$ is the convergence rate (higher values lead to faster changes in the average). You can tune $k$ to change the smoothness of the EMA, and how closely it tracks the input function. We’re going to use the EMA a lot in these models, so it’s a good idea to familiarize yourself with it (Figure 14.2).
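A tiny sketch of the EMA as described (my own wrapper; the names are assumptions):

```cpp
#include <cmath>

// Exponential moving average: alpha = e^(-k * dt), G = (1 - alpha) * F + alpha * G_prev.
// Higher k makes the average track the input more quickly.
struct EMA {
    float k = 1.0f;           // convergence rate
    float value = 0.0f;       // G(t)
    bool  initialized = false;

    void update(float sample, float dt)
    {
        if (!initialized) { value = sample; initialized = true; return; }
        float alpha = std::exp(-k * dt);
        value = (1.0f - alpha) * sample + alpha * value;
    }
};
```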
The most successful approaches to scripting will generally fall into one of two camps. First is the “scripts as master” perspective, wherein scripts control the high-level aspects of agent decision making and planning. The other method sees “scripts as servant,” where some other architecture controls the overall activity of agents, but selectively deploys scripts to attain specific design goals or create certain dramatic effects.
In general, master-style systems work best in one of two scenarios. In the optimal case, a library of ready-to-use tools already exists, and scripting can become the “glue” that combines these techniques into a coherent and powerful overarching model for agent behavior.
servant scripts are most effective when design requires a high degree of specificity in agent behavior. This is the typical sense in which interactions are thought of as “scripted”; a set of possible scenarios is envisioned by the designers, and special-case logic for reacting to each scenario is put in place by the AI implementation team. Servant scripts need not be entirely reactive, however; simple scripted loops and behavioral patterns can make for excellent ambient or “background” AI.
Last, but certainly not least, integrated architectures provide an illustrative method for writing almost any large-scale code. The layered approach has been heavily encouraged for decades, with notable proponents including Fred Brooks and the SICP course from MIT. Learning to structure code in this way can be a powerful force multiplier for creating clean, well separated modules for the rest of the project, even well outside the scope of AI systems.
1986 MIT course: https://www.youtube.com/playlist?list=PL8FE88AA54363BC46
When using scripting, this often simply boils down to keeping a trace log of the steps that have been performed by the agent, and, where applicable, what branches have been selected and how often loops have been repeated. Being able to select an agent and view a debug listing of its complete script state is also an invaluable tool.
Scripts should be seen as a sort of glue that attaches various decision-making, planning, and knowledge representation systems into a cohesive and powerful whole.
A* and pathfinding
Precompute Every Single Path (Roy–Floyd–Warshall)
While at first glance it seems ridiculous, it is possible to precompute every single path in a search space and store it in a look-up table. The memory implications are severe, but there are ways to temper the memory requirements and make it work for games. The algorithm is known in English-speaking circles as the Floyd–Warshall algorithm, while in Europe it is better known as Roy–Floyd.
Roy–Floyd–Warshall is the absolute fastest way to generate a path at runtime. It should routinely be an order of magnitude faster than the best A* implementation.
The look-up table is calculated offline before the game ships.
The look-up table requires $O(n^2)$ entries, where n is the number of nodes. For example, for a 100 by 100 grid search space, there are 10,000 nodes. Therefore, the memory required for the look-up table would be 100,000,000 entries (with 2 bytes per entry, this would be ~200 MB).
Path generation is as simple as looking up the answer. The time complexity is $O(p)$, where p is the number of nodes in the final path.
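For illustration (my sketch, assuming a precomputed "next node" table and a reachable goal), extracting a path really is just repeated lookups:

```cpp
#include <vector>

// next[from][to] holds the node to step to on the optimal path from 'from' to
// 'to'. Assumes the goal is reachable from the start.
std::vector<int> ExtractPath(const std::vector<std::vector<int>>& next, int start, int goal)
{
    std::vector<int> path;
    path.push_back(start);
    int current = start;
    while (current != goal) {
        current = next[current][goal];   // one table lookup per path node
        path.push_back(current);
    }
    return path;                         // O(p) total work
}
```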
Another approach to reducing the memory requirement is to compress the Roy–Floyd–Warshall data. Published work [Botea 11] has shown the effectiveness of compressing the data, and this approach fared very well in the 2012 Grid-Based Path Planning competition (http://www.movingai.com/GPPC/), when sufficient memory was available.
An alternate way to compress the Roy–Floyd–Warshall data is to take advantage of the structure of the environment. In many maps, but not all maps, there are relatively few optimal paths of significant length through the state space, and most of these paths overlap. Thus, it is possible to find a sparse number of “transit nodes” through which optimal paths cross [Bast et al. 07]. If, for every state in the state space, we store the path to all transit nodes for that state, as well as the optimal paths between all transit nodes, we can easily reconstruct the shortest path information between any two states, using much less space than when storing the shortest path between all pairs of states. This is one of several methods which have been shown to be highly effective on highway road maps [Abraham et al. 10].
The full Roy–Floyd–Warshall data results in very fast pathfinding queries, at the cost of memory overhead. In many cases you might want to use less memory and more CPU, which suggests building strong, but not perfect heuristics.
Imagine if we store just a few rows/columns of the Roy–Floyd–Warshall data. This corresponds to keeping the shortest paths from a few select nodes. Fortunately, improved distance estimates between all nodes can be inferred from this data. If $d(x, y)$ is the distance between node x and y, and we know $d(p, z)$ for all $z$, then the estimated distance between $x$ and $y$ is $h(x, y) = |d(p, x) – d(p, y)|$, where $p$ is a pivot node that corresponds to a single row/column in the Roy–Floyd–Warshall data. With multiple pivot nodes, we can perform multiple heuristic lookups and take the maximum. The improved estimates will reduce the cost of A* search. This approach has been developed in many contexts and been given many different names [Ng and Zhang 01, Goldberg and Harrelson 05, Goldenberg et al. 11, Rayner et al. 11]. We prefer the name Euclidean embedding, which we will justify shortly. First, we summarize the facts about this approach:
Euclidean embeddings can be far more accurate than the default heuristics for a map, and in some maps are nearly as fast as Roy-Floyd-Warshall.
The look-up table can be calculated before the game ships or at runtime, depending on the size and dynamic nature of the maps.
The heuristic requires $O(kn)$ entries, where $n$ is the number of nodes and $k$ is the number of pivots.
Euclidean embeddings provide a heuristic for guiding A* search. Given multiple heuristics, A* should usually take the maximum of all available heuristics.
Why do we call this a Euclidean embedding? Consider a map that is wrapped into a spiral, such as in Figure 17.2. Points A and B are quite close in the coordinates of the map, but quite far when considering the minimal travel distance between A and B. If we could just unroll the map into a straight line, the distance estimates would be more accurate. Thus, the central problem is that the coordinates used for aesthetic and gameplay purposes are not the best for A* search purposes. That is, they do not provide accurate heuristic estimates. If we could provide a different set of coordinates optimized for A* search, we could use these coordinates to estimate distances between nodes and have a higher quality heuristic. This process of transforming a map into a new state space where distance estimates are (hopefully) more accurate is called an embedding. A single-source shortest-path search from a pivot node is equivalent to performing a one-dimensional embedding, as each node gets a single coordinate, and the heuristic in this embedding is the distance between the embedded points. Other types of embeddings are possible, just not yet well understood.
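A sketch of the pivot-based heuristic lookup (mine; pivotDist[p][n] is assumed to hold the precomputed shortest-path distance from pivot p to node n):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Differential ("Euclidean embedding") heuristic: the estimate is the maximum
// over pivots of |d(p, x) - d(p, y)|, combined with whatever default heuristic
// the map already provides (A* should take the max of available heuristics).
float PivotHeuristic(const std::vector<std::vector<float>>& pivotDist,
                     int x, int y, float defaultHeuristic)
{
    float h = defaultHeuristic;
    for (const std::vector<float>& d : pivotDist)   // one row per pivot
        h = std::max(h, std::fabs(d[x] - d[y]));
    return h;
}
```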
In order for A* to guarantee an optimal path, the heuristic must be admissible, meaning that the heuristic guess of the cost from the current node to the goal node must never overestimate the true cost. However, by using an overestimating heuristic, you can get a tremendous speed-up at the possible expense of a slightly nonoptimal path. While this sounds like a terrible trade-off initially, it turns out that a small amount of overestimating has large benefits with very little noticeable nonoptimality. In the world of search algorithms this once might have been seen as heresy, but in the video game industry it’s a shrewd and worthwhile optimization.
In order to understand how to overestimate the heuristic, let’s first look at Equation 17.1, which is the classic A* cost formula. As you can see, the final cost, $f(x)$, is the sum of the given cost, $g(x)$, and the heuristic cost, $h(x)$. Each node added to the Open List gets this final cost assigned to it and this is how the Open List is sorted.
$$ f(x) = g(x) + h(x)$$
$$ f(x) = g(x) + (h(x) \times weight)$$
Bad Idea #1: Simultaneous Searches
Supporting many simultaneous searches is fraught with disaster. The primary problem is that you’ll need to support separate Open Lists for each request. The memory implications are severe, and the subsequent thrashing in the cache can be devastating.
But what is to be done about a single search that holds up all other searches? On one hand, this might be a false concern because your pathfinding engine should be blindingly fast for all searches. If it isn’t, then that’s an indication that you chose the wrong search space representation or should be using hierarchical pathfinding.
However, if we concede that a single search might take a very long time to calculate, then one solution is to learn from supermarkets. The way supermarkets deal with this problem is to create two types of check-out lanes. One is for customers with very few items (10 items or less) and one for the customers with their cart overflowing with groceries. We can do a similar thing with pathfinding by allowing up to two searches at a time. One queue is for requests deemed to be relatively fast (based on distance between the start and goal) and one queue for requests deemed to take a long time (again based on distance between the start and goal).
The pros (of grids) are:
Grids are one of the simplest possible representations and are easy to implement. A working implementation can be completed in a few hours.
A grid representation can be easily edited externally with a text editor. This can save significant tool-building efforts [Van Dongen 10].
Terrain costs in grids are easy to dynamically update. For example, player-detected traps in Dragon Age: Origins are easily marked with a few bits in the relevant grid cells. It is easy for A* to account for these costs when planning, although the cost of planning will be increased if too many cells are re-weighted.
Passable cells can be quickly modified in a grid in a similar way to terrain costs being updated.
Localization in a grid is easy, simply requiring the coordinates to be divided by the grid resolution to return the localized grid cell.
The cons (of grids) are:
Grids are memory-intensive in large worlds. Note that a sparse representation can be used when the world is large, but the walkable space is relatively small [Sturtevant 11].
Path smoothing usually must be performed to remove the characteristic 45° and 90° angles that are found in grid-based movement, although any-angle planning approaches can also be used [Nash et al. 07].
Path planning in grids can be expensive due to the fine-grain representation of the world. This can be addressed using some form of abstraction [Rabin 00, Sturtevant 07].
Grid worlds often contain many symmetric paths, which can increase the cost of path planning. Some techniques can be used to avoid this (e.g., [Harabor and Grastien 11]), but this can also be avoided with different state representations.
Waypoint graphs represent the world as an abstract graph.
The pros (of waypoint graphs) are:
Waypoint graphs are relatively easy to implement.
Waypoint graphs are easy to modify if the changes are known ahead of time. For instance, if a door in the world closes and is locked, it is easy for the developer to mark the edges in the graph that cross the opening of the door and block them when the door is shut.
Waypoint graphs represent only a small fraction of the points found in a grid. This sparse representation of walkable space is both cheap to store and leads to inexpensive path planning requests.
The cons (of waypoint graphs) are:
Path quality can suffer if there are not enough walkable edges in the graph, but too many walkable edges will impact storage and planning complexity.
Waypoint graphs may require manual placement of nodes to get good path quality.
Localization on waypoint graphs requires mapping between game space and the graph. If a character is knocked off of the graph, it may be unclear where the character should actually be within the waypoint graph.
Because there is no explicit representation of the underlying state space, smoothing off the waypoint graph can result in characters getting stuck on physics or other objects.
Dynamic changes are difficult when they aren’t known ahead of time. If a character can create an unexpected hole in a wall, new connections on the waypoint graph are needed. However, it can be expensive to check all nearby connections to verify if they have become passable due to the changes in the map.
Navigation meshes represent the world using convex polygons [Tozour 04]. A special case of navigation meshes are constrained Delaunay triangulations [Chen 09], for which the world is only represented by triangles. Note that grids can also be seen as a special case of navigation meshes, as both representations use convex polygons, but their usage is significantly different in practice.
The pros (of nav meshes) are:
Polygons can represent worlds more accurately than grids, as they can represent non-grid-aligned worlds.
With the accurate representation of a polygon it is easier to correctly perform smoothing both before and during movement. This accuracy can also be used for tighter animation constraints.
Path planning on navigation meshes is usually fast, as the representation of the world is fairly coarse. But, this does not impact path quality, as characters are free to walk at any angle.
Navigation meshes are not as memory-intensive as grids as they can represent large spaces with just a few polygons.
The cons (of nav meshes) are:
The time required to implement a navigation mesh is significant, although good open-source implementations are available [Mononen 11].
Navigation meshes often require geometric algorithms, which may fail in special cases such as parallel lines, meaning that implementation is much more difficult [Chen 09].
Changes to navigation meshes can be difficult or expensive to implement, especially when contrasted with changes to grid worlds.
Localization on navigation meshes can be expensive if poorly implemented. Good implementations will use additional data structures like grids to speed up the process [Demyen 06].
Grids are most useful when the terrain is fundamentally 2D, when implementation time is limited, when the world is dynamic, and when sufficient memory is available. They are not well suited for very large open-world games, or for games where the exact bounds of walkable spaces are required for high-quality animation.
Waypoint graphs are most useful when implementation time is limited, when fast path planning is needed, and when an accurate representation of the world is not necessary.
Navigation meshes are best when there is adequate time for testing and implementation. They are the most flexible of the possible implementations when implemented well, but can be overkill for smaller projects.
By: D. Hunter Hale and G. Michael Youngblood
This chapter was very interesting, but I will (for now) stick with baking nav meshes in Godot.
Precomputed solutions for pathfinding were common on old generation consoles, but have rarely been used on current hardware. These solutions give the best results in terms of computation cost, as all path request results are precomputed in a lookup table. However, they have two drawbacks: memory cost and loss of flexibility. Currently most games use dynamic algorithms like A* for navigation.
In the context of MMO games, however, precomputed solutions are still used. While the corresponding servers are typically equipped with ample memory, they have very few CPU cycles available for each request.
Node: a polygon or a sublayer component.
Component: a group of connected nodes. There is always a path between any two nodes in a component.
Tile: a cell in the grid partitioned layer. It can have any number of components or nodes.
For each 10 × 10 m grid sector there are three different 10 × 10 m 2D arrays, or fields of data, used by this algorithm. These three field types are cost fields, integration fields, and flow fields. Cost fields store predetermined “path cost” values for each grid square and are used as input when building an integration field. Integration fields store integrated “cost to goal” values per grid location and are used as input when building a flow field. Finally, flow fields contain path goal directions. The following sections go over each field in more detail.
A cost field is an 8-bit field containing cost values in the range 0–255, where 255 is a special case used to represent walls, and 1–254 represent the path cost of traversing that grid location. Varying costs can be used to represent slopes or difficult-to-traverse areas, such as swamps. Cost fields have at least a cost of one for each grid location; if there is extra cost associated with that location, then it’s added to one.
If a 10 × 10 m sector is clear of all cost, then a global static “clear” cost field filled with ones is referenced instead. In this way, you only spend memory on cost fields that contain unique data. In an RTS game, there are a surprising number of clear sectors.
The integration field is a 24-bit field where the first 16 bits are the total integrated cost and the remaining 8 bits are used for integration flags such as “active wave front” and “line of sight.” You can optionally spend more memory for better flow results by using a 32-bit float for the integrated cost, making it a 40-bit field.
Flow fields are 8-bit fields with the first four bits used as an index into a direction lookup table and the second four bits as flags, such as “pathable” and “has line of sight.” The flow field holds all the primary directions and flags used by the agent’s steering pipeline for steering around hills and walls to flow toward the path goal.
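To make the bit layouts above concrete, here is a rough sketch of the three field types. The names, the eight-entry direction table, and the 0xFFFF “unreached” sentinel are illustrative assumptions rather than the chapter’s actual data structures (which would be tightly packed native arrays).

```python
SECTOR = 10              # 10 x 10 cells per sector

COST_WALL = 255          # special case: impassable
COST_CLEAR = 1           # minimum traversal cost

# Cost field: one byte per cell, 1-254 traversal cost, 255 = wall.
# A single shared "clear" field is referenced by sectors with no extra cost.
clear_cost_field = bytes([COST_CLEAR] * SECTOR * SECTOR)

def make_cost_field() -> bytearray:
    return bytearray(clear_cost_field)   # mutable copy for sectors with unique data

# Integration field: 16-bit integrated cost plus 8 bits of flags per cell.
FLAG_ACTIVE_WAVEFRONT = 1 << 0
FLAG_LINE_OF_SIGHT = 1 << 1

def make_integration_field():
    # (cost, flags) per cell; 0xFFFF used here as an assumed "unreached" sentinel.
    return [(0xFFFF, 0)] * (SECTOR * SECTOR)

# Flow field: 4-bit direction index plus 4 bits of flags per cell.
FLOW_FLAG_PATHABLE = 1 << 0
FLOW_FLAG_LOS = 1 << 1
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]   # illustrative 8-way table

def pack_flow_cell(direction_index: int, flags: int) -> int:
    return (direction_index & 0x0F) | ((flags & 0x0F) << 4)
```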
Units move through the grid following a static vector flow field. The flow field represents the optimal path direction at every cell in the grid, and is an approximation of a continuous flow function. Given a set of destination points, the flow function defines a vector field of normalized vectors, indicating the direction of the optimal path to the nearest destination. The flow function is similar to common methods for describing flows in fluid dynamics [Cabral and Leedom 93], with the difference that all flow vectors are normalized.
Flow fields guide units to the nearest destination in the same manner as a standard pathfinding system; however, the units’ pathing information is encoded in a flow field, removing the need for units to compute paths individually.
For example, if a bridge across a river is destroyed, the flow field only needs to be recomputed once to account for the change to pathable areas. Units following that flow field will implicitly change their respective paths in response to the change in the game world.
Units in Fieldrunners 2 use a limited, greedy, prioritized summation of five steering behaviors (four of which are shown in Figure 24.2). The five behaviors listed in descending order of priority include flow-field following, obstacle avoidance, separation, alignment, and cohesion. In each simulation step, a unit is only influenced by a specified total magnitude of steering forces. The forces resulting from steering behaviors are added to the running total in priority order until the maximum magnitude has been reached. Any steering forces that have not been added to the total are ignored.
Obstacle avoidance helps faster units maneuver intelligently around slower units. The implementations of the obstacle avoidance and separation behaviors differ slightly from Reynolds’ original implementation [Reynolds 99]. The obstacle avoidance steering behavior generates a “side stepping” force perpendicular to the unit’s velocity, and proportional to the position and relative velocity of the neighbor. The force generated by the separation steering behavior is scaled by the ratio of the kinetic energy of the neighbor to the kinetic energy of the unit the force is applied to. Units with smaller masses and velocities (presumably being more nimble) will more readily yield to larger, less maneuverable units. Finally, flow-field following moves the unit in the direction specified by the flow field. The flow-field direction at the position of the unit is computed by linearly interpolating the four closest flow vectors.
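The “limited, greedy, prioritized summation” can be read in more than one way; the sketch below shows one common interpretation, where each force (already sorted by priority) spends part of a fixed magnitude budget and anything left over is ignored. The budget value and function name are assumptions.

```python
import math

MAX_STEERING = 10.0  # illustrative cap on the total steering magnitude per step

def combine_prioritized(forces):
    """Add steering forces (already sorted by priority) to a running total
    until the magnitude budget is spent; leftover forces are ignored."""
    total_x = total_y = 0.0
    remaining = MAX_STEERING
    for fx, fy in forces:
        mag = math.hypot(fx, fy)
        if mag < 1e-6:
            continue
        if mag > remaining:
            scale = remaining / mag          # take only what still fits in the budget
            fx, fy, mag = fx * scale, fy * scale, remaining
        total_x += fx
        total_y += fy
        remaining -= mag
        if remaining <= 0.0:
            break
    return total_x, total_y
```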
The key attributes of a successful query language for position selection are:
Abstraction—We gain a lot of power if we can reapply existing criteria (keywords) in new ways.
Readability—The intent of a query should be understood as easily as possible by developers.
Extensibility—It should be easy to add new criteria to the language.
Efficiency—The language itself should not add overhead to the evaluation process.
Listing 26.1 shows a query that is made up of two subqueries, called “options.” The first option is preferred but, should it fail, subsequent options will be tried in the order that they are specified before failure is reported. In this example, the first option will collect hidespots effective against the current attack target within a radius of 15 m around the agent, discarding any that are less than 5 m from the agent and discarding any point closer to the attack target than the agent himself. Of the points that remain, it will prefer any available hard cover over soft cover and prefer the closest point. The second option (the fallback) generates a grid of points to move sideways or away from the target and prefers to block Line-of-Sight (LOS). Both options share the goal of moving to a new, safe location at least 5 m away. Thus, this query might be used in reaction to a grenade landing at the agent’s feet.
The Generation section of each option specifies the source of our candidate points. The Conditions section contains the filters that should be applied during evaluation, which must all pass if a point is to be valid. The Weights section tells the AI how to score the fitness of those valid points.
Each line in those sections is a Lua table entry comprising a string and a value. The string, in the syntax of our DSL, is formed from a sequence of keywords joined by underscores, which is trivially parsed back into the keywords themselves. Usually the most important keyword comes first—for example, hidespots or distance—and specifies the evaluation or generation method that we are going to apply. This can be referred to as the criterion.
If an agent runs towards a hidespot only to have an opponent occupy it before him, it can hurt us twice: first by the agent appearing to lack any anticipation of his opponent, and second by leaving him stranded in the open and abruptly changing direction towards other cover.
This is easy to prevent with a couple of provisions. The simplest is to focus on our current primary opponent and discard any point closer to our enemy than to us, effectively assuming that both agents will move with the same speed. We can implement this as a simple condition, such as:
Conditions = {canReachBefore_the_target = true}
The simplest thing we can do to make a big difference to performance in general is to pay attention to the order in which we evaluate our criteria (a sketch follows the list below). The rule of thumb for evaluation order is:
Cheap filters first, to discard points early
Weights, to allow us to sort into order
Expensive filters, evaluating from highest scoring down, until a point passes
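A hedged sketch of that ordering, assuming hypothetical condition and weight callables:

```python
def select_point(points, cheap_conditions, weights, expensive_conditions):
    # 1. Cheap filters first, to discard points early.
    survivors = [p for p in points
                 if all(cond(p) for cond in cheap_conditions)]

    # 2. Weights, to allow us to sort into order.
    survivors.sort(key=lambda p: sum(w(p) for w in weights), reverse=True)

    # 3. Expensive filters (e.g., raycasts), from highest scoring down,
    #    until a point passes.
    for p in survivors:
        if all(cond(p) for cond in expensive_conditions):
            return p
    return None   # no valid point found
```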
Requiring that opponents attack the player one at a time is a technique known as the Kung-Fu Circle, named after classic scenes from martial arts movies in which the protagonist faces off against dozens of foes who launch their attacks one at a time.
At a high level, the Belgian AI algorithm is built around the idea of a grid carried around with every creature in the game. While every NPC has a grid of its own, in practice the player is the game entity we are most concerned about, so we will use the player as our example throughout this article. The grid is world-space aligned and centered on the player, with eight empty slots for attacking creatures.
In addition to the physical location of those slots, the grid stores two variables: grid capacity and attack capacity. Grid capacity will work to place a limit on the number of creatures that can attack the player at once, while attack capacity will limit the number and types of attacks that they can use.
Every creature in the game is assigned a grid weight, which is the cost for that creature to be assigned a spot on someone’s grid. The total grid weight of the creatures attacking a character must be less than that character’s grid capacity. Similarly, every attack has an attack weight, and the total weight of all attacks being used against a character at any point in time must be less than that character’s attack capacity.
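A minimal sketch of those two capacity checks, with hypothetical names. The chapter phrases the limits as “less than” the capacity; the comparisons below allow totals up to and including the capacity, so tighten them if you want the strict reading.

```python
class BelgianGrid:
    """Hypothetical sketch of the per-character grid bookkeeping."""
    def __init__(self, grid_capacity: int, attack_capacity: int):
        self.grid_capacity = grid_capacity
        self.attack_capacity = attack_capacity
        self.attackers = {}        # creature -> grid weight
        self.active_attacks = {}   # creature -> attack weight

    def request_slot(self, creature, grid_weight: int) -> bool:
        # Reject if the creature's grid weight would exceed the grid capacity.
        if sum(self.attackers.values()) + grid_weight > self.grid_capacity:
            return False
        self.attackers[creature] = grid_weight
        return True

    def request_attack(self, creature, attack_weight: int) -> bool:
        if creature not in self.attackers:
            return False
        # Reject if the attack's weight would exceed the attack capacity.
        if sum(self.active_attacks.values()) + attack_weight > self.attack_capacity:
            return False
        self.active_attacks[creature] = attack_weight
        return True
```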
In order to tell whether an AI agent is able to see something, most games tend to use a raycast from the agent to the target. While this does test whether there’s a clear line of sight between the two, it can start to get expensive very quickly. For example, in our games we try to limit the number of active AI agents to 16. This means that every agent can potentially see 15 other AI characters. In the worst-case scenario, this could mean 120 raycasts being requested to check visibility between all the AI agents, even before the player is considered! In the AI system used for Crackdown 2, each agent could register how hostile a target needed to be before visibility checks should be done. These hostility levels were hostile, neutral, and friendly. Because AI agents registered an interest only in hostile targets, any nonhostile targets became invisible to the agent, dramatically reducing the number of raycasts required. Should a target change hostility (for example, going from being a neutral to a hostile target), the AI system will start or stop visibility tests as required.
Further optimizations can be made to the generation of visual stims. In the CryAISystem, every agent has a view distance and a field of view. A lot of unnecessary raycasts can be avoided by doing these much cheaper tests to see if potential visual targets are even within an agent’s view cone before requesting a raycast.
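Putting those cheap pre-checks together, a hedged sketch might look like the following; the field names (hostility, view_distance, half_fov in radians, and a unit-length facing vector) are assumptions for illustration.

```python
import math

def should_raycast(agent, target, min_hostility: int) -> bool:
    """Cheap rejection tests run before paying for a line-of-sight raycast."""
    if target.hostility < min_hostility:        # agent only cares about hostile-enough targets
        return False
    dx, dy = target.x - agent.x, target.y - agent.y
    dist = math.hypot(dx, dy)
    if dist > agent.view_distance:              # cheap distance test
        return False
    if dist > 1e-6:
        # facing_x/facing_y assumed to be a unit vector; compare against cone half-angle.
        forward_dot = (dx * agent.facing_x + dy * agent.facing_y) / dist
        if forward_dot < math.cos(agent.half_fov):
            return False                        # outside the view cone
    return True   # only now request the expensive raycast
```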
The remaining few stims tend to be events that you want to make your AI aware of, but don’t fit under the normal categories of sight or sound. For some of these events, you can effectively treat them as a dog whistle—create a sound stim without playing any audio and send that to the perception manager. These stims are then just handled in whatever way you need them to be.
By: Brook Miles
I find this a very interesting chapter. Nothing particularly complex, but these concepts fit into the kind of games I am interested in writing.
We determined that fundamentally there were two broad categories of interest sources our agents needed to detect in the world: things they could see and things they could hear. This gave us our two senses: sight and sound.
Detection by the sight test involves a series of checks including these questions: Is the interest source within one of the agent’s vision cones? Is the game object associated with the interest source currently lit by a light source, or does the agent have the night vision flag, which removes this requirement? Is there any collision blocking line of sight between the agent’s eye position and that of the interest source?
A vision cone, as shown in Figure 32.1, is typically defined by an offset and direction from the agent’s eye position, an angle defining how wide it is, and a maximum distance. Other vision geometry is possible as well; we have some which are simply a single ray, an entire circle, or a square or trapezoid for specific purposes.
For both performance and gameplay reasons, sight and sound interest sources define a maximum radius, and only agents within that radius are tested to determine whether they can detect the interest source.
From our two core senses, we can now allow the designers to create an interest source representing whatever object or event they want, so long as it can be detected via the sight or sound tests. Designers can specify a “gunshot” sound, or a “footstep” sound, a “corpse” sight, or a “suspect” sight. The agent’s behavioral scripts can use the specified interest source type to determine any special behavior, but the backend only needs to know how to determine whether the agent can see or hear the interest source.
You can also bend the definition of “sight” and “sound” somewhat. One special case of agent in Mark of the Ninja is guard dogs, who we want to be able to “smell” the player in the dark but, for gameplay reasons, only over a very short distance. In this case, instead of needing to create an entirely new smell test, we can simply define a sight interest source which is attached to the player game object, and require that any agent noticing it have the “dog” tag as shown in Listing 32.1. Voila, we have a “smell” interest.
An interest is just the record in the agent’s brain of what he’s interested in right this moment. It may have a reference to the interest source that created it (if there was one), but even if it doesn’t, it still has a copy of all of the necessary information, the sense for the interest, the source type, its position, priority, and so on. When an agent is determined to have detected an interest source, an interest record is added to its brain, and this is the information that the agent uses from that point on.
While sight and sound are the only available sense types for interest sources in Mark of the Ninja, interests of any arbitrarily defined sense can be added directly to an agent’s brain by the designer through a script call, as no additional testing needs to be done against them; the designer or script writer has already determined that this agent should be interested in whatever it is.
For example, a “touch” interest may be added to an agent’s brain when he is struck with a dart projectile, or a “missing partner” interest can be added if the agent’s partner goes off to investigate a sound and fails to return.
A question that arose early on was how groups of agents should respond when they all sense an interest simultaneously. At first it was every man for himself; each agent did its own test and upon sensing an interest would react, most likely by running to investigate it. This typically resulted in entire groups of agents running towards the slightest noise or converging on the player en masse. This wasn’t the kind of gameplay we were looking for. We want the player to be able to manipulate the agents, distract them, split them apart, and dispatch them on the player’s own terms.
By driving the detection of interest sources from the interest source itself, instead of from each agent individually, we can easily collect all of the information we need in order to determine who should be reacting, and how they should react.
The sensory manager update loop tests each interest source against all possible “detection candidates.” Given this list, it makes some decisions based mainly on group size, but possibly also by location or distance from the interest source. If only a single agent can detect the interest, our work is done, the agent is notified, and he goes to investigate. If more than one agent can detect the interest, we can assign roles to each agent, which are stored along with the interest record in the agent’s brain. Roles only have meaning within the context of a specific interest, and when that interest is forgotten or replaced, the role associated with it goes away too. If multiple agents can detect the interest, one is chosen as the “sentry” or “group leader” and he plays audio dialog telling the other agents nearby to go check out the interest and then hangs back waiting. One or more agents are given the “investigate” role and will go and investigate, seemingly at the command of the “group leader.” Any remaining agents will get the “bystander” role, and may indicate they’ve seen or heard the interest but otherwise hold position and decrease the priority of the interest in their mind so they are more likely to notice new interests for which they might be chosen as leader or investigator. The key is that once the roles are assigned, and the sensory update is complete, there is no “group” to manage. Each agent is acting independently, but due to the roles that were assigned, they behave differently from each other in a way that implies group coordination.
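A rough sketch of that role assignment, assuming hypothetical agent methods. The heuristic here (the closest agent calls it out, the next-closest investigates) is my own; the game’s actual selection is based mainly on group size and is not spelled out in these notes.

```python
def assign_roles(detectors, interest, max_investigators=1):
    """Assign roles to every agent that detected the interest this update;
    afterwards each agent acts independently on its stored role."""
    if not detectors:
        return
    if len(detectors) == 1:
        detectors[0].add_interest(interest, role="investigate")
        return
    by_distance = sorted(detectors, key=lambda a: a.distance_to(interest))
    leader, *others = by_distance                 # assumption: closest agent calls it out
    investigators = others[:max_investigators]
    for agent in by_distance:
        if agent is leader:
            agent.add_interest(interest, role="group_leader")
        elif agent in investigators:
            agent.add_interest(interest, role="investigate")
        else:
            agent.add_interest(interest, role="bystander")
```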
If you are investigating a broken light and come across a dead body, should you stop and investigate the body, or continue to look at the light? What if you hear your partner being stabbed by a Ninja, and then discover a broken light on your way to help him? Should you stop to investigate?
Interest sources, and by extension interest entries in agents’ brains, contain a simple integer priority value, where higher priority interests can replace lower or equal priority interests, and the agent will change his focus accordingly. If the agent currently holds a high priority interest, lower-priority interests are discarded and never enter the agent’s awareness (see the sketch after Listing 32.2).
Listing 32.2. Mark of the Ninja’s priority definitions for interests.
INTEREST_PRIORITY_LOWEST = 0
INTEREST_PRIORITY_BROKEN = 1
INTEREST_PRIORITY_MISSING = 2
INTEREST_PRIORITY_SUSPECT = 4
INTEREST_PRIORITY_SMOKE = 4
INTEREST_PRIORITY_CORPSE = 4
INTEREST_PRIORITY_NOISE_QUIET = 4
INTEREST_PRIORITY_NOISE_LOUD = 4
INTEREST_PRIORITY_BOX = 5
INTEREST_PRIORITY_SPIKEMINE = 5
INTEREST_PRIORITY_DISTRACTIONFLARE = 10
INTEREST_PRIORITY_TERROR = 20
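A minimal sketch of that replacement rule, assuming each agent’s brain tracks a single current interest; the names are hypothetical.

```python
def offer_interest(brain, new_interest) -> bool:
    """A new interest only enters the agent's awareness if its priority is
    greater than or equal to the current interest's priority."""
    current = brain.current_interest
    if current is not None and new_interest.priority < current.priority:
        return False   # lower-priority interests are discarded outright
    brain.current_interest = new_interest
    return True
```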
SocialObjectComponent - This component performs the task of coordinating much of the group formation aspect of the system. At its core, it is a component that handles set membership, allowing characters to request access to the group, removing characters that are no longer participating, and allocating resources and/or positions in the group structure. This system is also responsible for advertising the availability of the social interaction as well as organizing flow control for when resources are limited, which as an example is useful to control how many people are talking at once during a group discussion.
This SocialObjectComponent is usually either added to the world during instantiation of an object or it is added dynamically during the update of a character that is receptive to a social encounter. An example of the former is a hot dog stand, where the SocialObjectComponent is instantiated to control the behavior of the characters as they use the stand to buy hot dogs. An example of the latter would be when a character has true conditions for <idle>, <wants_social>, <sees_friend>, and <friend_also_wants_social>. When the conditions for social activity are met, the character spawns a GameObject, which has a SocialObjectComponent added. This becomes a proposal for social interaction and the SocialObjectComponent begins its role in coordinating the interaction.
The intracharacter coordination of the various social dynamics components is controlled by the SocialComponent. This component is responsible for querying the world as to available potential interactions, forming requests to participate, controlling the focus of attention, etc. Much of the work of this component is involved in handling events propagated through the game and sending events to the different components of the social dynamics system to handle. For instance, the social component sends an event to its parent GameObject to indicate that the attention of the character has changed.
There are three key components to this schedule system: schedules, schedule entries, and actions. Together, these concepts form the core of the background AI system.
Schedules are linked to NPCs with a pointer, an ID, or anything else that’s appropriate to the system. The schedule is the top-level interface for accessing any schedule data on the NPC and manipulating the schedule during runtime. Schedules contain a number of schedule entries, each of which represents a slice of time (see the following).
A schedule entry is a single slice of time within the schedule. It manages the lifecycle of the schedule entry and determines when it’s time to move to the next entry. Schedule entries also encapsulate the decision-making an NPC performs when it is looking for something to do. For this reason, schedule entries are typically implemented with a strategy pattern, or other design pattern that enables you to easily swap one entry for another [Gamma et al. 01].
It’s usually not enough to have an NPC interact with a targeted object. It’s much more common to tell an NPC to sleep in its own bed or to work at its particular forge. We want the best of both worlds, so the action system needs to deal with the concept of object ownership as well.
All game objects that can be interacted with should have a type ID, which defines the type of object it is. The object type is really just metadata for the designer to label groups of similar objects. For example, all beds could be grouped under the “bed” type. This allows NPCs to own an object of a particular type, which greatly simplifies the scheduling data. For example, a schedule could have an action that tells the NPC to go to bed. The NPC will look in its map of owned objects and check to see if it has a “bed” object. If it does, it will use that bed. If not, some default behavior can be defined. Perhaps the NPC will choose a random bed, or perhaps it will fail the action and choose a new one. This same action could be applied to multiple NPCs without modification and it would work just fine. Each NPC would go to its own appropriate bed.
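A small sketch of that ownership lookup with hypothetical names; the fallback policy (random unowned object of the same type, else fail) is just one of the defaults the chapter suggests.

```python
import random

def find_object_for_action(npc, object_type, world_objects):
    """Prefer an owned object of the requested type; otherwise fall back
    to a random unowned object of that type, or fail the action."""
    owned = npc.owned_objects.get(object_type)     # e.g., this NPC's own "bed"
    if owned is not None:
        return owned
    candidates = [o for o in world_objects
                  if o.type_id == object_type and o.owner is None]
    return random.choice(candidates) if candidates else None   # None = fail the action
```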
Another thing to consider is the rigid nature of schedules. If you have a schedule that sends NPCs to an inn for dinner and drinks at 6:00pm every day, you could easily have a pile of NPCs all stopping whatever they’re doing and immediately heading to the inn at the exact same time. A better solution is to randomize the schedule update time with a Gaussian distribution function centered on the end time for that schedule entry.
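A one-liner sketch of that jitter; the standard deviation is a made-up value.

```python
import random

def jittered_end_time(entry_end_minutes: float, std_dev_minutes: float = 10.0) -> float:
    """Stagger schedule transitions with a Gaussian centered on the entry's end time."""
    return random.gauss(entry_end_minutes, std_dev_minutes)
```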
Remember, the underlying structure behind this schedule system is a hierarchical finite-state machine.
$$e(t) = Y(t) - R(t)$$
$$u(t) = K_p \cdot e(t) + K_i \cdot \int e(t)dt + K_d \cdot \frac{d}{dt}e(t) $$
Set all gains (K) to 0 and increase Kp until the system behaves in a desirable manner with no overshoot or oscillation. Tuning this value first is advisable, as it will generally have the biggest effect on the output.
Increase Ki to eliminate the steady-state error.
Adjust Kd to reduce any overshoot or reduce the settling time as required.
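A minimal PID sketch following the equation above, using naive rectangular integration for the integral term and a finite difference for the derivative term; the class and variable names are mine.

```python
class PID:
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error: float, dt: float) -> float:
        """Return the control output u(t) for the current error e(t)."""
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt if dt > 0.0 else 0.0
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```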
Position: It is not possible to drive in the same place as an observed vehicle, so write a large amount of heat at this position.
Block: If the observed vehicle is behind the player, it may make a good target to block so remove heat at this position.
Draft: If the observed vehicle is in a good position for drafting, remove heat based upon a drafting cone produced by this vehicle.
for each agent
    for each neighbor within radius
        calculate agent separation force away from neighbor
        calculate center of mass for agent cohesion force
        calculate average heading for agent alignment force
    calculate wander force
    sum prioritized weighted forces and apply to agent velocity
for each agent
    apply velocity to position
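A compact, self-contained sketch of that loop; all weights, radii, and the force cap are illustrative, and a prioritized truncated summation (as described earlier for Fieldrunners 2) could be substituted for the simple capped sum used here.

```python
import math
import random

# Illustrative parameters; tune per game.
SEPARATION_W, COHESION_W, ALIGNMENT_W, WANDER_W = 1.5, 1.0, 1.0, 0.5
NEIGHBOR_RADIUS = 5.0
MAX_FORCE = 2.0

class Agent:
    def __init__(self, x, y):
        self.x, self.y = x, y
        self.vx, self.vy = 0.0, 0.0

def update_flock(agents, dt):
    # First pass: compute steering forces and apply them to velocities.
    for a in agents:
        sep_x = sep_y = 0.0
        com_x = com_y = 0.0        # center-of-mass accumulator (cohesion)
        head_x = head_y = 0.0      # average-heading accumulator (alignment)
        count = 0
        for n in agents:
            if n is a:
                continue
            dx, dy = a.x - n.x, a.y - n.y
            dist = math.hypot(dx, dy)
            if dist >= NEIGHBOR_RADIUS or dist < 1e-6:
                continue
            count += 1
            sep_x += dx / dist     # push away from this neighbor
            sep_y += dy / dist
            com_x += n.x
            com_y += n.y
            head_x += n.vx
            head_y += n.vy
        fx = sep_x * SEPARATION_W
        fy = sep_y * SEPARATION_W
        if count:
            fx += (com_x / count - a.x) * COHESION_W   # steer toward neighbors' center
            fy += (com_y / count - a.y) * COHESION_W
            fx += (head_x / count) * ALIGNMENT_W       # match neighbors' heading
            fy += (head_y / count) * ALIGNMENT_W
        angle = random.uniform(0.0, 2.0 * math.pi)     # simple wander impulse
        fx += math.cos(angle) * WANDER_W
        fy += math.sin(angle) * WANDER_W
        mag = math.hypot(fx, fy)
        if mag > MAX_FORCE:                            # cap the combined steering force
            fx, fy = fx / mag * MAX_FORCE, fy / mag * MAX_FORCE
        a.vx += fx * dt
        a.vy += fy * dt
    # Second pass: apply velocities to positions.
    for a in agents:
        a.x += a.vx * dt
        a.y += a.vy * dt
```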
We have two basic types of events that can be posted. Information-only events notify the ASD of changes in the game state, which may not be directly related to a sound emitting action. For example, if a game object has been spawned or despawned, or if the game is paused or resumed, then we can use an information-only event to pass this knowledge to the AI. These events may not cause sounds to be played directly, but they can affect the way in which we play other sounds.
The second type of event is the play-request event. These events are used when we want to play a specific type of sound, such as a gunshot. These events replace where previously we would have called directly into the sound system. For example, we now post an event at the point when an explosion occurs, or when the animation system causes feet to strike the ground (so that we can play footstep sounds). By using play-request events, we give the ASD the opportunity to make decisions about which sound sample to play and how loud to play it (e.g., should the volume of gunshot events be reduced so that we can hear enemy speech?). The ASD can evaluate the event and use the player’s current context, and the state of the game objects involved, to drive its decisions.
$$P(\text{event } e) = \frac{\text{number of ways } e \text{ can happen}}{\text{number of all possible outcomes}}$$
$$P(\text{event } e) = \frac{\text{number of times } e \text{ occurred}}{\text{number of times any event occurred}}$$
This is useful for deriving probability from a sequence of past events. From this history, we can predict that the next event is most likely the event that has occurred most often.
In this sequence: “Jump, Jump, Dodge, Attack, Dodge, Attack, Jump, Jump, Dodge,” P(Jump) = 4/9 = 44%, P(Attack) = 2/9 = 22%, and P(Dodge) = 3/9 = 33%. Based on raw probability, the player will jump next. Do you agree with this prediction? Probably not, since you’ve likely picked up on a pattern. N-grams are used to find patterns in sequences of events. An N-gram could predict that a “Dodge, Attack” pattern was being executed. N-grams therefore provide more accurate predictions than raw probability alone.
Each N-gram has an order, which is the N value. 1-grams, 2-grams, and 3-grams are called unigrams, bigrams, and trigrams. Above that, we use 4-gram, 5-gram, and so on. The N value is the length of patterns that the N-gram will recognize (the N-tuples). N-grams step through event sequences in order and count each pattern of N events they find. As new events are added to the sequence, N-grams update their internal pattern counts.
Consider a game where we need to predict the player’s next move, Left or Right. So far, the player has moved in the following sequence: “R, R, L, R, R”. From this, we can compute the following N-grams:
There isn’t enough data yet for an N-gram with order greater than 5. Notice that we store occurrence counts rather than probabilities, which requires less computation. We can perform the divisions later to calculate probabilities when we need them. With patterns learned, our N-grams can now predict the next event.
An N-gram predicts by picking out an observed pattern that it thinks is being executed again. To do so, it considers the most recent events in the sequence, and matches them against previously observed patterns. This set of recent events we’ll call the window. (See Figure 48.1.)
The length of the window is always N-1, so each pattern includes one more event than the window length. Patterns “match” the window if their first N-1 events are the same. All matching patterns will have a unique Nth event. The pattern with the most occurrences has the highest probability, so its Nth event becomes the prediction. No division is required.
Let’s see what our N-grams will predict next in the same sequence, “R, R, L, R, R.” (See Table 48.1.)
Using N-gram statistics, we can make the following predictions:
The unigram predicts Right, since it uses raw probability. Its window size is N – 1 = 0.
The bigram has observed two patterns that match its window, RR and RL. Since RR occurred once more than RL, the bigram chooses RR and predicts Right.
The trigram finds that only RRL matches its window, so it uses RRL to predict Left.
The 4-gram has not observed any patterns that start with LRR, so it cannot make a prediction.
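A small sketch of an N-gram predictor along these lines; the class name and API are assumptions, and the final lines reproduce the trigram prediction from the example above.

```python
from collections import defaultdict

class NGram:
    def __init__(self, n: int):
        self.n = n
        self.counts = defaultdict(int)   # pattern (tuple of N events) -> occurrences
        self.history = []

    def observe(self, event: str) -> None:
        """Append an event and count the newest length-N pattern, if any."""
        self.history.append(event)
        if len(self.history) >= self.n:
            pattern = tuple(self.history[-self.n:])
            self.counts[pattern] += 1

    def predict(self):
        """Match the window (last N-1 events) against stored patterns and
        return the most frequent continuation, or None if nothing matches."""
        window = tuple(self.history[-(self.n - 1):]) if self.n > 1 else ()
        best_event, best_count = None, 0
        for pattern, count in self.counts.items():
            if pattern[:-1] == window and count > best_count:
                best_event, best_count = pattern[-1], count
        return best_event

# Reproducing the example: "R, R, L, R, R"
trigram = NGram(3)
for move in ["R", "R", "L", "R", "R"]:
    trigram.observe(move)
print(trigram.predict())   # -> "L", since only RRL matches the window (R, R)
```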
The probability that an event $e$ will occur next is equal to the probability of the matching pattern containing $e$ as its $N$th event, as shown in Equation 48.3.
$$P(\text{event } e \text{ is next}) = P(\text{matching pattern ending in } e \text{ is next})$$
$$P(\text{matching pattern } mp) = \frac{\text{number of } mp \text{ occurrences}}{\text{sum of all matching pattern occurrences}}$$
During prediction, you’ll likely check the occurrences for all matching patterns to find the max. You can sum them while you’re at it, and find the denominator in the same pass.
If needed, you can calculate the probability that an event pattern will occur again anytime in the future, as shown in Equation 48.5. This is a nonconditional probability, as it doesn’t use the current window.
$$P(\text{event pattern } ep) = \frac{\text{number of } ep \text{ occurrences}}{\text{sum of all pattern occurrences so far}}$$
$$\text{Total pattern occurrences } T = L - (N - 1)$$
where $L$ is the length of the event sequence. For example, with the sequence “R, R, L, R, R” ($L = 5$) and a bigram ($N = 2$), $T = 5 - 1 = 4$ pattern occurrences.