CPSC 322 - Lecture 18 - October 20, 2004

Game Search


I.  But first, how to collect paths while searching

All the basic search strategies we've looked at so far
give us nothing more than a yes or no answer:  "yes there's
a path from the start node to the goal node" or "no there
isn't".  That's sort of useful, but not nearly as useful
as an answer like "yes, I found a path, and here it is!"
Fortunately, it doesn't take much extra effort for a 
search algorithm to collect the paths to the frontier nodes 
as it searches the state space -- it's just a matter of a 
little bookkeeping.  Included in today's slides are an
example of path-keeping behavior in the context of breadth-
first search, a path-keeping search algorithm in Datalog, 
and the equivalent code in CILOG.  Who could ask for more?
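
To make that bookkeeping concrete, here's a minimal sketch in 
Python (not the Datalog or CILOG from the slides, just an 
illustrative translation, and the graph at the bottom is 
made up for the example).  The trick is that the frontier 
holds entire paths rather than bare nodes:

from collections import deque

def bfs_with_paths(neighbors, start, is_goal):
    # The frontier holds whole paths; the node under
    # consideration is always the last element of its path.
    frontier = deque([[start]])
    while frontier:
        path = frontier.popleft()          # FIFO gives breadth-first
        node = path[-1]
        if is_goal(node):
            return path                    # "yes, and here it is!"
        for succ in neighbors.get(node, []):
            frontier.append(path + [succ]) # extend the path by one arc
    return None                            # no path exists

# A made-up graph, just for illustration:
graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d', 'g'], 'd': ['g']}
print(bfs_with_paths(graph, 'a', lambda n: n == 'g'))  # ['a', 'c', 'g']

Swap the popleft for a pop from the other end of the frontier 
and you'd get depth-first search with the same path-keeping 
behavior.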


II.  Let the fun begin

For a single agent in a relatively non-hostile world, the 
search for the path from some start state to some goal state 
is not especially difficult, and in fact, it's sort of dull.  
But the world isn't always peaceful---sometimes, there are 
other agents out there trying to keep you from reaching your 
goal, and at the same time those other agents are trying to 
achieve some goal of their own.  We see this kind of stuff 
everywhere:  in the executive boardroom, on the athletic 
field, sometimes even in the classroom.  And when we try to 
model this kind of competitive behavior on a computer, we 
have to keep in mind that while our computerized "good guy" 
is going to try to move toward the goal in as optimal a 
fashion as possible, the computerized "bad guy" is going to do
everything it can to keep us from getting there.  Thus, the 
question in a competitive or adversarial situation is no 
longer "what's my optimal path to the goal?", but is instead 
"what's my path to the goal when someone else is trying stop 
me?"

The fundamental change in the nature of that question results 
in a fundamental change to the way we do state-space search 
in adversarial situations, thus giving rise to something 
called "adversarial search".  And since we frequently use 
this kind of search when we build intelligent game-playing 
programs, this kind of search is frequently called "game 
search".

The principle of game search is to first generate the state 
space (or "game tree") some number of levels deep, where each 
level corresponds to one player's move (or, more accurately, 
the set of all moves that the player could possibly make at 
that point).  After generating the state space for that 
number of levels, the nodes at the bottom level are evaluated 
for goodness.  (In the context of game playing, those nodes 
are often called "boards", each one representing one possible 
legal arrangement of game pieces on the game board.)
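
Represented as a data structure, those generated levels might 
look like the following Python sketch (a sketch only; the 
names are our own, and the move generator gets passed in as a 
function -- one for hexapawn appears in section III below):

def game_tree(board, to_move, depth, moves):
    # A node pairs a board with its children: the boards
    # reachable in one move, each expanded depth - 1 further.
    if depth == 0:
        return (board, [])      # bottom level, to be evaluated
    nxt = 'W' if to_move == 'B' else 'B'
    return (board, [game_tree(b, nxt, depth - 1, moves)
                    for b in moves(board, to_move)])

Each level down flips whose turn it is to move, which is 
exactly the alternation that the minimax procedure in 
section VI exploits.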

The estimate of the goodness of a board is a little bit 
different than before, but not much.  Since we have to worry 
about the opponent, we set up our estimation function so that 
it returns a spectrum of values, just like before, but now 
the two extremes are boards that are great for us (i.e., we 
win) and boards that are great for our opponent (i.e., we 
lose).  We apply our estimation function to those lowest 
level boards, and propagate the numerical values upward to 
help us determine which is the best move to make.


III. The joy of hex

In order to explore the wild and wacky world of adversarial 
or game search, we're going to have to introduce a game.  
It's a simple game for two players, and it's called hexapawn, 
for reasons which will become obvious.

The rules of hexapawn (at least in its original form) are 
as follows:

The game is played on a 3x3 board.  Each player begins with 
three pawns lined up on opposite sides of the board.  There 
are three white pawns and three black pawns, which gives us a 
grand total of six pawns, hence the name hexapawn.  White 
always moves first, just like in chess.  The players take 
turns moving their pawns.  A pawn can move one square forward 
to an empty square, or it can move one square diagonally 
ahead (either to the left or right) to a square occupied by 
an opponent's pawn, in which case the opponent's pawn is 
removed from the board.  One player wins when one of these 
three conditions is true:  1) one of that player's pawns has 
reached the opposite end of the board, 2) the opponent's 
pawns have all been removed from the board, or 3) it's the 
opponent's turn to move but the opponent can't move any 
pawns.
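
Those rules translate almost directly into code.  Here's a 
hedged Python sketch of a legal-move generator, assuming a 
board represented as three rows (top to bottom) of 'W', 'B', 
and '-' characters to match the diagrams below; the function 
names are our own:

START = [['W', 'W', 'W'],
         ['-', '-', '-'],
         ['B', 'B', 'B']]

def moves(board, player):
    # Generate every board reachable in one move by player.
    # White starts on the top row and moves down; black
    # starts on the bottom row and moves up.
    opponent = 'B' if player == 'W' else 'W'
    step = 1 if player == 'W' else -1
    for r in range(3):
        for c in range(3):
            if board[r][c] != player:
                continue
            nr = r + step
            if not 0 <= nr <= 2:
                continue
            if board[nr][c] == '-':        # straight ahead: empty only
                yield apply_move(board, (r, c), (nr, c))
            for nc in (c - 1, c + 1):      # diagonals: captures only
                if 0 <= nc <= 2 and board[nr][nc] == opponent:
                    yield apply_move(board, (r, c), (nr, nc))

def apply_move(board, frm, to):
    # Copy the board, then move the pawn; landing on an
    # opponent's pawn overwrites it, which is the capture.
    new = [row[:] for row in board]
    new[to[0]][to[1]] = new[frm[0]][frm[1]]
    new[frm[0]][frm[1]] = '-'
    return new

The three win conditions then read straight off the board: a 
pawn on the far row, an opponent with no pawns left, or an 
opponent whose move generator comes up empty.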

It's not a very exciting game for human players, but it's 
reasonably stimulating for humans, such as yourselves, who 
are required to write programs to get computers to play it.  
(It's not entirely clear how the computers feel 
about it.)  Hexapawn also serves as a very nice mechanism for 
demonstrating the principles of game search with heuristics.


IV.  Hexapawn: catch the fever

Let's look at the beginning of a sample game of hexapawn and 
see how we might get a computer to play the game.  We'll let 
our opponent take the side of the white pawns, and we'll play 
the black pawns.  The initial board configuration will be 
represented like this:

                                   W W W                   
                                   - - -                   
                                   B B B                   

As we said, white always gets to move first.  This gives 
white three possible initial moves, which are represented in 
this way:

                                   W W W                   
                                   - - -                   
                                   B B B                   
                                  /  |  \                  
                                 /   |   \                 
                                /    |    \                
                               /     |     \               
                              /      |      \              
                             /       |       \             
                           - W W   W - W   W W -           
                           W - -   - W -   - - W           
                           B B B   B B B   B B B           

But of course, white doesn't get to make all three moves.  
White has to choose one and go with it.  So let's say that 
white opts for that move on the left.  Our resulting state 
space then looks like this:

                                   W W W                   
                                   - - -                   
                                   B B B                   
                                  /                        
                                 /                         
                                /                          
                               /                           
                              /                            
                             /                             
                           - W W                           
                           W - -                           
                           B B B                           

Everything was fine up until now.  Now it's our turn.  What 
will we do?  Well, what we'd like to do is look at all of our 
possible next moves and make the best one, right?  Sure.  So 
let's see what our options are on this move:

                                   W W W                   
                                   - - -                   
                                   B B B                   
                                  /                        
                                 /                         
                                /                          
                               /                           
                              /                            
                             /                             
                           - W W                           
                           W - -                           
                           B B B                           
                           / | \                           
                        /    |    \                        
                     /       |       \                     
                  /          |          \                  
               /             |             \               
            /                |                \            
         /                   |                   \         
       - W W               - W W               - W W       
       B - -               W B -               W - B       
       B - B               B - B               B B -       

Can we tell from just this which possible next move is the 
best one?  Maybe.  That one on the left looks sort of nice, 
since it leaves us with one more pawn than our opponent.  But 
we really can't tell just by looking at these different 
boards which move is likely to lead to a win for us.  Maybe 
we could get a better idea of which of our three possible 
moves is the best one by looking even further ahead to see 
what white might do on the next turn:

                                   W W W                   
                                   - - -                   
                                   B B B                   
                                  /                        
                                 /                         
                                /                          
                               /                           
                              /                            
                             /                             
                           - W W                           
                           W - -                           
                           B B B                           
                           / | \                           
                        /    |    \                        
                     /       |       \                     
                  /          |          \                  
               /             |             \               
            /                |                \            
         /                   |                   \         
       - W W               - W W               - W W       
       B - -               W B -               W - B       
       B - B               B - B               B B -       
     /   |   \              / \              /   |   \     
    /    |    \            /   \            /    |    \    
   /     |     \          /     \          /     |     \   
  /      |      \        /       \        /      |      \  
- - W  - - W  - W -    - W -   - W -    - W W  - - W  - - W
W - -  B W -  B - W    W B W   W W -    - - B  W W B  W - W
B - B  B - B  B - B    B - B   B - B    B W -  B B -  B B -
                         ^                ^
                         |                |
                       white            white
                       wins             wins

Well now, maybe we know a little more than before.  In fact, 
we can see two boards there that indicate victory for our 
opponent, and we could probably make some reasonable attempts 
at estimating how close the other boards might be to either a 
win or a loss for us.  However, we could also look ahead yet 
another move, and then another, and so on until we had mapped 
out all the possibilities.  The problem with doing this is 
that it's going to cost us lots of computational resources.  
This may not be a big deal when we're playing hexapawn with 
only three pawns on a side, but it will be a big deal if we 
extend the game to eight pawns on a side, for example.  Or 
maybe instead of hexapawn variations, we're playing something 
like chess, where there are roughly 35 legal moves per 
position on average, so looking ahead just six moves means 
examining on the order of 35^6 (nearly two billion) boards.  
Now the computational expense becomes prohibitive, so we're 
going to have to settle on some arbitrary cutoff for looking 
ahead in this or any game.  
Since I'm running out of room to display all the possible 
boards at the same level, let's make life easy for me and set 
our arbitrary cutoff for looking ahead at two levels or two 
moves ahead.


V.  The static board evaluation function

Above it was noted that two of those bottom-level boards were 
wins for white.  But what about those other boards?  What do 
they indicate for us?  Will they lead to wins or losses for 
us?  How can we estimate that?  How can we get a computer to 
estimate that?

Providing that estimate is the job of something called the 
"static board evaluation function".  A static board 
evaluation function takes as input the current state of a 
game (i.e., the board) and returns a value corresponding to 
the "goodness" of that current state or board.  By "goodness" 
we mean how close that board is to a victory for us---the 
closer, the better.  A simple static board evaluation 
function might return, say, a positive number if the board is 
good for us, a negative number if the board is not good for 
us (but is consequently good for our opponent), and maybe a 
zero if the board is neither bad nor good for either player.

How might we design such a function?  Here's a weak first 
attempt at one.  Since we're playing on the black side of the 
board, we'll have the function return a +10 if the board is 
such that black wins.  And we'll have it return a -10 if 
white wins.  (If we were playing on the white side of the 
board, we'd want it to return a +10 if white won, and a -10 
if black won.)  Since we win if we can get one of our pawns 
across the board to the other side, we should have the 
function take that into account too.  So if neither side has 
won, let's have our function return the number of black pawns 
with clear paths in front of them minus the number of white 
pawns in front of them.  Oh, and since we win if our 
opponent's pawns are all removed from the board, let's have 
the function incorporate that.  We'll have the function count 
the number of black pawns on the board, subtract the number 
of white pawns, add that number to the previous number, and 
return that result.  There, that wasn't so bad, was it?
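
In code, the function we just described might look like this 
(a sketch that leans on the moves generator from section III; 
the win tests need to know whose turn it is, so that's passed 
in, and all the names are our own):

def evaluate(board, to_move):
    # Static board evaluation from black's point of view.
    flat = [sq for row in board for sq in row]
    n_black, n_white = flat.count('B'), flat.count('W')
    # Win tests: a pawn across the board, an opponent wiped
    # out, or the side to move having no legal moves.
    if ('B' in board[0] or n_white == 0
            or (to_move == 'W' and not any(moves(board, 'W')))):
        return 10                          # black (that's us) wins
    if ('W' in board[2] or n_black == 0
            or (to_move == 'B' and not any(moves(board, 'B')))):
        return -10                         # white wins
    def clear_path(r, c, player):
        # True if every square from here to the far row is empty.
        step = -1 if player == 'B' else 1
        r += step
        while 0 <= r <= 2:
            if board[r][c] != '-':
                return False
            r += step
        return True
    clear_b = sum(1 for r in range(3) for c in range(3)
                  if board[r][c] == 'B' and clear_path(r, c, 'B'))
    clear_w = sum(1 for r in range(3) for c in range(3)
                  if board[r][c] == 'W' and clear_path(r, c, 'W'))
    return (clear_b - clear_w) + (n_black - n_white)

Applied to the bottom-level boards from section IV (with 
black to move), this returns the values you'll see in the 
next section.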

Oh, please note that the evaluation function above is 
slightly different from the one that is used in the 
accompanying PowerPoint slides.  I simplified the
one on the slides to streamline the lecture a little bit.
The difference doesn't change things much, but the 
evaluation function used in the slides may return different 
numbers than what you'll see here as the example progresses.


VI.  The minimax procedure

Now that we have a reasonably well-defined static board 
evaluation function, how do we use it?  Remember that the 
idea behind creating this thing was to estimate the 
"goodness" of a board---in this case, the function returns a 
positive number when the board is good for us, and a negative 
number when the board is good for our opponent.  Let's apply 
the function we defined above to all the boards at the bottom 
level of the hexapawn state space we generated way back up 
there:

                                   W W W                   
                                   - - -                   
                                   B B B                   
                                  /                        
                                 /                         
                                /                          
                               /                           
                              /                            
                             /                             
                           - W W                           
                           W - -                           
                           B B B                           
                           / | \                           
                        /    |    \                        
                     /       |       \                     
                  /          |          \                  
               /             |             \               
            /                |                \            
         /                   |                   \         
       - W W               - W W               - W W       
       B - -               W B -               W - B       
       B - B               B - B               B B -       
     /   |   \              / \              /   |   \     
    /    |    \            /   \            /    |    \    
   /     |     \          /     \          /     |     \   
  /      |      \        /       \        /      |      \  
- - W  - - W  - W -    - W -   - W -    - W W  - - W  - - W
W - -  B W -  B - W    W B W   W W -    - - B  W W B  W - W
B - B  B - B  B - B    B - B   B - B    B W -  B B -  B B -

  0      1      1       -10     -1       -10     0     -1

The two boards that have been assigned a value of -10 are, of 
course, the boards that represent victories for white.  In 
the case of the leftmost of those two boards, it's a victory 
for white because it's now our turn (i.e., black's turn) and 
we can't move any of our pawns.  The rightmost of these two 
boards is a victory for white because white has moved one of 
its pawns all the way across the board.

But let's take a look at another board.  The one at the very 
left, for example, has been given a value of zero.  Yet it's 
easy for us to see that, since it's our turn and we have 
only one black pawn that we can move, if we just move that 
pawn forward one space, we'll block any possible move by 
white and win the game.  So why isn't 
that board given a value of +10?  Because in order to figure 
that out, our static board evaluation function would have to 
look ahead one more move.  But a static board evaluation 
function is exactly that---static.  It doesn't look ahead.  
If we set a limit on the number of moves we want to look 
ahead in order to play the game in a reasonable amount of 
time, but then we have our board evaluation function look 
even further ahead, we're going to eat up additional 
computing resources that we were trying to save, and we're 
also going to end up writing the same code twice.  So there's 
absolutely no advantage to having the board evaluation 
function look ahead an additional move or two or three---
instead, we should just readjust our original depth cutoff so 
that it allows us to look more moves into the future.

Now let's go back and look at those bottom two levels in our 
hexapawn state space:

       - W W               - W W               - W W       
       B - -               W B -               W - B       
       B - B               B - B               B B -       
     /   |   \              / \              /   |   \     
    /    |    \            /   \            /    |    \    
   /     |     \          /     \          /     |     \   
  /      |      \        /       \        /      |      \  
- - W  - - W  - W -    - W -   - W -    - W W  - - W  - - W
W - -  B W -  B - W    W B W   W W -    - - B  W W B  W - W
B - B  B - B  B - B    B - B   B - B    B W -  B B -  B B -

  0      1      1       -10     -1       -10     0     -1

What can we do with those numbers that have been assigned to 
the boards?  Those boards all represent possible results of a 
move by white.  Those numbers can be used to tell us which of 
those moves white is more likely to make.  For example, in 
the leftmost subtree, we might guess that white is more 
likely to make the move that results in a board with value 0 
than the moves that result in boards with value 1, because a 
board with value 0 is better for white than a board with 
value 1, which favors us.  That assumes, of course, that we 
trust our evaluation function.  Similarly, in the middle and 
rightmost subtrees, white is going to prefer the moves that 
result in a board with a value of -10 (a victory for white), 
right?  We can indicate those preferences by taking the 
minimum of the board values in each subtree and propagating 
those minimums up one level: 

                           - W W                           
                           W - -                           
                           B B B                           
                           / | \                           
                        /    |    \                        
                     /       |       \                     
                  /          |          \                  
               /             |             \               
            /                |                \            
         /                   |                   \         
       - W W               - W W               - W W       
     0 B - -           -10 W B -           -10 W - B       
       B - B               B - B               B B -       
     /   |   \              / \              /   |   \     
    /    |    \            /   \            /    |    \    
   /     |     \          /     \          /     |     \   
  /      |      \        /       \        /      |      \  
- - W  - - W  - W -    - W -   - W -    - W W  - - W  - - W
W - -  B W -  B - W    W B W   W W -    - - B  W W B  W - W
B - B  B - B  B - B    B - B   B - B    B W -  B B -  B B -

  0      1      1       -10     -1       -10     0     -1

OK, now how can we use that information?  We use it in almost 
exactly the same way as we did before.  We can figure out 
which move we should make of the three that are available to 
us by finding the maximum of the values that we just 
propagated upward.  One of those values was a 0 and the other 
two were -10.  Of course, the 0 is better for us, so we'd 
choose to make that move.

Let's go back and see what we've done here.  First we started 
with some arrangement of pawns on the board and the knowledge 
that it was our turn.  We generated all the moves we might 
make, and then we generated all the moves that our opponent 
could make after we made our move.  We arbitrarily chose to 
look only two moves ahead, but we could have looked further 
if we wanted to give up the computational resources to do so.  
We then applied our static board evaluation function to the 
bottom-most boards (i.e., the terminal leaves on the tree 
that is our state space) and assigned a numeric value 
corresponding to "goodness" to each of those boards.  Those 
bottom-most boards are each the result of a possible move by 
white.  We assumed that white would always make the best move 
it possibly could, so we propagated the minimum values up 
from the leaves to the immediate parents.  And then we 
assumed that we would want to make the best possible move 
that we could, and we chose that move by selecting the 
maximum of the values that had just been propagated upwards.

Because we chose to look ahead only two moves, the first 
propagation was of minimum values from the very bottom level, 
followed by a propagation of maximum values upward from that 
level.  If we had chosen to look ahead three moves, we'd 
first propagate maximum values from the bottom, then 
minimums, then maximums.  If we were looking ahead four 
moves, we'd start with minimums, then maximums, then 
minimums, then maximums.  And so on, and so on.  The 
procedure that we just described has a name, "minimax", and 
it's the heart of game-playing computer programs.
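
Here's the whole procedure as one last Python sketch, built 
on the moves and evaluate sketches above (depth counts moves 
ahead, us is the maximizing player, and the names are our 
own):

def minimax(board, depth, to_move, us='B'):
    # Minimax value of board with to_move about to move,
    # looking depth moves ahead.
    value = evaluate(board, to_move)
    succs = list(moves(board, to_move))
    if depth == 0 or abs(value) == 10 or not succs:
        return value                       # cutoff, or game over
    other = 'W' if to_move == 'B' else 'B'
    values = [minimax(b, depth - 1, other, us) for b in succs]
    # Our levels take the maximum; the opponent's, the minimum.
    return max(values) if to_move == us else min(values)

def best_move(board, depth=2, us='B'):
    # Choose our move by maximizing the propagated values.
    other = 'W' if us == 'B' else 'B'
    return max(moves(board, us),
               key=lambda b: minimax(b, depth - 1, other, us))

Handed the position just after white's opening move, 
best_move with the default depth of 2 reproduces the choice 
we made above: the leftmost reply, whose propagated value of 
0 beats the two -10s.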

The minimax procedure relies on two assumptions.  First, 
there must be an adequate and reasonably accurate board 
evaluation technique.  It doesn't have to be perfect, but it 
does have to work more often than not.  The second assumption 
is that the relative merit of a move becomes more obvious as 
you search deeper and deeper into the state space.  Why?  
Because if that weren't so, there wouldn't be any value in 
doing the search in the first place.  But keep in mind that 
for any given game, or at least any given implementation, one 
or the other (or both) of these assumptions may not be true.

Last revised: October 30, 2004