Probability, when you really, REALLY get down to things, is the art of
counting things. That's all. Well, more specific than "things", probability is the art of counting
possibilities.
Beginning with the basics, let's flip a coin. Assuming a fair coin, there are two possibilities... heads or tails (let's not count "edge" for the time being, thanks). What's the "probability" of having the coin land tails up? Well, we just count the possibilities. There are two. Next we count how many of those possibilities includes our result. The answer is one. One divided by two is one half, or point five, or fifty percent. We also call this a 5050 chance. All that means is that there are two possible results, and one of those results is the answer we want.
To visualize this (visual diagrams are much easier to understand, right?) let's draw out a simple tree. It's absurdly simple now, but we'll be adding complexity soon enough.
__Heads
/
Coin(
\__Tails
Now we can clearly visualize the possibilities.
Lets say that we want to flip the coin twice instead of once. What's the probability of getting tails both times? Let's draw the tree for this one...
__Heads
/
__Heads(
/ \__Tails
/
Coin( __Heads
\ /
\__Tails(
\__Tails
We're looking at the last column now. Starting from the top, we have the case where the coin landed on heads twice, next is once heads, once tails, the next is once tails, once heads, and the final one is tails twice. There are four possibilities, and only one of those possibilities is tails twice, and so we say that the probability is one fourth, or twentyfive percent. On the other hand, the chance of having the two flips be different is back to 5050, no more, no less. There are two possibilities here where the two flips were different, and four choices. Two divided by four is one half.
Something you might want to notice though, is that if you were to flip the coin, note the result, and then ask "on my next flip, what are the chances the result is the same as this one?" the answer would be 5050. Because once we know what the result of the first flip is, we can eliminate the other branch from the probability matrix.
What if there are more than just two choices? What if, for example, we were to roll dice instead of flipping a coin. Using a standard six sided die, there are six choices. Rolling the result twice then offers considerably more choices, and makes the tree quite ungainly. Nevertheless, I shall offer you the ASCII image, but this will be the last time I make such a large chart (don't worry for my sanity, it involved mostly copy/paste).
/One

Two

/One+Three
 
 Four
 
 Five
 
 \Six

 /One
 
 Two
 
Two+Three
 
 Four
 
 Five
 
 \Six

 /One
 
 Two
 
Die+Three+Three
 
 Four
 
 Five
 
 \Six

 /One
 
 Two
 
Four+Three
 
 Four
 
 Five
 
 \Six

 /One
 
 Two
 
Five+Three
 
 Four
 
 Five
 
 \Six

 /One
 
 Two
 
\Six+Three

Four

Five

\Six
From this you can count that there are 36 distinct possibilities. If we're interested in the probability of rolling a total of 12, that is to say, a six and a six, then we can look at this list and say that one out of thirtysix possibilities include two sixes. But if this is what we're looking for, then instead we can trim the tree down a little bit...
(5/6)
__Not Six
/
/ (5/6)
Die( __Not Six
\ /
\__Six(
(1/6) \__Six
(1/6)
What I just did here is compress every NonSix possibility under a single name. In order to account for this though, I needed to find the
probability of any of those results occurring, and add them together. The
possibility for any one specific number showing up on a single throw of the die is one in six. There are five numbers which are not six. Now to get the answer for what the probability for rolling a 12, I just multiply the probabilities together for each leg of the tree. (1/6) * (1/6) = (1/36)... which is the answer we got by counting results on our huge
megatree, except that this time we didn't have to count. Go
arithmetic! Woo! You made our lives easier again!
Now then, what if, instead of rolling dice, we were pulling balls out of bingo chamber. The balls in the chamber are numbered 16, and there are two of each ball. At face value, you would think that the odds of getting a total of twelve using two pulls would be the same as throwing the dice twice, right? Let's set up the tree again and find out. Note that there are 12 balls in the chamber for the first pull, and 11 balls for the second pull.
(10/12)
__Not Six
/
/ (10/11)
Ball( __Not Six
\ /
\__Six(
(2/12) \__Six
(1/11)
The probability of getting a 12 is thus (2/12) * (1/11) = (3/132).
For decimal comparison,
(1/36) = 0.027777...
(3/132) = 0.0227272727...
What happened? Why aren't the two answers the same? The answer is that removing the one ball from the bingo chamber changed the count. It shrunk the probability matrix a little. In this particular case, the answer doesn't differ much at all, however, we're using very simple examples right now. Let's move up to two more complex examples.
Take the game of chess. It's a game of perfect knowledge, meaning that both players can see the entire game at all times. Theoretically, according to game theory, this means that there should be a certain set of moves, or a system that someone can follow, that is literally unbeatable. To find it in chess, you would first need to set up a probability tree like we've been doing above, except not the simplified kind. The entire tree, for every possible move. Then for every move, you'd take a look at that tree and see which moves involve the opposing player having the opportunity to force a loss to you, and not make those moves.
You can see right away that this tree would be tremendously large. Astronomically large, even. However, it does necessarily start trimming itself down as the game goes on. Moves are limited by pieces being blocked. As units are removed from the board, they can no longer be included in the possibilities. There are so many different game possibilities that we cannot even begin to imagine them all, but they are finite, and by following certain rules, you can cull it down even more, to the point where IBM can make a super computer that, just by looking up probabilities in a tree, can defeat a human chess champion.
But not so in go. Go is a game which does not limit itself much as the game goes on. Oh true, in go, you can't place a stone where a stone has already been placed, and it's possible to remove choices that are clearly suicide, but beyond that, the entire board is open. Even more frightening to someone trying to calculate a perfect game, the board is a nineteen by nineteen grid, which means that on the first move, there are 361 choices as to where to go! Compare this to the much more limited 20 possible first moves that can be made in chess (8 pawns which have two possible choices each, plus two knights that also have two possible moves each). And on the other player's turn, there are 360 choices, and then 359 choices.
Looking at just three moves then, which is barely enough to allow a corner stone to be captured if the player placing the corner stone lets it be taken, we have 361 * 360 * 359 = 46,655,640 possible games that can be played. And the number of possible games keeps rising exponentially. It is for this reason that brute forcing the game of go is considered a nighimpossible feat, whereas brute forcing chess is getting closer and closer to possible with each new technological advance (it still hasn't been done fully, but it will one day be possible).
Computers thus suck at go. Brute forcing is more or less not an option. Instead, we must try to build artificial intelligence routines and actually "teach" the dumb machine how to play. Unfortunately, computers are not very good at this at all, and so far, the very best computer go routines are considered amateur level at best, and most go routines that you can download and play against are middle or poor amateurs.
According to http://www.reiss.demon.co.uk/webgo/rankings.htm a program named Go4++ is the highest ranking program as of the year 2001.
Errata: I've just been informed that we're not actually anywhere near close to calculating all the games of Chess. See
Bremermann Limit. Fortunately,
Deep Blue doesn't have to do this, as each turn decreases the number of games it needs to search through, and besides, in most situations, you only need to go a few turns ahead. Beyond that, you can predict likely moves that an opponent would make, and sort the games that way. That's what the probability matrix is all about.
Even using these same techniques, Go's search tree for even a few moves ahead is absurdly large. A fully realized Chess tree would be downright managable in comparison.