A Joint Probability Distribution (JPD) is a table used in probability modeling that lists the probability of every combination of outcomes for a set of events. From it, one can determine the likelihood of certain events happening, given certain other events happening.
For example, I wish to know the likelihood that my car is broken, given the fact that smoke may or may not be billowing out from under the hood. I will use two variables, Broken and Smoke, to indicate the probabilities of these events taking place. Here is a simple JPD for these variables.
====================================
         |   Broken   |  -Broken   |
====================================
  Smoke  |    0.02    |    0.01    |
====================================
 -Smoke  |    0.06    |    0.91    |
====================================
What the table is telling us is that the probability P of my car being broken (Broken) and smoke coming up from under my hood (Smoke), P(Broken ^ Smoke), is 0.02, whereas the probability of my car not being broken (-Broken), even though smoke is coming from under my hood (Smoke), P(-Broken ^ Smoke) is 0.01.
This comes in extremely handy when trying to determine the conditional probability of two events. The probability of X happening given that we know Y and only Y is written P(X|Y). This is equal to the probability of both X and Y happening divided by the probability of Y happening, P(X ^ Y)/P(Y). So, P(X|Y) = P(X ^ Y)/P(Y).
We can use the JPD to easily compute conditional probabilities by expanding it a bit, to include the probabilities of the individual events.
=======================================================
         |   Broken   |  -Broken   |                  |
=======================================================
  Smoke  |    0.02    |    0.01    | 0.03 = P(Smoke)  |
=======================================================
 -Smoke  |    0.06    |    0.91    | 0.97 = P(-Smoke) |
=======================================================
         |    0.08    |    0.92    |                  |
         | =P(Broken) |=P(-Broken) |                  |
=======================================================
The probabilities of the individual events can be obtained by summing over the rows or columns they occupy. Also, notice that P(Smoke) + P(-Smoke) = 1.00; obviously, the probability that something happens plus the probability that it doesn't must be 100%. So, using the JPD, the probability that my car is broken given the fact that smoke is coming from under my hood is:
P(Broken|Smoke) = P(Broken ^ Smoke) / P(Smoke) = 0.02 / 0.03, which is about 0.67
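The same arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of any library: the dictionary layout and the function names (joint, marginal_smoke, marginal_broken, broken_given_smoke) are my own assumptions. The numbers are the ones from the table above.

```python
# The joint table from above, keyed by (broken, smoke).
# The key layout is an assumption made for this sketch.
joint = {
    (True, True): 0.02,    # P(Broken ^ Smoke)
    (False, True): 0.01,   # P(-Broken ^ Smoke)
    (True, False): 0.06,   # P(Broken ^ -Smoke)
    (False, False): 0.91,  # P(-Broken ^ -Smoke)
}


def marginal_smoke(joint, smoke):
    """P(Smoke = smoke): sum the row over both values of Broken."""
    return sum(p for (_, s), p in joint.items() if s == smoke)


def marginal_broken(joint, broken):
    """P(Broken = broken): sum the column over both values of Smoke."""
    return sum(p for (b, _), p in joint.items() if b == broken)


def broken_given_smoke(joint, broken, smoke):
    """P(Broken = broken | Smoke = smoke) = P(both) / P(Smoke = smoke)."""
    return joint[(broken, smoke)] / marginal_smoke(joint, smoke)


print(round(marginal_smoke(joint, True), 2))            # 0.03
print(round(marginal_broken(joint, True), 2))           # 0.08
print(round(broken_given_smoke(joint, True, True), 2))  # 0.67
```

The results are rounded only for display, since floating-point sums rarely come out to exactly two decimal places.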
As nice as this is, a JPD does not end up being quite as useful as one would like. For starters, these probabilities may not be easy to compute. Second, a JPD over n true/false variables requires 2^n entries in the table (the two-variable table above has 2^2 = 4), so the table doubles in size with every variable added. Storing a JPD on a computer therefore takes a great deal of space, and the table is very hard to build in the first place. Using Bayes' Theorem is much more efficient, and is at the base of most artificial intelligence probabilistic inference systems. In fact, it is entirely possible that one may not be able to create a JPD without using Bayes' theorem in the first place.
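To get a feel for the space problem, here is a rough Python sketch. It assumes every variable is a simple true/false event, as in the car example, so a full joint table needs one entry per combination of truth values, which is 2^n entries for n variables. The function name joint_table_rows is hypothetical.

```python
from itertools import product


def joint_table_rows(variables):
    """List every true/false combination that a full joint table must cover."""
    return list(product([True, False], repeat=len(variables)))


# The car example: two variables, four table entries.
rows = joint_table_rows(["Broken", "Smoke"])
print(len(rows))  # 4

# The entry count doubles with each added variable.
for n in (2, 10, 20, 30):
    print(n, 2 ** n)
```

At 30 variables the table already has over a billion entries, each of which would have to be estimated somehow.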