Say you've got this ridiculously large system of equations that you need to solve. Well, you might try to sit down and do it by hand. After the first ten pages of paper, you'll come to the realization that computers are really good at computing things. Maybe it'd be easier to just whip up some code to do it for you. There's always MATLAB, but why not reinvent the wheel?

First, take the coefficients of all your equations and represent them as the matrix A. Next, take the constants on the right-hand side of each equation and represent them as the column vector b. The unknowns will be represented by the column vector x. This is the familiar form **Ax = b**.

The first step in solving the system is getting the A matrix into upper triangular form. You can do this via Gaussian elimination (or Givens rotations, or a variety of other methods). I'll produce methods for doing those sorts of things eventually. Once you have your upper triangular matrix A and the correspondingly modified form of b, we can get to the back substitution.

But what exactly is back substitution? Well first, we should take a look at an example system of equations. Say you've reduced a 4x4 matrix into upper triangular form like so:

| 1 2 3 4 |
| 0 5 6 7 |
| 0 0 8 9 |
| 0 0 0 10 |

This represents the system:

x_{1} + 2x_{2} + 3x_{3} + 4x_{4} = b_{1}
5x_{2} + 6x_{3} + 7x_{4} = b_{2}
8x_{3} + 9x_{4} = b_{3}
10x_{4} = b_{4}

You'll notice that there are four equations and four unknowns. This is a good thing. You may also notice that it is quite easy to solve for x_{4} - just divide b_{4} by 10 and you'll have it. So now we've got one of the unknowns solved for. Then take this x_{4} and substitute it back into all the other equations (hence the name).

Now, x_{3} is easy to solve for: compute b_{3} - 9x_{4} and divide that by 8. And now you've got x_{3}. Substitute that into all the other equations and keep going. This is a very easy algorithm to code. Here's pseudocode for it. Translate it into the language of your choosing.

loop i = n ... 1              /* loop over the rows of A, bottom to top */
    loop j = i+1 ... n        /* loop over the columns right of the diagonal */
        b_{i} -= a_{ij} * x_{j}
    endloop
    x_{i} = b_{i} / a_{ii}
endloop
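If pseudocode isn't your thing, here's one possible translation into Python - a minimal sketch that assumes A is square and upper triangular with no zeros on the diagonal, and does no pivoting or error checking (the function name is my own choice):

```python
def back_substitution(A, b):
    """Solve Ax = b for x, where A is an n x n upper triangular matrix
    (a list of row lists) and b is a list of length n."""
    n = len(b)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):       # rows of A, bottom to top
        s = b[i]
        for j in range(i + 1, n):        # subtract off the already-known terms
            s -= A[i][j] * x[j]
        x[i] = s / A[i][i]               # divide by the diagonal entry
    return x

# The 4x4 example from above, with b chosen so that every x_i = 1:
A = [[1, 2, 3, 4],
     [0, 5, 6, 7],
     [0, 0, 8, 9],
     [0, 0, 0, 10]]
b = [10, 18, 17, 10]
print(back_substitution(A, b))  # [1.0, 1.0, 1.0, 1.0]
```

Note that this version accumulates the updates in a local variable instead of overwriting b in place, so the caller's right-hand side survives the call; otherwise it is the pseudocode line for line.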

What about a brief analysis of back substitution? Sure, I can do that. First off, the number of operations for back substitution is as follows: n divisions, (n-1)n/2 multiplications, and (n-1)n/2 additions & subtractions. The time required using one processor is O(n^2). If we've got a whole bunch of processors (the same number as the number of equations), we can reduce this to O(n).

Back substitution is a relatively slow process. It's a fan-in algorithm, which means we are essentially solving things in a binary tree-like structure. This isn't particularly fast, and it does not parallelize well either. Luckily, there are other methods which are faster.