An important consideration in the design of parallel algorithms is cost optimality. The amount of work an algorithm performs is the run time of the algorithm multiplied by the number of processors it uses. A conventional (sequential) algorithm may be thought of as a parallel algorithm designed for one processor. An algorithm is said to be cost optimal if the amount of work it does is asymptotically the same as that of the best known sequential algorithm.
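As a concrete illustration of the definition above, here is a minimal sketch (the parallel-sum figures are an assumed example, not from the text) that computes work as run time multiplied by processor count and compares it against a sequential baseline:

```python
import math

def work(run_time: float, processors: int) -> float:
    """Work = run time multiplied by the number of processors used."""
    return run_time * processors

# Assumed example: a tree-style parallel sum of n numbers finishes in
# about log2(n) steps on n processors, while one processor needs n steps.
n = 1024
parallel_work = work(run_time=math.log2(n), processors=n)  # ~ n log n
sequential_work = work(run_time=n, processors=1)           # ~ n
# n log n > n, so this parallel sum is not cost optimal.
print(parallel_work, sequential_work)
```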

It is easy to adapt a cost optimal algorithm that runs on P processors in time N to run on P/n processors in time Nn (the most basic way being to simulate the running of the P processors, assigning each real processor an n-sized pool of virtual processors to simulate). Pathologically, it may be run on a single-processor machine with the same asymptotic run time as the best sequential algorithm.
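The simulation argument above can be sketched as follows. This is an illustrative sequential model (the `step` callables and helper names are assumptions, not from the text): each real processor owns a pool of virtual processors and advances each of them once per parallel time step, so one parallel step costs as many real steps as the pool is large.

```python
def simulate_step(virtual_steps, real_count):
    """Simulate one parallel time step of P virtual processors on
    real_count real processors. virtual_steps is a list of P callables,
    one per virtual processor; each real processor gets a pool of
    P/real_count virtuals and steps every one of them."""
    pools = [virtual_steps[i::real_count] for i in range(real_count)]
    for pool in pools:          # in reality the pools run concurrently
        for step in pool:       # each real processor steps its pool
            step()

# Demo: 8 virtual processors simulated on 2 real ones (pools of 4).
P = 8
counters = [0] * P
steps = [(lambda i=i: counters.__setitem__(i, counters[i] + 1))
         for i in range(P)]
simulate_step(steps, real_count=2)
print(counters)  # every virtual processor advanced exactly one step
```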

It is possible for an algorithm to be faster than the best sequential algorithm without being cost optimal. For example, we can sort a sequence of length n using n! processors, each of which permutes the sequence based on its processor id (constant time) and then checks to see if the result is sorted (**O**(n)), for an asymptotic run time of **O**(n). The best possible comparison-based sorting algorithm has a run time of **O**(n log n), making this algorithm a factor of log n faster than the best sequential algorithm. Unfortunately, this parallel algorithm requires n! processors^{1} (as opposed to one for the sequential algorithm), meaning that it does n!/log(n) times more work than the sequential algorithm. If n! processors are not available (which is pretty much guaranteed for all but tiny values of n), this algorithm will run significantly more slowly than the conventional one.
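The permutation sort described above can be sketched sequentially (each loop iteration plays the role of one of the n! virtual processors, so the sequential run time is the **O**(n · n!) work bound rather than the parallel **O**(n) time):

```python
from itertools import permutations

def permutation_sort(seq):
    """Try every permutation (one per virtual processor); return the
    one that passes the O(n) is-it-sorted check."""
    for perm in permutations(seq):
        if all(perm[i] <= perm[i + 1] for i in range(len(perm) - 1)):
            return list(perm)

print(permutation_sort([3, 1, 2]))  # [1, 2, 3]
```

Run sequentially like this, the algorithm is dramatically worse than a conventional sort, which is exactly the point: speed without cost optimality evaporates when the processors are not actually available.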

1 - There exist algorithms that sort in **O**(n) time on n^{2} processors, but they are much harder to explain. Rather than blindly permuting, they compare each entry to each other entry (n^{2} comparisons, one per processor), store the results, and use them to build a sorted list in constant time.
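The footnote's idea can be sketched sequentially as a rank (enumeration) sort. In this model each of the n^{2} comparisons would be done by its own processor in one step; here they are counted in a loop, and ties are broken by original position so equal entries get distinct ranks:

```python
def rank_sort(seq):
    """Place each element at the position given by its rank, i.e. the
    number of entries that must come before it (ties broken by index)."""
    n = len(seq)
    out = [None] * n
    for i, x in enumerate(seq):
        # In parallel, these n comparisons are one step on n processors.
        rank = sum(1 for j, y in enumerate(seq)
                   if y < x or (y == x and j < i))
        out[rank] = x
    return out

print(rank_sort([4, 2, 2, 9]))  # [2, 2, 4, 9]
```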