High Performance Fortran (HPF) is an extension of Fortran 90 that exploits what is called data parallelism. Data parallelism exists when a single operation is applied over a collection of data.
Array operations in Fortran 90 allow us to write code that is data parallel. Suppose, for example, that we want to add 1 to every element of an array A and put the results in another array, B. In Fortran 90, we can simply write:
B = A + 1
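To see the statement in context, here is a minimal, self-contained sketch; the array names A and B and the size n = 5 are illustrative choices, not part of the original example.

```fortran
program array_add
  implicit none
  integer, parameter :: n = 5
  integer :: A(n), B(n), i

  A = (/ (i, i = 1, n) /)   ! A = 1, 2, 3, 4, 5

  ! The single array statement below is equivalent to the loop
  !   do i = 1, n
  !     B(i) = A(i) + 1
  !   end do
  ! but expresses the whole operation at once, exposing the parallelism.
  B = A + 1

  print *, B                ! prints 2 3 4 5 6
end program array_add
```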
Clearly, a sufficiently smart compiler can find the parallelism in this statement. Fortran 90 also has a base set of intrinsic operations that can be implemented in parallel, such as matrix multiplication (MATMUL) and vector dot products (DOT_PRODUCT). However, there are limitations to Fortran 90 that HPF augments.
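A brief sketch of two of these parallelizable intrinsics; the particular vectors and matrices here are illustrative.

```fortran
program intrinsics_demo
  implicit none
  real :: u(3), v(3)
  real :: M(2,2), N(2,2), P(2,2)

  u = (/ 1.0, 2.0, 3.0 /)
  v = (/ 4.0, 5.0, 6.0 /)
  print *, dot_product(u, v)   ! 1*4 + 2*5 + 3*6 = 32.0

  ! RESHAPE fills column by column, so M below is the 2x2 identity.
  M = reshape((/ 1.0, 0.0, 0.0, 1.0 /), (/ 2, 2 /))
  N = reshape((/ 1.0, 2.0, 3.0, 4.0 /), (/ 2, 2 /))
  P = matmul(M, N)             ! P = N, since M is the identity
  print *, P
end program intrinsics_demo
```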
Consider, for example, zeroing out the diagonal of a square matrix A. There is no single array operation in Fortran 90 that can do this. In HPF, however, there is a FORALL statement that allows us to say:
FORALL (i = 1:n, j=1:n, i == j) A(i,j) = 0
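A self-contained sketch of the FORALL statement above; the 4x4 matrix and the value n = 4 are illustrative. (FORALL was later adopted into Fortran 95, so this compiles with ordinary Fortran compilers as well.)

```fortran
program forall_demo
  implicit none
  integer, parameter :: n = 4
  integer :: A(n,n), i, j

  A = 1   ! fill the whole matrix with ones

  ! Assign to every (i,j) satisfying the mask i == j; the iterations
  ! are independent, so a compiler is free to run them in parallel.
  forall (i = 1:n, j = 1:n, i == j) A(i,j) = 0

  print *, (A(i,i), i = 1, n)   ! diagonal is now 0 0 0 0
end program forall_demo
```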
HPF also provides directives that allow finer control of the concurrency. An HPF directive looks like a Fortran comment, and can thus be ignored by conventional compilers.
A basic HPF directive is the PROCESSORS directive, which declares a number of "abstract processors", usually (but not necessarily) the same as the number of actual physical processors. For example, this directive declares an array of 16 abstract processors:
!HPF$ PROCESSORS p(16)
Another useful directive is DISTRIBUTE, which distributes the data in an array across the processors in some combination of cyclic and block modes. For example, this directive distributes the data in array A in block mode, and maps the blocks onto the previously declared processor array p:
!HPF$ DISTRIBUTE A(BLOCK) ONTO p
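The following sketch puts the two directives together in a complete program. The array size and processor count are illustrative; because the directives are comments, a conventional Fortran compiler simply ignores them and produces an ordinary sequential program with the same behavior.

```fortran
program hpf_sketch
  implicit none
  integer, parameter :: n = 64
  real :: A(n), B(n)
!HPF$ PROCESSORS p(16)
!HPF$ DISTRIBUTE A(BLOCK) ONTO p
!HPF$ DISTRIBUTE B(BLOCK) ONTO p

  A = 1.0
  ! With A and B in block distribution, each abstract processor
  ! holds n/16 = 4 contiguous elements and updates only its own
  ! block, so this array statement needs no communication.
  B = A + 1.0
  print *, sum(B)   ! 128.0
end program hpf_sketch
```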
HPF, with its largely implicit model, allows Fortran programmers to quickly turn sequential code into parallel code. The use of directives lets programmers experiment with various data distribution schemes without having to change much code. Another advantage of HPF is that a program behaves exactly the same way whether it runs on a single computer or on multiple computers.
However, there is parallelizable code that is not strictly data parallel, and this will not be recognized by the compiler. Also, in this implicit model, communication costs are hidden, and code that communicates inefficiently is often not easily detected.
Ian Foster, Designing and Building Parallel Programs, Addison-Wesley