
A rule of thumb in computing is that a program will spend 90% of its execution time in 10% of its code. One consequence is the use of caching to keep close at hand those instructions which have been used recently, have been used often, or are physically near the current instruction.

Two types of locality have been noted: temporal locality and spatial locality. A data structure which uses locality of reference to achieve better long-term performance is the splay tree (a binary search tree with extra rules).

Locality of reference is the claim that access to storage when computing is typically highly localized, i.e. that accesses which occur within a short time of each other tend to refer to memory addresses which are close to each other.

Since it is (usually, somewhat, hopefully, maybe, often, ...) true, much of computer architecture and systems architecture is about exploiting locality of reference. Caches are used for both instructions and data. A cache is much smaller but faster than main memory, and locality of reference means cache misses are relatively rare, making caches attractive for improving performance. Operating systems typically employ virtual memory, using main memory as a cache for disk storage. Again, this works because the desired page is almost always in memory, thanks to locality. Good OSes also try to ensure that files are stored contiguously on disk: locality of reference then means access to the files is faster, since reading nearby sectors is generally faster. (Interleaving doesn't come into this; think of "nearby sectors" as referring not to "geographic" layout but to the interleaved layout.)

A modern CPU typically accesses memory for both instructions (code, program) and for data. Codes which contain few branches will obviously exhibit high locality of reference: unless a branch is taken, the next instruction fetched is adjacent to the current instruction! Even most branches are typically to near locations (loops with a small- to medium-sized body generally account for a large proportion of execution time). So locality of reference for instruction fetching is generally assured. (Note, however, that overly structured codes can spend a large portion of their time jumping between subroutines in different modules, and exhibit correspondingly poor locality of reference).

Data access is also often highly localized. In cases where this is not so (and if performance matters!), improvements to algorithms can help. After all, today's architectures are optimized for codes with locality of reference, so it makes sense to force locality onto your code!

For example, suppose we store matrices in row-major order, so that the elements of each row are adjacent in memory. There are two ways to write a routine to add matrices: one can add all pairs of elements in row-major order, or in column-major order. The row-major loop will exhibit excellent locality of reference, while the column-major one will exhibit terrible locality of reference. For large matrices, the difference will be significant. Obviously, one should pick the row-major loop.
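A minimal sketch of the two loop orders in C (the matrix size N and function names here are illustrative, not from any particular library). Both routines compute the same sums; only the traversal order, and hence the memory access pattern, differs:

```c
#include <stddef.h>

#define N 512  /* illustrative size; large enough that a row doesn't fit in L1 */

/* Matrices are stored in row-major order: element (i, j) lives at m[i*N + j]. */

/* Row-major traversal: the inner loop walks adjacent addresses, so every
 * byte of each fetched cache line is used. */
void add_row_major(const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            c[i*N + j] = a[i*N + j] + b[i*N + j];
}

/* Column-major traversal: consecutive accesses are N*sizeof(double) bytes
 * apart, so nearly every access touches a different cache line. */
void add_col_major(const double *a, const double *b, double *c)
{
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            c[i*N + j] = a[i*N + j] + b[i*N + j];
}
```

Both produce identical results; on large matrices only the row-major version streams through memory sequentially.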

What about matrix multiplication? Here, the usual algorithm traverses one matrix in row-major order and the other in column-major order! Obviously, we cannot traverse both matrices efficiently. A change of algorithm is called for; various "blocking" strategies produce the end result by working on smaller blocks of the matrices, which exhibit better locality.
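One such blocking strategy can be sketched as follows (the block size BS and function names are assumptions for illustration; real implementations tune the block size to the cache). The naive loop is shown for contrast; the blocked version does the same arithmetic, but tile by tile, so each small tile stays cache-resident while it is reused:

```c
#include <stddef.h>
#include <string.h>

/* Naive multiply: for each output element, B is walked down a column
 * (stride n), so its accesses have poor locality for large n. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    memset(c, 0, n * n * sizeof *c);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                c[i*n + j] += a[i*n + k] * b[k*n + j];
}

#define BS 32  /* illustrative block side: three BSxBS tiles should fit in cache */

/* Blocked multiply: the same sums, accumulated tile by tile, so each
 * BSxBS tile of a, b, and c is reused many times while it is in cache. */
void matmul_blocked(size_t n, const double *a, const double *b, double *c)
{
    memset(c, 0, n * n * sizeof *c);
    for (size_t ii = 0; ii < n; ii += BS)
        for (size_t kk = 0; kk < n; kk += BS)
            for (size_t jj = 0; jj < n; jj += BS)
                for (size_t i = ii; i < ii + BS && i < n; i++)
                    for (size_t k = kk; k < kk + BS && k < n; k++) {
                        double aik = a[i*n + k];
                        for (size_t j = jj; j < jj + BS && j < n; j++)
                            c[i*n + j] += aik * b[k*n + j];
                    }
}
```

Note that the innermost loop of the blocked version now walks both b and c in row-major order; the column-major traversal has been confined to steps of one small tile at a time.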
