display | more...

The cardinal sin of computer programming. You're supposed to initialize all memory carefully before using it. Many languages even have features to force you to initialize all your memory; others have tools like Purify (or BoundsChecker) which will scream at you for using uninitialized memory.

Of course, the global prohibition is nonsense. If initialization is very expensive, and if you're either:

  • not going to be using all the memory you allocate:
    For instance, you allocate a large vector of objects, of which you use a small portion by random access (so you don't want to initialize all objects, but you also can't initialize fewer objects, because you don't know which indices you'll be using!).
  • prefer to initialize items as required, to prevent a huge slowdown after the initial allocation,
then you need to use uninitialized memory. The following technique is part of folklore. It's described in Jon Bentley's book Programming Pearls (not to be confused with the Camel book Programming Perl!), I think, but didn't originate there. It also makes for a great interview question.


The idea is to use a huge array of memory without initializing more than a constant size of memory at any stage. This is what a filesystem could do to a partition to speed up formatting. The cost is 2 extra arrays of indices of the same number of elements. So, assuming indices are words of 4 bytes and objects are 64 bytes, you'll need to allocate 72 instead of 64 bytes for each element. On the other hand, you don't need to initialize anything, not even the two auxiliary arrays!

Here's the setting. The array we wish to allocate is A, with N elements. Elements of A may be very large. We allocate 2 additional arrays of indices into A, each with N elements, call them B and C. And we'll use one additional index n_used, giving the number of elements of A which have already been used.

Whenever a new element of A is accessed, we'll initialize it. That will be the n_used'th element of A which is used (and we'll increment n_used afterwards). The idea is to keep, in Bi, the value of n_used when Ai was first used. But how can we tell if Aj has already been used? Bj will tell us when it was used, but only if it's already been used! So CB will store the value of i.

It works, because we know that the elements of C already used are precisely C0, C1, ..., Cn_used-1. So we can precisely determine whether Ai has already been used or not, by testing if Bi points back at a "safe" value in C, and that value also points back at Bi:

Before using Ai:

if Bi ≥ n_used or CB ≠ i:
; Ai hasn't yet been used
initialise Ai.
Cn_used := i. ; i was the n_used'th element used.
Bi := n_used. ; Point back at n_used.
n_used := n_used+1. ; We've just used another element.
Ai may now safely be used.
Purify and its ilk will scream UMR and similar curses at you. Ignore them -- they're just jealous.

The only initialization needed is to set n_used to 0 after allocation. And only a constant amount of work is required for every used of an element of A. Of course, this work is greater than that required for just using A, but it's still constant.

If the sizeof Ai is quite small compared to the size of an index, we're still wasting too much memory. The solution is quite simple: bin together every k indices. Allocate N'=N/k elements for each of the arrays B,C. When you want to access entry i, check that bin j=i div k has been initialized as above; if it hasn't, initialize Aj*k, ..., Aj*k+k-1 and set Cn_used and Bj to indicate that this has been done. In a word, treat every k contiguous elements of A as a single object, and deal with these N/k "objects" using the basic technique.

When initialization of an entire array is too expensive, the above solution does work. And the next time someone mentions over lunch that you can't use uninitialized memory, you'll have a comeback that will make all other diners want to leave.

Log in or register to write something here or to contact authors.