An S-box (substitution box) is a table which maps an input onto an output in a cipher, typically a block cipher (though a few stream ciphers do use S-boxes internally). There are actually two different kinds of S-boxes, and a large number of tricks and techniques for making them secure and efficient.
First, ciphers that are SP networks use a different kind of S-box than Feistel ciphers do. These ciphers actually do need an invertible S-box, because the design involves putting each block of data through the S-box once each round (and then doing the reverse, with the inverted S-box, when decrypting). Typically, these are 8x8 S-boxes (that is, they take in 8 bits and output 8 bits). Examples of ciphers that use this technique are Rijndael, W (the cipher built into the Whirlpool hash function), and SAFER. Using a larger S-box (such as 16x16 or 32x32) is of course possible, but the large amount of memory that would be needed has prevented this from being common (for example, a single 16x16 S-box would require 128 kilobytes). Serpent actually uses a large set of 4x4 S-boxes. S-boxes of this variety are almost always fixed, due in large part to the general design of an SP network cipher.
Secondly, there are S-boxes like those used in DES, CAST-128, Twofish, and many other ciphers. These take in a smaller input (again, typically a byte, though not always), and output a larger amount (16-64 bits). These S-boxes are not, by design, invertible (they don't have to be); they are designed to maximize the diffusion of their input, and are often used in one way functions. That is because Feistel ciphers do not need, or want, their internal functions to be invertible (read the writeup in Feistel network for how this works).
So, now we get to the security bit. No matter how an S-box is being used, there are some attacks that we want to try and prevent, including linear cryptanalysis and differential cryptanalysis. How to do this. First, we want to make it hard (or impossible) to find any kind of linear approximation for the S-box. For example, if half of the time the output is 3*input+9, the S-box isn't going to do a very good job of things. Even much more subtle can cause a lot of problems, so a great deal of time is spent classifying and measuring the linearity of an S-box. Similar measures are used to prevent the various known differential attacks.
Finally, we reach designing efficient S-boxen. In software, bits are bits, and the exact values don't matter. But in hardware, it's a totally different story. A randomly generated S-box (or, even worse, a key-dependent one), requires that a hardware implementation keep the values of the S-box in memory, which is a) expensive, and b) slow. To get around this, a lot of research has been done on designing S-boxes which are both secure, and can be represented in the smallest number of gates possible. A common technique for doing this is to define a tiny Feistel cipher, which 'encrypts' the input using a fixed key to produce the S-box output. Of course, the design of this mini-cipher is carefully chosen so it produces a secure S-box. In software, one would just precompute all of the values, and store them in a table, since memory use isn't a big concern there.
I have just barely scratched on the surface of modern S-box design and use. For further reading on this topic, I recommend the following as a good start:
- Handbook of Applied Cryptography
- M. Matsui, "New Structure of Block Ciphers with Provable Security Against Differential and Linear Cryptanalysis", Fast Software Encryption 1995
- http://home.ecn.ab.ca/~jsavard/crypto/co4513.htm
- The Whirlpool Documentation Package: http://planeta.terra.com.br/informatica/paulobarreto/whirlpool.zip