A superscalar processor is one that accepts more than one instruction into the pipeline per clock cycle.

The Pentium 4 processor, for example, can dispatch up to six micro-operations to its execution units per clock cycle, while retiring the results of at most three per clock cycle. Why it retires fewer than it dispatches per clock and still performs well is left as an exercise for the reader.

This is different from a VLIW (Very Long Instruction Word) processor. A VLIW processor also accepts multiple operations per cycle, but they must be grouped ahead of time into a single long instruction, and certain 'slots' of that long instruction can perform only certain operations. A series of add instructions, for example, may be able to use only two of, say, the four execution slots available per long instruction. This design relies on compiler intelligence to parallelize code, whereas a superscalar processor parallelizes the code dynamically, requiring no special compilation.
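
The slot restriction can be made concrete with a small sketch. The Python below is a toy model, not a description of any real VLIW machine: the four-slot layout (two ALU slots, one memory slot, one branch slot), the names SLOT_LAYOUT, pack_bundles and slot_utilization, and the greedy in-order packing are all assumptions made for illustration. It also assumes every operation is independent of the others, which a real compiler could not take for granted.

```python
from collections import Counter

# Hypothetical slot layout for one long instruction word (4 slots total).
# This is an illustrative assumption, not the layout of a real processor.
SLOT_LAYOUT = {"alu": 2, "mem": 1, "branch": 1}


def pack_bundles(ops, layout=SLOT_LAYOUT):
    """Greedily pack operations, in program order, into long instructions.

    `ops` is a list of operation kinds (keys of `layout`). All operations
    are assumed to be independent of one another. A new bundle is started
    whenever the current one has no free slot of the required kind.
    """
    bundles = []
    current = []
    used = Counter()
    for kind in ops:
        if kind not in layout:
            raise ValueError(f"no slot can execute operation kind {kind!r}")
        if used[kind] >= layout[kind]:
            # No free slot of this kind: close the bundle and start another.
            bundles.append(current)
            current = []
            used = Counter()
        current.append(kind)
        used[kind] += 1
    if current:
        bundles.append(current)
    return bundles


def slot_utilization(bundles, layout=SLOT_LAYOUT):
    """Fraction of available slots that actually hold an operation."""
    total_slots = len(bundles) * sum(layout.values())
    filled = sum(len(bundle) for bundle in bundles)
    return filled / total_slots if total_slots else 0.0


if __name__ == "__main__":
    adds = pack_bundles(["alu"] * 6)
    mixed = pack_bundles(["alu", "mem", "alu", "branch"])
    print(len(adds), slot_utilization(adds))
    print(len(mixed), slot_utilization(mixed))
```

Under these assumptions, six independent adds need three long instructions and leave half of the slots empty, while a mix of two ALU operations, one memory operation and one branch fills a single long instruction completely. A superscalar processor makes the equivalent decisions in hardware at run time instead of having them fixed in the instruction encoding.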