SSE2 - Everything2.com

SSE2 is Intel's third set of SIMD enhancements to their Pentium line of x86 CPUs, following in the footsteps of MMX and the original SSE. SSE2 is available on Intel Pentium 4 and later CPUs (including P4 Xeon CPUs and Pentium M CPUs, the heart of Intel's Centrino platform) and AMD's line of Athlon64 processors. SSE stands for Streaming SIMD Extensions.

SSE2, originally marketed as a way to "make the Internet faster", builds on the original SSE instruction set. General consensus is that the original SSE extensions were not very useful: the new instructions they offered were not a major improvement over existing computational methods, and the added overhead of checking for the appropriate CPU before issuing the new instructions made their use more cumbersome.

In order to understand the exact improvements offered by SSE and SSE2, one needs to understand the basics of how CPUs work:

The bit size of the CPU determines the size of the numbers the CPU works fastest on. Current desktop computers are all "32 bit¹", which means they operate natively on 32 bits of data at a time. That is, they work fastest when working on 32 bit numbers. Any number larger than 32 bits has to be broken into parts in order to fit into the 32 bit CPU, which is comparatively slow. In addition, the original x86 CPUs were designed to work only on whole numbers (integers). Any fractions (floating point numbers) required some trickery to compute. Modern CPUs have a floating point unit, which is a part of the CPU designed to work on fractional numbers.
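To make "broken into parts" concrete, here is a minimal sketch in C of adding two 64 bit numbers using nothing but 32 bit arithmetic. The u64_parts type and the add_u64_parts function are hypothetical names made up for this illustration. On x86, a compiler typically turns this pattern into an add followed by an add-with-carry (adc).

```c
#include <stdint.h>

/* A 64 bit value held as two 32 bit halves, roughly the way a 32 bit CPU
   has to see it. (Hypothetical type, for illustration only.) */
typedef struct {
    uint32_t lo;  /* low 32 bits  */
    uint32_t hi;  /* high 32 bits */
} u64_parts;

/* Add two 64 bit values using only 32 bit operations. */
u64_parts add_u64_parts(u64_parts a, u64_parts b)
{
    u64_parts sum;
    uint32_t carry;

    sum.lo = a.lo + b.lo;          /* wraps modulo 2^32 on overflow      */
    carry  = (sum.lo < a.lo);      /* 1 if the low half wrapped around   */
    sum.hi = a.hi + b.hi + carry;  /* the carry feeds into the high half */
    return sum;
}
```

Adding 1 to 4,294,967,295 (low half all ones, high half zero) wraps the low half to 0 and carries 1 into the high half, which is two dependent steps where a 64 bit register would need one.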

SSE's major contribution to the Pentium line was the addition of eight new 128 bit³ registers, the places CPUs store numbers while they're working on them. These new registers were four times the size of the standard 32 bit registers. However, the design of the original SSE only allowed them to be treated as four 32 bit numbers (single-precision floating point values) all bunched (or packed) together, rather than as one 128 bit number or two 64 bit² numbers. This meant that even though the CPU had a shiny new set of giant registers and a fancy new set of instructions, it still operated on 32 bit values, just four of them at a time.
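As a rough sketch of what "four packed 32 bit numbers" looks like from C, the function below adds four floats to four floats through one 128 bit register operation. It assumes a GCC or Clang style compiler that defines __SSE__ when SSE is enabled and falls back to a plain loop otherwise. The name add4_floats is invented for this example.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>   /* original SSE intrinsics */
#endif

/* out[i] = a[i] + b[i] for i = 0..3 */
void add4_floats(const float a[4], const float b[4], float out[4])
{
#if defined(__SSE__)
    __m128 va = _mm_loadu_ps(a);             /* four packed 32 bit floats  */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* one add covers all 4 lanes */
#else
    size_t i;
    for (i = 0; i < 4; ++i)                  /* portable fallback          */
        out[i] = a[i] + b[i];
#endif
}
```

Each lane is still only 32 bits wide, which is the limitation described above.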

SSE2 introduced a set of instructions that worked on those 128 bit registers SSE added but allowed the CPU to work on 64 bit numbers (both double-precision floating point values and 64 bit integers), rather than only 32 bit numbers. This meant those new registers could be put to use calculating numbers larger than the standard 32 bit registers could handle, and much faster than splicing together two 32 bit registers.
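A small sketch of the same idea with SSE2: two 64 bit values per register, either as doubles or as integers. As before, this assumes a compiler that defines __SSE2__ when SSE2 is enabled (GCC and Clang do so by default on x86-64) and uses a plain fallback elsewhere. The function names are made up for illustration.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Add two pairs of 64 bit doubles. */
void add2_doubles(const double a[2], const double b[2], double out[2])
{
#if defined(__SSE2__)
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
#else
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}

/* Add two pairs of 64 bit unsigned integers (wrapping on overflow). */
void add2_u64(const uint64_t a[2], const uint64_t b[2], uint64_t out[2])
{
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi64(va, vb));
#else
    out[0] = a[0] + b[0];
    out[1] = a[1] + b[1];
#endif
}
```

Note that the integer version handles a sum that crosses the 32 bit boundary without any manual carry handling.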

The applications of these new instructions and registers are more academic than practical. Most applications are fine operating on standard 32 bit numbers, even if they occasionally require the use of fractional numbers or numbers larger than 32 bits. The primary use for the new SSE2 instructions is mathematical computation requiring high degrees of precision or large numbers. For example, operations such as calculating the square root of a number or dividing large or fractional numbers are much faster when using SSE2 than without. For applications which perform these types of operations frequently, the use of SSE2 can speed up the process by an order of magnitude.

In addition to mathematical enhancements, SSE2 adds new instructions for moving data from the SSE registers to memory and back, and for controlling when and where numbers are cached by the CPU. The vast majority of these enhancements are for serious mathematical purposes only, and the additional overhead of checking that these instructions are available (as well as providing alternatives if necessary) makes writing programs which use them that much more complex.
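The "check, then provide an alternative" pattern might look something like the sketch below. It assumes GCC or Clang on x86, where the __builtin_cpu_supports builtin is available. On any other compiler or CPU it simply reports no SSE2 and uses the plain version. All function names here (cpu_has_sse2, scale_scalar, scale_sse2, pick_scale) are hypothetical.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Runtime check: does the CPU we are running on support SSE2? */
int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;   /* assumption: treat unknown platforms as "no SSE2" */
#endif
}

/* Plain version: multiply every element by k. */
void scale_scalar(double *x, size_t n, double k)
{
    size_t i;
    for (i = 0; i < n; ++i)
        x[i] *= k;
}

#if defined(__SSE2__)
/* SSE2 version: two doubles per step, leftovers handled one at a time. */
void scale_sse2(double *x, size_t n, double k)
{
    __m128d vk = _mm_set1_pd(k);
    size_t i = 0;
    for (; i + 2 <= n; i += 2)
        _mm_storeu_pd(x + i, _mm_mul_pd(_mm_loadu_pd(x + i), vk));
    for (; i < n; ++i)
        x[i] *= k;
}
#endif

typedef void (*scale_fn)(double *, size_t, double);

/* Pick an implementation once, then call through the pointer. */
scale_fn pick_scale(void)
{
#if defined(__SSE2__)
    if (cpu_has_sse2())
        return scale_sse2;
#endif
    return scale_scalar;
}
```

A caller would do something like pick_scale()(values, count, 2.0). In a real program the SSE2 routine would normally live in a separately compiled file, so that the rest of the code stays safe to run on older CPUs. That extra build plumbing is part of the complexity mentioned above.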

It could be argued that the new instructions allow for more CPU intensive compression algorithms for video and music, thus making for smaller media files. Smaller files, in turn, would download faster, and thus seem to make the Internet faster. One could also argue that it's just a marketing ploy.

But wait! If I've already got eight new fancy 128 bit registers in my PC right now, why is everyone telling me I need to upgrade to a 64 bit CPU?

Well, first off, the new 128 bit SSE registers are only usable by SSE and SSE2 instructions. A 64 bit CPU has all of its important registers set to 64 bits in size, so no special programming is needed to make use of larger numbers. Also, 64 bit CPUs can use 64 bit memory addresses, meaning they can address far more RAM than current 32 bit CPUs. The 128 bit SSE registers in your PC are much faster for working with 64 bit numbers, but are not as easy to program for as a 64 bit CPU would be.

¹The largest number 32 bits can hold is 4,294,967,295 (2^32 - 1).
²The largest number 64 bits can hold is 18,446,744,073,709,551,615 (2^64 - 1).
³The largest number 128 bits can hold is approximately 3.4028236692093846346337460743177 x 10^38 (2^128 - 1).
