SSE3 is Intel's third generation of Streaming SIMD Extensions. It adds 13 new instructions, most of which are single instruction, multiple data (SIMD) operations on the 128-bit XMM registers. Intel introduced SSE3 in 2004 with the Prescott core revision of its Pentium 4 CPU.

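Here is a brief outline of what each new instruction does. Data elements are counted from the low-order end of a register, so the "first" element occupies bits 31-0 (single precision) or bits 63-0 (double precision). For the arithmetic instructions, the first operand is also the destination: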
  • FISTTP (Store Integer and Pop from x87-FP with Truncation) behaves like the FISTP instruction but always truncates (rounds toward zero), irrespective of the rounding mode specified in the floating-point control word (FCW).
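  • MOVSHDUP loads/moves 128 bits and duplicates the second and fourth 32-bit data elements: the second element is written to the first and second positions of the result, and the fourth element to the third and fourth positions.
  • MOVSLDUP loads/moves 128 bits and duplicates the first and third 32-bit data elements: the first element is written to the first and second positions of the result, and the third element to the third and fourth positions.
  • MOVDDUP loads/moves 64 bits (bits 63-0 if the source is a register) and writes them to both the lower and upper halves of the 128-bit destination register.
  • LDDQU is a special 128-bit unaligned integer load designed to perform better than MOVDQU when the data crosses a cache line boundary.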
  • ADDSUBPS takes two 128-bit operands, each holding four single precision values. It adds the corresponding elements in the second and fourth positions and subtracts them in the first and third positions (first operand minus second operand).
  • ADDSUBPD takes two 128-bit operands, each holding two double precision values. It adds the second pair of quadwords and subtracts the first pair (first operand minus second operand).
  • HADDPS performs a single precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the third and fourth elements of the first operand; the third by adding the first and second elements of the second operand; and the fourth by adding the third and fourth elements of the second operand.
  • HSUBPS performs a single precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the fourth element of the first operand from the third element of the first operand; the third by subtracting the second element of the second operand from the first element of the second operand; and the fourth by subtracting the fourth element of the second operand from the third element of the second operand.
  • HADDPD performs a double precision addition on contiguous data elements. The first data element of the result is obtained by adding the first and second elements of the first operand; the second element by adding the first and second elements of the second operand.
  • HSUBPD performs a double precision subtraction on contiguous data elements. The first data element of the result is obtained by subtracting the second element of the first operand from the first element of the first operand; the second element by subtracting the second element of the second operand from the first element of the second operand.
  • MONITOR sets up an address range used to monitor write-back stores.
  • MWAIT enables a logical processor to enter into an optimized state while waiting for a write-back store to the address range set up by the MONITOR instruction.
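
To make the difference between FISTTP and FISTP concrete, here is a small C sketch of the two conversion behaviours. It is a portable model, not x87 code: `fisttp_model` and `fistp_nearest_model` are hypothetical names, and the second function assumes the x87 default rounding mode (round to nearest, ties to even). C's cast to an integer type already truncates toward zero, which is exactly what FISTTP provides. Before SSE3, x87 code generated for such a cast typically had to save the FCW, switch it to truncation, execute FISTP and then restore the FCW.

```c
#include <assert.h>

/* Hypothetical model of FISTTP: convert to an integer by truncating toward
 * zero, whatever the FCW rounding control says. C's cast to an integer type
 * has exactly these semantics. Precondition: x is within long long range. */
long long fisttp_model(double x) {
    assert(x >= -9223372036854775808.0 && x < 9223372036854775808.0);
    return (long long)x;
}

/* Hypothetical model of FISTP under the x87 default rounding mode
 * (round to nearest, ties to even). Same precondition as above. */
long long fistp_nearest_model(double x) {
    assert(x >= -9223372036854775808.0 && x < 9223372036854775808.0);
    long long t = (long long)x;  /* truncated toward zero */
    double frac = x - (double)t; /* exact; same sign as x, |frac| < 1 */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;                  /* round up, or break a tie toward even */
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;                  /* same for negative values */
    return t;
}
```

For 2.7 the two models return 2 and 3; for -3.5 they return -3 and -4.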
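
The duplicate-move and add/subtract lane layouts are easiest to check against a scalar model. The following C sketch models MOVSLDUP, MOVSHDUP, MOVDDUP, ADDSUBPS and ADDSUBPD on plain structs, then combines them into the common SSE3 idiom for multiplying two pairs of interleaved complex numbers. The `v4sf`/`v2df` types and every function name here are hypothetical stand-ins for XMM registers and instructions. Real code would use the corresponding intrinsics from `<pmmintrin.h>` (`_mm_moveldup_ps`, `_mm_movehdup_ps`, `_mm_movedup_pd`, `_mm_addsub_ps`, `_mm_addsub_pd`) and be compiled with SSE3 enabled, for example with `-msse3` in GCC or Clang. The real/imaginary swap, done here by `swap_pairs`, would be an SSE SHUFPS (`_mm_shuffle_ps(x, x, 0xB1)`).

```c
#include <assert.h>

/* Hypothetical stand-ins for 128-bit XMM registers; element 0 is the
 * lowest-order ("first") element. */
typedef struct { float f[4]; } v4sf;
typedef struct { double d[2]; } v2df;

/* MOVSLDUP: duplicate the first and third elements. */
v4sf movsldup(v4sf a) {
    v4sf r = {{ a.f[0], a.f[0], a.f[2], a.f[2] }};
    return r;
}

/* MOVSHDUP: duplicate the second and fourth elements. */
v4sf movshdup(v4sf a) {
    v4sf r = {{ a.f[1], a.f[1], a.f[3], a.f[3] }};
    return r;
}

/* MOVDDUP: copy the low double into both halves. */
v2df movddup(v2df a) {
    v2df r = {{ a.d[0], a.d[0] }};
    return r;
}

/* ADDSUBPS: subtract in the first and third lanes, add in the second and fourth. */
v4sf addsubps(v4sf a, v4sf b) {
    v4sf r = {{ a.f[0] - b.f[0], a.f[1] + b.f[1], a.f[2] - b.f[2], a.f[3] + b.f[3] }};
    return r;
}

/* ADDSUBPD: subtract in the first lane, add in the second. */
v2df addsubpd(v2df a, v2df b) {
    v2df r = {{ a.d[0] - b.d[0], a.d[1] + b.d[1] }};
    return r;
}

/* Lane-wise multiply, as SSE's MULPS does. */
v4sf mulps(v4sf a, v4sf b) {
    v4sf r;
    for (int i = 0; i < 4; i++)
        r.f[i] = a.f[i] * b.f[i];
    return r;
}

/* Swap each real/imaginary pair, as SHUFPS with control 0xB1 does. */
v4sf swap_pairs(v4sf a) {
    v4sf r = {{ a.f[1], a.f[0], a.f[3], a.f[2] }};
    return r;
}

/* Multiply two pairs of interleaved complex floats {re0, im0, re1, im1}:
 * (ar + ai*i)(br + bi*i) = (ar*br - ai*bi) + (ai*br + ar*bi)*i */
v4sf complex_mul2(v4sf x, v4sf y) {
    v4sf t1 = mulps(x, movsldup(y));             /* {ar*br, ai*br, ...} */
    v4sf t2 = mulps(swap_pairs(x), movshdup(y)); /* {ai*bi, ar*bi, ...} */
    return addsubps(t1, t2);                     /* {ar*br - ai*bi, ai*br + ar*bi, ...} */
}

/* Usage: (1+2i)(3+4i) = -5+10i and (i)(i) = -1+0i. */
void complex_mul2_example(void) {
    v4sf x = {{ 1.0f, 2.0f, 0.0f, 1.0f }};
    v4sf y = {{ 3.0f, 4.0f, 0.0f, 1.0f }};
    v4sf z = complex_mul2(x, y);
    assert(z.f[0] == -5.0f && z.f[1] == 10.0f);
    assert(z.f[2] == -1.0f && z.f[3] == 0.0f);
}
```

The idiom needs only one ADDSUBPS to produce both the real parts (a subtraction) and the imaginary parts (an addition), which is the case the instruction was designed for.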
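
The horizontal instructions are typically used to finish a reduction, for example a dot product whose partial sums were accumulated lane by lane. The sketch below models HADDPS, HSUBPS, HADDPD and HSUBPD in portable C and uses two HADDPS steps to collapse four partial sums into one. The `v4sf`/`v2df` types are hypothetical stand-ins for XMM registers, and the hypothetical `dot_product` helper assumes the array length is a multiple of 4. The real intrinsics are `_mm_hadd_ps`, `_mm_hsub_ps`, `_mm_hadd_pd` and `_mm_hsub_pd` in `<pmmintrin.h>`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for 128-bit XMM registers; element 0 is the
 * lowest-order ("first") element. */
typedef struct { float f[4]; } v4sf;
typedef struct { double d[2]; } v2df;

/* HADDPS: {a0+a1, a2+a3, b0+b1, b2+b3} */
v4sf haddps(v4sf a, v4sf b) {
    v4sf r = {{ a.f[0] + a.f[1], a.f[2] + a.f[3], b.f[0] + b.f[1], b.f[2] + b.f[3] }};
    return r;
}

/* HSUBPS: {a0-a1, a2-a3, b0-b1, b2-b3} */
v4sf hsubps(v4sf a, v4sf b) {
    v4sf r = {{ a.f[0] - a.f[1], a.f[2] - a.f[3], b.f[0] - b.f[1], b.f[2] - b.f[3] }};
    return r;
}

/* HADDPD: {a0+a1, b0+b1} */
v2df haddpd(v2df a, v2df b) {
    v2df r = {{ a.d[0] + a.d[1], b.d[0] + b.d[1] }};
    return r;
}

/* HSUBPD: {a0-a1, b0-b1} */
v2df hsubpd(v2df a, v2df b) {
    v2df r = {{ a.d[0] - a.d[1], b.d[0] - b.d[1] }};
    return r;
}

/* Hypothetical helper: dot product of two float arrays whose length n is a
 * multiple of 4. Four partial sums are accumulated lane by lane (as MULPS and
 * ADDPS would), then two HADDPS steps reduce them to a single value. */
float dot_product(const float *a, const float *b, size_t n) {
    assert(n % 4 == 0);
    v4sf acc = {{ 0.0f, 0.0f, 0.0f, 0.0f }};
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            acc.f[k] += a[i + k] * b[i + k];
    acc = haddps(acc, acc); /* {s0+s1, s2+s3, s0+s1, s2+s3} */
    acc = haddps(acc, acc); /* every lane now holds s0+s1+s2+s3 */
    return acc.f[0];
}
```

Because the lanes are summed in a different order from a simple sequential loop, the result can differ from a sequential sum in the last bits for inputs that are not exactly representable.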

AMD has announced that it will support the SSE3 instructions in future versions of its Athlon 64 processors.
