The original 3DNow! instruction set consists of:
- FEMMS
- A faster version of EMMS. Use it instead of EMMS whenever you're targeting AMD processors specifically.
- PAVGUSB
- Averages unsigned bytes. This is not a floating point operation. According to AMD's docs, this is intended to speed up MPEG playback.
- PF2ID & PI2FD
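- Convert between packed floating point and packed 32-bit integer. PF2ID converts floating point to integer (truncating toward zero); PI2FD converts integer to floating point.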
- PFACC
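- Adds "sideways" as compared to PFADD: instead of adding matching elements of two registers, it adds the two elements within each register. The sum of the destination's elements goes in the low half of the result and the sum of the source's elements goes in the high half.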
- PFADD, PFSUB, & PFSUBR
- Add or subtract respective elements of the source and destination. PFSUBR (subtract reverse) subtracts the destination from the source.
- PFCMPEQ, PFCMPGE, & PFCMPGT
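- Packed floating point compare: equal, greater than or equal, and greater than. Each result element is set to all ones if the comparison is true and all zeros if it is false, so the result can be used as a mask.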
- PFMAX & PFMIN
- Select maximum or minimum of the given packed floating point values. Pretty straightforward.
- PFMUL
- Multiply. Note that there is no divide instruction per se; you have to multiply by the reciprocal instead (see PFRCP).
- PFRCP, PFRCPIT1, & PFRCPIT2
- Calculate the reciprocals. This can be done with one instruction (PFRCP) for low (14-bit) accuracy, or with all three and an extra register for high (24-bit) accuracy.
- PFRSQRT & PFRSQIT1
- Calculate the reciprocal square root. Multiply the result by the input values to get the square root. As with PFRCP, you can calculate the reciprocal square root to either low (15-bit) or high accuracy.
- PMULHRW
- Not a floating point operation: like PMULHW, but rounds instead of truncating.
- PREFETCH & PREFETCHW
- Suggest that data be loaded into the cache without actually using it. PREFETCHW additionally hints that the memory will be modified. Apart from the cache hint, both instructions behave like a nop.
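To make the element ordering of a few of these instructions concrete, here is a small scalar model in C. It is only a sketch: the `pf2` type and the `model_*` function names are made up for illustration, and the code imitates the results of the instructions rather than executing them. The reciprocal refinement is modelled as one plain Newton-Raphson step, which is an assumption about what the PFRCP, PFRCPIT1, PFRCPIT2 sequence achieves conceptually, not a description of the hardware's intermediate values.

```c
/* Hypothetical model of one 64-bit register holding two packed floats. */
typedef struct {
    float lo;   /* bits 31..0  */
    float hi;   /* bits 63..32 */
} pf2;

/* PFADD dst, src: element-wise dst + src. */
pf2 model_pfadd(pf2 dst, pf2 src)
{
    pf2 r = { dst.lo + src.lo, dst.hi + src.hi };
    return r;
}

/* PFSUBR dst, src: element-wise src - dst (operands reversed vs. PFSUB). */
pf2 model_pfsubr(pf2 dst, pf2 src)
{
    pf2 r = { src.lo - dst.lo, src.hi - dst.hi };
    return r;
}

/* PFACC dst, src: "sideways" add.
 * Low result  = sum of both elements of dst.
 * High result = sum of both elements of src. */
pf2 model_pfacc(pf2 dst, pf2 src)
{
    pf2 r = { dst.lo + dst.hi, src.lo + src.hi };
    return r;
}

/* One Newton-Raphson step that refines an estimate x0 of 1/b.
 * Assumed here as a conceptual stand-in for the refinement instructions. */
float model_refine_recip(float b, float x0)
{
    return x0 * (2.0f - b * x0);
}

/* Division a / b done as a multiply by a refined reciprocal estimate. */
float model_div(float a, float b, float rough_recip)
{
    return a * model_refine_recip(b, rough_recip);
}
```

For instance, with `dst = {1, 2}` and `src = {10, 20}`, the PFACC model yields `{3, 30}`, while the PFSUBR model yields `{9, 18}`.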
One benefit of 3DNow! instructions is that you can freely mix them with MMX instructions. For example, to get an absolute value, just use PAND to set the sign bit to zero.
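The absolute-value trick works because the sign of an IEEE 754 single-precision value is stored in its top bit alone. As a rough illustration, the C function below applies the same mask to a single 32-bit element. The name `model_abs_lane` is made up for this sketch; on the real hardware, PAND would apply the mask 0x7FFFFFFF to both elements of the register at once.

```c
#include <stdint.h>
#include <string.h>

/* Clear the sign bit of one 32-bit float, as PAND with a
 * 0x7FFFFFFF mask would do for each packed element. */
float model_abs_lane(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);   /* view the float's raw bits */
    bits &= UINT32_C(0x7FFFFFFF);     /* sign bit (bit 31) -> 0 */
    memcpy(&x, &bits, sizeof x);      /* back to a float */
    return x;
}
```

Because only the sign bit changes, the exponent and mantissa pass through untouched, so no floating point arithmetic is involved at all.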