McKinley is Intel's code name for the newest version of the Itanium processor, which will be released under the name Itanium 2. It runs Intel's IA-64 Instruction Set Architecture (ISA), a VLIW-style architecture based on EPIC. Hewlett-Packard and Intel jointly developed the entire Itanium line, and much of the old Compaq Alpha processor design team now works for Intel designing the two future Itanium processors, Madison and Deerfield.
Itanium Features
Itanium has had several design features from the beginning that allow the chip to run much faster than traditional processors. Explicitly Parallel Instruction Computing (EPIC) allows for speculation, predication, prefetch/branch/cache instruction hints, and register stacking. The interconnect technology allows easy scaling to multiprocessor systems. There is also a greatly enlarged register file, which makes it easier to resolve register conflicts.
McKinley runs existing IA-64 code with no recompile necessary. The biggest improvement probably comes from the shorter pipeline: McKinley has an 8-stage core pipeline, down from Itanium's 10+ stage pipeline. The pipelines are made up of the following stages:
CORE - | IPG | ROT | EXP | REN | REG | EXE | DET | WB |
FPU - | FP1 | FP2 | FP3 | FP4 | WB |
L2 - | L2N | L2I | L2A | L2M | L2D | L2C | L2W |
The stages are defined as follows:
IPG - IP Generate, L1I Cache (6 inst) and TLB
ROT - Instruction Rotate and Buffer (6 inst)
EXP - Expand, Port Assignment and Routing
REN - Integer and FP Register Rename (6 inst)
REG - Integer and FP Register File Read (6 inst)
EXE - ALU Execute (6), L1D Cache and TLB
DET - Exception Detect, Branch Correction
WB - Writeback, Integer Register update
FP1-WB - Floating Point Pipeline
L2N-L2W - Memory Access Pipeline
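As a rough illustration of how the eight core stages overlap, the following Python sketch models an idealized pipeline in which one instruction group enters IPG every clock and nothing ever stalls. The stage names come from the listing above; the function names (pipeline_schedule, total_cycles) and the no-stall assumption are hypothetical and are not taken from the ISSCC paper.

```python
# Core pipeline stage names, in order, as listed above.
CORE_STAGES = ["IPG", "ROT", "EXP", "REN", "REG", "EXE", "DET", "WB"]


def pipeline_schedule(num_groups, stages=CORE_STAGES):
    """Map each instruction group to the clock (0-based) at which it is in each stage.

    ASSUMPTION: an ideal pipeline with no stalls, where group g enters the
    first stage at clock g and advances one stage per clock.
    """
    return {
        group: {stage: group + offset for offset, stage in enumerate(stages)}
        for group in range(num_groups)
    }


def total_cycles(num_groups, stages=CORE_STAGES):
    """Clocks needed for num_groups groups to pass through every stage."""
    if num_groups <= 0:
        return 0
    # The first group needs one clock per stage; each later group
    # finishes one clock after the group ahead of it.
    return len(stages) + num_groups - 1


if __name__ == "__main__":
    schedule = pipeline_schedule(3)
    for group, clocks in schedule.items():
        row = " ".join(f"{stage}@{clock}" for stage, clock in clocks.items())
        print(f"group {group}: {row}")
    print("total clocks:", total_cycles(3))
```

Under these assumptions a single group takes 8 clocks to pass from IPG to WB, and each additional group completes one clock later. A real pipeline also stalls on cache misses and branch mispredictions, which this sketch ignores.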
The new processor has improved cache latencies, which makes a cache miss, a huge performance hit on Itanium, much less costly. A faster FSB frequency lets the processor communicate with the mainboard more quickly. Branch misprediction penalties are lower, and the core clock frequency is higher. With the shrink in transistor size, there is die area for more integer units and, overall, more ways to actually issue the potential 6 instructions per clock cycle.
Processor Features
McKinley has the following important design features. Improvements over the first Itanium processor are marked with an asterisk.
System Bus
128 bits wide
200 MHz/400 MT/s
6.4 GB/s*
Width
2 bundles per clock
6 integer units*
2 floating point units
328 total registers
2 loads and* 2 stores per clock
11 issue ports
Caches
L1 - 2 X 16 KB - 1 clock latency*
L2 - 256 KB - 5 clock latency
L3 - 3 MB - 12 clock latency
32 GB/s bandwidth
Addressing
50 bit physical addressing*
64 bit virtual addressing
Maximum page size 4 GB
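Several of the figures above follow from simple arithmetic, which the Python sketch below works through: peak bus bandwidth from bus width and transfer rate, and address space size from the number of address bits. The constant and function names are illustrative. The per-file register breakdown is an assumption taken from the IA-64 architecture definition rather than from the source, which only gives the total.

```python
# Figures quoted in the feature list above.
BUS_WIDTH_BITS = 128
BUS_TRANSFERS_PER_SEC = 400_000_000  # 400 MT/s
PHYSICAL_ADDRESS_BITS = 50
VIRTUAL_ADDRESS_BITS = 64

# ASSUMPTION: this breakdown comes from the IA-64 architecture definition,
# not from the source text, which only states "328 total registers".
REGISTER_FILES = {
    "general": 128,
    "floating_point": 128,
    "predicate": 64,
    "branch": 8,
}


def bus_bandwidth_bytes_per_sec(width_bits, transfers_per_sec):
    """Peak bandwidth = bytes moved per transfer * transfers per second."""
    return (width_bits // 8) * transfers_per_sec


def address_space_bytes(address_bits):
    """Number of distinct byte addresses reachable with address_bits bits."""
    return 1 << address_bits


if __name__ == "__main__":
    peak = bus_bandwidth_bytes_per_sec(BUS_WIDTH_BITS, BUS_TRANSFERS_PER_SEC)
    print(f"Peak bus bandwidth: {peak / 1e9:.1f} GB/s")

    physical = address_space_bytes(PHYSICAL_ADDRESS_BITS)
    print(f"Physical address space: {physical // 1024**4} TiB")

    virtual = address_space_bytes(VIRTUAL_ADDRESS_BITS)
    print(f"Virtual address space: {virtual // 1024**6} EiB")

    print("Total registers:", sum(REGISTER_FILES.values()))
```

The 6.4 GB/s bus figure uses decimal gigabytes (16 bytes per transfer times 400 million transfers per second), while the address space sizes are most naturally expressed in binary units.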
"Intel estimates McKinley based systems will deliver ~1.5X – 2X performance
improvement over today’s Itanium™ based systems."
Sources:
ISSCC McKinley Design Improvement Paper.
Available at: http://www.cpus.hp.com/technical_references/ia64.shtml