The PowerPC G3 family, made famous by the various "G3"-named Apple Macintosh computers, actually consists of two CPUs: the 740 and the 750. Both are software- and bus-compatible with the PowerPC 603e and PowerPC 604e processors. The only difference between the two is that the 750 contains an L2 cache controller and the tag RAM necessary to support an L2 cache, while the 740 has no capacity for L2 cache whatsoever. The 750 is the CPU featured in the systems from Apple Computer. "G3" stands for "third generation": the first-generation implementation of the PowerPC ISA is the PowerPC 601, and the second generation comprises the PowerPC 603 and 604 processors, which share a core design.
FEATURES
HISTORY
The PowerPC G3 CPU is the direct descendant of the first RISC CPU ever brought to silicon: the IBM 801. That CPU laid the groundwork for the ROMP architecture, which begat POWER (used in RS/6000 servers and workstations), which in turn begat PowerPC. The PowerPC architecture is the result of a joint effort between IBM and Motorola, with input from Apple Computer.
The G3 is the third-generation CPU in the PowerPC line (hence the name): the first generation was the 601 (MPC601), and the second was the 603 (MPC603) and 604 (MPC604).
ARCHITECTURE
A simplified Block Diagram of the G3/740/750 follows:
+--------+        +==========+============+
|System  |        #   Instruction Unit    #
|Register|        +----------+------------+<------\
|Unit    |        # Fetcher  | Branch     #       |
+--------+        +----------+ Processing +       |
    A             # Instr. Q | Unit       #       |
    |             +==========+============+       |
    |                         |                   |
    |                         V                   |
    \---------------------------------------\     |
           |         |          |           |     |
           V         V          V           V     |
      +---------+---------+   +---+       +---+   |
      |Int. Unit|Int. Unit|<--|LSU|------>|FPU|   |
      +---------+---------+   +---+       +---+   |
                                |                 |
                                |                 |
                                |                 |
  +----------+                  |  +-----------+  |
  |   MMU    |                  |  |    MMU    |  |
  +----------+<-----------------/  +-----------+--/
  |Data Cache|                     |Inst. Cache|
  +----------+                     +-----------+
       A                                 A
       |     /---------------------------+
       |     |                           |
       +------------------------\        |
       |     |           .......|........|.........
       V     V           :      V        V        :
    +-------------+      :   +----------------+   :
    |Bus Interface|<-------->|L2 Cache Control|   :
    |    Unit     |      :   |  And Tag RAM   |   :
    +-------------+      :   +----------------+   :
      A 64b  A 32b       :     A 64b   A 17b      :
      | data | addr.     :     | data  | addr.    :
      V      V           :     V       V          :
   +----------------+    :   +----------+         :
   |   System Bus   |    :   | L2 Cache |         :
   +----------------+    :   +----------+         :
                         :                        :
                         :     In MPC750 Only     :
                         :........................:
As shown above, the G3 contains six execution units. These units are capable of processing three instructions simultaneously in superscalar fashion.
These units are:
- Two integer units
The two integer units share thirty-two GPRs (general purpose registers) for integer operands. IU1 can execute any integer instruction, and IU2 can execute all integer instructions except multiply and divide instructions.
- Load/Store unit (LSU)
The LSU has a two-entry reservation station. It features pipelined cache access, and a dedicated adder. It performs alignment and precision conversion for floating-point data, and alignment and sign extension for integer data. Further, it supports both big- and little-endian modes.
- Floating-point unit (FPU)
The FPU is fully IEEE 754-1985-compliant for both single- and double-precision operations, with a non-IEEE mode for time-critical operations. It has hardware support for denormalized numbers. The FPU uses thirty-two 64-bit floating point registers (FPRs) for single- or double-precision operands.
- System register unit (SRU)
The SRU executes CR logical and Move to/Move from SPR (special purpose register) instructions.
- Branch processing unit (BPU)
The BPU features a 64-entry (16-set, four-way set-associative) branch target instruction cache and a 512-entry branch history table.
There are also other significant functional units within the CPU.
- Memory Management Units (MMU)
The G3 CPUs have separate MMUs for instructions and data. They feature 52-bit virtual addressing and 32-bit physical addressing, and perform address translation for 4-Kbyte pages, variable-sized blocks, and 256-Mbyte segments.
- L1 Cache
Attached to the MMUs are the separate 32kB instruction and 32kB data caches, each of which is eight-way set-associative.
- L2 Cache Controller
The MPC750 (but not the 740) contains an on-chip L2 cache controller and tag memory to support an off-chip L2 SRAM cache of 256kB, 512kB, or 1MB. The cache controller supports core-to-L2 frequency divisors of 1, 1.5, 2, 2.5, and 3. The L2 interface has a 17-bit address bus and a 64-bit data bus.
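For those who are wondering how a 17-bit address bus is sufficient for 1MB of cache: each address selects a full 64-bit (8-byte) transfer on the data bus, so 2^17 addresses cover 2^17 x 8 bytes = 1MB. As for how the cache is organized: "Depending on its size, the L2 cache is organized into 64- or 128-byte lines, which in turn are subdivided into 32-byte sectors (blocks), the unit at which cache coherency is maintained." [2]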
- Bus Interface
The bus interface controller supports bus-to-core clock frequency ratios from 2x to 8x including every half-multiplier step in between. It provides a 64-bit, split-transaction external data bus with burst transfers, support for address pipelining and limited out-of-order bus transactions. In the MPC750 the L2 cache has its own bus interface unit.
Because the data bus is 64 bits wide, the bus interface performs single-beat transfers for reads or writes of 8 bytes or fewer, and four-beat burst transfers for 32-byte reads or writes; 32 bytes is the size of an L1 cache block. Single-beat transactions are caused by uncacheable read and write operations that access memory directly (that is, when caching is disabled), cache-inhibited accesses, and stores in write-through mode. [2]
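The cache and bus figures quoted above can be cross-checked with a little arithmetic. This is a small sketch in Python using only the sizes given in the text. The variable names are made up for illustration, and the L2 calculation assumes that each address on the L2 bus selects one 64-bit doubleword.

```python
KB = 1024

# L1: 32 KB per cache, eight-way set-associative, 32-byte blocks.
l1_size, l1_ways, l1_block = 32 * KB, 8, 32
l1_sets = l1_size // (l1_ways * l1_block)

# L2: 17 address lines, each assumed to select one 64-bit (8-byte) doubleword.
l2_addr_bits, l2_data_bytes = 17, 64 // 8
l2_max_bytes = (2 ** l2_addr_bits) * l2_data_bytes

# System bus: one 32-byte cache block moved over a 64-bit data bus.
bus_data_bytes = 64 // 8
beats_per_burst = l1_block // bus_data_bytes

print(l1_sets, l2_max_bytes // KB, beats_per_burst)
```

The three values line up with the text: 128 sets per L1 cache, a 1024 KB (1MB) ceiling for the L2 cache, and four beats per burst.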
Instruction Flow
The instruction unit in the G3 controls the flow of execution. Its fetcher loads instructions from the instruction cache into the instruction queue. The BPU extracts branch instructions from the fetcher. Branch instructions that cannot be resolved immediately are predicted using either the MPC750-specific dynamic branch prediction or the architecture-defined static branch prediction. [2] The order of operation is preserved by not executing instructions which depend upon a pending branch operation.
A unit called the completion unit (not shown in the simplified diagram above) contains a six-entry reorder buffer, the completion queue. The completion unit tracks instructions from dispatch through execution and retires them in program order from the two bottom entries in the completion queue (CQ0 and CQ1). Instructions cannot be dispatched to an execution unit unless there is a vacancy in the completion queue. [2]
Functional units such as the integer and floating point units write their results to the rename buffers so that they can be placed in an appropriate register. Load and store units communicate with the bus interface units to get data from or put data in system (off-CPU) memory. At this time the entry in the completion queue is cleared and another instruction may be dispatched.
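The dispatch and retirement rules described above can be modelled in a few lines. The following Python sketch is a simplification under stated assumptions: the class and method names are hypothetical, instructions are identified by unique names, and execution timing is not modelled. It only shows that dispatch stalls when all six entries are occupied and that at most two finished instructions retire at a time, strictly in program order.

```python
from collections import deque

CQ_ENTRIES = 6      # reorder buffer size given in the text
RETIRE_WIDTH = 2    # retirement happens from CQ0 and CQ1


class CompletionQueue:
    def __init__(self):
        self.entries = deque()  # oldest instruction at the left (CQ0)

    def can_dispatch(self):
        return len(self.entries) < CQ_ENTRIES

    def dispatch(self, name):
        if not self.can_dispatch():
            return False        # no vacancy: dispatch stalls
        self.entries.append({"name": name, "done": False})
        return True

    def finish(self, name):
        # Mark an instruction as having finished execution.
        for entry in self.entries:
            if entry["name"] == name:
                entry["done"] = True

    def retire(self):
        # Retire from the head only, so program order is preserved.
        retired = []
        while (self.entries and self.entries[0]["done"]
               and len(retired) < RETIRE_WIDTH):
            retired.append(self.entries.popleft()["name"])
        return retired


cq = CompletionQueue()
for insn in ["add", "lwz", "fmul"]:
    cq.dispatch(insn)
cq.finish("lwz")        # finishes out of order
first = cq.retire()     # nothing retires: "add" is still executing
cq.finish("add")
second = cq.retire()    # "add" and "lwz" retire together, in program order
```

Here `first` is empty because the oldest instruction has not finished, even though a younger one has.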
REGISTERS
The G3 has two modes: user and supervisor. Various registers available in supervisor mode are not accessible in user mode. Each special purpose register is 32 bits wide.
User Mode Registers
- General Purpose Registers
There are 32 general purpose registers, numbered GPR0 to GPR31. Each is 32 bits wide.
- Floating Point Registers
There are 32 floating point registers, numbered FPR0 to FPR31. Each is 64 bits wide.
- Floating Point Status and Control
The floating-point status and control register (FPSCR) contains the floating-point exception signal bits, exception summary bits, exception enable bits, and rounding control bits needed for compliance with the IEEE-754 standard. [2]
- Condition Register
The Condition Register CR is made up of eight four-bit fields which show the result of operations, and is used for branching.
- Count Register
The 32-bit count register CTR is used for branch-and-count instructions.
- XER
Contains the summary overflow bit, integer carry bit, overflow bit, and a field specifying the number of bytes to be transferred by a Load String Word Indexed (lswx) or Store String Word Indexed (stswx) instruction. [2]
- Link Register
The link register LR is used to provide a branch target address and to hold the address to return to.
- Data Address Register
DAR holds the address of an access after an alignment or DSI exception.
- DSISR
DSISR defines the cause of alignment or DSI exceptions.
- Performance Monitor Registers
The user-mode registers UMMCR0 and UMMCR1 provide read access to the supervisor-mode registers MMCR0 and MMCR1.
The user-mode registers UPMC1 through UPMC4 provide read access to the supervisor-mode registers PMC1 through PMC4.
The user-mode register USIA provides read access to the supervisor-mode register SIA.
- Time Base Register
The 64-bit TB register maintains the time of day and operates interval timers. It consists of the TBU (upper) and TBL (lower) 32-bit registers. It is read-only in user mode and read/write in supervisor mode.
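To make the layout of the condition register concrete, here is a short Python sketch that pulls one four-bit field out of a 32-bit CR value and names its bits. It assumes the usual PowerPC convention that CR0 occupies the most significant four bits and that an integer compare sets the bits in the order LT, GT, EQ, SO; those bit names come from the PowerPC architecture and do not appear in the text above. The function names are made up for illustration.

```python
def cr_field(cr, n):
    """Return the four bits of CR field n (0..7); CR0 is the most significant."""
    if not 0 <= n <= 7:
        raise ValueError("CR field number must be 0..7")
    return (cr >> (28 - 4 * n)) & 0xF


def decode_compare(field):
    """Name the bits of a field as set by an integer compare (assumed order)."""
    return {
        "lt": bool(field & 0b1000),
        "gt": bool(field & 0b0100),
        "eq": bool(field & 0b0010),
        "so": bool(field & 0b0001),
    }


cr = 0x20000004                          # example value, chosen arbitrarily
cr0 = decode_compare(cr_field(cr, 0))    # only "eq" is set
cr7 = decode_compare(cr_field(cr, 7))    # only "gt" is set
```

A conditional branch then tests one of these bits, which is how the eight fields are "used for branching".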
Supervisor Mode Registers
- Machine State Register
Register MSR defines the processor state and is saved during exception handling, then restored.
- Segment Registers
The sixteen 32-bit segment registers SR0 through SR15 divide the 4-gigabyte effective address space into sixteen 256MB segments.
- Block Address Translation Registers
There are 16 block address translation (BAT) registers, which work in pairs to define blocks of memory: four pairs of data BATs (DBAT0U and DBAT0L, DBAT1U and DBAT1L, DBAT2U and DBAT2L, and DBAT3U and DBAT3L) and four pairs of instruction BATs (IBAT0U and IBAT0L, IBAT1U and IBAT1L, IBAT2U and IBAT2L, and IBAT3U and IBAT3L).
- Data Address Breakpoint Register
DABR supports the data address breakpoint facility.
- Decrementer Register
DEC is used to schedule decrementer exceptions.
- External Access Register
EAR is used to support the eciwx and ecowx instructions, which are used for external access.
- Processor Version Register
PVR is a read-only register which identifies the processor.
- Page Table Format Register
SDR1 specifies the page table format used in virtual-to-physical page address translation. [2]
- Machine Status Save/Restore Registers
SRR0 and SRR1 are used to restart an interrupted program when an rfi (Return from Interrupt) instruction is executed. SRR0 stores the address at which execution resumes, and SRR1 stores the machine state (from MSR).
- Operating System Reserved Registers
Registers SPRG0-SPRG3 are reserved for operating system use.
- Hardware Implementation Registers
The hardware implementation register HID0 provides checkstop enables and other functions.
The hardware implementation register HID1 allows software to read the PLL configuration signals.
- Instruction Address Breakpoint Register
When the value of the instruction address breakpoint register IABR matches an instruction address it causes an instruction address breakpoint exception.
- Instruction Cache-Throttling Control Register
The instruction cache-throttling control register ICTC has bits for controlling the interval at which instructions are fetched into the instruction buffer in the instruction unit. This helps control the MPC750’s overall junction temperature. [2]
- L2 Cache Control Register
L2CR (only present in the MPC750) is used to set the cache type and features.
- Performance Monitor Registers
The monitor mode control registers MMCR0 and MMCR1 are used to enable performance monitoring interrupt functions.
The performance monitor counter registers PMC1 through PMC4 are used to count specified events.
The sampled instruction address register SIA holds the EA of an instruction executing at or around the time the processor signals the performance monitor interrupt condition.
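The role of the segment registers in address translation can be sketched as follows. This Python example splits a 32-bit effective address into a segment number, a page index, and a byte offset, and then forms a 52-bit virtual address. It assumes the usual 32-bit PowerPC layout, in which the top four address bits select one of SR0 through SR15 and the selected register supplies a 24-bit virtual segment ID (VSID). The VSID width and the function names are assumptions that do not come from the text above.

```python
def split_effective_address(ea):
    """Split a 32-bit effective address into segment, page index and offset."""
    ea &= 0xFFFFFFFF
    segment = ea >> 28                 # selects SR0..SR15 (256MB each)
    page_index = (ea >> 12) & 0xFFFF   # which 4-Kbyte page within the segment
    byte_offset = ea & 0xFFF           # offset within the page
    return segment, page_index, byte_offset


def virtual_address(ea, vsid):
    """Form a 52-bit virtual address from an assumed 24-bit VSID."""
    _, page_index, byte_offset = split_effective_address(ea)
    return ((vsid & 0xFFFFFF) << 28) | (page_index << 12) | byte_offset


seg, page, off = split_effective_address(0xC0012345)
va = virtual_address(0xC0012345, vsid=0xABCDEF)  # VSID value is arbitrary
```

The field widths add up to the 52-bit virtual address mentioned under the MMU: 24 bits of VSID, 16 bits of page index, and 12 bits of byte offset.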
INSTRUCTION SET
As befits a RISC CPU, all PowerPC instructions are encoded as single-word (32 bit) opcodes.
PowerPC instructions are divided into the following categories:
- Integer Instructions
Integer arithmetic, compare, logical, rotate, and shift instructions.
- Floating-point Instructions
Floating-point arithmetic, multiply-add, rounding and conversion, compare, and status and control instructions.
- Load/store Instructions
Integer and Floating-point load and store instructions, and atomic memory operation instructions.
- Flow Control Instructions
Branch instructions (including conditional branches), trap instructions, and condition register logical instructions.
- Processor Control Instructions
Moves to/from special purpose registers and the machine state register, synchronize, instruction synchronize, and ordering of loads and stores.
- Memory Control Instructions
Cache management instructions, segment register manipulation, and translation lookaside buffer (TLB) management.
Integer instructions operate on byte, half-word, and word operands. Floating-point instructions operate on single-precision (one word) and double-precision (one double word) floating-point operands. The PowerPC architecture uses instructions that are four bytes long and word-aligned. It provides for byte, half-word, and word operand loads and stores between memory and a set of 32 GPRs. It also provides for word and double-word operand loads and stores between memory and a set of 32 floating-point registers (FPRs). [2]
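Because every instruction is a single 32-bit word, decoding comes down to shifting and masking fixed bit fields. The Python sketch below decodes one instruction format as an illustration. It assumes the PowerPC D-form layout (a six-bit primary opcode, two five-bit register fields, and a 16-bit signed immediate), which is not spelled out in the text above, and the function name is hypothetical.

```python
def decode_d_form(word):
    """Split a 32-bit D-form instruction word into its fields."""
    opcode = (word >> 26) & 0x3F   # primary opcode, top six bits
    rd = (word >> 21) & 0x1F       # destination GPR (5 bits select one of 32)
    ra = (word >> 16) & 0x1F       # source GPR
    simm = word & 0xFFFF
    if simm & 0x8000:              # sign-extend the 16-bit immediate
        simm -= 0x10000
    return opcode, rd, ra, simm


fields = decode_d_form(0x38600001)     # addi r3, 0, 1
negative = decode_d_form(0x3863FFFF)   # addi r3, r3, -1
```

The five-bit register fields are exactly wide enough to name any of the 32 GPRs, which is why the register file size and the instruction width fit together.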
The G3 CPUs provide hardware support for the entire 32-bit PowerPC instruction set, as well as several additional instructions. These are:
- eciwx, External Control In Word Indexed
- ecowx, External Control Out Word Indexed
- fsel, Floating Select
- fres, Floating Reciprocal Estimate Single-Precision
- frsqrte, Floating Reciprocal Square Root Estimate
- stfiwx, Store Floating-Point as Integer Word
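Of the additional instructions, fsel is the one most often used to remove branches from floating-point code. The following Python sketch models its result under the usual PowerPC definition: for fsel frD, frA, frC, frB, the destination receives frC if frA is greater than or equal to zero, and frB otherwise (including when frA is a NaN). That definition comes from the PowerPC architecture, not from the text above, and the Python function is only a model of the selection rule.

```python
import math


def fsel(fra, frc, frb):
    """Model of fsel frD, frA, frC, frB: returns the value written to frD."""
    # A NaN in frA fails the comparison, so frB is selected.
    return frc if fra >= 0.0 else frb


def clamp_to_zero(x):
    """Branch-free max(x, 0.0) expressed with a single select."""
    return fsel(x, x, 0.0)


kept = clamp_to_zero(2.5)
clamped = clamp_to_zero(-1.0)
from_nan = fsel(math.nan, 1.0, 2.0)
```

On real hardware the select executes in the FPU without a conditional branch, so there is nothing for the BPU to predict.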
References:
1. MPC750 Product Summary Page. Motorola. (http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MPC750&nodeId=01M98653)
2. PDF: MPC750 RISC Microprocessor Technical Summary. Motorola, August 1997. (http://e-www.motorola.com/brdata/PDFDB/docs/MPC750.pdf)