Architecture Description Language
An architecture description language (ADL) is a language used to specify an instruction set architecture (ISA). ADLs are closely related to hardware description languages.
Motivation
People keep designing new and exciting instruction set architectures, for all sorts of new and exciting reasons: special purpose embedded processors might require special purpose instructions to do interesting and application-specific calculations; very small processors may benefit from instruction sets tailored to their application by removing instructions and indeed concepts to give a smaller, more compact instruction set; interesting new microarchitectural innovations may benefit from novel instruction sets (see IA64, for example); but mostly, people keep inventing new instruction sets because they don't want to pay the licensing fees to use someone else's existing instruction set.
Once something's called into existence, it's hard to get rid of it. This means that the number of ISAs in existence keeps increasing, and those whose business it is to supply compilers, assemblers, simulators and other tools to the software and embedded computing industries are faced with the prospect of having to port their tools to an ever-expanding variety of architectures in order to cover enough of the market to be able to turn a profit.
In the past (and to a large extent in the present), this has been a tedious and largely manual process. Instruction sets have a lot of similarities, but differ wildly in the details of things like instruction encoding, precise instruction semantics, register file sizes, and the timing of the execution of instructions on the processors implementing the instruction sets. These differences imply a lot of typing, likely in places scattered throughout the source code of a compiler toolchain.
Hunting for these places, and modifying the relevant code, is something that has to be done for each tool in the toolchain, and for each new instruction set architecture that's to be supported. When engineers see something like this, their instinct is to draw these things together into one place so they can be replaced or exchanged in a modular fashion. Where the actual code involved has repetitive or predictable patterns, the next logical step is usually to invent a description language which can be used to specify only the part which varies, independent of the underlying code it needs to be plugged in to, thus abstracting the information from the code.
Enter the Architecture Description Language
The idea of an architecture description language has obvious appeal to compiler toolchain writers. Write a general purpose compiler and tool set, with architecture-dependent code generated automatically from a description of the target architecture (expressed in your ADL of choice), and compose a new architecture description for each new instruction set architecture you wish to port your toolchain to.
So, how do we describe an instruction set architecture? Almost all our general purpose instruction set processors share a lot of common traits: they have instructions which are generally modelled as atomic state transformations, acting on some set of architecturally defined state (register file, special purpose registers such as program counter and flags registers) and data memory. So we have the three main components of most architecture description languages:
-
Instruction encoding: how an instruction is represented in program memory as an encoded binary data object; how operands are derived from the binary, and how each operation is identified (i.e. the opcode).
The encoding specification features of most ADLs are strikingly similar in both concept and syntax to a C struct: they define the structure of the instruction word in terms of the purpose of bit fields within the instruction, and the conditions under which an instruction is identified as a particular operation.
-
Architectural state: the architecturally visible state of the processor, as may be visible to or manipulated by programs running on that processor. Defines the width, address ranges and semantics of the memory, the width and depth of the register file, any special purpose registers, etc.
-
Instruction semantics: the interesting bit. The instruction semantics are what the instructions actually do: how they transform the architectural state. These definitions tend to look a lot like code in a conventional imperative programming language; a definition of the function of an "ADD rD, rA, rB" instruction in a modern instruction set architecture would likely look a lot like "R[rD] = R[rA] + R[rB]".
In imperative terms, this is just the equivalent of what the instruction does when executed. It's probably almost identical to what's written next to the instruction in the processor's handbook or compiler writer's guide.
These are the core features that all ADLs require in one form or another. Other commonly found features include:
-
Assembler syntax: a definition of the syntax of assembly code corresponding to instructions and their encoding.
-
Timing model: a model of the pipeline or instruction timing or similar information. Timing information is not strictly related to the instruction set architecture, but instead to a specific microarchitecture implementing the instruction set. Nonetheless, it's often included to meet requirements from various tools in the toolchain.
That should be about all we really need to completely specify an instruction set architecture. If we take a look through any architecture's instruction set reference, this is just about all the information there is, once we've distilled the true "information" from the unstructured human-readable text.
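As a rough illustration of the three core components, here is a minimal sketch written as plain Python data rather than in any real ADL. The 16-bit toy instruction set, its field layout, its opcode values and every name in the code are assumptions invented for this example.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical 16-bit toy ISA, invented for this sketch.
# Instruction encoding: field name -> (most significant bit, least significant bit).
FIELDS = {"opcode": (15, 12), "rd": (11, 8), "ra": (7, 4), "rb": (3, 0)}

WORD_MASK = 0xFFFF  # registers are assumed to be 16 bits wide


@dataclass
class State:
    """Architectural state: sixteen general purpose registers and a program counter."""
    regs: List[int] = field(default_factory=lambda: [0] * 16)
    pc: int = 0


# Instruction semantics: each function transforms the architectural state.
def sem_add(s, rd, ra, rb):
    # R[rD] = R[rA] + R[rB], wrapping at the register width
    s.regs[rd] = (s.regs[ra] + s.regs[rb]) & WORD_MASK


def sem_sub(s, rd, ra, rb):
    # R[rD] = R[rA] - R[rB], wrapping at the register width
    s.regs[rd] = (s.regs[ra] - s.regs[rb]) & WORD_MASK


# One entry per operation: the opcode value that identifies it, and its semantics.
INSTRUCTIONS = {
    "ADD": {"opcode": 0x1, "semantics": sem_add},
    "SUB": {"opcode": 0x2, "semantics": sem_sub},
}


def extract(word, name):
    """Return the value of the named bit field within an instruction word."""
    msb, lsb = FIELDS[name]
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)
```

A real ADL would express the same three tables in its own syntax; the point is only that encoding, state and semantics are separate, declarative pieces of information.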
Applications of ADLs
Now that we've established the information that can be represented by an architecture description language, what do we want to do with it? Almost the entire compiler toolchain, from the back-end of the compiler downwards, requires information that can be represented in one definitive form in the ADL.
Assembler
The assembler needs to know about the encoding of instructions and the assembler syntax in order to translate assembly code to binary instructions. For many assemblers and architectures, it may be argued that the assembler syntax is not necessarily part of the instruction set architecture, and indeed differing assemblers for the same architecture frequently use slightly different formats or names for the same instruction. In cases of ADLs which do not explicitly specify instruction assembly syntax, the assembler will supply its own idea of the assembly syntax, but will likely derive the names of opcodes from the architecture description.
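A table-driven assembler for a single instruction format might be sketched as follows. The toy 16-bit encoding, the opcode values and the "MNEMONIC rD, rA, rB" syntax are all assumptions made up for the example, not taken from any real architecture or ADL.

```python
import re

# Hypothetical encoding table: field name -> (most significant bit, least significant bit).
FIELDS = {"opcode": (15, 12), "rd": (11, 8), "ra": (7, 4), "rb": (3, 0)}
# Hypothetical opcode table: mnemonic -> opcode value.
OPCODES = {"ADD": 0x1, "SUB": 0x2}
# Assumed assembly syntax: "MNEMONIC rD, rA, rB".
SYNTAX = re.compile(r"^\s*(\w+)\s+r(\d+)\s*,\s*r(\d+)\s*,\s*r(\d+)\s*$")


def assemble(line):
    """Translate one line of assembly into a 16-bit instruction word."""
    match = SYNTAX.match(line)
    if match is None:
        raise ValueError(f"syntax error: {line!r}")
    mnemonic = match.group(1).upper()
    if mnemonic not in OPCODES:
        raise ValueError(f"unknown mnemonic: {mnemonic}")
    values = {
        "opcode": OPCODES[mnemonic],
        "rd": int(match.group(2)),
        "ra": int(match.group(3)),
        "rb": int(match.group(4)),
    }
    word = 0
    for name, (msb, lsb) in FIELDS.items():
        width = msb - lsb + 1
        if values[name] >= (1 << width):
            raise ValueError(f"{name}={values[name]} does not fit in {width} bits")
        word |= values[name] << lsb  # place the field at its bit position
    return word
```

With these tables, assemble("ADD r3, r1, r2") yields the word 0x1312. Supporting a new instruction of the same format means adding a row to OPCODES, not changing the function.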
Linker
The linker only needs a little general knowledge of the instruction set: the sizes of encoded instructions, and the formats of load/store, branch and jump instructions, in order to relocate and link code.
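The relocation step can be sketched as patching a resolved address into a bit field of an already-encoded instruction. The jump format used here (opcode 0x3 with a 12-bit absolute target in bits 11 to 0) is hypothetical, as are the function and constant names.

```python
# Hypothetical jump format: bits 15..12 hold the opcode, bits 11..0 the absolute target.
JUMP_TARGET_FIELD = (11, 0)


def apply_relocation(word, bit_field, value):
    """Return word with the given (msb, lsb) bit field overwritten by value."""
    msb, lsb = bit_field
    width = msb - lsb + 1
    if not 0 <= value < (1 << width):
        raise ValueError(f"relocation value {value:#x} does not fit in {width} bits")
    mask = ((1 << width) - 1) << lsb
    # Clear the old contents of the field, then insert the resolved value.
    return (word & ~mask) | (value << lsb)
```

For instance, apply_relocation(0x3000, JUMP_TARGET_FIELD, 0x0AB) gives 0x30AB. The linker needs only the field position and width, not the instruction's semantics.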
Debugger / disassembler
Debuggers generally require more information about a target architecture than is available in an architecture description language: access to specific debugging features of the target architecture such as breakpoints and watchpoints, for example. However, disassembling instructions, the precise reverse of assembly, requires knowledge of assembly syntax and instruction encoding.
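Disassembly from the same kind of tables could look like the sketch below. Again the 16-bit encoding, the opcode values and the output syntax are assumptions for illustration only.

```python
# Hypothetical encoding table: field name -> (most significant bit, least significant bit).
FIELDS = {"opcode": (15, 12), "rd": (11, 8), "ra": (7, 4), "rb": (3, 0)}
# Hypothetical opcode table, keyed the other way round from an assembler's.
MNEMONICS = {0x1: "ADD", 0x2: "SUB"}


def extract(word, name):
    """Return the value of the named bit field within an instruction word."""
    msb, lsb = FIELDS[name]
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def disassemble(word):
    """Render a 16-bit instruction word as assembly text."""
    opcode = extract(word, "opcode")
    if opcode not in MNEMONICS:
        # Unknown encodings are shown as raw data rather than guessed at.
        return f".word 0x{word:04x}"
    rd, ra, rb = (extract(word, name) for name in ("rd", "ra", "rb"))
    return f"{MNEMONICS[opcode]} r{rd}, r{ra}, r{rb}"
```

Here disassemble(0x1312) returns "ADD r3, r1, r2", and an unrecognised word such as 0xF000 falls back to ".word 0xf000".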
Compiler
The compiler is where it starts to get interesting. The compiler's job of generating assembly code to efficiently implement a high-level language program requires virtually all of the information included in an architecture description.
The instruction semantics description is used to guide instruction selection, to match the semantics of the selected instructions to the semantics of the high level language program being compiled.
The definition of the architectural state determines which registers can be used for operands and hence guides register allocation, and deals with mappings between special purpose registers and instructions which use them.
Any timing model present is used to guide instruction scheduling (as well as instruction selection) to attempt to avoid pipeline stalls caused by hazards and maximise parallelism in superscalar processors.
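To give a flavour of how semantics can drive instruction selection, here is a deliberately naive sketch: it walks an expression tree, picks the instruction whose described operator matches each node, and hands out fresh registers in place of real register allocation. The semantics table, the tuple-based tree format and the function names are all hypothetical.

```python
from itertools import count

# Hypothetical semantics summary: mnemonic -> the operator the instruction computes.
SEMANTICS = {"ADD": "+", "SUB": "-"}
# Inverted for selection: operator -> mnemonic that implements it.
SELECT = {operator: mnemonic for mnemonic, operator in SEMANTICS.items()}


def select(expr, temps, out):
    """Emit instructions for expr into out; return the register holding its value.

    expr is either ("reg", n) or (operator, left_expr, right_expr).
    """
    if expr[0] == "reg":
        return expr[1]
    operator, lhs, rhs = expr
    ra = select(lhs, temps, out)
    rb = select(rhs, temps, out)
    rd = next(temps)  # naive allocation: every result gets a fresh register
    out.append(f"{SELECT[operator]} r{rd}, r{ra}, r{rb}")
    return rd


def compile_expr(expr, first_temp=4):
    """Return (result register, list of assembly lines) for an expression tree."""
    out = []
    result = select(expr, count(first_temp), out)
    return result, out
```

Compiling r1 + (r2 - r3), written as ("+", ("reg", 1), ("-", ("reg", 2), ("reg", 3))), emits "SUB r4, r2, r3" followed by "ADD r5, r1, r4". A production compiler matches far richer patterns and takes costs and timing into account, but the lookup from described semantics to instruction is the same idea.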
Instruction Set Simulator
Instruction set simulators (or, if you prefer, "emulators") are invaluable development tools in the embedded systems domain, where the processor that your code will run on is not the same as the processor that's running your desktop workstation. An instruction set simulator requires information about the encoding of instructions in order to decode the instructions in an executable image. It requires knowledge of the architectural state and instruction semantics in order to simulate the execution of instructions.
A cycle-accurate simulator which simulates the behaviour of the processor on a cycle-by-cycle basis will require all of the above, as well as the pipeline or timing model if it exists in the description.
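A functional (not cycle-accurate) simulator built from encoding, state and semantics might be sketched like this. The toy 16-bit ISA, its two opcodes and the word-addressed program counter are assumptions made for the example.

```python
# Hypothetical encoding table: field name -> (most significant bit, least significant bit).
FIELDS = {"opcode": (15, 12), "rd": (11, 8), "ra": (7, 4), "rb": (3, 0)}
WORD_MASK = 0xFFFF  # registers are assumed to be 16 bits wide

# Hypothetical semantics table: opcode -> function of the two source operands.
SEMANTICS = {
    0x1: lambda a, b: (a + b) & WORD_MASK,  # ADD
    0x2: lambda a, b: (a - b) & WORD_MASK,  # SUB
}


def extract(word, name):
    """Return the value of the named bit field within an instruction word."""
    msb, lsb = FIELDS[name]
    return (word >> lsb) & ((1 << (msb - lsb + 1)) - 1)


def run(program, regs):
    """Execute each instruction word in program in order, updating regs in place."""
    pc = 0
    while pc < len(program):
        word = program[pc]                      # fetch
        opcode = extract(word, "opcode")        # decode
        if opcode not in SEMANTICS:
            raise ValueError(f"illegal instruction {word:#06x} at pc={pc}")
        rd, ra, rb = (extract(word, name) for name in ("rd", "ra", "rb"))
        regs[rd] = SEMANTICS[opcode](regs[ra], regs[rb])  # execute
        pc += 1  # no branches in this toy ISA, so the program counter just advances
    return regs
```

Starting with r1 = 5 and r2 = 7, running the two words 0x1312 (ADD r3, r1, r2) and 0x2431 (SUB r4, r3, r1) leaves 12 in r3 and 7 in r4. A cycle-accurate simulator would layer the timing model on top of this loop.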
Processor Implementation and Synthesis
If a processor implementing the instruction set is being developed at the same time as the tools or the instruction set architecture, as is often the case in the embedded software industry and particularly for special-purpose instruction sets, the instruction set description in the ADL is a potentially valuable source of machine-readable information to feed into automatable processes in the development of the processor.
For example, automatically constructing an instruction decoder in synthesizable Verilog from the encoding information in the architecture description is a fairly straightforward step, and cuts out the tedious and potentially dangerous necessity to manually maintain a decoder as a separate entity.
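A sketch of that generation step, written in Python and emitting Verilog text, is shown here. The encoding table, the opcode values, the module and port names, and the style of the emitted Verilog are all assumptions for illustration; a real flow would add whatever the target design conventions require.

```python
# Hypothetical encoding table: field name -> (most significant bit, least significant bit).
FIELDS = {"opcode": (15, 12), "rd": (11, 8), "ra": (7, 4), "rb": (3, 0)}
# Hypothetical opcode table: mnemonic -> opcode value.
OPCODES = {"ADD": 0x1, "SUB": 0x2}


def generate_decoder(module_name="decoder"):
    """Return Verilog source for a combinational decoder built from the tables above."""
    word_msb = max(msb for msb, _ in FIELDS.values())
    op_msb, op_lsb = FIELDS["opcode"]
    op_width = op_msb - op_lsb + 1

    ports = [f"input  wire [{word_msb}:0] insn"]
    body = []
    # One output per bit field, wired straight from the instruction word.
    for name, (msb, lsb) in FIELDS.items():
        ports.append(f"output wire [{msb - lsb}:0] {name}")
        body.append(f"  assign {name} = insn[{msb}:{lsb}];")
    # One strobe per operation, asserted when the opcode field matches.
    for mnemonic, value in OPCODES.items():
        signal = f"is_{mnemonic.lower()}"
        ports.append(f"output wire {signal}")
        body.append(f"  assign {signal} = (opcode == {op_width}'h{value:x});")

    header = f"module {module_name}(\n  " + ",\n  ".join(ports) + "\n);"
    return "\n".join([header, *body, "endmodule"]) + "\n"
```

Because the decoder is regenerated from the description, a change to the encoding cannot leave the hardware and the tools disagreeing about where a field lives.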
Going the whole hog, it's possible to more or less automatically synthesise the micro-architecture of an entire implementation. If a pipeline model is included in the architecture description, specifications of the decoder, functional units and register file can be generated in synthesisable HDL, which can then be synthesised, laid out and used to create a processor implementation in an FPGA or ASIC.
Human Readable Documentation
It may seem a bit of an anticlimax to go to all the bother of defining a machine-readable specification of an instruction set architecture, and then tout one of its benefits as "human readable", but the process of designing, implementing and verifying a complex instruction set processor and its surrounding toolchain is one which involves many people with differing objectives, and a clear, unambiguous and definitive specification of the system under development can form a valuable tool for communication between developers.
Existing Architecture Description Languages
There are a handful of ADLs around, many of which are, to varying degrees, proprietary since they were called into existence in response to processes of designing custom instruction set processors and toolchains, in much the same way that instruction sets themselves tend to multiply. A few of the more popular ADLs are:
LISA
A proprietary ADL that originated at RWTH Aachen University and was commercialised as LISATek (later acquired by CoWare, and in turn by Synopsys). Compiler generation from LISA descriptions has made use of the CoSy compiler framework from ACE (Associated Compiler Experts). Much research, including automatic HDL generation from architecture descriptions, has focused on this language. Its syntax is similar to that of C, so it is instantly readable to almost anyone who would have any reason to attempt to read it.
EXPRESSION
An open ADL used a little in academic research. It is based on a LISP-like syntax, so it is superficially similar to GCC's machine description. Research based on EXPRESSION seems to be moving into the area of graphical design of instruction sets and ISPs.
ArchC
Something of a newcomer, ArchC is based on SystemC, and has perhaps too strong a focus on the generation of simulators. Since it's based on SystemC, it's difficult (nay, impossible) to parse the entire language without being able to parse the entirety of C++, or analyse instruction semantics without, again, understanding the semantics of C++. However, because the descriptions are executable C++ off the bat, generation of simulators is incredibly easy.
GCC Machine Description
Perhaps the easiest to come across in day-to-day life is the machine description language used by the GNU Compiler Collection, which includes most of the features listed above, in a compiler-oriented language that's halfway between LISP and C.