
An object file is the output file generated by an assembler or a compiler. It's got nothing in particular to do with object-oriented programming. On Windows, object files usually end in ".obj"; on UN*X and friends[1], in ".o".

An object file contains the machine code and symbols defined in one source file, or translation unit. It's not an executable yet; some things are missing. Other things, which an executable doesn't really need, are present. It will contain zero or one (or possibly more, if it's code or data) of each of the following:

Code

.text, on some planets. This is where the machine code goes.

Data

Global variables — or constants, if the language does that, but it's up to the compiler to enforce constness; here, it's just zeroes and ones. Maybe literals, too. Whatever's Right.

Relocation information

Every symbol in the object file is located somewhere, and no manly language ever calls them by name at run-time. Instead, there'll be a number: a byte offset from the beginning of the object file. I'll call this a "Relative Virtual Address", or RVA, though strictly speaking PE/COFF uses that term for an offset from the base of the loaded image, and real object files usually measure offsets from the start of a section. Somebody may MOV an RVA into a register or PUSH it on a stack prior to dereferencing it, or it may be an operand of a CALL instruction. Or whatever. The same is true of jump instructions: they jump to an RVA.

This object file will be linked with others, and none of them will get to start at (virtual!) offset zero when the image is loaded. All of the addresses in all of the object files being linked must be changed so that they're all relative to some common point: if we're calling something at 0x0010 in foo.obj, and foo.obj gets linked in at offset 0x2000 in the executable, that 0x0010 will have to change to 0x2010. We can't blindly add the base address to everything in the file, of course.

One way to do it is to set flags on instructions with relocatable operands, but then the poor linker has to look at every instruction in the entire file, which takes time. If very few of their operands need to be relocated, most of that time is wasted.

The more fun way to do it is to have a relocation table: A list of RVAs of operands (the operand contains an RVA, but it's also located at another RVA) that'll need to be relocated by the linker. After the linker decides where to put the object file, it just waltzes through the table. If the table says a dword at offset 0x100 needs to be relocated, the linker goes there, adds the base address to whatever's there, and plugs the result back in. ELF and COFF do that.
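Here's a minimal sketch of that waltz, in Python for brevity. It is not how any real linker is written: the names `apply_relocations`, `code`, and `reloc_table` are made up for illustration, and it assumes every relocatable operand is a little-endian 32-bit value and that the table is just a list of offsets.

```python
import struct

def apply_relocations(image, reloc_table, base):
    """Add `base` to every 32-bit operand whose offset is listed in the table."""
    for offset in reloc_table:
        (value,) = struct.unpack_from("<I", image, offset)
        # Wrap at 32 bits so the result still fits in a dword.
        struct.pack_into("<I", image, offset, (value + base) & 0xFFFFFFFF)
    return image

# A toy 8-byte "object file": a stand-in opcode byte followed by a dword
# operand that holds the RVA 0x0010. Nothing here is a real instruction.
code = bytearray(8)
code[0] = 0xAA
struct.pack_into("<I", code, 1, 0x0010)

reloc_table = [1]            # the operand lives at offset 1
apply_relocations(code, reloc_table, 0x2000)

patched = struct.unpack_from("<I", code, 1)[0]
print(hex(patched))          # prints 0x2010
```

Only the bytes named in the table are touched; the stand-in opcode at offset 0 is left alone, which is the whole point of not adding the base to everything blindly.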

Imports

The module may refer to symbols which are defined in other modules. This is similar to relocation: The compiler had no way of knowing exactly where the thing in question would ultimately be located in the executable, so it did the best it could. In the case of externs, "the best it could" was to fill in a zero and add an entry to the imports table. For each extern, listed by name, that table will have a list of RVAs of operands that should be pointing to it.
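To make the shape of that table concrete, here's a rough Python sketch. The dictionary layout, the symbol name `puts`, the address 0x3040, and the function `resolve_import` are all assumptions for illustration; real formats store this quite differently.

```python
import struct

# Hypothetical imports table: extern name -> offsets of the operands that
# should end up pointing at it.
imports = {"puts": [1, 9]}

# Toy code blob. The compiler left zeroes in the operands at offsets 1 and 9.
code = bytearray(16)

def resolve_import(image, imports, name, address):
    """Write the final 32-bit address of `name` into every operand that wants it."""
    for offset in imports[name]:
        struct.pack_into("<I", image, offset, address)

# Pretend the linker has decided that puts ends up at 0x3040.
resolve_import(code, imports, "puts", 0x3040)

first = struct.unpack_from("<I", code, 1)[0]
second = struct.unpack_from("<I", code, 9)[0]
print(hex(first), hex(second))
```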

Exports

The module may define symbols which are visible to other modules[2]. If so, there'll be a table listing those symbols by name along with their RVAs. The linker can then build a grand table of all the public symbols defined in all the object files it's linking, berate the user if any names collide, relocate all those RVAs, and then start filling in the symbol addresses requested by all the import tables.
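The whole dance can be sketched in a few dozen lines of Python. This is a toy under loud assumptions: `ObjectFile` and `link` are invented names, object files are laid end to end with no sections or alignment, and every operand is a little-endian 32-bit value.

```python
import struct

class ObjectFile:
    def __init__(self, name, code, exports, imports):
        self.name = name
        self.code = bytes(code)
        self.exports = exports   # symbol name -> offset within this file
        self.imports = imports   # symbol name -> operand offsets within this file

def link(objects):
    # Decide where each object file goes: one after another.
    image = bytearray()
    placed = []
    for obj in objects:
        placed.append((obj, len(image)))
        image += obj.code

    # Build the grand table of public symbols, relocating each RVA
    # and berating the user on collisions.
    symbols = {}
    for obj, base in placed:
        for sym, offset in obj.exports.items():
            if sym in symbols:
                raise ValueError(f"duplicate symbol: {sym}")
            symbols[sym] = base + offset

    # Fill in the addresses requested by the import tables.
    for obj, base in placed:
        for sym, operand_offsets in obj.imports.items():
            if sym not in symbols:
                raise LookupError(f"unresolved external: {sym}")
            for offset in operand_offsets:
                struct.pack_into("<I", image, base + offset, symbols[sym])
    return image, symbols

foo = ObjectFile("foo.obj", bytes(8), exports={"main": 0}, imports={"helper": [4]})
bar = ObjectFile("bar.obj", bytes(8), exports={"helper": 2}, imports={})

image, symbols = link([foo, bar])
print(symbols)
```

Here bar.obj lands at offset 8, so its `helper` at offset 2 ends up at 10, and that value gets written into foo.obj's operand at offset 4. Linking foo.obj on its own raises the "unresolved external" error instead.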

Somewhere in a header, there's usually the name of the source file. There may also be debug information, line numbers, and other goodies. As of this writing, there's a dandy writeup under "ELF file format" which goes into some detail about what might be found in one.

All of this explains why C++ (not C, as I erroneously stated and ariels was kind enough to straighten me out on) implementations give you two different kinds of "it ain't there" errors: The compiler doesn't know about any object files but the one it's working on at the moment, so it trusts the header files. It squawks not on missing stuff, but on missing declarations: It's the compiler that moans about "undeclared identifiers". The linker then comes along and complains if something really isn't there at all; that's an "unresolved external".



If there's any misinformation in here, for God's sake tell me, please. Thanks.





[1] And don't say UN*X has no friends; that's mean. It's just... shy.

[2] In C and C++, they're all visible by default. You have to use the file-scope static keyword to hide them.
