Return Oriented Programming

Return Oriented Programming is a technique described by Erik Buchanan, Ryan Roemer, Stefan Savage, and Hovav Shacham that allows an attacker to execute malicious code without actually injecting any code. This is a stack smashing attack and is a generalized form of the return-into-libc attack. The idea is to overwrite the stack with a series of addresses that point to bits of instruction sequences found in existing subroutines in the program's address space. These sequences, called "gadgets", typically end in a return instruction.

To give some background, on Intel's x86 (IA-32) platform, regular execution works as follows:

Initial state of the stack
|---------------------------------------------------------------------|
|          Unused Stack                     |      Used Stack         |
|---------------------------------------------------------------------|
                                            ^     <-------- stack grows
                                            |
                            $esp (stack pointer) points here


The machine then executes some code
my_function:
    ...
    call my_label
    mov $edi, $eax
    ...

The call instruction does two things. It pushes the address of the instruction following the call, in this case the mov instruction, on the stack, and then does an unconditional branch to the address of my_label. The stack now looks as follows:

|---------------------------------------------------------------------|
| Unused Stack | Address of mov instruction |      Used Stack         |
|---------------------------------------------------------------------|
               ^                                  <-------- stack grows
               |                            
        $esp points here 


Machine executes my_label
my_label:
    ...
    ret

The machine then executes code starting from my_label. Once done, it executes the ret instruction. This instruction does two things as well. It pops the location to jump to from the stack and branches unconditionally to it, in this case to the mov instruction in my_function that follows the call. The stack now looks as it did initially:

|---------------------------------------------------------------------|
|          Unused Stack                     |      Used Stack         |
|---------------------------------------------------------------------|
                                            ^     <-------- stack grows
                                            |
                                     $esp points here
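
The call/ret bookkeeping above can be modelled in a few lines of C. This is only a toy sketch: ToyCpu, toy_init, toy_call and toy_ret are hypothetical names invented for illustration, the addresses are made-up numbers, and there are no bounds checks. It shows that call pushes the address of the next instruction and that ret pops whatever address is on top of the stack.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of the call/ret mechanics; not real machine state. */
#define STACK_WORDS 16

typedef struct {
    uint32_t stack[STACK_WORDS]; /* simulated stack memory */
    size_t   sp;                 /* index of the top of stack; grows downward */
    uint32_t pc;                 /* simulated instruction pointer */
} ToyCpu;

/* Start with an empty stack: sp sits one past the last slot. */
void toy_init(ToyCpu *cpu, uint32_t entry) {
    cpu->sp = STACK_WORDS;
    cpu->pc = entry;
}

/* call: push the address of the next instruction, then branch to target. */
void toy_call(ToyCpu *cpu, uint32_t next_insn, uint32_t target) {
    cpu->stack[--cpu->sp] = next_insn;
    cpu->pc = target;
}

/* ret: pop an address off the stack and branch to it, whatever it is. */
void toy_ret(ToyCpu *cpu) {
    cpu->pc = cpu->stack[cpu->sp++];
}
```

Note that toy_ret never checks who wrote the value it pops. If something overwrites that stack slot between the call and the ret, control goes wherever the new value says.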

The stack is not used only for the call/return sequence but also to store local variables. This is what Stack Smashing attacks take advantage of. The idea behind these attacks is to overwrite the stack; specifically, to overwrite the return address with another address. One way of doing so is through a buffer overflow. Without going into too much detail:

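#include <string.h>

int main (int argc, char * argv[])
{
  char myBuffer[1024];
  if (argc < 2) return 1;      /* no argument given */
  /* Deliberately unsafe: copies the first argument with no length check. */
  strcpy(myBuffer, argv[1]);
  return 0;
}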

The code above very optimistically assumes that users of the program will pass, as an argument, a string shorter than 1024 characters, leaving room for the terminating null byte. If the user passes a longer string, strcpy keeps writing past the end of myBuffer and the program will typically crash with a segmentation fault. This is because myBuffer sits on the stack; writing more than 1024 bytes into it means overwriting other data that the program needs in order to function correctly, possibly including the saved return address. Exploiting this type of vulnerability is known as a buffer overflow attack, and is one way to smash the stack.
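
For contrast, here is a minimal sketch of a copy that refuses input that does not fit. The helper name copy_bounded is hypothetical and not part of the C library; it is one of several reasonable ways to add the missing length check, assuming the caller knows the size of the destination buffer.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst only if it fits, including the terminating null byte.
   Returns 0 on success, or -1 if the arguments are invalid or src is too
   long (in the too-long case dst is left as an empty string). */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    if (dst == NULL || src == NULL || dst_size == 0) {
        return -1;
    }
    size_t len = strlen(src);
    if (len >= dst_size) {       /* no room for the string plus its null byte */
        dst[0] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);   /* len + 1 also copies the null byte */
    return 0;
}
```

In the example program, replacing the strcpy call with copy_bounded(myBuffer, sizeof myBuffer, argv[1]) and checking the return value would reject over-long arguments instead of overwriting the stack.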

Once an attacker has a way of smashing the stack, they can try to exploit the vulnerability in many ways. They could inject code and overwrite the saved return address on the stack so that it points to the malicious code. However, this requires the stack to be executable, which isn't always the case. One way of getting past a non-executable stack is an attack known as return-into-libc. This involves overwriting the return address so that the program executes C library functions like system() and exit(), for example to spawn a shell. If the vulnerable program runs with root privileges, that shell is a root shell (a command line prompt with administrator privileges). Return oriented programming takes this idea one step further.

First, the attacker needs to go through the vulnerable program's address space and find all the ret instructions; this is done in order to build a list of gadgets, which, as mentioned before, are short sequences of machine instructions that end in a return. Once they have this list, they need to work out an ordering of gadgets that, when executed one after another, performs the malicious task they have in mind. Finally, they need to smash the stack and fill it with an ordered list of addresses that correspond to the ordered list of gadgets. Recall that a ret instruction just pops an address off of the stack and then unconditionally branches to it. Hence, once the program uses the first overwritten return address on the stack, each gadget's ret transfers control to the next gadget and the rest of the malicious computation just follows. This sounds simple, but as with everything in life, actually pulling it off is not an easy task.
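
The three steps above can be sketched in C against a toy machine. Everything here is an illustrative assumption rather than a real exploit: find_ret_offsets, run_chain, ToyRegs and the ADDR_ constants are hypothetical names, the gadget "addresses" are small made-up numbers, and the machine has only two registers. The one real-world detail is that 0xC3 is the one-byte opcode of the x86 near ret instruction.

```c
#include <stddef.h>
#include <stdint.h>

/* Step 1: record the offset of every 0xC3 (near ret) byte in a code buffer.
   Returns how many offsets were written to out (at most max_out). */
size_t find_ret_offsets(const uint8_t *code, size_t len,
                        size_t *out, size_t max_out) {
    size_t found = 0;
    for (size_t i = 0; i < len && found < max_out; i++) {
        if (code[i] == 0xC3) {
            out[found++] = i;
        }
    }
    return found;
}

/* Hypothetical "addresses" of three gadgets in the toy machine. */
enum {
    ADDR_HALT    = 0,  /* stops the toy machine */
    ADDR_POP_EAX = 1,  /* models: pop eax; ret */
    ADDR_POP_EBX = 2,  /* models: pop ebx; ret */
    ADDR_ADD     = 3   /* models: add ebx to eax; ret */
};

typedef struct { uint32_t eax, ebx; } ToyRegs;

/* Steps 2 and 3: stack is the attacker-controlled list of words. Each loop
   iteration plays the role of a ret: pop the next word and treat it as the
   address of the gadget to run. Gadgets may pop further words as data. */
void run_chain(ToyRegs *regs, const uint32_t *stack, size_t n_words) {
    size_t sp = 0;
    while (sp < n_words) {
        uint32_t addr = stack[sp++];
        switch (addr) {
        case ADDR_POP_EAX:
            if (sp >= n_words) return;
            regs->eax = stack[sp++];
            break;
        case ADDR_POP_EBX:
            if (sp >= n_words) return;
            regs->ebx = stack[sp++];
            break;
        case ADDR_ADD:
            regs->eax += regs->ebx;
            break;
        default:           /* ADDR_HALT or an unknown address: stop */
            return;
        }
    }
}
```

With the payload { ADDR_POP_EAX, 40, ADDR_POP_EBX, 2, ADDR_ADD, ADDR_HALT }, run_chain leaves 42 in eax even though no new code was written: the "program" is nothing but the order of words on the stack. A byte-wise scan like find_ret_offsets also reports 0xC3 bytes that sit inside longer instructions, so a real gadget search still has to disassemble backwards from each hit to see which preceding bytes form useful instructions.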

Techniques have been developed to address this type of attack. One defense is known as Address Space Layout Randomization (ASLR). When ASLR is enabled, the operating system loads shared libraries at different locations in memory every time a program is run. This means Return Oriented Programming cannot depend on hard-coded addresses to overwrite the stack with. However, even this isn't completely sufficient, as attackers have devised other techniques to get past it. Another approach is for the operating system to check that a return address transfers control back to a location directly following a call instruction, but this introduces significant runtime overhead. On top of that, attackers do not need to use return instructions; they can perform the attack with other branch instructions.
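
The return-address check mentioned above can be illustrated with a deliberately simplified sketch. The function name follows_direct_call is hypothetical, and the check only recognises the 5-byte direct call encoding on x86 (opcode 0xE8 followed by a 32-bit displacement); a real implementation would also have to handle indirect calls and would look nothing like a single function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the byte at offset ret_off in code is immediately preceded
   by a 5-byte direct call (0xE8 plus a 4-byte displacement). Simplifying
   assumption: indirect calls, which use other encodings, are ignored. */
bool follows_direct_call(const uint8_t *code, size_t len, size_t ret_off) {
    if (ret_off < 5 || ret_off > len) {
        return false;
    }
    return code[ret_off - 5] == 0xE8;
}
```

A legitimate return address, such as the address of the mov instruction in the earlier example, passes this test. The address of a gadget that starts in the middle of a function usually does not, which is the property such a defense relies on.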

Overall, this is one of the more impressive and cooler breeds of computer security attack out there because it uses a program's own code against it. It goes to show that preventing the introduction of malicious code is not enough to prevent the introduction of malicious computation. It also demonstrates how important it is to program with security in mind; Return Oriented Programming, like any other attack, can only be applied if the program is vulnerable to begin with.

Sources:
http://en.wikipedia.org/wiki/Return-oriented_programming
http://cseweb.ucsd.edu/~hovav/dist/blackhat08.pdf
https://cseweb.ucsd.edu/~hovav/dist/rop.pdf
