Precedent exploit models
The stack memory of a given program is meant to serve as a scratchpad to store temporary data and keep track of the functions running in a program [1]. Over the years, this stack memory has been vulnerable to various attacks from intruders trying to hijack systems. This is especially true in Von-Neumann architecture, which uses a single address space for data and program code. Among other housekeeping data, the stack stores the return address for function calls. In most stack related exploits, the attacker provides input data that is too long to fit in the buffer. This causes a buffer overflow, overwriting the return address of that function. If this payload is crafted just right, the attacker can execute malicious code and gain control over the system (stack smashing). These type of attacks are prevalent in memory unsafe languages (such as C and C++) where the performance is of greater importance than correctness, as well as in scripting languages such as JavaScript* [2].
To avoid these type of attacks, the stack memory was made non-executable (NX), which is known as known as Data Execution Prevention (DEP) [3]. While this prevents the injection of code on the stack, new attacks emerged which did not inject any new code. Attackers just prepared a fake frame on the stack and transferred program execution to the beginning of an existing library function (for example, return to libc). The advent of Address Space Layout Randomization (ASLR) made it hard for intruders to guess the address of the stack and the lib code on 64-bit systems, but on 32-bit systems, attackers could brute search for the return addresses within minutes [3]. Other techniques, such as format string vulnerability and read overflow, were used to attack 64-bit systems. To defend against these attacks, potentially dangerous functions were removed [4] from shared libraries altogether.
Introduction to ROP
But as they say, where there is will, there is a way. Attackers have succeeded in circumventing all of these defenses by coming up with a cool trick called Return Oriented Programming (ROP). ROP stitches together pieces of code ending with a ret instruction (gadgets)—hence the name return oriented—which are already present in programs, thereby bypassing DEP. These gadgets get their argument via mov, pop, etc, which are stored on the stack between addresses of the gadgets themselves [4]. Thus, if an attacker can overwrite the return address initially stored on the stack within the address of the first ROP gadget, the attacker can chain many such gadgets to execute any code they want to obtain control of the system. In a way, in ROP, the stack pointer acts as the instruction pointer and the ret instruction at the end of every gadget causes a jump to the next ROP gadget, which is analogous to a normal program.
Details of an ROP exploit
From the attacker’s point of view, successfully exploiting a system has constraints, since the exploit can only selectively execute existing, valid code on the victim system until shellcode entry (the attacker’s customized code) is fully set up. Here, valid code should belong to applications, shared libraries, or Just-In-Time compiled (JIT’ed) code. Given these constraints, the attacker’s only option to build a working exploit model is to leverage user input under the legitimate code set. In particular, data as input should include some encoded information to set up shellcode execution. The attackers’ ultimate goal is to have the system execute their custom code. Since they leverage the weakness of the system to accomplish this, the attack is known as an exploit type of attack.
Ingredients to set up shellcode execution
Most of the vulnerability elements mentioned in prior attack models are still leveraged today in the ROP attack model. Modern systems attempt to mitigate this by separating data and code by memory (or page) attributes. This separation means that an essential step for a successful attack to occur is to transform data input into byte-executable code. To do so, a page attribute change has to happen; for example, a system call should be executed under privileged mode. That raises the question of how to convert input data into such a system call?
In conventional program execution modes, there are always nested function calls and their returns. Stack memory is used to keep track of nested functions’ return site to maintain the correct program flow (control flow). What makes the stack interesting is that by design it is a mix of code and data. Besides logging return addresses, the stack incorporates local variables (scratchpad) as well as arguments of the callee. The subtle and critical operations of the stack are such that even a small contamination or slight misalignment might result in corruption of the stack, especially function returns.
This provides a hint as to how a small intentional stack overwrite can divert control flow to a designated system call. From there, it’s possible to overwrite stack memory and leverage input data.
Exploit type attacks use legitimate applications’ and libraries’ code; however, in reality, human-written code can contain many bugs known as vulnerabilities. One example of this is a buffer overflow. When a buffer is placed in stack memory, overflow of the buffer directly impacts stack data and the return addresses of currently nested functions. The buffer is crafted to look like a collection of new return addresses for control-flow changes. The base address of loaded images should be somewhat predefined; otherwise, diverted return addresses are nondeterministic. While the base address randomization is a very popular technique to mitigate exploits, it has limited entropy, and anti-ASLR techniques are already mentioned in some articles. [13, 14, 15, 16]
Exploit stages: sequential layout of the pieces
So far, we’ve identified some key steps and concepts: vulnerabilities, diverted control-flow, and system calls. Now we will describe the exploit attack model in stages to provide a complete, comprehensive description. While it is true that specific ROP exploit cases can be mutated, most of them have very distinct milestones for accomplishing successful attacks. Stages one through five occur during runtime, while the zero-stage is basically offline analysis.
Stage 0: Encoding input data
This stage is the most burdensome for attackers, but it is invisible from the defender’s perspective. Searching for useful code snippets offline is called harvesting. Encoding is done with a collection of return addresses and some values that include byte code for shellcode.
Stage 1: Vulnerability hit and stack contamination
Another way attackers can prepare to exploit a system is to find a bug in an application. This might be the first thing that attackers attempt. From a runtime perspective, this is the very first step and is the beginning of a ROP exploit. Unfortunately, there is no general rule to detect software bugs in real time without tool support. In this stage, once a bug is exploited, the stack is overwritten by data prepared in Stage 0.
Stage 2: First stage of ROP chains
Once the stack is overwritten, new returns are established around small code snippets (gadgets). A lot of Turing-complete gadgets can be identified in the x86 instruction set because it is very large. Because these instructions have different lengths, the same machine code can be interpreted as different instructions. The table below shows a few example of ROP gadgets:
Loading constants |
Pop register; ret |
Pop last element from stack (the constant), and store it in register. |
mov $constant,register; ret |
Move the constant to a register. |
Loading from memory |
mov [memory],register; ret |
Load some value in memory to a register. |
Storing to memory |
mov register,[memory]; ret |
Store some value in register to memory. |
Arithmetic operations |
ADD register, register
Add constant, register
|
Add/subtract/multiply/divide bitwise operations on registers. |
Other useful gadgets |
int 0x80;ret
call gs:0x10
|
System call instruction followed by ret will execute a kernel interrupt. |
Prior to calling critical system calls and their relevant Application Programming Interfaces (APIs), arguments for the API need to be established in stack memory. As far as the API is concerned, there are typically a couple of argument types:
- Providing memory page attribute changes.
- Allocating new memory space in heap space.
The first case typically consumes a greater number of gadgets during this stage than the second ROP chain stage (Stage 4). Information should be carrying memory range and desired memory page attributes.
Stage 3: System call execution
Each software platform has its own layered APIs that eventually do kernel and user mode transitions. After the system call, there are returns within the API layers. It is typically assumed to be safe within the relevant API ranges.
Stage 4: Second stage of ROP chains
After a system call, another set of ROP gadgets needs to be consumed to enter the attackers customized code. System calls to allocate arbitrary memory in Stage 4 require longer ROP chains to locate a randomly assigned address than the first argument type seen in Stage 2. Similar to this first argument type, the memory range is already provided prior to the system call. Thereafter, the remaining part of stage 4 is an additional control flow effort to enter and execute shellcode.
Stage 5: Shellcode execution
Once the attacker’s customized code is installed, this execution is a sign of success in ROP exploits. The shellcode can open network ports, opening the gates of the victim system.
These five stages should help you understand the overall steps of ROP exploits, as well as help you consider strategic defense policies. When considering your defense policies and mechanisms, remember that no single exploit mitigation effort can cover all stages of the ROP exploit chain. Hence, a successful approach attempts to mitigate multiple stages of the ROP exploit chain as part of a holistic defense strategy.
Challenges in defense mechanism
When considering defense mechanisms, keep in mind that maintaining control flow integrity (CFI) is one huge area necessary to protect against ROP attacks. CFI provides some collective coverage against other control flow exploits as well, like Jump and Call Oriented Programming (JOP and COP). There are a couple of challenges to overcome to achieve CFI by static analysis:
- The stack is mixed with return address and variables. Given an arbitrary stack snapshot, extracting return address and analyzing them is not clear.
- From the callee’s perspective, tracking back precedent callers is fundamentally limited, because the return site can be modified by pushing a different address than the instruction’s address right after the call site.
Runtime limitations like these might have promoted ROP exploits so far. Moreover, recent trends penetrate into very low-levels in or nearby kernel mode [12, 17]. There are also hardware-based, system-wide mitigation technologies proposed that we will discuss in a future article.\
Footnotes