9 minute read

We all have heard this term over and over again. But most of us might have never tried (or even understood) how exaclty a buffer overflow can be used to run a completely different (and often malicious) code. Before we jump to performing such an attack, let us spend some time to understand some basics.

In this post, we are looking at performing a buffer overflow attack on a Linux OS running on a x86_64 machine (following the official System V AMD64 ABI).

Before we begin, this article assumes that you are familiar with stack, stack frame and function calls conventions on an x86-64 machine. If not I recommend taking a look at Stack frame layout on x86-64 by Eli Bendersky.

What is a buffer Overflow?

Wikipedia defines Buffer overflow as follows:

… buffer overflow, or buffer overrun, is an anomaly where a program, while writing data to a buffer, overruns the buffer’s boundary and overwrites adjacent memory locations…

In other words, a buffer overflows when a programmer writes more data than what the buffer can hold. This can be easily demonstrated using a C program as C does not perform any implicit check on array bounds. In fact the very strcpy function (char *strcpy(char *dest, const char *src)) defined in string.h does not check whether all the bytes in source would fit into the destination. If src is bigger than dest then the extra bytes in the src will result in an overflow of dest.

It is often advised to use strncpy() instead of strcpy() as the later is considered unsafe due to chances of an overflow.

Ok. So what if a buffer overflows?

When a buffer overflows, the extra byte(s) that do not fit into the buffer spills over to the adjacent memory locations. Depending on where the buffer is actually placed, the buffer overflow can spill into memory locations containing executable code, or any other important information such as return address (address/location of the instruction that is to be executed upon returning from a function call).

There are different types of buffer overflow attacks. In our example we try to modify the return address in a stack frame. By overwritting the return address with another address, we can make the execution to resume at any other memory location (and hence execute any code we want).

Now that we have set the background, lets get cracking!

Memory Layout

Programs make use of different regions of memory such as stack, heap, code etc. As you would already know by now, stack is where the variables and frames are stored. The actual location of these memory regions depends the system. They may vary from time the same program is run or it might be fixed.

Stack is where the return address (the address of the instruction to be executed upon returning from a function call) is stored. Pushing the return address to the stack is done by the caller, and is done as part of the callq instruction. So the very first entry in the stack frame of the called function (callee), would be this return address. In addition, you may also recall that stack is also used to store some varibles.

Manipulating the stack

As mentioned before the stack is used for storing varibles and also (temporarily) store address of certain instructions as well. The very fact that these two pieces sometimes are dangerously close to each other in a stack frame is what we are are going to exploit as to orchestrate such an attack in a poorly written C program.

Before we begin, let us have look at the poorly written C program that we are going to use.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
    
/*
 * Demonstrate a simple buffer overflow attack
 */
#include <stdio.h>
#include <stdlib.h>

/*
 * A function that is never called!
 */
void not_used(void)
{
    printf("I am never here!\n");
    exit(1);
}

/*
 * Function to read an array and return the product of
 * all the elements in the array.
 */
long serial_mult(void)
{
    long a[5];
    int size;
    int i;
    long ret;

    printf("Enter array size\n");
    scanf("%d", &size);

    printf("Enter array the array\n");
    for (i=0; i < size; ++i) {
        scanf("%ld", &a[i]);
    }

    ret = 1;
    for (i=0; i < size; ++i) {
        ret *= a[i];
    }
    return ret;
}

int main()
{
    long prod;

    prod = serial_mult();
    printf("Product: %ld\n", prod);
    return 0;
}

main() blindly calls the serial_mult() function and prints the value returned by serial_mult(). serial_mult() asks the user for the number of integers to multiply. These numbers are then read from the shell using scanf() and is stored into a[]. Though a[] by definition can only hold upto 5 elements, serial_mult() does not put any limit on how many numbers the user can input. The very fact that there is no such check is what makes this program poorly written.

In addition, the program also contains another function not_used() and is no where called (or used) in the program. In the normal flow, we should never see this function getting called.

Compile the program using the command below. Do not use the -g, as we only want to look at the assembly in the debugger.

$ gcc -Og overflow.c -o overflow

If interested, we can also generate the assembly code from the binary using the objdump utility.

$ objdump -D overflow > disassembly.txt

First let us try to run the program with 5 numbers. It should print the product of all of them.

$ ./overflow
Enter array size
5
Enter array the array
1
2
3
4
5
Product: 120

Now let us try with 6 numbers. This shouldn’t work, since a can only hold upto 5 integers. But still it does work (magic??).

$ ./overflow
Enter array size
6
Enter array the array
1
2
3
4
5
1
Product: 120

What if we try 10 numbers?

$ ./overflow
Enter array size
10
Enter array the array
1
1
1
1
1
1
1
1
1
1
*** stack smashing detected ***: <unknown> terminated
Aborted (core dumped)

An example

First we try to find the location of the return address on the stack. Then we use the error prone array access to overwrite the return address and try to return from serial_mult() to an address where the program should’t be.

Since the stack grows down, a[] is located at a lower address than the location of the return address in the stack. If a[] overflows, we might end up writting all the way upto the return address. Now suppose that our stack frame looks something like this:

        ----    ---------------------
         |      |   return address  | 0x7bcdffffffff0038
         |      ---------------------
         |      |                   | 0x7bcdffffffff0030
         |      ---------------------
         |      |                   | 0x7bcdffffffff0028
         |      ---------------------
         |      |       a[4]        | 0x7bcdffffffff0020
   stack |      ---------------------
   frame |      |       a[3]        | 0x7bcdffffffff0018
         |      ---------------------
         |      |       a[2]        | 0x7bcdffffffff0010
         |      ---------------------
         |      |       a[1]        | 0x7bcdffffffff0008
         |      ---------------------
         |      |       a[0]        | 0x7bcdffffffff0000
        ----    ---------------------
                |                   |

If this is case, by writting 8 numbers, we can easily overwrite the return address, with the last number essentially being the new return address.

Let the attack begin!

The stack we saw is just an example explaining our approach. To find the actual location of a[] and the return address in out overflow.c program we make use of gdb.

Run the overflow debugger in gdb.

$ gdb overflow

At first we look at the disassembly of main() function to get the actual return address.

(gdb) disassem main
Dump of assembler code for function main:
   0x0000000000000889 <+0>:	sub    $0x8,%rsp
   0x000000000000088d <+4>:	callq  0x7e4 <serial_mult>
   0x0000000000000892 <+9>:	mov    %rax,%rdx
   0x0000000000000895 <+12>:	lea    0xe7(%rip),%rsi        # 0x983
   0x000000000000089c <+19>:	mov    $0x1,%edi
   0x00000000000008a1 <+24>:	mov    $0x0,%eax
   0x00000000000008a6 <+29>:	callq  0x680 <__printf_chk@plt>
   0x00000000000008ab <+34>:	mov    $0x0,%eax
   0x00000000000008b0 <+39>:	add    $0x8,%rsp
   0x00000000000008b4 <+43>:	retq
End of assembler dump.

In the above disassembly of main(), callq 0x7e4 <serial_mult> is the call to the serial_mult() function (which is located at 0x7e4). Upon returning from serial_mult() the program should ideally return to the instruction at address 0x892. So the return address in this call is 0x892.

Immediately upon entering any function (before any instruction of that function is excuted), the stack pointer will be pointing to the location of the return address in the stack. For this we set a break point at the very beginning of the serial_mult() function.

(gdb) break serial_mult

After we set the breakpoint as shown above we let the program run untill we hit the breakpoint.

(gdb) run

Upon hitting the break point, we can print the value of the stack register %rsp to get the location of the return address in the stack.

(gdb) print/x $rsp

You may also look into the location pointed to by %rsp to get the return address using print/x *<location>, where <location> is the value of the stack pointer.

Now that we have got the location of the return address in the stack frame of serial_mult(), the next step would be get the base address of a[]. This is nothing but the address of a[0]. Recall that we get the each number from the user using scanf() function. The second argument to scanf() is the address of the location where the number has to be stored. In fact, the very first call to scanf() within the for loop has the address of a[0] as second argument. From ABI, we know the very first argument is passed to callee using %rdi and the second argument is passed using %rsi. For this we can disassemble serial_mult and put a break point at the very first call to scanf().

(gdb) disassem serial_mult

and

(gdb) break *<address_of_call_to_scanf>

Once we reach this breakpoint we can print the value of %rsi to get base address of a[]. Using this address of a[] and the location of the return address, we can figure out how many numbers we need to input inorder to overwrite the return address.

Suppose its the 8th number that overwrites the return address. Lets try to give 0 as the 8th number. The program would try to fertch instruction at address 0x0 and shall crash.

Now what if instead of inputing 0, we give a valid address as the 8th number? For example the address of the function not_used()?

You can again use disassem not_used within gdb to get the address of our not_used function. Remember to convert the address to a decimal value before feeding it as the 8th number to the program.

Upon successfully overwritting the return address with the address of not_used function, you can see the print within the not_used function even though the function was never called anywhere in our program.

I am never here!

Since not_used() exits using the exit() function instead of returning back to its caller, we would never see the part of main() after the call to serial_mult() getting exectuted.

Summarizing

There are some limitations to the attack we did: we can only execute an existing function. But by being even more clever, we can feed even more numbers, overwriting beyond the return address, into main() function’s stack frame. Note that we can write any number we wish to. This essentially changes the behaviour of main() and/or other functions. What if we encode actual machine instructions into data using scanf(), set the return address to those injected instructions? If so, we are essentially executing the injected code within the program.

References

  1. Wikipedia page on bufffer overflow
  2. Stack frame layout on x86-64