intro

A very basic binary reversing.

With this binary, we will look at the basic toolkit that we use to disassemble and analyze binaries. It isn't meant to be difficult; rather, it serves as an introduction to assembly.

First Checks

When you download the binary, the first thing I recommend is to run it. Because you downloaded the binary from the internet, its execute permission will not be set by default. To fix this:

$  chmod +x intro
$  ./intro

We are provided the following output:

$  ./intro
Enter the flag here: flag
Nope, try again!

We just need to enter the flag, and the program will tell us whether we got it right.

Second, we check the security measures of the file. Use checksec for this. checksec intro prints the following:

[*] '/home/joybuzzer/Documents/vunrotc/live/00-introduction/intro/intro'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    No canary found
    NX:       NX enabled
    PIE:      PIE enabled

Let's break this down:

  • Arch: amd64-64-little - This means that the binary is compiled for a 64-bit architecture. This is important because it means we will use 64-bit registers and instructions.

  • RELRO: Full RELRO - This means that the Global Offset Table (GOT) is read-only. This is important because it means that we cannot overwrite the GOT to redirect execution to a different function. More on this later.

  • Stack: No canary found - This means that the stack is not protected by a canary. This is important because it means we can overwrite stack data without the program detecting it.

  • NX: NX enabled - This means that data regions such as the stack are not executable. This is important because it means we cannot inject our own instructions into the program and force it to execute them.

  • PIE: PIE enabled - This means the binary is compiled as a Position Independent Executable (PIE). This is important because it means that, when ASLR is enabled, the binary will be loaded at a random base address in memory. More on this later.

What does all this mean?

  • We are dealing with a 64-bit binary.

  • If we wanted to read or write a specific place in memory, we can't hardcode its address, because the binary (through PIE) and the stack & heap (through ASLR) will be in different places every execution.

Let's start disassembling this binary.

Disassembly

Running this binary with gdb will let us see the assembly behind the program. To do this, we run gdb intro. From here, the best thing to check is the available functions (info functions):

Most of these functions are standard library functions. The ones suffixed @plt are external library functions. The only function that we are interested in is main. Let's take a look at the assembly for main (disas main):

Now this is a lot. The way I recommend doing this is to go from external function call to external function call, understanding what gets passed to them. Let's do this one at a time.

Hold on, One more thing

We need to understand how 64-bit functions receive their parameters. On 64-bit Linux, they are passed in registers. The first six integer or pointer parameters are passed in the following registers, in this order:

  • rdi (sometimes as edi, the lower 4 bytes of rdi)

  • rsi (sometimes as esi, the lower 4 bytes of rsi)

  • rdx (sometimes as edx, the lower 4 bytes of rdx)

  • rcx (sometimes as ecx, the lower 4 bytes of rcx)

  • r8 (sometimes as r8d, the lower 4 bytes of r8)

  • r9 (sometimes as r9d, the lower 4 bytes of r9)

The difference between rdi and edi is confusing. rdi and edi refer to the same storage location, meaning that changing edi affects rdi, and vice versa. edi is just the lower 32 bits (the bottom half) of rdi.

Compilers often pass parameters, especially parameters with small values, using the lower four bytes (i.e. edi). They do this because the 32-bit form of the instruction has a shorter encoding, and writing to a 32-bit register such as edi automatically zeroes the upper four bytes of rdi, so the result is no different from loading all 8 bytes. The same goes for rsi, rdx, rcx, r8, and r9.

Now, let's continue.

malloc

malloc() in C is how we allocate memory. It makes space on the heap for us to use. If you want to see the parameters for malloc(), we look at the man pages. This is where the function header is defined. man malloc shows the following:

This tells us that malloc() takes one parameter: the number of bytes to allocate. Since we know that rdi is the register that always holds the first parameter, we look just above the function call for the instruction that sets rdi.

Here, we see that edi (aka rdi) is being loaded with 0x20 (32). This means that the equivalent function here is malloc(0x20), which allocates 32 bytes on the heap.

The str variable resides in memory at address rbp-0x8. Immediately after malloc returns, register rax contains its return value (a pointer to the new allocation), and that value is stored in the memory at address rbp-0x8 (i.e., it is stored in str).

printf

Let's check the arguments for printf.

The man pages aren't super helpful here. printf just prints text to the screen. rdi will hold a pointer to the string in memory that we want to print.

The first instruction is the most confusing. lea stands for Load Effective Address: it computes an address and stores it in a register without reading the memory at that address. The effect is similar to mov rax, 0x2004, except that the address is computed relative to rip rather than hardcoded. The next instruction moves the address of the string into rdi, and the last instruction calls printf. Why does the compiler use lea? Because the binary is a PIE, its load address is not known until runtime, so the absolute address of the string cannot be stored as a constant. A rip-relative lea yields the correct address wherever the binary is loaded.

After this, it moves this address into rdi. If we check what's at this address, we see the following:

This makes sense with what we saw in the program output.

fgets

Let's check the man pages to see what fgets does.

This tells us that fgets() is an input function. Since fgets() takes three arguments, we need to find what rdi, rsi, and rdx are to understand what's going into the function.

Let's find the parameters.

  • We see that rdi is loaded with rax, which is loaded with the value stored at rbp-0x8. From before, we know that this is where the pointer returned by malloc was saved.

  • rsi is loaded with 0x20, the size of the buffer. fgets reads at most one character fewer than this (31) and then appends a null terminator.

  • rdx is loaded with the value stored at rip+0x2e0b (which GDB tells us is at 0x4010). GDB informs us that this is stdin, which is the stream that the data comes from.

From this, we know that the C code looks like this:

What does this mean? From earlier, we know that malloc allocates memory on the heap and that str points to it. This means that fgets is writing our input to the heap, rather than the stack.

strcmp

strcmp, or string compare, is how C compares strings. The man pages show us that it takes 2 arguments -- the strings to compare.

Let's go find rdi and rsi.

From this, we notice two things:

  • rdi is loaded with the value stored at rbp-0x8, which is the pointer returned by malloc. This means that rdi is the first string: our input.

  • rsi is loaded with the address of rip+0xdf9, which is 0x201a. This is not something that we loaded. We notice that it is based on the instruction pointer (because of PIE), which tells us that it is something hardcoded. More on this later, but for now, let's check what's there.

As we expected, this is our flag. The following puts statements aren't really relevant to us, but we can guess by the code there that it prints a "yes" or "no" based on the return value of strcmp.

Conclusions

Based on this program, we see that the flag was hardcoded. This will not be the case in the vast majority of binary exploitation problems. Some reverse engineering problems obfuscate the flag and challenge you to figure out how it is transformed, but binary exploitation problems generally do not work that way.

For your reference, here is the source code:

PS: You could have run strings intro | grep flag to find the flag. strings returns the hardcoded strings in binaries, so it's not a bad thing to check as a first step.