intro
A very basic binary reversing.
In this binary, we will look at the basic toolkit that we use to disassemble and analyze binaries. This binary isn't meant to be difficult; it serves as an introduction to assembly.
When you download the binary, the first thing I recommend is to run it. Because you downloaded the binary online, it won't be marked as executable by default. To fix this, add the execute permission with chmod +x intro.
We are provided the following output:
We just need to enter the flag, and it will tell us whether we got it right.
Second, we check the security measures of the file using checksec. Running checksec intro prints the following:
Let's break this down:
Arch: amd64-64-little
- This means that the binary is compiled for 64-bit x86 (amd64) and stores values in little-endian byte order. This is important because it means we will use 64-bit registers and instructions.
RELRO: Full RELRO
- This means that the Global Offset Table (GOT) is read-only. This is important because it means that we cannot overwrite the GOT to redirect execution to a different function. More on this later.
Stack: No canary found
- This means that the stack is not protected by a canary. This is important because a stack buffer overflow can overwrite data on the stack, such as the saved return address, without the program noticing.
NX: NX enabled
- This means that the stack is not executable. This is important because it means we cannot inject our own instructions (shellcode) onto the stack and make the program execute them.
PIE: PIE enabled
- This means the binary is compiled as a Position Independent Executable (PIE). This is important because, with ASLR enabled, the binary itself is loaded at a random base address in memory on every run. More on this later.
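checksec reads some of this information directly from the file. The Arch line and a first hint about PIE both come from the ELF header at the start of the binary.

The sketch below parses that header. It assumes a Linux system with glibc's `<elf.h>`. The names `summarize_elf` and `struct elf_summary` are hypothetical. It does not attempt the RELRO, canary, or NX checks, which need program headers and symbols.

```c
#include <elf.h>     /* Elf64_Ehdr, ELFMAG, ELFCLASS64, ET_DYN, EM_X86_64 (Linux/glibc) */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcmp, memcpy */

/* Hypothetical summary of the header fields behind checksec's "Arch" line
 * and a first hint about PIE. RELRO, canary, and NX are not covered. */
struct elf_summary {
    int is_little_endian; /* EI_DATA == ELFDATA2LSB: the "little" in amd64-64-little */
    int is_x86_64;        /* e_machine == EM_X86_64: the "amd64" part */
    int is_dyn;           /* e_type == ET_DYN: PIE executables (and shared libraries) */
};

/* Parse the first bytes of a file. Returns 0 on success, or -1 if the buffer
 * does not start with a 64-bit ELF header. Assumes the file's byte order
 * matches the host's (true for an amd64 binary inspected on amd64). */
int summarize_elf(const unsigned char *buf, size_t len, struct elf_summary *out) {
    Elf64_Ehdr hdr;
    if (len < sizeof hdr || memcmp(buf, ELFMAG, SELFMAG) != 0)
        return -1;                      /* missing the "\177ELF" magic */
    if (buf[EI_CLASS] != ELFCLASS64)    /* the "64" in amd64-64-little */
        return -1;
    memcpy(&hdr, buf, sizeof hdr);      /* copy out to avoid unaligned reads */
    out->is_little_endian = hdr.e_ident[EI_DATA] == ELFDATA2LSB;
    out->is_x86_64 = hdr.e_machine == EM_X86_64;
    out->is_dyn = hdr.e_type == ET_DYN;
    return 0;
}
```

To try it on the intro binary, read the first sizeof(Elf64_Ehdr) bytes of the file with fread and pass them in. Note that ET_DYN alone does not distinguish a PIE executable from a shared library, so real tools look at more than this one field.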
What does all this mean?
We are dealing with a 64-bit binary.
If we wanted to write to a specific place in memory, we can't rely on a fixed address. With PIE (and ASLR), the binary's own code and data, as well as the stack and heap, are loaded at different addresses on every execution.
Let's start disassembling this binary.
Running this binary with gdb lets us see the assembly behind the program. To do this, we run gdb intro. From here, a good first step is to list the available functions (info functions):
Most of these functions come from the standard library or from startup code added by the compiler. The ones suffixed with @plt are stubs for external (dynamically linked) library functions. The only function that we are interested in is main. Let's take a look at its assembly (disas main):
This is a lot. I recommend working through it from one external function call to the next, understanding what gets passed to each. Let's do this one at a time.
We need to understand how 64-bit functions receive their parameters. On 64-bit Linux (the System V AMD64 calling convention), the first six integer or pointer parameters are passed in registers, in this order:
- rdi (sometimes seen as edi, the lower 4 bytes of rdi)
- rsi (sometimes seen as esi, the lower 4 bytes of rsi)
- rdx (sometimes seen as edx, the lower 4 bytes of rdx)
- rcx (sometimes seen as ecx, the lower 4 bytes of rcx)
- r8 (sometimes seen as r8d, the lower 4 bytes of r8)
- r9 (sometimes seen as r9d, the lower 4 bytes of r9)
The relationship between rdi and edi can be confusing. They refer to the same storage location: edi is just the lower 32 bits (the bottom half) of rdi, so changing one changes the other, as we see below.
Compilers often pass parameters, especially ones with small values, using the lower four bytes (for example, mov edi, 0x20). The 32-bit instruction is shorter to encode. On x86-64, writing a 32-bit register also automatically zeroes the upper 32 bits of the full register, so the result is the same as setting all 8 bytes. The same goes for rsi, rdx, rcx, r8, and r9.
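The rule above can be modeled in plain C. The sketch below is a simulation only: a uint64_t stands in for rdi, and the helper names read_edi and write_edi are hypothetical.

It shows the two behaviors that matter here. Reading edi gives the low 32 bits of rdi. Writing edi zero-extends, which clears the upper 32 bits.

```c
#include <stdint.h>

/* Simulation only: a uint64_t stands in for rdi; no real registers are touched. */

/* Reading edi returns the low 32 bits of rdi. */
uint32_t read_edi(uint64_t rdi) {
    return (uint32_t)rdi;
}

/* Writing edi (e.g. `mov edi, 0x20`) replaces the low 32 bits and, per the
 * x86-64 rule for 32-bit destinations, sets the upper 32 bits to 0.
 * The old rdi value therefore has no effect on the result. */
uint64_t write_edi(uint64_t old_rdi, uint32_t value) {
    (void)old_rdi;
    return (uint64_t)value;
}
```

So after mov edi, 0x20, rdi holds exactly 0x20, whatever it held before. This is why the two forms are interchangeable for passing a parameter. Writes to the 16-bit and 8-bit parts (such as di) behave differently: they leave the upper bits untouched.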
Now, let's continue.
malloc
In C, malloc() is how we dynamically allocate memory: it reserves space on the heap for us to use. To see the parameters for malloc(), we look at its man page, which shows the function's prototype. man malloc shows the following:
This tells us that malloc() takes a single parameter: the number of bytes to allocate. Since rdi holds the first parameter, we look just above the function call for a value being loaded into rdi.
Here, we see that edi (the lower half of rdi) is loaded with 0x20 (32). This means that the equivalent C call is malloc(0x20), which allocates 32 bytes on the heap.
The str variable lives in memory at address rbp-0x8. Immediately after malloc returns, register rax holds its return value (a pointer to the new heap buffer), and that value is stored in memory at rbp-0x8 (i.e., it is stored in str).
printf
Let's check the arguments for printf.
The man page isn't very helpful here. printf just prints text to the screen, and rdi will hold the address of the string that we want to print.
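The first instruction is the most confusing. lea stands for Load Effective Address: it computes an address and stores that address in the register, without reading the memory it points to. Here it puts the address of the string, which GDB shows as 0x2004, into rax. The next instruction moves that address into rdi, and the last one calls printf.

Why does the compiler use lea rather than mov? A mov from memory would load the data stored at that address, not the address itself. And because this is a PIE binary, the string's absolute address isn't known until the program is loaded (it is the random load base plus 0x2004). So the compiler computes it relative to the instruction pointer (rip) at run time.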
If we check what's at this address (for example, with GDB's x/s command), we see the following:
This makes sense with what we saw in the program output.
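The difference between lea and a mov from memory can be mirrored in C. Taking a pointer to data corresponds to what lea computes. Reading the bytes stored at that pointer corresponds to what a mov from memory does.

This is an analogy, not the binary's code. The string text and the function names lea_like and mov_like are placeholders; the real string lives at 0x2004 in the binary.

```c
#include <stdint.h>
#include <string.h>

/* Placeholder text standing in for the string at 0x2004 in the binary. */
static const char prompt[] = "placeholder prompt";

/* Like `lea rax, [rip+offset]`: compute where the string is. No memory read. */
const char *lea_like(void) {
    return prompt;
}

/* Like `mov rax, QWORD PTR [rip+offset]`: read the 8 bytes stored at that
 * address. The result is string data, not a pointer, so it would be the
 * wrong thing to pass to printf. */
uint64_t mov_like(void) {
    uint64_t value;
    memcpy(&value, prompt, sizeof value);
    return value;
}
```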
fgets
Let's check the man page to see what fgets does.
This tells us that fgets() is an input function: it reads a line of text from a stream into a buffer. Since fgets() takes three arguments, we need to find what rdi, rsi, and rdx hold to understand what's going into the function.
Let's find the parameters.
We see that rdi is loaded from rax, which is loaded with the value stored at rbp-0x8. From before, we know that this is str, the pointer returned by malloc.
rsi is loaded with 0x20, meaning fgets writes at most 0x20 bytes, including the terminating null byte (so at most 31 characters of input).
rdx is loaded from rip+0x2e0b, which GDB resolves to 0x4010 and labels as stdin. This is the stream that the data comes from.
From this, we know that the C code looks like this:
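Putting the malloc and fgets steps together, a reconstruction consistent with these registers might look like the sketch below. It is inferred from the disassembly, not copied from the original source.

read_input is a hypothetical name. The stream is a parameter (the binary passes stdin) so the function can be tried on any FILE.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reconstructed from the disassembly: rdi = str (the malloc'd pointer saved
 * at rbp-0x8), rsi = 0x20, rdx = the input stream (stdin in the binary). */
char *read_input(FILE *stream) {
    char *str = malloc(0x20);            /* 32-byte buffer on the heap */
    if (str == NULL)
        return NULL;
    /* Reads at most 0x20 - 1 = 31 characters, stops after a newline,
     * and null-terminates, so it cannot overflow the 0x20-byte buffer. */
    if (fgets(str, 0x20, stream) == NULL) {
        free(str);
        return NULL;
    }
    return str;                          /* caller is responsible for free() */
}
```

In the binary, the equivalent call is fgets(str, 0x20, stdin).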
What does this mean? From earlier, we know that str points to memory that malloc allocated on the heap. This means that fgets writes our input to the heap rather than the stack.
Why is this important?
This means that we can't perform many of the stack overflow techniques we are going to learn. There are a series of heap overflow techniques, but they are largely out of the scope of this guide.
strcmp
strcmp, or string compare, is how C compares strings. The man page shows us that it takes two arguments: the strings to compare.
Let's go find rdi and rsi.
From this, we notice two things:
- rdi is loaded with the value stored at rbp-0x8, which is str, the pointer returned by malloc. This means that the first string is our input.
- rsi is loaded with the address rip+0xdf9, which is 0x201a. This is not something that we loaded. The address is computed relative to the instruction pointer, which is how a PIE binary refers to data stored inside the binary itself, so this string is hardcoded. More on this later, but for now, let's check what's there.
As expected, this is our flag. The puts calls that follow aren't really relevant to us. From the code there, we can guess that the program prints a "yes" or "no" message based on the return value of strcmp.
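The comparison and the yes/no branch can be sketched as follows. The flag text is a placeholder (the real one is the string at 0x201a), and check_flag is a hypothetical name. The exact messages printed by the binary's puts calls are not reproduced.

```c
#include <string.h>

/* Placeholder for the hardcoded flag stored at 0x201a in the binary. */
static const char FLAG[] = "placeholder_flag";

/* strcmp returns 0 only when both strings are identical, byte for byte,
 * up to the terminating null. The binary branches on that result to
 * decide which message to puts. */
int check_flag(const char *input) {
    return strcmp(input, FLAG) == 0;
}
```

The last test case below is worth keeping in mind when experimenting locally. fgets keeps the trailing newline, and strcmp treats "abc\n" and "abc" as different strings.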
Based on this program, we see that the flag was hardcoded. This will almost never be the case in binary exploitation problems. Some reverse engineering problems do store an obfuscated flag, where your challenge is to figure out how it is transformed, but binary exploitation problems generally won't.
For your reference, here is the source code:
PS: You could also have run strings intro | grep flag to find the flag. strings prints the runs of printable characters found in a binary, which include its hardcoded strings, so it's worth checking as a first step.
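The core idea behind strings can be sketched in a few lines: scan the bytes and report every run of at least four printable characters. Four is the default minimum length in GNU strings.

next_string is a hypothetical helper. The real tool also supports options such as other encodings and a configurable minimum length.

```c
#include <ctype.h>
#include <stddef.h>

/* Printable ASCII (in the default C locale) plus tab: roughly what
 * strings treats as text. */
static int is_text_byte(unsigned char c) {
    return isprint(c) || c == '\t';
}

/* Find the next run of at least min_len text bytes in buf, starting at *pos.
 * On success, stores the run's offset in *start, advances *pos past the run,
 * and returns its length. Returns 0 when no further run exists. */
size_t next_string(const unsigned char *buf, size_t len, size_t *pos,
                   size_t min_len, size_t *start) {
    while (*pos < len) {
        while (*pos < len && !is_text_byte(buf[*pos]))   /* skip binary bytes */
            (*pos)++;
        size_t s = *pos;
        while (*pos < len && is_text_byte(buf[*pos]))    /* measure the run */
            (*pos)++;
        if (*pos > s && *pos - s >= min_len) {
            *start = s;
            return *pos - s;
        }
    }
    return 0;
}
```

Calling it in a loop until it returns 0 gives output similar to strings. Print each run with printf("%.*s\n", (int)n, (const char *)buf + start).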