intro
A very basic binary reversing challenge.
With this binary, we will look at the basic toolkit that we use to disassemble and analyze binaries. It isn't meant to be difficult; it serves as an introduction to reading assembly.
First Checks
When you download the binary, the first thing I recommend is to run it. Files downloaded from the web usually arrive without the execute permission set, so add it first:
$ chmod +x intro
$ ./intro
We are provided the following output:
$ ./intro
Enter the flag here: flag
Nope, try again!
We just need to enter the flag, and the program tells us whether we got it right.
Second, we check the security measures of the file. Use checksec for this. checksec intro prints the following:
[*] '/home/joybuzzer/Documents/vunrotc/live/00-introduction/intro/intro'
Arch: amd64-64-little
RELRO: Full RELRO
Stack: No canary found
NX: NX enabled
PIE: PIE enabled
Let's break this down:
Arch: amd64-64-little
- This means that the binary is compiled for a 64-bit, little-endian x86-64 architecture. This is important because it means we will use 64-bit registers and instructions.
RELRO: Full RELRO
- This means that the Global Offset Table (GOT) is read-only once the program has started. This is important because it means that we cannot overwrite the GOT to redirect execution to a different function. More on this later.
Stack: No canary found
- This means that a canary does not protect the stack. This is important because it means we can overwrite stack data without the program noticing.
NX: NX enabled
- This means that writable memory such as the stack is not executable. This is important because it means we cannot place our own instructions into the program's data and force it to execute them.
PIE: PIE enabled
- This means the binary is compiled as a Position Independent Executable (PIE). This is important because it means that the binary can be loaded at a random base address in memory. More on this later.
What does all this mean?
We are dealing with a 64-bit binary.
We can't rely on hardcoded absolute addresses: with PIE and address randomization, the binary's code and data load at a different base address on each execution, just as the stack and heap do.
Let's start disassembling this binary.
Disassembly
Running this binary with gdb will let us see the assembly behind the program. To do this, we run gdb intro. From here, the best thing to check is the available functions (info functions):
gef➤ info functions
All defined functions:
Non-debugging symbols:
0x0000000000001000 _init
0x0000000000001080 __cxa_finalize@plt
0x0000000000001090 puts@plt
0x00000000000010a0 printf@plt
0x00000000000010b0 fgets@plt
0x00000000000010c0 strcmp@plt
0x00000000000010d0 malloc@plt
0x00000000000010e0 _start
0x0000000000001110 deregister_tm_clones
0x0000000000001140 register_tm_clones
0x0000000000001180 __do_global_dtors_aux
0x00000000000011c0 frame_dummy
0x00000000000011c9 main
0x0000000000001258 _fini
Most of these functions are startup and teardown stubs that the compiler and linker add to every binary (_init, _start, frame_dummy, and so on). The ones suffixed @plt are stubs for external library functions. The only function that we are interested in is main. Let's take a look at the assembly for main (disas main):
gef➤ disas main
Dump of assembler code for function main:
0x00000000000011c9 <+0>: endbr64
0x00000000000011cd <+4>: push rbp
0x00000000000011ce <+5>: mov rbp,rsp
0x00000000000011d1 <+8>: sub rsp,0x20
0x00000000000011d5 <+12>: mov DWORD PTR [rbp-0x14],edi
0x00000000000011d8 <+15>: mov QWORD PTR [rbp-0x20],rsi
0x00000000000011dc <+19>: mov edi,0x20
0x00000000000011e1 <+24>: call 0x10d0 <malloc@plt>
0x00000000000011e6 <+29>: mov QWORD PTR [rbp-0x8],rax
0x00000000000011ea <+33>: lea rax,[rip+0xe13] # 0x2004
0x00000000000011f1 <+40>: mov rdi,rax
0x00000000000011f4 <+43>: mov eax,0x0
0x00000000000011f9 <+48>: call 0x10a0 <printf@plt>
0x00000000000011fe <+53>: mov rdx,QWORD PTR [rip+0x2e0b] # 0x4010 <stdin@GLIBC_2.2.5>
0x0000000000001205 <+60>: mov rax,QWORD PTR [rbp-0x8]
0x0000000000001209 <+64>: mov esi,0x20
0x000000000000120e <+69>: mov rdi,rax
0x0000000000001211 <+72>: call 0x10b0 <fgets@plt>
0x0000000000001216 <+77>: mov rax,QWORD PTR [rbp-0x8]
0x000000000000121a <+81>: lea rdx,[rip+0xdf9] # 0x201a
0x0000000000001221 <+88>: mov rsi,rdx
0x0000000000001224 <+91>: mov rdi,rax
0x0000000000001227 <+94>: call 0x10c0 <strcmp@plt>
0x000000000000122c <+99>: test eax,eax
0x000000000000122e <+101>: jne 0x1241 <main+120>
0x0000000000001230 <+103>: lea rax,[rip+0xdfd] # 0x2034
0x0000000000001237 <+110>: mov rdi,rax
0x000000000000123a <+113>: call 0x1090 <puts@plt>
0x000000000000123f <+118>: jmp 0x1250 <main+135>
0x0000000000001241 <+120>: lea rax,[rip+0xe03] # 0x204b
0x0000000000001248 <+127>: mov rdi,rax
0x000000000000124b <+130>: call 0x1090 <puts@plt>
0x0000000000001250 <+135>: mov eax,0x0
0x0000000000001255 <+140>: leave
0x0000000000001256 <+141>: ret
End of assembler dump.
Now this is a lot. The way I recommend doing this is to go from external function call to external function call, understanding what gets passed to them. Let's do this one at a time.
Hold on, One more thing
We need to understand how 64-bit functions pass parameters. On 64-bit Linux (the System V AMD64 calling convention), this is done via registers. The first six integer or pointer parameters are passed in the following registers, in order:
1. rdi (sometimes as edi, the lower 4 bytes of rdi)
2. rsi (sometimes as esi, the lower 4 bytes of rsi)
3. rdx (sometimes as edx, the lower 4 bytes of rdx)
4. rcx (sometimes as ecx, the lower 4 bytes of rcx)
5. r8 (sometimes as r8d, the lower 4 bytes of r8)
6. r9 (sometimes as r9d, the lower 4 bytes of r9)
The difference between rdi and edi is confusing. They refer to the same storage location: edi is just the bottom half (the low 32 bits) of rdi, as we see below. Changing one changes the other, with one quirk: on x86-64, writing to edi also clears the upper 32 bits of rdi.
rdi = [ _ _ _ _ _ _ _ _ ]
edi =         [ _ _ _ _ ]
Compilers often pass parameters, especially parameters with small values, using the lower four bytes (i.e. edi). The 32-bit form of the instruction has a shorter encoding, and because writing edi zeroes the upper half of rdi, the result is no different than loading all 8 bytes. The same goes for rsi, rdx, rcx, r8, and r9.
Now, let's continue.
malloc
malloc() in C is how we allocate memory. It makes space on the heap for us to use. To see the parameters for malloc(), we look at the man pages, which show the function's declaration. man malloc shows the following:
SYNOPSIS
#include <stdlib.h>
void *malloc(size_t size);
DESCRIPTION
The malloc() function allocates size bytes and returns a pointer to
the allocated memory. The memory is not initialized. If size is
0, then malloc() returns either NULL, or a unique pointer value
that can later be successfully passed to free().
This tells us that malloc() takes one parameter: the number of bytes to allocate. Since rdi always holds the first parameter, we look just above the function call for the instruction that sets rdi.
0x00000000000011dc <+19>: mov edi,0x20
0x00000000000011e1 <+24>: call 0x10d0 <malloc@plt>
0x00000000000011e6 <+29>: mov QWORD PTR [rbp-0x8],rax
Here, we see that edi (the low half of rdi) is being loaded with 0x20 (32). This means that the equivalent function call is malloc(0x20), which allocates 32 bytes on the heap.
malloc returns its result in rax, the register that holds a function's return value. Immediately after the call, that pointer is stored in memory at rbp-0x8, a local variable on the stack (named line in the source code at the end).
printf
Let's check the arguments for printf.
SYNOPSIS
printf FORMAT [ARGUMENT]...
printf OPTION
DESCRIPTION
Print ARGUMENT(s) according to FORMAT, or execute according to OPTION:
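This is the page for the printf shell command (section 1 of the manual), so it isn't much help here. man 3 printf documents the C function, declared as int printf(const char *format, ...);. It prints formatted text to standard output, and its first argument is the format string, so rdi will hold the address of the string that we want to print.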
0x00000000000011ea <+33>: lea rax,[rip+0xe13] # 0x2004
0x00000000000011f1 <+40>: mov rdi,rax
0x00000000000011f4 <+43>: mov eax,0x0
0x00000000000011f9 <+48>: call 0x10a0 <printf@plt>
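The first instruction is the most confusing. lea stands for Load Effective Address: it computes an address and stores it in the register without reading the memory at that address. rip-relative addresses are measured from the next instruction, so this is 0x11f1 + 0xe13 = 0x2004, which GDB prints in the trailing comment. In this unrelocated view, the effect is the same as mov rax, 0x2004. The next instruction copies the address into rdi. mov eax,0x0 is there because printf is variadic: al tells it how many vector registers carry arguments (none here). The last instruction calls printf.
Why does the compiler use lea instead of a plain mov? Because the binary is position independent, the absolute address of the string isn't known until the program is loaded. Its distance from the current instruction is fixed, however, so the address is computed relative to rip at run time.
If we check what's at this address, we see the following: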
gef➤ x/s 0x2004
0x2004: "Enter the flag here: "
This makes sense with what we saw in the program output.
fgets
Let's check the man pages to see what fgets does.
SYNOPSIS
char *fgets(char *s, int size, FILE *stream);
DESCRIPTION
fgets() reads in at most one less than size characters from stream and stores
them into the buffer pointed to by s. Reading stops after an EOF or a newline.
If a newline is read, it is stored into the buffer. A terminating null byte
('\0') is stored after the last character in the buffer.
This tells us that fgets() is an input function. Since fgets() takes three arguments, we need to find what rdi, rsi, and rdx are to understand what's going into the function.
0x00000000000011fe <+53>: mov rdx,QWORD PTR [rip+0x2e0b] # 0x4010 <stdin@GLIBC_2.2.5>
0x0000000000001205 <+60>: mov rax,QWORD PTR [rbp-0x8]
0x0000000000001209 <+64>: mov esi,0x20
0x000000000000120e <+69>: mov rdi,rax
0x0000000000001211 <+72>: call 0x10b0 <fgets@plt>
Let's find the parameters. We see that:
- rdi is loaded with rax, which holds the value stored at rbp-0x8. From before, we know that this is the pointer returned by malloc.
- rsi is loaded with 0x20, meaning fgets will write at most 0x20 bytes: up to 31 characters plus the terminating null byte.
- rdx is loaded with the value stored at rip+0x2e0b (which GDB tells us is 0x4010). GDB informs us that this is stdin, which is the stream that the data comes from.
From this, we know that the C code looks like this:
buffer = malloc(0x20);
fgets(buffer, 0x20, stdin);
What does this mean? From earlier, we know that malloc allocates its memory on the heap. This means that our input is written to the heap, rather than the stack. The size passed to fgets also matches the size of the allocation, so the read cannot run past the end of the buffer.
Why is this important?
This means that we can't perform many of the stack overflow techniques we are going to learn. There is a family of heap overflow techniques, but they are largely outside the scope of this guide.
strcmp
strcmp, or string compare, is how C compares strings. The man pages show us that it takes 2 arguments, the strings to compare, and returns 0 when they are equal.
Let's go find rdi and rsi.
0x0000000000001216 <+77>: mov rax,QWORD PTR [rbp-0x8]
0x000000000000121a <+81>: lea rdx,[rip+0xdf9] # 0x201a
0x0000000000001221 <+88>: mov rsi,rdx
0x0000000000001224 <+91>: mov rdi,rax
0x0000000000001227 <+94>: call 0x10c0 <strcmp@plt>
From this, we notice two things:
- rdi is loaded with the value stored at rbp-0x8, which is the pointer that malloc returned. This means that our input is the first string.
- rsi is loaded with the address rip+0xdf9, which is 0x201a. This is not something that we loaded. We notice that it is computed relative to the instruction pointer (because of PIE), which tells us that it points at data hardcoded in the binary. More on this later, but for now, let's check what's there.
gef➤ x/s 0x201a
0x201a: "flag{welcome_to_runtime}\n"
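As we expected, this is our flag. The trailing \n is there because fgets stores the newline from pressing Enter. The puts calls that follow aren't really relevant to us, but we can see from the code that the program prints a success or failure message based on the return value of strcmp: test eax,eax followed by jne skips the success message whenever the result is nonzero.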
Conclusions
Based on this program, we see that the flag was hardcoded. This is rarely the case in binary exploitation problems. Some reverse engineering problems obfuscate the flag and challenge you to figure out how it is transformed, but binary exploitation problems generally do not.
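For your reference, here is the source code. The last two printf calls show up as puts in the disassembly because the compiler replaces a printf of a constant string ending in a newline with puts: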
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define STR_SIZE 0x20

int main(int argc, char* argv[])
{
    char* line = malloc(STR_SIZE * sizeof(char));
    printf("Enter the flag here: ");
    fgets(line, STR_SIZE, stdin);
    if (strcmp(line, "flag{welcome_to_runtime}\n") == 0)
        printf("That's the right flag!\n");
    else
        printf("Nope, try again!\n");
    return 0;
}
PS: You could have run strings intro | grep flag to find the flag. strings prints the hardcoded printable strings in a binary, so it's not a bad thing to check as a first step.