Reversing the x64 calling convention
In this article I will explain what the x64 calling convention looks like on Windows, and we’ll dive into how it influences reverse engineering.
x64 Intro
So, ready to talk about some x64 assembly? As you may know, Windows uses a “fastcall”-style calling convention on x64. In contrast to most 32-bit Windows calling conventions, the first four integer or pointer arguments are passed in registers (RCX, RDX, R8, R9; floating-point arguments use XMM0-XMM3 instead), and the rest are passed on the stack.
Let’s say I have this function call:
void FourArgs(int arg1, int arg2, int arg3, int arg4);
void CallFour()
{
FourArgs(1, 2, 3, 4);
}
// Compiler Output:
sub rsp, 40 // Magical Allocation - explained soon
mov r9d, 4 // arg4
mov r8d, 3 // arg3
mov edx, 2 // arg2
mov ecx, 1 // arg1
call FourArgs
add rsp, 40
retn
But wait, what is this magical allocation over there? 🤔 I thought stack space is used only for functions with more than 4 arguments, so what’s this? Maybe our answer lies in the called function:
void FourArgs(int arg1, int arg2, int arg3, int arg4)
{
// This function is empty
}
// Compiler Output
mov [rsp + 32], r9d
mov [rsp + 24], r8d
mov [rsp + 16], edx
mov [rsp + 8], ecx
retn
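The “fastcall” convention actually means the caller has to reserve stack space for at least 4 parameters, 32 bytes in total, usually called the “shadow space” or “home space” (even if the function receives no arguments..! 😨). The extra 8 bytes in “sub rsp, 40” keep the stack 16-byte aligned at the call. AND it looks like the called function saves the register parameters into that space. (😮)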
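To make the layout in the two listings above a bit more concrete, here is a minimal C sketch. The helper names arg_register and callee_slot_offset are mine (they are not part of any real API), and the sketch assumes only integer or pointer arguments, each taking one 8-byte slot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: name of the register that carries the integer or
   pointer argument at zero-based `index`, or NULL when the argument is
   passed on the stack instead. */
const char *arg_register(int index)
{
    static const char *const regs[] = { "RCX", "RDX", "R8", "R9" };
    assert(index >= 0);
    return index < 4 ? regs[index] : NULL;
}

/* Hypothetical helper: offset from RSP, at the callee's first instruction,
   of the stack slot that belongs to argument `index`.
   [rsp] holds the return address, so the slots start at rsp + 8.
   For index 0..3 this is the slot the caller reserved for the register. */
int callee_slot_offset(int index)
{
    assert(index >= 0);
    return 8 + 8 * index;
}
```

For the empty FourArgs above, callee_slot_offset(0) gives 8 and callee_slot_offset(3) gives 32, which matches the “mov [rsp + 8], ecx” and “mov [rsp + 32], r9d” stores. Under the same assumptions a fifth argument would sit at rsp + 40 when the callee starts.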
Here is another example, this time with five arguments (debug build):
void CallAdd()
{
Add(1, 2, 3, 4, 5);
}
// Compiler Output:
sub rsp, 56
mov dword ptr [rsp + 32], 5 // arg5
mov r9d, 4 // arg4
mov r8d, 3 // arg3
mov edx, 2 // arg2
mov ecx, 1 // arg1
call Add
add rsp, 56
retn
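Where do the 40 and 56 come from? The sketch below is my own reconstruction, not something taken from compiler documentation. The convention requires RSP to be 16-byte aligned at the call, and on entry to a function RSP is off by 8 because of the pushed return address. On top of that I am assuming the compiler rounds the outgoing argument area up to a multiple of 16, which happens to reproduce both listings. The helper names are hypothetical, and a real function with locals or saved registers will allocate more.

```c
#include <assert.h>

/* Hypothetical helper: offset from RSP, in the caller right before `call`,
   of the slot for argument `index` (zero-based). Slots 0..3 are the space
   reserved for the register arguments, so the fifth argument is at rsp + 32. */
int caller_slot_offset(int index)
{
    assert(index >= 0);
    return 8 * index;
}

/* Hypothetical helper: bytes a caller with no locals subtracts from RSP.
   ASSUMPTION: the argument area is rounded up to 16 bytes, then 8 bytes
   are added to undo the misalignment left by the return address. */
int caller_frame_size(int arg_count)
{
    assert(arg_count >= 0);
    int slots = arg_count < 4 ? 4 : arg_count;  /* never fewer than 4 slots */
    int area = slots * 8;                       /* 8 bytes per argument slot */
    area = (area + 15) & ~15;                   /* round up to a multiple of 16 */
    return area + 8;                            /* RSP is 16-byte aligned at the call */
}
```

With these assumptions caller_frame_size(4) is 40 (CallFour) and caller_frame_size(5) is 56 (CallAdd), and caller_slot_offset(4) is 32, which is where CallAdd writes arg5.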
Why Is This Calling Convention Useful?
Some questions arise regarding this new calling convention:
- What is the reason for this move to fastcall on x64?
- Why are the parameters saved on the stack?
- Why does the caller allocate this stack space and not the called function?
Ahh.. Hard questions.
Probably, the reason for the fastcall calling convention is the addition of new general-purpose registers (R8-R15). Architectures with many registers try to use them for arguments and local variables instead of the stack, because register access is faster.
The reason the function parameters are saved on the stack is for debugging purposes. The compiler typically won’t use this space to store the parameters on release builds. The compiler is free to use this space for anything - for local variables for example.
I don’t know why the caller allocates this space and not the called function, sounds weird to me.
Reversing x64 code
This calling convention makes reverse engineering and hacking a bit harder, for several reasons:
Forwarded Arguments
Say you have this function: (release build)
Annoying:
sub rsp, 32
// local_variable = arg1 + arg2
mov eax, ecx // arg1
add eax, edx // arg2
// pass this to DoSomething1
mov ecx, eax // pass arg1
mov edx, 2 // pass arg2
call DoSomething1
add rsp, 32
retn
If I asked you “What are the parameters to DoSomething1?”, what would you say? The obvious answer is that DoSomething1 receives (int, int) and the values are (arg1 + arg2) and 2, right?
But this is actually not correct! The source code is:
int Annoying(int a, int b, int c)
{
return DoSomething1(a + b, 2, c);
}
Annoying() takes a third argument! As we said earlier, R8 holds a function’s third argument. In this case it’s passed as is to DoSomething1, which means there’s no reason to move it: the argument is already in R8. This makes reverse engineering a bit harder, because now we have to look at DoSomething1 to figure out what parameters it receives.
Generally, the best way to know the number of arguments a function receives is to inspect the function itself. If a function uses R8 without initializing it, it probably receives a third argument from the caller.
DoSomething1:
sub rsp, 32
mov eax, ecx
add eax, edx
add eax, r8d // R8 is used without initialization
add rsp, 32
retn
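The “used without initialization” rule can be sketched as a tiny read-before-write check. This is a toy model under my own assumptions, not a real disassembler: every instruction is reduced by hand to the registers it reads and the one it writes, the code is treated as a straight line with no branches or calls, and all the names (insn, guess_arg_count and the two tables) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified register ids; only the four argument registers matter here. */
enum reg { REG_NONE = -1, REG_RCX = 0, REG_RDX = 1, REG_R8 = 2, REG_R9 = 3, REG_OTHER = 4 };

/* One instruction, reduced to what it reads and what it writes. */
struct insn {
    enum reg reads[2];
    enum reg writes;
};

/* Guess the argument count: the highest argument register that is read
   before the function itself has written it, plus one. */
int guess_arg_count(const struct insn *code, size_t n)
{
    int written[4] = { 0, 0, 0, 0 };
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        for (int j = 0; j < 2; j++) {
            int r = code[i].reads[j];
            if (r >= REG_RCX && r <= REG_R9 && !written[r] && r + 1 > count)
                count = r + 1;
        }
        int w = code[i].writes;
        if (w >= REG_RCX && w <= REG_R9)
            written[w] = 1;
    }
    return count;
}

/* DoSomething1 from the listing above (stack adjustments left out). */
const struct insn do_something1[] = {
    { { REG_RCX, REG_NONE }, REG_OTHER },   /* mov eax, ecx */
    { { REG_OTHER, REG_RDX }, REG_OTHER },  /* add eax, edx */
    { { REG_OTHER, REG_R8 }, REG_OTHER },   /* add eax, r8d */
};

/* Annoying, up to the call: R8 is forwarded and never touched. */
const struct insn annoying[] = {
    { { REG_RCX, REG_NONE }, REG_OTHER },   /* mov eax, ecx */
    { { REG_OTHER, REG_RDX }, REG_OTHER },  /* add eax, edx */
    { { REG_OTHER, REG_NONE }, REG_RCX },   /* mov ecx, eax */
    { { REG_NONE, REG_NONE }, REG_RDX },    /* mov edx, 2   */
};
```

Calling guess_arg_count(do_something1, 3) yields 3, while guess_arg_count(annoying, 4) yields only 2. That second result is exactly the trap described above: a forwarded argument leaves no trace in the forwarding function, so the check has to be run on the callee.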
Extracting Function Arguments
Say we are debugging code compiled in release mode, and this is its source:
void DoSomethingImportant(int a, int b); // declared before its first use
void DoSomethingImportantWithReason(const char* reason, int a, int b)
{
// Do a bunch of stuff
DoSomethingImportant(a, b);
}
void DoSomethingImportant(int a, int b)
{
// Do Something Important
}
We have a breakpoint on DoSomethingImportant. When the code breaks, we look at the call stack and see that DoSomethingImportantWithReason called it. Now, we want to extract “const char* reason”.
To do this in 32-bit code, we could simply look at the stack and see the function’s arguments. We even had automatic tools to do this.
In x64, we have to look at the disassembly of DoSomethingImportantWithReason and see where it stores “reason”, which makes debugging harder. This is why debug builds save the arguments on the stack (as in the first example).
The same problem of extracting arguments also comes up when we place hooks.
Summary
There are more reasons, but this is it for today :)
If you found any mistake, contact me @0xrepnz ;) I hope it was interesting.