Reverse Engineering Optimizations: Division By Multiplication
Intro
Reverse engineering compiler optimizations can delay a reverse engineer a-lot. By learning how the compiler optimizes certain things, you can save lots of time. Knowning the pattern, the next time you see this optimization you’ll recognize right away how to decompile it.
In this blog post series I’ll document how to decompile certain compiler optimizations, I hope it’ll save some time for you.
Division By Multiplication
There’s no heavy math in this post lol.
While reverse engineering a rootkit sample (That I soon will publish about;) ) I saw this weird assembly code:
mov ecx, dword ptr [X]
mov rax, 0x3333333333333400
mul rcx
Every time I see weird assembly code, I think to myself: did the author write this code? After understanding what this code does, I knew this is a compiler optimization - This code simply performs a division operation without using the ‘div’ instruction. ‘div’ is an expensive operation for the CPU - even more than multiplication. Let’s look how we can understand this optimization:
Let’s say we want to divide X by 5. In math, (X / 5) = (X * (1⁄5)). This is the basis of this optimization. So, we can invert 5 to the value 1⁄5 and then use multiplication.
But you may ask: How can we perform (X * (1⁄5)) without using floating point instructions?
We can utilize the “mul” instruction: The mul instruction performs multiplication between rax and an operand. If the multiplication operands are 64 bit, the result of the multiplication can be 128 bit. That’s why the result of the “mul” instruction is stored in 2 registers:
- rax - the lower 64 bit of the result
- rdx - the upper 64 bit of the result
This is a pseudo-code implementation of ‘mul’:
void mul(uint64_t operand) {
__int128 result = (rax * operand);
// take the lower 64 bits and store in rax
rax = (result & 0xffffffffffffffff);
// store the upper 64 bits and store in rdx
rdx = (result >> 64) & 0xffffffffffffffff;
}
So how can we utilize this instruction to perform (X * (1⁄5))?
Let’s represent a floating point decimal number that way: the upper 64 bits represent the whole number part of the decimal number, and the lower bits represent the fractional part. This means that after a mul instruction the whole number part is stored in “rdx” and the fractional part is stored in rax.
In this representation, 2^64 represents the value 1:
fractional_part = 0
whole_number = 1
full_representation: 0x 0000000000000001.0000000000000000
In this representation, to get the value (1⁄5) we have to divide 2^64 by 5.0:
2**64 / 5.0 = 0x3333333333333400
(When trying these calculations, beware of floating point truncations and rounding.)
So, 0x3333333333333400 represents the number (1⁄5). Now, Let’s see how we can divide 10 by 5:
0x0000000000000000 0x000000000000000A (10)
*
0x0000000000000000 0x3333333333333400 (0.2)
=
0x0000000000000002 0x0000000000000000 (2)
As you can see, using this representation the result is 2. To get the result we have to shift the result right 64 bits to get the higher bits. The reason the first operand (10) is not shifted is because we already shifted the second operand.
Let’s look at how we can use this trick to optimize away division:
//
// Real calculation:
//
Result = (X / 5.0)
//
// Calculation Using Multiplication:
//
Result = (X * (1/5.0))
//
// Calculation using our decimal representation:
//
Result = (X * 0x3333333333333400) >> 64
//
// Calulation using assembly:
//
// Store (2^64 / 5.0) in rcx
mov rcx, 0x3333333333333400
// Store X in rax.
mov rax, qword ptr [X]
// Multiply.
// The result of the multiplication is stored in rdx, which stores the higher 64 bits.
mul rcx
This assembly snippet is basically the snippet from the beginning of the article :)
Combining this trick with shifts can replace the “div” instruction.
Reverse Engineering Example
Let’s look at another example:
mov rcx, qword ptr [X]
mov rax, 0xCCCCCCCCCCCCCCCD
mul rcx
mov rdi, rdx
shr rdi, 4
Ok, so how can we reverse engineer this?
Let’s first understand what 0xCCCCCCCCCCCCCCCD
represents by doing the inverse of it’s creation:
>> 0xCCCCCCCCCCCCCCCD / float(2**64)
0.8
Ok, we know that the first “mul” instruction multiplies X by 0.8 (4⁄5) and stores the result in rdx. After that, there’s a shift right of 4 bits - which really means dividing by (2^4) = 16. this is the basic formula:
Result = (X * 4/5) / 16
We can change the representation of the last division:
(X * 4/5 * 1/16)
Then, combine the last multiplications:
(X * 4/80) -> (X * 1/20)
Now we know this code simply divides by 20!
Exercise
This is an exercise for the reader:
mov rdi, qword ptr [X]
mov rax, 0xA41A41A41A41A41B
mul rdi
sub rdi, rdx
shr rdi, 1
add rdi, rdx
shr rdi, 8
Summary
That’s it! I hope it was interesting. I’ll continue posting stuff about cool optimization patterns that hopefully will be helpful. Also, I’ll post about this rootkit hopefully this week:)
References
https://homepage.divms.uiowa.edu/~jones/bcd/divide.html
Tweet Link
While reverse engineering a rootkit sample (That I soon will publish about 😉) I saw this weird compiler optimization.
— Ori Damari (@0xrepnz) October 26, 2019
I thought I'll start documenting compiler optimizations for reverse engineers. Read my first article in the series:https://t.co/nThidJKygf pic.twitter.com/RufA3fHDa4