Reverse Engineering Optimizations: Division By Multiplication

Intro

Reverse engineering compiler optimizations can delay a reverse engineer a-lot. By learning how the compiler optimizes certain things, you can save lots of time. Knowning the pattern, the next time you see this optimization you’ll recognize right away how to decompile it.

In this blog post series I’ll document how to decompile certain compiler optimizations, I hope it’ll save some time for you.

Division By Multiplication

There’s no heavy math in this post lol.

While reverse engineering a rootkit sample (That I soon will publish about;) ) I saw this weird assembly code:

mov     ecx, dword ptr [X]        
mov     rax, 0x3333333333333400          
mul     rcx                              

Every time I see weird assembly code, I think to myself: did the author write this code? After understanding what this code does, I knew this is a compiler optimization - This code simply performs a division operation without using the ‘div’ instruction. ‘div’ is an expensive operation for the CPU - even more than multiplication. Let’s look how we can understand this optimization:

Let’s say we want to divide X by 5. In math, (X / 5) = (X * (15)). This is the basis of this optimization. So, we can invert 5 to the value 15 and then use multiplication.

But you may ask: How can we perform (X * (15)) without using floating point instructions?

We can utilize the “mul” instruction: The mul instruction performs multiplication between rax and an operand. If the multiplication operands are 64 bit, the result of the multiplication can be 128 bit. That’s why the result of the “mul” instruction is stored in 2 registers:

This is a pseudo-code implementation of ‘mul’:

void mul(uint64_t operand) { 
    __int128 result = (rax * operand);
    // take the lower 64 bits and store in rax
    rax = (result & 0xffffffffffffffff);

    // store the upper 64 bits and store in rdx
    rdx = (result >> 64) & 0xffffffffffffffff;                
}

So how can we utilize this instruction to perform (X * (15))?

Let’s represent a floating point decimal number that way: the upper 64 bits represent the whole number part of the decimal number, and the lower bits represent the fractional part. This means that after a mul instruction the whole number part is stored in “rdx” and the fractional part is stored in rax.

In this representation, 2^64 represents the value 1:

fractional_part = 0
whole_number    = 1

full_representation: 0x 0000000000000001.0000000000000000

In this representation, to get the value (15) we have to divide 2^64 by 5.0:

2**64 / 5.0 =  0x3333333333333400

(When trying these calculations, beware of floating point truncations and rounding.)

So, 0x3333333333333400 represents the number (15). Now, Let’s see how we can divide 10 by 5:

0x0000000000000000 0x000000000000000A (10)
                  *
0x0000000000000000 0x3333333333333400 (0.2)
                  =
0x0000000000000002 0x0000000000000000 (2)
                 

As you can see, using this representation the result is 2. To get the result we have to shift the result right 64 bits to get the higher bits. The reason the first operand (10) is not shifted is because we already shifted the second operand.

Let’s look at how we can use this trick to optimize away division:

//
// Real calculation:
//
Result = (X / 5.0)

//
// Calculation Using Multiplication:
//
Result = (X * (1/5.0))

//
// Calculation using our decimal representation: 
//
Result = (X * 0x3333333333333400) >> 64


//
// Calulation using assembly:
//

// Store (2^64 / 5.0) in rcx
mov rcx, 0x3333333333333400 

// Store X in rax.
mov rax, qword ptr [X] 

// Multiply.
// The result of the multiplication is stored in rdx, which stores the higher 64 bits.
mul rcx

This assembly snippet is basically the snippet from the beginning of the article :)

Combining this trick with shifts can replace the “div” instruction.

Reverse Engineering Example

Let’s look at another example:

mov     rcx, qword ptr [X]
mov     rax, 0xCCCCCCCCCCCCCCCD
mul     rcx
mov     rdi, rdx
shr     rdi, 4

Ok, so how can we reverse engineer this? Let’s first understand what 0xCCCCCCCCCCCCCCCD represents by doing the inverse of it’s creation:

>> 0xCCCCCCCCCCCCCCCD / float(2**64)
0.8

Ok, we know that the first “mul” instruction multiplies X by 0.8 (45) and stores the result in rdx. After that, there’s a shift right of 4 bits - which really means dividing by (2^4) = 16. this is the basic formula:

Result = (X * 4/5) / 16 

We can change the representation of the last division:

(X * 4/5 * 1/16)

Then, combine the last multiplications:

(X * 4/80) -> (X * 1/20)

Now we know this code simply divides by 20!

Exercise

This is an exercise for the reader:

mov     rdi, qword ptr [X]
mov     rax, 0xA41A41A41A41A41B
mul     rdi             
sub     rdi, rdx        
shr     rdi, 1                
add     rdi, rdx
shr     rdi, 8

Summary

That’s it! I hope it was interesting. I’ll continue posting stuff about cool optimization patterns that hopefully will be helpful. Also, I’ll post about this rootkit hopefully this week:)

References

https://homepage.divms.uiowa.edu/~jones/bcd/divide.html

https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi/41224096