Windows Library Code

Mon, Dec 9, 2019

Intro

I decided to write a guide about Windows library code. The target audience is beginners who want to understand more about Windows reverse engineering, development and compilation. I tried to make this guide as simple as possible.

A “library” is a term used in computer science for a collection of pre-written code and variables. Libraries are useful for developers because they save development time.

There are 2 types of libraries:

1) Static Libraries - Library code that is added to the client executable at link time.

2) Dynamic Libraries (DLLs) - Library code that is loaded from another file at runtime.

In this guide, I will explain both types, how they work, and how we can reverse engineer them.

The Empty Binary

Let’s look at the simplest example. This code is an empty main() function:

// SimplestExecutable.c

int main() { 
    return 0;
}

This code does not depend on any other function. In Visual Studio we can remove the dependency on the standard library (the C Runtime Library), for example with the /NODEFAULTLIB and /ENTRY linker options, so that the resulting binary does not have an import table. This is the only function in the binary:

.text:0000000140001000 main            proc near
.text:0000000140001000                 xor     eax, eax
.text:0000000140001002                 retn
.text:0000000140001002 main            endp

Although this binary does not have an import table, several DLLs will be loaded:

ntdll.dll: This DLL is loaded into every user-mode process in the system (excluding WSL 1 processes, which are Pico processes).
kernel32.dll: This DLL is loaded automatically into every Win32 subsystem process.
kernelbase.dll: This DLL contains some of the functions that kernel32.dll imports.

The Bare Program: Many Object Files

Let’s look at another simple example. This example contains 2 object files:

// SimpleCalculator.h
// Header file of the SimpleCalculator functions
//

#pragma once

int add(int x, int y);

int sub(int x, int y);

// SimpleCalculator.c 
// The implementation of the SimpleCalculator functions

#include "SimpleCalculator.h"

int add(int x, int y) {
	return x + y;
}

int sub(int x, int y) {
	return x - y;
}

// SimpleCalculatorMain.c

#include "SimpleCalculator.h"

int main() {

	int x = add(10, 20);
	int y = sub(30, 40);

	return add(x, y);
}

This simple program still does not use any external function. Each C file is compiled into an object file that contains the code from that C source file. The format of an object file is COFF. PE is based on COFF: it wraps COFF with the MS-DOS header and the PE signature, and “abuses” the optional header of COFF (_IMAGE_OPTIONAL_HEADER).
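To make the COFF header a bit more concrete, here is a minimal sketch that reads the 20-byte COFF file header found at the very start of an object file. The names CoffHeader and read_coff_header are made up for this guide (they are not a Windows API); the field layout follows the published PE/COFF format, where a machine value of 0x8664 means x64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct name; the fields mirror the 20-byte COFF file header. */
typedef struct {
    uint16_t machine;                 /* 0x8664 means x64 */
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint32_t pointer_to_symbol_table; /* file offset of the symbol table */
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header; /* 0 in object files */
    uint16_t characteristics;
} CoffHeader;

/* Little-endian readers, so the code does not depend on struct packing. */
static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parses the header at the start of an object file.
   Returns 1 on success, 0 if the buffer is too small. */
int read_coff_header(const uint8_t *buf, size_t len, CoffHeader *out) {
    assert(buf != NULL && out != NULL);
    if (len < 20) {
        return 0;
    }
    out->machine = read_u16(buf);
    out->number_of_sections = read_u16(buf + 2);
    out->time_date_stamp = read_u32(buf + 4);
    out->pointer_to_symbol_table = read_u32(buf + 8);
    out->number_of_symbols = read_u32(buf + 12);
    out->size_of_optional_header = read_u16(buf + 16);
    out->characteristics = read_u16(buf + 18);
    return 1;
}
```

In a PE file the same header appears right after the “PE\0\0” signature instead of at offset 0, which is what “wrapping COFF” means in practice.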

To disassemble object files we can use the dumpbin tool or IDA Pro.

Let’s look at the contents of the object files:

The functions in SimpleCalculator.c are compiled into SimpleCalculator.obj as expected:

>dumpbin /disasm SimpleCalculator.obj

Dump of file SimpleCalculator.obj

File Type: COFF OBJECT

add:
  0000000000000000: 89 54 24 10        mov         dword ptr [rsp+10h],edx
  0000000000000004: 89 4C 24 08        mov         dword ptr [rsp+8],ecx
  0000000000000008: 55                 push        rbp
  0000000000000009: 48 83 EC 40        sub         rsp,40h
  000000000000000D: 48 8B EC           mov         rbp,rsp
  0000000000000010: 8B 45 58           mov         eax,dword ptr [rbp+58h]
  0000000000000013: 8B 4D 50           mov         ecx,dword ptr [rbp+50h]
  0000000000000016: 03 C8              add         ecx,eax
  0000000000000018: 8B C1              mov         eax,ecx
  000000000000001A: 48 8D 65 40        lea         rsp,[rbp+40h]
  000000000000001E: 5D                 pop         rbp
  000000000000001F: C3                 ret

sub:
  0000000000000000: 89 54 24 10        mov         dword ptr [rsp+10h],edx
  0000000000000004: 89 4C 24 08        mov         dword ptr [rsp+8],ecx
  0000000000000008: 55                 push        rbp
  0000000000000009: 48 83 EC 40        sub         rsp,40h
  000000000000000D: 48 8B EC           mov         rbp,rsp
  0000000000000010: 8B 45 58           mov         eax,dword ptr [rbp+58h]
  0000000000000013: 8B 4D 50           mov         ecx,dword ptr [rbp+50h]
  0000000000000016: 2B C8              sub         ecx,eax
  0000000000000018: 8B C1              mov         eax,ecx
  000000000000001A: 48 8D 65 40        lea         rsp,[rbp+40h]
  000000000000001E: 5D                 pop         rbp
  000000000000001F: C3                 ret

Let’s look at SimpleCalculatorMain.obj:

>dumpbin /disasm SimpleCalculatorMain.obj

Dump of file SimpleCalculatorMain.obj

File Type: COFF OBJECT

main:
  0000000000000000: 40 55              push        rbp
  0000000000000002: 48 83 EC 70        sub         rsp,70h
  0000000000000006: 48 8D 6C 24 20     lea         rbp,[rsp+20h]
  000000000000000B: BA 14 00 00 00     mov         edx,14h
  0000000000000010: B9 0A 00 00 00     mov         ecx,0Ah
  0000000000000015: E8 00 00 00 00     call        add ; <----------
  000000000000001A: 89 45 00           mov         dword ptr [rbp],eax
  000000000000001D: BA 28 00 00 00     mov         edx,28h
  0000000000000022: B9 1E 00 00 00     mov         ecx,1Eh
  0000000000000027: E8 00 00 00 00     call        sub ; <-------------
  000000000000002C: 89 45 04           mov         dword ptr [rbp+4],eax
  000000000000002F: 8B 55 04           mov         edx,dword ptr [rbp+4]
  0000000000000032: 8B 4D 00           mov         ecx,dword ptr [rbp]
  0000000000000035: E8 00 00 00 00     call        add ; <-------------
  000000000000003A: 48 8D 65 50        lea         rsp,[rbp+50h]
  000000000000003E: 5D                 pop         rbp
  000000000000003F: C3                 ret

OK, we can see the implementation of the main() function in this object file, including the references to the add and sub functions. Sharp-eyed readers will notice something weird about these call instructions:

0000000000000015: E8 00 00 00 00     call        add

The “E8” opcode is a relative call instruction, but its 32-bit offset is 0. The offset is 0 because the compiler does not know where the “add” function will be located. That is known only later, at link time, when the linker replaces the zeros with the actual offsets.

How does the linker know which bytes should be replaced?

Each object file has a symbol table that lists the symbols the file defines as well as the external symbols it references.

Let’s examine this table inside SimpleCalculatorMain.obj:

>dumpbin /symbols SimpleCalculatorMain.obj

Dump of file SimpleCalculatorMain.obj

File Type: COFF OBJECT

COFF SYMBOL TABLE
..... (truncated)
.....
00E 00000000 UNDEF  notype ()    External     | add
00F 00000000 UNDEF  notype ()    External     | sub
010 00000000 SECT3  notype ()    External     | main
....
.... (truncated)

We can see that the table contains “add”, “sub” and “main”. The “main” function is located in SECTION 3. The “add” and “sub” functions are declared as “UNDEF” - this means these symbols will be looked up in the global namespace later during linkage.
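The lookup itself can be sketched in a few lines of C. This is a deliberately simplified model, not the real linker: the Symbol struct and resolve_symbol are names invented for this guide, and the section numbers used with it are arbitrary. The only detail taken from COFF is that section number 0 marks an undefined symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol record. */
typedef struct {
    const char *name;
    int section;    /* 0 = UNDEF, otherwise the section that holds the symbol */
    unsigned value; /* offset of the symbol inside that section */
} Symbol;

/* Searches the symbols collected from all linker inputs for a definition
   of `name`. Entries that are themselves UNDEF are skipped.
   Returns NULL when nobody defines the symbol. */
const Symbol *resolve_symbol(const char *name, const Symbol *symbols, size_t count) {
    assert(name != NULL && (symbols != NULL || count == 0));
    for (size_t i = 0; i < count; i++) {
        if (symbols[i].section != 0 && strcmp(symbols[i].name, name) == 0) {
            return &symbols[i];
        }
    }
    return NULL;
}
```

When the real linker hits the NULL case, it reports an “unresolved external symbol” error (LNK2019) instead of producing an executable.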

After finding the functions in the global namespace, the linker needs to update all the references to these symbols (the calls we saw before) with the updated offsets. This is why the object file also contains a relocation table:

>dumpbin /relocations SimpleCalculatorMain.obj

Dump of file SimpleCalculatorMain.obj

File Type: COFF OBJECT

RELOCATIONS #3
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000016  REL32                      00000000         E  add
 00000028  REL32                      00000000         F  sub
 00000036  REL32                      00000000         E  add
 ...
 ...

As you can see, each reference to a symbol is added to the relocation table so that the linker can patch the offset to point at the actual location of the function.

After the linker fixes the offsets, the main function looks like this:

>dumpbin /disasm SimpleCalculator.exe

Dump of file SimpleCalculator.exe

File Type: EXECUTABLE IMAGE

add:
...
... (truncated)
...
sub:
...
... (truncated)
...
main:
  0000000140001040: 40 55              push        rbp
  0000000140001042: 48 83 EC 70        sub         rsp,70h
  0000000140001046: 48 8D 6C 24 20     lea         rbp,[rsp+20h]
  000000014000104B: BA 14 00 00 00     mov         edx,14h
  0000000140001050: B9 0A 00 00 00     mov         ecx,0Ah
  0000000140001055: E8 A6 FF FF FF     call        add ; <-------
  000000014000105A: 89 45 00           mov         dword ptr [rbp],eax
  000000014000105D: BA 28 00 00 00     mov         edx,28h
  0000000140001062: B9 1E 00 00 00     mov         ecx,1Eh
  0000000140001067: E8 B4 FF FF FF     call        sub ; <-------
  000000014000106C: 89 45 04           mov         dword ptr [rbp+4],eax
  000000014000106F: 8B 55 04           mov         edx,dword ptr [rbp+4]
  0000000140001072: 8B 4D 00           mov         ecx,dword ptr [rbp]
  0000000140001075: E8 86 FF FF FF     call        add ; <-------
  000000014000107A: 48 8D 65 50        lea         rsp,[rbp+50h]
  000000014000107E: 5D                 pop         rbp
  000000014000107F: C3                 ret

As you can see, the function looks exactly the same, except that the call offsets are now filled in.
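The arithmetic behind a REL32 fixup is small enough to write down. The sketch below uses two helper names invented for this guide, call_target and apply_rel32. It assumes the usual x64 rule that the displacement is measured from the end of the 4-byte field, which is also the start of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Decodes "E8 rel32": the displacement is relative to the next instruction,
   which starts 5 bytes after the start of the call. */
uint64_t call_target(uint64_t insn_va, const uint8_t insn[5]) {
    assert(insn[0] == 0xE8);
    uint32_t raw = (uint32_t)insn[1] | ((uint32_t)insn[2] << 8) |
                   ((uint32_t)insn[3] << 16) | ((uint32_t)insn[4] << 24);
    return insn_va + 5 + (uint64_t)(int64_t)(int32_t)raw;
}

/* Applies a REL32 fixup: writes target - (field address + 4) into the
   4-byte field, in little-endian order. */
void apply_rel32(uint8_t field[4], uint64_t field_va, uint64_t target_va) {
    int64_t disp = (int64_t)(target_va - (field_va + 4));
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    uint32_t raw = (uint32_t)disp;
    field[0] = (uint8_t)raw;
    field[1] = (uint8_t)(raw >> 8);
    field[2] = (uint8_t)(raw >> 16);
    field[3] = (uint8_t)(raw >> 24);
}
```

Feeding the bytes from the listing into call_target (E8 A6 FF FF FF at 0x140001055 and E8 86 FF FF FF at 0x140001075) gives 0x140001000 for both calls, so both land on the same “add” function. In the other direction, apply_rel32 with the field at 0x140001056 and that target produces the bytes A6 FF FF FF.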

Whole Program Optimization

The “add” and “sub” functions are pretty small, so why not inline them?
Inlining small functions has runtime benefits because the CPU does not have to perform a CALL instruction.
In C and C++, “link time optimization” lets the linker perform all sorts of optimizations across object files. The issue with normal object files is that the linker does not have enough information to perform these optimizations. Optimization is typically performed on the compiler’s internal intermediate representation of the program (think of it as the middle code between your source code and the machine code), so it is not possible with normal object files because they contain only machine code. The linker is not clever enough to modify the machine code after it was generated.

We can achieve link time optimization by using the /GL compiler flag (together with the /LTCG linker flag). This flag instructs the compiler to generate more information and add it to the object file. This extra information is highly compiler version dependent, because it typically means that the compiler does not emit machine code but its intermediate representation. As a result, tools like dumpbin and IDA Pro can no longer disassemble the object file. This flag should only be used when the code is compiled and linked with the same toolchain version, for example on the same computer.

In this case, link time optimization can even evaluate these functions at build time, which results in this binary:

public main
    main proc near
    mov     eax, 0x14
    retn
main endp

This is simply beautiful.
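To see where the 0x14 comes from, here is the same computation written out in plain C. The function name simple_calculator_result is made up for this sketch, and the C11 static_assert only mirrors the kind of constant folding the optimizer performs; it says nothing about how MSVC implements it internally.

```c
#include <assert.h>

static int add(int x, int y) { return x + y; }
static int sub(int x, int y) { return x - y; }

/* The same computation as main() in SimpleCalculatorMain.c. */
int simple_calculator_result(void) {
    int x = add(10, 20); /* 30 */
    int y = sub(30, 40); /* -10 */
    return add(x, y);    /* 20, which is 0x14 */
}

/* Once both calls are inlined, only constants are left to fold. */
static_assert((10 + 20) + (30 - 40) == 0x14, "main() folds to 0x14");
```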

Static Library Development

Let’s start by exploring the simpler type of library, the “static library”. As we saw earlier, the linker is responsible for taking the object files produced by the compiler and combining them into the final executable file. Each object file represents a compiled C file. Let’s say I wrote a library that implements a calculator (just like SimpleCalculator from before..) and I want to give this library to my friend. Theoretically, if the implementation were a single file, I could compile it and give my friend the object file so he could add it as a linker input. This would add the code of the library to his executable, just like in the example from before. The problem is: what if it’s more than 1 file?

Let’s say I have this “library”:

// Add.c
int add(int x, int y) { 
    return x + y;
}

// Sub.c
int sub(int x, int y) { 
    return x - y;
}

After compilation there are 2 object files: add.obj, sub.obj. I could give them to my friend, but it’s not scalable - What if I had 100 files?

Maybe there is a way to merge these object files into 1 file somehow..

It turns out there is: it’s called a static library (a .LIB file in Windows). Static libraries allow the developer to add a bunch of symbols to the linker namespace easily. This can be done by changing the Configuration Type in Visual Studio to “Static Library”. The build will then produce a “.LIB” file from the object files (using the librarian, lib.exe) instead of linking an executable file.

The .LIB file is simply an archive of object files. The format of the archive is the AR format. In Unix, linkers can read AR archives and extract object files from them. In Windows the idea is similar: we add the LIB file as an input to the linker and it pulls in the object files it needs from the LIB file. (In Windows the format is a bit different, though.)
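A quick way to convince yourself that a .LIB file is an archive is to look at its first 8 bytes. Both Unix ar archives and Windows .LIB files start with the signature “!<arch>\n”. The helper below (looks_like_archive is a name invented for this guide) checks only that signature and does not parse the archive members.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The 8-byte signature at the start of an archive file. */
#define ARCHIVE_MAGIC "!<arch>\n"
#define ARCHIVE_MAGIC_LEN 8

/* Returns 1 if the buffer starts with the archive signature, 0 otherwise. */
int looks_like_archive(const unsigned char *buf, size_t len) {
    assert(buf != NULL);
    return len >= ARCHIVE_MAGIC_LEN &&
           memcmp(buf, ARCHIVE_MAGIC, ARCHIVE_MAGIC_LEN) == 0;
}
```

A plain .obj file fails this check because it starts directly with the COFF header.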

To reverse engineer LIB files, you can use “dumpbin /disasm” or load the file into IDA Pro. This is an example of the output of dumpbin /disasm:

>dumpbin /disasm StaticCalculator.lib
Microsoft (R) COFF/PE Dumper Version 14.22.27905.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file StaticCalculator.lib

File Type: LIBRARY

sub:
  0000000000000000: 89 54 24 10        mov         dword ptr [rsp+10h],edx
  0000000000000004: 89 4C 24 08        mov         dword ptr [rsp+8],ecx
  0000000000000008: 57                 push        rdi
  0000000000000009: 8B 44 24 18        mov         eax,dword ptr [rsp+18h]
  000000000000000D: 8B 4C 24 10        mov         ecx,dword ptr [rsp+10h]
  0000000000000011: 2B C8              sub         ecx,eax
  0000000000000013: 8B C1              mov         eax,ecx
  0000000000000015: 5F                 pop         rdi
  0000000000000016: C3                 ret

add:
  0000000000000000: 89 54 24 10        mov         dword ptr [rsp+10h],edx
  0000000000000004: 89 4C 24 08        mov         dword ptr [rsp+8],ecx
  0000000000000008: 57                 push        rdi
  0000000000000009: 8B 44 24 18        mov         eax,dword ptr [rsp+18h]
  000000000000000D: 8B 4C 24 10        mov         ecx,dword ptr [rsp+10h]
  0000000000000011: 03 C8              add         ecx,eax
  0000000000000013: 8B C1              mov         eax,ecx
  0000000000000015: 5F                 pop         rdi
  0000000000000016: C3                 ret

LIB files have one more use: They can be used to add imports to the import table (“import library”) - we will introduce this concept after explaining about dynamic libraries later.

Do not use “Whole Program Optimization” with LIB files unless you compile and link with the same toolchain. If you compile something with the /GL flag and give the LIB file to someone with a different MSVC version, expect link errors or worse.

Dynamic Library Development

As we said before, dynamic libraries are libraries that are loaded from another file (typically a DLL file). Dynamic libraries have a couple of advantages over static libraries:

Save disk space for shared libraries - Many executables can use the same DLL file (for example, kernel32.dll).
Save memory for shared libraries - The virtual memory manager shares the physical pages of a DLL between processes where possible, and copy-on-write gives a process a private copy of a page only when it writes to it.
Reduce load time for shared libraries - Because pages can be reused (if they are already in memory), the load time may be better than that of a bigger executable.
Allow updating the library code without recompiling the client executables - Simply replace the DLL on disk.

The main disadvantages of dynamic libraries:

DLL Hell - Can cause “DLL not found” errors and compatibility issues.
Runtime speed (in certain cases) - With static libraries, the linker handles the linking of the library. Because the code of the library is embedded inside the executable, the linker can optimize across the boundary, for example by inlining functions.

In windows, DLLs can be loaded in 2 ways:

Import Table - The PE file format has an “import table”. This is a table that contains references to dynamic libraries that will be loaded at runtime by the Windows loader.
Calling LoadLibrary(path) - The LoadLibrary function can be called to load a DLL file from the file system.

Creating a dynamic library

Let’s create a simple DLL. This DLL implements the Calculator interface (ahhh again..)

// DynamicCalculator.c
#include <Windows.h>


BOOL APIENTRY DllMain( HMODULE hModule,
                       DWORD  ul_reason_for_call,
                       LPVOID lpReserved
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:
    case DLL_THREAD_ATTACH:
    case DLL_THREAD_DETACH:
    case DLL_PROCESS_DETACH:
        break;
    }
    return TRUE;
}

__declspec(dllexport) int add(int x, int y) {
	return x + y;
}

__declspec(dllexport) int sub(int x, int y) {
	return x - y;
}

This is the simple implementation of our DLL. This DLL has a DllMain function. This function is called by the windows loader in several cases:

DLL_PROCESS_ATTACH: The DLL is loaded
DLL_PROCESS_DETACH: The DLL is unloaded
DLL_THREAD_ATTACH: A new thread is created. The call is made in the context of the new thread
DLL_THREAD_DETACH: A thread is exiting cleanly.

It’s funny, but the initialization of a DLL is not always performed inside the DllMain function. The reason is that there are many restrictions on what DllMain may do, because it runs while the loader lock is held. The easiest thing to do is to expose an “Initialize” function that performs the actual initialization of the state of the DLL.

Each DLL has an export table. The export table contains all the exported functions of the DLL (the name and location of each function). Functions can also be exported by an “ordinal” number, but we won’t talk about this.
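Looking up a function by name in an export table can be sketched like this. It is a simplified model with invented names (ExportTable, find_export); the three arrays stand in for the real table’s list of names, its parallel list of indexes, and its list of function addresses. The names are stored sorted, which allows a binary search. Any RVA values used with it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified view of an export table. */
typedef struct {
    const char *const *names;      /* export names, sorted */
    const uint16_t *name_ordinals; /* names[i] -> index into functions[] */
    const uint32_t *functions;     /* RVA of each exported function */
    size_t number_of_names;
} ExportTable;

/* Binary search over the sorted names.
   Returns the RVA of the export, or 0 if the name is not exported. */
uint32_t find_export(const ExportTable *table, const char *name) {
    assert(table != NULL && name != NULL);
    size_t lo = 0;
    size_t hi = table->number_of_names;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, table->names[mid]);
        if (cmp == 0) {
            return table->functions[table->name_ordinals[mid]];
        }
        if (cmp < 0) {
            hi = mid;
        } else {
            lo = mid + 1;
        }
    }
    return 0;
}
```

The extra hop through name_ordinals exists because the list of names and the list of functions do not have to be in the same order.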

“__declspec” is a Microsoft-specific keyword that allows the developer to specify a “storage class” attribute on declarations. “dllexport” is a storage class attribute that adds the function to the export table of a DLL. The developer can also use a “.def” file to define the names of the exports, but using dllexport is sometimes easier.

So, after compiling this code it will result in a DLL file that contains:

C runtime library code
“add” export
“sub” export

A DLL file is in the PE format. The main difference between an executable PE and a DLL is a simple flag (IMAGE_FILE_DLL in the Characteristics field of the file header). Most of the tools that can be used to explore an executable file can also be used to explore DLL files, including:

IDA
CFF Explorer
pestudio
dumpbin
….
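The flag that separates a DLL from an executable is easy to check by hand. The sketch below follows the PE layout: the “MZ” header stores the offset of the “PE\0\0” signature at 0x3C, the 20-byte COFF file header comes right after the signature, and its last 2 bytes are the Characteristics field. The function name pe_is_dll and the macro name are made up for this guide; 0x2000 is the value of IMAGE_FILE_DLL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PE_IMAGE_FILE_DLL 0x2000u /* same value as IMAGE_FILE_DLL */

/* Returns 1 for a DLL, 0 for a non-DLL image, -1 if this is not a PE file. */
int pe_is_dll(const uint8_t *buf, size_t len) {
    assert(buf != NULL);
    if (len < 0x40 || buf[0] != 'M' || buf[1] != 'Z') {
        return -1;
    }
    /* e_lfanew: file offset of the PE signature */
    uint32_t e_lfanew = (uint32_t)buf[0x3C] | ((uint32_t)buf[0x3D] << 8) |
                        ((uint32_t)buf[0x3E] << 16) | ((uint32_t)buf[0x3F] << 24);
    /* need the signature (4 bytes) plus the COFF file header (20 bytes) */
    if (e_lfanew > len || len - e_lfanew < 24) {
        return -1;
    }
    const uint8_t *pe = buf + e_lfanew;
    if (pe[0] != 'P' || pe[1] != 'E' || pe[2] != 0 || pe[3] != 0) {
        return -1;
    }
    /* Characteristics sits at offset 18 of the COFF file header */
    uint16_t characteristics = (uint16_t)(pe[4 + 18] | (pe[4 + 19] << 8));
    return (characteristics & PE_IMAGE_FILE_DLL) ? 1 : 0;
}
```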

Let’s see how we can use the DynamicCalculator DLL.

Using a DLL with LoadLibrary() and GetProcAddress()

The Win32 API provides 2 useful functions to clients of DLLs:

HMODULE LoadLibraryW(LPCWSTR lpLibFileName) - Load a specific library from the file system. (kernelbase!LoadLibraryW -> kernelbase!LoadLibraryExW -> ntdll!LdrLoadDll) If the DLL is not already loaded in the process, this call also invokes the “DllMain” procedure of the DLL we are loading.
FARPROC GetProcAddress(HMODULE hModule, LPCSTR lpProcName) - Look at the export table of a module and get the address of an exported function.

Example Client Program (Ignoring errors for simplicity..)

// DynamicCalculatorClient.c
#include <windows.h>

// define pointer types
typedef int (*ptr_add)(int x, int y);
typedef int (*ptr_sub)(int x, int y);

// pointers to export functions
ptr_add add;
ptr_sub sub;
HMODULE dynamicCalculator;

int main() {
  // Load the library
  dynamicCalculator = LoadLibraryA("DynamicCalculator.dll");
  add = (ptr_add)GetProcAddress(dynamicCalculator, "add");
  sub = (ptr_sub)GetProcAddress(dynamicCalculator, "sub");

  // Actual logic of the client program
  int x = add(10, 20);
  int y = sub(30, 20);

  int sum = add(x, y);

  // Free the library
  FreeLibrary(dynamicCalculator);

  return sum;
}

Notice that this method requires us to:

Declare function pointer types that match the prototypes of the exported functions
Call LoadLibrary() with the name of the DLL
Call GetProcAddress() with the names of the functions we are about to use and save the function pointers in variables (global or local)
Call the functions through these pointers

Let’s look at the disassembly of our program:

>dumpbin DynamicCalculatorClient.exe /disasm
...
...
main:
  00000001400116C0: 40 55              push        rbp
  00000001400116C2: 57                 push        rdi
  00000001400116C3: 48 81 EC 48 01 00  sub         rsp,148h
                    00
  00000001400116CA: 48 8D 6C 24 20     lea         rbp,[rsp+20h]
  00000001400116CF: 48 8B FC           mov         rdi,rsp
  00000001400116D2: B9 52 00 00 00     mov         ecx,52h
  00000001400116D7: B8 CC CC CC CC     mov         eax,0CCCCCCCCh
  00000001400116DC: F3 AB              rep stos    dword ptr [rdi]
  00000001400116DE: 48 8D 0D CB 84 00  lea         rcx,[??_C@_0BG@JAKNIHCK@DynamicCalculator?4dll@] ; "DynamicCalculator.dll"
                    00
  00000001400116E5: FF 15 25 D9 00 00  call        qword ptr [__imp_LoadLibraryA]
  00000001400116EB: 48 89 05 D6 B1 00  mov         qword ptr [dynamicCalculator],rax
                    00
  00000001400116F2: 48 8D 15 D3 84 00  lea         rdx,[??_C@_03BDGOHNNK@add@] ; "add"
                    00
  00000001400116F9: 48 8B 0D C8 B1 00  mov         rcx,qword ptr [dynamicCalculator]
                    00
  0000000140011700: FF 15 02 D9 00 00  call        qword ptr [__imp_GetProcAddress]
  0000000140011706: 48 89 05 DB B1 00  mov         qword ptr [add],rax
                    00
  000000014001170D: 48 8D 15 BC 84 00  lea         rdx,[??_C@_03KCMAIMAP@sub@] ; "sub"
                    00
  0000000140011714: 48 8B 0D AD B1 00  mov         rcx,qword ptr [dynamicCalculator]
                    00
  000000014001171B: FF 15 E7 D8 00 00  call        qword ptr [__imp_GetProcAddress]
  0000000140011721: 48 89 05 B8 B1 00  mov         qword ptr [sub],rax
                    00
  0000000140011728: BA 14 00 00 00     mov         edx,14h
  000000014001172D: B9 0A 00 00 00     mov         ecx,0Ah
  0000000140011732: FF 15 B0 B1 00 00  call        qword ptr [add]
  0000000140011738: 89 45 04           mov         dword ptr [rbp+4],eax
  000000014001173B: BA 14 00 00 00     mov         edx,14h
  0000000140011740: B9 1E 00 00 00     mov         ecx,1Eh
  0000000140011745: FF 15 95 B1 00 00  call        qword ptr [sub]
  000000014001174B: 89 45 24           mov         dword ptr [rbp+24h],eax
  000000014001174E: 8B 55 24           mov         edx,dword ptr [rbp+24h]
  0000000140011751: 8B 4D 04           mov         ecx,dword ptr [rbp+4]
  0000000140011754: FF 15 8E B1 00 00  call        qword ptr [add]
  000000014001175A: 89 45 44           mov         dword ptr [rbp+44h],eax
  000000014001175D: 48 8B 0D 64 B1 00  mov         rcx,qword ptr [dynamicCalculator]
                    00
  0000000140011764: FF 15 96 D8 00 00  call        qword ptr [__imp_FreeLibrary]
  000000014001176A: 8B 45 44           mov         eax,dword ptr [rbp+44h]
  000000014001176D: 48 8D A5 28 01 00  lea         rsp,[rbp+128h]
                    00
  0000000140011774: 5F                 pop         rdi
  0000000140011775: 5D                 pop         rbp
  0000000140011776: C3                 ret
  ...
  ...

As you can see in this code, the calls to the functions from the DLL are made through pointers:

0000000140011732: FF 15 B0 B1 00 00 call qword ptr [add]

This makes sense because the compiler (and even the linker) cannot know the address of the “add” function (the DLL is loaded at runtime). That’s why calls to DLL functions have to be made through pointers. The opcode “FF 15 <32-bit offset>” means: load the 64-bit value stored at RIP+offset (where RIP already points to the next instruction), then call that address.
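The RIP-relative arithmetic can be checked against the listing with a one-line helper. rip_relative_address is a name invented for this guide; the instruction length has to be passed in because the displacement is counted from the next instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Address referenced by a RIP-relative operand: the displacement is added
   to the address of the NEXT instruction (insn_va + insn_len). */
uint64_t rip_relative_address(uint64_t insn_va, unsigned insn_len, int32_t disp32) {
    assert(insn_len > 0);
    return insn_va + insn_len + (uint64_t)(int64_t)disp32;
}
```

For the 6-byte call at 0x140011732 (displacement 0xB1B0) this gives 0x14001C8E8. The 7-byte “mov qword ptr [add],rax” at 0x140011706 (displacement 0xB1DB) gives the same address, so the store and the call really do use the same global pointer.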

Using the import table to load DLLs

All this work with LoadLibrary() and GetProcAddress() can be very annoying to maintain in real systems. This is why the PE format has a structure called the import table. The import table instructs the Windows loader to load certain DLLs. For each DLL, there’s a list of names (or ordinals) of functions that need to be imported. After the Windows loader loads these DLLs, it walks the list of imported functions and saves the function pointers in a known location (it patches the Import Address Table with the function pointers). The linker knows the offset of each import table slot from the beginning of the image in memory (the RVA), so it can fix up the call instructions to reference the imported function pointers.

So, how can we add new imports to the import table?

If you compile the DynamicCalculator project (the Dynamic Library we have created before) you will see that it creates the DynamicCalculator.dll file (as expected) BUT then you will see another file called DynamicCalculator.lib - this may look pretty weird because we created a dynamic library (not a static library).

The DynamicCalculator.lib file is an “import library”. This .LIB file does not contain any regular object files (you can extract it and look). It simply allows developers to reference imported functions as linker symbols and lets the linker resolve the addresses conveniently. After adding the DynamicCalculator.lib file as a linker input, a developer can write the following code:

// DynamicCalculatorClientImportTable.c
#include <windows.h>

__declspec(dllimport) int add(int x, int y);
__declspec(dllimport) int sub(int x, int y);


int main() {
	int x = add(10, 20);
	int y = sub(30, 20);

	int sum = add(x, y);

	return sum;
}

Looking at the import table of the generated executable, we see the following:

>dumpbin /imports DynamicCalculatorClientImportTable.exe

Dump of file DynamicCalculatorClientImportTable.exe

File Type: EXECUTABLE IMAGE

  Section contains the following imports:

    DynamicCalculator.dll
             140016000 Import Address Table
             140016090 Import Name Table
                     0 time date stamp
                     0 Index of first forwarder reference

                           1 sub
                           0 add

This means the only DLL that is imported is DynamicCalculator.dll and the imported functions are “sub” and “add”. This is the generated assembly code:

main:
  0000000140011030: 40 55              push        rbp
  0000000140011032: 48 83 EC 70        sub         rsp,70h
  0000000140011036: 48 8D 6C 24 20     lea         rbp,[rsp+20h]
  000000014001103B: BA 14 00 00 00     mov         edx,14h
  0000000140011040: B9 0A 00 00 00     mov         ecx,0Ah
  0000000140011045: FF 15 BD 4F 00 00  call        qword ptr [__imp_add] ; <-----
  000000014001104B: 89 45 00           mov         dword ptr [rbp],eax
  000000014001104E: BA 14 00 00 00     mov         edx,14h
  0000000140011053: B9 1E 00 00 00     mov         ecx,1Eh
  0000000140011058: FF 15 A2 4F 00 00  call        qword ptr [__imp_sub] ; <----
  000000014001105E: 89 45 04           mov         dword ptr [rbp+4],eax
  0000000140011061: 8B 55 04           mov         edx,dword ptr [rbp+4]
  0000000140011064: 8B 4D 00           mov         ecx,dword ptr [rbp]
  0000000140011067: FF 15 9B 4F 00 00  call        qword ptr [__imp_add] ; <----
  000000014001106D: 89 45 08           mov         dword ptr [rbp+8],eax
  0000000140011070: 8B 45 08           mov         eax,dword ptr [rbp+8]
  0000000140011073: 48 8D 65 50        lea         rsp,[rbp+50h]
  0000000140011077: 5D                 pop         rbp
  0000000140011078: C3                 ret

As you can see calls are made through pointers:

call        qword ptr [__imp_add]

“__imp_add” is a linker symbol that refers to the function pointer in the import table. When the windows loader loads the DLL, it stores the function pointer in the import table.

This is equivalent to using GetProcAddress() and storing the address in a global variable, then using the global variable as a function pointer.
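That equivalence can be modeled in portable C without any Windows API. Everything in this sketch is a stand-in invented for this guide: dll_add and dll_sub play the code inside the DLL, dll_exports plays its export table, imp_add and imp_sub play the __imp_add and __imp_sub slots, and bind_imports plays the part of the loader that fills those slots before main() runs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*calc_fn)(int, int);

/* Stand-ins for the code that lives inside the DLL. */
static int dll_add(int x, int y) { return x + y; }
static int dll_sub(int x, int y) { return x - y; }

typedef struct { const char *name; calc_fn address; } Export;
typedef struct { const char *name; calc_fn *slot; } Import;

/* Stand-in for the export table of the DLL. */
static const Export dll_exports[] = { {"add", dll_add}, {"sub", dll_sub} };

/* Stand-ins for the import table slots; empty until the "loader" runs. */
calc_fn imp_add = NULL;
calc_fn imp_sub = NULL;

/* For every import, find the export with the same name and store its
   address in the slot. Returns the number of imports that were bound. */
size_t bind_imports(const Import *imports, size_t n_imports) {
    assert(imports != NULL || n_imports == 0);
    size_t bound = 0;
    for (size_t i = 0; i < n_imports; i++) {
        for (size_t j = 0; j < sizeof dll_exports / sizeof dll_exports[0]; j++) {
            if (strcmp(imports[i].name, dll_exports[j].name) == 0) {
                *imports[i].slot = dll_exports[j].address;
                bound++;
                break;
            }
        }
    }
    return bound;
}
```

After bind_imports has run over a list such as { {"sub", &imp_sub}, {"add", &imp_add} }, a call like imp_add(10, 20) goes through the slot, just like “call qword ptr [__imp_add]” does.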

The Windows loader does not use the LoadLibrary() function directly; it uses a lower level function to perform the load. Eventually both arrive at ntdll!NtMapViewOfSection on a section created with the SEC_IMAGE flag. I will probably explain this sometime..

What does __declspec(dllimport) do?

To call the ‘add’ / ‘sub’ function, the call has to be made through a pointer that resides in the import table. Let’s see what I mean:

Say I declare the add / sub functions this way:

// DynamicCalculatorClientStub.c
int add(int x, int y);
int sub(int x, int y);


int main() {
	int x = add(10, 20);
	int y = sub(30, 20);

	int sum = add(x, y);

	return sum;
}

My object file will look like this:

>dumpbin /disasm DynamicCalculatorClientStub.obj
Microsoft (R) COFF/PE Dumper Version 14.22.27905.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file DynamicCalculatorClientStub.obj

File Type: COFF OBJECT

main:
  0000000000000000: 40 55              push        rbp
  0000000000000002: 48 83 EC 70        sub         rsp,70h
  0000000000000006: 48 8D 6C 24 20     lea         rbp,[rsp+20h]
  000000000000000B: BA 14 00 00 00     mov         edx,14h
  0000000000000010: B9 0A 00 00 00     mov         ecx,0Ah
  0000000000000015: E8 00 00 00 00     call        add ; <-----
  000000000000001A: 89 45 00           mov         dword ptr [rbp],eax
  000000000000001D: BA 14 00 00 00     mov         edx,14h
  0000000000000022: B9 1E 00 00 00     mov         ecx,1Eh
  0000000000000027: E8 00 00 00 00     call        sub ; <-----
  000000000000002C: 89 45 04           mov         dword ptr [rbp+4],eax
  000000000000002F: 8B 55 04           mov         edx,dword ptr [rbp+4]
  0000000000000032: 8B 4D 00           mov         ecx,dword ptr [rbp]
  0000000000000035: E8 00 00 00 00     call        add ; <-----
  000000000000003A: 89 45 08           mov         dword ptr [rbp+8],eax
  000000000000003D: 8B 45 08           mov         eax,dword ptr [rbp+8]
  0000000000000040: 48 8D 65 50        lea         rsp,[rbp+50h]
  0000000000000044: 5D                 pop         rbp
  0000000000000045: C3                 ret

As you can see, the calls are made with the E8 opcode which, as we said before, expects a relative offset to the function. The linker cannot know the offset to the actual function because its address is known only at runtime. The linker does know the offset to the import table slot that will hold a pointer to the imported function at runtime, but it cannot change the instruction from a relative call to an indirect call because the two encodings have different lengths, as you can see here:

E8 00 00 00 00     call        add                   ; 5 bytes
FF 15 00 00 00 00  call        qword ptr [__imp_add] ; 6 bytes

Oh man what a mess. The linker cannot move all other instructions and fix that much stuff by itself..

Declaring a function with “__declspec(dllimport)” instructs the compiler to generate the second option: calling through a function pointer.

BUT typically in real libraries we want to share the header files with the clients of the library. As a reminder, our header file looks like this:

#pragma once

int add(int x, int y);

int sub(int x, int y);

This means the declarations won’t have any __declspec(dllimport) in the client’s code. So, how can the linker deal with this situation?

Stubs! Let’s see what happens after the linkage of the last example:

>dumpbin /disasm DynamicCalculatorClientStub.exe
Microsoft (R) COFF/PE Dumper Version 14.22.27905.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file DynamicCalculatorClientStub.exe

File Type: EXECUTABLE IMAGE

main:
  0000000140001000: 40 55              push        rbp
  0000000140001002: 48 83 EC 70        sub         rsp,70h
  0000000140001006: 48 8D 6C 24 20     lea         rbp,[rsp+20h]
  000000014000100B: BA 14 00 00 00     mov         edx,14h
  0000000140001010: B9 0A 00 00 00     mov         ecx,0Ah
  0000000140001015: E8 2C 00 00 00     call        add ; <----- A call to the stub below
  000000014000101A: 89 45 00           mov         dword ptr [rbp],eax
  000000014000101D: BA 14 00 00 00     mov         edx,14h
  0000000140001022: B9 1E 00 00 00     mov         ecx,1Eh
  0000000140001027: E8 20 00 00 00     call        sub ; <----- A call to the stub below
  000000014000102C: 89 45 04           mov         dword ptr [rbp+4],eax
  000000014000102F: 8B 55 04           mov         edx,dword ptr [rbp+4]
  0000000140001032: 8B 4D 00           mov         ecx,dword ptr [rbp]
  0000000140001035: E8 0C 00 00 00     call        add ; <----- A call to the stub below
  000000014000103A: 89 45 08           mov         dword ptr [rbp+8],eax
  000000014000103D: 8B 45 08           mov         eax,dword ptr [rbp+8]
  0000000140001040: 48 8D 65 50        lea         rsp,[rbp+50h]
  0000000140001044: 5D                 pop         rbp
  0000000140001045: C3                 ret
add:
  0000000140001046: FF 25 BC 0F 00 00  jmp         qword ptr [__imp_add] ; the stub for add
sub:
  000000014000104C: FF 25 AE 0F 00 00  jmp         qword ptr [__imp_sub] ; the stub for sub

So the DynamicCalculator.lib import library contains the following symbols:

__imp_add / __imp_sub: Pointers in the import table that point to the actual library code in runtime.
add / sub: Stubs that contain “jmp” instructions through the import table pointers. If a client does not use __declspec(dllimport), the relative offsets of its call instructions are resolved to these stubs.

One of the main advantages of not using __declspec(dllimport) is that we can replace the dynamic library with a static library without changing our code! (even without changing the object file actually)

Revisiting the whole program optimization

Remember whole program optimization? It is a link time optimization. If we turn it on, the call to the stub can be converted into a direct call through the import table, as shown here:

>dumpbin /disasm DynamicCalculatorClientStub.exe
Microsoft (R) COFF/PE Dumper Version 14.22.27905.0
Copyright (C) Microsoft Corporation.  All rights reserved.


Dump of file DynamicCalculatorClientStub.exe

File Type: EXECUTABLE IMAGE

add:
  0000000140001000: FF 25 02 10 00 00  jmp         qword ptr [__imp_add]
sub:
  0000000140001006: FF 25 F4 0F 00 00  jmp         qword ptr [__imp_sub]
  000000014000100C: CC CC CC CC                                      ÌÌÌÌ
main:
  0000000140001010: 48 83 EC 38        sub         rsp,38h
  0000000140001014: BA 14 00 00 00     mov         edx,14h
  0000000140001019: B9 0A 00 00 00     mov         ecx,0Ah
  000000014000101E: FF 15 E4 0F 00 00  call        qword ptr [__imp_add]
  0000000140001024: 89 44 24 24        mov         dword ptr [rsp+24h],eax
  0000000140001028: BA 14 00 00 00     mov         edx,14h
  000000014000102D: B9 1E 00 00 00     mov         ecx,1Eh
  0000000140001032: FF 15 C8 0F 00 00  call        qword ptr [__imp_sub]
  0000000140001038: 89 44 24 20        mov         dword ptr [rsp+20h],eax
  000000014000103C: 8B 54 24 20        mov         edx,dword ptr [rsp+20h]
  0000000140001040: 8B 4C 24 24        mov         ecx,dword ptr [rsp+24h]
  0000000140001044: FF 15 BE 0F 00 00  call        qword ptr [__imp_add]
  000000014000104A: 89 44 24 28        mov         dword ptr [rsp+28h],eax
  000000014000104E: 8B 44 24 28        mov         eax,dword ptr [rsp+28h]
  0000000140001052: 48 83 C4 38        add         rsp,38h
  0000000140001056: C3                 ret

That’s it! I hope you learned something about the compilation model of Windows libraries. If you have any questions or found a mistake in the article, send me a message on Twitter: @0xrepnz