Code hooks can be useful to add instrumentation and debug code, or to change the behavior of functions you don’t have the source code for. We take a look at how you can implement these in a way that works on all platforms that are currently supported by Delphi.
Hooking into existing code at run-time is frowned upon by many people since it can be used to create malicious code. But there are some legitimate uses of hooking as well. For example, in the not-too-distant future, we at Grijjy will present our cross-platform remote logging library that can be used to send log messages from any platform and view them in a log viewer on your PC. One of the features of this viewer is that it not only shows log messages, but it also provides a view of all live objects in your application as a list of class names and the current number of live instances of those classes. This information can be very useful to track down memory leaks and other memory-related problems. For example, if the instance count of a certain kind of object keeps growing, but you expect it to shrink, then you may have forgotten to free an object somewhere, or you may have a reference cycle when running on an ARC platform or using object interfaces.
Instance Tracking
To implement this feature, we must somehow be able to get notified whenever an object is created or destroyed. We do this by hooking into the TObject.NewInstance and TObject.FreeInstance methods. These are the methods where memory for an object is actually (de)allocated. Inside those hooked methods, we duplicate the original implementations of these methods, and in addition update a global list of active instances.
In this article, we show how we did this. It is accompanied by sample code in our JustAddCode repository on GitHub, in the directory CrossPlatformHooking. You will find two sample applications that show a list of running instances. One is a FireMonkey application that runs on all platforms except Linux. The other one is a console application that works on all desktop platforms, including Linux. These are some screenshots of the end result:
Hooking Methods
There are various ways you can hook into existing code at run-time. Unfortunately, I have not found a single method that works on all platforms. So our logging library, and the sample code for this article, uses one of two methods, depending on platform.
The first method I simply call Function Hooking. This works by overwriting the existing implementation of a function with a jump to a custom implementation. Unfortunately, this method does not work on iOS and Android since those platforms don’t allow you to overwrite executable code.
The second method is called Virtual Method Table (VMT) patching. This method is more limited than the first one, but it also works on iOS and Android. Interestingly enough, it does not work on macOS.
Function Hooking
Function hooking works by overwriting the first few bytes of a function with a JMP instruction to a new function. In Delphi pseudo-code terms, it is as if the first line of the original function were replaced with a goto instruction to our hooked version. In reality, this means overwriting the first 5 bytes of the function with an assembly JMP instruction.
Since this method of code hooking only works on Intel CPUs, we don’t have to take an ARM version into account.
You might wonder if you are even allowed to modify existing code this way. Normally, you cannot, because memory pages with executable code are read-only by default, and trying to modify them will result in an Access Violation. However, with the help of the VirtualProtect API on Windows, and the mprotect API on other (Posix) platforms, you can change the access level of those memory pages. Actually, you are only allowed to do that on Windows, macOS, iOS Simulator and Linux. For iOS and Android, we use a different approach (as presented later).
Our goal is to create a function that we can call like this:

    HookCode(@TObject.NewInstance, @HookedObjectNewInstance);

This redirects the implementation of TObject.NewInstance to our own HookedObjectNewInstance function. This function looks like this:

    function HookedObjectNewInstance(const Self: TClass): TObject;
    var
      Instance: Pointer;
    begin
      GetMem(Instance, Self.InstanceSize);
      Result := Self.InitInstance(Instance);
      {$IFDEF AUTOREFCOUNT}
      TObjectOpener(Result).FRefCount := 1;
      {$ENDIF}
      TrackInstance(Result);
    end;
There are a few things to note here:
- TObject.NewInstance is a (non-static) class method. Like regular methods, these methods have an implicit Self parameter. But in the case of class methods, this Self parameter refers to the class, and not to the instance. This is a Delphi language feature that you don’t see much in other object-oriented programming languages, and allows for powerful features like virtual class methods (which NewInstance is). In our hooked functions, we need to make any implicit Self parameters explicit, as we did in the example above.
- The majority of the implementation is just a copy of the original TObject.NewInstance method. We just need to use the Self parameter explicitly here to access its methods. Also, TObjectOpener is the common “hack” used to access protected fields and methods of a class.
- The last line is where we added our custom code. In this case, it calls a TrackInstance routine which adds the object’s class to a hash map of running instances. I will not show the implementation of this routine here, since that is outside the scope of this article. You can look it up in the sample code on GitHub. One thing to note though is that multiple threads may be creating objects at the same time, so access to the list of instances must be protected with a lock.
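To give an idea of what such a tracking routine might involve, here is a minimal sketch of a lock-protected instance counter. It is not the implementation from the sample code: the globals GLock and GInstanceCounts, the LiveCount helper, and the choice of TDictionary with TCriticalSection are all assumptions made for illustration. The sketch calls the routines by hand instead of installing hooks.

```pascal
program InstanceTrackingSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.SyncObjs,
  System.Generics.Collections;

var
  { Hypothetical globals, not taken from the sample code. }
  GLock: TCriticalSection;
  GInstanceCounts: TDictionary<TClass, Integer>;

{ Increments the live-instance count for the class of AInstance. }
procedure TrackInstance(const AInstance: TObject);
var
  Count: Integer;
begin
  GLock.Enter;
  try
    if not GInstanceCounts.TryGetValue(AInstance.ClassType, Count) then
      Count := 0;
    GInstanceCounts.AddOrSetValue(AInstance.ClassType, Count + 1);
  finally
    GLock.Leave;
  end;
end;

{ Decrements the live-instance count for the class of AInstance. }
procedure UntrackInstance(const AInstance: TObject);
var
  Count: Integer;
begin
  GLock.Enter;
  try
    if GInstanceCounts.TryGetValue(AInstance.ClassType, Count) and (Count > 0) then
      GInstanceCounts.AddOrSetValue(AInstance.ClassType, Count - 1);
  finally
    GLock.Leave;
  end;
end;

{ Returns the number of tracked live instances of AClass (0 if unknown). }
function LiveCount(const AClass: TClass): Integer;
begin
  GLock.Enter;
  try
    if not GInstanceCounts.TryGetValue(AClass, Result) then
      Result := 0;
  finally
    GLock.Leave;
  end;
end;

var
  A, B: TStringList;
begin
  { The lock and the map must exist before anything is tracked. }
  GLock := TCriticalSection.Create;
  GInstanceCounts := TDictionary<TClass, Integer>.Create;

  A := TStringList.Create;
  B := TStringList.Create;
  TrackInstance(A);
  TrackInstance(B);
  UntrackInstance(A);
  Writeln('Live TStringList instances: ', LiveCount(TStringList)); // prints 1

  A.Free;
  B.Free;
  GInstanceCounts.Free;
  GLock.Free;
end.
```

In a real hook, the map and the lock would themselves be objects that pass through NewInstance, so they would need to be created before the hooks are installed.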
The hooked TObject.FreeInstance method works similarly:

    procedure HookedObjectFreeInstance(const Self: TObject);
    begin
      UntrackInstance(Self);
      Self.CleanupInstance;
      FreeMem(Pointer(Self));
    end;

First it calls UntrackInstance to remove the instance from the hash map. After that follows the original implementation of TObject.FreeInstance. Note that this is a “regular” method, so the implicit Self parameter is a TObject, not a TClass.
You may wonder if there are better ways to execute the original code, other than to copy its implementation as we did here. There are. One of the ways is by using a library like Microsoft’s Detours. Way back in 2004, I wrote an article in The Delphi Magazine that presented a Delphi version of this library. It would copy part of the original implementation to a so-called “Trampoline” function. Then you would just call that trampoline function to execute the original code. However, using a library like Detours here is overkill, since the hooked methods are small and have not changed in many years. You can find a more recent version of Delphi Detours on GitHub. Another well-known Delphi hooking library that provides similar functionality is madCodeHook. These libraries are Windows-only though…
Hooking on Windows
As mentioned, on Windows you use the VirtualProtect API to change the access level of an executable piece of memory. We need to modify enough bytes to insert a JMP instruction. On both x86 and x64 platforms, a jump instruction takes 5 bytes: one for the opcode and four for a displacement value. The HookCode function starts by changing the access level of these 5 bytes:

    const
      SIZE_OF_JUMP = 5;
      JMP_RELATIVE = $E9;

    function HookCode(const ACodeAddress, AHookAddress: Pointer): Boolean;
    var
      OldProtect: DWORD;
      P: PByte;
      Displacement: Integer;
    begin
      Result := VirtualProtect(ACodeAddress, SIZE_OF_JUMP,
        PAGE_EXECUTE_READWRITE, OldProtect);
      if (Result) then
      begin
        P := ACodeAddress;
        P^ := JMP_RELATIVE;
        Inc(P);
        Displacement := UIntPtr(AHookAddress) -
          (UIntPtr(ACodeAddress) + SIZE_OF_JUMP);
        PInteger(P)^ := Displacement;
        VirtualProtect(ACodeAddress, SIZE_OF_JUMP, OldProtect, OldProtect);
      end;
    end;
The original protection level will be stored in the OldProtect variable, which will be used at the end of the routine to restore to the original level.
If the VirtualProtect API succeeds (which it always should in this case), then we patch the first byte of the original code with the opcode for the JMP instruction. Next, we calculate the number of bytes to jump from the location after the JMP instruction to our hooked function. We calculate this displacement by taking the difference between the address of our hooked function and the original code address (adjusted for the size of the jump itself).
Finally, we write this displacement value as operand to the JMP instruction and restore the original protection level. Not too complicated actually.
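The displacement arithmetic can be checked in isolation with made-up addresses. The sketch below is not part of the sample code; the TryGetJumpDisplacement helper is hypothetical, and it takes addresses as Int64 values so that no real code pointers are needed. It also adds a range check that the routine above does not perform: the operand of this JMP form is a signed 32-bit value, so in a 64-bit process a hook that lies more than about 2 GB from the original code cannot be reached this way.

```pascal
program JumpDisplacementSketch;

{$APPTYPE CONSOLE}

const
  SIZE_OF_JUMP = 5;

{ Hypothetical helper: computes the 32-bit operand for a relative JMP placed
  at ACodeAddress that should land on AHookAddress. Returns False if the
  distance does not fit in a signed 32-bit value. }
function TryGetJumpDisplacement(const ACodeAddress, AHookAddress: Int64;
  out ADisplacement: Int32): Boolean;
var
  Delta: Int64;
begin
  { The displacement is relative to the first byte AFTER the jump. }
  Delta := AHookAddress - (ACodeAddress + SIZE_OF_JUMP);
  Result := (Delta >= Low(Int32)) and (Delta <= High(Int32));
  if Result then
    ADisplacement := Int32(Delta)
  else
    ADisplacement := 0;
end;

var
  D: Int32;
begin
  { Made-up addresses: the hook lies 4096 bytes after the original code. }
  if TryGetJumpDisplacement($00401000, $00402000, D) then
    Writeln('Forward jump operand: ', D);   // 4091

  { The hook lies 4096 bytes before the original code. }
  if TryGetJumpDisplacement($00402000, $00401000, D) then
    Writeln('Backward jump operand: ', D);  // -4101

  { Far apart in a 64-bit address space: no 5-byte jump can get there. }
  if not TryGetJumpDisplacement($00401000, $7FF000000000, D) then
    Writeln('Target is out of reach for a 5-byte relative JMP');
end.
```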
Hooking on macOS and Linux
The version for Posix-based operating systems (like macOS, Linux and the iOS Simulator) is similar, but it uses the mprotect API instead of VirtualProtect:

    function HookCode(const ACodeAddress, AHookAddress: Pointer): Boolean;
    var
      AlignedCodeAddress: UIntPtr;
      P: PByte;
      Displacement: Integer;
    begin
      AlignedCodeAddress := UIntPtr(ACodeAddress) and (not (GPageSize - 1));
      Result := (mprotect(Pointer(AlignedCodeAddress), GPageSize,
        PROT_READ or PROT_WRITE) = 0);
      if (Result) then
      begin
        P := ACodeAddress;
        P^ := JMP_RELATIVE;
        Inc(P);
        Displacement := UIntPtr(AHookAddress) -
          (UIntPtr(ACodeAddress) + SIZE_OF_JUMP);
        PInteger(P)^ := Displacement;
      end;
    end;
There are only a few differences:
- Unlike VirtualProtect, mprotect works on entire memory pages. So you need to align the code address to the size of a memory page. We store the size of each memory page in the global GPageSize variable. This variable is initialized at startup with the result from a sysconf(_SC_PAGESIZE) API call. Since page sizes are always a power of two, we can simply align the memory address by and’ing it with (not (GPageSize - 1)).
- There is no (easy) way to query the original protection level of a memory page, so we cannot restore to that level afterwards.
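The alignment trick is easy to verify with a few numbers. The sketch below uses made-up addresses and a page size of 4096 bytes (real code would query the page size at startup, as described above). AlignToPage and PageSpanSize are hypothetical helper names. PageSpanSize goes one step beyond the routine above: it computes how many bytes would have to be passed to mprotect if the patched range happened to cross a page boundary, which is an edge case the single-page call does not cover.

```pascal
program PageAlignSketch;

{$APPTYPE CONSOLE}

{ Rounds AAddress down to the start of its memory page.
  APageSize must be a power of two. }
function AlignToPage(const AAddress, APageSize: UIntPtr): UIntPtr;
begin
  Result := AAddress and (not (APageSize - 1));
end;

{ Returns the number of bytes, in whole pages, needed to cover the range
  [AAddress, AAddress + ASize). }
function PageSpanSize(const AAddress, ASize, APageSize: UIntPtr): UIntPtr;
var
  FirstPage, LastPage: UIntPtr;
begin
  FirstPage := AlignToPage(AAddress, APageSize);
  LastPage := AlignToPage(AAddress + ASize - 1, APageSize);
  Result := (LastPage - FirstPage) + APageSize;
end;

begin
  { $00401234 lies in the page that starts at $00401000. }
  Writeln(AlignToPage($00401234, 4096) = $00401000);  // TRUE

  { A 5-byte patch in the middle of a page touches one page. }
  Writeln(PageSpanSize($00401234, 5, 4096));          // 4096

  { A 5-byte patch starting 2 bytes before a page boundary touches two. }
  Writeln(PageSpanSize($00401FFE, 5, 4096));          // 8192
end.
```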
VMT Patching
Function hooking does not work on iOS and Android, since we are not allowed to change the protection level of executable memory pages on those devices. However, we are allowed to change the protection level of read-only pages containing other data, such as Virtual Method Tables. But before we do that, let’s first recap what VMTs are and how they are implemented.
About Virtual Method Tables
A virtual method table is simply a list of addresses to virtual methods in a class. It is used at run-time to look up the method implementation to execute when a virtual method is called. Every class has its own VMT, and all instances of the same class share the same VMT. The VMT of a class has a complete copy of the VMT of its parent class, and optional additional entries in case the class introduces new virtual methods. The following diagram may clarify this:
This diagram shows part of the VMT for three classes. Each entry in the VMT just contains the address of the implementation (the diagram shows some made-up addresses). As you can see, each class has entries for the NewInstance and FreeInstance methods. These methods are first introduced at the TObject level, but since all classes derive from TObject, their VMTs all have entries for these methods. The TStream class introduces new virtual methods like Read and some others.
The entries for the FreeInstance method all have the same value (the made-up $00100200 address in the diagram). This means they all share the same implementation. The TStream class does not override the NewInstance method, so it has the same address ($00100100) as for TObject. However, TInterfacedObject does override the NewInstance method, so it has a different address ($00100400).
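You can observe this sharing yourself by reading the VMT entries directly. The following is a small sketch, assuming a desktop Delphi compiler; the VMTEntry helper is a hypothetical name, while vmtNewInstance and vmtFreeInstance are the offset constants from the System unit. If the classes are laid out as described, the first two comparisons should report TRUE and the last one FALSE.

```pascal
program VMTPeekSketch;

{$APPTYPE CONSOLE}

{ The vmt* constants may be flagged as deprecated, depending on the
  compiler version. }
{$WARN SYMBOL_DEPRECATED OFF}

uses
  System.Classes;

{ Hypothetical helper: reads the VMT entry of AClass at the given offset.
  A TClass is a pointer to the VMT, so this is plain pointer arithmetic. }
function VMTEntry(const AClass: TClass; const AOffset: Integer): Pointer;
begin
  Result := PPointer(PByte(AClass) + AOffset)^;
end;

begin
  { TStream is expected to share FreeInstance with TObject. }
  Writeln('TStream.FreeInstance = TObject.FreeInstance: ',
    VMTEntry(TStream, vmtFreeInstance) = VMTEntry(TObject, vmtFreeInstance));

  { TStream is expected to share NewInstance with TObject as well. }
  Writeln('TStream.NewInstance = TObject.NewInstance: ',
    VMTEntry(TStream, vmtNewInstance) = VMTEntry(TObject, vmtNewInstance));

  { TInterfacedObject overrides NewInstance, so its entry should differ. }
  Writeln('TInterfacedObject.NewInstance = TObject.NewInstance: ',
    VMTEntry(TInterfacedObject, vmtNewInstance) = VMTEntry(TObject, vmtNewInstance));
end.
```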
Limitations of VMT Patching
Our goal is to patch the VMTs with the addresses of our hooked NewInstance and FreeInstance methods. This presents several challenges:
- Since each class has its own VMT, we need to patch the VMTs of all classes we care about. It does not suffice to just patch the VMT of the TObject class!
- Some classes may have overridden the NewInstance and/or FreeInstance methods. In that case, we would need different hook versions of these methods since their implementations will be different.
- And obviously, VMT patching only works for virtual methods. You cannot use it to hook global functions or non-virtual methods.
The second problem can be addressed by simply ignoring all classes that have overridden versions of NewInstance and/or FreeInstance. Fortunately, there are only very few classes that do this, so this should not have a big impact. The exception is TInterfacedObject. As you can see from the diagram, this class has an overridden version of NewInstance. Since TInterfacedObject is widely used as a base class, we want to include this class in our metrics, and so we create a separate hook function for its NewInstance method:

    function HookedInterfacedObjectNewInstance(const Self: TClass): TObject;
    var
      Instance: Pointer;
    begin
      GetMem(Instance, Self.InstanceSize);
      Result := Self.InitInstance(Instance);
      TInterfacedObjectOpener(Result).FRefCount := 1;
      TrackInstance(Result);
    end;
The only difference with TObject.NewInstance is that TInterfacedObject always has an FRefCount field that must be initialized (at the TObject level, this field only exists on ARC platforms).
The first issue can be addressed with some RTTI.
Listing All Classes
So we need a list of all classes so we can patch their VMTs. We can use Delphi’s Run Time Type Information (RTTI) tools to help with this. The TRttiContext.GetTypes method returns an array of all linked types that have RTTI. For each type, we can check if it is a class type, and if so, patch its VMT.
Unfortunately, not all classes will have RTTI. In particular, “private” classes that are declared in the implementation section of a unit will not have RTTI. Fortunately, most classes we care about do have RTTI, so this shouldn’t be too much of a problem.
In our sample app, enumerating all classes and patching their VMTs is performed in the InitializeVMTHooks procedure:

    procedure InitializeVMTHooks;
    var
      Rtti: TRttiContext;
      RttiType: TRttiType;
      InstanceType: TRttiInstanceType;
      VMTEntryNewInstance, VMTEntryFreeInstance: PPointer;
      ObjectNewInstance, ObjectFreeInstance, InterfacedObjectNewInstance: Pointer;
    begin
      ObjectNewInstance := @TObject.NewInstance;
      ObjectFreeInstance := @TObject.FreeInstance;
      InterfacedObjectNewInstance := @TInterfacedObject.NewInstance;

      { Get a list of all Delphi types in the application with RTTI support. }
      Rtti := TRttiContext.Create;
      for RttiType in Rtti.GetTypes do
      begin
        { Check if the type is a class type. }
        if (RttiType.TypeKind = tkClass) then
        begin
          { We can now safely typecast to TRttiInstanceType }
          InstanceType := TRttiInstanceType(RttiType);

          { Retrieve the entry in the VMT of the FreeInstance method for this
            class. }
          VMTEntryFreeInstance := PPointer(
            PByte(InstanceType.MetaclassType) + vmtFreeInstance);

          { Only track classes that didn't override TObject.FreeInstance. }
          if (VMTEntryFreeInstance^ = ObjectFreeInstance) then
          begin
            { Retrieve the entry in the VMT of the NewInstance method for this
              class. }
            VMTEntryNewInstance := PPointer(
              PByte(InstanceType.MetaclassType) + vmtNewInstance);

            { Only track classes that didn't override TObject.NewInstance or
              TInterfacedObject.NewInstance. }
            if (VMTEntryNewInstance^ = ObjectNewInstance) then
            begin
              { This class uses NewInstance and FreeInstance from TObject.
                Hook those VMT entries. }
              HookVMT(VMTEntryNewInstance, @HookedObjectNewInstance);
              HookVMT(VMTEntryFreeInstance, @HookedObjectFreeInstance);
            end
            else if (VMTEntryNewInstance^ = InterfacedObjectNewInstance) then
            begin
              { This class is (ultimately) derived from TInterfacedObject, so
                we need to hook to a separate version of NewInstance. }
              HookVMT(VMTEntryNewInstance, @HookedInterfacedObjectNewInstance);
              HookVMT(VMTEntryFreeInstance, @HookedObjectFreeInstance);
            end;
          end;
        end;
      end;
    end;
First, we retrieve the code addresses of the original NewInstance and FreeInstance methods. We use these later to check if these methods are overridden by a certain class.
Then we enumerate all types and look for class types. We can safely typecast those types to TRttiInstanceType. Its MetaclassType property is used to get the TClass for the class type. In reality, a TClass is just a pointer to its VMT. This means we can find the entries for the NewInstance and FreeInstance methods by adjusting this pointer with a vmtNewInstance or vmtFreeInstance offset. These are constants declared in the System unit. If you want to hook virtual methods for which you don’t have a vmt* constant, then you can use the TRttiMethod.VirtualIndex property to calculate the offset.
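As a rough sketch of that last point: the helper below looks up a method by name through RTTI and turns its VirtualIndex into a VMT entry address. FindVMTEntry is a hypothetical name and not part of the sample code. The sketch assumes that the method has RTTI (by default, public and published methods do), that it is dispatched through the VMT, and that the entry sits at VirtualIndex * SizeOf(Pointer) bytes from the class pointer.

```pascal
program VirtualIndexSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.Classes,
  System.Rtti;

{ Hypothetical helper: returns the address of the VMT entry for the virtual
  method AMethodName of AClass, or nil if the method has no RTTI or is not
  dispatched through the VMT. }
function FindVMTEntry(const AClass: TClass; const AMethodName: string): PPointer;
var
  Context: TRttiContext;
  Method: TRttiMethod;
begin
  Result := nil;
  Context := TRttiContext.Create;
  Method := Context.GetType(AClass).GetMethod(AMethodName);
  if (Method <> nil) and (Method.DispatchKind = dkVtable) then
    { Assumption: entries are pointer-sized slots indexed by VirtualIndex. }
    Result := PPointer(PByte(AClass) + Method.VirtualIndex * SizeOf(Pointer));
end;

var
  Entry: PPointer;
begin
  { TPersistent.Assign is a public virtual method. }
  Entry := FindVMTEntry(TStringList, 'Assign');
  if (Entry = nil) then
    Writeln('No VMT entry found for TStringList.Assign')
  else
    { Compare the entry with the code address of the implementation that
      TStringList uses. }
    Writeln('Entry matches @TStringList.Assign: ',
      Entry^ = @TStringList.Assign);
end.
```

An entry found this way could then be passed to HookVMT in the same way as the entries located with the vmt* constants.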
Next, we check if the contents of these VMT entries match the addresses of TObject.NewInstance (or TInterfacedObject.NewInstance) and TObject.FreeInstance. If not, then the class has overridden one or both of these methods and we skip it. If so, then we patch its VMT by calling the HookVMT function. Since this function has to change the protection level of the memory page containing the VMT, we need different implementations for Windows and non-Windows systems.
VMT Patching on Windows
On Windows, we need to use the VirtualProtect API again. After that, patching the VMT is just a matter of changing the value of a VMT entry:

    function HookVMT(const AVMTEntry, AHookAddress: Pointer): Boolean;
    var
      OldProtect: DWORD;
    begin
      Result := VirtualProtect(AVMTEntry, SizeOf(Pointer), PAGE_READWRITE,
        OldProtect);
      if (Result) then
      begin
        PPointer(AVMTEntry)^ := AHookAddress;
        VirtualProtect(AVMTEntry, SizeOf(Pointer), OldProtect, OldProtect);
      end;
    end;
Not much to it.
VMT Patching on Posix
On all other systems (macOS, iOS, Android and Linux), we need to use the mprotect API instead and make sure we align to full memory pages again:

    function HookVMT(const AVMTEntry, AHookAddress: Pointer): Boolean;
    var
      AlignedCodeAddress: UIntPtr;
    begin
      AlignedCodeAddress := UIntPtr(AVMTEntry) and (not (GPageSize - 1));
      Result := (mprotect(Pointer(AlignedCodeAddress), GPageSize,
        PROT_READ or PROT_WRITE) = 0);
      if (Result) then
        PPointer(AVMTEntry)^ := AHookAddress;
    end;
Wrapping Up
I hope this article showed a legitimate use case for code hooking. I would suggest using code hooking only for debugging and instrumentation purposes. In our upcoming remote logging library, code hooking will only be enabled on demand, and only in DEBUG builds. So it will not have any effect on release builds at all.
Maybe you’ll come up with some other good uses for code hooking. Let us know if you do!