December 17th, 2007 Greg
Since we last spoke, Vaughn has seen very little action. The problem is not the week or so of down-time he’s experienced, but the fact that his virtual environment lives inside my computer. While the virtual PC itself is about as safe as a chainsaw-piñata, the internet connection to which it is bridged is protected by the firewalls and antivirus of my computer and router. This is such an elementary design-flaw that I was tempted to keep it quiet, but let’s move on and never speak of it again. The problem has gone unfixed since I discovered it (a fortnight ago) and, as this is the lowest-priority project on my agenda, it will probably remain so for a little while. At least until I work out why my router’s demilitarised-zone setting seems to do absolutely nothing.
Anyway, a piece of malware found its way into my possession via a different medium, and so I kept my side of the bargain and gave it to Vaughn. But don’t get too excited - this adware trojan is so uninteresting that none of the antivirus companies bothered to give it an identity any more unique than ‘Generic Delphi Downloader’ (yes, that’s right, Delphi). I still don’t know where it came from, beyond China.
The original infection was on my rarely-used laptop. First came the mysterious popup advertisements, then the rogue ‘Add Favourite’ dialog boxes. Both coincided with the installation of Internet Explorer 7 and so - being a long-time Firefox convert - I didn’t take too much notice. But when the activity persisted even after IE7 had been removed from the picture, I took a peek at my process list. Windows’s Task Manager showed nothing suspicious - mainly because it is borderline-useless - but OllyDbg’s attach menu turned up seven instances of svchost.exe. If you know anything about Windows Services, you’ll know that these are nothing more than user-mode processes that house DLL-based shared-service modules. At least that’s true of the six instances resident at “%systemroot%/system32”, but the one at “%ProgramFiles%/Internet Explorer” was rather more spurious.
A brief analysis showed that this file (with the ‘hidden’ and ‘system’ file attributes set) was compiled by Borland Delphi 7.0 and that despite its obviously trojan nature, the process - true to its name - actually did host a Windows Service. Only, a bad one. After familiarising myself with the verbose assembly produced by Delphi (like any other very-high-level language) and the fastcall-esque nature of the internals, I produced a flow-chart of the trojan’s life-cycle.
- Initialise the Delphi run-time library.
- Get the executable path and set the file attributes to ‘SYSTEM | HIDDEN’.
- Look for a service named ‘windownetpker’. If none such is registered with the system, create it.
If you look for this service in the administrative tools, you’ll see it just points straight back at the same executable file. It hides under the name ‘Window Image Worker’, which is presumably supposed to resemble the legitimate ‘Windows Image Acquisition’. Naturally, it is set to auto-start.
- If the service isn’t running, start it.
- Check the user-name of the process environment and determine how the process was launched.
When a program is launched manually, this user-name is that of whoever is logged in. But when launched as a service, GetUserName returns ‘SYSTEM’, or in certain cases ‘LOCAL SERVICE’. By comparing against these two, the trojan works out whether it is supposed to act as a service or just a plain and filthy malware executable.
- In the normal-user case, clean up and quit. Otherwise, enter the secondary phase, idling in a service event-handling loop.
It is made quite clear in the MSDN documentation that, when launched, a service should not do anything before calling StartServiceCtrlDispatcher. Not only does our trojan violate this, but it also goes through all the unnecessary work of installing and starting the service, even when it probably is the service. Now, I’m quite happy for people to infect my computer without consent, but breaking the rules is just plain rude.
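The launch check described in the list above boils down to a string comparison. Here is a minimal, portable sketch of that decision; the user name is passed in rather than fetched with GetUserName, and the exact comparison (case-sensitive, against these two names only) is an assumption, since the disassembly isn't reproduced here.

```cpp
#include <cassert>
#include <string>

// What the trojan does next, according to the analysis above.
enum class LaunchMode { Service, User };

// Sketch of the trojan's launch check. GetUserName reports "SYSTEM" (or, in
// some cases, "LOCAL SERVICE") when the Service Control Manager started the
// process, and the interactive user's name otherwise. The name is a parameter
// here so the decision logic can be shown without any Windows headers.
LaunchMode ClassifyLaunch(const std::string& user_name) {
    if (user_name == "SYSTEM" || user_name == "LOCAL SERVICE")
        return LaunchMode::Service;  // idle in the service event-handling loop
    return LaunchMode::User;         // clean up and quit
}
```

A well-behaved service wouldn't need the guesswork: StartServiceCtrlDispatcher fails with ERROR_FAILED_SERVICE_CONTROLLER_CONNECT when the program was not started by the Service Control Manager, which is the documented way to tell the two cases apart.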
The call to CreateServiceA:

The service payload isn’t much more interesting. It opens up a few UDP ports (starting from 1025), establishes a TCP connection with cpk4.easy78.cn (HTTP) and waits for the spam to come rolling in. When such an item does come along, the service displays a popup (usually offering ‘great savings’ on something or other) or attempts to add a page to your IE Favourites. The mundaneness was all too much for me and so I didn’t probe much further, but the trojan doesn’t seem to use any suspicious APIs and it even gives you the option to ‘Cancel’ the popup ads. A little courtesy goes a long way.
That’s all I have to say about ‘Generic Delphi Downloader’ and any other ‘generic’ downloaders that I may run into in future. Here’s hoping that something truly wicked finds its way to me before too long. And for the record, I don’t have any problems with Delphi. It just struck me as an odd choice for the task.
Posted in RCE
December 11th, 2007 Greg
Let me tell you about a problem I ran into a couple of years ago, and the solution I ended up with. If you’ve ever heard of ArmInline, then this is the story behind its Nanomites tool.
The Background
If you’re not already aware, Armadillo is a commercial anti-cracking software scheme for Windows: you buy a license, throw your exe (or DLL) at it, and you end up with a new, protected, file. This new program does just what the old one did, but it’s far harder to reverse-engineer. As the attacker, our goal is to remove the protection so that we can have our wicked way with the program inside.
Among other things, Armadillo employs a system known as Debug Blocker. Briefly put, this causes the program to create two instances whenever it is run - we call them the ‘parent’ and ‘child’ processes. The parent acts as a user-mode debugger, nannying the child (which does all the real work) to make sure that no bad guys can get too close. This system was fairly easy to defeat - all you needed to do was detach the parent process’s debugger at an appropriate moment and attach your own.
So to prevent this happening, the developers of Armadillo invented what they call Nanomites. When the protector is installed on the program, user-marked parts of the code section are scanned for jump instructions (JZ, JNZ, JBE and so on), and a database is created containing the address, type and offset of each. These jump instructions are patched over with ‘INT 3’s (user-mode breakpoint interrupt) and the database is put in the hands of the debugger. The idea is that the child process will raise a debug-break exception whenever one of these instructions fires, whereupon the parent steps in, grabs the thread context, looks up the appropriate jump in the database and sets the child process on its merry way.
This works very well. If the Nanomite-enabled code regions are chosen carefully then performance is virtually unaffected, and any attempt to sever the child-parent bond results in an immediate and unrecoverable crash. Even worse for the would-be cracker, the information needed to recover the code to a working state is locked up in this database, which is encrypted several times over and accessed only by heavily obfuscated, anti-debug-ridden routines. Reverse-engineering this would be a royal pain.
Getting the Table
Many successful efforts had been made to reverse this encryption process and produce a working Nanomite table, but with each offence from the crackers came a counter-offence from the developers, and pretty soon there were several variants of the Nanomite system floating around. It was time for a unified approach. Lazy as I am, I insisted on making the computer do as much of the work as possible. So the plan was this:
Write a program to debug the parent process. That is, debug the debugger. With this level of control, it would be reasonably easy to fool the parent into processing Nanomites at our will. Three function hooks need to be created in the parent process:
- WaitForDebugEvent - This is the primary source of information for any debugger. With a hook in here, we could forge any conceivable exception and let the parent attempt to handle it.
- GetThreadContext - When alerted of an INT 3 exception, the parent calls this to find out where the Nanomite was struck. Another hook and we can feign a Nanomite hit at an arbitrary address.
- SetThreadContext - After ploughing through that obfuscated code, the parent will have decided where execution should continue from, and enforces its will by setting the thread context. This last hook gives us the inside information we need to determine the details of any given Nanomite.
From here the algorithm writes itself. We find all instances of the byte 0xCC (INT 3) in the code section, spoof an INT 3 exception at each of these points and watch how the parent responds. By setting the EFlags register to take different values for the same Nanomite address, we can determine under which circumstances the jump occurs and hence exactly which conditional jump is being emulated. A few switch-statements later and we have a complete Nanomite table, without having to step through a single instruction of Armadillo’s code.
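To make the EFlags trick concrete, here is a minimal sketch of the classification step. The `jump_taken` callback stands in for the whole debug-the-debugger harness (forge the INT 3, supply the chosen EFlags through the GetThreadContext hook, see where SetThreadContext sends EIP). That interface, and probing all 32 combinations of the five tested flags, are assumptions for illustration rather than how ArmInline actually does it.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// x86 EFlags bits tested by the conditional jumps.
constexpr uint32_t kCF = 1u << 0, kPF = 1u << 2, kZF = 1u << 6,
                   kSF = 1u << 7, kOF = 1u << 11;

// Mnemonics indexed by condition code (low nibble of opcodes 70h..7Fh).
const char* const kMnemonics[16] = {
    "JO", "JNO", "JB", "JNB", "JZ", "JNZ", "JBE", "JA",
    "JS", "JNS", "JP", "JNP", "JL", "JGE", "JLE", "JG"};

// True if condition code cc (0..15) is satisfied by the given EFlags.
// Even codes test a condition; the following odd code is its negation.
bool ConditionHolds(int cc, uint32_t f) {
    const bool cf = f & kCF, pf = f & kPF, zf = f & kZF, sf = f & kSF, of = f & kOF;
    const bool base[8] = {of, cf, zf, cf || zf, sf, pf, sf != of, zf || (sf != of)};
    return (cc & 1) ? !base[cc >> 1] : base[cc >> 1];
}

// Builds an EFlags value from a 5-bit index (one bit per tested flag).
uint32_t FlagsFromIndex(unsigned i) {
    const uint32_t bits[5] = {kCF, kPF, kZF, kSF, kOF};
    uint32_t flags = 0;
    for (unsigned b = 0; b < 5; ++b)
        if (i & (1u << b)) flags |= bits[b];
    return flags;
}

// jump_taken(flags) models one forged INT 3 with the given EFlags: did the
// parent's SetThreadContext send EIP to the jump target?
// Returns the condition code that matches all 32 probes, or -1 if none does.
int ClassifyNanomite(const std::function<bool(uint32_t)>& jump_taken) {
    bool observed[32];
    for (unsigned i = 0; i < 32; ++i) observed[i] = jump_taken(FlagsFromIndex(i));
    for (int cc = 0; cc < 16; ++cc) {
        bool match = true;
        for (unsigned i = 0; i < 32 && match; ++i)
            match = (ConditionHolds(cc, FlagsFromIndex(i)) == observed[i]);
        if (match) return cc;
    }
    return -1;
}
```

Since the sixteen Jcc conditions are distinct functions of CF, PF, ZF, SF and OF, exhaustive probing identifies at most one of them; a smarter harness could get away with far fewer forged exceptions per address.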
The Real Problem
After all that work, we can just assemble all the jumps from the database into place and dump the process. That’ll be sure to remove all the Nanomites, right? Well, yes, but it turns out that something far nastier happens in the process. See, when Armadillo creates the table in the first place, it doesn’t just store the addresses of the jumps but also creates some false entries at addresses that happen to legitimately contain a 0xCC byte. This means that a completely unrelated ‘CALL DWORD PTR DS:[0043CC7A]’, for instance, will produce a false entry in the table. This entry will never be needed, as the 0xCC is in the middle of an instruction and can’t trigger an exception under normal circumstances, but assembling a jump over it corrupts a perfectly good instruction. Those clever developers have put us in a real dilly of a pickle.
There is simply no sure-fire way to weed out the ‘false Nanomites’ from the real ones. Without defeating the object of our endeavour and writing a purpose-built debugger to do exactly what we didn’t want the parent process doing, how can we fix this?
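The byte encoding shows why a plain scan can't tell the two apart. `CALL DWORD PTR DS:[0043CC7A]` assembles to `FF 15 7A CC 43 00`, so its absolute address contributes a 0xCC just as a genuine INT 3 does. A minimal sketch of the naive scan, for illustration only:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Naive candidate search: the offset of every 0xCC byte in a code buffer.
// It has no idea where instructions begin, so a 0xCC inside an operand
// (a "false Nanomite") is reported exactly like a genuine INT 3.
std::vector<std::size_t> FindCCBytes(const std::vector<uint8_t>& code) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < code.size(); ++i)
        if (code[i] == 0xCC) hits.push_back(i);
    return hits;
}
```

Run over `FF 15 7A CC 43 00 CC 90` (the CALL, a real INT 3 and a NOP), it reports offsets 3 and 6, and nothing in the bytes alone says that only offset 6 is a real Nanomite.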
The Solution
It took a little bit of brainstorming, but this is where vectored exception-handling comes to the rescue. This little-used feature of the Win32 API allows for installation of a process-wide exception-handler that doesn’t depend on stack-frames. Vectored handlers are of limited use in the real world, but they’re perfect for our needs for one reason alone: the VEH chain is triggered before the SEH chain.
Suppose that we’ve managed to dump and patch the program (having also fixed the imports, encrypted pages and code-splicing) so that it runs without the parent. Suppose further that the original program didn’t use any VEH. Then everything works great until a Nanomite triggers: a debug-break fires, promptly falls through all the structured exception-handlers, and the process crashes and burns. But if we had a VEH installed, we’d be given a chance to deal with it.
So by adding a new section to the exe containing the Nanomite table along with some code, we can save the day:
- Redirect the entry-point to our code, which installs the VEH and jumps straight to the original entry-point.
- Have the VEH handle only INT 3 exceptions, searching the database and patching in the appropriate jump instruction when necessary.
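The heart of such a handler is a table lookup followed by the same decision the parent used to make. The sketch below assumes a hypothetical table layout (condition code, target and original instruction length per address) and, for simplicity, emulates the jump by computing the resume address instead of patching the jump back into the code as described above; a patching handler would need the same lookup. In a real VEH the result would be written to `ContextRecord->Eip` before returning `EXCEPTION_CONTINUE_EXECUTION`, and a miss would return `EXCEPTION_CONTINUE_SEARCH`.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

constexpr uint32_t kCF = 1u << 0, kPF = 1u << 2, kZF = 1u << 6,
                   kSF = 1u << 7, kOF = 1u << 11;

// Hypothetical entry in the rebuilt Nanomite table, keyed by INT 3 address.
struct NanomiteEntry {
    int cc;           // condition code 0..15 (low nibble of 70h..7Fh)
    uint32_t target;  // where the jump goes when taken
    uint32_t length;  // size of the original jump instruction
};

// True if condition code cc is satisfied by the given EFlags.
bool ConditionHolds(int cc, uint32_t f) {
    const bool cf = f & kCF, pf = f & kPF, zf = f & kZF, sf = f & kSF, of = f & kOF;
    const bool base[8] = {of, cf, zf, cf || zf, sf, pf, sf != of, zf || (sf != of)};
    return (cc & 1) ? !base[cc >> 1] : base[cc >> 1];
}

// Given the address of the INT 3 that fired and the thread's EFlags, return
// where execution should resume, or 0 if the address is not in the table
// (not one of ours: the exception should keep searching).
uint32_t ResolveNanomite(const std::unordered_map<uint32_t, NanomiteEntry>& table,
                         uint32_t int3_address, uint32_t eflags) {
    const auto it = table.find(int3_address);
    if (it == table.end()) return 0;
    const NanomiteEntry& e = it->second;
    return ConditionHolds(e.cc, eflags) ? e.target : int3_address + e.length;
}
```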
That nearly takes care of everything. The only remaining problem is for programs that use VEHs of their own. It’s unlikely that anybody would implement their own exception handler to deal with breakpoints, but conceivable for a catch-all scenario to ruin our best-laid plans. So the last piece of the puzzle is to hook RtlAddVectoredExceptionHandler, telling it to remove our handler before installing the client’s, then replace it afterwards. In this way, the Nanomite-handler is guaranteed to be the first exception-handler on the scene (be it structured or vectored), and existing functionality is unaffected.
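The ordering trick can be modelled without any Windows headers. In this toy model (the `VehChain` type and the integer handler IDs are hypothetical stand-ins), the `First` argument of AddVectoredExceptionHandler decides head or tail insertion, and the hook steps aside, lets the client's registration through, then re-inserts the Nanomite handler at the head.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>

using Handler = int;  // stand-in for a PVECTORED_EXCEPTION_HANDLER

// Toy model of the process-wide VEH list: a non-zero `first` inserts at the
// head of the chain, zero appends to the tail.
struct VehChain {
    std::deque<Handler> handlers;
    void Add(unsigned long first, Handler h) {
        if (first) handlers.push_front(h); else handlers.push_back(h);
    }
    void Remove(Handler h) {
        handlers.erase(std::remove(handlers.begin(), handlers.end(), h), handlers.end());
    }
};

// What the hooked RtlAddVectoredExceptionHandler does: remove our handler,
// register the client's wherever it asked to go, then put ours back in front.
void HookedAdd(VehChain& chain, Handler nanomite_handler,
               unsigned long first, Handler client) {
    chain.Remove(nanomite_handler);
    chain.Add(first, client);        // the original, unhooked behaviour
    chain.Add(1, nanomite_handler);  // guarantee we are consulted first
}
```

With the real API, removal goes through RemoveVectoredExceptionHandler using the handle returned at registration, so the hook would also have to keep track of the new handle each time it re-registers.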
Posted in RCE
December 9th, 2007 Greg
The story so far: Part 1, Part 2, Part 3, Part 4.
The remainder of this project consisted of developing the interface and injection DLL in parallel. This all went fairly smoothly, so I’ll present a summary of the workings.
- Two programs are involved:
- DLLBugger.dll - a C++ toolkit DLL designed for injection into iTunes. It sniffs out DRM keys as they are passed to the MP4-playing subroutine, exposes a variety of methods for inter-process communication, and invokes iTunes’s decrypter function when ordered to do so.
The peculiar name is something of a relic from the DLL’s twin program, who unfortunately didn’t make it this far.
- DisaRM.exe - a C# GUI responsible for locating the iTunes process (launching it, if necessary), injecting DLLBugger, parsing the database, asking the user which tracks to unlock, and overseeing the decryption process as performed by DLLBugger within the iTunes address-space.
- When launched, DisaRM immediately loads DLLBugger into its own address-space. Next, it launches iTunes.exe and acquires a handle to the process. From here DLLBugger is injected into iTunes. Having the DLL present in both processes makes for an ‘easy’ way to communicate data back-and-forth (using a shared PE segment). As it turned out, there was no need to use the Win32 debugging API and so DRMBugger.exe outlived its usefulness.
- Because I had never written anything involving inter-process communication before, I was quite unprepared for the volume of work required to make this shared-segment approach successful. It wasn’t until I had familiarised myself with semaphores and planned out how everything could work with only POD (plain old data) being exchanged that a rudimentary communication state-machine was implemented.
- DLLBugger exports twelve functions:
bool CreateHooks(void* decrypt_call, void* decrypt_func, void* cfw_call);
DWORD InjectMain(void *lpParam);
void* GetRemoteProcAddress(LPCSTR lpModuleName, LPCSTR lpProcName);
bool KillRemoteThread(HANDLE hRemoteThread);
bool RemoteDecrypt(wchar_t* in_name, wchar_t* out_name, RijndaelKey key);
RijndaelKey GetLastKey();
void SetStoredKey(RijndaelKey key);
void RemoveHooks();
long GetLastTrackFirstLength();
char* GetLastTrackFirstData(long* buffer_size);
WCHAR* GetLastAudioFileName();
bool PollNewFile();
- Passing hard-coded addresses (I know, yuck), DisaRM invokes CreateHooks in the iTunes process. This installs hooks in Kernel32!CreateFileW, iTunes!Decrypt and iTunes!_PlayMP4+_CallToDecrypter (the point at which the previous function is called). Now any attempts to load a track or decrypt a chunk of AAC will be intercepted.
- After DisaRM has loaded and displayed the protected subset of the iTunes library, the user chooses which tracks to unlock and hits the ‘Get Keys’ button. This triggers DisaRM to launch the first track into iTunes, causing a call to CreateFileW to be intercepted by the DLL. The arguments are stored and execution is allowed to continue. With this, DLLBugger has a good idea which track will be playing at any given time. Once iTunes has loaded the protected MP4 file, determined its decryption key and done whatever else it does, it necessarily makes a call to the Decrypt function. Naturally, this too is intercepted by our DLL and we begin to generate a mapping of file-names to DRM keys. Sanity checks exist in the form of GetLastTrackFirstLength, GetLastTrackFirstData, GetLastKey, and GetLastAudioFileName. Once the confidence level is high enough (as all this business is done asynchronously by iTunes and it isn’t safe to assume too much about the order of events) DLLBugger reports back to DisaRM, and the next track is launched. In this way, DisaRM learns the keys for each file it needs to decrypt.
- Provided everything went smoothly, DisaRM displays the keys alongside the track name, artist, album and such (this was initially useful for debugging purposes, but I left it in because it looks kinda cool). The user gets a chance to reconsider before hitting the ‘Remove DRM’ button. Because what DRM-removal tool would be complete without one?
- The decryption process itself is relatively straightforward. A single call to RemoteDecrypt from DisaRM creates a new thread in iTunes, which opens up the MP4 file and parses the data to find the stbl atom. This part of the file lists the offsets and sizes of each chunk of AAC data (’cause they come in chunks, you know) among other things. For each chunk, the thread calls the Decrypt function, passing the appropriate offset, size and key. With the stream decrypted, some offensive atoms are removed and the file is made to look like it never had any DRM in the first place. DLLBugger saves the result to disk and that’s that.
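Finding the stbl atom is a matter of walking MP4's nested box structure: each atom starts with a 32-bit big-endian size and a four-character type, and container atoms simply hold more atoms. A minimal sketch, assuming the path moov/trak/mdia/minf/stbl and ignoring 64-bit atom sizes and multiple tracks (it follows the first trak it finds):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Reads a 32-bit big-endian integer (MP4 atoms are big-endian).
uint32_t ReadBE32(const uint8_t* p) {
    return (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
           (uint32_t(p[2]) << 8) | uint32_t(p[3]);
}

// Descends through a path of nested container atoms and returns the offset
// of the final atom's payload, or -1 if it is missing or malformed.
// Only the plain 32-bit size form is handled (no 64-bit sizes, no size 0).
long FindAtom(const std::vector<uint8_t>& file, const std::vector<std::string>& path) {
    std::size_t begin = 0, end = file.size();
    for (const std::string& name : path) {
        std::size_t pos = begin;
        bool found = false;
        while (pos + 8 <= end) {
            const uint32_t size = ReadBE32(&file[pos]);
            if (size < 8 || pos + size > end) return -1;  // malformed atom
            if (std::string(reinterpret_cast<const char*>(&file[pos + 4]), 4) == name) {
                begin = pos + 8;   // payload starts after size + type
                end = pos + size;  // and ends with the atom
                found = true;
                break;
            }
            pos += size;  // skip this sibling
        }
        if (!found) return -1;
    }
    return static_cast<long>(begin);
}
```

Inside stbl, the chunk offsets live in the stco atom and the sizes in stsz (with stsc mapping samples to chunks), which is what a decryption loop like the one described would iterate over.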
I’ll take this opportunity to apologise to any Mac users who were hoping to learn something about the iTunes DRM from this series. Clearly, I didn’t reverse-engineer the protocol to any substantial degree and so none of the methods described port very far away from Windows XP. Maybe another time.
I learned a good deal over those four weeks. Here are just a few of the lessons:
- Writing an inter-process communication framework is not a task to be taken lightly, no matter how little of it you think you need.
- C# is excellent for GUI development and awful at low-level hackery. But when you have a shiny new hammer, everything starts to look like a nail.
- Over-engineering a solution is as bad as under-engineering it. I’m sure I could have saved myself a fortnight if I hadn’t bothered writing that debugger I didn’t need.
- A profiler can be an excellent RCE tool. If I’d only thought to profile a few seconds of each of m4p and m4a playback, I could have isolated the decrypter function in minutes, rather than days.
- QuickTime is horrible.
So that marks the end of this series of posts. I can assure you, though, that I haven’t nearly reached the end of the story.
Posted in RCE
December 5th, 2007 Greg
The Win32 API function IsDebuggerPresent is commonly used in rudimentary anti-hack techniques. It’s generally safe to conclude, if somebody is debugging your program, that there’s some foul play going on. Now, once you’ve convinced yourself that this really doesn’t matter, allow me to explain the guts of this Kernel32 function. Here’s a disassembly:
7C813093 MOV EAX, DWORD PTR FS:[18]
7C813099 MOV EAX, DWORD PTR DS:[EAX+30]
7C81309C MOVZX EAX, BYTE PTR DS:[EAX+2]
7C8130A0 RETN
That’s really all there is to it. The first line gets a pointer to the thread environment block (often abbreviated to TEB). This is a lump of system-maintained memory that keeps track of per-thread data. At offset 0x30 into the TEB is a pointer to the process environment block, or PEB. The second line of the disassembly loads this address into the EAX register. Last of all, it reads and returns the third byte of the PEB (the ‘BeingDebugged’ member) as a boolean value.
This code is very simple to implement manually, and doing just that is a quick and easy way to thwart the attempts of those out-of-the-loop crackers who attempt to patch the IsDebuggerPresent function itself. But equivalently, the disassembly betrays a way to render IsDebuggerPresent truly useless:
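// 32-bit MSVC only: x64 builds don't support __asm blocks.
void HideIsDebuggerPresent(bool hide) {
    // Value for PEB->BeingDebugged: 0 hides the debugger, 1 sets the flag again.
    unsigned char being_debugged = (hide ? 0 : 1);
    __asm {
        MOV EAX, FS:[0x18]          // EAX = TEB (self-pointer at FS:[18h])
        MOV EAX, [EAX + 0x30]       // EAX = TEB->ProcessEnvironmentBlock
        MOV CL, being_debugged
        MOV BYTE PTR [EAX + 2], CL  // PEB->BeingDebugged = being_debugged
    }
}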
So without the need for any messy code patches, we can hide the presence of a debugger from IsDebuggerPresent - or anything that reads from PEB->BeingDebugged - by executing this function in the process’s address-space. Now, that won’t fool any programs clever enough to read in the value of _EPROCESS->DebugPort (which user-mode code can query through NtQueryInformationProcess, but can’t overwrite from ring3) or that use CheckRemoteDebuggerPresent (which requires XP SP1 or later), but it’s nice to know.
Posted in RCE
December 3rd, 2007 Greg
How should I best spend my valuable time, money and man-power so as to keep those dastardly crackers at bay, and the money rolling in? There are so many commercial protection schemes out there; some cheap and some certainly not, but they all seem to be compromised… Is it really worth all that money? Or should I maybe set one of my most talented programmers on the task for a couple of weeks? At least that way, my security will be truly unique. All I know is that the more pirated copies of my game are made, the more of my revenue goes down the drain!
Stop! If this sounds familiar then I beg you to hear me out, even though you probably won’t like it. I’ve spent plenty of time reverse-engineering games in the company of the black-hats and I’ve seen the situation from both sides of the fence. Here’s your chance to learn from the mistakes of thousands before you and save some money in the process. At least, that goes for everybody to whom this next paragraph doesn’t apply.
If you’re writing a multiplayer client-server game and your interest is in keeping the hackers from ruining everyone else’s fun, then the solution is simple (although it needs to be understood from the beginning of development): Any code that can be taken advantage of must go on the server side. No matter how much time and intelligence you have, the hacker collective has more, and so sensitive code in the client application is a disaster waiting to happen. For example, if the client program is responsible for its own collision detection, then you are responsible when your paying customers complain about their enemies walking through walls. Once you’ve made sure that your design ensures the client can’t cheat, no matter what data it sends the server’s way, your job is done.
But for the rest of you, who are concerned about illicit copying, things are even simpler. Let’s suppose I am a software pirate, for a moment. Then here are some facts:
- If I can run your program, then I can disassemble it. If I can disassemble it, then I can reverse-engineer it.
- If your game isn’t worth hacking, I won’t try, and your efforts will go to waste.
- If your game is worth hacking, then I will try to hack it, and given sufficient time and motivation I will succeed. At this point, your protection scheme is worth nothing. It only takes one successful hack, and the cat is out of the bag.
- There are many hackers working on any popular commercial protection scheme at any moment, and once we’ve cracked one sample, the method extends with relative ease.
- The more effective a protection is, the more popular it will become and the more attention it will get from us bad guys. Together with the last point, this places a limit on the value of any commercial protector.
- It is very, very easy to slip up when writing such a system. All but the best-thought-out, most thoroughly-tested schemes (generally written by those with years of experience) boil down to a single conditional jump somewhere along the line. Once I find this Achilles’ heel, I win.
And here are some commonly held non-facts:
- If somebody is prepared to steal your game, then they are prepared to buy it in the absence of a crack.
- Protection schemes reliably keep a game safe from crackers long enough to ride out the majority of its public consumption.
- Protection schemes reliably keep a game safe from crackers for at least a few days.
So both buying and home-brewing a protection are bad ideas? Not at all; they’re just not as good as they sound. A shop-bought PE-protector is an excellent investment if you deem its level of protection worth the value on the price-tag. But you’ll be investing in a false economy by fooling yourself into believing that this level of protection is anything more than ‘a week’s head-start from release’ or simply ‘keeping honest people honest’. And home-brewed protections are a very good idea, just keep them simple. If it’s going to be cracked anyway, then keep it as simple as possible. Why not restrict yourself to the five man-minutes it takes to check for the presence of your game’s CD?
- Enumerate the disc-drives on the system.
- Find the one that contains your game’s CD, by checking for a specifically-named file.
- Check the file’s attributes, to make sure that it is read-only.
If something goes wrong at any point in this sequence, then you simply demand that the user put the CD in the drive, and loop. Now, even a half-witted hacker will circumvent this in minutes, but it’s enough to keep the layman in check, and that’s all you want.
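The three steps translate almost directly into code. This is a portable sketch using std::filesystem, with a few assumptions: the drive roots are passed in (on Windows they would come from enumerating drives, e.g. with GetLogicalDriveStrings and GetDriveType), the marker file name is hypothetical, and the read-only test looks at the write-permission bits as a stand-in for the FILE_ATTRIBUTE_READONLY attribute you would check with GetFileAttributes.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Returns the marker file on the first drive that holds it read-only, or
// nothing, in which case the caller asks for the disc and loops.
std::optional<fs::path> FindGameDisc(const std::vector<fs::path>& drive_roots,
                                     const std::string& marker_name) {
    const fs::perms any_write =
        fs::perms::owner_write | fs::perms::group_write | fs::perms::others_write;
    for (const fs::path& root : drive_roots) {          // step 1: each drive
        std::error_code ec;
        const fs::path marker = root / marker_name;
        if (!fs::is_regular_file(marker, ec)) continue; // step 2: marker file
        const fs::perms p = fs::status(marker, ec).permissions();
        if (ec) continue;
        if ((p & any_write) == fs::perms::none)         // step 3: read-only
            return marker;
    }
    return std::nullopt;
}
```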
Update: I hate CD-swapping as much as the next man, and so if you can bear it I’d recommend you avoid this too. The ideal solution is a server-based disposable serial database system, but most developers don’t have a server lying around to dedicate to this cause. If you do decide to use a CD-check, then at least be reasonable and disable it after a few days of successful use.
So now that you’ve saved yourself the hundreds or thousands of dollars you were about to give to Silicon Realms or Macrovision, you can afford to spend more on getting your game perfect, and that is what the consumer is happy to pay for. All that’s left is to price your game appropriately, and this means doing a little market research. Once we remove from the equation the user who pirates everything (the type you can do nothing about and should disregard), the situation is extremely simple:
If the user wants and can afford your game, then he’ll consider buying it.
If he has pirated it then he didn’t consider it worth paying for.
Posted in Game Programming
December 1st, 2007 Greg
Success was close enough to smell, but not to taste. Succeeding in a debugger with all your (razor-sharp) wits about you and teaching a computer how to do the same are two very different things. DRMBugger and DLLBugger were still in a state of throwaway code and the project had almost nothing in the way of an interface.
This is where Visual C# came into play. While I’m no expert (and I certainly wasn’t at the time), anybody with conversational C++ can quite quickly pick it up and produce a convincing GUI in no time. But with any new toy comes the compulsion to wear it out, and I soon found myself wasting a week getting the DisaRM GUI perfect. I really have nobody to blame but myself, but the friend who suggested I mimic the iTunes GUI (mentioning no names, Dave) helped to send the project rolling in entirely the wrong direction. Unfortunately, I wasn’t quite prepared for the OOP-mania that is C# and so the controls I created are a little too interdependent for me to release their source code, and that’s a shame, because my iTunesListBox, iTunesScrollBar and iTunesProgressBar classes are true works of art.
With that distraction out of the way, I got to porting the DLL injection code from C++ to C#. If you’ve ever used the Win32 API from C#, you’ll know how much of a pain it is to translate all those function prototypes (somewhat reminiscent of VB 6) and you’ll have some sympathy for me having to do thirty of the bastards. If I had thought things through beforehand then I’d have left this close-to-the-metal business in a C++ DLL where it belongs, but we live and learn.
Getting the iTunes library track-listing and extracting the DRMed tracks was a lesson in elementary XML-parsing (take a look in %MyMusic%/iTunes/iTunes Music Library.xml if you don’t believe me). The next step is to extract the DRM keys.
If I’d taken more time to debug I’m sure a cleaner way to do this would have presented itself, but I settled for launching each of the encrypted files into the Windows shell (so that iTunes begins to play it) and extracting each key via a hook installed in the iTunes process’s decrypter function. The keys are piped back to DisaRM and everybody’s happy (with the possible exception of the user, who has just heard the first second of each protected track in their library while seeing the iTunes window frantically pop in and out of focus). It won’t mean too much without context, but here’s the rather confusing C++ source for the hook, from DLLBugger. The conditional block at the start records the first chunk of each new stream, so that DisaRM can make sure the key is indeed being drawn from the right track.
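long __cdecl HookDecrypt(char* buffer, long length, RijndaelKey** key) {
    if (new_file) {
        // Track changed: sample the start of the stream so that DisaRM can
        // confirm which track this key belongs to.
        long copy_length = min(length, LENGTH_AAC_DATA_TO_SAMPLE);
        std::fill(port_last_track_first_data, port_last_track_first_data + LENGTH_AAC_DATA_TO_SAMPLE, 0);
        std::copy(buffer, buffer + copy_length, port_last_track_first_data);
        port_last_track_first_decrypt_length = length;
        new_file = false;       // only sample once per track
        port_new_file = true;   // flag the fresh sample for DisaRM
    }
    port_last_key = **key;      // remember the key iTunes is using
    // Pass the call through to the original decrypter.
    return (hook_decrypt_func(buffer, length, key));
}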
With this done, all the ingredients are present. DisaRM knows the locations of the protected files and the keys needed to decrypt them. Next is to manipulate iTunes into doing the dirty on its own DRM.
Posted in RCE
November 29th, 2007 Greg
I was recently somewhat surprised to find that there is really no C++ way to resolve a virtual function to its address at run-time. Admittedly, there is no good reason why anybody would morally need to do this, but when you’ve already lowered yourself to patching another process’s own code without consent, it seems like a very small crime.
Pioneers of such hackery have already established concrete methods for calling virtual functions from inline assembly, but these methods don’t quite stretch to getting the address in pointer form. So, if for no reason other than to convince you that it’s a lot of hassle, I present a miserable bit-chop hack to do just this.
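For a flavour of the problem (this is not necessarily the hack behind the link), here is the most common and equally unportable approach: reading the address straight out of the object's vtable. It assumes the vtable pointer sits at offset 0 and that virtual functions occupy consecutive slots in declaration order, which holds for single inheritance under MSVC and under the Itanium C++ ABI used by GCC and Clang. Even then slot indices don't carry across compilers: a virtual destructor takes one slot under MSVC but two under Itanium.

```cpp
#include <cassert>
#include <cstddef>

// Implementation-specific sketch: returns the function pointer stored in
// slot `index` of the object's vtable. Not guaranteed by the C++ standard;
// relies on the vptr being the first thing in the object.
template <typename T>
void* VirtualFunctionAddress(const T* object, std::size_t index) {
    void* const* vtable = *reinterpret_cast<void* const* const*>(object);
    return vtable[index];
}
```

Calling through the result is a further step into ABI territory, since the `this` pointer travels differently under each calling convention (in ECX for MSVC's 32-bit thiscall, as an ordinary first argument elsewhere).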
Posted in Game Programming, RCE
November 27th, 2007 Greg
Update: See the post for the new version.
The theme of the moment is DLL hooking, and so I thought I’d present an applied example. I already explained how Fraps works, and since I’ve recently been roped into writing a similar tool for a stranger, I thought I’d share the wealth. There isn’t much new material here, but people like examples with source code, so you can download the DLL source (C++) from the project page.
If you don’t know how to inject this DLL into a foreign process, then you’ll need to read my previous post or wait for the injection framework I’m working on. Once it’s injected, call the Initialise method, via CreateRemoteThread or otherwise, to install the hooks. It works with any program that uses IDirect3DDevice9::Present (or IDirect3DSwapChain9::Present) to render, which probably covers every DirectX 9 game. Similarly, invoke Release to remove the hooks. The source is fairly self-explanatory, with a few exceptions.
- It’s not safe for 64-bit consumption, though this should be obvious.
- While there’s no reason it can’t be made to work with Unicode, I’ve written everything in ASCII, for simplicity.
- By default, the DLL will increase its own reference count to prevent it being unloaded prior to termination of the host process. This is because there is a small risk of the DLL being unloaded by one thread, while a hooked function in another returns to the now dead memory. I figured that it’s best to waste a little bit of everybody’s memory than to crash unnecessarily.
- The d3d9.dll function addresses (and prologues) are hard-coded, or at least their offsets are. While this may look very unprofessional and rather risky, I can assure you that it’s quite safe. The alternative would be to hack up some virtual-function tables and that’s a whole other story for a whole other post.
- You may notice that the compiled DLL is dependent upon D3DX. This isn’t necessary for the hook itself, but I used ID3DXFont in my example for demonstrative purposes. The only reason I mention this is that there is no way to guarantee the existence of any D3DX DLLs on a DirectX 9 machine, and distributing them yourself is in violation of the DirectX Runtime EULA. So if you happen to need to distribute this code, you’ll either need to carry the huge runtime installer around, or avoid using D3DX altogether.
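For reference, soft hooks of this kind typically work by overwriting a function's first bytes with a jump into the hook. A minimal sketch of encoding that 5-byte relative JMP, assuming 32-bit addresses (a rel32 can't reach an arbitrary 64-bit address, one reason such hooks need extra work on x64); it is not taken from the linked source:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Encodes "JMP rel32" (opcode E9) placed at address `from` and landing at
// `to`. The displacement is measured from the end of the 5-byte instruction
// and stored little-endian; unsigned wrap-around handles backward jumps.
std::array<uint8_t, 5> EncodeJmpRel32(uint32_t from, uint32_t to) {
    const uint32_t rel = to - (from + 5);
    return {0xE9, uint8_t(rel), uint8_t(rel >> 8), uint8_t(rel >> 16), uint8_t(rel >> 24)};
}
```

The prologue bytes that the jump overwrites must be copied into a trampoline, followed by a jump back into the original function, so that the hook can still call the real thing; that is why knowing each prologue in advance matters.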
Update:
- The soft-hooks used here will cause problems with PunkBuster if applied to any of its monitored functions. If you need to do this then you’ll have to be a bit cleverer.
- The source assumes that the graphics device will never become invalid. If you suspect that this isn’t the case (which will be true for any full-screen game at a minimum), then you’ll need to add the appropriate sanity checks (see IDirect3DDevice9::TestCooperativeLevel) before attempting to render anything, unless you want to crash and burn.
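To make the TestCooperativeLevel check a little more concrete, here is a minimal sketch of the decision an overlay hook might make once per hooked Present call. It assumes the overlay owns default-pool resources such as an ID3DXFont, which need OnLostDevice and OnResetDevice calls around a device reset. OverlayDeviceGuard and OverlayAction are hypothetical names, and the return codes are redefined from d3d9.h only so the sketch compiles without the DirectX SDK.

```cpp
#include <cassert>
#include <cstdint>

// Return codes as defined in d3d9.h, redefined here (hypothetically named)
// so the sketch builds without the DirectX SDK.
constexpr int32_t kD3D_OK = 0;
constexpr int32_t kD3DERR_DEVICELOST = static_cast<int32_t>(0x88760868u);
constexpr int32_t kD3DERR_DEVICENOTRESET = static_cast<int32_t>(0x88760869u);

enum class OverlayAction {
    Draw,             // device is fine: render the overlay as usual
    RecreateAndDraw,  // device is back: call OnResetDevice on overlay resources, then draw
    ReleaseAndSkip,   // device just went away: call OnLostDevice on overlay resources
    Skip              // device still unavailable: draw nothing
};

// Tracks whether the overlay's default-pool resources are currently released.
// Feed it the result of IDirect3DDevice9::TestCooperativeLevel on every
// hooked Present call, before drawing anything.
class OverlayDeviceGuard {
public:
    OverlayAction Next(int32_t cooperative_level) {
        if (cooperative_level == kD3D_OK) {
            if (released_) {
                released_ = false;
                return OverlayAction::RecreateAndDraw;
            }
            return OverlayAction::Draw;
        }
        // Lost or awaiting reset: release once, then keep skipping.
        if (!released_ && (cooperative_level == kD3DERR_DEVICELOST ||
                           cooperative_level == kD3DERR_DEVICENOTRESET)) {
            released_ = true;
            return OverlayAction::ReleaseAndSkip;
        }
        // Any other failure (or already released): don't touch the device.
        return OverlayAction::Skip;
    }

private:
    bool released_ = false;
};
```

One caveat: this only sees the device state from inside Present, while the game calls Reset itself, and Reset fails while any D3DPOOL_DEFAULT resources are still alive. A sturdier version would also hook IDirect3DDevice9::Reset and release the overlay’s resources there.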
Posted in Game Programming, RCE | 35 Comments »
November 24th, 2007 Greg
When I mention my reverse-engineering feats or failures to technically-minded friends, I tend to get one of a few responses. Not uncommon is ‘I wouldn’t know where to start.’ Well, I know it’s just a figure of speech, but I always start in the same place: PEiD.

Many programs are built with third-party post-applied protection schemes, or are compressed with a packer to reduce the file size. The basic workings are the same - you run what you think is the program, but unknowingly execute the unpacker’s code, which decompresses or decrypts the original exe in memory and executes that once it’s done. The fact that most people are completely unaware of this process goes to show that these protectors and packers do at least half of their job well. While some protection schemes are better than others, any such packer will have the effect of turning a trivial hack, crack or patching job into a relative pain in the neck.
On a different note, the odd occasion comes up where you’d like to know which compiler and/or linker was used to produce a binary, as each has its own quirks and particulars. Differentiating your Borland C++ Builder 5 from your Microsoft Visual C++ 6 can save you a little time and effort if you need to fiddle with the ins and outs of stack-frame prologues or function indirection tables, for example.
Any tool that modifies a PE (exe or DLL) has to conform to strict standards so as to keep the program functional, but it inevitably leaves behind a mark. These tell-tale marks are aptly known as PE fingerprints, and PEiD is designed to sniff them out and give you the lowdown. So if I decide that I want to tweak the interface of my PostScript viewer, or to investigate how my anti-spyware tool enumerates processes, I only need to drag and drop the respective exe files into PEiD and I immediately know that GSview 4.7’s gsview32.exe was built in Microsoft Visual C++ 7.0 and that AdAware SE Personal 6.20 is compressed using ASPack 2.12. This tells me that the former will be very easy to analyse, whereas the latter will put up something of a fight, and that I’d perhaps be better off spending my time on Google.
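PEiD’s detection boils down to comparing byte signatures against the file, most often at the entry point. As a rough sketch, assuming signatures written as space-separated hex bytes with ?? wildcards (the style PEiD’s user signature file uses, as far as I know), the matching step might look like this. The function names are hypothetical, the sample bytes in the test are made up rather than taken from any real packer, and locating the entry point in the PE header is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// One signature element: a concrete byte, or a "??" wildcard.
struct PatternByte {
    bool wildcard;  // true for "??", which matches any byte
    uint8_t value;  // the byte to match when not a wildcard
};

// Parse a space-separated hex signature such as "60 E8 ?? ?? 00 00".
std::vector<PatternByte> ParseSignature(const std::string& text) {
    std::vector<PatternByte> pattern;
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        if (token == "??") {
            pattern.push_back({true, 0});
        } else {
            pattern.push_back({false, static_cast<uint8_t>(std::stoul(token, nullptr, 16))});
        }
    }
    return pattern;
}

// "Entry point only" check: does the signature match the bytes found at the
// entry point of the file?
bool MatchesAtEntryPoint(const std::vector<uint8_t>& entry_bytes,
                         const std::vector<PatternByte>& pattern) {
    if (pattern.size() > entry_bytes.size()) return false;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (!pattern[i].wildcard && pattern[i].value != entry_bytes[i]) return false;
    }
    return true;
}
```

A real scanner would read AddressOfEntryPoint from the PE optional header, translate it to a file offset, and try every signature in its database in turn.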
So PEiD is something of an unsung hero, in that it only ever runs for five seconds at a time, perhaps once a week (at least on my computer), yet when used properly it can have a profound effect on the development of any RCE project. And it is for this reason that I hereby sing its heroism for all to hear.
Posted in RCE | 2 Comments »
November 22nd, 2007 Greg
One of the topics that I often find myself bluffing through on GameDev is Direct3D hooking; in particular, how to display an overlay of your own on the window of another Direct3D program, often a commercial game. It’s pretty clear that the simplest method would involve somehow hooking the call to IDirect3DDevice8::Present or IDirect3DDevice9::Present (IDXGISwapChain::Present under Direct3D 10), but the details are a little sketchy, particularly when you throw anti-hack systems into the mix. To be quite honest, I wasn’t sure I’d have been able to write a scalable hook that wouldn’t cause any incompatibilities - at least not without doing some very immoral things. So when I found out that Fraps has been doing exactly this for years, and that it somehow manages to avoid angering PunkBuster and other such systems, I decided to investigate.
What is Fraps?
Simply put, it’s a profiling and video-capture tool for PC games. As well as providing the ability to capture a video stream from any Direct3D 8/9/10, DirectDraw or OpenGL program, it can display real-time performance statistics (frame-rate and such) by means of an overlay on the game’s window (or full-screen display).
Remarkably, Fraps seems to handle the hacky side of all this automatically, with no fuss. It will dig its claws into any compatible game, whether it was started before or after Fraps. Yet if you investigate the state of DirectX and OpenGL’s DLLs on disk (in system32), they remain untouched at all times.
So how does it work?
Well, it could be a whole lot worse, but I wasn’t thrilled to find that every process running on my system had an instance of Fraps.dll loaded. It’s not so much the 106kB footprint that bothers me as the performance and stability concerns. Anyway, my suspicions that this DLL was ‘infecting’ all processes via a system-wide hook were soon confirmed when OllyDbg caught Fraps.exe making a call to SetWindowsHookEx.
SetWindowsHookEx(WH_CBT, &FrapsProcCBT, hModuleFrapsDLL, 0);
So in one fell swoop, Fraps ensures that Windows will load a copy of Fraps.dll into practically every process with a GUI thread, both those currently running and those created in future. Moreover, the function FrapsProcCBT will be called in that process each time it attempts to create, destroy, show, hide, move or resize a window, among other events. Now this may seem like the perfect solution to a difficult problem, but it’s rather wasteful considering that most processes run window-driven interfaces, and only very few involve DirectX or OpenGL.
How should it work?
Now I haven’t tested this, but a much cleaner way to achieve the same goal would be for Fraps.exe to periodically poll EnumProcesses and EnumProcessModules, so as to determine the processes that actually need hooking. Installing the hooks into these specific processes would require no more work but would save the OS some effort and limit the worst-case-scenario disaster-zone to DirectX and OpenGL applications, which is a considerably smaller domain than almost everything. In Fraps’s defence, the code makes extensive use of IsBadReadPtr and suchlike, and I’ve never heard of it causing any trouble, but nevertheless, the best way to prevent your DLL from crashing someone else’s program is to make sure it never gets loaded.
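The polling idea is untested, but its decision step is simple enough to sketch. Assuming the module base names for a process have already been gathered (on Windows, via EnumProcesses, EnumProcessModules and GetModuleBaseNameA from psapi), a process only needs hooking if one of the graphics DLLs is among them. NeedsHook and EqualsIgnoreCase are hypothetical helpers, and the module list is the one Fraps.dll checks for.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive ASCII comparison; Windows module names are not case-sensitive.
bool EqualsIgnoreCase(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](char x, char y) {
               return std::toupper(static_cast<unsigned char>(x)) ==
                      std::toupper(static_cast<unsigned char>(y));
           });
}

// Given the base names of the modules loaded in one process, decide whether
// it uses one of the graphics APIs worth hooking.
bool NeedsHook(const std::vector<std::string>& module_names) {
    static const char* const kGraphicsModules[] = {
        "opengl32.dll", "d3d8.dll", "d3d9.dll", "dxgi.dll", "ddraw.dll"};
    for (const std::string& name : module_names) {
        for (const char* target : kGraphicsModules) {
            if (EqualsIgnoreCase(name, target)) return true;
        }
    }
    return false;  // no graphics DLL: leave this process alone entirely
}
```

Only processes for which this returns true would ever see the hooking DLL, which is the whole point of polling instead of a system-wide hook.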
What does the hook do?
All events but window activation and focus-acquisition (HCBT_ACTIVATE, HCBT_SETFOCUS) fall through the hook chain (CallNextHookEx). But in either of these cases, Fraps.dll goes on to look for a supported graphical interface.
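Here is a portable model of that filtering, as a sketch. A real procedure would have the CBTProc signature (int nCode, WPARAM wParam, LPARAM lParam) and call CallNextHookEx; those are stood in for here by the hypothetical callbacks inspect and call_next, and the two hook codes are redefined from winuser.h. Passing every event along, including the two it inspects, is the well-mannered approach; whether Fraps forwards those two as well isn’t stated here.

```cpp
#include <cassert>
#include <functional>

// CBT hook codes as defined in winuser.h, redefined so this compiles anywhere.
constexpr int kHCBT_ACTIVATE = 5;
constexpr int kHCBT_SETFOCUS = 9;

// Model of a CBT hook procedure. 'inspect' stands in for the search for a
// graphics API; 'call_next' stands in for CallNextHookEx.
long long CbtProcModel(int code,
                       const std::function<void()>& inspect,
                       const std::function<long long()>& call_next) {
    if (code == kHCBT_ACTIVATE || code == kHCBT_SETFOCUS) {
        inspect();  // only these two events trigger the graphics-API check
    }
    // Forward every event (including negative codes) down the hook chain.
    return call_next();
}
```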
Rather achronologically, the first thing it does at this point is a bunch of string-processing to capitalise and isolate the executable image’s file name. The writers could presumably have used GetProcessImageFileName instead, but bringing psapi.dll along to the party for this reason alone would be borderline-criminal.
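That string-processing can be sketched without psapi.dll at all: GetModuleFileNameA(NULL, ...) from kernel32 returns the full path of the executable, and the rest is plain string handling. This is a guess at an equivalent rather than a reconstruction of Fraps.dll’s code, and ImageNameUpper is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// Isolate and capitalise the file name from a full image path, such as the
// string GetModuleFileNameA(NULL, ...) would return on Windows.
std::string ImageNameUpper(const std::string& path) {
    // Accept both separators; Windows paths normally use '\\'.
    const std::size_t slash = path.find_last_of("\\/");
    std::string name = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
        return static_cast<char>(std::toupper(c));
    });
    return name;
}
```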
Next, GetModuleHandleA is called on opengl32.dll, d3d8.dll, d3d9.dll, dxgi.dll and ddraw.dll. If all return NULL, then there is no work to do and the function returns. But if any of these modules are found, Fraps.dll gets straight to installing its function hooks. The hooks are simply JMP operations assembled ad-hoc at the beginning of IDirect3DDevice9::Release and Present (and presumably the equivalent functions belonging to the other APIs). Now, I was rather surprised to find that PunkBuster has no problem with such crude, unsubtle behaviour, but it’s possible that there is some agreement between the two developers.
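The module check might look something like the following sketch. To keep it portable, is_loaded stands in for GetModuleHandleA(name) != NULL; that call only reports modules already mapped into the process, without loading anything or touching the reference count, which is exactly what a probe like this wants. FindGraphicsModules is a hypothetical name.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Check the same modules, in the same order, and report which are loaded.
std::vector<std::string> FindGraphicsModules(
        const std::function<bool(const char*)>& is_loaded) {
    static const char* const kModules[] = {
        "opengl32.dll", "d3d8.dll", "d3d9.dll", "dxgi.dll", "ddraw.dll"};
    std::vector<std::string> found;
    for (const char* name : kModules) {
        if (is_loaded(name)) found.emplace_back(name);
    }
    return found;  // empty: nothing to hook, so the hook procedure just returns
}
```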
That’s almost everything, but one problem remains. Nothing will be drawn to the screen unless d3d9!Present is called, and installation of the patch renders the original functions useless. It is for this reason that IAT hooking is preferable to the patching method used here, but from what I gather PunkBuster periodically ‘fixes’ the IAT of its client process, so that’s a no-go. Fraps gets around this little inconvenience in the messy but reliable way that you’d expect: each time the proxy Present function fires, it removes the patch, calls the original function, and restores the patch.
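The remove/call/restore cycle can be modelled on an ordinary byte buffer, as in the sketch below. InlineJmpPatch and CallUnpatched are hypothetical names; the target address is passed separately from the buffer only so the sketch can be tested, and the VirtualProtect and FlushInstructionCache calls that real code needs are left out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// A 5-byte "JMP rel32" inline patch over a block of code bytes.
class InlineJmpPatch {
public:
    InlineJmpPatch(uint8_t* target, uint32_t target_address, uint32_t hook_address)
        : target_(target) {
        std::memcpy(saved_.data(), target_, saved_.size());  // keep the original prologue
        patch_[0] = 0xE9;  // opcode for JMP rel32
        // Relative to the end of the 5-byte instruction; unsigned wrap-around
        // yields the correct two's-complement offset for backward jumps.
        const uint32_t offset = hook_address - target_address - 5;
        // Copies in host byte order, which is little-endian on x86 as required.
        std::memcpy(patch_.data() + 1, &offset, sizeof offset);
    }
    void Apply() { std::memcpy(target_, patch_.data(), patch_.size()); }
    void Remove() { std::memcpy(target_, saved_.data(), saved_.size()); }

private:
    uint8_t* target_;
    std::array<uint8_t, 5> saved_{};
    std::array<uint8_t, 5> patch_{};
};

// The remove / call / restore pattern, as used inside the proxy function.
template <typename Fn>
auto CallUnpatched(InlineJmpPatch& patch, Fn&& original) -> decltype(original()) {
    patch.Remove();           // original bytes back in place
    auto result = original(); // call the real function
    patch.Apply();            // re-arm the hook
    return result;
}
```

The messiness lies in the window between Remove and Apply: another thread that calls Present during that window gets the original, unhooked function, and two threads patching at once can tear the bytes, so real code would want a lock around the cycle.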
Here’s some untested C++ concept-code I threw together for the IDirect3DDevice9::Present case. The unbraced snippet installs a patch at address_d3d9_Present to redirect it to PresentHook. Present isn’t exported by d3d9.dll, so GetProcAddress won’t find it; take the address from the device’s virtual-function table or from a known offset into d3d9.dll. I’ve omitted the patch-removal code, along with a whole load of sanity-checking that really shouldn’t be left out in such a risky situation. Don’t use my laziness as an excuse.
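#include <windows.h>
#include <d3d9.h>
#include <cstring>
// Forward declaration; the hook must match Present's calling convention
HRESULT __stdcall PresentHook(IDirect3DDevice9* device, const RECT* pSourceRect, const RECT* pDestRect, HWND hDestWindowOverride, const RGNDATA* pDirtyRegion);
// Original first 5 bytes of Present, saved so the patch can be removed later
unsigned char original_bytes[5];
// Calculate offset (32-bit only: casting pointers to DWORD truncates on x64)
DWORD from_int = reinterpret_cast <DWORD> (address_d3d9_Present);
DWORD to_int = reinterpret_cast <DWORD> (&PresentHook);
// This version of the JMP instruction takes an address relative
// to the end of the instruction, which is 5 bytes long
// So the relative offset is 'to - from - 5'
// Don't worry about the unsigned DWORD underflowing: it wraps to the right value
DWORD offset = to_int - from_int - 5;
// Code pages are normally read/execute only, so make the 5 bytes writable first
unsigned char* ip = reinterpret_cast <unsigned char*> (address_d3d9_Present);
DWORD old_protect;
VirtualProtect(ip, 5, PAGE_EXECUTE_READWRITE, &old_protect);
memcpy(original_bytes, ip, 5);
// Assemble the patch at the beginning of Present
const unsigned char jmp = 0xE9; // The opcode for a 32-bit rel JMP
*ip = jmp;
*(reinterpret_cast <DWORD*> (ip + 1)) = offset;
VirtualProtect(ip, 5, old_protect, &old_protect);
FlushInstructionCache(GetCurrentProcess(), ip, 5);
// COM methods are __stdcall on x86, with 'this' passed as the first stack
// argument rather than in ECX, so the device arrives as an ordinary parameter
HRESULT __stdcall PresentHook(IDirect3DDevice9* device, const RECT* pSourceRect, const RECT* pDestRect, HWND hDestWindowOverride, const RGNDATA* pDirtyRegion) {
// Do anything that needs to be done before Present gets called
// Remove the Present hook (copy original_bytes back over the JMP)
HRESULT return_value = device->Present(pSourceRect, pDestRect, hDestWindowOverride, pDirtyRegion);
// Reinstall the Present hook
// Do anything that needs to be done after Present gets called
return return_value;
}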
Posted in Game Programming, RCE | 8 Comments »