D3DLookingGlass v0.1

May 5th, 2008 Greg

The topic of debugging full-screen Direct3D applications came up a little while ago. If you’ve ever tried it on a single-monitor setup (or even multi-monitor if the app wasn’t designed to handle it) then you’ll know how much of a pain it is. Windows just can’t handle focus being stolen from a suspended exclusive-mode program. The solution’s exactly what you’d expect - to intercept the relevant window- and device-creation calls and coax the debuggee into running in a window. This works, but fiddling with the calls manually each time you restart the process quickly gets boring. So here’s my first attempt at a generic solution.

D3DLookingGlass is a DLL which, if injected into a Direct3D process early enough, will make sure that all video devices are created in windowed mode, allowing the hosting process to coexist with a debugger without any bother. If you can inject this DLL into the target process before the first call to CreateWindow, then everything should go smoothly. I think. Any later than this and your mileage may vary.

I’ve also written a ‘loader’ program that installs the DLL as a system-wide CBT hook, so that you don’t need to inject it manually. This kind of worked for my limited set of test-cases, but there seems to be no Windows-hooks method of injecting a DLL globally and beating the call to CreateWindow. Windows installs the DLL containing the hook at the latest possible moment for its function, and I can find no type of hook that needs to be around before a window is created. I’d love for somebody to prove me wrong (or suggest another way to install the DLL system-wide), but by the looks of things, my loader is of limited use.

In particular, I recall a situation where the game (Call Of Duty 4 Demo, I think) creates a non-overlapped window, which works fine for full-screen mode, but causes problems when you try to force the device to bind as windowed. This will still be a problem unless the call to CreateWindow can be intercepted (and a well-formed window induced), which means that D3DLookingGlassLoader will struggle. Confirmation would be nice.
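To make the two adjustments concrete, here is a minimal sketch of what forcing windowed mode amounts to. It is illustrative only and not an excerpt from the DLL: PresentParams is a hypothetical stand-in for the few D3DPRESENT_PARAMETERS fields involved, ForceWindowed and ForceOverlappedStyle are invented names, and the style constants simply carry the values of WS_POPUP and WS_OVERLAPPEDWINDOW from WinUser.h.

```cpp
#include <cstdint>

// Hypothetical stand-in for the D3DPRESENT_PARAMETERS fields that matter here.
struct PresentParams {
    std::uint32_t BackBufferWidth;
    std::uint32_t BackBufferHeight;
    int           Windowed;                    // BOOL in the real structure
    std::uint32_t FullScreen_RefreshRateInHz;
};

// Window-style bits with the same values as the WinUser.h macros.
constexpr std::uint32_t kWsPopup            = 0x80000000u; // WS_POPUP
constexpr std::uint32_t kWsOverlappedWindow = 0x00CF0000u; // WS_OVERLAPPEDWINDOW

// Rewrite the parameters before they reach CreateDevice.
void ForceWindowed(PresentParams& pp) {
    pp.Windowed = 1;
    pp.FullScreen_RefreshRateInHz = 0; // must be zero for a windowed device
}

// Turn a borderless pop-up style into an ordinary overlapped window,
// leaving every other style bit alone.
std::uint32_t ForceOverlappedStyle(std::uint32_t style) {
    return (style & ~kWsPopup) | kWsOverlappedWindow;
}
```

In a real hook the first function would run inside the CreateDevice detour and the second inside the CreateWindowExW detour, which is exactly why the DLL needs to be in place before the window exists.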

Here’s the source: D3DLookingGlass_Source.zip

Here’s the binary: D3DLookingGlass_Binary.zip

Here’s the small-print:

  • The DLL hooks CreateWindowExW and ShowWindow in its DllMain. I think this is kosher in terms of loader-lock, but it’s obviously not too cool with regard to system stability. Especially if it’s being installed in every running process. If d3d9.dll isn’t found in the address-space then the hooks fall straight through, so that shouldn’t be too much of a problem. But if it is found then all attempts to create or show (or hide) a window will be overridden - possibly to the demise of the process if it’s doing anything but the basic behaviour. So in all cases, watch out, and make sure you aren’t running anything important in the background (in particular, I’ve noticed that it doesn’t play nice with Firefox).
  • The loader uses a system-wide hook, and you hate system-wide hooks. I trust that anybody who needs this tool has some degree of technical expertise and is aware of the stability concerns inherent in installing somebody else’s barely-tested system-wide hook.
  • This was harder to put together than I anticipated, and that’s probably evident from the slightly shabby code. Again, I intend for this only to be used for debugging purposes, so you’ll have to forgive me for the sub-production-quality code.
  • Despite the focus on Direct3D of this blog, I’m not really a gamer and I don’t actually have any commercial games installed on this machine. So I only got a chance to test this against my own programs. Obviously, there are several ways to skin the metaphorical Direct3D-initialisation cat, so please leave a comment when you find a game that this chokes on.

Run-time determination of VC++ virtual member function addresses: Take II

February 6th, 2008 Greg

I wrote about this tricky little problem a while ago and wasn’t too happy with the desperate methods that seemed necessary. Since then, I’ve been shown a much cleaner way to do the same thing, by manipulating the vTable manually. It seems that Microsoft haven’t changed their vTable implementation since Visual Studio 6 (at least) and so with a little modification, the following piece of inline-assembly will do the trick: no muss, no fuss.

__declspec(naked) void* ResolveVirtualFunction(IDirect3DDevice9* pDevice, ...) {
    __asm {
        mov eax, dword ptr ss:[esp+0x08]
        add eax, 0x8
        cmp byte ptr ds:[eax-1], 0xA0
        mov eax, dword ptr ds:[eax]
        je normal_index
        and eax, 0xFF
normal_index:
        mov ecx, eax
        mov eax, dword ptr ss:[esp+0x4]
        mov eax, dword ptr ds:[eax]
        mov eax, dword ptr ds:[eax+ecx]
        retn
    }
}
 
// ...
 
// The function should be invoked like this:
void* address_device_present = ResolveVirtualFunction(device, &IDirect3DDevice9::Present);

Thanks go to Vuurvlieg for this function. The beauty (or horror), here, is the use of a variadic parameter-list to overcome C++’s strong-typing that would otherwise make this operation very difficult. Obviously, this implementation will only work for objects of type IDirect3DDevice9, but the method extends to any other class by simply replacing the class name in the function declaration. Don’t be tempted to generalise this function to IUnknown or some other common base-class, as you’ll quickly run into problems with object-slicing. A final warning to those still using Visual C++ 6 (not that you deserve any help for such a crime): you’ll need to drop the ampersand from the second argument in the function call, as VC++6 handles function pointers slightly differently.
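The assembly is terse, so here is the underlying idea in plain C++: the first pointer-sized field of the object points to a table of function pointers, and resolving a method is an indexed read from that table. This is a portable sketch built on a hand-made table (FakeDevice, Method, ResolveSlot and the two sample methods are hypothetical names), not a replacement for the function above, which additionally digs the slot offset out of the member-function-pointer thunk.

```cpp
#include <cstddef>

// Every slot in this toy table has the same signature, to keep things simple.
using Method = int (*)(void* self, int argument);

// Mimics the VC++ layout: the table pointer is the object's first field.
struct FakeDevice {
    const Method* vtable;
    int state;
};

// Two sample methods to populate the table.
int FakeAddOne(void*, int argument) { return argument + 1; }
int FakeDouble(void*, int argument) { return argument * 2; }

// Read the table pointer from the start of the object, then index into it.
Method ResolveSlot(const void* object, std::size_t slot_index) {
    const Method* table = *static_cast<const Method* const*>(object);
    return table[slot_index];
}
```

Given a FakeDevice whose table is { &FakeAddOne, &FakeDouble }, ResolveSlot(&device, 1) hands back &FakeDouble, which is the same kind of raw address the assembly version returns for a real device.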

Direct3D 9 Hook v1.1

February 1st, 2008 Greg

By popular demand, I’ve updated the Direct3D 9 Hooking Sample to accommodate Windows Vista. The same binary should work on both Vista and XP. I’ve only tested it on Vista 64-bit, so it’d be nice to know if it works with Vista 32 or not. Other than this, most of the same caveats apply as last time.

Screenshot

A framework to take the tedium out of code-injection in C++

December 20th, 2007 Greg
Calculator Hook

I know I’ve been banging on about injection a lot recently, but I figured a good way to pinch off would be to present some code. After searching and failing, I took it upon myself to write a reusable C++ class to do most of the leg-work for Windows XP/2000/Vista32 DLL injection and hooking. The source is available on the project page.

The process of remote function hooking via a DLL is notoriously messy, so I’ve tried to encapsulate as much of the mess as possible into a C++ class. Here’s an example of some client code that injects a DLL into Windows Calculator, then installs two hooks (one by name and another by address):

// Create the injection object
DLLInjection injection("E:/Temp/HookDLL.dll");
 
// Find Calc.exe by its window
DWORD process_id = injection.GetProcessIDFromWindow("SciCalc", "Calculator");
 
// Inject the DLL
HMODULE remote_module = injection.InjectDLL(process_id);
 
// Hook a DLL function (User32!SetWindowTextW)
HDLLHOOK swtw_hook = injection.InstallDLLHook("C:/Windows/System32/User32.dll", "SetWindowTextW", "SetWindowTextHookW");
 
// Hook a function manually (Calc!0100F3CF)
HDLLHOOK manual_hook = injection.InstallCodeHook(reinterpret_cast <void*> (0x0100F3CF), "SomeOtherHook");
 
// Remove the hooks
injection.RemoveHook(swtw_hook);
injection.RemoveHook(manual_hook);

Testing has been limited so don’t be surprised to find bugs. If you do find any, please report them via email or comment.

Protecting your game against the hackers

December 3rd, 2007 Greg

How should I best spend my valuable time, money and man-power so as to keep those dastardly crackers at bay, and the money rolling in? There are so many commercial protection schemes out there; some cheap and some certainly not, but they all seem to be compromised… Is it really worth all that money? Or should I maybe set one of my most talented programmers on the task for a couple of weeks? At least that way, my security will be truly unique. All I know is that the more pirated copies of my game are made, the more of my revenue goes down the drain!

Stop! If this sounds familiar then I beg you to hear me out, even though you probably won’t like it. I’ve spent plenty of time reverse-engineering games in the company of the black-hats and I’ve seen the situation from both sides of the fence. Here’s your chance to learn from the mistakes of thousands before you and save some money in the process. At least, that goes for everybody to whom this next paragraph doesn’t apply.

If you’re writing a multiplayer client-server game and your interest is in keeping the hackers from ruining everyone else’s fun, then the solution is simple (although it needs to be understood from the beginning of development): Any code that can be taken advantage of must go on the server side. No matter how much time and intelligence you have, the hacker collective has more, and so sensitive code in the client application is a disaster waiting to happen. For example, if the client program is responsible for its own collision detection, then you are responsible when your paying customers complain about their enemies walking through walls. Once you’ve made sure that your design ensures the client can’t cheat, no matter what data it sends the server’s way, your job is done.
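As a rough illustration of what server-side authority means, the sketch below has the server check a requested move against a speed limit instead of trusting the position the client reports. The rule itself (a maximum distance per second) and the names Position, IsMoveAllowed and ApplyClientMove are assumptions made up for the example; a real game would also test the path against its level geometry.

```cpp
#include <cmath>

struct Position { float x, y; };

// Hypothetical rule: a player may cover at most max_speed units per second.
bool IsMoveAllowed(Position from, Position to, float elapsed_seconds, float max_speed) {
    if (!(elapsed_seconds > 0.0f)) return false; // also rejects NaN
    const float dx = to.x - from.x;
    const float dy = to.y - from.y;
    const float distance = std::sqrt(dx * dx + dy * dy);
    // A NaN distance fails this comparison, so garbage input is rejected too.
    return distance <= max_speed * elapsed_seconds;
}

// The server keeps its own copy of the position and only moves it
// when the request passes validation.
Position ApplyClientMove(Position current, Position requested,
                         float elapsed_seconds, float max_speed) {
    return IsMoveAllowed(current, requested, elapsed_seconds, max_speed)
        ? requested
        : current;
}
```

The point is the direction of trust: the client proposes, the server decides, and a tampered client can do no more than send requests that get refused.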

But for the rest of you, who are concerned about illicit copying, things are even simpler. Let’s suppose I am a software pirate, for a moment. Then here are some facts:

  • If I can run your program, then I can disassemble it. If I can disassemble it, then I can reverse-engineer it.
  • If your game isn’t worth hacking, I won’t try, and your efforts will go to waste.
  • If your game is worth hacking, then I will try to hack it, and given sufficient time and motivation I will succeed. At this point, your protection scheme is worth nothing. It only takes one successful hack, and the cat is out of the bag.
  • There are many hackers working on any popular commercial protection scheme at any moment, and once we’ve cracked one sample, the method extends with relative ease.
  • The more effective a protection is, the more popular it will become and the more attention it will get from us bad guys. Together with the last point, this places a limit on the value of any commercial protector.
  • It is very, very easy to slip up when writing such a system. All but the best-thought-out, most thoroughly-tested schemes (generally written by those with years of experience) boil down to a single conditional jump somewhere along the line. Once I find this Achilles’ heel, I win.

And here are some commonly upheld non-facts:

  • If somebody is prepared to steal your game, then they are prepared to buy it in the absence of a crack.
  • Protection schemes reliably keep a game safe from crackers long enough to ride out the majority of its public consumption.
  • Protection schemes reliably keep a game safe from crackers for at least a few days.

So both buying and home-brewing a protection are bad ideas? Not at all, they’re just not as good as they sound. A shop-bought PE-protector is an excellent investment if you deem its level of protection worth the value on the price-tag. But you’ll be investing in a false economy by fooling yourself into believing that this level of protection is anything more than ‘a week’s head-start from release’ or simply ‘keeping honest people honest’. And home-brewed protections are a very good idea, just keep them simple. If it’s going to be cracked anyway, then keep it as simple as possible. Why not restrict yourself to the five man-minutes it takes to check for the presence of your game’s CD?

  1. Enumerate the disc-drives on the system.
  2. Find the one that contains your game’s CD, by checking for a specifically-named file.
  3. Check the file’s attributes, to make sure that it is read-only.

If something goes wrong at any point in this sequence, then you simply demand the user puts the CD in the drive, and loop. Now, even a half-witted hacker will circumvent this in minutes, but it’s enough to keep the layman in check, and that’s all you want.

Update: I hate CD-swapping as much as the next man, and so if you can bear it I’d recommend you avoid this too. The ideal solution is a server-based disposable serial database system, but most developers don’t have a server lying around to dedicate to this cause. If you do decide to use a CD-check, then at least be reasonable and disable it after a few days of successful use.

So now that you’ve saved yourself the hundreds or thousands of dollars you were about to give to Silicon Realms or Macrovision, you can afford to spend more on getting your game perfect, and that is what the consumer is happy to pay for. All that’s left is to price your game appropriately, and this means doing a little market research. Once we remove from the equation the user who pirates everything (the type you can do nothing about and should disregard), the situation is extremely simple:

If the user wants and can afford your game, then he’ll consider buying it.
If he has pirated it then he didn’t consider it worth paying for.

Run-time determination of VC++ 2005 virtual member function addresses

November 29th, 2007 Greg

I was recently somewhat surprised to find that there is really no C++ way to resolve a virtual function to its address at run-time. Admittedly, there is no good reason why anybody would morally need to do this, but when you’ve already lowered yourself to patching another process’s own code without consent, it seems like a very small crime.

Pioneers of such hackery have already established concrete methods for calling virtual functions from inline assembly, but these methods don’t quite stretch to getting the address in pointer form. So, if for no reason other than to convince you that it’s a lot of hassle, I present a miserable bit-chop hack to do just this.

Read More…

Drawing on another Direct3D program’s viewport

November 27th, 2007 Greg

Update: See the post for the new version.

The theme of the moment is DLL hooking, and so I thought I’d present an applied example. I already explained how Fraps works, and since I’ve recently been roped into writing a similar tool for a stranger, I thought I’d share the wealth. There isn’t much new material here, but people like examples with source code, so you can download the DLL source (C++) from the project page.

Bioshock Hook Screenshot

If you don’t know how to inject this DLL into a foreign process, then you’ll need to read my previous post or wait for the injection framework I’m working on. But once it’s injected, call the Initialise method, via CreateRemoteThread or otherwise, to install the hooks. It works with any program that uses IDirect3DDevice9::Present (or IDirect3DSwapChain9::Present) to render, which is probably all of the DirectX 9 games. Similarly, invoke Release to remove the hooks. The source is fairly self-explanatory, with a few exceptions.

  • It’s not safe for 64-bit consumption, though this should be obvious.
  • While there’s no reason it can’t be made to work with Unicode, I’ve written everything in ASCII, for simplicity.
  • By default, the DLL will increase its own reference count to prevent it being unloaded prior to termination of the host process. This is because there is a small risk of the DLL being unloaded by one thread, while a hooked function in another returns to the now dead memory. I figured that it’s better to waste a little bit of everybody’s memory than to crash unnecessarily.
  • The d3d9.dll function addresses (and prologues) are hard-coded, or at least their offsets are. While this may look very unprofessional and rather risky, I can assure you that it’s quite safe. The alternative would be to hack up some virtual-function tables and that’s a whole other story for a whole other post.
  • You may notice that the compiled DLL is dependent upon D3DX. This isn’t necessary for the hook itself, but I used ID3DXFont in my example for demonstrative purposes. The only reason I mention this is that there is no way to guarantee the existence of any D3DX DLLs on a DirectX 9 machine, and distributing them yourself is in violation of the DirectX Runtime EULA. So if you happen to need to distribute this code, you’ll either need to carry the huge runtime installer around, or avoid using D3DX altogether.

Update:

  • The soft-hooks used here will cause problems with PunkBuster if applied to any of its monitored functions. If you need to do this then you’ll have to be a bit cleverer.
  • The source assumes that the graphics device will never become invalid. If you suspect that this isn’t the case (which will be true for any full-screen game at a minimum) then you’ll need to add the appropriate sanity checks (see IDirect3DDevice9::TestCooperativeLevel) before attempting to render anything, unless you want to crash and burn.

Case study: Fraps

November 22nd, 2007 Greg

One of the topics that I often find myself bluffing through on GameDev is Direct3D hooking. In particular, how to display an overlay of your own on the window of another Direct3D program, often a commercial game. It’s pretty clear that the simplest method would involve somehow hooking the call to IDirect3DDevice8/9/10::Present, but the details are a little sketchy, particularly when you throw anti-hack systems into the mix. To be quite honest, I wasn’t sure I’d have been able to write a scalable hook that wouldn’t cause any incompatibilities - at least not without doing some very immoral things. So when I found out that Fraps has been doing exactly this for years, and that it somehow manages to avoid angering PunkBuster and other such systems, I decided to investigate.

What is Fraps?

Simply put, it’s a profiling and video-capture tool for PC games. As well as providing the ability to capture a video stream from any Direct3D 8/9/10, DirectDraw or OpenGL program, it can display real-time performance statistics (frame-rate and such) by means of an overlay on the game’s window (or full-screen display).

Remarkably, Fraps seems to handle the hacky side of all this automatically, with no fuss. It will dig its claws into any compatible game, whether it was started before or after Fraps. Yet if you investigate the state of DirectX and OpenGL’s DLLs on disk (in system32), they remain untouched at all times.

So how does it work?

Well it could be a whole lot worse, but I wasn’t thrilled to find out that every process running on my system had an instance of Fraps.dll loaded. It’s not so much the 106kB footprint that bothers me, but the performance and stability concerns. Anyway, my suspicions that this DLL was ‘infecting’ all processes via a system-wide hook were soon confirmed when OllyDbg caught Fraps.exe making a call to SetWindowsHookEx.

SetWindowsHookEx(WH_CBT, &FrapsProcCBT, hModuleFrapsDLL, 0);

So in one fell swoop, Fraps guarantees that Windows will load a copy of Fraps.dll into every process currently running, as well as those created in future. Moreover, the function FrapsProcCBT will be called by that process each time it attempts to create, destroy, show, hide, move or resize a window. Now this may seem like the perfect solution to a difficult problem, but it’s rather wasteful considering that most processes run window-driven interfaces, and only very few involve DirectX or OpenGL.

How should it work?

Now I haven’t tested this, but a much cleaner way to achieve the same goal would be for Fraps.exe to periodically poll EnumProcesses and EnumProcessModules, so as to determine the processes that actually need hooking. Installing the hooks into these specific processes would require no more work but would save the OS some effort and limit the worst-case-scenario disaster-zone to DirectX and OpenGL applications, which is a considerably smaller domain than almost everything. In Fraps’s defence, the code makes extensive use of IsBadReadPtr and suchlike, and I’ve never heard of it causing any trouble, but nevertheless, the best way to prevent your DLL from crashing someone else’s program is to make sure it never gets loaded.

What does the hook do?

All events but window activation and focus-acquisition (HCBT_ACTIVATE, HCBT_SETFOCUS) fall through the hook chain (CallNextHookEx). But in either of these cases, Fraps.dll goes on to look for a supported graphical interface.

Rather achronologically, the first thing it does at this point is a bunch of string-processing to capitalise and isolate the executable image’s file name. Presumably, the writers would have used GetProcessImageFileName, but bringing psapi.dll along to the party for this reason alone would be borderline-criminal.

Next, GetModuleHandleA is called on opengl32.dll, d3d8.dll, d3d9.dll, dxgi.dll and ddraw.dll. If all return NULL, then there is no work to do and the function returns. But if any of these modules are found, Fraps.dll gets straight to installing its function hooks. The hooks are simply JMP operations assembled ad-hoc at the beginning of IDirect3DDevice9::Release and Present (and presumably the equivalent functions belonging to the other APIs). Now, I was rather surprised to find that PunkBuster has no problem with such crude, unsubtle behaviour, but it’s possible that there is some agreement between the two developers.

That’s almost everything, but one problem remains. Nothing will be drawn to the screen unless d3d9!Present is called, and installation of the patch renders the original functions useless. It is for this reason that IAT hooking is preferable to the patch method being used here, but from what I gather PunkBuster periodically ‘fixes’ the IAT of its client process, so that’s no-go. Fraps gets around this little inconvenience in the messy, but reliable way that you’d expect: each time the proxy Present function fires, it removes the patch, calls the original function, and restores the patch.

Here’s some untested C++ concept-code I threw together for the IDirect3DDevice9::Present case. The unbraced snippet installs a patch at address_d3d9_Present to redirect it to PresentHook. Present is a COM method rather than a DLL export, so GetProcAddress can’t find it; read the address from a device’s vtable or use a known offset into d3d9.dll. I’ve omitted the patch-removal code, along with a whole load of sanity-checking that really shouldn’t be left out in such a risky situation. Don’t use my laziness as an excuse.

// Calculate offset
DWORD from_int = reinterpret_cast <DWORD> (address_d3d9_Present);
DWORD to_int = reinterpret_cast <DWORD> (&PresentHook);
 
// This version of the JMP instruction takes an address relative
// to the current address, and it is 5 bytes long
// So the relative offset is 'to - from - 5'
// Don't worry about the unsigned DWORD underflowing
DWORD offset = to_int - from_int - 5;
 
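// Assemble the patch at the beginning of Present
const unsigned char jmp = 0xE9; // The opcode for a 32-bit rel JMP
 
unsigned char* ip = reinterpret_cast <unsigned char*> (address_d3d9_Present);
 
// Code pages aren't normally writable, so lift the protection first
DWORD old_protection;
VirtualProtect(ip, 5, PAGE_EXECUTE_READWRITE, &old_protection);
 
*ip = jmp;
*(reinterpret_cast <DWORD*> (ip + 1)) = offset;
 
// Put the protection back and discard any stale cached instructions
VirtualProtect(ip, 5, old_protection, &old_protection);
FlushInstructionCache(GetCurrentProcess(), ip, 5);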
 
HRESULT __stdcall PresentHook(IDirect3DDevice9* device, const RECT* pSourceRect, const RECT* pDestRect, HWND hDestWindowOverride, const RGNDATA* pDirtyRegion) {
    // COM methods are __stdcall, so the 'this' pointer arrives as the
    // first stack argument rather than in ECX, and the hook must clean
    // up the stack exactly as the real Present would
 
    // Do anything that needs to be done before Present gets called
 
    // Remove the Present hook
    HRESULT return_value = device->Present(pSourceRect, pDestRect, hDestWindowOverride, pDirtyRegion);
    // Reinstall the Present hook
 
    // Do anything that needs to be done after Present gets called
    return return_value;
}
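The remove-and-reinstall step left as comments above comes down to remembering the five bytes that the JMP overwrites. Here is a hedged, portable sketch of that bookkeeping which works on a plain byte buffer, so it can be tried without touching a live process. JmpPatch, MakeJmpPatch, InstallPatch and RemovePatch are hypothetical names, and in a real hook the writes would still need the page-protection handling that applies to any code patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kJmpSize = 5; // 0xE9 opcode plus a 32-bit relative offset

struct JmpPatch {
    unsigned char original[kJmpSize]; // bytes the patch will overwrite
    unsigned char patch[kJmpSize];    // the assembled JMP
};

// Save the current prologue and assemble 'JMP to_address' as it would be
// encoded at from_address. Unsigned arithmetic wraps, which is what we want
// for backward jumps.
JmpPatch MakeJmpPatch(const unsigned char* code,
                      std::uint32_t from_address,
                      std::uint32_t to_address) {
    JmpPatch p{};
    std::memcpy(p.original, code, kJmpSize);
    const std::uint32_t offset = to_address - from_address - 5u;
    p.patch[0] = 0xE9;
    for (int i = 0; i < 4; ++i) // little-endian, as x86 expects
        p.patch[1 + i] = static_cast<unsigned char>((offset >> (8 * i)) & 0xFFu);
    return p;
}

void InstallPatch(unsigned char* code, const JmpPatch& p) {
    std::memcpy(code, p.patch, kJmpSize);
}

void RemovePatch(unsigned char* code, const JmpPatch& p) {
    std::memcpy(code, p.original, kJmpSize);
}
```

Inside the hook, the sequence is RemovePatch, call the original, InstallPatch. Note that another thread calling Present during that window would slip past the hook, which is one of the reasons this approach is messy.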

RST decomposition of a general skew-free 3D transformation

November 18th, 2007 Greg

First of all, I refer you to D3DXMatrixDecompose. If you want to break a standard 3D transformation matrix into its rotational, translational and scaling parts, without caring how it’s done, then look no further. If your needs are a little more specific and you’re sure you aren’t reinventing this wheel, then read on.

There is nothing clever about this decomposition, but it’s a question that comes up more often than I’d expect, so here’s the lowdown. I assume that your matrix is in DirectX-standard row-vector representation (so $v_{out} = v_{in} M$) and that the skew components are zero ($M_{14} = M_{24} = M_{34} = 0$). You should visualise the matrix like this:

RST Matrix

First, we extract the translation:

$v_{translation} = (M_{41}, M_{42}, M_{43})$

Then the scaling factors:

$s_x = \sqrt{M_{11}^2 + M_{21}^2 + M_{31}^2}$
$s_y = \sqrt{M_{12}^2 + M_{22}^2 + M_{32}^2}$
$s_z = \sqrt{M_{13}^2 + M_{23}^2 + M_{33}^2}$

The rotation matrix is then the upper-left 3×3 minor, after scaling back to unity.

Mrotation = M;

// Remove translation
Mrotation41 = 0;
Mrotation42 = 0;
Mrotation43 = 0;

// Normalise
Mrotation11 /= sx;
Mrotation21 /= sx;
Mrotation31 /= sx;

Mrotation12 /= sy;
Mrotation22 /= sy;
Mrotation32 /= sy;

Mrotation13 /= sz;
Mrotation23 /= sz;
Mrotation33 /= sz;
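For anyone who prefers something compilable, the same steps can be written as a small C++ function. This is a sketch under the assumptions already stated (row-vector convention, no skew) plus one more: none of the scale factors is zero, otherwise the normalisation divides by zero. RSTParts and DecomposeRST are hypothetical names, and the array is indexed from zero, so M41 becomes m[3][0].

```cpp
#include <cmath>

struct RSTParts {
    float translation[3];
    float scale[3];
    float rotation[4][4];
};

RSTParts DecomposeRST(const float m[4][4]) {
    RSTParts parts{};

    for (int j = 0; j < 3; ++j) {
        // Translation lives in the bottom row.
        parts.translation[j] = m[3][j];
        // Each scale factor is the length of a column of the upper-left 3x3.
        parts.scale[j] = std::sqrt(m[0][j] * m[0][j] +
                                   m[1][j] * m[1][j] +
                                   m[2][j] * m[2][j]);
    }

    // Start from a copy of the input matrix.
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            parts.rotation[i][j] = m[i][j];

    for (int j = 0; j < 3; ++j) {
        parts.rotation[3][j] = 0.0f;                 // remove translation
        for (int i = 0; i < 3; ++i)
            parts.rotation[i][j] /= parts.scale[j];  // normalise the column
    }
    return parts;
}
```

Feeding it a quarter-turn about Z combined with scales of (2, 3, 4) and a translation of (5, 6, 7) should hand those three pieces back unchanged.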

LDR tone-mapping and how to do it properly

November 13th, 2007 Greg

I’m a huge fan of post-processing in games. It seems that no matter what I’m writing, I can’t resist the temptation to install an over-the-top bloom effect and some tone-mapping. And that’s me being conservative. The great thing about tone-mapping is that you can throw it on the end of just about any rendering pipeline and instantly glitz up the visuals, giving it that ‘digitally remastered’ feel.

Tone-Mapping

So what is tone-mapping? Well, it’s a post-processing effect that remaps the render’s colour-dynamics to change the overall appearance of the game. Some of the more common tone-mapping operations are contrast & brightness, saturation and HDR exposure. HDR tone-mapping is an art unto itself and it can only be used for HDR pipelines (with floating-point render-targets and HDR textures), so I’ll restrict the conversation to the more universal LDR tone-mapping.

The Conjugate Transform

If you plan on doing anything interesting in a tone-mapping pass, then it’s rather necessary, for the sakes of performance, readability and maintainability, to convert to a more suitable colour-space than RGB. The first such space that springs to mind is HSL, and indeed tone-mapping in HSL is like a gentle walk in the park, but it’s wise to look a little further afield to YCC. But why YCC? Sure, it does offer a luma component for brightness & contrast mapping, but the saturation is tied up in the two chroma components. Granted, this is a bad thing, but it’s not nearly as bad as the cost of a full RGB->HSL->RGB conversion.

The Problem With HSL

I spent a fair while trying to optimise this HSL-detour code in HLSL, hoping that I could make it viable for small shaders, but came out rather disappointed. Despite the availability of vector SIMD instructions, the piecewise-linear nature of the transformation demands a worryingly large number of conditional branches, and unless you have the luxury of Shader Model 4’s true branching, this amounts to a horror story of register-juggling and lerp operations. I didn’t try too hard, but I believe it’s impossible to complete the transformation-and-back through HSL in under 100 ps_3_0 operations, which immediately rules out the possibility of assembling on a Shader Model 2 target platform.

YCC To The Rescue

Contrast this with the simplicity of the truly linear RGB->YCC->RGB transformation. If there’s one thing that the GPU does best, it’s vector-matrix multiplication, and that’s exactly what this boils down to:

float4x4 RGBToYCC = 
{ 0.299,  0.587,  0.114,  0.000,
  0.701, -0.587, -0.114,  0.000,
 -0.299, -0.587,  0.886,  0.000,
  0.000,  0.000,  0.000,  1.000};
 
float4x4 YCCToRGB = 
{ 1.000,  1.000,  0.000,  0.000,
  1.000, -0.509, -0.194,  0.000,
  1.000,  0.000,  1.000,  0.000,
  0.000,  0.000,  0.000,  1.000};
 
float4 PS_LDRToneMap(float4 tex_coord : TEXCOORD) : COLOR
{
    float4 RGBA = tex2D(linear_sampler, tex_coord);
    float4 YCCA = mul(RGBToYCC, RGBA);
 
    // Work goes here
 
    RGBA = mul(YCCToRGB, YCCA);
    return saturate(RGBA);
}

This assembles to a handsome 9 instructions, leaving plenty of room even with the arcane ps_1_4’s instruction limit.

The Prize

My current project makes use of this code to ramp the contrast and saturation up and down, according to the scene. The code is simple, and the results rather dramatic.

// Contrast
YCCA.x -= contrast_midpoint;
YCCA.x *= contrast_gain;
YCCA.x += contrast_midpoint;
 
// Chroma
YCCA.y -= chroma_red_midpoint;
YCCA.y *= chroma_red_gain;
YCCA.y += chroma_red_midpoint;
 
YCCA.z -= chroma_blue_midpoint;
YCCA.z *= chroma_blue_gain;
YCCA.z += chroma_blue_midpoint;

LDR Tone-Mapping