Slugsnack’s Reversing Series [5]


Today we will be using the ReverseMe that we looked at in part 4. However, we will be looking at a few new techniques. By the end of this series, you will see how to inline patch an application to make it keygen itself rather than having to code an external application to do the job for you.

Some Background Information

I will begin by talking about endianness, something I should’ve told you about long ago. Endianness is the ordering of bytes used to represent data. As humans, when we write numbers, we put the highest magnitude on the left and get smaller by powers of ten as we move right (I’m referring to the decimal number system). So the number 3578 starts with the largest unit, thousands (3 of those), then hundreds (5 of those), tens (7) and finally the units (8). This is known as big-endian because the highest order magnitude is on the far left, with the magnitudes progressively getting smaller.

There are three types of endianness, namely big, middle and little. The x86 family uses little-endian ordering when representing integer values in memory. This means the far left starts with the lowest order byte and the bytes get more significant from there. Note that the reversal happens per byte (two hex digits), not per digit. If we had the value ABCD1234 in memory, the bytes would be stored smallest first, so in memory we would see 3412CDAB.
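If you want to check the byte order for yourself, here is a minimal sketch in Python (my choice for illustration only, it is not one of the tools used in this series). It packs ABCD1234 both ways using the standard struct module:

```python
import struct

value = 0xABCD1234

# "<I" means little-endian unsigned 32-bit, ">I" means big-endian
little = struct.pack("<I", value)
big = struct.pack(">I", value)

print(little.hex().upper())  # 3412CDAB, what x86 keeps in memory
print(big.hex().upper())     # ABCD1234, the way we would write it down
```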

The second thing I’d like to discuss is the calling and use of APIs. APIs are essentially the way that applications interact with the kernel. Your operating system allows different levels of access to different resources, and protection rings represent this hierarchical relationship. This diagram shows the available privilege rings for the x86 family:

As you can see, Windows applications run in Ring3. This is the outermost and least privileged ring, so for these applications to interact with the kernel, they must somehow get work done at Ring0. Windows contains functions allowing applications to request specific operations from the kernel. These functions are known as the Windows Application Programming Interfaces.

When a program needs to access a low level function, it will call the relevant API. These APIs are for the most part stored in your Windows system files. Every application carries a list of all the functions it calls from the operating system’s DLLs (Dynamic Link Libraries). This list of functions is known as the imports. You probably already figured this out from the name, but the addresses of functions inside a DLL are dynamic, so they are prone to change.

So how does our program know where to look to find the function it is calling ? Well, every win32 application has an Import Address Table (IAT). This IAT contains the current address of each function so before the application even starts, the OS loader finds each address of each API that is to be called and builds the IAT with those addresses. Therefore at runtime, all the program has to do is look at the IAT and find the correct address for the function it is trying to call.

Now, let’s have a look at the exact procedure when an API is called. Opening up our target in OllyDbg:

So the first API that is called is GetModuleHandleA which is located in kernel32.dll. Looking at the exact instruction:

It looks just like a usual call so how did Olly know we were calling an API ? Well let’s step into the call:

What’s up with this funny table ? This table is known as the jump thunk table. It contains the list of all the imports. More importantly, it eventually gives information as to where each of the imports is located. Looking at the exact instruction at this point:

So from there it jumps to the value of a dword pointer, in this case located at 402000. Following this in the dump window:

I have highlighted the bytes I want you to look at:

This part is the import address table I mentioned earlier. It holds all the addresses of the APIs our application can make calls to. Going back to what I said about endians earlier, notice this:

The hex dump of the instruction at 4012CC is FF25 00204000. Notice how the little-endian byte order makes 402000 (00402000 as a full dword) get stored as 00204000.

So looking up the current address of GetModuleHandleA in our IAT, we see 7B424576. Because of the little-endian byte order, this is actually 7645427B. So at this moment, GetModuleHandleA is at 7645427B in kernel32.dll. Indeed:
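The two lookups above can be reproduced with a few lines of Python. This is only a sketch to show the byte swapping. The byte strings are copied from the dump, and the variable names are mine:

```python
import struct

# Bytes of the instruction at 4012CC, as shown in the hex dump
instruction = bytes.fromhex("FF2500204000")

# FF 25 is the opcode for JMP DWORD PTR DS:[imm32]; the next four bytes are the operand
opcode, operand = instruction[:2], instruction[2:]
iat_slot = struct.unpack("<I", operand)[0]

# The four bytes stored at that IAT slot in the dump
iat_entry = bytes.fromhex("7B424576")
api_address = struct.unpack("<I", iat_entry)[0]

print(f"{iat_slot:08X}")     # 00402000
print(f"{api_address:08X}")  # 7645427B
```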

Now let me point out a few other things about the IAT. Notice that in the current application, we call functions from two DLLs, user32.dll (4) and kernel32.dll (2). These are the addresses of the functions from kernel32:

And these are the addresses of the functions from user32:

Can you see how the addresses from two different DLLs are separated by a dword of zeroes ? After the user32.dll function addresses, there are no more addresses so the next dword of zeroes represents the termination of the IAT.
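To make the layout concrete, here is a small Python sketch that splits a raw IAT dump into one group per DLL by looking for the zero dwords. Only 7645427B comes from the dump above. Every other address in the sample is a made-up placeholder, and split_iat is a name I invented for this example:

```python
import struct

def split_iat(raw):
    """Split a raw IAT dump into per-DLL groups; each group ends at a zero dword."""
    dwords = struct.unpack(f"<{len(raw) // 4}I", raw)
    groups, current = [], []
    for dword in dwords:
        if dword == 0:
            if current:
                groups.append(current)
            current = []
        else:
            current.append(dword)
    if current:
        groups.append(current)
    return groups

# Placeholder data: only the first address is real, the rest are invented.
sample = struct.pack(
    "<8I",
    0x7645427B, 0x76450001, 0,                          # kernel32.dll (2 imports)
    0x77000001, 0x77000002, 0x77000003, 0x77000004, 0,  # user32.dll (4 imports)
)

for group in split_iat(sample):
    print([f"{address:08X}" for address in group])
```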

To conclude, when an API is called, execution first jumps to the relevant instruction in the jump thunk table. That instruction jumps through a dword pointer into the IAT, which holds the API’s address in its respective DLL.

You may have noticed I haven’t gone into much detail as to how the OS loader constructs the IAT. I don’t think you need to know that at this stage and if I included that, this “part” would turn out even longer than it’s already going to be 🙂

So that’s it for the background information for this part. Let’s put some of that theory into use now.

If you can still remember from the last part of this series, the ReverseMe we looked at worked by taking an 8 character long name, performing additions or subtractions on the ASCII values of each character and comparing the result to the password supplied. Only if there was no difference would it take us to the good message. Maybe you even recall that last time I wrote a keygen for this and included it as a separate file. Today, I will teach you how to patch the application to keygen itself.

These are the aims for today:
– When the username length is not eight, we will display a message to notify the user of this
– When the password length is not eight, we will display a message to notify the user of this
– If the password is wrong, we will generate the correct password within the application
– If the password and username are valid, we will proceed as the application was originally coded to

We will be going through each of these aims in order. So starting with the first two, let’s look at the relevant section of the code. I won’t go into too much detail since we already analysed this ReverseMe in the last part of this series. This is the section checking whether the username and password are in fact 8 characters long:

So in the case that the username or password entered is not 8 characters long, I’d like to display a message box to tell the user this. So what we are looking for is a conditional jump to a call to MessageBoxA. However, can you see there’s no room at all to call an API here ?

To call this API after a conditional jump, we will require an empty space somewhere in the program to inject our own code. This empty space is called a codecave. So we will find a codecave, jump to the codecave, execute our procedure and jump back again. Simple enough ?

There is, however, one small problem. This is the difference between short and long jumps. You may have realised by now that different instructions require a different number of bytes to represent them. Anyway let’s look at what happens and I’ll explain the result. So we first find a codecave.

Scrolling down, I can see a bunch of unused addresses after the jump thunk table, those should do fine:

Now we want our function to jump here and we’ll handle everything from there and after we are done, we can return code execution to the original code. Our codecave address is therefore 4012D2 and is the address from which we will begin “injecting” our code.

Okay, you can see that there is already a conditional jump at 401095. This conditional jump is taken if the length of the password is not 8. What we want to do is to display a message box to tell the user what is wrong instead of the deliberately useless “Wrong !” message box.

Perhaps you’re thinking right now, why not just patch that conditional jump to jump to the codecave instead ? Well okay, let’s try that first then:

Olly tells us that we need a long jump because the address we’re trying to assemble a jump to is too far away. So let’s just do that:

Assembled fine and everything but did you notice something change ?

What’s happened to the OR and JE instructions ? Remember when we changed the short jump to a long jump ? Well the long jump instruction takes more bytes than a short jump instruction. As you can see, we changed the bytes from 75 0D to 0F85 37020000. The instruction has tripled in size, from 2 bytes to 6, and the extra 4 bytes have overwritten the instructions that followed it ! There are quite a few ways we can resolve this. We will be visiting a few of these different methods during the course of this guide.
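If you are wondering where those bytes come from, the following Python sketch rebuilds both encodings from the addresses used above. The helper names are my own. The opcodes (75 for the short form, 0F 85 for the long form) are the standard x86 encodings of JNZ, and the displacement is always measured from the end of the jump instruction:

```python
import struct

def jnz_short(src, dst):
    """Encode JNZ SHORT: opcode 75 plus a signed 8-bit displacement."""
    rel = dst - (src + 2)  # measured from the end of the 2-byte instruction
    return b"\x75" + struct.pack("<b", rel)  # raises struct.error if out of range

def jnz_long(src, dst):
    """Encode the long form: opcode 0F 85 plus a signed 32-bit displacement."""
    rel = dst - (src + 6)  # measured from the end of the 6-byte instruction
    return b"\x0F\x85" + struct.pack("<i", rel)

original = jnz_short(0x401095, 0x4010A4)
patched = jnz_long(0x401095, 0x4012D2)

print(original.hex().upper(), len(original))  # 750D 2
print(patched.hex().upper(), len(patched))    # 0F8537020000 6
```

Calling jnz_short(0x401095, 0x4012D2) fails with a struct.error, which is the same complaint Olly made when we tried to assemble the short jump.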

The option we will take now is to rewrite the overwritten bytes to the codecave. But I think it’d be a better idea to jump from the actual compare instead. Reason ? We can keep the whole compare procedure together in one chunk which makes it easier for us to see what’s happening. So press Alt-Backspace to undo the changes we made:

And now overwrite the instruction at 401092 with an unconditional jump to the codecave, noting what bytes are overwritten:

The instructions we overwrote were:

 

Code:
CMP EAX,8
JNZ SHORT g4.004010A4

Therefore at our codecave, we will have to replace these instructions with either a copy of the original or an edited version.
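As a quick sanity check on why exactly these two instructions get overwritten, this Python sketch encodes the unconditional jump and compares its size to the space taken by the originals. The helper name is mine. E9 is the standard opcode for a long JMP, and the instruction sizes are worked out from the addresses in this tutorial:

```python
import struct

def jmp_long(src, dst):
    """Encode JMP rel32: opcode E9 plus a signed 32-bit displacement."""
    return b"\xE9" + struct.pack("<i", dst - (src + 5))

patch = jmp_long(0x401092, 0x4012D2)

# CMP EAX,8 sits at 401092, JNZ SHORT at 401095, and the next instruction at 401097
cmp_size = 0x401095 - 0x401092
jnz_size = 0x401097 - 0x401095

print(patch.hex().upper())                # E93B020000
print(len(patch) == cmp_size + jnz_size)  # True
print(f"{0x401092 + len(patch):X}")       # 401097
```

The 5 byte jump fits exactly over the 3 byte compare and the 2 byte conditional jump, so nothing after 401097 is damaged.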

We don’t want an exact copy this time round though. We want to compare EAX to 8, and if there is a difference, we want a message box to popup to tell the user their password is of the wrong length.

Therefore, we’d want code similar to this:

 

Code:
CMP EAX,8
JE XXXXXXXX

PUSH 0
PUSH YYYYYYYY
PUSH YYYYYYYY
PUSH 0
CALL MessageBoxA

JMP 4010A4

XXXXXXXX:

PUSH 401097
RETN

What this code does is compare EAX (characters in password) to 8. If it is 8, then we go to address XXXXXXXX where we return to 401097 (after our inline patch). If it is not 8, (ie. EAX != 8), then we execute the message box function and then jump to the instruction leading to “Wrong !” (4010A4). Now I suspect there are 2 things here we haven’t come across before.

The first thing is the call to the message box API. When invoking the API MessageBoxA, we need to follow this format:

 

Code:
PUSH X		//	The style of the message box.  PUSH 0 invokes a message box with just an OK button.
PUSH [TITLE]		//	[TITLE] = Address of the text with the title.  We will deal with this in a minute.
PUSH [TEXT]		//	[TEXT] = Address of the text for the message box.  In this case, we will make the title and message box content the same.
PUSH Y			//	The handle of the owner window, in this case 0.
CALL MessageBoxA	//	The call to the API itself.

Note that this application already contains an import of MessageBoxA, which is the reason I mentioned all that stuff about imports earlier. Were we to use an API that was not in the list of imports, we would have to add it and that complicates things a little.
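The reason the arguments are pushed in this order is that MessageBoxA is declared as MessageBoxA(hWnd, lpText, lpCaption, uType) and Win32 APIs expect their arguments pushed from right to left. Here is a toy model in Python, using a list as the stack. It does not call the real API. The function name is invented and 401A20 is simply the text address I pick a little further on:

```python
def call_messagebox(stack):
    """Toy stand-in for MessageBoxA: reads its arguments off the stack in declaration order."""
    hwnd = stack.pop()
    text = stack.pop()
    caption = stack.pop()
    style = stack.pop()
    return {"hWnd": hwnd, "lpText": text, "lpCaption": caption, "uType": style}

stack = []
stack.append(0)         # PUSH 0        -> uType (just an OK button)
stack.append(0x401A20)  # PUSH [TITLE]  -> lpCaption
stack.append(0x401A20)  # PUSH [TEXT]   -> lpText
stack.append(0)         # PUSH 0        -> hWnd (no owner window)

print(call_messagebox(stack))
```

The last value pushed is the first argument the function sees, which is why the owner handle goes on the stack last.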

So first, let’s write something for the content and title. Find another codecave (I just used another section of the code a little further down) and write the text you want:

Ctrl+E to edit:

Ctrl+A to analyse:

The address I chose to write our text at is 401A20, so we will need to remember that for later.

What was the second thing I thought you might not have come across ?

 

Code:
PUSH 401097
RETN

I’ve deliberately used this new method instead of just a simple JMP instruction so you can see another method. First you push the address you want to jump to onto the stack and the return instruction takes you to that address. This method is sometimes used to make it harder to trace back where the last instruction was.
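Here is a rough Python model of what those two instructions do, together with their byte encoding. The helper names are mine. 68 and C3 are the standard opcodes for PUSH imm32 and RETN:

```python
import struct

def push_ret(target):
    """Encode PUSH imm32 (opcode 68) followed by RETN (opcode C3)."""
    return b"\x68" + struct.pack("<I", target) + b"\xC3"

def simulate(code):
    """Tiny model of the pair: push the operand, then pop it into EIP."""
    stack = []
    stack.append(struct.unpack("<I", code[1:5])[0])  # PUSH imm32
    return stack.pop()                               # RETN loads EIP from the stack

code = push_ret(0x401097)
print(code.hex().upper())     # 6897104000C3
print(f"{simulate(code):X}")  # 401097
```

Notice that the target address is stored as an absolute value rather than a relative displacement, and that the pair costs 6 bytes where a long JMP would cost 5.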

Now let’s scroll back up and assemble the inline patch:

Do you notice on the instruction at 4012D5, it would jump back to 4012D2 ? Why did I do that ? Well at the time of writing that instruction, I didn’t know where the instructions for the return jump would be. Now that I can see it is at 4012EA, I can change that:

Let’s now quickly do a very similar inline patch for the length of the name. I won’t go into much detail since it’s basically the same.

New codecave (401316):

Patching the jump:

Writing the inline:

Testing:

Saved under a different filename. Executing the file now:

To test it, I will enter a username of 8 characters but a password of only 4:

Great ! Now to test the other inline:

Hurray ! So that’s our first 2 “aims” done and also maybe your first two inline patches 🙂

We still have to do:

– If the password is wrong, we will generate the correct password within the application
– If the password and username are valid, we will proceed as the application was originally coded to

So let’s look at this bit by bit. To fulfil our third aim, we will need to do the following:
– Check the username and password are 8 characters long (done)
– If they are then call the serial generating function (401175 if you remember from last time)
– If at any point, it is found that the password is not valid for the username given, then generate the correct serial and notify the user of this serial

Opening up our patched executable in OllyDbg, we can see the password checking function starts at 401175:

Just to refresh your memory a little, this is what happens in this function:
– Each byte of the username is taken and a different addition/subtraction calculation performed on it
– If the resulting byte is different to the corresponding byte in the password then we jump to the “Wrong !” message

So actually there is already an easy way forward in terms of inline patching a correct password generation function. All we’d have to do is to change each conditional jump after the compares so it jumps to a codecave with our function in it instead.

Looking at the function at the moment, there are 8 conditional jumps (because of the 8 characters in the username/password):

Scrolling down again to find a suitable codecave, I have chosen 401361:

Let’s look at the first compare/conditional jump:

So after the calculation, the first byte of the username/password are compared. If they are the same then we continue. Otherwise, we jump to 401245 which eventually leads to the “Wrong !” message.

What we want to do is to jump to our codecave if something’s wrong first though, so we can fix that jump easily enough:

And we can do that fine for the second conditional jump too:

Now let’s try this for the third jump:

Uh oh. Same problem as before. Last time, I gave you a solution to this. That was to copy the overwritten bits into our codecave where they’d be executed there. This is not very practical in this case though because different bytes are overwritten for each different short conditional jump.

First of all, let’s get something clear:
– Short jumps can be used if the target is within -128 to +127 bytes, measured from the end of the jump instruction
– Long jumps must be used if the target is outside that range

What has happened at the moment is that there was initially a short jump because the instructions leading to the bad message came within short jump range at the third conditional jump. Remember I said I’d show you a different way to get past this ?

Instead of making it to our codecave in one jump, we can piggyback onto the first conditional jump we patched. That way we still get to codecave but it just takes a little longer. So the address of our first patch was 401196, so let’s jump there instead:

Can you see that we eventually still make it to our codecave ? No instructions were overwritten either.

Now let’s do the same thing for the other 5 conditional jumps:

As you can see it worked perfectly up until the last two tries. Why ? Well in the 7th try, we’re trying to jump from 401222 to 401196. The difference is 0x8C or 140d which is more than 128d backwards.

Then in the final jump we need to patch, we want to jump from 401238 to 401196. The difference is 0xA2 or 162d, again exceeding our maximum distance to use a short jump.

You should be able to solve this problem easily though. Simply use another “stepping stone” to 401196. Let’s make this stepping stone the second patched jump, 4011B0.

We will need a different “stepping stone” for the final patch because the distance from 401238 (address of the last conditional jump) to that stepping stone (4011B0) is 0x88 or 136d, which still exceeds our maximum distance. It will be fine to jump via the third patched jump at 4011CA. 401238 – 4011CA = 0x6E = 110d.
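You can check each hop before assembling it. The Python sketch below uses only addresses that appear in this tutorial and tests whether a 2 byte short jump can cover the distance. It measures from the end of the jump instruction, so its numbers are 2 larger than the plain address differences quoted above, but the verdicts are the same. The function names are mine:

```python
def short_displacement(src, dst):
    """Displacement a 2-byte short jump at src would need in order to reach dst."""
    return dst - (src + 2)

def fits_short(src, dst):
    """True if the displacement fits in a signed byte."""
    return -128 <= short_displacement(src, dst) <= 127

CAVE = 0x401361
hops = [
    (0x4011CA, CAVE),      # third jump straight to the codecave
    (0x4011CA, 0x401196),  # third jump via the first patched jump
    (0x401222, 0x401196),  # seventh jump via the first patched jump
    (0x401222, 0x4011B0),  # seventh jump via the second patched jump
    (0x401238, 0x4011B0),  # eighth jump via the second patched jump
    (0x401238, 0x4011CA),  # eighth jump via the third patched jump
]

for src, dst in hops:
    print(f"{src:X} -> {dst:X}: {short_displacement(src, dst):+d} {fits_short(src, dst)}")
```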

Now that we’ve patched all the relevant jumps in, we need to start assembling a generation code into our codecave at the bottom (401361).

Let’s first look at what calculations are done when generating the valid password. I’ve opened up the source for the keygen I created in the last part of this series:

Admittedly, it is poorly written in High Level Assembly (HLA) but that’s not what we’re looking at. We’re looking at the calculations done.

This is the bit you need to pay attention to:

Those integers are in decimal so we will need to convert them to hex to be able to make our own generation code. Notice also how the format of the instructions is slightly different to what you’re used to. Instead of:
– OPCODE(DESTINATION,SOURCE)
HLA uses:
– OPCODE(SOURCE,DESTINATION)

Byte1, Byte2, etc. are the bytes of the username, which ESI points to. So converting gives us:

Code:
ADD BYTE PTR DS:[ESI],29
SUB BYTE PTR DS:[ESI+1],4
SUB BYTE PTR DS:[ESI+2],4
ADD BYTE PTR DS:[ESI+3],32
SUB BYTE PTR DS:[ESI+4],32
ADD BYTE PTR DS:[ESI+5],3
ADD BYTE PTR DS:[ESI+6],4
ADD BYTE PTR DS:[ESI+7],29

At our codecave, we want a function that calculates the correct password (with the calculations shown above) and displays the correct result to the user with a message box.
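Before writing the assembly, it can help to model the calculation in a few lines of Python so you have something to compare the patched program against. This is only a sketch. It assumes the constants in the listing above are hexadecimal, as OllyDbg displays them, and the function name is mine:

```python
# (sign, amount) for each of the eight bytes, read as hex (assumption)
STEPS = [(+1, 0x29), (-1, 0x04), (-1, 0x04), (+1, 0x32),
         (-1, 0x32), (+1, 0x03), (+1, 0x04), (+1, 0x29)]

def generate_password(name):
    """Apply the eight byte-wise additions/subtractions to an 8 character name."""
    data = name.encode("ascii")
    if len(data) != 8:
        raise ValueError("name must be exactly 8 characters")
    # & 0xFF mimics the wrap-around of 8-bit register arithmetic
    return bytes((byte + sign * amount) & 0xFF
                 for byte, (sign, amount) in zip(data, STEPS))

print(generate_password("AAAAAAAA").hex())
```

The result is returned as raw bytes because, depending on the name, some of the generated characters may not be printable.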

Let’s look for a possible space in memory for us to store our password whilst we’re generating it. I’ve chosen 4030C0:

So a possible injection could be:

Code:
MOV AH,BYTE PTR DS:[ESI]
ADD AH,29
MOV [4030C0],AH
MOV AH,BYTE PTR DS:[ESI+1]
SUB AH,4
MOV [4030C1],AH
MOV AH,BYTE PTR DS:[ESI+2]
SUB AH,4
MOV [4030C2],AH
MOV AH,BYTE PTR DS:[ESI+3]
ADD AH,32
MOV [4030C3],AH
MOV AH,BYTE PTR DS:[ESI+4]
SUB AH,32
MOV [4030C4],AH
MOV AH,BYTE PTR DS:[ESI+5]
ADD AH,3
MOV [4030C5],AH
MOV AH,BYTE PTR DS:[ESI+6]
ADD AH,4
MOV [4030C6],AH
MOV AH,BYTE PTR DS:[ESI+7]
ADD AH,29
MOV [4030C7],AH
PUSH 0
PUSH [TITLE]	//	We will deal with this shortly
PUSH 4030C0
PUSH 0
CALL MessageBoxA
MOV EAX,0	//	Make sure the compare after returning from the call takes us to the "Wrong !" message
PUSH 401245	//	Address for code leading to "Wrong !" message
RETN		//	Using an alternative method to JMP

Assembling a big script like this would take ages if we wrote the instructions out one at a time. This is why we will be using NonaWrite, a plugin made for OllyDbg. I have included this in the attachments.

And how do we use NonaWrite ? Pretty simply actually. All you have to do is tell it which address to write to and then write what you want to inject:

I still haven’t filled in an address for [TITLE] but instead have used PUSH 4030C0. I will be changing this afterwards. The reason I haven’t filled it in is because I like putting all the patches including text injections close together. At the moment I have no idea what address the current injection will finish at without counting all the bytes.

Click Assemble:

And now I add the title underneath:

Changing the stack push for the title:

Ctrl-A to re-analyse:

Looks fine so far. Time to test and run !

I trust you know how to do this by now..

And as we know, that is the endpoint 🙂

I have attached to this thread the original ReverseMe, the executables we saved at the two different stages and also the plugin I used to do the inline patching.

When I first learned inline patching, I went crazy with it cause I loved it so much. I was invoking every single API I could think of, trying to squeeze as much code as I could into tiny codecaves :p Hopefully you find it as exciting as I did and even if you did know about this before, maybe this was a nice refresher.

If you’ve made it here after reading through all of the above, thanks !

Download file: http://www.ziddu.com/download/3550059/ReversingTarget.rar.html
