Slugsnack’s Reversing Series [6]


Today we will be using inline patching to bruteforce a serial. There are slight complications that we will learn to overcome such as limited section sizes leading to inadaquete space for our algorithm 🙂

Once again, this reverseme was downloaded at http://www.osix.net/modules/geek/reverse.php, a challenge site.

Attached to this thread is the target we are reversing today (g5.exe) and also the fully patched executable that makes it bruteforce itself (g5-Final Patch.exe).

Some Background Information

The PE File Format

PEs or Portable Executables is the file format developed by Microsoft for executables, object code and DLLs. These executables are portable across all win32 systems. That is, the PE loader of every win32 platform recognises and is able to use this file format.

I will be brief and will be leaving out parts that are less relevant to our purpose so it would be useful for you to do some reading up on PEs yourself. It is a vast subject and I will only skim over its surface.

The format consists of an MS-DOS MZ header, DOS stub, PE signature, PE file header, PE optional header (not actually optional as such), section headers and the respective section bodies. This diagram taken from Iczelion’s tutorial accurately displays this (http://win32assembly.online.fr/pe-tut1.html):

I will go through each of these parts elaborating more on the more relevant details.

MS-DOS MZ header and DOS stub

This component in the PE file format is a structure present for when an attempt is made to load the executable on earlier version of Windows. The operating system will be able to read the file and realise it is not compatible and will often display a message such as this “This program cannot be run in DOS mode.” Running our executable in Olly and looking at the memory window (Alt-M):

So our PE header is at 00400000 and the size of it is 1000:

Double-click on the line with this information and it shows us a dump of that section:

Notice the first two bytes are always 4D 5A (MZ in ASCII). Olly is also very helpful with showing us some of the other information displayed there but I will not look at those today. Following the address of our PE header in the hex dump:

There is the string I mentioned earlier. When a PE is run in MS-DOS, the DOS stub is loaded. It often will simply display a string to notify the user a higher version of Windows is needed, or less commonly, an entire DOS program can be stored and executed from there. In our case, just the message is present.

So now we know the first 2 bytes of the PE header are always 4D 5A. This header is always 64 bytes and the last dword is the PE file header pointer. I have highlighted the last dword of the PE header (note endianness ;)):

And the corresponding bytes in the memory window:

This points to the beginning of the PE file header. So a quick sum up, the PE loader examines the DOS MZ header for the offset to the PE header and follows the pointer.

PE File Header and Signature

This tells us the PE signature is at an offset of C0 from the image base (400000 in our case). Looking up 004000C0 in the hex dump:

As you can see the signature is merely a dword with the term “PE” terminated by two zeroes.

Maybe it is time to explain a little about RVA and VA, relative virtual address and virtual address respectively. All offsets within a PE file are denoted by RVAs. An RVA is the offset from the base address at which the executable is loaded in memory. So, the relative virtual address is the number of bytes a certain object is at in memory from the image base. Therefore:

Image Base Address + RVA = VA

So if our image base address was 0x00400000 and an object was said to be at RVA 0x1000 then that object could be said to be at VA 0x00401000.

Now we are at the PE header:

The PE header contains fields that are required by the PE loader such as information about the physical layout and properties of the PE file in general. Notice the addresses are not linear/absolute but are relative to the image base (RVA).

The PE header is usually located from [image base] to [image base+1000]. Not all the fields in there are relative to what we are doing today so I’ll explain them as we come across them/need them later.

The optional header which contains more information on logical layouts within the PE file ends after this, immediately preceding the section table.

Sections

The sections are one of the most important parts of the PE for you as a reverse engineer so pay close attention.

The section table lies between the PE header and the raw data for the image’s sections. Each section has its own corresponding section header. The order of these sections in the table is decided by their RVA. So sections closer to the image base come first in the table. Each section header is 40 bytes with no padding between the header for the next section.

So we have now determined each structure in the section table contains the information of a section:

Not all the fields are of much importance to us. I’ll highlight the ones that are:

– Name = Pretty self explanatory. By default, a lot of compilers use section names preceded by a period. This is not necessary though.
– VirtualAddress = The RVA of the section. The PE loader reads this value and uses it when mapping each section into memory.
– SizeOfRawData = The size of the section rounded up to the next multiple of file alignment. This file alignment is commonly 200h. The size can be found in the optional header, in our case this is at VA 4000FC.
– PointerToRawData = The file offset of the start of the section. The PE loader uses this value to find where the data in the section is in the file (on disk).
– Characteristics = This contains flags for characteristics of that section. Each of these flags have a value. The sum of all the flags that are set is written here, more on this later.

The sections themselves also is something I will cover later. But what actually is a section ? The content of a PE file is divided into blocks, called sections. A section is a block of data with common attributes. These attributes can be code/data, read/write, these are the flags/characteristics I mentioned above. The data is grouped into sections dependent on their common attributes. As you already know, each of these sections has a header. The section itself is refered to as the body (raw data).

Once each section is mapped into memory, it will have its own page.

The Memory Stack

The second thing I want to talk about today is the memory stack. Before, you may have just thought of it as a sort of place where values were pushed and popped and sometimes work done on them, for example the arguments of an API call.

I will explain a little about EBP and ESP and how they relate to the stack. As always, I will not go into great detail so it would be very beneficial if you went and read up on some of this stuff yourself.

The stack is a writeable area of memory that has a dynamic size and gets bigger and smaller to accomodate its contents. In a process, a lot of the work is done in the stack so it’s important to have an appreciate of how it works.

I mentioned some time ago about the way a process interacts with the stack. When the process pushes data onto the stack, the next data to be pushed is placed on top of the first one. This second data is the first one to be popped back off too. This is commonly known as “last in, first out” (LIFO).

So how does the program keep track of where everything is on the stack ? The answer is in the two registers ESP and EBP. ESP always points to the top of the stack and EBP holds the memory address of the bottom of the stack, the base point in the stack. That is why you will often see stack data referenced by [EBP+*] or [EBP-*].

When something is pushed onto the stack, ESP increases by the number of bytes pushed. When something is popped back off the stack, ESP shrinks with the amount of bytes popped. However, this also works the other way round. If we decreased ESP, the stack would decrease by that number of bytes. For example, the instruction:

SUB ESP,8

This instruction would increase the stack’s size by 8 bytes/a dword. Decrementing ESP increments the size of the stack because the base of the stack is actually at a high memory address compared to the top of the stack.

Back to the base pointer, you may have often seen an application start with these instructions:

Código:
PUSH EBP
MOV EBP,ESP

What does this actually do ? It saves the base pointer onto the stack then sets the base pointer as the current stack pointer (address of top of stack) so it will change the address of the base of the stack. This means the program now has a new reference point for data on the stack.

Enough of the theory ! This probably seems a little confusing at first but you will need to learn a lot of this for stuff like unpacking later on anyway. As you start applying it, hopefully everything will start falling into place 🙂

Here is the description given by the site for this ReverseMe:

A little bruteforcing exercise.

There is more than 1 answer so the md5 of the right password is:

96c4dda0c4a0b34262b1d91d47056f9e

So they are telling us they want us to bruteforce the correct serial, and also that there is more than one valid serial. And to check whether our’s is right, we should put it through a MD5 (Message Digest 5) hash. In case you’re unfamiliar with MD5 hashing, simply think of it as a function where you can supply an input, and the output of the function is a 128 bit (32 character long) hash value.

Let us first examine the target. Opening it up in Olly:

When we run the executable:

Let’s enter a garbage serial and follow how it is processed:

Intercepting this procedure in Olly, I decided to breakpoint on GetDlgItemTextA as this is when the contents of the dialogue are fetched:

We break when we click check again. Let’s do a little tracing to see what is happening:

EAX holds the number of characters fetched. This value is compared to 6. If it is equal, we jump to 401076, otherwise 0 is moved to EAX and we jump to 401083.

Let’s follow the latter first. 401083 is a compare of EAX to 1. However our serial was not 6 characters long so 0 was moved into EAX. There is then a JNZ (jumps when EAX does not equal 1) to 40109D and that is a call to a “Wrong !” message box.

So already we know that our serial needs to be 6 characters long. Now let’s look at what happens when it is indeed 6 characters long.

At our compare, we jump to 401076 instead. There, it pushes EAX onto the stack and loads EAX with the value at EBP-1FE (serial inputted):

In case you’re wondering how my garbage was able to get to that bit, I changed Z-flag :p

So after the serial is loaded to EAX, that is pushed onto the stack and then 4010D6 is called. After the call, EAX is compared to 1. If EAX is equal to 1, then we continue to the “Right !” message box.

So now we know the call at 40107E to 4010D6 holds the important serial checking algorithm we know where to approach this program. Stepping into the call:

You should be familiar with this kinda structure now where there is some sort of serial checking algorithm which then sets EAX to 1 or 0 if the serial is valid or not respectively so we won’t “re-invent the wheel”.

Let’s start stepping the code and see what some of the operands are:

So [EBP+8] is the serial which is moved to ESI. Then at 4010E2, we jump into the first serial check. As you can see, this is a loop from 4010E4 to 4010F3, again using ECX as a counter up to [EBP+C] which is:

However, if you trace back on what has changed in the stack, you will see that this is actually the number of characters fetched by GetDlgItemTextA, so it should actually be 9. Notice at the start of this procedure, EBP, the base pointer was reset to ESP again:

Anyway, during this first serial check, EDI is set to some value and then compared to 2F2. If they are not equal, we jump to 40111D:

There, 0 is moved to EAX, you should realise this is pretty bad. Let’s change ZF anyway and have a look to see what happens next:

Aha, another check:

Same situation, if the serial does not satisfy the criteria given, we jump to DialogBoxParamA. Obviously, this is not what we want to do so let’s toggle ZF once more:

You may be wondering why I’m not even bothering to look at the check algorithms. The reason is, if you look at each of those algorithms, it’s pretty clear that it’s not just a case of reversing the instructions to get the correct input, hence the objective being bruteforce.

Back to the app, we see there is a final check:

No need to explain this really, it’s clear we don’t want to take that jump and we want to go to 401110 instead. Perhaps you’re thinking, why not just negate those jumps or NOP those checks and go straight to the “MOV EAX,1”. Simple, we were asked to bruteforce a serial. If we just patch the application, there is no way of finding a working serial.

So I am thinking now of a few ways of bruteforcing this application. I could simply write an external application to do it for me and just copy the check algorithms over. But I’m not here to teach you how to program so I’ll show you how it can be done with inline patching instead.

So how would we go about bruteforcing in the first place ? Typically, we would want to test every single string from the bytes “00 00 00 00 00 00” to “FF FF FF FF FF FF”. However, a lot of those bytes can actually be discarded since even if they are a working serial, it is unlikely they would be the one we are looking for since they aren’t in the “typable” part of the charset.

To explain this, I went to http://www.asciitable.com/:

For example, one of the characters of the serial would definitely not be 0x9, horizontal tab. I can safely say the serial is somewhere from 0x21 to 0x7E inclusive. This is not the same as the bytes “20 20 20 20 20 20” to “7E 7E 7E 7E 7E 7E”. Why ? Between those bytes, there are values such as “20 20 20 20 00 20”. We have already decided we don’t even want to bother wasting clock cycles on bytes from 0x00-0x19 and 0x7F+.

The way I am about to propose is by no means the best or optimal way of dealing with this problem but it is a simple solution that I devised hoping it would be easy to understand.

I intend to treat each of the 6 bytes of the serial as separate numbers. We want to try every combination of these bytes where each bytes is 0x20-0x7E.

Let us first look at what happens when we count in decimal. First, in the units column, we increment until we reach 9. What happens then ? The tens column is incremented and the units column set back to 0 and then incremented to 9 again. This continues until the tens column reaches a value of 9. The hundreds column then is incremented and the tens and units column reset.. and so on.

Translating that into our context, we want to do this:
– Set all bytes of our serial to “20 20 20 20 20 20”
– Increment the last byte from 0x20-0x7E
– When the last byte reaches 0x7E, reset its value back to 0x20 and increment the second from last byte by 1
– when the second from last byte reaches 0x7E, reset its value and the last byte’s value back to 0x20 and increment the third from last byte by 1
– Repeat for every byte

However, we will encounter a small problem trying to write an inline patch as big as this. As usual, we would patch the free bytes after the JMP thunk table. However, we’d soon encounter a problem. Let’s look up our application in LordPE (google it for a download, it’s freeware):

Click on “PE Editor” and select the location of your executable or if it is open/loaded in Olly, select it from the process list:

Click “Sections”:

So the section we have been looking at so far has been the .text section. So all seems fine, its virtual size is 14C which rounded to the nearest multiple of the alignment (200h) is 200h. But hang on, what was the address of the last item on the JMP thunk table ?

It is 401146 and we start having free bytes from 40114C. So we are able to write (200-14C) B4 bytes. That isn’t really enough for a big patch like the one I am proposing.

Fortunately there are two solutions:
1) Extend the .text section
2) Add a new section to the PE and we can allocate/use as much memory as we like

The first solution is much harder than it sounds. I won’t go into the details but try and fiddle with it yourself if you like. The easiest option by far is method 2. To add this new section, I used CFF Explorer (easily googled).

So after you’ve downloaded and installed CFF Explorer, load the executable in it and you should start with this:

The section we are interested in is section headers although feel free to poke around in the others too:

Adding the new section is pretty straight-forward. Right-click the space below the sections:

512 bytes is enough by far (remember alignment):

We still need to fill in a name (although actually a blank name still works) and we need to fix the characteristics column (which you have to scroll over to see):

I decided to name my section .inline although the “.” is not necessary:

Now the characteristics. I told you about those briefly before. CFF Explorer presents an easy solution to looking up each of the flags’ values and adding them up 🙂

Instead, it gives you little boxes to tick. Let’s look at the .text section first and look at its characteristics. Right-click on that section’s line and click “Change Section Flags”:

So .text is executable, readable and contains executable code. On our new section, I used the following flags:

One more thing though before we save, add the writable attribute to .text, I’ll explain why later. It should end up like this, the same as .inline:

Save the new executable somewhere under a new name, take care to put the file extension .exe since CFF Explorer doesn’t always save it as that format.

Let’s first think of an algorithm for the incrementing part. So what we want to do is to increment each byte then check whether (starting from the last byte although technically it doesn’t matter) its value exceeds the 0x7E and if so, set it back to 0x20 and increment the next byte up by one.

This is a possible algorithm to do just that. I wrote it myself more for understandable reading than optimising clock cycles:

Código:
PUSH EDX			; I will use EDX as a "middleman" since it is an unused register in the serial checking procedure
MOV EDX,0x20202020
MOV DWORD PTR DS:[ESI],EDX	; Changes first 4 bytes of serial to 0x20202020
MOV BYTE PTR DS:[ESI+4],DL	; Changes 5th byte to 0x20
MOV BYTE PTR DS:[ESI+5],DL	; Changes last (6th) byte to 0x20

I want this code to only be executed once, the first time we jump into my procedure. There are several ways of doing this. The way I thought of was to find an empty byte somewhere, compare it to zero then jump if equal to this once-only code. Then in this once-only code, I would have an instruction to move a non-zero value to this byte I compare to. Now you know why I made the .text section writable now. So this is what I ended up with:

Código:
CMP BYTE PTR DS:[401182],0	; Compares the free byte to 0
JNZ MYPROC				; If not zero, jump to label MYPROC

PUSH EDX			; I will use EDX as a "middleman" since it is an unused register in the serial checking procedure
MOV EDX,0x20202020		; Set EDX to 0x20202020
MOV DWORD PTR DS:[ESI],EDX	; Changes first 4 bytes of serial to 0x20202020
MOV BYTE PTR DS:[ESI+4],DL	; Changes 5th byte to 0x20
MOV BYTE PTR DS:[ESI+5],DL	; Changes last (6th) byte to 0x20
XOR ECX,ECX			; Reset counter
XOR EDI,EDI			; Reset EDI
MOV BYTE PTR DS:[401182],0x1	; Set 0x1 to the free byte
JMP FIRST_CHECK			; This label is the first serial checking algorithm, in this case it is at 4010F0

So now I need some code for the MYPROC label. That is, what gets executed every other time than the first. This means I won’t continually push EDX and mess up the stack and I can change the string at [ESI] instead of just moving a static value to it.

Remember 4010F0 is the first check, the serial is held at [ESI] and ECX and EDI need to be reset every iteration of my function regardless of anything else. Therefore, I came up with this:

Código:
MYPROC:

XOR ECX,ECX
XOR EDI,EDI
CMP BYTE PTR DS:[ESI+5],0x7E
JE PROC2:

MOV DL,BYTE PTR DS:[ESI+5]
INC DL
MOV BYTE PTR DS:[ESI+5],DL
JMP 4010F0

PROC2:
CMP BYTE PTR DS:[ESI+4],0x7E
JE PROC3:
MOV DL,0x20
MOV BYTE PTR DS:[ESI+5],DL
MOV DL,BYTE PTR DS:[ESI+4]
INC DL
MOV BYTE PTR DS:[ESI+4],DL
JMP 4010F0

PROC3:
CMP BYTE PTR DS:[ESI+3],0x7E
JE PROC4
MOV DL,0x20
MOV BYTE PTR DS:[ESI+5],DL
MOV BYTE PTR DS:[ESI+4],DL
MOV DL,BYTE PTR DS:[ESI+3]
INC DL
MOV BYTE PTR DS:[ESI+3],DL
JMP 4010F0

PROC4:
CMP BYTE PTR DS:[ESI+2],0x7E
JE PROC5
MOV DL,0x20
MOV BYTE PTR DS:[ESI+5],DL
MOV BYTE PTR DS:[ESI+4],DL
MOV BYTE PTR DS:[ESI+3],DL
MOV DL,BYTE PTR DS:[ESI+2]
INC DL
MOV BYTE PTR DS:[ESI+2],DL
JMP 4010F0

PROC5:
CMP BYTE PTR DS:[ESI+1],0x7E
JE PROC6
MOV DL,0x20
MOV BYTE PTR DS:[ESI+5],DL
MOV BYTE PTR DS:[ESI+4],DL
MOV BYTE PTR DS:[ESI+3],DL
MOV BYTE PTR DS:[ESI+2],DL
MOV DL,BYTE PTR DS:[ESI+1]
INC DL
MOV BYTE PTR DS:[ESI+1],DL
JMP 4010F0

PROC6:

MOV DL,0x20
MOV BYTE PTR DS:[ESI+5],DL
MOV BYTE PTR DS:[ESI+4],DL
MOV BYTE PTR DS:[ESI+3],DL
MOV BYTE PTR DS:[ESI+2],DL
MOV BYTE PTR DS:[ESI+1],DL
MOV DL,BYTE PTR DS:[ESI]
INC DL
MOV BYTE PTR DS:[ESI],DL
JMP 4010F0

I won’t patronise you by explaining all that, it’s pretty simple.

There’s still a few other things we need to do though:
– Patch the JNZs in the check algorithm to our bruteforce algorithm
– Test our new section

However, before we do this, I have a little trick. I figured since we’re not even going to use the patched application except to find the serial, we don’t need to leave all the ends tidy, such as returning EDx’s value, making sure the stack is valid, etc.

Our patched application will run perfectly in Olly of course but it will also run much slower than by itself. That is the reason I made a new section. Our massive algorithm could’ve been put into .text and it would’ve run fine in memory but we wouldn’t be able to save.

What I would do under normal circumstances to see when our algorithm had found the correct serial and what it was would be to set a breakpoint on the instruction after the last JNZ, ie. the instruction that is executed when ALL checks have been passed. Instead, my trick is to make the application display a message box when it gets there instead, then return to the algorithm. There are two reasons for what I do:
– We can see what serial we got to that was valid
– We can keep iterating the algorithm since we know there are multiple solutions

Actually, in the end I didn’t actually do this because there were so many valid serials that it would’ve taken ages to write them all into a file. Instead, I outputted all valid serials to a text file and then did an MD5 hash on each serial and then searched through the hashes for the one we wanted and found the corresponding serial. The reason I won’t show you this is for the sake of the length of this. I’m sure if you wanted to, you could do it yourself anyway 🙂

Anyway, patching the JNZs:

So what have I done ? First of all, I made a call to MessageBoxA with the serial as caption and message. This call is jumped to when a serial passes all checks. Notice the JMP I patched after the last JNZ. The JNZs also now take us to the byte just after the message box call. So this means our after our message box call, we will continue to our algorithm as I described before. Adding that now:

Notice our JNZ there is to 405000 which is where .inline is located if you were paying attention earlier. Because .inline is another section, hence another page, we have to save it separately, so save the changes for now into a new executable and we can patch the rest:

Maybe I’m rushing this, but patching is nothing new now so I’ve just assembled the appropriate instructions at 405000 as well and saved:

Saving and running the new file, you may find nothing happens for a while. I was running mine in Olly at the time so it took a few hours for the first valid serial to come up, but it should be a lot quicker since you can run yours as an independent program. The valid serial to be displayed via message box should probably be this:

If we enter this serial into the original application:

However, if you check that string’s MD5, it does not match with the requirements. So although it meets the criteria the application sets, it does not meet the criteria of the ReverseMe, that is the MD5 hash has to be 96c4dda0c4a0b34262b1d91d47056f9e.

If you do manage to automate an MD5 process for each possible serial, the one that is right is:
sedplk

Either way, you should get the idea now 😉 Thanks for reading, see you next time !

Download file : http://www.ziddu.com/download/3550102/ReversingTarget.rar.html

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.