[Z2A]Bimonthly malware challege – Emotet (Back From the Dead)

Posted: December 19, 2022 in My Tutorials, [Z2A]Bimonthly malware challege - Emotet
Tags: , , , , , , ,

Summary

Sample hash is: fc345d151b44639631fc6b88a979462dfba3aa5c281ee3a526c550359268c694

This write-up of mine will be divided into three parts:

  • Grab core Emotet Dll payload.
  • Recover API functions that used by core payload.
  • Decrypt strings
1. Grab core payload

A quick check of information related to sections of this sample shows that it may be crypted/packed to conceal the real malware inside the original sample, besides there is an extra section with an unusual name: text

Load the sample into x64dbg, set a breakpoint at the VirtualAlloc API function, run payload by press F9. It will break at the VirtualAlloc function:

Execute till return (Ctrl+F9) and follow the allocated memory, trace over the ret instruction to return the Dll’s code will reach the code area like the following:

To quickly get the Emotet core payload, set a bp at the ret command below the loop, then press F9 to let the payload finish decrypting and fill core payload content to the allocated memory. The resulting core payload is decrypted as shown below:

Now, dump the above memory to disk, then fix total size of the payload to 0x2B800, we get the final Emotet core Dll (Md5: 577118e39051f0678a52f871f74cd675):

2. API resolver
2.1. Recover Dll name from pre-calculated hash

Load fixed core Dll above into IDA, go to the export function DllRegisterServer we see there are 2 sub routines as follows:

At sub_1800282D0, Emotet will perform:

  1. Get the address of the API function based on the pre-computed hash value.
  2. Jump to the API function to execute.

At et_retrieve_api_addr (0x18000F174) function, the code snippet does the following:

  1. Retrieve the base address of the Dll based on the pre-computed hash value.
  2. Retrieve the address of the API function belong to the Dll above.

Continuing to dive into the et_get_dll_base_from_hash (0x0180002960) function, the process of getting the base address of the Dll will be as follows:

Based on the above pseudocode, rewrite the hash function in Python for the name of the Dll as follows:

Let’s check again with the name of the Dll is kernel32.dll:

We can write an IDAPython script that recovers the names of the DLLs that Emotet uses from these pre-computed hashes. The script performs the following tasks:

  1. Iterate all addresses refer to et_retrieve_api_addr function.
  2. Find the address of the instruction that assigns the hash value of the Dll name and retrieve this hash value.
  3. Calculate the hash value based on the list of common DLL names, then compare the calculated hash value with the hash value obtained in the previous step.
  4. If equal, create a new enumeration that will store the hash-to-dll-name mapping, then convert this hash value back to the name of the Dll.

import idc, ida_enum, idautils, ida_bytes, idaapi, ida_bytes

most_common_dlls = ['kernel32.dll','user32.dll','ntdll.dll','shlwapi.dll','iphlpapi.dll','urlmon.dll','ws2_32.dll','crypt32.dll','shell32.dll','advapi32.dll','gdiplus.dll','gdi32.dll','ole32.dll','psapi.dll','cabinet.dll','imagehlp.dll','netapi32.dll','wtsapi32.dll','mpr.dll','wininet.dll','userenv.dll','bcrypt.dll', 'comctl32.dll', 'comdlg32.dll', 'msvcrt.dll', 'oleaut32.dll', 'srsvc.dll', 'winhttp.dll', 'advpack.dll', 'combase.dll', 'ntoskrnl.exe']

#----------------------------------------------------------------------
def calc_hash(dll_name):
    """"""
    hash_value = 0x0
    module_name_list = []
    module_name_list = list(dll_name)
    for i in range(len(module_name_list)):
        ch = ord(module_name_list[i])
        hash_value = ((hash_value << 0x10) & 0xFFFFFFFF) + ((hash_value << 0x6) & 0xFFFFFFFF) + ch - hash_value
    # xored value need to change for each payload
    return ((hash_value ^ 0x106308C0) & 0xFFFFFFFF)

#----------------------------------------------------------------------
def get_enum_const(constant):
    """"""
    all_enums = ida_enum.get_enum_qty()
    for i in range(0, all_enums):
        enum_id = ida_enum.getn_enum(i)
        mask = ida_enum.get_first_bmask(enum_id)
        enum_constant = ida_enum.get_first_enum_member(enum_id, mask)
        name = ida_enum.get_enum_member_name(ida_enum.get_enum_member(enum_id, enum_constant, 0, mask))
        if int(enum_constant) == constant: return [name, enum_id]
        while True:
            enum_constant = ida_enum.get_next_enum_member(enum_id, enum_constant, mask)
            name = ida_enum.get_enum_member_name(ida_enum.get_enum_member(enum_id, enum_constant, 0, mask))
            if enum_constant == 0xFFFFFFFF:
                break
            if int(enum_constant) == constant: return [name, enum_id]
    return None

#----------------------------------------------------------------------
def convert_offset_to_enum(addr):
    """"""
    n_operand = 0
    if idc.print_insn_mnem(addr) == "push":
        constant = idc.get_operand_value(addr, 0) & 0xFFFFFFFF
    elif idc.print_insn_mnem(addr) == "mov":
        constant = idc.get_operand_value(addr, 1) & 0xFFFFFFFF
        n_operand = 1
    enum_data = get_enum_const(constant)
    if enum_data:
        name, enum_id = enum_data
        idc.op_enum(addr, n_operand, enum_id, 0)
        return True
    else:
        return False    

#----------------------------------------------------------------------
def enum_for_xrefs(func_addr, eid):
    """"""
    for x in idautils.XrefsTo(func_addr, flags=0):
        call_address = x.frm
        if ida_bytes.is_code(ida_bytes.get_full_flags(call_address)):
            #retrieve address of the instruction that assigns the Dll's hash value to the variable
            pre_module_hash_addr = idaapi.get_arg_addrs(call_address)[1]
                        
            if idc.print_insn_mnem(pre_module_hash_addr) == "mov" and idc.get_operand_type(pre_module_hash_addr, 1) == idc.o_imm:
                print ("[+] Target instruction found at 0x{address:x}".format(address=pre_module_hash_addr))
                pre_module_hash = idc.get_operand_value(pre_module_hash_addr, 1) & 0xFFFFFFFF
                module_hash_addr = pre_module_hash_addr
            
            for dll_name in most_common_dlls:
                calced_hash = calc_hash(dll_name)
                if calced_hash == pre_module_hash:
                    print ('   [+] Module name: %s ==> Hash: 0x%x' %(dll_name, calced_hash))
                    ida_enum.add_enum_member(eid, '%s_hash' % dll_name, int(calced_hash), idaapi.BADADDR)
                    if convert_offset_to_enum(module_hash_addr):
                        print ("   [+] Converted 0x%x to %s enumeration" % (idc.get_operand_value(module_hash_addr, 1) & 0xFFFFFFFF, dll_name))
                        
#----------------------------------------------------------------------
def main():
    """"""
    target_function = 0x018000F174 #change address of function
    '''Adds enum name'''
    if ida_enum.get_enum("MODULE_HASHES") != 0xffffffffffffffff:
        print('Enum already exists ...')
        return 0xffffffffffffffff
    else:
        eid = ida_enum.add_enum(0, "MODULE_HASHES", ida_bytes.hex_flag())
        
    enum_for_xrefs(target_function, eid)

if __name__ == '__main__':
    main()

The following figures is the result after executing the script:

We get the full list of Dlls that Emotet will use during execution:

2.2. Recover API name from pre-calculated hash

The pseudocode at the et_get_api_addr_from_hash (0x0180025D84) function does the following task:

Based on the above pseudocode, it can be seen that this hash function is similar to the hash function for Dll name above, we can rewrite it in Python in another way as follows:

Double-check with the API name is ExitProcess:

Following this article, we can write python script to perform the following tasks:

  1. Get the list of exported API functions from the list of Dlls obtained above.
  2. Calculate the hash, and write the results to a JSON-formatted file as follows: "api_hash_value": "api_name"

Results after script runs:

Once JSON file has been generated, we can write another IDAPython script (similar to above script or refer to this code) does the following tasks:

  1. Read the JSON data from the previously created into a dict variable.
  2. Iterate all addresses refer to et_retrieve_api_addr function.
  3. Find the address of the instruction that assigns the hash value of the Dll name and retrieve this hash value.
  4. Check the hash value if present in the above dict variable, create a new enumeration that will store the hash-to-function-name mapping, then convert our hash back to its enumeration name.

Here are the results after script runs:

3. Decrypt strings

To find the function that decrypt the strings, the fastest way is to find the function that calls the LoadLibraryW API because this function will take as an argument the name of the module to be loaded.

As the figure above, sub_18002629C will return the name of the module. The pseudocode at sub_18002629C stores its encrypted string as stack string, then calls the et_decrypt_string (0x180025C58) function to decrypt:

The et_decrypt_string function accepts parameters for the decryption process, including:

  1. Length of decrypted string.
  2. Multiplier (used for allocating heap memory to store the decoded string).
  3. Encrypted string stored as a stack string. These values are all dynamically calculated by Emotet and then stored on the stack.
  4. Key used for decryption.

The pseudocode of the function as shown below:

  1. Allocate heap memory to store the decrypted string.
  2. Execute the loop, load each dword of the encrypted string, perform the xor operation with the decryption key, and then assign the value after decryption to the allocated memory.

To verify we can do xor each value as below or through debugging:

As mentioned above, the encrypted string has a variable length and the values of the encrypted string are dynamically calculated by Emotet before being stored to the stack. Therefore, it is difficult to get these values for writing script to perform decryption. Therefore, one of the most possible ways is to write a script that uses IDA Appcall feature to execute a call to the decryption function and receive the decrypted string as the return result.

import idc, idautils, idaapi

#---------------------------------------------------------------------- 
def clean_data(data):
    data = data.rstrip(b'\x00')      
    if b'\x00\x00' in data:
        data = data.split(b'\x00\x00')[0].replace(b'\x00', b'')
    else:
        if data.count(b'\x00') == 1:
            data = data.split(b'\x00')[0]
        else:
            data = data.replace(b'\x00', b'')
  
    data = data.decode('latin-1')
    return data

#----------------------------------------------------------------------
def find_and_decrypt_data(func_addr):
    for call_addr in idautils.CodeRefsTo(func_addr, 1):
        func_call_addr = idaapi.get_func(call_addr).start_ea
        print ("Found the function call to the decrypt function at: 0x%x" % func_call_addr)
        
        dec_func_name = idc.get_func_name(func_call_addr)
        print ("Exec function: %s" % dec_func_name)        
        dec_func_proto = "wchar_t * __fastcall {:s}();".format(dec_func_name)
        dec_func = idaapi.Appcall.proto(dec_func_name, dec_func_proto)
      
        #Call function to decrypt data and clean the decrypted data
        try:
            dec_data = dec_func()
            dec_data = clean_data(dec_data)
            if dec_data:
                print("   [-] Decrypted data: %s" % repr(dec_data))
                print('-----\n')
        except Exception as e:
            print("FAILED: appcall failed: {}".format(e))
            continue
        
        #Set comment
        try:
            idc.set_cmt(call_addr, repr(dec_data), idc.SN_NOWARN)
            idc.set_func_cmt(func_call_addr, repr(dec_data), 1) 
        except:
            print("FAILED: to add comment")
            continue

#----------------------------------------------------------------------
def main():
    """"""
    dec_str_funcs = [0x0180025C58]
    print('[+] Decrypt string function: ', ['0x%x' % routine for routine in dec_str_funcs])
    for func_addr in dec_str_funcs:
        find_and_decrypt_data(func_addr)
#----------------------------------------------------------------------
if __name__ == '__main__':
    main()   

The final result after script runs:

4. References

End.

m4n0w4r

Comments
  1. PrFalken says:

    Very nice writeup, clean and clear !

  2. RussianPanda says:

    I really like your approach. Well done!

  3. […] 0day in {REA_TEAM}[Z2A]Bimonthly malware challege – Emotet (Back From the Dead) […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.