reverse engineering: for fun's sake

Matching symbols between Ghidra and GDB

August 06, 2019

Foreword

Man, there are just so many niggly things to do to get this going, and it’s starting to annoy me.

No, I don’t have a PoC yet…

I’ve spent time matching symbols between GDB and Ghidra. This may seem strange, but consider the load address: each time the program runs, library load addresses are random, while the load addresses in Ghidra are static for a given import.

So if I want to, say, use the buffer tracker on code that calls memcpy, I need to be able to do the programmatic SRE in the library rather than the application. It also means the binaries loaded in Ghidra have to be exactly the ones loaded by the process running under GDB.

Aim

This time I’m trying to show I’ve got this symbol matching down pat, and highlight some Ghidra/Jython things that became apparent to me.

Matching symbols from the high level

  • Step 1: find a common symbol (how about ‘_init’)
  • Step 2: find that symbol in the particular file (like libc.so) in both GDB and Ghidra
  • Step 3: take the difference between the two addresses as that file’s offset, then apply the offset to any GDB address in the file to find the matching Ghidra address

Simple, for example:


    def get_file_offset(self, file_name):
        '''Gets the file offset between the tool and local

        Args:
            file_name (str): the file in question

        Return:
            int: the offset
            None: could not find offset
        '''

        if file_name in self.file_offsets:
            return self.file_offsets[file_name]

        offset = None
        if file_name != '__inferior__':
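            start_addr_local = find_function(self.CommonSymbol,
                                             binary=file_name,
                                             text=False)
            start_addr_tool = \
                self.do_get_symbol_address(name=self.CommonSymbol,
                                           file_name=file_name)

            # Assumption: both lookups return a falsy value on failure.
            # Don't cache a miss, so a later call can retry.
            if not start_addr_local or not start_addr_tool:
                return None

            # offset = tool (Ghidra) address - local (GDB) address
            offset = start_addr_tool['address'] - start_addr_local[0]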
        else:
            offset = 0

        self.file_offsets[file_name] = offset

        return offset

    def get_tool_address(self, address):
        '''Returns the tool address equivalent

        Args:
            address (int): the local address

        Return:
            int: the tool address
        '''

        file_name = get_image_name(address)
        if file_name != '__inferior__':
            file_name = os.path.basename(file_name)

        offset = self.get_file_offset(file_name)

        if offset is None:
            return offset

        return address + offset

Dealing with multiple files in Ghidra scripts

There are examples of this so it wasn’t too bad:


def get_program(file_name):
    '''Gets the program for file_name
    
    Args:
        file_name (str): the filename
    
    Return:
        Program: the program
    '''
    
    if file_name is None:
        return currentProgram
    
    project = state.getProject()
    project_data = project.getProjectData()
    root_folder = project_data.getRootFolder()

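    # getDomainObject() wants a consumer object to track who holds the
    # program; the Java examples pass `this`, here any object will do
    this = object()

    program_ = None
    for file_ in root_folder.getFiles():
        if file_.getName() == file_name:
            # Arguments: consumer, okToUpgrade, okToRecover, monitor
            program_ = file_.getDomainObject(this, True, False, monitor)
            break

    return program_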
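
Note that this only looks at files in the project’s root folder. If the imported libraries sit in subfolders, a recursive walk might look something like the sketch below. It relies only on getFiles(), getFolders() and getName(), which Ghidra’s DomainFolder and DomainFile provide; find_domain_file is a name made up for this sketch, and the Fake classes are hypothetical stand-ins so it can run outside Ghidra.

```python
def find_domain_file(folder, file_name):
    '''Depth-first search for file_name under folder; None if not found.'''
    for file_ in folder.getFiles():
        if file_.getName() == file_name:
            return file_
    for sub_folder in folder.getFolders():
        found = find_domain_file(sub_folder, file_name)
        if found is not None:
            return found
    return None


# Hypothetical stand-ins for DomainFile / DomainFolder, only for trying
# the walk outside Ghidra.
class FakeFile(object):
    def __init__(self, name):
        self._name = name

    def getName(self):
        return self._name


class FakeFolder(object):
    def __init__(self, files=(), folders=()):
        self._files = list(files)
        self._folders = list(folders)

    def getFiles(self):
        return self._files

    def getFolders(self):
        return self._folders


root = FakeFolder(files=[FakeFile('main')],
                  folders=[FakeFolder(files=[FakeFile('libc.so.6')])])
```

Inside a Ghidra script the folder argument would be the root folder from the project data, and the file that comes back would be opened with getDomainObject() as above. Whoever opens a program that way should also call release() on it with the same consumer object once finished.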

What I learnt about Jython

I wondered how to deal with this line in one of the already existing scripts:


		Program program = null;
		try {
			program =
				(Program) domainFile.getDomainObject(this, true /*upgrade*/,
					false /*don't recover*/, monitor);
			processProgram(program);
		}

In particular, the cast to Program. I was like ‘but how the fuck do I do that in Python?’

After getting a bit worked up, it clicked that everything is an object, and Jython provides the duck typing we all know. So I don’t have to worry about casting: what comes back already is a Program object, and I can access its attributes regardless.
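
As a rough illustration in plain Python (FakeProgram is a hypothetical stand-in, not Ghidra’s real Program class), the nearest equivalent of the Java cast is either nothing at all, or an isinstance check if you want to fail early:

```python
class FakeProgram(object):
    '''Hypothetical stand-in for the object getDomainObject() hands back.'''

    def getName(self):
        return 'libc.so.6'


def process_program(domain_object):
    '''Use the object directly; no cast is needed to reach its methods.'''
    return domain_object.getName()


def process_program_checked(domain_object, expected_type):
    '''Optional guard, roughly what the Java cast enforces at runtime.'''
    if not isinstance(domain_object, expected_type):
        raise TypeError('expected %s' % expected_type.__name__)
    return domain_object.getName()
```

The unchecked version is what the script above effectively does; the checked one just trades a late AttributeError for an earlier TypeError.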

So what does it look like?

The output so far shows the matched up addresses:


Old value = 256
New value = 131328
main (argc=1, argv=0x7fffffffd1e8) at ../tests/main.c:29
29		for (i = 0; i < BUF_SIZE; i++) {
Hit wp! location=__inferior__@[local=0000004011da, tool=0000004011da]

Hardware access (read/write) watchpoint 3: *4215392

Old value = 131328
New value = 50462976
main (argc=1, argv=0x7fffffffd1e8) at ../tests/main.c:29
29		for (i = 0; i < BUF_SIZE; i++) {
Hit wp! location=__inferior__@[local=0000004011da, tool=0000004011da]
buf0: 0x405260
buf1: 0x4052f0
buf2: 0x405380

Hardware access (read/write) watchpoint 3: *4215392

Value = 50462976
0x00007ffff7f3c38d in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
Hit wp! location=/lib64/libc.so.6@[local=7ffff7f3c38d, tool=00000025f38d]

Hardware access (read/write) watchpoint 3: *4215392

Value = 50462976
0x00007ffff7f3c3b4 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
Hit wp! location=/lib64/libc.so.6@[local=7ffff7f3c3b4, tool=00000025f3b4]

Hardware access (read/write) watchpoint 3: *4215392

Value = 50462976
0x00007ffff7f3c3b8 in __memmove_avx_unaligned_erms () from /lib64/libc.so.6
Hit wp! location=/lib64/libc.so.6@[local=7ffff7f3c3b8, tool=00000025f3b8]

Where the tool address is the Ghidra address.
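
As a sanity check, the three libc hits should all imply the same offset, since a library is mapped with a single load bias. A quick sketch using only the addresses copied from the output above:

```python
# (local, tool) pairs copied from the libc.so.6 watchpoint hits above
hits = [
    (0x7ffff7f3c38d, 0x25f38d),
    (0x7ffff7f3c3b4, 0x25f3b4),
    (0x7ffff7f3c3b8, 0x25f3b8),
]

# offset = tool - local, the same convention get_file_offset() uses
offsets = set(tool - local for local, tool in hits)

# Every hit in the same file must agree on one offset
assert len(offsets) == 1
offset = offsets.pop()

# Distance between where libc sits at runtime and where Ghidra has it
distance = -offset
```

All three hits agree, and the distance works out to 0x7ffff7cdd000, which is page aligned as you would expect for a difference between two image bases.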

Yay, now I can work on the programmatic SRE I was so excited about

Now that I have a reference point in both GDB and Ghidra, I can do the work to dynamically and programmatically get the next watchpoint address.



Written by Dan Farrell who lives and works in Seattle tinkering away on firmware. To subscribe send an email to subscribe@re-ffs.com.