reverse engineering: for fun's sake

Progress on buffer tracking PoC

July 29, 2019

Foreword

I’m making progress on the buffer tracking PoC, but I want all of the code to be portable: the PyGDB plugin for this need not be x86-specific. That has made things a little fiddly, but it is moving forward, and this post reports where it stands.

Things done so far:

  • Added PyGDB watchpoint support
  • Added Ghidra/PyGDB API
  • Added ghidraremote PyGDB plugin
  • Started buffertracker PyGDB plugin
  • Proved to myself that Ghidra provides enough PCode/Instruction APIs to get this done

So far, a PyGDB script can get symbols from Ghidra (including user-created symbols), set a watchpoint, and run an action when that watchpoint is hit.

What remains:

  • Support matching symbols from loaded libraries between PyGDB and Ghidra
  • Coding up the algorithm for finding the next address for a watchpoint
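
One piece the first remaining item would probably need is address translation: Ghidra reports addresses relative to the image base it chose, while the debugger sees wherever the loader mapped the image. A minimal sketch of that arithmetic follows. The function name, parameter names and example values are all assumptions made for illustration, not part of PyGDB or Ghidra.

```python
def rebase(static_addr, ghidra_image_base, runtime_load_base):
    """Translate an address from Ghidra's view to the debugger's view.

    Hypothetical helper: none of these names come from PyGDB or Ghidra.
    """
    offset = static_addr - ghidra_image_base
    if offset < 0:
        raise ValueError('address 0x%x is below image base 0x%x'
                         % (static_addr, ghidra_image_base))
    return runtime_load_base + offset


# Example values chosen for illustration only.
ghidra_addr = 0x00101189        # symbol address as reported by Ghidra
ghidra_base = 0x00100000        # image base Ghidra assigned to the program
runtime_base = 0x555555554000   # where the loader mapped the same image

print('runtime address: 0x%012x'
      % rebase(ghidra_addr, ghidra_base, runtime_base))
```

The same calculation would apply per loaded library, each with its own pair of bases.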

Aim

The aim of this post is to show, with a few snippets, that buffer tracking can be done programmatically and in an architecture-independent way.

High level view

The high-level idea is to make it possible to script finding a buffer address based on reverse engineering, and then auto-magically follow the buffer through execution.

For example, this script for PyGDB should not change much from now:

#!/usr/bin/env python3

import sys
import os

#sys.path.append(os.getcwd())

import PyGDB
PyGDB.gdb = gdb

from PyGDB.bp import BPStopper
from PyGDB.env import Env
from PyGDB.bp_mgr import BPMgr
from PyGDB.plugins.ghidraremote import ghidraremote
from PyGDB.plugins.buffertracker import buffertracker
from PyGDB.fn_find import find_function, find_symbol

main_bp = BPStopper('__libc_start_main')

gdb.execute('run')
main_bp.enabled = False

env = Env()

client = ghidraremote.initialise()
tracker = None

first_malloc_ret = client.do_get_symbol_address(name='first_malloc')['address']
if first_malloc_ret is None:
    print('Could not find: first_malloc in Ghidra')
else:
    print('First malloc: %012x' % first_malloc_ret)

    tracker = buffertracker.initialise_on_ret(client, first_malloc_ret)

bp_mgr = BPMgr()
bp_mgr.run()

print(tracker)

client.disconnect()

This shows fetching the address just after the malloc call from Ghidra, where it was found by static analysis and stored under the symbol first_malloc, and setting up a buffer tracker based on it.

Setting up a watchpoint

Then, in the tracker, we have the initialiser:


def initialise_on_ret(client, ret_addr):
    '''Initialises the BufferTracker based on a ret value

    Args:
        client (CmdClient): Ghidra client
        ret_addr (int): Where the ret is valid
    '''

    tracker = BufferTracker(client)

    tracker.start_ret(ret_addr)

    return tracker

And how it gets started:

    def start_ret(self, ret_addr):
        '''Sets up the tracker for the ret value at ret_addr'''

        self.bp_mgr.add(ret_addr, on_stops=[self.on_start_ret])

    def on_start_ret(self):
        '''Callback for hitting start_ret ret_addr'''

        buffer_addr = self.abi.get_return()
        print('Buffer address: 0x%012x' % buffer_addr)

        self.update_wp(buffer_addr)

    def update_wp(self, addr):
        '''Updates the buffer watchpoint

        Args:
            addr (int): the buffer address
        '''

        if self.wp_addr is not None:
            self.bp_mgr.remove(self.wp_addr, on_stops=[self.on_wp])

        self.wp_addr = addr
        self.bp_mgr.add(self.wp_addr, on_stops=[self.on_wp],
                        _type=self.bp_mgr.WP)
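
The call to self.abi.get_return() is where architecture independence shows up: the callback asks for "the return value" rather than naming a register. Below is a minimal sketch of how such an ABI object could work, assuming a simple lookup table. The Abi class, the architecture labels and the read_register callable are hypothetical and are not taken from PyGDB; only the register choices themselves are standard for each calling convention.

```python
# Integer/pointer return-value register for a few common ABIs.
# The keys are labels invented for this sketch, not names PyGDB is known to use.
RETURN_REGISTERS = {
    'x86_64': 'rax',   # System V AMD64 and Microsoft x64
    'i386': 'eax',     # cdecl
    'aarch64': 'x0',   # AAPCS64
    'arm': 'r0',       # AAPCS
    'mips': 'v0',      # o32 / n64
    'riscv': 'a0',     # RISC-V calling convention
}


class Abi(object):
    """Hypothetical stand-in for the object behind self.abi."""

    def __init__(self, arch, read_register):
        if arch not in RETURN_REGISTERS:
            raise KeyError('no return register known for %r' % arch)
        self.arch = arch
        # read_register is any callable mapping a register name to an int;
        # inside GDB it could wrap gdb.parse_and_eval('$' + name).
        self._read_register = read_register

    def get_return(self):
        """Reads the integer/pointer return value for this architecture."""
        return self._read_register(RETURN_REGISTERS[self.arch])


# Fake register file so the sketch runs outside a debugger.
fake_registers = {'rax': 0x5555555592a0, 'x0': 0xaaaaaaac12a0}
abi = Abi('x86_64', fake_registers.__getitem__)
print('Buffer address: 0x%012x' % abi.get_return())
```

Keeping the register choice in one table means the tracker callbacks themselves never have to mention an architecture.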

Basic Ghidra code

Adding remote APIs to Ghidra is quite straightforward:

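def test(a=None, b=None, c=None):
    '''Returns the entry point of the function containing currentAddress'''
    ret = {}
    func = getFunctionContaining(currentAddress)
    # getFunctionContaining returns None when no function covers the address
    ret['entry_point'] = None if func is None else func.getEntryPoint().getOffset()

    return ret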

def get_symbol_address(name=None):
    '''Finds the symbol address
    
    Args:
        name (str): name of the symbol
        
    Returns:
        dict: return value
    '''
    
    ret = {}
    if not name:
        return ret
    
    symbol_table = currentProgram.getSymbolTable()
    symbols = symbol_table.getSymbols(name)
    symbol = None
    
    for x in symbols:
        if symbol is not None:
            raise RuntimeError('More than one symbol named "%s"' % name)
        symbol = x
    
    if symbol is None:
        ret['address'] = None
    else:
        ret['address'] = symbol.getProgramLocation().getAddress().getOffset()
    
    return ret

server = CmdServer()

server.add_command('test', test)
server.add_command('get_symbol_address', get_symbol_address)
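
CmdServer itself is not shown here, so the following is only a guess at the shape of the round trip: a name-to-function registry on the Ghidra side, and a client whose do_<name> methods forward keyword arguments to the matching command. SketchServer, SketchClient, the JSON message layout and the in-process transport are all assumptions for illustration and may differ from the real ghidraremote plugin.

```python
import json


class SketchServer(object):
    """Hypothetical stand-in for CmdServer: a name -> function registry."""

    def __init__(self):
        self._commands = {}

    def add_command(self, name, func):
        self._commands[name] = func

    def handle(self, request_text):
        """Decodes one JSON request, calls the command, encodes the reply."""
        request = json.loads(request_text)
        func = self._commands.get(request['command'])
        if func is None:
            return json.dumps({'error': 'unknown command'})
        return json.dumps({'result': func(**request.get('kwargs', {}))})


class SketchClient(object):
    """Hypothetical client: client.do_<name>(**kwargs) calls command <name>."""

    def __init__(self, transport):
        self._transport = transport  # callable: request text -> reply text

    def __getattr__(self, attr):
        # Only reached for attributes that do not exist on the instance.
        if not attr.startswith('do_'):
            raise AttributeError(attr)
        command = attr[len('do_'):]

        def call(**kwargs):
            reply = json.loads(self._transport(
                json.dumps({'command': command, 'kwargs': kwargs})))
            if 'error' in reply:
                raise RuntimeError(reply['error'])
            return reply['result']
        return call


def get_symbol_address(name=None):
    # Canned data standing in for Ghidra's symbol table.
    return {'address': {'first_malloc': 0x101189}.get(name)}


server = SketchServer()
server.add_command('get_symbol_address', get_symbol_address)
client = SketchClient(server.handle)  # in-process transport, no socket
print(client.do_get_symbol_address(name='first_malloc'))
```

Returning plain dicts from each command, as the Ghidra functions above do, keeps the replies trivially serialisable whatever transport is used.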

Up next

I’m really hoping to have this buffer tracking done for the next post, and I’ll try to provide a complete example that you can run yourself.


Dan Farrell

Written by Dan Farrell, who lives and works in Seattle, tinkering away on firmware. To subscribe, send an email to subscribe@re-ffs.com.