reverse engineering: for fun's sake

GDB and Python

July 08, 2019

Foreword

I’ve mentioned GDB’s Python API to various people over the past 5 years. As a grade A procrastinator, I never delivered anything, even though I was all like ‘Yo GDB and Python together are the best’.

Now I’m procrastinating about cleaning my apartment, so here is the intro to GDB’s Python API.

What follows is what I’ve learnt about GDB and Python through trial and error. Over the years I’ve put some structure around it and ended up with methods that seem neat-ish and usable.

The files that go with this post: gdb-and-python.tar.gz

Aim

The aim of this post is to introduce, in a hand-waving way, the utility of GDB’s Python API. By hand-waving I mean I will not go through all of the details; in fact, what I present here is not practical without further extension. I promise the next post will have actual, useful, non-hand-waving content.

In terms of style, I’m going straight into examples and presuming you can fill in the gaps.

info gdb is your friend.

Let’s go

Guess how much I love interactive GDB? The answer: -1. Interactive GDB is useful, but explaining in a blog post what to type next takes too long, and I imagine following along would be painful. So we will use GDB in batch mode.

Consider:


script.sh

#!/bin/bash

gdb --batch --command=./script.gdb /usr/bin/ls

script.gdb

source script.py

script.py

#!/usr/bin/env python3

print('Hello World!')

What will happen if you run script.sh?

Spoiler alert!

Hello World!

Cool, but /usr/bin/ls didn’t run: GDB loaded it, but nothing told GDB to run it.
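If you would rather launch the batch session from Python than from script.sh, here is a minimal sketch. It only builds and runs the same kind of command line; build_gdb_command and run_gdb are hypothetical helper names, and it assumes gdb is on your PATH. The `--args` option is real GDB: everything after the program path is passed to the program itself.

```python
"""Drive GDB's batch mode from Python (sketch; helper names are hypothetical)."""
import shutil
import subprocess


def build_gdb_command(command_file, program, program_args=()):
    """Return the argv for a batch-mode GDB run.

    --batch makes GDB exit after the command file finishes, --command
    sources the file, and --args passes everything after `program` to the
    program being debugged.
    """
    return ["gdb", "--batch", f"--command={command_file}",
            "--args", program, *program_args]


def run_gdb(command_file, program, program_args=()):
    """Run GDB and return its exit status (assumes gdb is on PATH)."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    argv = build_gdb_command(command_file, program, program_args)
    return subprocess.run(argv, check=False).returncode
```

For example, `run_gdb("./script.gdb", "/usr/bin/ls", ["-l"])` starts the same batch session as script.sh, except that ls would receive `-l` once the script runs it.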

Bonus object

The obvious thing to notice is that we didn’t interact with GDB from within Python at all! Let’s do the simplest thing possible and tell GDB to run the program:


script.py

#!/usr/bin/env python3

print('Hello World!')

gdb.execute('run')

Now, we get:

Hello World!
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
script.gdb  script.py  script.sh
[Inferior 1 (process 18722) exited normally]

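So we have this bonus undeclared gdb object: GDB’s embedded Python interpreter imports the gdb module for us, so scripts run by GDB can use it without an import. This is our handle into GDB.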

Read info gdb for more, well, info.
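Because gdb only exists inside GDB’s interpreter, running the later versions of script.py directly with python3 (as the shebang invites) dies with a NameError at the first gdb call. A small guard, sketched below, makes that failure explicit. It assumes no unrelated module named gdb is installed in your regular Python; inside_gdb and require_gdb are hypothetical names, and an explicit `import gdb` is harmless inside GDB because the module is already loaded.

```python
"""Guard for scripts that only make sense inside GDB (sketch)."""
try:
    import gdb  # provided by GDB's embedded Python; absent in plain python3
except ImportError:
    gdb = None


def inside_gdb():
    """True when this script is being run by GDB's embedded interpreter."""
    return gdb is not None


def require_gdb():
    """Fail early with a readable message instead of a NameError later."""
    if not inside_gdb():
        raise SystemExit("run this via: gdb --batch --command=./script.gdb ...")
```

Calling `require_gdb()` at the top of script.py keeps the convenience of the auto-imported object while giving a clear error outside GDB.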

Something nearly useful

Messing with breakpoints is the next step. Rather than explain absolutely everything about this topic, I’ll demonstrate and highlight the important points.

Consider:


script.py

#!/usr/bin/env python3


class BPStop(gdb.Breakpoint):
    '''Stopping breakpoint'''

    def stop(self):
        '''Stops'''

        return True


class BPPrint(gdb.Breakpoint):
    '''Printing breakpoint'''

    def __init__(self, msg, *args, **kwargs):
        '''Constructor'''

        self.msg = msg

        gdb.Breakpoint.__init__(self, *args, **kwargs)

    def stop(self):
        '''Prints and continues'''

        print(self.msg)

        return False


bp = BPStop('__libc_start_main')

gdb.execute('run')

malloc = BPPrint('Hello malloc!', 'malloc')

gdb.execute('continue')

What are we doing here? Well, a few things:

  • We introduce a breakpoint class that, well, stops
  • We introduce a breakpoint class that prints and doesn’t stop
  • A stopping breakpoint is set at __libc_start_main
  • The executable is run
  • We expect to break at __libc_start_main
  • A printing breakpoint is set at malloc
  • The executable is continued to its end
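Since BPPrint returns False, its stop method is a reasonable place for read-only bookkeeping that never halts the inferior. As a further sketch (CallTally, BPCount and print_report are hypothetical names), the counting logic below is plain Python so it can be exercised without GDB, and the gdb.Breakpoint subclass is only defined when the script runs inside GDB. `Breakpoint.location` and `gdb.events.exited` are real parts of the API.

```python
"""Count hits per location without stopping (sketch; names are hypothetical)."""
from collections import Counter

try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


class CallTally:
    """Pure-Python hit counter, so the logic can be tested outside GDB."""

    def __init__(self):
        self.counts = Counter()

    def record(self, location):
        self.counts[location] += 1

    def report(self):
        """Return lines like 'malloc: 42', most frequent first."""
        return [f"{loc}: {n}" for loc, n in self.counts.most_common()]


TALLY = CallTally()

if gdb is not None:
    class BPCount(gdb.Breakpoint):
        """Records a hit and lets the inferior carry on."""

        def stop(self):
            TALLY.record(self.location)  # Python-side state only
            return False                 # don't stop the inferior

    def print_report(event):
        """Print the tally when the inferior exits."""
        for line in TALLY.report():
            print(line)

    gdb.events.exited.connect(print_report)
```

Inside GDB, `BPCount('malloc')` and `BPCount('free')` followed by `gdb.execute('run')` should print both tallies when the program exits.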

But what do we see:

Function "__libc_start_main" not defined.
Breakpoint 1 (__libc_start_main) pending.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x00007ffff7dcce40 in __libc_start_main () from /lib64/libc.so.6
Breakpoint 2 at 0x7ffff7e2e960 (2 locations)
Hello malloc!
Hello malloc!
Hello malloc!
***snip***
Hello malloc!
Hello malloc!
Hello malloc!
script.gdb  script.py  script.sh
[Inferior 1 (process 19112) exited normally]

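From this you should see that when a gdb.Breakpoint’s stop method returns True, GDB stops the inferior and hands control back to the Python script: gdb.execute('run') only returned once __libc_start_main was hit, and only then was the malloc breakpoint set. This point is important and will be elaborated on later.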
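Once control is back in the script, you can inspect the stopped inferior. The sketch below reads the first argument of the function you stopped in, for example the size passed to malloc at its entry breakpoint. It assumes either x86-64 (System V ABI, first integer argument in rdi) or AArch64 (first argument in x0); the table, first_arg_register and current_first_arg are hypothetical, and the architecture names are what I expect GDB to report (check with `show architecture`). gdb.selected_frame(), Frame.architecture() and gdb.parse_and_eval() are real API calls.

```python
"""Read a stopped function's first argument (sketch; assumes x86-64 or AArch64)."""
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None

# First integer-argument register per GDB architecture name (assumed mapping;
# extend it for other targets).
FIRST_ARG_REGISTER = {
    "i386:x86-64": "$rdi",  # System V AMD64 ABI
    "aarch64": "$x0",       # AAPCS64
}


def first_arg_register(arch_name):
    """Return the GDB register expression holding the first argument."""
    try:
        return FIRST_ARG_REGISTER[arch_name]
    except KeyError:
        raise ValueError(f"no argument register known for {arch_name!r}") from None


def current_first_arg():
    """Inside GDB, while stopped at a function's entry: read its first argument."""
    frame = gdb.selected_frame()
    reg = first_arg_register(frame.architecture().name())
    return int(gdb.parse_and_eval(reg))
```

Calling `current_first_arg()` right after `gdb.execute('continue')` returns at a malloc breakpoint should give the requested allocation size.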

You can do Pythonic things at breakpoints, but there are limitations.

Boo, limitations!

TL;DR: there is a way around the limitations. For now, here is what the GDB info pages say about the gdb.Breakpoint.stop function:

You should not alter the execution state of the inferior (i.e.,
step, next, etc.), alter the current frame context (i.e., change
the current active frame), or alter, add or delete any breakpoint.
As a general rule, you should not alter any data within GDB or the
inferior at this time.

But, but, but… I want to alter the execution state of the inferior, current frame context, add AND delete breakpoints. That’s exactly the point of this effort!

Work-around

Maybe work-around isn’t the right term because this technique actually makes a lot of sense.

As an aside, when I first read that paragraph in the GDB info pages I thought my grand plan of dynamic debugging for reverse engineering was scuppered. It took me ages to figure out how to deal with it… So embarrassing. You can lol and rofl at this, but roflmao is too far; I have feelings too.

Remember when I said GDB returns control back to the Python script?

So why not always stop, and do the processing in the Python script context rather than in the gdb.Breakpoint.stop context? Simple, ay? Again, roflmao is too far.

Consider:


script.py

#!/usr/bin/env python3


class BPBase(gdb.Breakpoint):
    '''Base breakpoint class and manager'''

    # Breakpoint that was hit
    Hit = None

    # Number of enabled BPBase breakpoints; the main loop runs while non-zero
    NumEnabled = 0

    def __init__(self, *args, **kwargs):
        '''Constructor'''

        gdb.Breakpoint.__init__(self, *args, **kwargs)

        BPBase.NumEnabled += 1

    @classmethod
    def handle_stop(cls):
        '''Runs the on_stop function of Hit'''

        if cls.Hit:
            cls.Hit.on_stop()
            cls.Hit = None

    def on_stop(self):
        '''Default on_stop function, does nothing'''

        pass

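    def disable(self):
        '''Disables the breakpoint, keeping NumEnabled accurate'''

        # Only count the first disable, so calling it twice can't skew the total
        if self.enabled:
            BPBase.NumEnabled -= 1

        self.enabled = False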


class BPPrint(BPBase):
    '''Printing breakpoint'''

    def __init__(self, msg, *args, **kwargs):
        '''Constructor'''

        self.msg = msg
        self.hits = 0

        BPBase.__init__(self, *args, **kwargs)

    def stop(self):
        '''Counts the hit, records it for handle_stop, and stops'''

        self.hits += 1

        BPBase.Hit = self

        return True

    def on_stop(self):
        '''Prints and disables after 5 hits'''

        print(self.msg)

        if self.hits >= 5:
            print('Too noisy...')

            self.disable()


bp = BPBase('__libc_start_main')

gdb.execute('run')

bp.disable()

malloc = BPPrint('Hello malloc!', 'malloc')

while BPBase.NumEnabled:
    gdb.execute('continue')

    BPBase.handle_stop()

gdb.execute('continue')

And now we get:

Function "__libc_start_main" not defined.
Breakpoint 1 (__libc_start_main) pending.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, 0x00007ffff7dcce40 in __libc_start_main () from /lib64/libc.so.6
Breakpoint 2 at 0x7ffff7e2e960 (2 locations)

Breakpoint 2, 0x00007ffff7e2e960 in malloc () from /lib64/libc.so.6
Hello malloc!

Breakpoint 2, 0x00007ffff7e2e960 in malloc () from /lib64/libc.so.6
Hello malloc!

Breakpoint 2, 0x00007ffff7e2e960 in malloc () from /lib64/libc.so.6
Hello malloc!

Breakpoint 2, 0x00007ffff7e2e960 in malloc () from /lib64/libc.so.6
Hello malloc!

Breakpoint 2, 0x00007ffff7e2e960 in malloc () from /lib64/libc.so.6
Hello malloc!
Too noisy...
script.gdb  script.py  script.sh
[Inferior 1 (process 21514) exited normally]
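BPBase keeps a single Hit slot. If two of these breakpoints trigger on the same stop (for instance, two breakpoints on one address), GDB should call stop() on each, so each call overwrites Hit and only the last one’s on_stop runs. A hedged variant of the dispatch step keeps a queue instead; Dispatcher, DISPATCH and BPQueued are hypothetical names, and the dispatch logic is plain Python so it can be checked outside GDB.

```python
"""Queue-based variant of BPBase's deferred dispatch (sketch; names hypothetical)."""
from collections import deque

try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


class Dispatcher:
    """Remembers every breakpoint hit during a stop; runs handlers later."""

    def __init__(self):
        self.pending = deque()

    def record(self, bp):
        """Call from stop(): remember the hit, do no real work."""
        self.pending.append(bp)

    def drain(self):
        """Call after gdb.execute('continue') returns; run each on_stop in order."""
        handled = 0
        while self.pending:
            self.pending.popleft().on_stop()
            handled += 1
        return handled


DISPATCH = Dispatcher()

if gdb is not None:
    class BPQueued(gdb.Breakpoint):
        """Always stops, deferring its work to on_stop."""

        def stop(self):
            DISPATCH.record(self)
            return True

        def on_stop(self):
            """Override in subclasses; runs in the script context."""
            pass
```

The main loop would then call `gdb.execute('continue')` followed by `DISPATCH.drain()`, in the same place BPBase.handle_stop() is called.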

WTF DOES THIS HAVE TO DO WITH SRE!!!

Whoa, whoa, whoa, language.

Imagine you’re dealing with a piece of obfuscated software, or even just a very complex piece of software. Tracking data through such software to see where the important stuff happens is essential.

You can do it with static analysis, but that will take a long time. What if you know where the data is declared or malloc’d, set watchpoints on it, and programmatically follow it through the software? That saves time.
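To make the watchpoint idea a little more concrete, here is a hedged sketch. It assumes x86-64 Linux, where malloc returns its pointer in rax and `long` is 8 bytes, and it must run in the script context (after gdb.execute('continue') has returned at a malloc breakpoint), because it resumes the inferior with `finish` and adds a breakpoint, both of which stop() must not do. gdb.BP_WATCHPOINT and gdb.WP_WRITE are real constants accepted by the gdb.Breakpoint constructor; watch_spec and watch_malloc_result are hypothetical helpers.

```python
"""Follow a malloc'd buffer with a write watchpoint (sketch; assumes x86-64)."""
try:
    import gdb  # only importable inside GDB's embedded Python
except ImportError:
    gdb = None


def watch_spec(address, ctype="long"):
    """Build a GDB watch expression for the object at `address`."""
    return f"*({ctype} *) {address:#x}"


def watch_malloc_result():
    """Call in script context while stopped at malloc's entry breakpoint.

    Runs to the caller with `finish`, reads the returned pointer from rax
    and sets a write watchpoint on its first 8 bytes.
    """
    gdb.execute("finish")
    address = int(gdb.parse_and_eval("$rax")) & 0xFFFFFFFFFFFFFFFF
    return gdb.Breakpoint(watch_spec(address), type=gdb.BP_WATCHPOINT,
                          wp_class=gdb.WP_WRITE)
```

Each later write to the buffer then stops the inferior, and the same stop-then-handle loop can decide what to follow next. If another breakpoint fires during `finish`, GDB stops there first, so a real version would need to handle that.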

Up next

I’ll release a reference proof-of-concept (POC) solution for dynamic debugging with GDB; it will be extended when I tackle the illegitimate software.


Dan Farrell

Written by Dan Farrell, who lives and works in Seattle tinkering away on firmware. To subscribe, send an email to subscribe@re-ffs.com.