reverse engineering: for fun's sake

Introducing PyGDB

July 10, 2019

Foreword

This is my very much work-in-progress tool for Python-based GDB debugging. It doesn’t do much at the moment, and it may be a while until it does. Still, it is more than nothing.

I’m not planning to do a big design and get everything nailed down all in one go. This is an evolutionary project, like many SRE (software reverse engineering) tools. It serves as a base for tackling the illegitimate software.

For others, this can be an example to base their own work on. Just introducing ideas, food for thought, so to speak. And it’s licensed as such: GPL3, such a nasty license.

What have we got

As a proof of concept, I’ve made a malloc/free tracker. This is pretty boring for sure, but introduces some concepts.

What it actually does is:

  • Tracks when and where mallocs happen, and what size they are

    • Extracting function arguments
    • Recording return values
    • Locating where calls took place
  • Tracks when and what addresses are freed

    • Comparing free argument to malloc returns
  • Provides a summary at the end of execution

    • Basic data manipulation

This means you can see what memory isn’t freed after the end of execution. Probably more of a forward engineering tool, but there is a bit that goes into this.
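The bookkeeping behind that list can be sketched in plain Python, with no GDB involved. This is only an illustration of the idea, not PyGDB's code: the names `Allocation` and `AllocTracker` are made up here, and the addresses in the demo are invented values.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Allocation:
    address: int  # value malloc returned
    size: int     # argument malloc was called with
    caller: int   # where the call took place (e.g. the return address)


class AllocTracker:
    """Hypothetical tracker: remembers mallocs until a matching free is seen."""

    def __init__(self) -> None:
        self.live: Dict[int, Allocation] = {}
        self.unknown_frees: List[int] = []

    def on_malloc(self, size: int, address: int, caller: int) -> None:
        if address == 0:
            return  # malloc returned NULL, nothing to track
        self.live[address] = Allocation(address, size, caller)

    def on_free(self, address: int) -> None:
        if address == 0:
            return  # free(NULL) is a no-op
        # Compare the free argument against the recorded malloc returns.
        if self.live.pop(address, None) is None:
            self.unknown_frees.append(address)

    def summary(self) -> List[Allocation]:
        """Allocations that were never freed, ordered by address."""
        return sorted(self.live.values(), key=lambda a: a.address)


tracker = AllocTracker()
tracker.on_malloc(size=32, address=0x1000, caller=0x401136)
tracker.on_malloc(size=64, address=0x2000, caller=0x401150)
tracker.on_free(0x1000)

for leak in tracker.summary():
    print(f"{leak.size} bytes at {leak.address:#x} from {leak.caller:#x} never freed")
```

In a real run the three `on_*` calls would be driven by breakpoints rather than written by hand; that is the part GDB has to supply.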

Pfft, as if there is a bit that goes into this

You caught me out. There isn’t much that goes into it. Just a few slight details. And some “design” decisions to make it somewhat abstract. I’m sure when I tackle the illegitimate software these “design” decisions will come back to bite me. I’m not very good at design. And I actually believe only so much concrete design should be put into SRE tools.

Run the following:


script.sh

#!/bin/bash

gdb --batch --command=./script.gdb /usr/bin/ls

script.gdb

run
break malloc

And we get:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
script.gdb  script.sh
[Inferior 1 (process 2728) exited normally]
Breakpoint 1 at 0x7ffff7e2e960 (2 locations)

It says there are 2 locations for malloc!

But, how can there be two, and which one did it choose?

GDB gives you lots of details

Well, GDB lets you know a lot about the loaded binaries, including which symbols are within which binaries, the address ranges for those binaries, and a whole bunch of other things. And we can use Python to control GDB so it’s no thing.

The model for symbols

Symbols have particular meaning. If there is more than one address for a given symbol, we’re probably interested in all of them. Probably.

The model is: a single symbol can have more than one address, but within any single binary that has the symbol there is exactly one address for it.
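As a rough illustration of that model, here is a small data structure keyed first by symbol name and then by binary. The class name `SymbolMap`, the binary names, and the addresses are all invented for the example and are not taken from PyGDB.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class SymbolMap:
    """Hypothetical model: symbol name -> {binary -> address}."""

    def __init__(self) -> None:
        self._addresses: Dict[str, Dict[str, int]] = defaultdict(dict)
        self._names: Dict[int, str] = {}

    def add(self, name: str, binary: str, address: int) -> None:
        # One address per (symbol, binary) pair; a later add overwrites.
        self._addresses[name][binary] = address
        self._names[address] = name

    def addresses(self, name: str) -> List[int]:
        """Every address the symbol has, across all binaries."""
        return sorted(self._addresses.get(name, {}).values())

    def name_at(self, address: int) -> Optional[str]:
        """Map an address back to the symbol name, if one was recorded."""
        return self._names.get(address)


symbols = SymbolMap()
symbols.add("malloc", "liba.so", 0x70001000)  # made-up binaries and addresses
symbols.add("malloc", "libb.so", 0x70009000)
symbols.add("free", "libb.so", 0x70009400)

print([hex(a) for a in symbols.addresses("malloc")])
```

The reverse lookup is there so the name stays available as a label while everything else works on the addresses.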

For the SRE folks out there we know symbols for many target binaries are only so useful. Maybe they’re correct? Maybe they’re decoys? And most functions are static, so there is no symbol. But much of the actual doing is done in static functions. So getting drowned in symbol details seems a bit silly.

What if not symbols?

Addresses, we’ll reduce everything to addresses. Translate symbols to addresses, and use the name of the symbol for a mapping, but actually work on addresses.

Thankfully GDB gives a few ways to turn a function name into an address. See find_function in fn_find.py.
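For a flavour of what such lookups can look like, here are two sketches using GDB's Python API: `gdb.parse_and_eval` (evaluate the name as an expression and take its address) and `gdb.decode_line` (resolve the name as a location, which can return several matches). I have not checked these against fn_find.py, so treat them as assumptions about one possible approach; the function names and the `api` parameter are hypothetical, the latter only there so the code can be exercised outside GDB.

```python
try:
    import gdb  # only importable when the script runs inside GDB
except ImportError:
    gdb = None  # allows importing this file outside GDB


def address_by_expression(name, api=None):
    """Evaluate `name` as an expression and return its address (one result)."""
    api = api or gdb
    return int(api.parse_and_eval(name).address)


def addresses_by_linespec(name, api=None):
    """Resolve `name` as a location and return every matching address."""
    api = api or gdb
    # decode_line returns (unparsed_remainder, tuple_of_locations_or_None);
    # each location is a gdb.Symtab_and_line with a `pc` attribute.
    _unparsed, sals = api.decode_line(name)
    return sorted({sal.pc for sal in (sals or ())})
```

Inside a GDB session, after the inferior has run so its shared libraries are loaded, `addresses_by_linespec("malloc")` is the one that would be expected to report more than one location. The addresses may not match the ones `break` prints, since `break` can skip a function's prologue.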

Enough

The project: PyGDB

A bit about it

It is trying to provide an abstraction around ABI and have plugins. So hopefully allow a plugin to be cross-platform.

Unfortunately that means getting it to run is a bit silly.
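To give a feel for what abstracting the ABI could mean for a plugin, here is a minimal sketch. It is not how PyGDB does it: `CallingConvention` and `malloc_size_expression` are hypothetical names, and only integer and pointer arguments passed in registers are covered. The register lists follow the x86-64 System V and AArch64 (AAPCS64) calling conventions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CallingConvention:
    """Hypothetical description of where a call's arguments and result live."""
    name: str
    int_arg_registers: Tuple[str, ...]
    return_register: str

    def arg_register(self, index: int) -> str:
        if not 0 <= index < len(self.int_arg_registers):
            raise IndexError(f"argument {index} is not passed in a register")
        return self.int_arg_registers[index]


SYSV_X86_64 = CallingConvention(
    "x86-64 System V", ("rdi", "rsi", "rdx", "rcx", "r8", "r9"), "rax")
AAPCS64 = CallingConvention(
    "AArch64 AAPCS64", tuple(f"x{i}" for i in range(8)), "x0")


def malloc_size_expression(abi: CallingConvention) -> str:
    """GDB expression for malloc's first argument, valid at function entry."""
    return "$" + abi.arg_register(0)


for abi in (SYSV_X86_64, AAPCS64):
    print(abi.name, malloc_size_expression(abi), "$" + abi.return_register)
```

A plugin written against something like `arg_register(0)` never has to mention `rdi` or `x0` itself, which is the cross-platform hope.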

What to read?

The mallocfree plugin is the place to get an idea of how the abstraction works: PyGDB/src/plugins/mallocfree/PyGDB/plugins/mallocfree/mallocfree.py

Getting it to work

Run the following (note: this starts a sub-shell):


script.sh

#!/bin/bash

git clone https://github.com/re-ffs/PyGDB
cd PyGDB/src
./setup.sh

Then run some program as an argument to scratch.sh:

./scratch.sh ls -l

The MallocCalls it reports are the allocations that haven’t been freed. Try it on some of your own binaries.

Seems pretty hacky

Yes, at the moment it is incredibly hacky. No unit or integration tests, so pretty lame. If it becomes useful I’ll tidy it up a bit. But there is a lot to do:

  • Handle forks
  • Handle threads
  • Communicate with Ghidra
  • Make a proper install script

But I only have so much time.


Dan Farrell

Written by Dan Farrell who lives and works in Seattle tinkering away on firmware. To subscribe send an email to subscribe@re-ffs.com.