July 10, 2019
This is my very much work-in-progress tool for Python-based GDB debugging. It doesn’t do much at the moment, and it may be a while until it does. Still, it is more than nothing.
I’m not planning to do a big design and get everything nailed down all in one go. This is an evolutionary project, like many SRE tools. It serves as a base for tackling the illegitimate software.
For others, it can be an example to base their own work on. It just introduces ideas; food for thought, so to speak. And it’s licensed as such: GPL3, such a nasty license.
What have we got
As a proof of concept, I’ve made a malloc/free tracker. This is pretty boring for sure, but introduces some concepts.
What it actually does is:
Tracks when and where mallocs happen, and what size they are
- Extracting function arguments
- Recording return values
- Locating where calls took place
Tracks when frees happen and which addresses are freed
Provides a summary at the end of execution
- Basic data manipulation
This means you can see what memory hasn’t been freed by the end of execution. It’s probably more of a forward engineering tool, but there is a bit that goes into this.
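To make the bookkeeping concrete, here is a minimal sketch of a tracker core in plain Python. It is not the mallocfree plugin’s actual code: AllocationTracker, on_malloc, on_free and summary are hypothetical names, and the sketch assumes something else (GDB breakpoints) feeds it the size, return value and call site of each malloc, plus the address of each free.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Allocation:
    """One malloc call (hypothetical record type)."""
    seq: int        # order in which the call happened ("when")
    size: int       # first argument to malloc ("what size")
    address: int    # malloc's return value
    callsite: int   # return address of the call ("where")


@dataclass
class AllocationTracker:
    """Hypothetical tracker fed by malloc/free breakpoint handlers."""
    live: Dict[int, Allocation] = field(default_factory=dict)
    unknown_frees: List[int] = field(default_factory=list)
    _seq: int = 0

    def on_malloc(self, size: int, address: int, callsite: int) -> None:
        self._seq += 1
        if address == 0:
            return  # malloc failed, so nothing was allocated
        self.live[address] = Allocation(self._seq, size, address, callsite)

    def on_free(self, address: int) -> None:
        self._seq += 1
        if address == 0:
            return  # free(NULL) is a no-op
        if self.live.pop(address, None) is None:
            # Not from a malloc we saw, e.g. memory from calloc or realloc.
            self.unknown_frees.append(address)

    def summary(self) -> str:
        """List every allocation still live, oldest first."""
        lines = [f"{len(self.live)} allocation(s) not freed:"]
        for a in sorted(self.live.values(), key=lambda a: a.seq):
            lines.append(
                f"  #{a.seq}: {a.size} bytes at {a.address:#x} from {a.callsite:#x}"
            )
        return "\n".join(lines)
```

Inside GDB, one possible wiring is a gdb.Breakpoint on malloc that reads the size argument, a gdb.FinishBreakpoint (or the return register directly, when libc has no debug info and return_value comes back as None) for the address, and a handler on gdb.events.exited that prints summary(). Because the sketch only watches malloc, frees of calloc or realloc memory end up in unknown_frees.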
Pfft, as if there is a bit that goes into this
You caught me out. There isn’t much that goes into it, just a few slight details, and some “design” decisions to make it somewhat abstract. I’m sure when I tackle the illegitimate software these “design” decisions will come back to bite me. I’m not very good at design, and I actually believe only so much concrete design should be put into SRE tools.
Run the following:
#!/bin/bash
gdb --batch --command=./script.gdb /usr/bin/ls
where script.gdb contains:
run
break malloc
And we get:
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
script.gdb  script.sh
[Inferior 1 (process 2728) exited normally]
Breakpoint 1 at 0x7ffff7e2e960 (2 locations)
It says there are 2 locations for malloc!
But, how can there be two, and which one did it choose?
GDB gives you lots of details
Well, GDB tells you a lot about the loaded binaries, including which symbols are in which binaries, the address range of each binary, and a whole bunch of other things. And since we can use Python to control GDB, getting at all of that is no big deal.
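Working out which binary an address belongs to comes down to range lookups. GDB can answer this itself (for example with info sharedlibrary, or gdb.solib_name(address) from Python), but the idea is small enough to sketch directly. The AddressMap class is hypothetical, and any ranges passed to it would come from GDB; none are built in.

```python
import bisect
from typing import List, Optional, Tuple


class AddressMap:
    """Hypothetical lookup: which binary's [start, end) range holds an address."""

    def __init__(self, ranges: List[Tuple[int, int, str]]) -> None:
        # Sort by start address so bisect can find the candidate range.
        self._ranges = sorted(ranges)
        self._starts = [start for start, _, _ in self._ranges]

    def binary_for(self, address: int) -> Optional[str]:
        # Last range whose start is <= address; it may still end before it.
        i = bisect.bisect_right(self._starts, address) - 1
        if i < 0:
            return None
        start, end, name = self._ranges[i]
        return name if start <= address < end else None
```

Each of the locations GDB reports for a breakpoint can be run through binary_for to see which loaded binary it came from.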
The model for symbols
Symbols have particular meaning. If there is more than one address for a given symbol, we’re probably interested in all of them. Probably.
The model is: a single symbol can have more than one address, but a single binary that has the symbol has exactly one address for it.
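That model fits in a small data structure: a map from symbol name to a per-binary map of addresses. This is a hypothetical sketch (SymbolTable and its methods are illustrative names, not PyGDB’s), with the one-address-per-binary rule enforced on insert.

```python
from collections import defaultdict
from typing import Dict, List


class SymbolTable:
    """Hypothetical model: one symbol name -> many addresses, one per binary."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Dict[str, int]] = defaultdict(dict)

    def add(self, name: str, binary: str, address: int) -> None:
        per_binary = self._by_name[name]
        if binary in per_binary and per_binary[binary] != address:
            # The model says a binary has exactly one address per symbol.
            raise ValueError(f"{name} already at {per_binary[binary]:#x} in {binary}")
        per_binary[binary] = address

    def addresses(self, name: str) -> List[int]:
        """Every address the symbol resolves to, across all binaries."""
        return sorted(self._by_name.get(name, {}).values())

    def address_in(self, name: str, binary: str) -> int:
        """The single address of a symbol within one binary (KeyError if absent)."""
        return self._by_name.get(name, {})[binary]
```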
For the SRE folks out there: we know symbols for many target binaries are only so useful. Maybe they’re correct? Maybe they’re decoys? And most functions are static, so there is often no symbol for them at all, especially once the binary is stripped. Yet much of the actual work is done in static functions, so getting drowned in symbol details seems a bit silly.
What if not symbols?
Addresses. We’ll reduce everything to addresses: translate symbols to addresses, keep the symbol’s name as a label in a mapping, but actually work on the addresses.
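Working on addresses rather than names might look like this: every address a symbol resolves to gets the same handler, and the name survives only as a label for output. AddressDispatch, watch, hit and the sub_ fallback label are hypothetical names chosen for illustration; the fallback covers static functions that have no symbol.

```python
from typing import Callable, Dict, Iterable

Handler = Callable[[int], None]


class AddressDispatch:
    """Hypothetical registry: work is keyed on addresses; names are just labels."""

    def __init__(self) -> None:
        self._handlers: Dict[int, Handler] = {}
        self._labels: Dict[int, str] = {}

    def watch(self, name: str, addresses: Iterable[int], handler: Handler) -> None:
        # One symbol may resolve to several addresses; watch every one of them.
        for addr in addresses:
            self._handlers[addr] = handler
            self._labels[addr] = name

    def label(self, address: int) -> str:
        # Addresses without a known name get a synthetic label.
        return self._labels.get(address, f"sub_{address:x}")

    def hit(self, address: int) -> bool:
        """Run the handler for an address; report whether one was registered."""
        handler = self._handlers.get(address)
        if handler is None:
            return False
        handler(address)
        return True
```

In GDB, the watched addresses could be covered by breakpoints on address locations such as gdb.Breakpoint("*0x7ffff7e2e960"), with hit called from an overridden stop method using gdb.selected_frame().pc().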
Thankfully GDB gives a few ways to turn a function name into an address. See
The project: PyGDB
A bit about it
It tries to provide an abstraction around the ABI, and to support plugins, so that hopefully a plugin can be cross-platform.
Unfortunately that means getting it to run is a bit silly.
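One way to read “an abstraction around the ABI” is that plugins ask for “argument n” or “the return value”, and an ABI object maps that onto registers. The sketch below assumes that reading; Abi, SysVAmd64, Aapcs64 and malloc_size are hypothetical names, and only integer or pointer arguments passed in registers are handled. The register assignments themselves are standard: System V AMD64 passes the first six such arguments in rdi, rsi, rdx, rcx, r8 and r9 and returns in rax, while AArch64 (AAPCS64) uses x0 to x7 and returns in x0.

```python
from typing import Callable, Sequence

ReadRegister = Callable[[str], int]


class Abi:
    """Hypothetical ABI interface: plugins ask for arguments, not registers."""
    arg_registers: Sequence[str] = ()
    return_register: str = ""

    def __init__(self, read_register: ReadRegister) -> None:
        self._read = read_register

    def arg(self, n: int) -> int:
        # Only register-passed integer/pointer arguments are handled here.
        if n >= len(self.arg_registers):
            raise NotImplementedError("stack-passed arguments not handled")
        return self._read(self.arg_registers[n])

    def return_value(self) -> int:
        return self._read(self.return_register)


class SysVAmd64(Abi):
    arg_registers = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")
    return_register = "rax"


class Aapcs64(Abi):
    arg_registers = tuple(f"x{i}" for i in range(8))
    return_register = "x0"


def malloc_size(abi: Abi) -> int:
    # A plugin written against Abi works unchanged on either architecture.
    return abi.arg(0)
```

Inside GDB, read_register could plausibly be lambda name: int(gdb.selected_frame().read_register(name)). Outside GDB, a dict’s __getitem__ can stand in for it, which also makes a plugin testable without a debugger.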
What to read?
The mallocfree plugin is the place to get an idea of how the abstraction works.
Getting it to work
Run the following (note: this starts a sub-shell):
#!/bin/bash
git clone https://github.com/re-ffs/PyGDB
cd PyGDB/src
./setup.sh
Then run some program, with its arguments, through scratch.sh:
./scratch.sh ls -l
The summary at the end lists the MallocCalls that haven’t been freed. Try it on some of your own programs.
Seems pretty hacky
Yes, at the moment it is incredibly hacky. No unit or integration tests, so pretty lame. If it becomes useful I’ll tidy it up a bit. But there is a lot to do:
- Handle forks
- Handle threads
- Communicate with Ghidra
- Make a proper install script
But I only have so much time.
Written by Dan Farrell who lives and works in Seattle tinkering away on firmware. To subscribe send an email to email@example.com.