Writing a Linux Debugger Part 5: Source and signals
on April 24, 2017
under c++
In the last part we learned about DWARF information and how it can be used to read variables and associate our high-level source code with the machine code which is being executed. In this part we’ll put this into practice by implementing some DWARF primitives which will be used by the rest of our debugger. We’ll also take this opportunity to get our debugger to print out the current source context when a breakpoint is hit.
As I noted way back at the start of this series, we’ll be using libelfin to handle our DWARF information. Hopefully you got this set up in the first post, but if not, do so now, and make sure that you use the fbreg branch of my fork.
Once you have libelfin building, it’s time to add it to our debugger. The first step is to parse the ELF executable we’re given and extract the DWARF from it. Make these changes to debugger:
open is used instead of std::ifstream because the ELF loader needs a UNIX file descriptor to pass to mmap, so that it can map the file into memory rather than reading it a bit at a time.
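To make the file-descriptor point concrete, here is a minimal sketch of the open-then-mmap pattern using only POSIX calls. It does not use libelfin; read_prefix_via_mmap is a hypothetical helper invented for illustration, and it simply returns the first few bytes of the mapped file.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of the debugger or libelfin): maps a file
// read-only and returns up to its first `n` bytes.
std::string read_prefix_via_mmap(const std::string& path, std::size_t n) {
    // mmap needs a raw file descriptor, which std::ifstream does not expose.
    const int fd = open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);

    struct stat st {};
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        throw std::runtime_error("cannot stat " + path);
    }
    const auto size = static_cast<std::size_t>(st.st_size);

    void* base = mmap(nullptr, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  // the mapping remains valid after the descriptor is closed
    if (base == MAP_FAILED) throw std::runtime_error("cannot mmap " + path);

    std::string prefix(static_cast<const char*>(base), std::min(n, size));
    munmap(base, size);
    return prefix;
}
```

Calling this on an ELF executable with n set to 4 should give back the ELF magic bytes, which is a quick sanity check that the mapping worked.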
Debug information primitives
Next we can implement functions to retrieve line entries and function DIEs from PC values. We’ll start with get_function_from_pc:
Here I take a naive approach of just iterating through compilation units until I find one which contains the program counter, then iterating through the children until we find the relevant function (DW_TAG_subprogram). As mentioned in the last post, you could handle things like member functions and inlining here if you wanted.
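The shape of that naive search can be sketched without any DWARF library at all. The types below (FunctionRange, CompilationUnit) are simplified stand-ins made up for this example, not libelfin's API; each holds a half-open address range, with high_pc being the first address past the end.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Simplified stand-ins for DWARF entries; these are assumptions for
// illustration, not types from libelfin.
struct FunctionRange {
    std::string name;
    std::uint64_t low_pc;
    std::uint64_t high_pc;  // first address past the end of the function
};

struct CompilationUnit {
    std::uint64_t low_pc;
    std::uint64_t high_pc;
    std::vector<FunctionRange> functions;
};

// Linear search: find the unit containing pc, then the function inside it.
// Returns nullptr when no function covers pc.
const FunctionRange* find_function(const std::vector<CompilationUnit>& units,
                                   std::uint64_t pc) {
    for (const auto& cu : units) {
        if (pc < cu.low_pc || pc >= cu.high_pc) continue;
        for (const auto& fn : cu.functions) {
            if (pc >= fn.low_pc && pc < fn.high_pc) return &fn;
        }
    }
    return nullptr;
}
```

A real implementation walks DIEs rather than flat vectors, and would need extra work for member functions and inlined code, but the two nested loops are the same idea.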
Next is get_line_entry_from_pc:
Again, we find the correct compilation unit, then ask the line table to get us the relevant entry.
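As a rough picture of what a line table lookup does, here is a sketch over a toy table. LineEntry and find_line_entry are hypothetical names for this example only; the table is assumed to be sorted by address, and details of real DWARF line tables such as end-of-sequence markers are left out.

```cpp
#include <algorithm>
#include <cstdint>
#include <iterator>
#include <vector>

// Toy line table row; an assumption for illustration, not a libelfin type.
struct LineEntry {
    std::uint64_t address;  // first machine address generated for this line
    unsigned line;          // source line number
};

// Returns the last entry whose address is <= pc, or nullptr if pc lies
// before the first entry. `table` must be sorted by address.
const LineEntry* find_line_entry(const std::vector<LineEntry>& table,
                                 std::uint64_t pc) {
    auto it = std::upper_bound(
        table.begin(), table.end(), pc,
        [](std::uint64_t value, const LineEntry& e) { return value < e.address; });
    if (it == table.begin()) return nullptr;
    return &*std::prev(it);
}
```

The key point is that a PC in the middle of a line's instructions still maps to that line, because each row covers every address up to the next row.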
The additional piece of infrastructure we need is an offset_load_address function. Remember that the program counter is using addresses based on where the binary was loaded, but the original binary may be position-independent and be using offsets as addresses. As such, we may need to offset the program counter so it’s using the right base.
We’ll update debugger::run to find the load address of the program after the debuggee launches successfully:
Add a uint64_t m_load_address; member to the debugger type, then we can set it in initialise_load_address like so:
I’m cheating by reading the first address from the process’s /proc/<pid>/maps file. We should really check that this address corresponds to the binary we’re looking for rather than some other dynamically-loaded library, but since we’ve disabled address space layout randomization, the first entry is the one we need. If you find the load address is wrong, check the maps to ensure that this is the case.
offset_load_address then subtracts the load address from whatever address we’re given:
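The subtraction itself is tiny. In this sketch the load address is passed as an explicit second parameter so that the function stands alone; in the debugger it would come from the m_load_address member instead.

```cpp
#include <cstdint>

// Converts a runtime address into an offset relative to where the binary
// was loaded. The explicit load_address parameter is an assumption made to
// keep this example free-standing.
constexpr std::uint64_t offset_load_address(std::uint64_t addr,
                                            std::uint64_t load_address) {
    return addr - load_address;
}
```

For example, with a load address of 0x555555554000, a program counter of 0x555555554a3e becomes 0xa3e, which is the form of address the debug information uses for a position-independent binary.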
Printing source
When we hit a breakpoint or step around our code, we’ll want to know where in the source we end up.
Now that we can print out source, we’ll need to hook this into our debugger. A good place to do this is when the debugger gets a signal from a breakpoint or (eventually) single step. While we’re at it, we might want to add some better signal handling to our debugger.
Better signal handling
We want to be able to tell what signal was sent to the process, but we also want to know how it was produced. For example, we want to be able to tell if we just got a SIGTRAP because we hit a breakpoint, or if it was because a step completed, or a new thread spawned, etc. Fortunately, ptrace comes to our rescue again. One of the possible commands to ptrace is PTRACE_GETSIGINFO, which will give you information about the last signal which the process was sent. We use it like so:
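A minimal sketch of the call, written as a free function that takes the pid and reports failure instead of assuming the request succeeds. The name get_signal_info and the bool-plus-out-parameter shape are choices made for this example.

```cpp
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>

// Fills `info` with details of the last signal delivered to the tracee.
// Returns false if ptrace fails, for instance when `pid` is not a process
// we are currently tracing.
bool get_signal_info(pid_t pid, siginfo_t& info) {
    return ptrace(PTRACE_GETSIGINFO, pid, nullptr, &info) != -1;
}
```

The request only makes sense while the debuggee is stopped under ptrace, so it belongs right after a successful waitpid.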
This gives us a siginfo_t object, which provides the following information:
I’ll just be using si_signo to work out which signal was sent, and si_code to get more information about the signal. The best place to put this code is in our wait_for_signal function:
Now to handle SIGTRAPs. It suffices to know that SI_KERNEL or TRAP_BRKPT will be sent when a breakpoint is hit, and TRAP_TRACE will be sent on single step completion:
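The mapping from si_code to "what happened" can be sketched as a small classifier. TrapKind and classify_sigtrap are hypothetical names for this example; the si_code constants come from <signal.h> on Linux.

```cpp
#include <signal.h>

// What a SIGTRAP means for the debugger, judged by si_code alone.
enum class TrapKind { Breakpoint, SingleStep, Unknown };

TrapKind classify_sigtrap(const siginfo_t& info) {
    switch (info.si_code) {
        case SI_KERNEL:   // either of these is sent when a breakpoint is hit
        case TRAP_BRKPT:
            return TrapKind::Breakpoint;
        case TRAP_TRACE:  // sent when a single step completes
            return TrapKind::SingleStep;
        default:
            return TrapKind::Unknown;
    }
}
```

In the Breakpoint case the debugger would then correct the program counter and print the surrounding source; in the SingleStep case there is nothing extra to do.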
There are a bunch of different signals and flavours of signals which you could handle. See man sigaction for more information.
Since we now correct the program counter when we get the SIGTRAP, we can remove that code from step_over_breakpoint, so it now looks like:
Testing it out
Now you should be able to set a breakpoint at some address, run the program and see the source code printed out with the currently executing line marked with a cursor.
Next time we’ll be adding the ability to set source-level breakpoints. In the meantime, you can get the code for this post here.