Writing a Linux Debugger Part 3: Registers and memory
In the last post we added simple address breakpoints to our debugger. This time we’ll be adding the ability to read and write registers and memory, which will allow us to screw around with our program counter, observe state and change the behaviour of our program.
Series index
- Setup
- Breakpoints
- Registers and memory
- Elves and dwarves
- Source and signals
- Source-level stepping
- Source-level breakpoints
- Stack unwinding
- Handling variables
- Advanced topics
Registering our registers
Before we actually read any registers, we need to teach our debugger a bit about our target, which is x86_64. Alongside sets of general and special purpose registers, x86_64 has floating point and vector registers available. I’ll be omitting the latter two for simplicity, but you can choose to support them if you like. x86_64 also allows you to access some 64 bit registers as 32, 16, or 8 bit registers, but I’ll be sticking to 64. Due to these simplifications, for each register we need its name, its DWARF register number, and where it is stored in the structure returned by ptrace. I chose to have a scoped enum for referring to the registers, then I laid out a global register descriptor array with the elements in the same order as in the ptrace register structure.
You can typically find the register data structure in /usr/include/sys/user.h if you’d like to look at it yourself, and the DWARF register numbers are taken from the System V x86_64 ABI.
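As a rough sketch of what such an enum and descriptor table could look like: g_register_descriptors is the name used later in this post, while reg, reg_descriptor, dwarf_r and n_registers are names assumed here for illustration. The order follows the fields of user_regs_struct on x86_64 Linux, and -1 marks registers that have no DWARF number of their own.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Scoped enum naming each 64-bit register, in user_regs_struct order.
enum class reg {
    r15, r14, r13, r12, rbp, rbx, r11, r10, r9, r8,
    rax, rcx, rdx, rsi, rdi, orig_rax,
    rip, cs, eflags, rsp, ss,
    fs_base, gs_base, ds, es, fs, gs
};

constexpr std::size_t n_registers = 27;

// Hypothetical descriptor: which register, its DWARF number, its name.
struct reg_descriptor {
    reg r;
    int dwarf_r;       // -1 when the register has no DWARF number
    std::string name;
};

// Entries are in the same order as the fields of user_regs_struct.
const std::array<reg_descriptor, n_registers> g_register_descriptors{{
    {reg::r15, 15, "r15"},
    {reg::r14, 14, "r14"},
    {reg::r13, 13, "r13"},
    {reg::r12, 12, "r12"},
    {reg::rbp, 6, "rbp"},
    {reg::rbx, 3, "rbx"},
    {reg::r11, 11, "r11"},
    {reg::r10, 10, "r10"},
    {reg::r9, 9, "r9"},
    {reg::r8, 8, "r8"},
    {reg::rax, 0, "rax"},
    {reg::rcx, 2, "rcx"},
    {reg::rdx, 1, "rdx"},
    {reg::rsi, 4, "rsi"},
    {reg::rdi, 5, "rdi"},
    {reg::orig_rax, -1, "orig_rax"},
    {reg::rip, -1, "rip"},
    {reg::cs, 51, "cs"},
    {reg::eflags, 49, "eflags"},
    {reg::rsp, 7, "rsp"},
    {reg::ss, 52, "ss"},
    {reg::fs_base, 58, "fs_base"},
    {reg::gs_base, 59, "gs_base"},
    {reg::ds, 53, "ds"},
    {reg::es, 50, "es"},
    {reg::fs, 54, "fs"},
    {reg::gs, 55, "gs"},
}};
```

Keeping the table in structure order is what makes the index trick used for reading and writing registers possible.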
Now we can write a bunch of functions to interact with registers. We’d like to be able to read registers, write to them, retrieve a value from a DWARF register number, and look up registers by name and vice versa. Let’s start with implementing get_register_value:
Again, ptrace gives us easy access to the data we want. We construct an instance of user_regs_struct and give that to ptrace alongside the PTRACE_GETREGS request.
Now we want to read regs depending on which register was requested. We could write a big switch statement, but since we’ve laid out our g_register_descriptors table in the same order as user_regs_struct, we can search for the index of the register descriptor, and access user_regs_struct as an array of uint64_ts.
The cast to uint64_t is safe because user_regs_struct is a standard layout type, but I think the pointer arithmetic is technically UB. No current compilers even warn about this and I’m lazy, but if you want to maintain utmost correctness, write a big switch statement.
set_register_value is much the same, except that we write to the location and then write the registers back at the end:
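The read and write paths described above can be sketched roughly as follows. This is not a definitive implementation: g_register_order is a cut-down stand-in for the descriptor table that holds only the enum values, read_slot and write_slot are hypothetical helpers, and memcpy is swapped in for the pointer cast to sidestep the UB question. Error checking on the ptrace calls is left out, and the layout assumed is x86_64 Linux.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

enum class reg {
    r15, r14, r13, r12, rbp, rbx, r11, r10, r9, r8,
    rax, rcx, rdx, rsi, rdi, orig_rax,
    rip, cs, eflags, rsp, ss,
    fs_base, gs_base, ds, es, fs, gs
};

// Stand-in for g_register_descriptors, reduced to the enum field only.
// The order must mirror the fields of user_regs_struct.
constexpr std::array<reg, 27> g_register_order{{
    reg::r15, reg::r14, reg::r13, reg::r12, reg::rbp, reg::rbx,
    reg::r11, reg::r10, reg::r9, reg::r8, reg::rax, reg::rcx,
    reg::rdx, reg::rsi, reg::rdi, reg::orig_rax, reg::rip, reg::cs,
    reg::eflags, reg::rsp, reg::ss, reg::fs_base, reg::gs_base,
    reg::ds, reg::es, reg::fs, reg::gs
}};

static_assert(sizeof(user_regs_struct) ==
                  g_register_order.size() * sizeof(std::uint64_t),
              "table must cover every 64-bit slot of user_regs_struct");

// Position of a register in the table, and so in user_regs_struct.
std::size_t index_of(reg r) {
    auto it = std::find(g_register_order.begin(), g_register_order.end(), r);
    return static_cast<std::size_t>(it - g_register_order.begin());
}

// Copy one 64-bit slot out of an already fetched register set.
std::uint64_t read_slot(const user_regs_struct& regs, reg r) {
    std::uint64_t value = 0;
    std::memcpy(&value,
                reinterpret_cast<const char*>(&regs) + index_of(r) * sizeof(value),
                sizeof(value));
    return value;
}

// Overwrite one 64-bit slot of a register set.
void write_slot(user_regs_struct& regs, reg r, std::uint64_t value) {
    std::memcpy(reinterpret_cast<char*>(&regs) + index_of(r) * sizeof(value),
                &value, sizeof(value));
}

std::uint64_t get_register_value(pid_t pid, reg r) {
    user_regs_struct regs{};
    ptrace(PTRACE_GETREGS, pid, nullptr, &regs);  // fetch all registers
    return read_slot(regs, r);
}

void set_register_value(pid_t pid, reg r, std::uint64_t value) {
    user_regs_struct regs{};
    ptrace(PTRACE_GETREGS, pid, nullptr, &regs);  // fetch, modify one slot...
    write_slot(regs, r, value);
    ptrace(PTRACE_SETREGS, pid, nullptr, &regs);  // ...and write them all back
}
```

Splitting out read_slot and write_slot keeps the layout logic separate from ptrace, so it can be exercised on a locally filled user_regs_struct without a traced process.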
Next is lookup by DWARF register number. This time I’ll actually check for an error condition just in case we get some weird DWARF information:
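A minimal sketch of the DWARF lookup with its error check might look like this. The table is trimmed to four entries to keep the example short, and reg_from_dwarf is a hypothetical name; a full version would go on to read the value of the register it finds.

```cpp
#include <algorithm>
#include <array>
#include <stdexcept>

enum class reg { rax, rdx, rcx, rbx };

struct reg_descriptor {
    reg r;
    int dwarf_r;
};

// Trimmed table for illustration; the numbers are the System V
// x86_64 DWARF register numbers for these four registers.
constexpr std::array<reg_descriptor, 4> g_register_descriptors{{
    {reg::rax, 0}, {reg::rdx, 1}, {reg::rcx, 2}, {reg::rbx, 3},
}};

// Map a DWARF register number to our enum, throwing on unknown numbers.
reg reg_from_dwarf(int dwarf_r) {
    auto it = std::find_if(
        g_register_descriptors.begin(), g_register_descriptors.end(),
        [dwarf_r](const reg_descriptor& rd) { return rd.dwarf_r == dwarf_r; });
    if (it == g_register_descriptors.end()) {
        throw std::out_of_range{"Unknown dwarf register"};
    }
    return it->r;
}
```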
Nearly finished, now we have register name lookups:
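The name lookups could be sketched along these lines. The function names get_register_name and get_register_from_name are assumptions, the table is again trimmed to a few entries, and throwing on a failed lookup is a choice made for this example rather than something the post prescribes.

```cpp
#include <algorithm>
#include <array>
#include <stdexcept>
#include <string>

enum class reg { rax, rip, rsp };

struct reg_descriptor {
    reg r;
    std::string name;
};

// Trimmed table for illustration only.
const std::array<reg_descriptor, 3> g_register_descriptors{{
    {reg::rax, "rax"}, {reg::rip, "rip"}, {reg::rsp, "rsp"},
}};

// Enum value to printable name.
std::string get_register_name(reg r) {
    auto it = std::find_if(
        g_register_descriptors.begin(), g_register_descriptors.end(),
        [r](const reg_descriptor& rd) { return rd.r == r; });
    if (it == g_register_descriptors.end()) {
        throw std::out_of_range{"Unknown register"};
    }
    return it->name;
}

// Name typed by the user to enum value.
reg get_register_from_name(const std::string& name) {
    auto it = std::find_if(
        g_register_descriptors.begin(), g_register_descriptors.end(),
        [&name](const reg_descriptor& rd) { return rd.name == name; });
    if (it == g_register_descriptors.end()) {
        throw std::out_of_range{"Unknown register name"};
    }
    return it->r;
}
```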
And finally we’ll add a helper to dump the contents of all registers:
As you can see, iostreams has a very concise interface for outputting hex data nicely. Feel free to make an I/O manipulator to get rid of this mess if you like.
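For reference, the iostreams incantation for a zero-padded 16-digit hex value looks roughly like this. format_register is a hypothetical helper that builds one line of a register dump into a string; a dump function would call it once per table entry.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Format one register as "<name> 0x<16 hex digits>".
std::string format_register(const std::string& name, std::uint64_t value) {
    std::ostringstream out;
    out << name << " 0x"
        << std::setfill('0') << std::setw(16) << std::hex << value;
    return out.str();
}
```

Note that std::setw only applies to the next item written, while std::setfill and std::hex stick to the stream until changed, which is one reason to format into a local stream rather than std::cout directly.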
This gives us enough support to handle registers easily in the rest of the debugger, so we can now add this to our UI.
Exposing our registers
All we need to do here is add a new command to the handle_command function. With the following code, users will be able to type register read rax, register write rax 0x42 and so on.
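One way to sketch the parsing side of these commands, separated from the debugger itself: split and is_prefix are assumed helpers, and register_command and parse_register_command are hypothetical names. In handle_command the parsed result would then be passed on to the register functions.

```cpp
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumed helper: split a line on a delimiter.
std::vector<std::string> split(const std::string& s, char delimiter) {
    std::vector<std::string> out;
    std::stringstream ss{s};
    std::string item;
    while (std::getline(ss, item, delimiter)) {
        out.push_back(item);
    }
    return out;
}

// Assumed helper: true if s is a non-empty prefix of of.
bool is_prefix(const std::string& s, const std::string& of) {
    if (s.empty() || s.size() > of.size()) return false;
    return std::equal(s.begin(), s.end(), of.begin());
}

// Hypothetical parsed form of a "register ..." command.
struct register_command {
    enum class kind { dump, read, write, unknown };
    kind action = kind::unknown;
    std::string reg_name;
    std::uint64_t value = 0;
};

register_command parse_register_command(const std::string& line) {
    auto args = split(line, ' ');
    register_command cmd;
    if (args.size() < 2 || !is_prefix(args[0], "register")) return cmd;

    if (is_prefix(args[1], "dump")) {
        cmd.action = register_command::kind::dump;
    } else if (is_prefix(args[1], "read") && args.size() >= 3) {
        cmd.action = register_command::kind::read;
        cmd.reg_name = args[2];
    } else if (is_prefix(args[1], "write") && args.size() >= 4) {
        cmd.action = register_command::kind::write;
        cmd.reg_name = args[2];
        // Base 16 accepts the value with or without a leading 0x.
        cmd.value = std::stoull(args[3], nullptr, 16);
    }
    return cmd;
}
```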
Where is my mind?
We’ve already read from and written to memory when setting our breakpoints, so we just need to add a couple of functions to hide the ptrace call a bit.
You might want to add support for reading and writing more than a word at a time, which you can do by incrementing the address each time you want to read another word. You could also use process_vm_readv and process_vm_writev or /proc/<pid>/mem instead of ptrace if you like.
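A minimal sketch of the word-sized wrappers, plus the multi-word read mentioned above. The names read_memory, write_memory, word_reader and read_words are assumptions, and error handling is omitted (PTRACE_PEEKDATA reports failure through errno, since -1 is also a valid word). read_words takes the single-word reader as a parameter so that the address stepping can be tried out without a traced process.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>
#include <sys/ptrace.h>
#include <sys/types.h>

// Read one 64-bit word from the tracee's memory.
std::uint64_t read_memory(pid_t pid, std::uint64_t address) {
    return ptrace(PTRACE_PEEKDATA, pid, address, nullptr);
}

// Write one 64-bit word into the tracee's memory.
void write_memory(pid_t pid, std::uint64_t address, std::uint64_t value) {
    ptrace(PTRACE_POKEDATA, pid, address, value);
}

// Anything that can read a single word at an address.
using word_reader = std::function<std::uint64_t(std::uint64_t)>;

// Read `count` consecutive words, advancing the address by one word each time.
std::vector<std::uint64_t> read_words(const word_reader& read_word,
                                      std::uint64_t address,
                                      std::size_t count) {
    std::vector<std::uint64_t> words;
    words.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        words.push_back(read_word(address + i * sizeof(std::uint64_t)));
    }
    return words;
}
```

Against a real tracee you would pass something like a lambda that captures the pid and forwards to read_memory.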
Now we’ll add commands for our UI:
Patching continue_execution
Before we test out our changes, we’re now in a position to implement a more sane version of continue_execution. Since we can get the program counter, we can check our breakpoint map to see if we’re at a breakpoint. If so, we can disable the breakpoint and step over it before continuing.
First we’ll add a couple of helper functions for clarity and brevity:
Then we can write a function to step over a breakpoint:
First we check to see if there’s a breakpoint set for the value of the current PC. If there is, we first put execution back to before the breakpoint, disable it, step over the original instruction, and re-enable the breakpoint.
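The sequence above can be modelled without ptrace to make the ordering explicit. In this sketch breakpoint is a stand-in that only tracks an enabled flag, and tracee_ops is a hypothetical bundle of callbacks standing in for reading and writing rip and for a PTRACE_SINGLESTEP followed by a wait. The one fixed fact it leans on is that int3 is a single byte, so after the trap the PC sits one past the breakpoint address.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Stand-in for the debugger's breakpoint class: only the enabled flag
// is modelled here, not the int3 patching itself.
struct breakpoint {
    bool enabled = true;
    void enable() { enabled = true; }
    void disable() { enabled = false; }
    bool is_enabled() const { return enabled; }
};

// Hypothetical view of the tracee used by this sketch.
struct tracee_ops {
    std::function<std::uint64_t()> get_pc;
    std::function<void(std::uint64_t)> set_pc;
    std::function<void()> single_step;
};

void step_over_breakpoint(
    std::unordered_map<std::uint64_t, breakpoint>& breakpoints,
    const tracee_ops& tracee) {
    // int3 is one byte long, so the breakpoint (if any) is at PC - 1.
    auto possible_location = tracee.get_pc() - 1;

    auto it = breakpoints.find(possible_location);
    if (it == breakpoints.end() || !it->second.is_enabled()) return;

    tracee.set_pc(possible_location);  // rewind to the original instruction
    it->second.disable();              // restore the original byte
    tracee.single_step();              // execute that one instruction
    it->second.enable();               // put the breakpoint back
}
```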
wait_for_signal will encapsulate our usual waitpid pattern:
Finally we rewrite continue_execution like this:
Testing it out
Now that we can read and modify registers, we can have a bit of fun with our hello world program. As a first test, try setting a breakpoint on the call instruction again and continue from it. You should see Hello world being printed out. For the fun part, set a breakpoint just after the output call, continue, then write the address of the call argument setup code to the program counter (rip) and continue. You should see Hello world being printed a second time due to this program counter manipulation. Just in case you aren’t sure where to set the breakpoint, here’s my objdump output from the last post again:
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 55 push %rbp
118e: 48 89 e5 mov %rsp,%rbp
1191: 48 8d 35 6d 0e 00 00 lea 0xe6d(%rip),%rsi # 2005 <_ZStL19piecewise_construct+0x1>
1198: 48 8d 3d 81 2e 00 00 lea 0x2e81(%rip),%rdi # 4020 <_ZSt4cerr@@GLIBCXX_3.4>
119f: e8 dc fe ff ff callq 1080 <_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc@plt>
11a4: b8 00 00 00 00 mov $0x0,%eax
11a9: 5d pop %rbp
11aa: c3 retq
You’ll want to move the program counter back to the base address plus 0x1191, the first lea instruction, so that the rsi and rdi registers are set up properly before the call.
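If you’d rather not do the hex addition by hand, the arithmetic is small enough to sketch. mapping_start and absolute_address are hypothetical helpers; the first one assumes you hand it a line from /proc/<pid>/maps, which begins with the start address of the mapping in hex.

```cpp
#include <cstdint>
#include <string>

// Parse the start address from one line of /proc/<pid>/maps.
// Lines begin "start-end perms ...", with both addresses in hex;
// std::stoull stops at the '-' on its own.
std::uint64_t mapping_start(const std::string& maps_line) {
    return std::stoull(maps_line, nullptr, 16);
}

// Turn an objdump offset into an address in the running process.
std::uint64_t absolute_address(std::uint64_t load_address,
                               std::uint64_t offset) {
    return load_address + offset;
}
```

For example, with a made-up load address of 0x555555554000, the offset 0x1191 becomes 0x555555555191, which is the value you would write to rip.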
In the next post, we’ll take our first look at DWARF information and add various kinds of single stepping to our debugger. After that, we’ll have a mostly functioning tool which can step through code, set breakpoints wherever we like, modify data and so forth. As always, drop a comment below if you have any questions!
You can find the code for this post here.
Let me know what you think of this article on twitter @TartanLlama or leave a comment below!