nc stack.overflow.fail 9001
by hk (Solves: 7, 1150 pts)
This weekend I played with WreckTheLine in 3 CTFs \(X_X)/, AeroCTF, Pragyan CTF and UTCTF. I’m really proud because we performed really well for them, finishing 19th, 3rd and 1st(!) respectively. The challenges in UTCTF were a lot more fun than I had expected. I managed to solve most of the pwns, except for Jendy’s(which our resident pwn expert NextLine solved), and solved ECHO together with the team. I had most fun with the PowerPC challenge in particular. When I saw the challenge, I thought it was a perfect opportunity to learn a new architecture and a new tool that would be useful for this challenge (Ghidra).
Power of PC
So far I’ve only ever done x86 challenges, never trying challenges from other architectures like ARM, PowerPC etc. By no means am I a PowerPC expert (or even an amateur), but heres some things I’ve learnt about PowerPC64 over the weekend as I attempted to solve this foreign challenge.
Similar to ARM, PowerPC has a massive number of registers. Fortunately, the register usage in PowerPC64 seems to be pretty similar to x86, so we don’t have too much of an issue. Here is a table of registers and their short descriptions that I made during the CTF.
|______________|__PowerPC Reg__|__________Description__________|___x86 equiv___| |GPR (General) | r1 | Stack pointer | rsp | | | r2 | System-reserved | ? | | | r3-r4 | Parameter passing/Return val | rdi/rsi/rax | | | r5-r10 | Parameter passing | rdx ... r9 | | | r11-r12 | Wo bu zhi dao | ? | | | r13 | small data area pointer (???) | ??? | | | r14-r30 | Local vars | * | | | r31 | Local var/Environment pointers| ? | |______________|_______________|_______________________________|_______________| |SPR (Special) | LR | Link register, saves IP | _ | | | CR | Condition register | EFLAGS | | | CTR | Count register, loop count | rcx | | | XER | Fixed-point exception register| ? | | | FPSCR | Floating-point status and .? | ? | |______________|_______________|_______________________________|_______________|
This is obviously not a comprehensive list, but it is roughly accurate enough to give you an idea of what kind of registers exist in PowerPC. After spending some time reversing the PowerPC binary, their uses may become a bit more clear.
This part tripped me up the most while doing the challenge. The calling convention of PowerPC is rather different from x86, which requires a bit more mental gymnastics for me to wrap my x86 head around. After reading a pretty useful reference, I was able to get a better idea. But what I found most effective was looking at the function prologues and epilogues in the binary, which gave me a good idea of how the calling convention works.
Here is what the stack will look like from the caller.
Low addresses (top of stack) ________________ ________________ |______Code______| | |<- r1 .... | Linkage area | 0: bl callee <- IP |________________| 4: .... | | 8: .... | Parameter area | |________________| High addresses
bl instruction will
branch, which is like a
call in x86. However, instead of pushing the address of the next instruction (4) onto the stack like in x86, in PowerPC, the address of the next instruction will be contained in the link register(
LR). Now, the function prologue of the callee will setup the stack for it’s own stack frame, and set up for the function to return back to the caller properly.
Low addresses (top of stack) ________________ _______________ ________________ |______Code______| |___Registers___| | |<- r1 .... ... | 4 | 1000: mfspr r0, LR < LR: 0x4 |________________| 1004: std r0, 0x10(r1) <- IP r0: 0x4 | | 1008: .... r1: 0x100 | Parameter area | |________________| High addresses
As can be seen here, the value of the link register(4) will be moved into
r0. The value of
r0 is then stored into the linkage area of the caller’s stack, since
r1 has not been changed yet. Now that the address to return to is stored, the callee prologue will setup it’s own stack frame.
Low addresses (top of stack) ________________ _______________ ________________ |______Code______| |___Registers___| | |<- r1 .... ... | Callee stack | 1014: stdu r1, -0x30(r1) < r1: 0xd0 |________________| 1018: or r31, r1, r1 <- IP r31: 0xd0 | | 101c: .... | 4 | |________________| | | | Parameter area | |________________| High addresses
To setup it’s own stack frame, the callee will decrease the stack pointer
r1 by the number of bytes required for it’s own variables, parameters for any function it will call, and it’s own link area.
r31 is also set to the value of
r1, so it can act as a stack pointer too.
Low addresses (top of stack) ________________ _______________ ________________ |______Code______| |___Registers___| | |<- r1 .... ... | 4 | 10f0: addi r1, r31, 0x30 < LR: 0x4 |________________| 10f4: ld r0, 0x10(r1) < r0: 0x4 | | 10f8: mtspr LR, r0 < r1: 0x100 | Parameter area | 10fc: blr <- IP r31: 0xd0 |________________| 1100: .... High addresses
When the callee has finished execution. The function epilogue will proceed to increase the stack pointer
r1 by 0x30 to get rid of the callee stack. Then,
r0 will load the saved address in the linkage area of the caller. The link register
LR is set to the value of
r0. Afterwards, the
blr instruction will cause program execution to jump to the address of
LR, which is what was saved on the stack earlier (4 in this case). Now the caller’s execution will continue.
Now that the prerequisite knowledge for PowerPC has been covered. The exploitation of the challenge binary begins. Since this challenge was in PowerPC, I wanted to try out the decompiler of Ghidra, to see how well it can help me in this challenge. And I was pleasantly surprised by how helpful Ghidra was in this challenge. Here is the decompiled
main of the challenge binary.
get_input() function first reads 1000 bytes from the user into a global buffer
buf. The amount of input we sent is determined using
strlen(), and then our input is xor’d with the constant 0xcb. The bug of this challenge is found in the
encrypt function that is later called.
As you can see, the
memcpy(local_88, buf, 1000) will cause a gigantic buffer overflow, since
local_88 only holds space for 104 bytes. So how can we exploit this?
Back to classics
This looks like a classic buffer overflow challenge, just with a twist that it is in a PowerPC binary. So let’s apply what we’ve learnt earlier to solve this challenge. Now first thing’s first, whatever payload we send is going to be xor’d with the constant 0xcb. So we are going to have to work around this. There are two options. Firstly, we can xor our payload with 0xcb before sending it over. When it is xor’d again inside the binary, we get back our original payload. This is based on the basic xor principle of something.
(A ^ B) ^ B == A
Alternatively, we can send a null byte at the start of our payload. This will cause
strlen to return 0, so the for loop of the xor cipher will not even run. This is what I did.
Now that that is out of the way, we can get to exploiting the buffer overflow. If you’ve understood the earlier explanation of the calling convention in PowerPC, you can see that the address that a function will return to is stored below the function’s stack frame. With a buffer overflow, we can easily overwrite this address to something else, allowing us to have IP control! This is very similar to exploitation in x86.
Low addresses (top of stack) ________________ ________________ | |<- r1 | |<- r1 | Callee stack | | AAAAAAAAAAA.. | |________________| |________________| | | | | | Saved ret addr | | AAAAAAAAAAA.. | (overwritten!) |________________| |________________| | | | | | Parameter area | | Parameter area | |________________| |________________| High addresses
Rop Rop Rop Rop
In a x86 challenge with a buffer overflow, we always do return-oriented programming (ROP) to gain a shell. It’s the same here. However, there is a difference, since PowerPC does not have a
ret instruction that pops of from the stack. Instead, we need to find gadgets that set the value of the link register
LR from a value on the stack. Then the gadget should end with a
blr instruction which jumps to the address in
LR. Just with any other binary, I threw the challenge into ropper to find the gadgets we need!
Seems we’re going to have to find our gadgets. How can we do this? I decided to mess around in Ghidra, to learn more about it’s capabilities. That’s when I found this really useful feature,
Search > For Matching Instructions. What this feature does is that it allows for us to select an existing instruction, and to find similar instructions in the rest of the binary. To give an example, if we had the instruction
ld r0, 0x10(r1), we could search for many similar variants.
ld r0, 0x10(r1) ld r*, 0x10(r1) ld r0, *(r1) ld *, *(*) ...
I found this feature really powerful and useful when looking for rop gadgets. To search for “pop” gadgets, I searched for
ld r*, *(r1), such gadgets would load a value from the stack into the register. Going through a few of them yielded many very useful gadgets. To save the gadgets, I bookmarked them in Ghidra using the
Bookmark feature, and categorised them together as “gadgets”.
To make sure they were usable gadgets, I checked if they would eventually set the value of
LR to something that is present on the stack, and then
blr to jump to that address. If there wasn’t such an instruction in the gadget, the gadget would not be chainable, thus pretty useless for our needs.
Now we roughly know how to find the gadgets we need in Ghidra, so what should our ROP chain look like? A quick search using the symbol tree shows that we don’t have
system or any
exec variant in the statically linked challenge binary.
In such a situation, ropping all the way to a shell might become more difficult. A useful technique would be to stage our attack. Rather than aiming to rop straight to popping a shell, we can try writing a ROP chain that allows us to jump to shellcode instead! Let’s check if there are any exectuable AND writable sections in the challenge.
GNU_STACK ... RWE. Perfect! We have an executable stack. Therefore, we can write PowerPC shellcode on the stack to pop a shell, then find a gadget that allows us to jump to the stack.
However, finding such a gadget turned out to be harder than I thought, as I could not find gadgets that moved the value of
r1 eventually into the link register
LR. This makes sense since there would basically never be a case where a normal binary needs to jump into the stack.
When in doubt, csu_init
I found the solution to my problem in a familiar gadget that I love to use in x86. The beloved
__libc_csu_init gadget! Luckily for us, this PowerPC binary also had the
__libc_csu_init gadget. The key to using this gadget is the following instructions.
100019a0 09 00 3e e9 ldu r9,0x8(r30) ... 100019b8 a6 03 29 7d mtspr CTR,r9 ... 100019c0 21 04 80 4e bctrl
ldu instruction will dereference
r30+0x8 and put the qword into
r9. In x86, this is equivalent to
mov r9, qword ptr[r30+0x8]. The value of
r9 is then loaded into
bctrl will jump to the address in
CTR. If we control the value of
r30 this gadget can be really powerful! Luckily for us,
__libc_csu_init handles that too.
100019d4 10 00 01 e8 ld r0,local_res10(r1) ... 100019e8 f0 ff c1 eb ld r30,local_10(r1) ... 100019f0 a6 03 08 7c mtspr LR,r0 100019f4 20 00 80 4e blr
With this other gadget, we are able to control our return address (so it’s chainable), and we also set the value of
r30 from a value on the stack.
With these two gadgets chained together, we can jump to the stack if we have a known address that contains a pointer to the stack. This would be the
.bss section of our binary. The section is always mapped to the same address since PIE is not enabled, so we just need to find where the pointers are. I did this using a feature added to the
scan by my friend daniellimws.
gef➤ vmmap Start End Offset Perm Path ... (this is the .bss) 0x00000000100ce000 0x00000000100d2000 0x00000000000be000 rw- /home/lord_idiot/CTF/utctf19/PPower_enCryption/ppc 0x00000000100d2000 0x00000000100f6000 0x0000000000000000 rw- ... (this is the stack) 0x0000004000001000 0x0000004000801000 0x0000000000000000 rw- ... gef➤ scan 0x00000000100d2000-0x00000000100f6000 0x00000040007ff000-0x0000004000800000 [+] Searching for addresses in '0x00000000100d2000-0x00000000100f6000' that point to '0x00000040007ff000-0x0000004000800000' 0x00000000100d2498│+0x0498: 0x00000040007ffb78 → 0x00000040007fff44 → "XDG_SEAT=seat0" 0x00000000100d33f0│+0x13f0: 0x00000040007ffdc8 → 0x0000000000000016 0x00000000100d3490│+0x1490: 0x00000040007ffb68 → 0x00000040007fff3e → 0x4458006370702f2e ("./ppc"?) 0x00000000100d4100│+0x2100: 0x00000040007ff7e0 → 0x7822847c48006138 ("8a"?)
The final address
0x00000000100d4100 was the magic address, as the stack pointer it had (
0x00000040007ff7e0) fell within the 1000 byte buffer overflow we had in the stack. So I just had to make sure that my shellcode ended up in this area.
Chaining everything together (I’ll leave the shellcode writing as an exercise for the reader), we get our ROP chain that jumps to the shellcode that pops our shell!
mtspr LR, r0; blr
The journey to solving this challenge was really fun for me, and now I’m a bit more familiar with PowerPC and Ghidra. I think this tool is actually really cool, and I’m loving that it’s able to decompile all sorts of weird architectures (even if it isn’t a perfect decompile). Also, I’m pretty proud that after struggling through this challenge, I was able to solve this challenge in second place, about 2 hours slower than the first blood team.
Since Ghidra was pretty useful for solving this challenge, I also decided to give it a quick makeover!