Today was a really fun puzzle. I got through the first part quite easily. On the second part, I first misunderstood the description, then wrote a ~32 second solution, refactored it into a state where the example worked but the real input didn’t, and finally found optimisations (all on my own!) that got it down to ~2 seconds.

Today’s solution took a ton of iterations, so don’t imagine I came up with it in its final form right from the start.

Let’s see how that happened!

Read input

Our input is a single line of digits:

2333133121414131402

where the digits alternate between file lengths and empty space lengths on a disk.

For example, the first 2 is a file with a length of 2 and an id of 0. The next 3 is 3 empty blocks. The following 3 is another file with a length of 3 and an id of 1. And so on until the end.

Today’s input parsing is a bit convoluted, mostly because of the optimisations for part 2.

def map_fn(line):
    # filesystem is the state of disk
    filesystem = {}
 
    file_lengths = {}
    file_start_indices = {}
    empty_space_lengths = []
 
    file_id = 0
    fs_pointer = 0
 
    for idx, value in enumerate(line):
        value = int(value)
        # Is file
        if idx % 2 == 0:
            file_start_indices[file_id] = fs_pointer
            file_lengths[file_id] = value
            for _ in range(value):
                filesystem[fs_pointer] = file_id
                fs_pointer += 1
            file_id += 1
        # Is empty space
        else:
            empty_space_lengths.append((fs_pointer, value))
            fs_pointer += value
 
    return filesystem, (file_lengths, empty_space_lengths, file_start_indices)

We loop over every number in the line. Even indices are files and odd ones are empty spaces.

In part 1, we don’t care about file_lengths, empty_space_lengths or file_start_indices, which is why I return them as a separate tuple so they are easier to ignore.

For every file, we fill the filesystem (a dict acting as a sparse grid) with that file’s id, once per block, so we keep track of where each file block lives.

For the empty spaces, we don’t write anything into the filesystem (part 1 doesn’t need them); we just advance the filesystem pointer (fs_pointer), which is the index of the next disk slot.
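
To make that concrete, parsing just the first five digits of the example line (23331) produces structures like these:

# After parsing the prefix "23331" of the example line:
filesystem = {0: 0, 1: 0, 5: 1, 6: 1, 7: 1, 11: 2}  # disk slot -> file id
file_lengths = {0: 2, 1: 3, 2: 1}                    # file id -> length
file_start_indices = {0: 0, 1: 5, 2: 11}             # file id -> first disk slot
empty_space_lengths = [(2, 3), (8, 3)]               # (first slot, length) per gap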

Part 1

In part 1, we need to compact the filesystem so there are no gaps left. We do that by moving file blocks, one at a time, from the end of the disk into the empty spots near the front.

The example above expanded to filesystem looks like:

00...111...2...333.44.5555.6666.777.888899

and to fill the first empty spot (represented by .), we take the last 9 and move it into that spot, creating

009..111...2...333.44.5555.6666.777.88889.
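
Repeating the same move, the next two steps look like this:

0099.111...2...333.44.5555.6666.777.8888..
00998111...2...333.44.5555.6666.777.888...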

To do that, we loop over every possible index (from 0 to the maximum used index, max(fs)).

If the value in the filesystem at a given index is a file block (i.e. the index exists in our sparse dictionary), we continue because we don’t need to move anything.

If it’s an empty space, we first find the index of the right-most file block and make sure it’s still to the right of our current position (otherwise we’d be moving blocks to later spots, so we stop).

We then move the right-most block into the current empty space and delete it from its old position.

To calculate the result, we sum the product of each file id with its index in the filesystem.

def part_1():
    fs, _ = read_input(9, map_fn)[0]
    max_idx = max(fs)
 
    for i in range(max_idx + 1):
        if fs.get(i) is not None:
            # It's a file in a correct spot, do nothing
            continue
        while fs.get(max_idx) is None:
            # Find the right-most file index
            max_idx -= 1
        if i >= max_idx:
            # Don't move files to the right
            break
 
        # Move from end to the current empty spot
        fs[i] = fs[max_idx]
 
        # Delete old file block pointer
        del fs[max_idx]
        max_idx -= 1
 
    checksum = sum(index * file_id for index, file_id in fs.items())
 
    print(f"Part 1: {checksum}")

Part 2

I then continued to the second part but misunderstood the puzzle. Turns out, I had technically misunderstood it already in the first part, but since we moved file blocks one at a time, it didn’t matter there.

What I tried first was to go through all the empty spots and, for each one, find the right-most file that fits into it.

The actual puzzle was to go through all the files from right to left and find the earliest spot each one fits into. A tiny difference but a completely different outcome.

My initial solution was to loop over all the file ids and for each file id, start from the left and find the next open slot.

This meant looping over a long filesystem over and over again for every file id, which produced the correct answer but took over 30 seconds. Sometimes I’m happy with that, but today I knew I could figure out a better solution myself, so I kept going.
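
That first version boiled down to rescanning the disk dictionary from the start for every single file, with a helper roughly like this (a sketch of the idea, not the exact code I had; find_leftmost_gap is just an illustrative name):

def find_leftmost_gap(fs, length, limit):
    # Scan disk slots 0..limit-1 for `length` consecutive empty slots.
    # `limit` is the file's current start, so we never move a file right.
    run_start, run = 0, 0
    for i in range(limit):
        if fs.get(i) is None:
            if run == 0:
                run_start = i
            run += 1
            if run == length:
                return run_start
        else:
            run = 0
    return None

Calling something like that once per file means walking over most of the disk again and again, which is where the 30+ seconds went.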

Let’s go through the solution one step at a time:

First, I read in and parse the data:

def part_2():
    fs, (file_lengths, empty_lengths, file_start_indices) = read_input(9, map_fn)[0]

Since file_lengths is a dictionary of file_id -> file_length, I could go through all the file ids in reverse order simply by iterating over its keys in reverse. In Python, if you pass a dictionary to anything that accepts an iterable, it gives you its keys.

(If you want the values, you can use file_lengths.values(), and if you want (key, value) tuples, file_lengths.items().)

for file_id in reversed(file_lengths):
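
As a quick illustration with a throwaway dict (note that reversed() accepts a dict directly since Python 3.8):

d = {0: 2, 1: 3, 2: 1}
list(d)            # [0, 1, 2]: iterating a dict yields its keys
list(reversed(d))  # [2, 1, 0]: file ids from last to first
list(d.values())   # [2, 3, 1]
list(d.items())    # [(0, 2), (1, 3), (2, 1)]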

I then had a helper function to find the next empty slot for a specific length. In it, we loop over all the empty spots and, as soon as we find one that’s large enough to fit the desired file, we return its index in the empty spaces list (index) and its position in the filesystem (fs_idx).

def find_empty_slot(empty_spaces, length):
    for index, (fs_idx, empty_length) in enumerate(empty_spaces):
        if empty_length >= length:
            return (index, fs_idx)
    return None

Inside our loop, we get the current file’s length:

length = file_lengths[file_id]

and find an empty spot for it

empty_slot = find_empty_slot(empty_lengths, length)

If there was no empty slot, we move on to the next file and leave the current one where it is

if empty_slot is None:
	continue

We then unpack the return value and look up the starting index of the current file.

If the empty slot is further to the right than where the file starts, we skip this file and leave it where it is, because we don’t want to move files to the right.

empty_lengths_idx, empty_fs_idx = empty_slot
 
start_idx = file_start_indices[file_id]
if empty_fs_idx > start_idx:
	continue

Then we move all of the file’s blocks into the empty space we found and delete the references to their old positions.

for k in range(length):
	fs[empty_fs_idx + k] = fs[start_idx + k]
	del fs[start_idx + k]

Finally, we need to update our empty space information to account for the space we just used, so we don’t try to put anything else into the slots we just filled with a file.

empty_lengths[empty_lengths_idx] = (
	empty_fs_idx + length,
	empty_lengths[empty_lengths_idx][1] - length,
)

This is the full process of moving a file to its new location.

Once we are done with all the files, we calculate the checksum as previously:

checksum = sum(index * file_id for index, file_id in fs.items())
 
print(f"Part 2: {checksum}")

What a lovely puzzle. I had so much fun with it and I’m so proud I found the optimisations all by myself.

At the core of today’s optimisations was doing the work up front in parsing, so we get constant-time access to things that would otherwise have required a ton of nested looping.