Advent of Code - 2022
This is a solution to Day 7 of Advent of Code 2022.
Day 7 - No Space Left On Device
You can hear birds chirping and raindrops hitting leaves as the expedition proceeds. Occasionally, you can even hear much louder sounds in the distance; how big do the animals get out here, anyway?
The device the Elves gave you has problems with more than just its communication system. You try to run a system update:
$ system-update --please --pretty-please-with-sugar-on-top
Error: No space left on device
Perhaps you can delete some files to make space for the update?
You browse around the filesystem to assess the situation and save the resulting terminal output (your puzzle input). For example:
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
The filesystem consists of a tree of files (plain data) and directories (which can contain other directories or files). The outermost directory is called /. You can navigate around the filesystem, moving into or out of directories and listing the contents of the directory you're currently in.
Within the terminal output, lines that begin with $ are commands you executed, very much like some modern computers:
- cd means change directory. This changes which directory is the current directory, but the specific result depends on the argument:
  - cd x moves in one level: it looks in the current directory for the directory named x and makes it the current directory.
  - cd .. moves out one level: it finds the directory that contains the current directory, then makes that directory the current directory.
  - cd / switches the current directory to the outermost directory, /.
- ls means list. It prints out all of the files and directories immediately contained by the current directory:
  - 123 abc means that the current directory contains a file named abc with size 123.
  - dir xyz means that the current directory contains a directory named xyz.
Given the commands and output in the example above, you can determine that the filesystem looks visually like this:
- / (dir)
  - a (dir)
    - e (dir)
      - i (file, size=584)
    - f (file, size=29116)
    - g (file, size=2557)
    - h.lst (file, size=62596)
  - b.txt (file, size=14848514)
  - c.dat (file, size=8504156)
  - d (dir)
    - j (file, size=4060174)
    - d.log (file, size=8033020)
    - d.ext (file, size=5626152)
    - k (file, size=7214296)
Here, there are four directories: / (the outermost directory), a and d (which are in /), and e (which is in a). These directories also contain files of various sizes.
Since the disk is full, your first step should probably be to find directories that are good candidates for deletion. To do this, you need to determine the total size of each directory. The total size of a directory is the sum of the sizes of the files it contains, directly or indirectly. (Directories themselves do not count as having any intrinsic size.)
The total sizes of the directories above can be found as follows:
- The total size of directory e is 584 because it contains a single file i of size 584 and no other directories.
- The directory a has total size 94853 because it contains files f (size 29116), g (size 2557), and h.lst (size 62596), plus file i indirectly (a contains e which contains i).
- Directory d has total size 24933642.
- As the outermost directory, / contains every file. Its total size is 48381165, the sum of the size of every file.
To begin, find all of the directories with a total size of at most 100000, then calculate the sum of their total sizes. In the example above, these directories are a and e; the sum of their total sizes is 95437 (94853 + 584). (As in this example, this process can count files more than once!)
Find all of the directories with a total size of at most 100000. What is the sum of the total sizes of those directories?
Today's puzzle was super fun! It was also a bit hard for me, as usual, because there was a tree, trees mean recursion, and I really have a hard time grasping and debugging recursive solutions.
Read input
To start with, I created three namedtuples to serve as intermediate data types between parsing the input file and building the filesystem tree. Adding the final else: raise Exception branch let me confirm that I had not missed any kind of input line, which could otherwise have caused a lot of hard-to-find bugs.
from utils import read_input
from collections import namedtuple

Command = namedtuple('Command', ['command', 'target'], defaults=(None, None))
Directory = namedtuple('Directory', ['name'])
File = namedtuple('File', ['name', 'size'])

def transformer(line):
    if line.startswith('$'):
        prompt = line.split(' ')
        if len(prompt) == 2:
            return Command(command=prompt[1])
        elif len(prompt) == 3:
            return Command(command=prompt[1], target=prompt[2])
    elif line[0].isnumeric():
        size, name = line.split(' ')
        return File(name, int(size))
    elif line.startswith('dir'):
        _, name = line.split(' ')
        return Directory(name)
    else:
        raise Exception('Unknown line')

listings = read_input(7, transformer)
examples = read_input(7, transformer, True)
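The read_input helper lives in a separate utils module that is not shown here. As a rough sketch of what such a helper could look like, assuming the inputs are stored as day_7.txt and day_7_example.txt in an inputs folder (the file layout, the base_dir parameter and the blank-line handling are all assumptions made for illustration, not the actual helper):

```python
from pathlib import Path


def read_input(day, transformer=str, example=False, base_dir="inputs"):
    """Hypothetical stand-in for utils.read_input.

    Assumes files named day_<n>.txt and day_<n>_example.txt inside base_dir;
    the real helper may be organized differently.
    """
    suffix = "_example" if example else ""
    path = Path(base_dir) / f"day_{day}{suffix}.txt"
    # Drop line endings and skip blank lines before applying the transformer.
    return [
        transformer(line)
        for line in path.read_text().splitlines()
        if line.strip()
    ]
```

With a helper like this, read_input(7, transformer, True) would return one Command, File or Directory per line of the example file.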
Modeling
I then created a class to represent a file in our filesystem (remember kids, everything is a file!).
A file knows its name, its size, whether it's a directory or not, and its place in the hierarchy: its parent and children.
class FileObj:
    def __init__(self, name, size, is_dir):
        self.name = name
        self.is_dir = is_dir
        self.size = size
        self.children = []
        self.parent = None

    def add_child(self, child):
        self.children.append(child)

    def add_parent(self, parent):
        self.parent = parent

    def calculate_size(self):
        # Own size (0 for directories) plus everything below.
        s = self.size
        for child in self.children:
            s += child.calculate_size()
        return s

    def sum_size(self, threshold):
        # Sum of the total sizes of all directories at or below the threshold.
        if not self.is_dir:
            return 0
        total = sum(child.sum_size(threshold) for child in self.children)
        size = self.calculate_size()
        if size <= threshold:
            total += size
        return total

    def find_larger_than(self, threshold):
        # Nested list of directory sizes >= threshold, or None if there are none.
        if not self.is_dir:
            return None
        size = self.calculate_size()
        if size < threshold:
            # A subdirectory is never larger than its parent, so stop here.
            return None
        found = [child.find_larger_than(threshold) for child in self.children]
        return [size] + [f for f in found if f]

    def __repr__(self):
        if self.is_dir:
            return f'(Dir name="/{self.name}" children={self.children} size={self.calculate_size()})'
        else:
            return f'(File name="{self.name}" size={self.size})'
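One thing to note about the class above: sum_size and find_larger_than call calculate_size on every directory they visit, and each call walks that directory's whole subtree again. For puzzle-sized inputs that is fine. As a sketch of an alternative (not what this solution uses), a single post-order walk can collect every directory's total in one pass. Node and dir_sizes are hypothetical names made up for this example:

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical, trimmed-down stand-in for FileObj.
    name: str
    size: int = 0
    is_dir: bool = False
    children: list = field(default_factory=list)


def dir_sizes(node, sizes):
    """Return node's total size, appending each directory's total to sizes."""
    total = node.size + sum(dir_sizes(child, sizes) for child in node.children)
    if node.is_dir:
        sizes.append(total)
    return total


# The a/e subtree from the puzzle example.
e = Node("e", is_dir=True, children=[Node("i", 584)])
a = Node("a", is_dir=True, children=[
    e, Node("f", 29116), Node("g", 2557), Node("h.lst", 62596),
])

sizes = []
dir_sizes(a, sizes)
print(sizes)  # [584, 94853]
```

Both parts of the puzzle could then be answered from the flat sizes list without touching the tree again.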
Building the filesystem tree
If I were to refactor this all the way, I would combine this with the data input phase because I'm kind of reading everything twice now. The reason I ended up here is that when I was solving this problem, I used a temporary dictionary to store data as well (as I wasn't confident enough with trees) but I was able to refactor that part out.
The process function returns the root of our filesystem, and everything else can be reached from it by following the children and parent references.
def process(listings):
    root = FileObj('/', 0, True)
    current = root
    for listing in listings:
        match listing:
            case Directory(name):
                obj = FileObj(name, 0, True)
                current.add_child(obj)
            case File(name=name, size=size):
                obj = FileObj(name, size, False)
                current.add_child(obj)
            case Command('ls', None):
                continue
            case Command('cd', '..'):
                current = current.parent
            case Command('cd', None):
                current = root
            case Command('cd', directory):
                if directory == '/':
                    current = root
                else:
                    parent = current
                    current = [d for d in current.children if d.name == directory][0]
                    current.add_parent(parent)
    return root
Part 1
Find all of the directories with a total size of at most 100000. What is the sum of the total sizes of those directories?
To find these, we call the oddly named sum_size method (I might rename it to something better later).
root = process(listings.copy())
solution_1 = root.sum_size(100000)
print(f'Part 1: {solution_1}')
assert solution_1 == 1350966
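The assert above only checks the real puzzle input. For a cross-check against the worked example (expected sum 95437), here is a self-contained sketch that deliberately uses a different technique from the tree above: a stack of directory names, with each file's size added to its directory and every enclosing one. EXAMPLE and directory_totals are names made up for this sketch:

```python
from collections import defaultdict

EXAMPLE = """\
$ cd /
$ ls
dir a
14848514 b.txt
8504156 c.dat
dir d
$ cd a
$ ls
dir e
29116 f
2557 g
62596 h.lst
$ cd e
$ ls
584 i
$ cd ..
$ cd ..
$ cd d
$ ls
4060174 j
8033020 d.log
5626152 d.ext
7214296 k
"""


def directory_totals(text):
    """Map each directory path (as a tuple of names) to its total size."""
    totals = defaultdict(int)
    path = []  # stack of directory names; empty means the root
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[:2] == ["$", "cd"]:
            if parts[2] == "/":
                path = []
            elif parts[2] == "..":
                path.pop()
            else:
                path.append(parts[2])
        elif parts[0].isdigit():
            # A file counts toward its directory and every ancestor.
            for depth in range(len(path) + 1):
                totals[tuple(path[:depth])] += int(parts[0])
    return totals


totals = directory_totals(EXAMPLE)
print(sum(size for size in totals.values() if size <= 100000))  # 95437
```

Keying the totals by the full path rather than the bare name matters, because directory names can repeat at different levels of the tree.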
Part 2
Now, you're ready to choose a directory to delete.
The total disk space available to the filesystem is 70000000. To run the update, you need unused space of at least 30000000. You need to find a directory you can delete that will free up enough space to run the update.
In the example above, the total size of the outermost directory (and thus the total amount of used space) is 48381165; this means that the size of the unused space must currently be 21618835, which isn't quite the 30000000 required by the update. Therefore, the update still requires a directory with total size of at least 8381165 to be deleted before it can run.
To achieve this, you have the following options:
- Delete directory e, which would increase unused space by 584.
- Delete directory a, which would increase unused space by 94853.
- Delete directory d, which would increase unused space by 24933642.
- Delete directory /, which would increase unused space by 48381165.
Directories e and a are both too small; deleting them would not free up enough space. However, directories d and / are both big enough! Between these, choose the smallest: d, increasing unused space by 24933642.
Find the smallest directory that, if deleted, would free up enough space on the filesystem to run the update. What is the total size of that directory?
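The arithmetic in the example can be written out as a short sketch. The directory totals are the ones listed in the puzzle text; the variable names are only illustrative:

```python
TOTAL_DISK_SPACE = 70_000_000
REQUIRED_UNUSED = 30_000_000

# Directory totals from the example, as listed in the puzzle text.
example_sizes = {"e": 584, "a": 94853, "d": 24933642, "/": 48381165}

used = example_sizes["/"]            # the root contains every file
unused = TOTAL_DISK_SPACE - used
must_free = REQUIRED_UNUSED - unused

# Keep only directories big enough to free the missing space, then take the smallest.
candidates = {name: size for name, size in example_sizes.items() if size >= must_free}
best = min(candidates, key=candidates.get)

print(unused)                  # 21618835
print(must_free)               # 8381165
print(best, candidates[best])  # d 24933642
```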
To find the answer to part two, I calculated the space needed, found all the candidates recursively from the tree with the find_larger_than method, flattened the nested result and picked the smallest valid directory.
from utils import flatten
TOTAL_DISK_SPACE = 70000000
AVAILABLE_NEEDED = 30000000
root = process(listings.copy())
current_space = root.calculate_size()
space_needed = AVAILABLE_NEEDED - (TOTAL_DISK_SPACE - current_space)
candidates = flatten(root.find_larger_than(space_needed))
solution_2 = min(candidates)
print('Part 2:', solution_2)
assert solution_2 == 6296435
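The flatten helper also comes from the utils module that is not shown here. Since find_larger_than returns lists nested inside lists, it presumably flattens to any depth. A minimal sketch of such a helper, written as an assumption about its behavior rather than a copy of the real one:

```python
def flatten(nested):
    """Hypothetical stand-in for utils.flatten: yield the leaves of nested lists."""
    for item in nested:
        if isinstance(item, list):
            yield from flatten(item)
        else:
            yield item


# Shape of what find_larger_than returns for the puzzle example
# with a threshold of 8381165: the root, then d nested inside it.
nested = [48381165, [24933642]]
print(list(flatten(nested)))  # [48381165, 24933642]
print(min(flatten(nested)))   # 24933642
```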