Python prep for Advent of Code 2022
We are 1 day away from this year's Advent of Code, the annual software development puzzle solving Christmas calendar. Last year, I solved nearly all puzzles and did so with Python and Jupyter Notebooks in my toolbox. I've also spent some time since that working on previous years' puzzles with the same setup.
This year, you can follow my progress and writings in GitHub: https://github.com/Hamatti/adventofcode-2022
In this blog post, I want to share some of the approaches and especially standard library modules and functions that I've shared in the past as part of my solutions. Getting familiar with these not only helps you solve Advent of Code solutions but also expands your real-life Python toolkit.
Building your own utility library
My first tip is to build a utility library to help with the things you end up doing every day. Mine usually starts with one function and then I might add some throughout December if I need some more special cases:
import os
def read_input(day, transformer=str, example=False):
"""
Given a day number (1-25), reads the corresponding input file into
a list with each line as an item.
Runs transformer function on each item.
"""
try:
if example:
filename = f'day_{day}_example.txt'
else:
filename = f'day_{day}.txt'
with open(os.path.join('..', 'inputs', filename)) as input_file:
return [transformer(line.strip()) for line in input_file]
except FileNotFoundError as e:
print(e)
My first function is read_input
and it's a helper that reads data from inputs/
folder. It takes three arguments: day
is an integer to tell which day's input we want to read, transformer
is a function (defaulting to str
) that is run on each line of the input. example
enables me to store the provided examples as input files and run the same code on both the example and the actual input.
Transformer function is an interesting and very helpful one. I want to keep my puzzle solution code as focused on the puzzle as possible and not parsing the input. So I can create a function that gets passed into the input reader. If the values are numbers, I can pass int
. If the input is list of comma separated items, I can pass a function that splits on comma. And so on.
Since these actions are things you do every single day with every puzzle, writing a function for it is very helpful.
My utility library is these days only about reading inputs but there's a lot of opportunities to build your own in a way that helps you the best. A good example for inspiration is Vilho's aoc-lib for Java that manages boilerplate creation, input file downloading, caching, benchmarking and other features.
This fall I started experimenting with a Firefox extension that when clicked, would copy that day's puzzle input to my clipboard to save me from a few clicks whenever I'd need that data. I haven't published it yet for Firefox though as I'm
zip & zip_longest
For Advent of Code puzzles, zip
is such a lifesaver. It combines (zips) multiple iterators into a single one in a way where the individual items from each iterable are paired/grouped together:
dogs = ['Frank', 'Lassie', 'Milo', 'Dug']
movies = ['Men in Black', 'Lassie', 'The Mask', 'Up']
for dog, movie in zip(dogs, movies):
print(f'{dog} is from {movie}')
# Frank is from Men in Black
# Lassie is from Lassie
# Milo is from The Mask
# Dug is from Up
If the iterables you give to zip
are not the same length, zip
will just ignore the extras:
numbers = [1, 2, 3, 4]
letters = ['a', 'b', 'c']
for number, letter in zip(numbers, letters):
print(number, letter)
# 1 a
# 2 b
# 3 c
In cases where you are expecting the original iterables to be of same length, you can add a safeguard with strict=True
parameter to zip
as that will raise a ValueError
if the arguments are of different length. The output will be the same as above for equal length arguments. This requires Python 3.10.
If you need all of the items to be included, you can use itertools.zip_longest
:
from itertools import zip_longest
numbers = [1, 2, 3, 4]
letters = ['a', 'b', 'c']
for number, letter in zip_longest(numbers, letters):
print(number, letter)
# 1 a
# 2 b
# 3 c
# 4 None
You can define a fillvalue
if you want the missing pairs to be filled a different default than None
:
from itertools import zip_longest
items = ['apple', 'banana', 'pear', 'mango']
counts = [5, 12, 3]
for item, count in zip_longest(items, counts, fillvalue=0):
print(item, count)
# apple 5
# banana 12
# pear 3
# mango 0
It's important to note that both zip
and zip_longest
return an iterator and if you try to print it as-is, you see something like <zip object at 0x100fbb440>
. You can iterate over it (for example in for
loops) or pass it to any function that accepts iterables but if you need to turn it into a list specifically, you need to wrap it inside list(your_zip)
call.
collections.namedtuple
Collections module contains some really nice utilities for puzzles.
I can't emphasize enough how much I like namedtuple. Giving meaningful names to your data is so crucial and namedtuple offers a few nice additions on top of just giving your data types aliases.
Let's take a look an example
from collections import namedtuple
Instruction = namedtuple('Instruction', ['direction', 'distance'])
instructions = []
raw_data = ['R,15', 'L,20', 'U,17', 'L,42', 'D,3']
for datum in raw_data:
dir, dist = datum.split(',')
instructions.append(Instruction(dir, int(dist)))
print(instructions)
# [Instruction(direction='R', distance=15), Instruction(direction='L', distance=20), Instruction(direction='U', distance=17), Instruction(direction='L', distance=42), Instruction(direction='D', distance=3)]
Here, we have raw data in a form that we could expect for a Advent of Code puzzle: a series of instructions with a direction and distance. We could store this data in regular lists or tuples but with them, 1) we'd have to refer to individual items with numeric indices, 2) it's easy to make mistakes and 3) the output for debugging printing is less useful.
With namedtuples, we get a nice formatted output (Instruction(direction='R', distance=15)
compared to regular tuple's (('R', 15)
) and we can refer to individual attributes with instruction.direction
instead of instruction[0]
.
I tend to use them a lot in my code for added clarity.
namedtuples are also "backwards" compatible with regular tuples: you can use them in-place anywhere that accepts tuples. It supports indexing with numbers and unpacks like regular tuples.
collections.Counter
Another handy tool from collections
is Counter. It is a subclass of dict that takes an iterable as an argument and counts the individual values in that iterable and stores that count data as a dictionary.
from collections import Counter
good_or_bad = [
'good', 'good', 'good',
'bad', 'bad', 'good',
'bad', 'good', 'bad'
]
counts = Counter(good_or_bad)
print(counts)
# Counter({'good': 5, 'bad': 4})
It has a few added methods that regular dictionaries don't have. One good example is most_common
. Continuing from the previous example:
print(counts.most_common(1))
# [('good', 5)]
The method takes an argument n
and returns a list of tuples for n
most common items. It's useful when you have a lot of data and you just want to inspect for example the top 3.
collections.deque
Last year's Advent of Code was probably the first time I ever used deque. It's a last-in, first-out (LIFO) data structure "by default" (see below the first code example) with added features. You can append
data to the deque and when you pop
from it, it removes the latest added item from the deque and returns it.
In Day 10, 2021 I used it to check if nested, matching pairs of parentheses, brackets and braces were in correct order:
from collections import deque
def check_errors(navigation):
pairs = {
'(': ')',
'[': ']',
'{': '}',
'<': '>'
}
openers = deque()
errors = []
for line in navigation:
for char in line:
if char in pairs.keys():
openers.append(char)
else:
opener = openers.pop()
if char != pairs[opener]:
errors.append(char)
break
return errors
Python's deque (short for double-ended queue) can also append or pop to the left side of the queue, making it possible to use it as a LIFO, FIFO (first-in, first-out), FILO (first-in, last-out)
itertools.pairwise
A relatively recent addition, entering the Python standard library in 3.10 last year, pairwise is another great addition to puzzle solving.
It takes an iterable and creates pairs of consecutive items:
from itertools import pairwise
letters = ['A', 'B', 'C', 'D', 'F', 'G']
for first, second in pairwise(letters):
print(first, second)
# A B
# B C
# C D
# D F
# F G
An example of this in the context of Advent of Code, last year I used it day 14, 2021. If you are not using Python 3.10 or newer, you can build your own pairwise functionality, for example by using the method I used in day 1, 2021 to create sliding windows.
itertools.product
Let's say you need a set of coordinates for x & y, for both being between 0 and 10. One way you could do that is with a cartesian product using itertools.product:
from itertools import product
xs = range(10)
ys = range(10)
coordinates = product(xs, ys)
print(list(coordinates))
## Prints
[
(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9),
(1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9),
(2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9),
(3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9),
(4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9),
(5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (5, 7), (5, 8), (5, 9),
(6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6), (6, 7), (6, 8), (6, 9),
(7, 0), (7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6), (7, 7), (7, 8), (7, 9),
(8, 0), (8, 1), (8, 2), (8, 3), (8, 4), (8, 5), (8, 6), (8, 7), (8, 8), (8, 9),
(9, 0), (9, 1), (9, 2), (9, 3), (9, 4), (9, 5), (9, 6), (9, 7), (9, 8), (9, 9)
]
Many Advent of Code puzzles are happening on some sort of coordinate system. My experience and techniques for solving those has changed and evolved a lot over time as I've realized easier and more performant ways to solve them.
For example, these days I often store data in a dictionary for only those coordinates that have a truthy value and dealing with the full grid of coordinates via looping only when needed (often in debugging prints) and direct math on coordinates on other cases (like determining effect on neighboring cells).
If something above resonated with you, let's start a discussion about it! Email me at juhamattisantala at gmail dot com and share your thoughts. In 2025, I want to have more deeper discussions with people from around the world and I'd love if you'd be part of that.