Advent of Code - 2021

This is a solution to Day 14 of Advent of Code 2021.

Day 14 - Extended Polymerization

The incredible pressures at this depth are starting to put a strain on your submarine. The submarine has polymerization equipment that would produce suitable materials to reinforce the submarine, and the nearby volcanically-active caves should even have the necessary input elements in sufficient quantities.

The submarine manual contains instructions for finding the optimal polymer formula; specifically, it offers a polymer template and a list of pair insertion rules (your puzzle input). You just need to work out what polymer would result after repeating the pair insertion process a few times.

For example:
NNCB
CH -> B
HH -> N
CB -> H
NH -> C
HB -> C
HC -> B
HN -> C
NN -> C
BH -> H
NC -> B
NB -> B
BN -> B
BB -> N
BC -> B
CC -> N
CN -> C
The first line is the polymer template - this is the starting point of the process.

The following section defines the pair insertion rules. A rule like AB -> C means that when elements A and B are immediately adjacent, element C should be inserted between them. These insertions all happen simultaneously.

So, starting with the polymer template NNCB, the first step simultaneously considers all three pairs:

The first pair (NN) matches the rule NN -> C, so element C is inserted between the first N and the second N.

The second pair (NC) matches the rule NC -> B, so element B is inserted between the N and the C.

The third pair (CB) matches the rule CB -> H, so element H is inserted between the C and the B.

Note that these pairs overlap: the second element of one pair is the first element of the next pair. Also, because all pairs are considered simultaneously, inserted elements are not considered to be part of a pair until the next step.

After the first step of this process, the polymer becomes NCNBCHB.

Here are the results of a few steps using the above rules:
Template:     NNCB
After step 1: NCNBCHB
After step 2: NBCCNBBBCBHCB
After step 3: NBBBCNCCNBBNBNBBCHBHHBCHB
After step 4: NBBNBNBBCCNBCNCCNBBNBBNBBBNBBNBBCBHCBHHNHCBBCBHCB
This polymer grows quickly. After step 5, it has length 97; After step 10, it has length 3073. After step 10, B occurs 1749 times, C occurs 298 times, H occurs 161 times, and N occurs 865 times; taking the quantity of the most common element (B, 1749) and subtracting the quantity of the least common element (H, 161) produces 1749 - 161 = 1588.

Read input

For this, we need another custom method since there's multiple pieces separated by empty line.

At this point, I refactored things a bit in utils.py to create a new read_multisection_input function that splits input into sections by empty lines and then runs a transformer for each section.

from utils import read_multisection_input

def pair_transformer(section):
    lines = section.split('\n')
    pairs = {}
    for line in lines:
        pair, insert_letter = line.strip().split(' -> ')
        pairs[pair] = insert_letter
    return pairs
        

polymer, pairs = read_multisection_input(14, [str, pair_transformer])

Part 1

I mentioned pairwise function a few days back and here we finally get to use it! It's my first time ever as pairwise was introduced to Python only few months ago when 3.10 was released. Exciting times.

For each iteration (total count provided with the n parameter), we create all pairs of letters, find the corresponding new value to be inserted and add those new letters into a separate list.

We then zip_longest the original polymer and all the new characters together.

zip_longest is a function also from itertools library. It's an extension of the regular zip function that takes one or more iterables (in our case, two strings) and creates pairs from them:

zip([1, 2, 3], ['a', 'b', 'c'])
##[(1, 'a'), (2, 'b'), (3, 'c')]

Since the original polymer is always one longer than the new letters string, we use zip_longest as it allows uneven sized iterables. Using fillvalue parameter, we can define what's used in place of missing values:

from itertools import zip_longest
zip_longest([1, 2, 3, 4], ['a', 'b', 'c'], fillvalue="")
##[(1, 'a'), (2, 'b'), (3, 'c'), (4, '')]

And since zip_longest creates an iterable of tuples, we need to flatten that with the join on line 14.

Once all steps are done, we count the amounts of individual characters and return in the order of most common to least common using Counter's handy most_common function.

from itertools import pairwise, zip_longest
from collections import Counter


def extend(polymer, pairs, n=10):
    new_polymer = polymer
    for step in range(n):
        new_letters = ''
        for pair in pairwise(new_polymer):
            new = pairs[''.join(pair)]
            new_letters += new
        zipped = zip_longest(new_polymer, new_letters, fillvalue="")

        new_polymer = ''.join(char for char_pair in zipped for char in char_pair)


    counts = Counter(new_polymer)
    return counts.most_common()

Apply 10 steps of pair insertion to the polymer template and find the most and least common elements in the result. What do you get if you take the quantity of the most common element and subtract the quantity of the least common element?

results = extend(polymer, pairs, n=10)
result = results[0][1] - results[-1][1]

print(f'Solution: {result }')
assert result == 2408 # For refactoring

Solution: 2408

Part 2

The resulting polymer isn't nearly strong enough to reinforce the submarine. You'll need to run more steps of the pair insertion process; a total of 40 steps should do it.

In the above example, the most common element is B (occurring 2192039569602 times) and the least common element is H (occurring 3849876073 times); subtracting these produces 2188189693529.

Day 14 is the day when my straight-forward "let's loop over everything one by one" approach is no longer valid. With the evergrowing polymer, 40 steps is something that my naive algorithm cannot solve. (Believe me, I tried.)

And when I see a performance question, my first answer is always: dictionaries!

Dictionaries are nice for two reasons: first, a lookup from dictionary is fast (O(1) in algorithm complexity language) and second, in cases like this, it grows much slower than lists or strings because the keys are unique.

For this more optimized solution, we need to give up the string completely. Iterating over it won't cut when it grows into millions of characters.

So instead of using pairwise on the polymer on every iteration, we only do it once to create a baseline of how many of each pair there are.

defaultdict is our friend here as it allows us to do += operations even if the key is not in the dictionary as it defaults to 0.

Once we know the initial counts of each letter pair, we can iterate over our desired step count (provided by the keyword argument n that we default to 40).

On each iteration, we loop over a dictionary of pairs (the amount of keys grows significantly slower than the length of the string would because there are so many repeated pairs) of those previously counted pair counts.

For each pair, we see if our pair has a corresponding new letter in the provided input. If we do, we create two new pairs: (pair[0], new_letter) and (new_letter, pair[1]). To clarify what happens here, let's look at a simplified example:

pair = ('N', 'N')
new_letter = 'C' # From example's NN -> C

(pair[0], new_letter) # == ('N', 'C') which is the first letter with the new inserted
(new_letter, pair[1]) # == ('C', 'N') which is the new inserted before the second letter

This way we'll create both new pairs that come from inserting a single character between two characters.

And if we had for example 4 ('N', 'N') pairs so far, when we create these new pairs, we create 4 of each and add those to potentially already existing ones.

If I'd like to refactor the code a bit, at this point I'd realize that the inputs NN -> C should have been stored as { ("N", "N"): "C" } instead of { "NN": "C" } to avoid those ''.join(pair) transformations on each step both in part 1 and 2.

Once we have completed all of our steps, we count all the first characters in our pairs into a Counter. The reason we only care about the first character is that each second character is a first character in some other pair.

The only exception that we need to take into account is to add the very last character of the polymer (polymer[-1]) into the counts because it will never start a pair.

from collections import defaultdict, Counter
from itertools import pairwise


def optimized_extend(polymer, pairs, n=40):
    pair_counts = defaultdict(int)
    for pair in pairwise(polymer):
        pair_counts[pair] += 1
        
    for step in range(n):
        new_pair_counts = defaultdict(int)
        for pair, count in pair_counts.items():
            new_letter = pairs[''.join(pair)]
            new_pair_counts[(pair[0], new_letter)] += count
            new_pair_counts[(new_letter, pair[1])] += count

        pair_counts = new_pair_counts

    # Count all letters that start pairs
    # (no need to count second letters because 
    #  they always start another pair)
    counter = Counter()
    for (first, second), count in pair_counts.items():
        counter[first] += count
        
    # Don't forget the last character of the original 
    # polymer that never started a pair
    counter[polymer[-1]] += 1
    
    return counter.most_common()

Apply 40 steps of pair insertion to the polymer template and find the most and least common elements in the result. What do you get if you take the quantity of the most common element and subtract the quantity of the least common element?

results = optimized_extend(polymer, pairs, n=40)
result = results[0][1] - results[-1][1]

print(f'Solution: {result }')
assert result == 2651311098752 # For refactoring

Solution: 2651311098752