Juha-Matti Santala
Community Builder. Dreamer. Adventurer.

Advent of Code - 2021

This is a solution to Day 16 of Advent of Code 2021.

Day 16 - Packet Decoder

As you leave the cave and reach open waters, you receive a transmission from the Elves back on the ship.

The transmission was sent using the Buoyancy Interchange Transmission System (BITS), a method of packing numeric expressions into a binary sequence. Your submarine's computer has saved the transmission in hexadecimal (your puzzle input).

The first step of decoding the message is to convert the hexadecimal representation into binary. Each character of hexadecimal corresponds to four bits of binary data:

0 = 0000
1 = 0001
2 = 0010
3 = 0011
4 = 0100
5 = 0101
6 = 0110
7 = 0111
8 = 1000
9 = 1001
A = 1010
B = 1011
C = 1100
D = 1101
E = 1110
F = 1111

The BITS transmission contains a single packet at its outermost layer which itself contains many other packets. The hexadecimal representation of this packet might encode a few extra 0 bits at the end; these are not part of the transmission and should be ignored.

Every packet begins with a standard header: the first three bits encode the packet version, and the next three bits encode the packet type ID. These two values are numbers; all numbers encoded in any packet are represented as binary with the most significant bit first. For example, a version encoded as the binary sequence 100 represents the number 4.

Packets with type ID 4 represent a literal value. Literal value packets encode a single binary number. To do this, the binary number is padded with leading zeroes until its length is a multiple of four bits, and then it is broken into groups of four bits. Each group is prefixed by a 1 bit except the last group, which is prefixed by a 0 bit. These groups of five bits immediately follow the packet header. For example, the hexadecimal string D2FE28 becomes:

110100101111111000101000
VVVTTTAAAAABBBBBCCCCC

Below each bit is a label indicating its purpose:

  • The three bits labeled V (110) are the packet version, 6.
  • The three bits labeled T (100) are the packet type ID, 4, which means the packet is a literal value.
  • The five bits labeled A (10111) start with a 1 (not the last group, keep reading) and contain the first four bits of the number, 0111.
  • The five bits labeled B (11110) start with a 1 (not the last group, keep reading) and contain four more bits of the number, 1110.
  • The five bits labeled C (00101) start with a 0 (last group, end of packet) and contain the last four bits of the number, 0101.
  • The three unlabeled 0 bits at the end are extra due to the hexadecimal representation and should be ignored.

So, this packet represents a literal value with binary representation 011111100101, which is 2021 in decimal.

Every other type of packet (any packet with a type ID other than 4) represent an operator that performs some calculation on one or more sub-packets contained within. Right now, the specific operations aren't important; focus on parsing the hierarchy of sub-packets.

An operator packet contains one or more packets. To indicate which subsequent binary data represents its sub-packets, an operator packet can use one of two modes indicated by the bit immediately after the packet header; this is called the length type ID:

  • If the length type ID is 0, then the next 15 bits are a number that represents the total length in bits of the sub-packets contained by this packet.
  • If the length type ID is 1, then the next 11 bits are a number that represents the number of sub-packets immediately contained by this packet.

Finally, after the length type ID bit and the 15-bit or 11-bit field, the sub-packets appear.

For example, here is an operator packet (hexadecimal string 38006F45291200) with length type ID 0 that contains two sub-packets:

00111000000000000110111101000101001010010001001000000000
VVVTTTILLLLLLLLLLLLLLLAAAAAAAAAAABBBBBBBBBBBBBBBB
  • The three bits labeled V (001) are the packet version, 1.
  • The three bits labeled T (110) are the packet type ID, 6, which means the packet is an operator.
  • The bit labeled I (0) is the length type ID, which indicates that the length is a 15-bit number representing the number of bits in the sub-packets.
  • The 15 bits labeled L (000000000011011) contain the length of the sub-packets in bits, 27.
  • The 11 bits labeled A contain the first sub-packet, a literal value representing the number 10.
  • The 16 bits labeled B contain the second sub-packet, a literal value representing the number 20.

After reading 11 and 16 bits of sub-packet data, the total length indicated in L (27) is reached, and so parsing of this packet stops.

As another example, here is an operator packet (hexadecimal string EE00D40C823060) with length type ID 1 that contains three sub-packets:

11101110000000001101010000001100100000100011000001100000
VVVTTTILLLLLLLLLLLAAAAAAAAAAABBBBBBBBBBBCCCCCCCCCCC
  • The three bits labeled V (111) are the packet version, 7.
  • The three bits labeled T (011) are the packet type ID, 3, which means the packet is an operator.
  • The bit labeled I (1) is the length type ID, which indicates that the length is a 11-bit number representing the number of sub-packets.
  • The 11 bits labeled L (00000000011) contain the number of sub-packets, 3.
  • The 11 bits labeled A contain the first sub-packet, a literal value representing the number 1.
  • The 11 bits labeled B contain the second sub-packet, a literal value representing the number 2.
  • The 11 bits labeled C contain the third sub-packet, a literal value representing the number 3.

After reading 3 complete sub-packets, the number of sub-packets indicated in L (3) is reached, and so parsing of this packet stops.

For now, parse the hierarchy of the packets throughout the transmission and add up all of the version numbers.

Here are a few more examples of hexadecimal-encoded transmissions:

  • 8A004A801A8002F478 represents an operator packet (version 4) which contains an operator packet (version 1) which contains an operator packet (version 5) which contains a literal value (version 6); this packet has a version sum of 16.
  • 620080001611562C8802118E34 represents an operator packet (version 3) which contains two sub-packets; each sub-packet is an operator packet that contains two literal values. This packet has a version sum of 12.
  • C0015000016115A2E0802F182340 has the same structure as the previous example, but the outermost packet uses a different length type ID. This packet has a version sum of 23.
  • A0016C880162017C3686B18A3D4780 is an operator packet that contains an operator packet that contains an operator packet that contains five literal values; it has a version sum of 31.

Decode the structure of your hexadecimal-encoded BITS transmission; what do you get if you add up the version numbers in all packets?

Read input

from utils import read_input

hex_transmission = read_input(16)[0]

Part 1

Today was the hardest day with Advent of Code ever. Took me over 12 hours, so many rants and so much frustration and of course a lot of help to figure this one out. Huge thanks to Erno for your patient help, I truly appreciate it.

Returned today's part 2 11:52PM, just before the midnight.

Tree structures are really hard for me. I've never used them in real life, only in some exercises at university's data structure courses over a decade ago. Reasoning and trying to balance them in my head while trying to figure out when to pass what and when to return what is so hard.

And due to recursion, debugging them step by step is so hard.

The easiest part of the puzzle was this: convert the hex value into binary.

We take each character in hex value, convert it to decimal and then binary and fill in zeroes to make it 4 bits long.

def convert_hex_to_binary(transmission):
    return ''.join(
                bin(int(char, 16))[2:].zfill(4) 
                for char 
                in transmission)

Helpers!

Literal and Operator are types of sub packets. We keep track of version to calculate the score for part 1 and we keep track of value and type_id for the second part. Have I said how cool namedtuples are? Because they are.

By spec, each packet starts with a few metadata bits: three for version and three for type id so it's good to have a function for those.

from collections import namedtuple


Literal = namedtuple('Literal', ['version', 'value'])
Operator = namedtuple('Operator', ['version', 'type_id', 'children'])

def read_version_and_type_id(packet):
    version = packet[:3]
    type_id = packet[3:6]
    payload = packet[6:]
    
    return (int(version, 2), int(type_id, 2), payload)

Sometimes we parse until we run into the end of the packet payload (sometimes instead of being exactly empty, it's just zeroes.

A helper function for that makes it easy to sprinkle around where needed.

def valid(transmission):
     return transmission != '' and not all(c == '0' for c in transmission)

Parsing a literal packet

Literal packets are the simpler ones because they don't contain any other packages, just a single numeric value.

We loop over the payload in chunks of 5 to find the number. We return the unused payload along with the actual literal.

def parse_literal_packet(packet):
    """ 
    Takes in a binary string and parses it in chunks of 5
    where first bit is flag bit (0 == last part) and rest 4 is binary value
    """
    version, type_id, payload = read_version_and_type_id(packet)
    finished = False
    binary = ''
    while not finished:
        is_last, number, payload = payload[0] == '0', payload[1:5], payload[5:]
        binary += number
        if is_last:
            finished = True
            
    number = int(binary, 2)
    literal = Literal(value=number, version=version)

    return payload, literal

Parsing sub packets

For sub packets, there are two types: ones with length ID of 11 and one with length ID of 15.

Let's start with the one with 11.

We take the 11 bits for counting how many sub packets it includes and parse packets that many times.

On each round, we collect the sub packets into child_operations and once done, add them to our current Operator's children property.

def parse_sub11_packet(packet, operations):
    version, type_id, payload = read_version_and_type_id(packet)
    length_id, payload = payload[0], payload[1:]
    
    n, payload = int(payload[:11], 2), payload[11:]
    curr = operations[-1]
    
    child_operations = []
    for _ in range(n):
        payload, child_operations = parse_packet(payload, child_operations)
    curr.children.extend(child_operations)

        
    return payload, operations

For sub15, we do a similar thing, but instead of looping over n children, we know the length before hand and can split the payload to "our" part and the rest that we'll return to the caller.

def parse_sub15_packet(packet, operations):
    version, type_id, payload = read_version_and_type_id(packet)
    length_id, payload = payload[0], payload[1:]
    length, payload = int(payload[:15], 2), payload[15:]
    
    payload, rest = payload[:length], payload[length:]
    
    curr = operations[-1]
    child_operations = []
    while valid(payload):
        payload, child_operations = parse_packet(payload, child_operations)
    curr.children.extend(child_operations)
    
    return rest, operations

Parsing an operator packet

We call above functions based on the length ID that we peek into in parse_operator_packet.

We add our new parsed Operator into operations and move forward with parsing the actual packet content.

def parse_operator_packet(packet, operations):
    version, type_id, payload = read_version_and_type_id(packet)
    length_id = payload[0]

    operations.append(Operator(version=version, type_id=type_id, children=[]))
    
    match length_id:
        case '0':
            payload, operations = parse_sub15_packet(packet, operations)
        case '1': 
            payload, operations = parse_sub11_packet(packet, operations)        
  
    return payload, operations

Parsing any packet

And the starting point of each recursive call is parse_packet that takes a full packet and a tree of operations already parsed.

Then based on the type id of the package we either parse a literal or a operator packet.

def parse_packet(packet, operations):
    if not valid(packet):
        return packet, operations

    type_id = int(packet[3:6], 2)

    match type_id:
        case 4:
            payload, literal = parse_literal_packet(packet)
            operations.append(literal)
            return payload, operations
        case _:
            return parse_operator_packet(packet, operations)

Test Suite

First time this year, I had to build a test suite when after 3 attempts of rewrites I was no closer to the end result. With TDD and these tests, supported by examples in the puzzle description, I was able to make it run correctly.

After fixing the code for part 2's star, I had to heavily rewrite these tests as well because everything changed.

import unittest


class PacketTest(unittest.TestCase):
    
    def test_literal(self):
        packet = '110100101111111000101000'
        result = 2021
        rest, literal = parse_literal_packet(packet)
        
        self.assertEqual(literal.value ,result)
        self.assertEqual(rest, '000')
    

    def test_parse_packet_only_literal(self):
        packet = '110100101111111000101000'
        
        packet, operations = parse_packet(packet, [])
        
        self.assertFalse(valid(packet))
        self.assertEqual(operations, [Literal(value=2021, version=6)])


    def test_parse_sub15_literals(self):
        packet = '00111000000000000110111101000101001010010001001000000000'
        
        packet, operations = parse_packet(packet, [])
        
        self.assertFalse(valid(packet))
        self.assertEqual(operations, 
            [
                Operator(type_id=6,version=1, children=[
                    Literal(value=10, version=6), Literal(value=20, version=2)
                ])
            ]
        )
        
    def test_parse_11_multiple_literals(self):
        packet = '11101110000000001101010000001100100000100011000001100000'

        packet, operations = parse_packet(packet, [])
        
        self.assertFalse(valid(packet))
        self.assertEqual(operations,[
            Operator(version=7, type_id=3, children=[            Literal(version=2, value=1),
            Literal(version=4, value=2),
            Literal(version=1, value=3)]),

        ]
        )
        
    def test_parse_one_branch(self):
        packet = convert_hex_to_binary('8A004A801A8002F478')
        
        packet, operations = parse_packet(packet, [])
        
        self.assertEqual(operations, [
            Operator(version=4, type_id=2, children=[
                Operator(version=1, type_id=2, children=[
                    Operator(version=5, type_id=2, children=[
                         Literal(version=6, value=15)
                    ])
                ])
            ])]
        )
            
        
    def test_parse_two_branches(self):
        packet = convert_hex_to_binary('620080001611562C8802118E34')
        
        packet, operations = parse_packet(packet, [])
        
        self.assertEqual(operations, [
            Operator(version=3, type_id=0, children=[
                Operator(version=0, type_id=0, children=[
                    Literal(version=0, value=10), 
                    Literal(version=5, value=11), 
                ]),
                Operator(version=1, type_id=0, children=[
                    Literal(version=0, value=12), 
                    Literal(version=3, value=13)
                ])
            ])
        ])

    def test_valid(self):
        self.assertTrue(valid('1000'))
        self.assertFalse(valid(''))
        self.assertFalse(valid('000'))
        
unittest.main(argv=[''], verbosity=0, exit=False)

----------------------------------------------------------------------
Ran 7 tests in 0.001s

OK

<unittest.main.TestProgram at 0x10b70ed70>

Decode the structure of your hexadecimal-encoded BITS transmission; what do you get if you add up the version numbers in all packets?

I parse the packet, collect all operations and then calculate the sum of their version numbers.

I learned this consume technique by doing 2015's Advent of Code last night and had to do a much simpler version of it.

packet = convert_hex_to_binary(hex_transmission)
# packet = convert_hex_to_binary('620080001611562C8802118E34')

packet, operations = parse_packet(packet, [])

def consume_version(node):
    match node:
        case Literal():
            return node.version
        case Operator():
            return node.version + sum(consume_version(child) for child in node.children)
 
result = consume_version(operations[0])
print(f'Solution: {result}')
assert result == 984

Solution: 984

Part 2

Now that you have the structure of your transmission decoded, you can calculate the value of the expression it represents.

Literal values (type ID 4) represent a single number as described above. The remaining type IDs are more interesting:

  • Packets with type ID 0 are sum packets - their value is the sum of the values of their sub-packets. If they only have a single sub-packet, their value is the value of the sub-packet.
  • Packets with type ID 1 are product packets - their value is the result of multiplying together the values of their sub-packets. If they only have a single sub-packet, their value is the value of the sub-packet.
  • Packets with type ID 2 are minimum packets - their value is the minimum of the values of their sub-packets.
  • Packets with type ID 3 are maximum packets - their value is the maximum of the values of their sub-packets.
  • Packets with type ID 5 are greater than packets - their value is 1 if the value of the first sub-packet is greater than the value of the second sub-packet; otherwise, their value is 0. These packets always have exactly two sub-packets.
  • Packets with type ID 6 are less than packets - their value is 1 if the value of the first sub-packet is less than the value of the second sub-packet; otherwise, their value is 0. These packets always have exactly two sub-packets.
  • Packets with type ID 7 are equal to packets - their value is 1 if the value of the first sub-packet is equal to the value of the second sub-packet; otherwise, their value is 0. These packets always have exactly two sub-packets.

Using these rules, you can now work out the value of the outermost packet in your BITS transmission.

For example:

C200B40A82 finds the sum of 1 and 2, resulting in the value 3.
04005AC33890 finds the product of 6 and 9, resulting in the value 54.
880086C3E88112 finds the minimum of 7, 8, and 9, resulting in the value 7.
CE00C43D881120 finds the maximum of 7, 8, and 9, resulting in the value 9.
D8005AC2A8F0 produces 1, because 5 is less than 15.
F600BC2D8F produces 0, because 5 is not greater than 15.
9C005AC2F8F0 produces 0, because 5 is not equal to 15.
9C0141080250320F1802104A08 produces 1, because 1 + 3 = 2 * 2.

Earlier, different search algorithms have been a bit of struggle for me (especially Day 9's BFS) but today I realized how bad I am with tree structures. I basically had the right solution (operations) for hours but couldn't figure out how to convert them to a tree and traverse it by calculating values.

I even considered running the calculations of almost 300 nodes by hand, just to get a result.

Finally just before midnight, Erno helped me fix the code to achieve solution.

Operator functions

Each operator type maps to a function: 0-3 take in any number of values and 5-7 take exactly two. To keep the interface clean, I made all of them take a list of values.

A dictionary maps types into functions nice and clean.

import math
from collections import deque


def gt(values):
    a, b = values
    return 1 if a > b else 0

def lt(values):
    a, b = values
    return 1 if a < b else 0

def eq(values):
    a, b = values
    return 1 if a == b else 0

OP_CODES = {
    0: sum,
    1: math.prod,
    2: min,
    3: max,
    5: gt,
    6: lt,
    7: eq
}

This part 2 section seems simple but that's because all of the refactoring and fixing happened in the main parsing and tree building logic.

packet = convert_hex_to_binary(hex_transmission)
packet, operations = parse_packet(packet, [])

def consume(node):
    match node:
        case Literal(value=value):
            return value
        case Operator(type_id=type_id, children=children):
            func = OP_CODES[type_id]
            return func(consume(child) for child in children)
            

result = consume(operations[0])
print(f'Solution: {result}')
assert result == 1015320896946

Solution: 1015320896946