Juha-Matti Santala
Community Builder. Dreamer. Adventurer.

Unit test your Python code in Jupyter Notebooks

In December of 2021, I participated in Advent of Code, an annual Christmas calendar of programming puzzles. Each year I try to learn something new or hone a specific part of my skills. In 2021, I decided to learn how to use Jupyter Notebook and focus on writing and explaining my solutions in addition to coding them.

It turned out quite well: I enjoyed the experience, and you can see the results (in static, non-runnable versions) on my Advent of Code 2021 page.

One thing I did not do, though, was write any proper unit tests for my code. I was focusing on solving puzzles (not my strong suit) and learning a new tool, so I relied on manual testing, running the code by hand with various inputs. I did that partly because I didn't have the time and energy to figure out how to do unit testing properly inside Jupyter Notebooks.

Come summer, and I have that time and energy! So let me take you on a journey through three different ways to keep your Jupyter Notebook code tested.

doctest

The first solution is to use doctest, a module in Python's standard library that runs the examples in your function documentation as tests. Here's what it looks like in practice:

def sum(a, b):
   """Addition of two numbers.

   >>> sum(0, 0)
   0
   >>> sum(1, 2)
   3
   >>> sum(1, -2)
   -1
   """

   return a + b

Each test starts with >>>, followed by the function call, and the following line has the expected result. It looks the same as a session in Python's REPL.

Once you have these functions with doctest-compatible docstrings written in your notebook, you can then add a separate cell to run them:

import doctest

doctest.testmod()
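In a notebook, doctest.testmod() reruns every doctest defined so far. If you only want to check the function you just edited, a sketch like the following might help. It assumes a hypothetical helper named run_doctests_for (not part of doctest) built on doctest's DocTestFinder and DocTestRunner classes, and it uses the name add instead of sum to avoid shadowing the built-in:

```python
import doctest


def add(a, b):
    """Addition of two numbers.

    >>> add(0, 0)
    0
    >>> add(1, -2)
    -1
    """
    return a + b


def run_doctests_for(func):
    """Hypothetical helper: run only the doctests found in func's docstring."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # module=False skips module lookup; globs lets the examples call func by name.
    tests = finder.find(
        func, name=func.__name__, module=False, globs={func.__name__: func}
    )
    for test in tests:
        runner.run(test)
    # TestResults carries the `failed` and `attempted` example counts.
    return runner.summarize(verbose=False)


results = run_doctests_for(add)
print(results)
```

The returned counts make it easy to see at a glance whether anything failed without scrolling through the output of every other function in the notebook.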

I think doctest is a suitable option for simple functions with few corner cases to test. I like that the tests are documented right alongside the function definition, so they are easy to find and make it easier to see how the function is intended to be used. For simple use cases, they are effortless to write.

Debug printing in doctest

One downside or annoyance I had with doctest had to do with debug printing. When a test failed, I would go into the code and add debug print statements to see what was going wrong. But since doctest compares the expected result against everything written to stdout, a debug print makes every test of that function fail:

import doctest

def sum(a, b):
  """
  >>> sum(1, 2)
  3
  """
  print(a, b)
  return a + b

doctest.testmod()

The above would fail as doctest would compare the expected 3 with the actual output 1 2\n3.

In December 2022, I learned that I can get around this by printing out to stderr instead:

import doctest, sys

def sum(a, b):
  """
  >>> sum(1, 2)
  3
  """
  print(a, b, file=sys.stderr)
  return a + b

doctest.testmod()

Now the test passes and I get the debug prints!
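To see both behaviours side by side, here is a small sketch. The function names noisy_add and quiet_add and the helper count_failures are made up for this illustration; the helper runs one function's doctests through doctest's DocTestFinder and DocTestRunner and silences the failure report so only the counts remain:

```python
import doctest
import sys


def noisy_add(a, b):
    """
    >>> noisy_add(1, 2)
    3
    """
    print(a, b)  # written to stdout, so doctest treats it as part of the result
    return a + b


def quiet_add(a, b):
    """
    >>> quiet_add(1, 2)
    3
    """
    print(a, b, file=sys.stderr)  # written to stderr, which doctest does not compare
    return a + b


def count_failures(func):
    """Hypothetical helper: number of failing doctest examples in func."""
    runner = doctest.DocTestRunner(verbose=False)
    tests = doctest.DocTestFinder().find(
        func, name=func.__name__, module=False, globs={func.__name__: func}
    )
    failed = 0
    for test in tests:
        # Discard the failure report text; only the counts matter here.
        failed += runner.run(test, out=lambda text: None).failed
    return failed


stdout_failures = count_failures(noisy_add)
stderr_failures = count_failures(quiet_add)
print(stdout_failures, stderr_failures)
```

The stdout version reports one failing example while the stderr version reports none, even though both functions return the same value.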

Downsides or challenges

However, doctest starts to fall apart with more complex functions that take more than a few arguments or have lots of corner cases. The docstring becomes long and hard to read, especially since you don't usually get syntax highlighting inside a docstring.

unittest

The second option is to use unittest. It takes a different approach from doctest in that you write your tests in separate test classes that are then run. It offers a clean interface and lets you give your tests proper names, so it's easier to find the failing test case.

def count_increases(data):
    """Advent of Code 2021, Day 1, Part A"""
    increase_count = 0
    prev = None

    for measurement in data:
        # Check for None explicitly so a previous measurement of 0 is not skipped
        if prev is not None and measurement > prev:
            increase_count += 1

        prev = measurement
    return increase_count

Let's say we have the above solution from Advent of Code's 2021 Day 1 puzzle and we want to test it. In a separate cell (for example, at the bottom after all the logic code), we can write the tests:

import unittest

class CountIncreaseTestCase(unittest.TestCase):

    def test_empty_has_none(self):
        self.assertEqual(count_increases([]), 0)

    def test_one_measurement_has_none(self):
        self.assertEqual(count_increases([199]), 0)

    def test_three_ascending_has_two(self):
        self.assertEqual(count_increases([199, 200, 201]), 2)


if __name__ == '__main__':
    unittest.main(argv=[''], verbosity=2, exit=False)

Here we can group our tests based on the functionality we're testing: this class contains tests for part A, where we count increases, and each method is a separate test case for a specific corner case.
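When the corner cases differ only in their input and expected value, one method per case can get repetitive. A possible variation, sketched here under the assumption that a table of cases suits your puzzle, uses unittest's subTest context manager so each row is reported separately if it fails. The class name and the CASES table are invented for this example, and count_increases is repeated with an explicit None check so the block runs on its own:

```python
import unittest


def count_increases(data):
    """Count how many measurements are larger than the one before."""
    increase_count = 0
    prev = None
    for measurement in data:
        if prev is not None and measurement > prev:
            increase_count += 1
        prev = measurement
    return increase_count


class CountIncreaseTableTestCase(unittest.TestCase):
    # Each tuple is (description, input data, expected number of increases).
    CASES = [
        ("empty", [], 0),
        ("single measurement", [199], 0),
        ("three ascending", [199, 200, 201], 2),
        ("descending", [3, 2, 1], 0),
        ("increase from zero", [0, 1], 1),
    ]

    def test_cases(self):
        for description, data, expected in self.CASES:
            # subTest labels a failure with the description and keeps going.
            with self.subTest(description):
                self.assertEqual(count_increases(data), expected)
```

The trade-off is that the whole table counts as a single test method, so you lose the descriptive method names in exchange for shorter code.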

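It's important to note that since we're running this inside Jupyter Notebook, you need to pass argv=[''] and exit=False to unittest.main. Without argv=[''], unittest tries to parse the notebook kernel's own command-line arguments, and without exit=False, it calls sys.exit() once the tests finish.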
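If you would rather not work around unittest.main, one alternative is to build the suite and run it yourself with unittest's TestLoader and TextTestRunner. This is a sketch of that idea, not the only way to do it; it repeats a shortened version of the test class so it runs on its own, and count_increases uses an explicit None check:

```python
import unittest


def count_increases(data):
    """Count how many measurements are larger than the one before."""
    increase_count = 0
    prev = None
    for measurement in data:
        if prev is not None and measurement > prev:
            increase_count += 1
        prev = measurement
    return increase_count


class CountIncreaseTestCase(unittest.TestCase):

    def test_empty_has_none(self):
        self.assertEqual(count_increases([]), 0)

    def test_three_ascending_has_two(self):
        self.assertEqual(count_increases([199, 200, 201]), 2)


# Load only this class's tests and run them with a plain text runner.
suite = unittest.TestLoader().loadTestsFromTestCase(CountIncreaseTestCase)
result = unittest.TextTestRunner(verbosity=2).run(suite)
print(result.wasSuccessful())
```

Nothing here reads sys.argv or calls sys.exit(), so neither workaround is needed, and only the named class is run rather than every test case defined in the notebook so far.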

Using unittest is a great approach, but it can make notebooks harder to read if there are a lot of tests at the end or sprinkled in. The tests do bring a lot of value by showing the reader or future developer that a) tests exist, b) where they are and c) that they can be run by executing the cell.

testbook

The third option is to use testbook, a library written for this exact use case: testing Jupyter Notebooks. You write your functions as usual inside the notebook(s):

def count_increases(data):
    """Advent of Code 2021, Day 1, Part A"""
    increase_count = 0
    prev = None

    for measurement in data:
        # Check for None explicitly so a previous measurement of 0 is not skipped
        if prev is not None and measurement > prev:
            increase_count += 1

        prev = measurement
    return increase_count

and then write the tests in regular Python files, where testbook lets you reference functions defined inside notebooks:

from testbook import testbook

@testbook('testing.ipynb', execute=['count_increases'])
def test_empty(testbook):
    func = testbook.get('count_increases')
    assert func([]) == 0
    

In this example, my notebook is called testing.ipynb, and I have tagged the cells (in this case, one cell tagged count_increases) that I want executed before the test runs. If you want to run all of your code, you can replace the argument with execute=True.
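Cell tags live in each cell's metadata inside the .ipynb file, which is plain JSON. To make the idea of "run only the tagged cells" concrete, here is a standard-library sketch. It is not how testbook works internally (testbook executes cells in a real Jupyter kernel); the in-memory notebook, the helper sources_with_tag and the compact count_increases inside it are all assumptions made for this illustration:

```python
import json

# A minimal, hand-written stand-in for testing.ipynb (nbformat 4 JSON).
NOTEBOOK_JSON = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Day 1"]},
        {
            "cell_type": "code",
            "metadata": {"tags": ["count_increases"]},
            "execution_count": None,
            "outputs": [],
            "source": [
                "def count_increases(data):\n",
                "    return sum(b > a for a, b in zip(data, data[1:]))\n",
            ],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["raise RuntimeError('untagged cell should not run')\n"],
        },
    ],
})


def sources_with_tag(notebook, tag):
    """Hypothetical helper: source code of every code cell carrying `tag`."""
    selected = []
    for cell in notebook["cells"]:
        tags = cell.get("metadata", {}).get("tags", [])
        if cell["cell_type"] == "code" and tag in tags:
            source = cell["source"]
            # The source field may be a single string or a list of lines.
            selected.append(source if isinstance(source, str) else "".join(source))
    return selected


notebook = json.loads(NOTEBOOK_JSON)
namespace = {}
for source in sources_with_tag(notebook, "count_increases"):
    exec(source, namespace)  # only the tagged cell is executed

increases = namespace["count_increases"]([199, 200, 201])
print(increases)
```

The untagged cell would raise an error if it ran, which is the practical benefit of tagging: slow or side-effecting cells stay out of the test run.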

With the testbook.get function, I can access functions from within the notebook and then test them with whatever test framework I want. In this case, I have an assert at the end of the test function, and I run it with pytest. Running !pytest inside a cell executes the terminal command pytest, which in turn runs (by default) all the files that match the pattern test_*.py or *_test.py.

The great value of testbook is that I can keep the tests in a separate file and run them with whatever test framework and tooling I want. Unfortunately, it is a bit cumbersome to jump between the interactive notebook and the terminal just for the tests. This adds friction to writing and running tests, which makes it tempting to skip them (or easy to simply forget them).

Conclusion

After spending a while playing around with these three options, writing tests for last year's Advent of Code solutions, I'll probably use unittest in the future.

testbook has a lot of potential, but given the kind of code I see myself writing in notebooks in the future, it's probably not the best option for me.

doctest has its place, but unfortunately the bar for when test cases become too complex for it is too low, and it can make the functions too messy to read for my taste. If you know a good open source project that uses doctest extensively with complex code, I'd love to know, so tweet at me!

A few more Jupyter Notebook tricks

If you'd like to learn more about Jupyter Notebook, I recommend reading Juha Kiili's recent blog post Five things to know about Jupyter notebooks.
