Debugging Python
This blog post is a written form of my presentation Debugging Python (recently given at PyCon Sweden ’23 and archipylago #1).
What is debugging?
In essence, debugging is the process of figuring out what’s wrong when the computer says no. That could manifest itself as an error (software crashing) or logical error (software doing something in a way it’s not meant to do). The ones that come with an error are usually a bit easier to start with because there’s a starting point for you to do your detective job. That doesn’t mean they are easy to solve though.
I really like this quote:
“Debugging starts with your current body of knowledge and ends by answering the question: What is happening?” - Ryan Zezeski
We all start somewhere: we have some context in our mind about the current codebase, feature, and - if it’s a recently developed feature - issue at hand. We then apply different actions, using tools and techniques, to discover more and pinpoint what is actually happening.
This blog post (and my usual definition of debugging) does not discuss solving those problems (by writing code) or mitigating them (by writing tests, documentation, code reviews and so on). I focus on things that help developers become better and more efficient in finding what’s wrong.
There are two main parts to my approach: the debugging mindset which includes non-technical things and tools & techniques which includes tooling used on the computer to debug. You need to master both to become efficient. If you’re great with the mindset but don’t know the tools, you’ll likely reach the goal but it will be slow and very manual. If on the other hand, you know the tools well but don’t have a good mindset, you might end up doing unnecessary things.
Debugging mindset
What do you do when you notice something is wrong? Depending on the situation and timing, it may cause a panic response. It’s Friday afternoon and you notice something’s broken in production and the pressure might set in.
Take a break
In my opinion, the most important first step is to stop, take a deep breath and assess the situation. Rushing only leads to more issues in the long run. You should make sure you understand what happened and follow the steps of debugging.
One thing I see experienced developers do way too often (and I’ve been guilty of it myself too) is make guesses. We have a tendency to think we know what’s happening and why, and jump right into the code. Then we try to read the code to identify the problem. It doesn’t work because software is complex and mistakes are often really difficult to spot, especially when our expectation of what the outcome should be keeps us from seeing the problems.
Step-by-step process
Instead of making guesses and assumptions, it’s best to adopt a process: a step-by-step approach where you double check everything and don’t let yourself jump to conclusions. The process is iterative: you repeat it, making small progress each time, until you finally reach either a dead end or the solution.
The first, crucial step is to make sure the code you think is running, is actually running. The above meme from @JenMsft’s tweet is so spot on. I can’t emphasise the importance of this enough. There are many reasons this could happen and if you let the tunnel vision take over, it’s easy to miss them.
First, it might be that the code you’re modifying and the software you’re running are not the same. Maybe you’re accidentally running the production or staging site of your web application instead of your development. Been there, done that. Or maybe an older dev server is running on different code and overriding your later dev server. Or the automatic build tool isn’t picking up the changes and updating your software.
In all of these cases, I first make a small change, usually a visual one, to make sure it gets picked up. Add a 1 to a heading and reload your software. If things aren’t changing, you’ll avoid wasting a lot of time figuring out why your fixes aren’t working.
Second, it might be that your assumption of the flow inside the codebase isn’t correct. Maybe an API endpoint you think should be called isn’t, or a different function is responsible for it. Printing something like print("::DEBUG:: Code was run") helps you confirm your assumptions quickly. We’ll talk more about printing in just a bit.
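If a plain marker print isn’t enough to convince you, you can also print where the running code was loaded from. Here’s a minimal sketch of that idea; describe_origin and sample_view are names I made up for illustration, not part of any library.

```python
import os
import time


def describe_origin(func):
    """Describe which file a function was loaded from and when it last changed."""
    path = func.__code__.co_filename
    if os.path.exists(path):
        modified = time.ctime(os.path.getmtime(path))
    else:
        # Code typed into a REPL or run via exec() has no real file on disk.
        modified = "unknown"
    return f"::DEBUG:: {func.__name__} was loaded from {path} (modified: {modified})"


def sample_view():
    """Hypothetical stand-in for the function you believe is running."""
    return "hello"


print(describe_origin(sample_view))
```

If the path or the modification time isn’t what you expected, you’re probably running a different copy of the code than the one you’re editing.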
Talk to them ducks
Rubber duck debugging is a wonderful technique for solving problems. You may have heard about it before. It’s partly an inside joke in the industry but it also works.
In rubber duck debugging, you pick your favorite duck and ask it to help you. You explain your situation, what you’ve tried so far and what you think might be the issue - the same way you would ask a colleague. Often, before the duck has time to reply, you’ve figured the issue out yourself.
One reason why this works so well, is that we tend to be very good at skipping details when we think about them. We convince our own brain that we did something or definitely tried that other thing, omitting crucial details. When we talk about them to someone else, we tend to not skip so much because we know they are important and then we notice what we actually missed earlier.
If there’s a shortage of ducks at your location, I’ve found a brain dump to be an alternative solution. I take a page or two from my notebook and write down the things I would have told the ducks: what’s the problem, what have I tried, what do I think the problem is. I then go for a walk and think about something completely different, or take a nap. The subconscious keeps working once we stop the conscious solving part.
Tools & Techniques
Let’s then take a look at the Python-specific tools and techniques.
Printing is the best debugging tool
Printing to the console is the thing we usually learn first when we start learning programming or pick up a new language. And it has a lot of uses in building software.
I argue printing is the best debugging tool. It seems to be controversial and every time I talk about it, I get a lot of questions and comments from people who say their teammates or mentors or more senior developers tell them to not use print and make them feel bad for doing so. Some people feel like printing is not “advanced enough” to be used if you’re an experienced developer.
The reason printing is the best tool – despite not being the most powerful or most advanced – is that it has the lowest friction. Everyone knows how to do it, you don’t need to install any new libraries and you don’t need to configure anything. You just throw in a print statement, run your software and read the output. It takes only a few seconds to add it in and you can do it with pretty much any codebase, no matter how familiar you are with it.
With Python, we use the print() function to print things:
print('::DEBUG:: This code was run')
Since Python 3.6, we’ve had access to f-strings (with a helpful fstring.help cheatsheet page) that make it easier to add variables and expressions inside prints:
user = 'Juhis'
print(f'{user} logged in')
# Prints "Juhis logged in"
And if you’re doing something like this:
print(f'user = {user}, email = {email}')
you can shorten it with the = specifier (available since Python 3.8):
print(f'{user=}, {email=}')
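One detail worth knowing is that the = form prints the repr of the value by default, so strings show up with quotes. Here’s a small sketch comparing the forms; the variables attempts and ratio are just made-up examples.

```python
user = "Juhis"
attempts = 3
ratio = 2 / 3

plain = f"user = {user}, attempts = {attempts}"
debug = f"{user=}, {attempts=}"
# A format spec and arbitrary expressions work with = too.
formatted = f"{ratio=:.2f}, {attempts + 1=}"

print(plain)      # user = Juhis, attempts = 3
print(debug)      # user='Juhis', attempts=3
print(formatted)  # ratio=0.67, attempts + 1=4
```

The quotes around 'Juhis' are handy in debugging because they make stray whitespace and empty strings visible.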
Printing is a good way to quickly examine what different variables in our code store.
But it can also become a bit cumbersome and slow when you need to do a lot of exploration as it requires you to run the software again every time you make changes.
Snoop is a toolkit for more
The next step is to use tools from the snoop library. The basic usage of snoop is to decorate a function with it and get a line-by-line output of the execution with all the state changes printed out. From the example in their readme:
import snoop

@snoop
def number_to_bits(number):
    if number:
        bits = []
        while number:
            number, remainder = divmod(number, 2)
            bits.insert(0, remainder)
        return bits
    else:
        return [0]

number_to_bits(6)
When run, it prints
15:42:39.18 >>> Call to number_to_bits in File "/example.py", line 4
15:42:39.18 ...... number = 6
15:42:39.18 4 | def number_to_bits(number):
15:42:39.18 5 | if number:
15:42:39.18 6 | bits = []
15:42:39.18 7 | while number:
15:42:39.18 8 | number, remainder = divmod(number, 2)
15:42:39.18 .................. number = 3
15:42:39.18 .................. remainder = 0
15:42:39.18 9 | bits.insert(0, remainder)
15:42:39.18 .................. bits = [0]
15:42:39.18 .................. len(bits) = 1
15:42:39.18 7 | while number:
15:42:39.18 8 | number, remainder = divmod(number, 2)
15:42:39.18 .................. number = 1
15:42:39.18 .................. remainder = 1
15:42:39.18 9 | bits.insert(0, remainder)
15:42:39.18 .................. bits = [1, 0]
15:42:39.18 .................. len(bits) = 2
15:42:39.18 7 | while number:
...
You can see from the example output how it prints each iteration of the while after each other, showing how the values change. This is handy when you have multiple variables that keep changing as you only need to run it once and you’ll get all the changes for all the variables in scope.
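To get a feeling for what this kind of line-by-line tracing involves, here’s a rough standard-library sketch built on sys.settrace. It is not how snoop is implemented and its output is much cruder; trace_lines and sum_to are hypothetical names used only for this illustration.

```python
import functools
import sys


def trace_lines(func):
    """Print each line number about to run and any local variable that changed."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        last_seen = {}

        def local_tracer(frame, event, arg):
            # Report locals that are new or changed since the previous event.
            for name, value in frame.f_locals.items():
                shown = repr(value)
                if last_seen.get(name) != shown:
                    print(f"    {name} = {shown}")
                    last_seen[name] = shown
            if event == "line":
                print(f"line {frame.f_lineno}")
            return local_tracer

        def global_tracer(frame, event, arg):
            # Only trace the decorated function's own frame.
            if event == "call" and frame.f_code is func.__code__:
                return local_tracer
            return None

        previous = sys.gettrace()
        sys.settrace(global_tracer)
        try:
            return func(*args, **kwargs)
        finally:
            # Restore whatever tracer (if any) was active before.
            sys.settrace(previous)

    return wrapper


@trace_lines
def sum_to(n):
    total = 0
    for i in range(1, n + 1):
        total += i
    return total
```

Calling sum_to(3) prints the line numbers as they are reached, with the changed variables in between. For real work I’d still reach for snoop, which handles the formatting and edge cases for you.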
snoop.pp is a function that can be used to print values in the middle of expressions. It prints the argument passed in and also returns it so it can be put in anywhere:
from snoop import pp
x = 1
y = 2
pp(pp(x + 1) + max(*pp(y + 2, y + 3)))
prints
12:34:56.78 LOG:
12:34:56.78 .... x + 1 = 2
12:34:56.78 LOG:
12:34:56.78 .... y + 2 = 4
12:34:56.78 .... y + 3 = 5
12:34:56.78 LOG:
12:34:56.78 .... pp(x + 1) + max(*pp(y + 2, y + 3)) = 7
I find its biggest weakness to be that cleaning the calls up after you’re done with your debugging session takes quite a bit of work, especially if you’ve used them a lot inside complex expressions.
This works similarly to another library called icecream, which it is inspired by.
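The print-and-return idea is simple enough to sketch without any library. The peek helper below is a hypothetical stand-in I wrote for illustration; unlike pp, it can’t discover the source expression on its own, so you pass a label by hand.

```python
def peek(value, label="value"):
    """Print a labelled value and hand it back unchanged."""
    print(f"::DEBUG:: {label} = {value!r}")
    return value


x = 1
y = 2
# Because peek returns its argument, it can wrap any sub-expression.
result = peek(peek(x + 1, "x + 1") + max(y + 2, y + 3), "total")
```

Since the value passes through untouched, wrapping and unwrapping an expression never changes what the surrounding code computes.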
Finally, snoop offers a way to combine snoop with a great debugger called birdseye. I’ll talk more about this at the end of the debugger section below.
Debuggers
The third “level” of debugging tools are debuggers. On the base level, what a debugger does is that it suspends the execution of code at a given point and injects you into it. It then gives you granular control to move around in the code, executing lines and running arbitrary Python code to examine it further.
Compared to printing or snooping, the power of debuggers is that you don’t have to keep making changes and re-running the code every time you want to see something different. You’ll have powerful access to the code with Python’s great REPL interface to try things out and follow the code step by step to find out the issue.
The Python community has built many debuggers so everyone can find one that fits their flow the best. I’ll introduce a few of my favorites here, but there are many, many more.
Controlling the debugger with
You can choose which debugger to use by setting the PYTHONBREAKPOINT environment variable.
# Run IPython Debugger (replace `ipdb` with any other debugger)
PYTHONBREAKPOINT=ipdb.set_trace
# Don’t stop on breakpoint()s
PYTHONBREAKPOINT=0
# Run default Python Debugger
PYTHONBREAKPOINT=
To choose a non-default debugger, you can set PYTHONBREAKPOINT to the set_trace function of your favorite debugger (you need to install them!).
By setting the value to 0, you can skip all breakpoints which can be handy to set on a production environment or CI to make sure things don’t stop if a breakpoint accidentally makes its way to production code.
Finally, setting it to an empty string (or not setting it at all) uses the default Python debugger.
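A quick way to convince yourself that the value 0 really skips breakpoints is to run a tiny script with it set. This is a minimal sketch; the script contents and the function name run_with_breakpoints_disabled are made up for illustration.

```python
import os
import subprocess
import sys

# A tiny script that would normally stop in pdb at the breakpoint() call.
SCRIPT = "breakpoint(); print('reached the end')"


def run_with_breakpoints_disabled():
    """Run SCRIPT in a child Python process with PYTHONBREAKPOINT=0."""
    env = {**os.environ, "PYTHONBREAKPOINT": "0"}
    result = subprocess.run(
        [sys.executable, "-c", SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        stdin=subprocess.DEVNULL,
        timeout=30,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_with_breakpoints_disabled())
```

The child process runs straight through the breakpoint() call and prints its final message instead of waiting for debugger input.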
Invoking the debugger
To kick off a debugger, you need to add breakpoint() to your code on the line where you want to suspend the execution:
def find_collection(collection_name):
    url = f"{BASE_URL}/Collections"
    req = requests.get(url)
    content = req.content
    soup = BeautifulSoup(content, "html.parser")
    breakpoint()
    collection_tds = soup.css.select(f"#{collection_name}table td")
When the execution enters this function and reaches line 6, it will start the debugger and you’ll be able to examine the state of the program and move around.
The default Python debugger is called pdb or The Python Debugger. It looks very similar to the Python REPL but adds a bunch of helpful commands for operating the debugger. A couple of the most helpful ones are:
- (n)ext (the letter in parentheses is a shortcut for the entire command) executes the current line and moves to the next. If there’s a function call on that line, it gets executed completely.
- (s)tep does the same as next but it enters functions when called.
- (w)here prints the current stack trace, letting you know exactly where in the execution you are.
- (u)p and (d)own move you up and down in the stack.
- (c)ontinue runs the code until the next breakpoint (or end of script).
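If you want something small to practice these commands on, here’s a sketch with two nested functions. The functions average and report are made up for this example.

```python
def average(values):
    total = sum(values)
    breakpoint()  # execution pauses here when a debugger is enabled
    count = len(values)
    return total / count


def report(values):
    # Calling average() from here gives the stack two levels to explore.
    return f"average = {average(values):.1f}"
```

Call report([2, 4, 9]) from a script or the REPL and, once the debugger prompt appears, try w to see both frames, u and d to move between report and average, n to run the next line and c to let the program finish.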
ipdb - IPython Debugger
If you’re used to running IPython as your REPL, you’ll be right at home with the IPython Debugger. It combines the functionality of the pdb with IPython REPL’s functionality.
PuDB debugger
PuDB is a terminal UI debugger, starting a multi-pane debugging session:
Compared to pdb and ipdb, a big benefit is seeing many things at once. You can see the code currently being run, the current local variables, stack and breakpoints. And there’s still space for the REPL.
Terminal UIs can take a bit to get used to if you haven’t used them a lot. If you’re more used to web interfaces, check out the next debugger, web-pdb.
web-pdb debugger
web-pdb starts a local web server when invoked with breakpoint() and offers a multi-pane view similar to PuDB’s, but in your browser:
It gives you similar panes with code and variables and the pdb REPL but also buttons that can be clicked with a mouse. I find it a good alternative to PuDB if you don’t like navigating in the terminal.
birdseye
For me, the most impressive of the debuggers is the birdseye debugger I mentioned earlier. To debug a function, you decorate it with birdseye.eye (or with snoop.spy):
from birdseye import eye

@eye
def find_collection(collection_name):
    ...
You then call your function by running the code and it records the execution. To view it, you can start the birdseye web server with python -m birdseye and navigate to it with your browser.
What makes birdseye so good is that it gives you a very granular view into what’s happening. Not only can you view the value after a line has finished running but you can see the outcome of individual expressions within any line:
Every rectangle in the image above is an expression you can examine by clicking it. You can click multiple ones to keep watching them.
And for loops, you can adjust the iteration counter on the left to see what the values were on each iteration separately. I love it.
Birdseye balances nicely the amount of information it records (which is a lot) with how much you see (which you control).
It has a slightly different approach than other debuggers so it may not be the best solution for every situation and I often find myself combining it with other debuggers (and printing) depending on the type and difficulty of bugs.
Quick-fire Django tooling round
Here are a couple of tools for debugging Django apps specifically, without going into them in depth.
Debugger in templates
To trigger a debugger from Django templates, add this to your custom filters:
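from django import template

# Standard Django setup for registering custom template filters
register = template.Library()

@register.filter
def pdb(element):
    breakpoint()
    return element

## And use it in templates
{{ msg | pdb }}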
Django Debug Toolbar
Django Debug Toolbar is a library that adds a debugger sidebar to the frontend. It lets you examine requests made, SQL queries run and other handy information.
Kolo
Kolo is a new VS Code extension that offers much of the same information as Django Debug Toolbar but inside your VS Code. If you’re using that to edit your code, it’s handy to have the information directly in your code editor. Kolo is currently still in beta but I think it’s worth checking out.
Since it’s development-only tooling, I see far fewer issues with picking it up as part of your workflow already, compared to something that would get shipped as part of the production app.
Learn more from Syntax Error
Syntax Error is a debugging newsletter I write to help developers turn stressful debugging situations into joyful explorations. It’s not Python specific but instead I try to write general tips for all developers and explore how different technologies and languages approach debugging so we can all learn from each other’s work.
You can subscribe to it via email or RSS or just read it on the website. Whatever you find easiest and most comfortable.