PL Researcher. Evil Feminist. Ocean Monster.

MyPy Tutorial, Part 1

26.02.2016

I've had to write a lot of Python code in the past few months, and I found myself missing the comfort of static types. Enter MyPy, a standalone typechecker which uses a bunch of the new features in Python 3 to make the language more friendly to us type-lovers.

Let's jump right in!

You will need:

This tutorial is aimed at people with some programming experience but little to no experience with static typing, including but not limited to: first- and second-year college students, self-taught/hobbyist programmers (welcome, GirlDevelopIt members!), and students in online programs like Udacity.

Installing MyPy

MyPy might be available through your Linux distribution's package manager. If it is, I recommend grabbing it from there (it's what I do). If not, you'll need to download it yourself, as follows.

If you have some experience using Python libraries, you might be used to getting them directly from PyPI; unfortunately, as of this writing, the version of MyPy on PyPI isn't compatible with Python 3.5. To get around this, we'll install directly from the MyPy git repository:

$ pip3 install git+https://github.com/python/mypy.git

Getting Started

First, let's make sure your tools are set up correctly. Try saving this code in a file named "hello1.py" and running it with python3.5 hello1.py:

hello1.py
def get_message() -> str:
    return input('Type something:')

s = get_message()
print(s)

It should prompt you to enter some text, print what you typed, and then exit.

The only part of this code that's different from "normal" Python is the type annotation on the get_message function. Where def get_message() means "get_message is a function", def get_message() -> str means "get_message is a function that returns a string".

The type annotation serves two purposes: to make it easier for other humans to read your code, and to enable MyPy to detect errors in your code before you run it. (Actually, MyPy could typecheck this without an annotation, but we'll get to that later.)

To see MyPy in action, try typechecking the code with mypy hello1.py. Like many Linux programs, MyPy will print nothing when it runs successfully; since there aren't any mismatched types in this code, you shouldn't see any output after running the command.

Congratulations, you've just typechecked a Python program for the first time!

Making Mistakes

Okay, so that was a little bit anticlimactic. You wrote those extra seven characters to give the get_message function a type, and nothing even happened! Let's see where MyPy really shines: when your code isn't right.

Make a new copy of your file. Now edit it to claim that get_message returns an integer, by changing str to int on the first line. Your code should look like this now:

hello2.py
def get_message() -> int:
    return input('Type something:')

s = get_message()
print(s)

Before running it, try checking it with MyPy. You should get an error message indicating the line number and saying something like "Incompatible return value: expected builtins.int, got builtins.str".

What just happened? We told MyPy that get_message returns an integer, but MyPy noticed that the return statement in get_message returns the result of a call to the "input" function, which always returns a string. (Since the input function is built in to Python, MyPy can look up its type on its own, and we don't need to write its type in our own code.)

Here's where it gets really interesting. Go ahead and run this code with Python. You'll notice that nothing bad happens; in fact, this program acts exactly like the program from before. What's going on?

Without wandering too deep into the dark forest of compiler theory, we can say that this happens because the typechecking stage and the execution stage occur separately. Typically this feature is a hallmark of compiled languages like C++ and Java, whereas interpreted languages like Python and Ruby do typechecking while the program is running (aptly named "run-time typechecking"), but there's no rule preventing us from typechecking a program in any language before running it!
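To make "run-time typechecking" a bit more concrete, here's a small sketch. The add_two function is made up for illustration and isn't part of the tutorial's files. Python only complains about mismatched types when the offending line actually executes:

```python
# Hypothetical example: Python checks operand types only when a line runs.
def add_two(value):
    return value + 2  # fine for numbers, not defined for strings


result_ok = add_two(5)  # works: int + int

try:
    add_two('5')  # str + int fails, but only once this call executes
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

print(result_ok)
print(error_name)
```

Nothing about the definition of add_two is rejected up front; the error only shows up on the call that passes a string.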

This is really important to understand: MyPy annotations don't affect your program while it's running. They only help you catch more bugs before running your code.
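If you'd like to convince yourself, here's a minimal sketch. It's a stand-in for "hello2.py" that returns a fixed string instead of calling input, so it runs without prompting you. Python stores the annotation on the function but never enforces it:

```python
def get_message() -> int:
    # The annotation claims int, but the body returns a str.
    # Python records the annotation and otherwise ignores it.
    return 'hello'


message = get_message()

print(type(message).__name__)       # the value really is a str
print(get_message.__annotations__)  # the int claim is just stored metadata
```

The call succeeds and hands back a string, even though the annotation says otherwise. Only a typechecker like MyPy looks at the mismatch.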

A Slightly Better Example

That was awesome, but at this point it all probably feels contrived, and you might be wondering why this is important. Or maybe you're convinced, but still a little uncertain about how to use MyPy in a more realistic scenario.

Imagine that you wanted to get a user's favorite number. You might write code that looks something like this:

def get_favorite_number():
    return input('Enter your favorite number: ')

num = get_favorite_number()
print('Your favorite number is', num)

You can save and run this code, and it will work. But even though it doesn't crash, there's still a problem with it: get_favorite_number doesn't actually return a number! This is what we call a semantic problem: it isn't "wrong" in a technical sense, but it represents a mismatch between your idea of what the code means and what the code actually means.

This distinction can be difficult to understand at first, so don't worry if it doesn't make sense yet. Just know that it makes more sense for a function that returns a number to, well, return an actual number. This will become apparent when we try to actually use the number in a "number-y" way:

def get_favorite_number():
    return input('Enter your favorite number: ')

num = get_favorite_number()
print('Twice your favorite number is', 2*num)

What happens when we run this code?

Think about it and try to guess what will happen, and then go run the code and test your prediction.

You'll notice that this code has some... interesting behavior. Specifically, if you tell it that your favorite number is 5, it will tell you that twice your favorite number is 55. Which is technically true, as far as Python is concerned, but it's probably not what you meant.
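Here's a small sketch of what's going on, with the typed text hard-coded instead of read from input so you can run it without a prompt (the variable names are just for illustration). Multiplying a string by an integer repeats the string, while multiplying two integers does arithmetic:

```python
typed = '5'  # what input() hands back when the user types 5

doubled_text = 2 * typed         # string repetition
doubled_number = 2 * int(typed)  # integer arithmetic

print(doubled_text)    # 55
print(doubled_number)  # 10
```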

This kind of bug--where your program doesn't crash, but does something wrong and continues running--is what static typecheckers like MyPy excel at finding.

"Okay," you say, "that's nice, but I would totally notice this as soon as I ran the program!" Yeah, you would, but what if it were part of a web form? You might not notice it until you realized that all the values in your database were numbers like 3131, 77, 101101, and 00. "00" would probably be the biggest red flag, since programming languages usually don't display leading zeros unless you explicitly ask for them. By this point, you might have gone days or weeks collecting wrong data.

And yes, you could go through and fix this manually, but the point is to avoid problems like this in the first place. For example, you would have some very angry customers on your hands if you entered the price of a shirt as "20" and your website tried to charge them $2,020 to buy two shirts.

Let's start fixing our code. First, let's formalize the idea that get_favorite_number should return an integer, by giving it a type annotation. (From now on, I'm only going to reproduce the part of the code that changes.)

def get_favorite_number() -> int:
    return input('Enter your favorite number: ')

Try running MyPy on this code. It should display the same error as it did with "hello2.py": expected an integer, got a string.

We have two choices: we can change the return type to str, or we can change the function body so it actually returns an int. We're going to do the latter, since it matches up with our goal.

In Python, you can convert a string to an integer by wrapping it in the int constructor, so let's do that:

def get_favorite_number() -> int:
    return int(input('Enter your favorite number: '))

Now, if we tell it our favorite number is 5, it actually prints 10. And the way our code is written matches up with our intuitive understanding of what it's supposed to do, which will make it much easier for other programmers (including our future selves) to read.
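If you want to try the fixed behavior without typing at a prompt, here's a rough sketch. parse_favorite_number is a hypothetical helper, not part of the tutorial's code; it does the same int conversion but takes the typed text as an argument:

```python
def parse_favorite_number(text: str) -> int:
    # Hypothetical helper: the same int() conversion as get_favorite_number,
    # but the typed text is passed in instead of read with input().
    return int(text)


num = parse_favorite_number('5')
print('Twice your favorite number is', 2 * num)
```

One thing this sketch doesn't handle: int raises a ValueError when the text isn't a whole number (try 'five'), and MyPy won't warn you about that, since the types still line up.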

Success!

Digging Deeper

We'll continue the saga of static typing in Python next time with Part 2 of this tutorial, in which we write our own class for array-backed lists and implement sorting with a custom predicate. Stay tuned, even if that sounds confusing! Especially if it sounds confusing.

I'll also have something to say at some point about the unique properties of the None type, and how to take advantage of it when developing larger projects with MyPy.

In the meantime, play around with MyPy on your own! Adding annotations to some code you've already written is a great way to build familiarity with the tool. (Just remember to make a copy of your files or check them into a git repository first!)