A deep dive into short circuit evaluation

One consequence of not having ever learned programming in any systematic way is that sometimes I come across a very unexpected behaviour that I don’t really have the language to describe (or more problematically, to Google). Recently when this happened I was led down a very interesting rabbit hole to the idea of short circuit evaluation. Importantly, I learned that my mental model for how logical operators work was not quite right.

I’m going to explain here what short circuiting is, how it works, and when it can be useful. I use examples in both Python and R, but even if you’re not familiar with both languages I encourage you to read the whole post. The differences in how something as seemingly basic as a Boolean operation is handled in different languages are both surprising and instructive.

A quick refresher on Boolean logical operators

The logical operators you’ll most often deal with in Python are and and or. Python also has the bitwise operators & and |, though I won’t be discussing those here.

The and and or operators behave in such an obvious way that most of the time you don’t even need to think about what they’re really doing. Here are some examples:

x = 5
print(x < 2)

## False

print(x == 5)

## True

print(x < 2 and x == 5)

## False

print(x == 5 or x < 5)

## True

As is so often the case, in R things are rather different. R has two sets of logical operators you’ll commonly encounter: & and | on the one hand and && and || on the other. The & and && operators are equivalent (mostly) to Python’s and operator whereas | and || are like Python’s or operator.

x <- 5
x < 2 | x == 5

## [1] TRUE

x < 2 || x == 5

## [1] TRUE

You can see that both | and || behave similarly to or in Python in this example. This won’t always be the case though!

The most important difference between R’s two sets of logical operators is that & and | are applied element-wise to vectors but && and || aren’t.

x <- c(1, 6, 5, 2)
x < 2 | x == 5

## [1]  TRUE FALSE  TRUE FALSE

The above expression returns a vector of the same length as the vector x, as x < 2 | x == 5 is evaluated for each element in x. If we use || instead of |, we see that only a single value is returned:

x <- c(1, 6, 5, 2)
x < 2 || x == 5

## [1] TRUE

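In this case, x < 2 || x == 5 is evaluated for only the first element of vector x. This is a bit dangerous and probably not what you want. Recent versions of R agree: since R 4.3.0, giving && or || a vector with more than one element is an error rather than a silent use of the first element.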

Where logical operators begin to get complicated

Applying multiple conditions using and or or can be really helpful when one of the conditions shouldn’t always be tested. Here’s an example in Python:

x = 5
print(isinstance(x, int) and x > 4)

## True

In this case, it doesn’t make sense to test if x > 4 unless we already know x is an int. Testing if x > 4 when x is the wrong type will generate a TypeError:

x = "5"
print(x > 4)

## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: '>' not supported between instances of 'str' and 'int'
## 
## Detailed traceback: 
##   File "<string>", line 1, in <module>

But something magical happens when we include the same expression as the second condition in this expression:

x = "5"
print(isinstance(x, int) and x > 4)

## False

If the first condition in isinstance(x, int) and x > 4 is False and the second condition generates an error, shouldn’t we get an error? Note that when we reverse the order of the two conditions, we do get an error:

x = "5"
print(x > 4 and isinstance(x, int))

## Error in py_call_impl(callable, dots$args, dots$keywords): TypeError: '>' not supported between instances of 'str' and 'int'
## 
## Detailed traceback: 
##   File "<string>", line 1, in <module>

This gives us a hint as to what Python is doing. Basically, Python doesn’t evaluate the second condition in the expression unless it has to. This means that for the expression P and Q, Q will only be evaluated if P is true. This makes a lot of sense, since if P is false, P and Q must be false for any Q.

Similarly, for the expression P or Q, if P is true we don’t even need to look at Q. A truth table explains why:

P	Q	P and Q	P or Q
True	True	True	True
True	False	False	True
False	True	False	True
False	False	False	False

Since there are cases when we know whether the expression is true or false without even looking at Q, it makes sense to not evaluate Q at all in those cases.
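One way to see this directly is to make each operand announce when it is evaluated. The following is a minimal Python sketch; the helper check and the list calls are hypothetical names introduced here for illustration, not part of any library.

```python
calls = []  # records the label of every operand that actually gets evaluated

def check(label, value):
    """Hypothetical helper: note that this operand ran, then return its value."""
    calls.append(label)
    return value

# P is false, so `and` never needs to look at Q
result_and = check("P", False) and check("Q", True)
calls_and = list(calls)

calls.clear()

# P is true, so `or` never needs to look at Q
result_or = check("P", True) or check("Q", False)
calls_or = list(calls)

print(result_and, calls_and)  # False ['P']
print(result_or, calls_or)    # True ['P']
```

In both cases only "P" is recorded, which shows that the second operand was skipped rather than evaluated and ignored.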

Short circuit evaluation in Python

From how I’ve described this so far, it might seem like Python will always return True or False for an expression like P or Q. This isn’t exactly right, however. When P is true, Python returns P itself, and when P is false it returns Q, whatever Q is!

The examples I’ve shown so far suggest that Q is always something that can be evaluated to either True or False, but this doesn’t have to be the case.

x = 5
print(x < 2 or "hello")

## hello

In this case the first condition is false, so whether the whole condition is true depends entirely on the second condition. Python saves itself some work by simply returning the value of this second part as it is, without converting it to True or False. Usually this will be something that evaluates as True or False, but in the example above it’s actually a string.

I found this very surprising when I first came across it, since it seems like Boolean logical operations should only return Boolean values.

What’s happening here is basically equivalent to:

x = 5
if x < 2:
    print(True)
else:
    print("hello")

## hello

This also works for method calls:

x = "hello"
print(isinstance(x, str) and x.upper())

## HELLO

And for functions:

my_list = [1, 2, 3, 4]
print(len(my_list) > 2 and sum(my_list))

## 10

Including lambda functions:

my_list = [1, 2, 3, 4]
print(len(my_list) > 2 and (lambda a: a[2])(my_list))

## 3

For an expression with an or where the first condition is true, we can even sneak in the name of a variable that doesn’t actually exist. Python won’t object because it doesn’t bother to check:

x = 5
print(x > 2 or madeup_variable_name)

## True
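To summarise the rules in this section, here is a rough sketch of what or and and do, written as ordinary functions. The names my_or and my_and are made up for this illustration, and the second operand is wrapped in a lambda so that it is only evaluated when called. Python’s real operators are built into the language and don’t work through functions like these.

```python
def my_or(p, q):
    """Sketch of `p or q`; q is a zero-argument function (illustrative only)."""
    return p if p else q()

def my_and(p, q):
    """Sketch of `p and q`; q is a zero-argument function (illustrative only)."""
    return q() if p else p

x = 5
print(my_or(x < 2, lambda: "hello"))  # hello
print(my_or(x > 2, lambda: 1 / 0))    # True (the division never runs)
print(my_and([], lambda: 1 / 0))      # [] (a falsy first operand is returned as-is)
print(my_and("a", lambda: "b"))       # b
```

The result is always one of the two operands, never a freshly made True or False.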

Short circuit evaluation in R

So far I’ve mostly focused on Python, but short circuiting also works in R.

I already discussed how a key difference between & and && (and between | and ||) is that only & and | are applied to each element in a vector. Another important difference between the two types of operators is with regard to short circuiting. At first it seems like short circuiting works with &:

x <- "5"
is.numeric(x) & x > 4

## [1] FALSE

We don’t get an error message, but something is wrong. We see this clearly when we change the expression a little:

x <- "5"
is.numeric(x) & x > 40

## [1] FALSE

x > 40

## [1] TRUE

While Python will refuse to compare a string to a number in this way, R instead quietly converts the number to a string and then compares the two strings. The result can differ from a numeric comparison because strings are compared character by character rather than by value; while the number 5 is less than the number 40, the string "5" is greater than the string "40", since the character "5" sorts after the character "4".
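Python orders strings in the same character-by-character way when both operands are strings, so the difference in ordering can be checked there too. This is a small illustrative sketch, not a reproduction of R’s coercion rules.

```python
# Numbers are compared by value
numeric_cmp = 5 > 40
print(numeric_cmp)  # False

# Strings are compared character by character: "5" sorts after "4"
string_cmp = "5" > "40"
print(string_cmp)   # True

# The same effect shows up when sorting numbers stored as strings
ordered = sorted(["5", "40", "100"])
print(ordered)      # ['100', '40', '5']
```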

We can, however, make R throw an error if we try to make it add a number to a string:

x <- "5"
is.numeric(x) & x + 1 == 6

## Error in x + 1: non-numeric argument to binary operator

This reveals that R is indeed evaluating both parts of the expression, even though the first part is false and so the whole expression must also be false. This tells us that short circuiting does not work with the & and | operators.

Fortunately, short circuiting does work with && and ||.

x <- "5"
is.numeric(x) && x + 1 == 6

## [1] FALSE

Here we don’t get an error because only the first part of the expression is evaluated. Short circuiting in R is not quite as flexible as in Python though. This is what happens when we try to use a function in the second part of the expression:

x <-  5
x < 2 || print("hello")

## [1] "hello"

## Error in x < 2 || print("hello"): invalid 'y' type in 'x || y'

Note that "hello" is printed, but then we get an error. Since the left side of the expression is false, the right side is evaluated and "hello" prints. However, R is unhappy that print("hello") is not something that can be true or false. For P or Q, when P is false Python will just return Q, whatever it is. In contrast, R balks at returning something that isn’t a Boolean and can’t be coerced to one.

Of course, it may surprise you what R can turn into a Boolean value:

x <-  5
x < 2 || x + 1

## [1] TRUE

As x < 2 is false, x + 1 must be evaluated as true for TRUE to be returned. As it turns out, x + 1 is indeed true, as in this situation R coerces numeric values to logical values. Non-zero values become TRUE and 0 becomes FALSE. This kind of thing makes me very hesitant about using short circuiting in R beyond the simplest cases.
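For comparison, the same expression in Python never converts the second operand at all. The sketch below is only meant to contrast the two languages: Python hands back the number itself, and the conversion to a Boolean happens only if you ask for it with bool() or use the value in an if statement.

```python
x = 5

result = x < 2 or x + 1
print(result)        # 6, not True

# Truthiness follows a similar rule to R's coercion:
# zero is false, any other number is true
print(bool(result))  # True
print(bool(0))       # False
```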

Using short circuit evaluation in vectors and lists

What if we want to use short circuiting on a vector or list? Remember that in R only && and || short circuit, and these operators don’t work for vectors as they only look at the first element.

x <- list(5, "a", 3, "7")
is.numeric(x) && x > 4

## [1] FALSE

In this case, you might expect TRUE to be returned since the first element of the list is numeric and is also greater than 4. However, is.numeric(x) tests the object x as a whole rather than its elements, and a list is never numeric, so is.numeric(x) is false and the overall expression is also false. If you want to use short circuiting with a vector or list in R, one approach is to use *apply and an anonymous function.

sapply(x, function(y) is.numeric(y) && y > 4)

## [1]  TRUE FALSE FALSE FALSE

You can do something similar in Python using map:

x = [5, "a", 3, "7"]
print(list(map(lambda y: isinstance(y, int) and y > 4, x)))

## [True, False, False, False]
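A list comprehension does the same job and is often considered the more idiomatic choice in Python. This is just an alternative sketch of the example above; the name is_big_int is introduced here for illustration.

```python
x = [5, "a", 3, "7"]

# `y > 4` is only evaluated for elements that passed the isinstance check
is_big_int = [isinstance(y, int) and y > 4 for y in x]
print(is_big_int)  # [True, False, False, False]

# Without the guard, the comparison fails as soon as it reaches a string
try:
    [y > 4 for y in x]
except TypeError as err:
    print("TypeError:", err)
```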

When should you use short circuit evaluation?

After all this discussion of how short circuiting works, it might surprise you that my advice is to use it sparingly. While something like print(x < 2 or "hello") does work, it’s less readable than the equivalent if/else statement. In general it’s better to signal clearly the intention of the code by actually using if/else. Where there’s little or no difference in speed, the better choice is the more readable code.
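A common place where this trade-off shows up is the value or default idiom. The sketch below uses two made-up functions, describe_short and describe_explicit, to show how the short circuit version can quietly do the wrong thing for falsy values such as 0, while the explicit version says exactly what it means.

```python
def describe_short(count=None):
    # Falls back to "unknown" for None, but also for 0, because 0 is falsy
    return count or "unknown"

def describe_explicit(count=None):
    # Only falls back when no value was supplied
    if count is None:
        return "unknown"
    return count

print(describe_short(3), describe_explicit(3))        # 3 3
print(describe_short(None), describe_explicit(None))  # unknown unknown
print(describe_short(0), describe_explicit(0))        # unknown 0
```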

One place where understanding short circuiting can be very helpful is when it comes to ordering multiple conditions in a logical expression. For example, we saw that in Python x > 4 and isinstance(x, int) will raise a TypeError when x is a string whereas isinstance(x, int) and x > 4 won’t. Understanding how Python and R evaluate logical expressions will save you from running into problems like this.
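The same reasoning applies to other guard conditions. In the sketch below (the function first_is_positive is hypothetical), the safety checks come first so that the indexing on the right is only attempted when it is safe.

```python
def first_is_positive(values):
    # Order matters: `values[0]` would raise for None (TypeError) or [] (IndexError),
    # so the guards that rule those cases out must come first
    return values is not None and len(values) > 0 and values[0] > 0

print(first_is_positive([3, -1]))  # True
print(first_is_positive([-3, 1]))  # False
print(first_is_positive([]))       # False
print(first_is_positive(None))     # False
```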

This deep dive into short circuiting has also helped me improve my mental model of how logical operations work in both Python and R. Logical operators might behave precisely in line with your intuitions almost all the time, but when they don’t it can be very confusing. Understanding what’s happening “under the hood” of your programming language of choice can make you a better programmer.

Resources

The answers in this StackOverflow post not only explain the different logical operators in R very clearly, but also highlight the kinds of situations where short circuiting is useful.
This Rosetta Code page on short circuit evaluation has lots of examples from many different programming languages.