Python pitfalls: comparing objects of different types

Static vs dynamic typing is a constant debate in the programming languages community. There is no right or wrong answer to which is best; it strongly depends on your application and goals.

Dynamically typed languages like Python are more flexible and make programming easy and fast. That’s why they are incredibly popular for implementing software with changing or unknown requirements, like, you know, the data analysis software I write.

Sometimes, though, dynamic typing turns programming in Python into walking through a minefield and brings back memories of John Hughes’s lectures on Haskell at Chalmers.

CCC #17: Typing

So, let’s say we are comparing numbers. Type this in your Python 2.x interpreter. What will it return?

>>> 5 < 8

You guessed right: it will return True. No surprises here. The same goes if you type:

>>> "a" < "b"

This will also be True.

So, what do you think the Python interpreter will give you if you type this:

>>> 5 < "b"

Well, you are unlikely to ever type it on purpose. But your large data table might have one numerical column and one string column. If for some reason your code ends up comparing values from these two columns, you are going to regret it.
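Here is a minimal sketch of how this plays out, assuming a hypothetical column in which one value was read in as a string (say, by a CSV parser). Under Python 3, sorting such a column raises a TypeError, which the hypothetical try_sort helper below catches and reports:

```python
# Hypothetical column: one value arrived as a string (e.g. from a CSV parser)
column = [42, 7, "13", 99]


def try_sort(values):
    """Return the sorted values, or None if Python 3 refuses to order them."""
    try:
        return sorted(values)
    except TypeError as err:
        # Python 3 raises here instead of silently picking an arbitrary order
        print("Cannot sort column:", err)
        return None


result = try_sort(column)
```

Under CPython 2.x the same sorted call would not raise at all: numbers compare smaller than strings there, so "13" would quietly end up after 99.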

Well, won’t Python throw me an error? Or at least a warning? After all, what sensible answer can one get by comparing strings to numbers? Well, apparently, someone decided otherwise: Python 2.x happily returns True.

From the Python 2.7 tutorial:

Note that comparing objects of different types is legal. The outcome is deterministic but arbitrary: the types are ordered by their name. Thus, a list is always smaller than a string, a string is always smaller than a tuple, etc. [1] Mixed numeric types are compared according to their numeric value, so 0 equals 0.0, etc.

Footnotes

 [1] The rules for comparing objects of different types should not be relied upon; they may change in a future version of the language.
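To see what "deterministic but arbitrary" means in practice, here is a rough Python 3 approximation of the quoted rule, written as a sort key. It is only a sketch under stated assumptions: it groups all numbers together and compares them by value, placing them before everything else (roughly what CPython 2 did internally), and orders every other value by its type name. It does not handle None, complex numbers, or two values of the same type that cannot be ordered (such as two dicts).

```python
import numbers


def py2_like_key(value):
    """Hypothetical sort key approximating Python 2.7's mixed-type ordering.

    Numbers get an empty type name, so they sort before any other type and
    are compared among themselves by value. Everything else is ordered by
    its type name first, then by its own value.
    """
    if isinstance(value, numbers.Number):
        return ("", value)
    return (type(value).__name__, value)


mixed = ["b", 5, (1, 2), [3], 2.5]
print(sorted(mixed, key=py2_like_key))
```

The result places the list before the string and the string before the tuple, just as the tutorial describes, even though nothing about that order is meaningful.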

 

My first thought when I read this was: WHAT THE… Guido van Rossum.

Luckily, I’m not the first to arrive at this thought. In Python 3.x this weird behaviour has been changed, so if you attempt to order an integer and a string, an error is raised:

>>> 5 < '8'
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    5 < '8'
TypeError: unorderable types: int() < str()

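Python 3.6 and later word this message differently (TypeError: '<' not supported between instances of 'int' and 'str'), but the behaviour is the same.

Equality checks between objects of different types, however, are still allowed. Comparing a number to a string with == will always return False, but it is perfectly legal: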

>>> 8 == '10'
False
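The danger is that such a comparison fails silently. A minimal sketch, assuming a hypothetical price table keyed by integer IDs and an ID that arrives as a string:

```python
# Hypothetical lookup table keyed by integer IDs
prices = {8: 1.50, 10: 2.25}

# Hypothetical ID read from a text file or web form, so it is a str
user_input = "8"

# Legal in Python 3, but "8" never equals 8, so both lookups silently miss
print(user_input in prices)       # False
print(prices.get(user_input))     # None

# Mixed numeric types, by contrast, are compared by value
print(8 == 8.0)                   # True

# Converting explicitly gives the comparison that was actually intended
print(int(user_input) in prices)  # True
```

No error, no warning: the missing price just shows up later as a None somewhere downstream.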

Dynamic typing does give you a certain flexibility, but it also makes you responsible for keeping track of what’s happening to your objects. This is incredibly important in data analysis, so make sure to check data integrity between the steps of your data pipeline.
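One way to do such a check is to validate column types after each step. The check_column_types function and the schema format below are hypothetical; this is a minimal sketch assuming rows are stored as plain dictionaries:

```python
from typing import Any, Dict, List


def check_column_types(rows: List[Dict[str, Any]], schema: Dict[str, type]) -> List[str]:
    """Return a description of every value that does not match its expected type.

    `schema` maps a column name to the expected Python type (a format chosen
    for this sketch). A missing column shows up as a NoneType mismatch.
    """
    problems = []
    for i, row in enumerate(rows):
        for column, expected in schema.items():
            value = row.get(column)
            # bool is a subclass of int, so reject it explicitly for int columns
            is_bool_for_int = expected is int and isinstance(value, bool)
            if not isinstance(value, expected) or is_bool_for_int:
                problems.append(
                    f"row {i}, column {column!r}: expected {expected.__name__}, "
                    f"got {type(value).__name__} ({value!r})"
                )
    return problems


rows = [
    {"id": 1, "name": "alpha", "score": 5},
    {"id": 2, "name": "beta", "score": "8"},  # a string slipped in
]
schema = {"id": int, "name": str, "score": int}

for problem in check_column_types(rows, schema):
    print(problem)
```

Returning a list of problems rather than raising on the first one makes it easy to log everything that went wrong in a step before deciding whether to stop the pipeline.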

2 Comments

  1. Hm, but C++ is statically typed, right? Yet you can still get some pretty crazy implicit conversions when attempting to do things similar to what you describe in the post.

    Is the difference that Python actually changes the type of an object first and then acts, while C++ only uses some pre-defined rule for every given operation and type pair you throw at it?

    1. C++ is statically typed, but it is weakly typed: http://en.wikipedia.org/wiki/Strong_and_weak_typing

      I tried to do a similar comparison in C++ over here: https://ideone.com/aPKwEm and it throws an error and a warning 🙂 But I guess if you do some numerical operations, things can still go wrong under the hood.

      Python 2.x just has a defined rule: if one compares objects of different types, it follows an arbitrary order (have a look at the quote from the tutorial).
