Dynamic typing (part 1)

Python is a dynamically-typed language, just like PHP, JavaScript or Ruby, and contrary to statically-typed languages such as C/C++, Java or Scala.

In a dynamically-typed language, the code can never know in advance what will be the type of any variable. You can define a function to perform a particular operation on, say, a collection, but unless you explicitly filter out an argument that is not a collection, the code is never certain that it is indeed one – and the bytecode compiler sure cannot be certain either.

Using dynamic types has consequences for the language – both from a conceptual standpoint and from an implementation standpoint.

Conceptual consequences

From a conceptual perspective, dynamically-typed languages care more about the attributes and methods supported by an object than its actual type. This is the concept of duck typing: if it quacks like a duck then we consider it’s a duck – whether or not it is actually a duck.

An example of duck typing in statically-typed languages would be Java interfaces. When a Java class implements an interface, it needs to implement certain methods. Once it does, a program can use this class without knowing -nor caring about- its actual type. The difference of course is that this type is statically defined, and the Java compiler will not let you compile your code unless your classes implement all the methods required by the interfaces they use.

Duck typing has however its downsides. Namely, it forces to give details about the implementation. Consider the case where we want to create a function “double”:

def double(elt):
    return elt + elt

So far, so good. This function may have been designed with numbers in mind, but it does work with other types such as strings. Heck, it can work with any type that defines an __add__ method. And that is where lies the problem: one needs to have some detail about the underlying implementation of double(). What if we change later on that implementation for something like:

def double(elt):
    return 2 * elt

Will it work for all my use of double()? For types like numbers or strings, sure. But for other types, who knows…

Implementation consequences

From an implementation standpoint, Python needs to be able to dynamically check whether a given object has a specific attribute or function. Even in the case of CPython where the code is being compiled into bytecode on the fly, there is only so much the compiler can do. The bytecode instruction that is involved here is LOAD_ATTR whose job is to determine at runtime whether the object has the desired attribute and push it onto the stack if it exists (as previously shown, a method is just an attribute of type function). Since an object in Python is really a map from strings to values, the logical implementation choice for such a lookup is a dictionary(1).

I do not know if this is a result of using dictionaries for attribute lookup or the cause, but dynamically-typed languages typically allow to add new attributes to existing objects and define new methods to already defined classes – something that can be easily implemented when you are already using a dictionary to map attribute values.

To recap:

  • Because two different instances of the same class can have completely different attributes, they must each have a dictionary containing all their attributes and their values.
  • If two different instances of the same class inherit the same attributes/methods from it, one can nonetheless define new class attributes/methods on the fly. Which means that each class needs to have its own attribute dictionary.

This is a stark contrast with statically-typed languages where a compiler knows exactly what attribute or method needs to be called. In a language such as C++, objects are pretty much an array of attributes. All the instance of the same class have the exact same size. The C++ compiler knows exactly where will the attribute be in memory (e.g. at offset 8 from the beginning of the object). Likewise, it knows exactly what method is being called by every line of code(2).

Dynamic typing thus has an impact on performance as CPython must 1) use a dictionary lookup each time an attribute needs to be accessed and 2) perform that lookup at runtime.

Attribute lookup and object inheritance

Object inheritance is even complexifying attribute lookup. Indeed, the runtime may need to check for an attribute in the class object and any parent class. As a general rule though, Python will return the attribute value from the object dictionary if there is any. If there isn’t, it will return the result from a class attribute lookup following the Method Resolution Order (MRO).

Let’s consider the following code:

>>> class MyClass1(object):
...     attr1 = 1
...     attr2 = 2
...
>>> class MyClass2(MyClass1):
...     attr2 = 3
...
>>> obj = MyClass2()
>>> obj.attr1            # MyClass1.attr1
1
>>> MyClass1.attr2 = 42
>>> MyClass1.attr3 = 43
>>> obj.attr3            # MyClass1.attr3
43
>>> obj.attr2            # MyClass2.attr2
3
>>> obj.attr2 = 44       # obj.attr2

As you can see, you can dynamically add new attributes in MyClass1 (line 12) and it will be automatically be reflected in the object (lines 13, 14). You can this way define a new class method by adding a lambda function as a class attribute. Updating “MyClass1.attr2” doesn’t however affect “obj” (lines 15, 16) as MyClass2 is overwriting it (line 6). However, you can overwrite any class attribute in the object itself (line 17)

You can see the result when looking at the dictionaries of the various objects and classes involved. The object dictionary contains the value that it defined at line 17. MyClass2’s dictionary contains the values for “attr2” it set in its class definition. And MyClass1’s dictionary contains the values for “attr1” (set at line 2), “attr2” and “attr3” (set at lines 11 and 12):

>>> obj.__dict__
{'attr2': 44}
>>> MyClass2.__dict__
mappingproxy({'__module__': '__main__', 'attr2': 3, '__doc__': None})
>>> MyClass1.__dict__
mappingproxy({'__module__': '__main__',
'attr3': 43, 'attr2': 42, 'attr1': 1,
'__dict__': <attribute '__dict__' of 'MyClass1' objects>,
'__weakref__': <attribute '__weakref__' of 'MyClass1' objects>,
'__doc__': None})

Note that the MRO here is pretty simple. It is a bit more complex when using mixins. In any case, a class “__mro__” attribute tells what classes to look at and in what order:

>>> MyClass2.__mro__
(<class '__main__.MyClass2'>, <class '__main__.MyClass1'>, <class 'object'>)

Last but not least, dir(obj) gives all the attributes visible for “obj”. This includes the attributes declared in “obj” as well as those declared in MyClass1 and MyClass2:

>>> dir(obj)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__',
 '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__',
 '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__',
 '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__',
 '__subclasshook__', '__weakref__', 'attr1', 'attr2', 'attr3']


(1) This is not always the case. V8, the JavaScript engine used in Chrome and Node.js, is sometimes using the concept of hidden class and can represent objects as an array of values, avoiding costly dictionary lookups.

(2) A notable exception is virtual functions, where the C++ compiler has no way of knowing what actual method will be called at runtime. But even in this case, the way the C++ runtime gets around it (virtual tables) is pretty efficient and does not involves a dictionary lookup.

Leave a comment