Variables and integers

Python does not have variables in the same sense as some other languages. In a language such as C++ or Java, writing something like:

int a = 42;

means that some memory gets allocated to store an integer value and that a variable gets associated with that memory range. When you change the value of the variable a, the memory gets modified.

In Python, things work differently. Consider the following session:

>>> number1 = 42
>>> number2 = 41
>>> id(number1)
507104304
>>> id(number2)
507104272
>>> number2 += 1
>>> id(number2)
507104304

You create number1 and number2, and they have different ids (lines 3 to 6). However, as soon as you increment number2, both have the same id (lines 7 to 9). How is this possible? In Python, when you initialize a new variable (an integer in this case), what the runtime really does is:

  • It comes up with an ‘int’ object, either by instantiating the ‘int’ class or by reusing an existing instance
  • It adds ‘number1’ to the current namespace to point to that object
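
The two steps above can be sketched with a short script. It is only an illustration: the names first and second are made up for the example, and locals() is used to peek at the current namespace.

```python
# Hypothetical names, chosen only for this illustration.
first = 1000              # an int object is created and 'first' is bound to it
second = first            # no new object: 'second' is bound to the same one

same_object = first is second         # both names point to one object
in_namespace = "first" in locals()    # the name lives in the current namespace

second = second + 1       # a new int object is created, 'second' is rebound
still_same = first is second

print(same_object, in_namespace, still_same)  # True True False
```

Assignment never copies the integer; it only makes a name point to an object.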

As an optimization, CPython pre-allocates some small integers, since they are used the most, and reuses them instead of instantiating duplicates. In Python 3.4, the numbers from -5 to 256 are pre-allocated (see Objects/longobject.c; thanks to xiaobingjiang for the pointer to the location in the source code).
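
A quick way to probe this range is to build the same value twice at runtime and check whether both results are the same object. This is a sketch that relies on CPython behaviour, not on anything the language guarantees; the helper name is_shared is made up for the example.

```python
def is_shared(n):
    """Return True if two ints built at runtime with value n are one object."""
    # Go through str() so that each value is built at runtime instead of
    # being taken from a constant stored in the compiled code.
    a = int(str(n))
    b = int(str(n))
    return a is b


# Values just outside and just inside the range described in the text.
for n in (-6, -5, 0, 256, 257):
    print(n, is_shared(n))
```

On CPython, the values from -5 to 256 should report True and the others False. Other implementations, or future versions, may behave differently.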

Immutable numbers

Sharing objects like this is safe because numbers in Python are immutable: once they are allocated, they do not change value (technically you can change them, but that’s a hack). This is similar to immutable strings in Java, where “modifying” a string really creates a new string and assigns it to the variable that referred to the original.
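
To make this concrete, here is a small sketch contrasting an int with a list, which is mutable. All names are invented for the illustration.

```python
number = 1000
number_alias = number     # second name for the same int object
number += 1               # builds a new int and rebinds 'number'

items = [1, 2]
items_alias = items       # second name for the same list object
items += [3]              # mutates the existing list in place

print(number_alias)             # 1000: the original int was not modified
print(number is number_alias)   # False: 'number' now points elsewhere
print(items_alias)              # [1, 2, 3]: the shared list was modified
print(items is items_alias)     # True: still one and the same list
```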

Pre-allocated numbers

You can see the pre-allocated numbers by looking at their ids. In CPython, an object's id is its address in memory:

>>> hex(id(40))
'0x1e39cbf0'
>>> hex(id(41))
'0x1e39cc10'
>>> hex(id(42))
'0x1e39cc30'
>>> hex(id(43))
'0x1e39cc50'

Notice how the numbers are all spaced by the same offset (0x20). This is the size of an int object in CPython, which contains four fields (see struct _longobject in Include/longintrepr.h):

  • The reference count
  • The type
  • The array length (the number of digits in the value array, which is 1 for small non-zero integers such as these)
  • The value

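The first three fields occupy 8 bytes each (I ran the test on Python 64-bit). The value digit is typically a 4-byte integer on 64-bit builds, and alignment padding rounds the object up to 4*8 = 32 = 0x20 bytes.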
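
The size arithmetic can be checked with ctypes by describing the layout as a structure. This is a sketch under assumptions: the class name LongObjectSketch is made up, the field names follow those in the CPython source of that era, and the digit is assumed to be stored as a 32-bit unsigned integer, as I believe is the default on 64-bit builds. Newer CPython versions lay the object out differently.

```python
import ctypes


class LongObjectSketch(ctypes.Structure):
    """Approximate layout of a one-digit int (an assumption, not the real header)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count
        ("ob_type", ctypes.c_void_p),     # pointer to the type object
        ("ob_size", ctypes.c_ssize_t),    # number of digits
        ("ob_digit", ctypes.c_uint32),    # first (and only) digit of the value
    ]


# Size of the structure, including the padding added for alignment.
struct_size = ctypes.sizeof(LongObjectSketch)

# Distances between neighbouring pre-allocated ints, computed at runtime.
spacings = {id(n + 1) - id(n) for n in range(40, 43)}

print(hex(struct_size), [hex(s) for s in sorted(spacings)])
```

On a 64-bit build, the structure has 28 bytes of fields plus 4 bytes of padding, which gives the 0x20 spacing seen above.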

What an integer looks like under the hood (plus one dirty hack)

The ctypes library allows us to map an area of memory to a particular C type. This lets us look at the object for the integer 42 and verify the value of its fields. We can thus check, respectively, the reference count, the type, the length and the value:

>>> import ctypes
>>> ctypes.c_int.from_address(id(42)).value
10
>>> ctypes.c_int.from_address(id(42) + 8).value
506764368
>>> ctypes.c_int.from_address(id(42) + 16).value
1
>>> ctypes.c_int.from_address(id(42) + 24).value
42
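
The session above reads each field through c_int, which only looks at 4 bytes. That is enough for small values on a little-endian machine, but it could truncate a pointer. A slightly more careful, read-only sketch uses pointer-sized types and checks that the second field really is the address of the int type. The class name ObjectHeaderSketch is made up, and the layout is an assumption that holds for standard (non-debug, non-free-threaded) CPython builds.

```python
import ctypes


class ObjectHeaderSketch(ctypes.Structure):
    """First two fields of a CPython object (assumed layout, read-only use)."""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count word
        ("ob_type", ctypes.c_void_p),     # address of the type object
    ]


value = 42
header = ObjectHeaderSketch.from_address(id(value))

# The type field should hold the address of the 'int' type object.
type_matches = header.ob_type == id(type(value))

print(hex(header.ob_type), hex(id(int)), type_matches)
```

Only the type field is compared here, since the meaning of the reference count word for pre-allocated objects has changed across CPython versions.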

Interestingly enough, we can also use a hack to mess with the value. Only do this for testing, though, since it corrupts the shared object for the rest of the interpreter session:

>>> ctypes.c_int.from_address(id(42) + 24).value = 1
>>> 42
1
>>> 1 == 42 # Yup, we've messed things up
True
>>>

 

5 thoughts on “Variables and integers”

  1. this may be a little error:

    As an optimization, CPython pre-allocates the numbers from 0 to 256

    should be [-5, 257)

    #define NSMALLPOSINTS 257
    #define NSMALLNEGINTS 5

    static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];


    • Indeed, thanks! However, aren’t the numbers only up to 256 (and not 257) pre-allocated? NSMALLPOSINTS should include also zero.


      • What I meant is that the preallocated numbers seem to form the [-5, 256] range. A comment in the source says “The integers that are preallocated are those in the range -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).”

