Python does not have variables in the same sense as other languages. In a language such as C++ or Java, writing something like:
int a = 42;
means that some memory is allocated to store an integer value and that the variable a is associated with that memory range. When you change the value of a, that memory is modified in place.
In Python, things work differently. Consider the following output:
>>> number1 = 42
>>> number2 = 41
>>> id(number1)
507104304
>>> id(number2)
507104272
>>> number2 += 1
>>> id(number2)
507104304
You declare number1 and number2, and they have different ids. However, as soon as you increment number2, its id becomes the same as the id of number1. How is this possible? In Python, when you initialize a new variable (an integer in this case), what the Python runtime really does is:
- It obtains an ‘int’ object, either by instantiating the ‘int’ class or by reusing an existing instance
- It binds the name ‘number1’ in the current namespace to that object
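A minimal sketch of this name-binding model follows. The names a and b are only illustrative. The value 1000 is chosen so the result does not depend on any integer caching. Assignment binds a second name to the same object, and `+=` on an int rebinds the name to a different object, leaving the original untouched.

```python
# Bind two names to the same int object.
a = 1000
b = a
print(a is b)   # True: both names refer to one object

# "Incrementing" b binds b to a new int object; a is unaffected.
b += 1
print(a, b)     # 1000 1001
print(a is b)   # False
```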
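As an optimization, CPython pre-allocates a range of small integers, since these are used the most, and reuses them instead of instantiating duplicates. In Python 3.4, the integers from -5 to 256 inclusive are pre-allocated (see Objects/longobject.c; thanks to xiaobingjiang for the pointer to the location in the source code). This explains the output above. Incrementing number2 produces 42, so number2 is bound to the single pre-allocated 42 object, which number1 already refers to.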
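The sketch below probes the cache boundaries on your own interpreter. It is CPython-specific, because other implementations such as PyPy treat integer identity differently. The helper `is_cached` is a hypothetical name used only for illustration. Each value is built at run time with `int(str(n))`, so the compiler cannot merge the two operands into one shared constant. Only the small-integer cache can make them the same object.

```python
def is_cached(n):
    """Return True if CPython hands out one shared object for the value n."""
    # Two independent run-time constructions of the same value.
    return int(str(n)) is int(str(n))

for n in (-6, -5, 0, 256, 257):
    print(n, is_cached(n))
```

On CPython this should report True for -5, 0 and 256, and False for -6 and 257.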
Immutable numbers
Numbers in Python are immutable, which is what makes this sharing safe. Once an int object is allocated, its value does not change (technically you can change it, but only with a hack). This is similar to immutable strings in Java, where “modifying” a string really creates a new string and assigns it to the variable used to represent the string.
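The following sketch shows what immutability means in practice. With an immutable object such as an int or a str, "modifying" the value through one name never affects another name bound to the same object. A mutable list is included only for contrast: `+=` changes the shared list in place.

```python
# Immutable int: x is rebound to a new object, y still sees the old value.
x = 1000
y = x
x += 1
print(x, y)      # 1001 1000

# Immutable str: same behaviour, as with Java strings.
s = "abc"
t = s
s += "d"
print(s, t)      # abcd abc

# Mutable list, for contrast: += extends the one shared object in place.
items = [1, 2]
alias = items
items += [3]
print(alias)     # [1, 2, 3]
```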
Pre-allocated numbers
You can see the pre-allocated numbers by looking at their ids. In CPython, an object's id is its address in memory:
>>> hex(id(40))
'0x1e39cbf0'
>>> hex(id(41))
'0x1e39cc10'
>>> hex(id(42))
'0x1e39cc30'
>>> hex(id(43))
'0x1e39cc50'
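The same check can be automated. This is a CPython-only sketch that assumes the small integers sit in one contiguous pre-allocated array. It prints the addresses of 40 to 43 and collects the distance between neighbours. The exact offset depends on the build.

```python
# CPython detail: id() returns the object's address, and the small ints
# are stored side by side, so neighbouring values should be evenly spaced.
values = list(range(40, 44))
addresses = [id(v) for v in values]
for v, addr in zip(values, addresses):
    print(v, hex(addr))

# Distances between consecutive addresses; a single value is expected.
offsets = {b - a for a, b in zip(addresses, addresses[1:])}
print(offsets)   # for example {32} on a common 64-bit build
```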
Notice how the addresses are all spaced by the same offset (0x20). This offset is the size of an int object in this build. An int object contains four fields (see struct _longobject in Include/longintrepr.h):
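- The reference count (ob_refcnt)
- A pointer to the type (ob_type)
- The number of digits (ob_size), whose sign is the sign of the integer; it is 1 for the small positive integers shown here
- The value, stored as an array of digits (ob_digit); a small number fits in a single digit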
I ran the test on 64-bit Python. There, each of these 4 fields occupies 8 bytes (the digit itself is smaller but is padded to 8 bytes), for a total of 4 * 8 = 32 = 0x20 bytes.
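The size arithmetic can be checked with ctypes. This sketch mirrors the Python 3.4 layout described above for a one-digit int. The class `LongObject34` is a hypothetical name, and the 32-bit digit type assumes the usual 30-bit digits. Newer CPython versions rename some of these fields but keep the same sizes on common builds.

```python
import ctypes

# Hypothetical mirror of struct _longobject (Python 3.4) for a one-digit int:
# PyObject_VAR_HEAD followed by ob_digit[1].
class LongObject34(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),  # reference count
        ("ob_type", ctypes.c_void_p),     # pointer to the int type object
        ("ob_size", ctypes.c_ssize_t),    # number of digits, carrying the sign
        ("ob_digit", ctypes.c_uint32),    # first (and only) digit, assumed 32-bit
    ]

print(ctypes.sizeof(LongObject34))   # 32 on a 64-bit build (28 bytes + padding)
print(LongObject34.ob_digit.offset)  # 24 on a 64-bit build
```

The 4-byte digit at offset 24 ends at byte 28. The structure is then padded to 32 so that its size stays a multiple of its 8-byte alignment.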
What an integer looks like under the hood (plus one dirty hack)
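The ctypes module lets us map an area of memory to a particular C type. We can use it to look at the object for the integer 42 and check the value of each field. The reads below return, respectively, the reference count, the type, the length and the value. Note that c_int is only 4 bytes, so each read sees just the low half of an 8-byte field. On a little-endian machine that is enough for these small values, but a pointer wider than 32 bits would be truncated.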
>>> import ctypes
>>> ctypes.c_int.from_address(id(42)).value
10
>>> ctypes.c_int.from_address(id(42) + 8).value
506764368
>>> ctypes.c_int.from_address(id(42) + 16).value
1
>>> ctypes.c_int.from_address(id(42) + 24).value
42
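A slightly more careful, read-only sketch makes two changes to the session above. It reads the type pointer at full width and compares it with `id(int)`. It also locates the fields through the interpreter's own size information rather than hard-coded offsets. Two assumptions apply: `ob_type` is the last field of the generic object header, and `int.__basicsize__` is the offset of the digit array. The helper name `read_int_fields` is hypothetical, and this works on CPython only.

```python
import ctypes

def read_int_fields(n):
    """Return (type pointer, first digit) of a CPython int object. Read-only."""
    addr = id(n)  # CPython detail: id() is the object's address
    ptr_size = ctypes.sizeof(ctypes.c_void_p)
    # object.__basicsize__ is the size of the generic header; ob_type ends it.
    type_ptr = ctypes.c_void_p.from_address(addr + object.__basicsize__ - ptr_size).value
    # The digit array follows the fixed-size part of an int object;
    # int.__itemsize__ is the size of one digit (4 bytes for 30-bit digits).
    digit_type = ctypes.c_uint32 if int.__itemsize__ == 4 else ctypes.c_uint16
    first_digit = digit_type.from_address(addr + int.__basicsize__).value
    return type_ptr, first_digit

type_ptr, digit = read_int_fields(42)
print(type_ptr == id(int))  # True: ob_type points at the int type object
print(digit)                # 42
```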
We can also use this hack to overwrite the value in place. Only do this for experimentation: it modifies the shared, pre-allocated 42 object, so every 42 in the interpreter is affected.
>>> ctypes.c_int.from_address(id(42) + 24).value = 1
>>> 42
1
>>> 1 == 42  # Yup, we've messed things up
True
This may be a small error:
As an optimization, CPython pre-allocates the numbers from 0 to 256
It should be [-5, 257):
#define NSMALLPOSINTS 257
#define NSMALLNEGINTS 5
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
Indeed, thanks! However, aren’t only the numbers up to 256 (and not 257) pre-allocated? NSMALLPOSINTS should also count zero.
My English is poor; I don’t understand what you mean.
Look at _PyLong_Init(): that function shows how Python initializes them.
What I meant is that the preallocated numbers seem to be in the [-5, 256] range. A comment in the source says: “The integers that are preallocated are those in the range -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).”