
How is Memory Managed in Python

 

Memory Management

Memory management in Python involves a private heap containing all Python objects and data structures. The management of this private heap is ensured internally by the Python memory manager. The Python memory manager has different components that deal with various dynamic storage management aspects, such as sharing, segmentation, preallocation, or caching.

At the lowest level, a raw memory allocator ensures that there is enough room in the private heap for storing all Python-related data by interacting with the memory manager of the operating system. On top of the raw memory allocator, several object-specific allocators operate on the same heap and implement distinct memory management policies adapted to the peculiarities of every object type. For example, integer objects are managed differently within the heap than strings, tuples, or dictionaries because integers imply different storage requirements and speed/space tradeoffs. The Python memory manager thus delegates some of the work to the object-specific allocators, but ensures that the latter operate within the bounds of the private heap.

It is important to understand that the management of the Python heap is performed by the interpreter itself and that the user has no control over it, even if she regularly manipulates object pointers to memory blocks inside that heap. The allocation of heap space for Python objects and other internal buffers is performed on demand by the Python memory manager through the Python/C API functions listed in this document.

To avoid memory corruption, extension writers should never try to operate on Python objects with the functions exported by the C library: malloc(), calloc(), realloc(), and free(). This will result in mixed calls between the C allocator and the Python memory manager with fatal consequences because they implement different algorithms and operate on different heaps. However, one can safely allocate and release memory blocks with the C library allocator for individual purposes, as shown in the following example:

PyObject *res;
char *buf = (char *) malloc(BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyString_FromString(buf);
free(buf); /* malloc'ed */
return res;


In this example, the memory request for the I/O buffer is handled by the C library allocator. The Python memory manager is involved only in the allocation of the string object returned as a result.

In most situations, however, it is recommended to allocate memory from the Python heap, precisely because that heap is under the control of the Python memory manager. For example, this is required when the interpreter is extended with new object types written in C. Another reason for using the Python heap is to inform the Python memory manager about the memory needs of the extension module. Even when the requested memory is used exclusively for internal, highly specific purposes, delegating all memory requests to the Python memory manager gives the interpreter a more accurate picture of its memory footprint as a whole. Consequently, under certain circumstances, the Python memory manager may or may not trigger appropriate actions, such as garbage collection, memory compaction, or other preventive procedures. Note that when the C library allocator is used as in the previous example, the memory allocated for the I/O buffer escapes the Python memory manager completely.

Memory Interface

The following function sets, modeled after the ANSI C standard, are available for allocating and releasing memory from the Python heap:

  • ANY*: Used to represent arbitrary blocks of memory. Values of this type should be cast to the specific type that is needed.

  • ANY* PyMem_Malloc(size_t n): Allocates n bytes and returns a pointer of type ANY* to the allocated memory, or NULL if the request fails. Requesting zero bytes returns a non-NULL pointer.

  • ANY* PyMem_Realloc(ANY *p, size_t n): Resizes the memory block pointed to by p to n bytes. The contents will be unchanged to the minimum of the old and the new sizes. If p is NULL, the call is equivalent to PyMem_Malloc(n); if n is equal to zero, the memory block is resized but is not freed, and the returned pointer is non-NULL. Unless p is NULL, it must have been returned by a previous call to PyMem_Malloc() or PyMem_Realloc().

  • void PyMem_Free(ANY *p): Frees the memory block pointed to by p, which must have been returned by a previous call to PyMem_Malloc() or PyMem_Realloc(). Otherwise, or if PyMem_Free(p) has been called before, undefined behavior occurs. If p is NULL, no operation is performed.

  • ANY* Py_Malloc(size_t n): Same as PyMem_Malloc(), but calls PyErr_NoMemory() on failure.

  • ANY* Py_Realloc(ANY *p, size_t n): Same as PyMem_Realloc(), but calls PyErr_NoMemory() on failure.

  • void Py_Free(ANY *p): Same as PyMem_Free().

The following type-oriented macros are provided for convenience. Note that TYPE refers to any C type.

  • TYPE* PyMem_NEW(TYPE, size_t n): Same as PyMem_Malloc(), but allocates (n * sizeof(TYPE)) bytes of memory. Returns a pointer cast to TYPE*.

  • TYPE* PyMem_RESIZE(ANY *p, TYPE, size_t n): Same as PyMem_Realloc(), but the memory block is resized to (n * sizeof(TYPE)) bytes. Returns a pointer cast to TYPE*.

  • void PyMem_DEL(ANY *p): Same as PyMem_Free().

Examples

Here is the example from the previous section, rewritten so that the I/O buffer is allocated from the Python heap by using the first function set:

PyObject *res;
char *buf = (char *) PyMem_Malloc(BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyString_FromString(buf);
PyMem_Free(buf); /* allocated with PyMem_Malloc */
return res;


With the second function set, the need to call PyErr_NoMemory() is obviated:

PyObject *res;
char *buf = (char *) Py_Malloc(BUFSIZ); /* for I/O */

if (buf == NULL)
    return NULL;
/* ...Do some I/O operation involving buf... */
res = PyString_FromString(buf);
Py_Free(buf); /* allocated with Py_Malloc */
return res;


Here's the same code using the macro set:

PyObject *res;
char *buf = PyMem_NEW(char, BUFSIZ); /* for I/O */

if (buf == NULL)
    return PyErr_NoMemory();
/* ...Do some I/O operation involving buf... */
res = PyString_FromString(buf);
PyMem_DEL(buf); /* allocated with PyMem_NEW */
return res;


Note that in the three previous examples, the buffer is always manipulated via functions/macros belonging to the same set. Indeed, it is required to use the same memory API family for a given memory block so that the risk of mixing different allocators is reduced to a minimum. The following code sequence contains two errors, one of which is labeled as fatal because it mixes two different allocators operating on different heaps.

char *buf1 = PyMem_NEW(char, BUFSIZ);
char *buf2 = (char *) malloc(BUFSIZ);
char *buf3 = (char *) PyMem_Malloc(BUFSIZ);
...
PyMem_DEL(buf3);  /* Wrong -- should be PyMem_Free() */
free(buf2);       /* Right -- allocated via malloc() */
free(buf1);       /* Fatal -- should be PyMem_DEL()  */


In addition to the functions aimed at handling raw memory blocks from the Python heap, objects in Python are allocated and released with _PyObject_New() and _PyObject_NewVar(), or with their corresponding macros PyObject_NEW() and PyObject_NEW_VAR().


