@node Memory allocation
@chapter Memory allocation
@cpindex Memory allocation
@cpindex Allocate memory

The ability to allocate an amount of memory that is
unknown at compile-time is an important feature for
most programs.

@cpindex Virtual address space
@cpindex Virtual memory-address
@cpindex Memory, virtual address space
@cpindex @sc{RAM}
@cpindex Swap space
On modern operating systems, processes do not have
direct access to the memory. Only the operating system
kernel does. Instead, each process has virtual
memory-addresses that the kernel maps to real
@sc{RAM}, swap space (disc backed), file segments,
or zeroes.

@cpindex Forks
@cpindex Exec:s
@cpindex Process image
@cpindex Memory sharing, private memory
@cpindex Private memory sharing
@cpindex Memory deduplication
@cpindex Deduplication memory
Memory for a process is allocated either programmatically,
or when the process forks (is created) or exec:s (changes
process image). The operating system kernel is typically
configured to share as much memory between processes
as possible. For example, when a process forks, the parent
and the child will share their memory rather than duplicate
it, and the kernel will only remap the memory when the
processes' memory content diverges. It is also possible
to allocate memory in such a way that processes can share
it, so that updates from one process influence the other
processes.

@cpindex Virtual address space segments
@cpindex Memory segments
@cpindex Segments, virtual address space
A process's virtual address space is divided into segments.
There are four important segments.

@table @i
@item text segment
@cpindex Segment, text
@cpindex Text segment
@cpindex @code{.text}
@cpindex Instructions
@cpindex Static constants
@cpindex Constants, static
@cpindex Literals
@cpindex Strings, literals
This segment is allocated when a process exec:s.
It contains instructions, static constants, and literals.

@item BSS segment
@cpindex Segment, @sc{BSS}
@cpindex @sc{BSS} segment
@cpindex @code{.bss}
@cpindex Block Started by Symbol
@cpindex Uninitialised variables
@cpindex Zero variables
@cpindex Global variables
@cpindex Static variables
This segment is allocated when a process exec:s.
It contains all global and static variables that are
initialised to zero or lack explicit initialisation.
On some systems this segment is merged into the data segment.
@sc{BSS} stands for `Block Started by Symbol'.

@item data segment
@cpindex Segment, data
@cpindex Data segment
@cpindex @code{.data}
@cpindex Heap
@cpindex Memory, heap
@cpindex Global variables
@cpindex Static variables
This segment is allocated when a process exec:s.
It is filled with all global and static variables that
are not covered by the @sc{BSS} segment. This segment's
lower end is fixed, but its upper end is not; the segment
can be resized. Any part of the segment that is a result
of it being resized is referred to as the heap.

@item stack segment
@cpindex Segment, stack
@cpindex Stack segment
@cpindex Memory, stack
@cpindex Call-stack
@cpindex Automatic variables
This segment contains a@footnote{Program stacks can be
created programmatically, hence `a' rather than `the'.}
program stack. A program stack contains the information
required to return after a function call, and automatic
variables. Depending on the platform and the function,
it may also contain arguments passed in function calls
and return values. The segment grows as the stack grows,
but it does not shrink as the stack shrinks.
@end table

The layout of the segments depends on the machine and
the operating system. However, a common@footnote{Keep
in mind that this cannot be assumed in practice, and that
it does in fact differ between systems.} layout is to have
the text segment (as well as other fixed segments) start
at 0, and end at some positive address. The text segment
is then followed by the @sc{BSS} segment, and the data segment.
The stack segment, however, grows from the highest
address (@math{-1} on two's complement-machines) downwards.
The process cannot allocate any more memory when
the allocation would require the data segment to
be grown to such an extent that it would overlap
with the stack segment.

In C there are multiple ways to allocate memory.

@itemize @bullet{}
@item
@cpindex Global variables
@cpindex Static allocations
Variables that are declared outside functions
are called global variables@footnote{Even if they
are declared with @code{static}. In this context,
@code{static} is only used to hide objects from other
translation units.}. These are stored either in the
@sc{BSS} segment or in the data segment, depending on
their initialisation. Pointers are stored as numerical
values; if a pointer is initialised with a literal,
the content of the literal is stored in the text segment.
Arrays, however, are not stored as pointers; only their
content is stored, in the data segment (or in the
@sc{BSS} segment if the elements are zeroes).
These allocations are known as @i{static allocations}.

@item
@cpindex @code{static}
@cpindex Static variables
@cpindex Static allocations
Variables that are declared inside functions, with
@code{static}, are stored just like global variables.
These are called static variables. They keep their
values between function calls: the declaration
initialises them only once, so they only change in
statements other than the declaration statement.
These allocations are known as @i{static allocations}.

@item
@cpindex @code{auto}
@cpindex Local variables
@cpindex Automatic variables
@cpindex Automatic allocations
@cpindex Stack-allocations
Variables that are declared inside functions (these are
known as local variables), without @code{static}, but with
@code{auto}, are called automatic variables. These are
stored in the stack segment, and are deallocated when the
function returns. Local variables that are declared with
none of @code{static}, @code{auto}, and @code{register} are
often thought of as automatic; however, the compiler may
choose to add @code{register}.
These allocations are known as @i{automatic allocations}@footnote{Known
in some other programming languages as stack-allocations.}.

@item
@cpindex @code{register}
@cpindex Register variables
Variables that are declared with @code{register}
are not stored in any segment. They lack addresses
and are stored in CPU registers.

@item
@cpindex Dynamic allocation
@cpindex Heap-allocation
Using system calls, a heap can be created, where
new allocations are stored. These are often allocated
with the function @code{malloc}, and are known as
@i{dynamic allocations}@footnote{Known in some other
programming languages as heap-allocations.}.
@end itemize

@cpindex Dynamic allocation
@cpindex Automatic allocations
@cpindex Stack-allocations
Some compilers, including @sc{GCC}, provide
two additional ways to allocate memory.

@table @asis
@item @i{Variable-length arrays}
@cpindex Variable-length arrays
@cpindex Arrays, variable-length
A simple example of a variable-length array,
available with some compilers, is

@example
void function(size_t n)
@{
  int array[n];
  /* ... */
@}
@end example

Variable-length arrays have a special property:
they may be deallocated before the function returns.
In fact, they are deallocated once the variable goes
out of scope. This lets the following example run
without ever exhausting the memory.

@example
void function(size_t n)
@{
  for (;;) @{
    int array[n];
    /* ... */
  @}
@}
@end example

@item @code{alloca}
@fnindex alloca
@code{alloca} is a special function that is implemented
by the compiler. It grows the stack and returns a pointer
to the beginning of the new part of the stack. It is
similar to variable-length arrays, however the allocation
is not deallocated before the function returns. This
causes the following example to eventually exhaust the
memory.

@example
void function(size_t n)
@{
  for (;;) @{
    int* pointer = alloca(n);
    /* ... */
  @}
@}
@end example
@end table

These two allocation methods are both automatic
and dynamic.
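
As a rough illustration of the standard kinds of allocation
described above, consider the following sketch. The names
@code{total_calls}, @code{step}, @code{count_calls} and
@code{make_sequence} are hypothetical and exist only for
this example, and the segments named in the comments are
the typical placement, not something the C language guarantees.

```c
#include <stddef.h>
#include <stdlib.h>

/* Static allocation without an initialiser:
   typically placed in the BSS segment. */
int total_calls;

/* Static allocation with a non-zero initialiser:
   typically placed in the data segment. */
int step = 2;

/* Returns the number of times it has been called. */
int count_calls(void)
{
  static int calls = 0;  /* static variable: keeps its value between calls */
  int increment = 1;     /* automatic variable: recreated on every call */

  calls += increment;
  total_calls = calls;
  return calls;
}

/* Dynamic allocation: returns an array of `n` integers holding
   0, step, 2 * step, ..., or NULL if the memory cannot be allocated.
   `n` is assumed to be non-zero and small enough that
   `n * sizeof(int)` does not overflow.
   The caller is responsible for passing the result to free(). */
int* make_sequence(size_t n)
{
  int* sequence = malloc(n * sizeof(int));
  size_t i;

  if (sequence == NULL)
    return NULL;
  for (i = 0; i < n; i++)
    sequence[i] = (int)i * step;
  return sequence;
}
```

Calling @code{count_calls} twice returns 1 and then 2, because
@code{calls} is initialised only once, whereas @code{increment}
is created anew on every call. The array returned by
@code{make_sequence} outlives the call and remains allocated
until it is passed to @code{free}.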

@cpindex Memory exhaustion
Memory allocation functions return @code{NULL} if the
process cannot allocate more memory. However, under typical
configurations of the operating system kernel, memory
allocation functions can return successfully even if the
system's memory is exhausted. The process's virtual address
space is not exhausted, so it thinks it can allocate the
memory, but the machine's memory is exhausted, so once the
process tries to write to the memory, the kernel cannot
map the virtual address to a real address. When this
happens the process is killed by a
@code{SIGSEGV}@footnote{Segmentation violation, also known
as segmentation fault.} signal.

@menu
* The alloca function::                 Dynamically allocate automatically freed memory.
* Basic memory allocation::             Basic functions for dynamic memory allocation.
* Aligned memory allocation::           Dynamic memory allocation with alignment.
* Resizing memory allocations::         How to resize memory allocations.
* Efficient stack-based allocations::   Improving the performance using constrained allocation methods.
* Resizing the data segment::           How to change the size of the heap.
* Memory locking::                      How to prevent pages from being swapped out.
@end menu



@node The alloca function
@section The @code{alloca} function
@cpindex Dynamic allocation
@cpindex Automatic allocations
@cpindex Stack-allocations
@fnindex alloca
@hfindex alloca.h

The function @code{void* alloca(size_t n)} appears
on multiple systems: 32V, @sc{PWB}, @sc{PWB}.2,
3@sc{BSD}, 4@sc{BSD}, and @sc{GNU}. It has been
added to @command{slibc}, without requiring any
feature-test macros, despite not being standardised
(for instance, it does not appear in @sc{POSIX}).
This function is however not portable, and will not
be made available if @code{_PORTABLE_SOURCE} or
@code{_LIBRARY_HEADER} is defined.

@code{alloca} is defined in the header file @file{<alloca.h>}.

@code{void* alloca(size_t n)} is similar to the
function @code{malloc}.
It allocates a contiguous space of @code{n} bytes
of usable memory, and returns a pointer to the
beginning of the allocated memory. However, the
allocation is placed on the stack rather than the
heap, and it is automatically deallocated when the
function whence the call to @code{alloca} was made
returns. Because of this, @code{alloca} is implemented
as a macro (using an intrinsic function provided by
the compiler) rather than as a function.

You must not try to free the memory explicitly,
with a function like @code{free}, or resize it
with a function like @code{realloc}. Just as with
arrays and pointers to automatic variables,
memory management functions cannot operate on
memory allocated with @code{alloca}.

@cpindex Memory exhaustion
Unlike @code{malloc}, @code{alloca} does not detect
memory exhaustion, thus it will never return @code{NULL}.
However, it is likely that the process will receive
@code{SIGSEGV}@footnote{Segmentation violation, also
known as segmentation fault.} if it tries to access
memory that could not be allocated, or, depending on
the kernel's configuration, before @code{alloca} returns.
On typical kernels and kernel configurations,
@code{alloca} and @code{malloc} will handle memory
exhaustion identically.

Undefined behaviour may be invoked if @code{alloca}
is called within the arguments of a function call.
The behaviour depends on the machine, the compiler,
and optimisations. You should avoid code similar to
the following, where @code{alloca} is called within
the arguments of @code{strcpy}.

@example
#define strdupa(string)  \
    strcpy(alloca((strlen(string) + 1) * sizeof(char)), string)
@end example

@code{alloca} has its restrictions (limited lifetime,
cannot be explicitly deallocated, not growable, and not
shrinkable), but it can also be advantageous because:

@itemize
@item
It results in cleaner code because the memory is
deallocated automatically.
@item
It does not waste any space on metainformation
required for bookkeeping.
@item
It uses a faster memory allocation mechanism.
@end itemize



@node Basic memory allocation
@section Basic memory allocation
@cpindex Dynamic allocation
@fnindex malloc
@hfindex malloc.h
@hfindex stdlib.h

@code{malloc} is the most general, and most commonly
used memory allocation facility. It is also the most
portable, and is available in all environments.

There are three important functions that were
introduced in @sc{ANSI}@tie{}C. These are defined
in @file{<malloc.h>}, however, they should be
included via @file{<stdlib.h>}, which includes
@file{<malloc.h>}.

@table @code
@item void* malloc(size_t size)
@fnindex malloc
@cpindex Memory allocation without initialisation
@cpindex Uninitialised memory allocation
Allocates a contiguous memory block of @code{size}
bytes of usable memory, and returns a pointer
to the beginning of the usable part of the
allocation. The function allocates some extra
memory for internal bookkeeping; this part of
the allocation is located before the address
the returned pointer points to. Allocated memory
is uninitialised.

If the memory cannot be allocated (due to
memory exhaustion), @code{NULL} is returned
and @code{errno} is set to @code{ENOMEM}.

It is implementation defined whether this function
returns @code{NULL} or a unique pointer if
@code{size} is zero. In @code{slibc}, @code{NULL}
is returned.

The allocation can be grown or shrunk at any time.
See @ref{Resizing memory allocations}.

In @sc{ISO}@tie{}C, @code{void*} can be implicitly
converted to any other pointer type. Therefore it is
sufficient (and preferable) to write

@example
int* ten_integers = malloc(10 * sizeof(int));
@end example

You do not need to write

@example
int* ten_integers = (int*)malloc(10 * sizeof(int));
@end example

@item void* calloc(size_t count, size_t size)
@fnindex calloc
@cpindex Memory allocation with initialisation
@cpindex Initialised memory allocation
This function is similar to @code{malloc}, but
it initialises the memory to zeroes. The only
other difference is that it has two parameters
rather than one.
If @code{count * size} would overflow, @code{NULL}
is returned and @code{errno} is set to @code{ENOMEM}.
Otherwise, @code{p = calloc(a, b)} is equivalent to

@example
p = malloc(a * b);
if (p != NULL)
  memset(p, 0, a * b);
@end example

@item void free(void* ptr)
@fnindex free
@cpindex Deallocate memory
@cpindex Memory, deallocation
Memory that is dynamically allocated (without
automatic deallocation) can be deallocated with
this function. It is required that @code{ptr}
is the pointer returned by the function that
allocated it. It must not have been incremented.
Undefined behaviour is invoked otherwise. One
exception is @code{NULL}: if @code{ptr} is
@code{NULL}, nothing happens.

Do not try to free a pointer twice; doing so
invokes undefined behaviour. Neither should you
pass pointers to variables to this function, nor
pointers to functions, arrays, @code{main}'s
parameters, memory-mapped I/O, or memory allocated
with @code{alloca}. Function-like macros that
allocate memory with @code{alloca}@footnote{There
are no functions that return pointers to memory
allocated with @code{alloca}, because the memory
would be deallocated at the return. Thus, such
facilities are implemented as function-like macros.}
specify so, and their names typically end with an
`a', which is not too common for other functions.

Also be cautious of whether functions return
statically allocated arrays, which must not be
deallocated, or dynamically allocated memory, which
should be deallocated to avoid memory leaks.
@end table

@file{<malloc.h>} also includes three unportable functions.

@table @code
@item void* zalloc(size_t size)
@fnindex zalloc
This function is similar to @code{calloc}, but it
only has one parameter. Assuming @code{a} and @code{b}
are two @code{size_t}:s, and @code{a * b} does not
overflow, @code{calloc(a, b)} is identical to
@code{zalloc(a * b)}. @code{zalloc} is a
@command{klibc} extension.

@item void cfree(void* ptr, ...)
@fnindex cfree
This function is an obsolete function provided
by a number of C standard library implementations.
It has been replaced by @code{free}. @code{cfree}
is identical to @code{free}.

Different implementations of @command{libc} define
@code{cfree} with different numbers of parameters;
therefore @code{slibc} declares @code{cfree} as a
variadic function, and ignores all but the first
argument.

@item size_t malloc_usable_size(void* ptr)
@fnindex malloc_usable_size
@cpindex Retrieve allocation size
@cpindex Allocation size, retrieve
This function returns the number of usable bytes
for a pointer. If @code{ptr} is @code{NULL}, zero
is returned. It has the same restrictions as the
function @code{free}.

@code{malloc_usable_size(malloc(n))} returns @code{n}
(and allocates memory in the process).

This function is a @sc{GNU} extension and requires
@code{_GNU_SOURCE}.
@end table



@node Aligned memory allocation
@section Aligned memory allocation

TODO



@node Resizing memory allocations
@section Resizing memory allocations

TODO



@node Efficient stack-based allocations
@section Efficient stack-based allocations

TODO obstack.h



@node Resizing the data segment
@section Resizing the data segment

TODO brk, sbrk



@node Memory locking
@section Memory locking

TODO mlock, munlock, mlockall, munlockall