

Memory leaks can be particularly risky for long-running embedded systems. Here are some tools to identify, track, and analyze memory leaks in embedded C and C++ programs.

The standard C library functions malloc() and free() allow memory blocks of arbitrary size to be allocated to an application for an arbitrary period of time. As blocks are used and freed, the area of storage dedicated to the heap—the pool of available memory—becomes a jumble of used and free blocks mixed throughout that memory area. Any small patches of free space that remain in the heap might become unusable. For instance, an application might request 30 bytes from the heap. If the heap has 20 small blocks of three bytes each—a total of 60 bytes—the heap still cannot satisfy the request, since the 30 bytes requested have to be contiguous. This is an example of memory fragmentation.

In a long-running program, fragmentation can cause a system to run out of memory even though the total amount of memory allocated does not exceed the total available. The amount of fragmentation depends on how the heap is implemented. My article "Safe Memory Utilization" discusses fragmentation further, describes why it is not as much of a threat as once thought, and investigates heap implementations that minimize the loss of space due to fragmentation. 1

Most programmers use the heap via the malloc() and free() provided by their compiler, so the fragmentation is outside their control. This is very different from a memory leak.

A memory leak is a block of memory that was allocated, but will never be freed. A leak is always a bug in the application. If all pointers to that block have gone out of scope or were assigned to point elsewhere, the application will never be able to free that piece of memory. A small leak may be acceptable in a desktop application that will exit at some point, since an exiting process returns all memory to the operating system. But in a long running embedded system, it is often necessary to guarantee that absolutely no leaks exist—a non-trivial challenge.
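A minimal, hypothetical illustration of how a leak arises: the only pointer to a block is reassigned, so the first block can never be freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical example: leaky_copy() allocates twice but keeps only one
 * pointer, so the first block becomes unreachable -- a leak. */
char *leaky_copy(const char *s)
{
    char *p = malloc(strlen(s) + 1);    /* first allocation */
    strcpy(p, s);

    p = malloc(strlen(s) + 1);          /* p reassigned: the first block */
    strcpy(p, s);                       /* is now unreachable            */

    return p;                           /* caller can free only this one */
}
```

The first block is lost the moment `p` is overwritten; no code anywhere in the program can reach it again, and the allocator will never reclaim it.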

In this article, we'll discuss the nature of memory leaks, review some commercial tools for tracking memory allocation, and look at ways to develop your own tools to track and analyze memory usage in embedded C and C++ programs.

The trouble with memory leaks

Avoiding leaks is difficult. To ensure that all allocated memory is subsequently freed, we must establish a clear set of rules about who owns the memory. To track memory, we could use a class, an array of pointers, or a linked list. Lists are commonly used since, by the nature of dynamic memory allocation, a programmer doesn't know ahead of time how many blocks will be allocated at any given time.

For example, imagine that one task is receiving messages from a communications channel. It allocates space for the message and that space will not be freed until the message is completely processed. Since the messages may not be processed in the order in which they were received, some may live longer than others. All pending messages live on a list whose length varies according to the number of messages being processed at any given time. Let's say that this embedded system must forward the messages to another device, and the message cannot be deleted until delivery is confirmed. Since the messages are going to many different destinations, and some destinations may have some down-time, causing retries, it would not be feasible to process the messages in first-in-first-out manner.

In problem areas such as these, dynamic memory management enables more efficient use of the available RAM than a predefined number of buffers. When the memory is not being used by one message queue, it can be used by another queue, or by a completely different part of the program.

One particular problem often arises when there is more than one pointer to a specific block. If the first entity owns the memory and wants to free it, then it must first consider whether any other pointers point at that location. If any do, and the first entity frees the memory, those other pointers become dangling pointers—pointers that point to space that is no longer valid. When the dangling pointer is used, you may happen to get the correct data, but eventually the memory will be reused (via another call to malloc()), leading to unpleasant interactions between the dangling pointer and the new owner of that piece of memory. A dangling pointer is the opposite of a leak.

A leak occurs when you fail to free something; a dangling pointer occurs when you free something that was not yet ready to be freed.
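One common defensive convention (a habit, not something this article mandates) is to clear a pointer in the same step that frees it, so any accidental reuse fails fast instead of dangling:

```c
#include <assert.h>
#include <stdlib.h>

/* Free the block and clear the caller's pointer in one step, so that
 * pointer cannot be left dangling afterwards. */
void free_and_null(void **pp)
{
    if (pp != NULL)
    {
        free(*pp);
        *pp = NULL;
    }
}
```

This protects only the one alias passed in; other copies of the pointer must still be tracked by whatever ownership rules the program has established.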

Memory leaks are similar to race conditions in a number of ways. The misbehavior they cause may appear far from the code that caused it. As a result, these problems are difficult to resolve by stepping through the code with a debugger. For both memory leaks and race conditions, code inspections sometimes catch these problems more quickly than any technical solution.

Adding debug code to generate output is often a better alternative than a source code debugger, but in the case of race conditions, it could alter the behavior enough to disguise the problem. With memory leaks, adding debug code can change the shape of the memory layout, which means that dangling pointer bugs may exhibit different behavior. Another drawback is that if the debugging code consumes memory, you may run out of RAM sooner in the debug version than you would in the production version. Still, a leak is a leak and should remain detectable regardless of these side effects of the debug code.

Automatic memory management

Some programming languages provide intrinsic memory management capabilities that can reduce the possibility of memory leaks. Java, for example, has automatic memory management in the form of garbage collection. Java programmers do not have to worry about freeing allocated memory. If you ever program in Java, you will appreciate just how much time and effort other languages require to keep track of memory.

In Java, you pay a runtime cost in exchange for programming simplicity, because manual memory management can produce a more efficient implementation. When programs become large, however, manual management breaks down. While good manual management can minimize total heap size, it is not always easy to implement correctly. In a large program involving dozens of programmers, human error will introduce enough leaks to tip the performance scales in favor of an automatic solution.

I read a report years ago about how bar code readers in supermarkets identified the wrong product 1% of the time. The article seemed to imply that this meant that the supermarkets were ripping us off. What the article failed to point out was a comparison with the alternative, where the person on the till has to manually type in the price of each item, which, I am guessing, would have had a higher error rate. The other consideration is that shop assistants, like programmers, vary widely in their skill levels. If the automatic system has a measurable error level, then at least management can allow for it in their plans. If the volume of customers and groceries is high, the supermarket and the customer are better off in the long run with an automatic system.

A similar logic applies to automatic garbage collection. An excellent individual programmer will do far better than an automatic collector, but in a large project, with a large number of programmers, it may be impossible to find and plug all of the leaks. Choosing an automatic system may mean that you compromise performance. You also have to accept that the automatic collector will occasionally mess up. The article "Java Q&A: How Do You Plug Java Memory Leaks?" includes a list of ways to trick Java garbage collectors. 2

While automatic collection is attractive for very large programs, most embedded developers are not developing systems of that complexity. Of those who are, only the minority have access to a programming environment, such as Perl, Smalltalk, or Java, that includes automatic garbage collection. So most of us need to know how to track down leaks in C programs that use malloc() and free(), or C++ programs that use new and delete.

Memory leak detection tools

A number of tools help in the hunt for memory leaks. The most popular free ones are dmalloc and mpatrol. 3,4 These tools provide debugging versions of the heap that record and check all allocations to facilitate analysis of leaks and dangling pointers. dmalloc and a number of similar libraries provide drop-in replacements for malloc() and free().

On a number of projects, I was responsible for tracking down memory leaks and providing some evidence that all such leaks had been removed. Initially, I assumed that one of the tools mentioned above would solve my problems. However, I found that a complete replacement for malloc() and free() was not always appropriate for embedded systems. For instance, I could be perfectly happy with my current implementation and simply want to add some monitoring. If I choose a library that replaces malloc() and free(), I have to port those routines.

Since the free libraries available are Unix-oriented, they use the sbrk() call to obtain blocks of memory from the operating system. I might not have that call available in my operating system—in fact, I might not even have an operating system. When porting, I would also have to address issues such as pointer size and memory alignment issues of the specific processor. These issues have already been addressed by the version of malloc() and free() provided with my compiler library, which obviously has been ported to the processor that I am using, so I want to avoid repeating that work.

The other challenge of porting one of these debugging versions of malloc() is that they assume that they can store their analysis data in a file. Alas, the systems I work on generally do not have any file system available. I may have to restrict how much data is stored on a device with minimal resources.

I also considered running a portion of the code on a desktop computer using an off-the-shelf tool. BoundsChecker from Compuware was available and already being used in-house for other purposes. This tool is Windows-specific, but in one case, my code was straight ANSI C, so I simply compiled it on my PC and ran it with the BoundsChecker libraries. BoundsChecker also checks many parts of the Win32 API, but I was only interested in heap allocation.

I was disappointed with the results. Large amounts of data were available, but it was necessary to eliminate numerous red herrings to discern the important information. The biggest obstacle was that BoundsChecker delivers its report after the program has exited. Any data that was not freed up before exit is considered a leak. While this is a reasonable definition, it was not a good match for my application. Unlike PC applications, embedded programs often have no need to exit.

My code contained one large loop that ran continuously. It was possible to add some check at the end of the loop that would artificially cause it to exit for the purposes of this test. However, stopping without freeing up all memory resources would cause BoundsChecker to indicate that all memory in use at that point was a leak. Most of this memory was simply waiting to be used again on the next iteration of the loop. I could have made BoundsChecker behave better by writing an exit routine that freed up all the memory that I could access, but such a routine would never run in the real system, and could disguise real problems.

I came to the conclusion that BoundsChecker is more suitable for identifying the exact source of a particular leak, once you know which line of code is under suspicion. I was more interested in the overall picture.

Such detection tools are ineffective in at least one other scenario. Imagine a linked list of buffers that may grow and shrink as the program runs. Since following the links allows the program to find every buffer, it is possible to free all of them at any time. Now consider a bug that causes a buffer that should be removed and freed to remain on the list. The list will grow indefinitely. If the program erases the complete list, all evidence of the bug will vanish—but the list will start growing again. In a system that runs for a long time, the list will eventually get so large that all memory will be used up.

A bug like this would be a problem even if an automatic garbage collector were managing memory because, strictly speaking, the extra buffers are not leaks; they can still be retrieved.

To solve this type of problem we want to establish whether total memory usage is increasing, regardless of whether it is freed, and regardless of whether it is possible to free it.

Measuring memory usage

If we are going to modify malloc(), ideally we should replace all calls to malloc() with a different name. I will call it mmalloc(), for "measured malloc." This will allow us to write a function that does a little extra work and then calls the normal malloc(). (This can be accomplished in other ways, such as using a #define to replace malloc() or using the linker to rename the malloc() function in the compiler library.)

One weakness here is that calls to malloc() from functions in libraries that I cannot change or recompile will not be monitored. For example, the standard library contains a function called strdup(), which in turn calls malloc(). I cannot replace this call to malloc() unless I have the source to the library I am using.
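One workaround for library calls such as strdup() is to provide measured equivalents and call those instead. The sketch below is an assumption of how that might look; a trivial stand-in for the measured allocator is included so the example is self-contained.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the measured allocator discussed in the text. */
static long G_inUse = 0;

void *mmalloc(size_t size)
{
    void *p = malloc(size);
    if (p != NULL)
    {
        G_inUse += (long)size;   /* count only successful allocations */
    }
    return p;
}

/* Measured replacement for strdup(): routes its allocation through
 * mmalloc() so the copy is counted like any other block. */
char *mstrdup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = mmalloc(n);
    if (p != NULL)
    {
        memcpy(p, s, n);
    }
    return p;
}
```

Callers then use mstrdup() everywhere strdup() would have appeared; the library's own internal calls to malloc() remain unmonitored, as noted above.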

The first pass at measuring usage is to simply add up how much is allocated and subtract any memory freed. For malloc(), this is trivial. Assuming we have a static value G_inUse, then the following code will track allocations:

 


void *mmalloc(size_t size)
{
   void *p = malloc(size);

   if (p != NULL)
   {
      G_inUse += size;   // count only successful allocations
   }
   return p;
}

mfree() is a bit trickier, since free() is not passed a size. The free() function is passed a pointer to the block. Usually the size is hidden in a header just before the block pointed to, so something like this might work:

 


void mfree(void *p)
{
   if (p != NULL)
   {
      size_t *sizePtr = ((size_t *) p) - 1;
      G_inUse -= *sizePtr;
   }
   free(p);
}

This is not portable, since your heap implementation might not use this convention, or might store the size at a slightly different offset.

The size freed may not match what was allocated. Some implementations of malloc() will round up a size. For example, if you request 11 bytes, you might actually receive 12 bytes. In this case, the size 12 would be stored in the header. So allocating and freeing the block would lead to a balance in G_inUse of -1.

Obviously, we have to do more work to make our monitoring code portable. We should also add some more code to keep track of sizes and then look at what other data is worth collecting.
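One portable approach (a sketch, not the only option) is to prepend our own size header instead of peeking at the allocator's internal one; the requested size is then available at free time regardless of how the underlying malloc() rounds. This assumes a C11 compiler for max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static long G_inUse = 0;

/* Private header stored just before the block we hand out. The union
 * with max_align_t pads the header so the user's block keeps
 * worst-case alignment. */
typedef union
{
    size_t      size;
    max_align_t align;
} Header;

void *mmalloc(size_t size)
{
    Header *h = malloc(sizeof(Header) + size);
    if (h == NULL)
    {
        return NULL;
    }
    h->size = size;            /* remember the *requested* size */
    G_inUse += (long)size;
    return h + 1;              /* user block starts after the header */
}

void mfree(void *p)
{
    if (p != NULL)
    {
        Header *h = (Header *)p - 1;
        G_inUse -= (long)h->size;
        free(h);
    }
}
```

Because we record the requested size ourselves, allocating 11 bytes and freeing the same block always balances G_inUse exactly, even if the underlying allocator rounded the request up to 12.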

Listing 1 shows how we can improve our memory monitoring facility. Here, we use an array to keep a record of each block of allocated memory. When the block is freed, we remove that record from our array. After each iteration of the main loop, we print a summary of the number of allocated blocks. Ideally, we would sort these blocks by type, but calls to malloc() and free() do not contain any type information. The best indication available is the size of the allocations, so that's what we'll record. We also need to store the address of the block allocated, so that we can locate and remove the record for that block when free() is called.

 


static BlockEntry blocks[NUM_BLOCKS];
static Counter counters[NUM_SIZES];

static void incrementCountForSize(size_t size);
static void decrementCountForSize(size_t size);

static long FS_totalAllocated = 0;

void *mMalloc(size_t size)
{
   int i;
   void *newAllocation = malloc(size);

   if (newAllocation == NULL)
   {
      return NULL;   // nothing to record
   }

   for (i = 0; i < NUM_BLOCKS; i++)
   {
      if (blocks[i].addr == 0)
      {
         // found empty entry; use it 
         blocks[i].addr = newAllocation;
         blocks[i].size = size;
         incrementCountForSize(size);
         break;
      }
   }
   assert(i < NUM_BLOCKS);

   FS_totalAllocated += size;
   return newAllocation;
}

void mFree(void *blockToFree)
{
   int i;

   if (blockToFree == NULL)
   {
      return;   // avoid matching an empty table entry
   }

   for (i = 0; i < NUM_BLOCKS; i++)
   {
      if (blocks[i].addr == blockToFree)
      {
         // found block being released 
         decrementCountForSize(blocks[i].size);
         FS_totalAllocated -= blocks[i].size;
         blocks[i].addr = 0;
         blocks[i].size = 0;
         break;
      }
   }
   assert(i < NUM_BLOCKS);

   free(blockToFree);
}

void mDisplayTable(void)
{
   printf("%s", "\nSize\tFreq.");
   for (int i = 0; i < NUM_SIZES; i++)
   {
      if (counters[i].size == 0) break;
      printf("\n%lu\t\t%d", (unsigned long)counters[i].size, counters[i].count);
   }
}

void mClearTable(void)
{
   for (int i = 0; i < NUM_SIZES; i++)
   {
      counters[i].size = 0;
      counters[i].count = 0;
   }
}

Listing 1. A smarter memory allocator

As we add and remove blocks, we also keep track of the number of blocks of each size. Listing 1 contains the code to do this. As blocks are allocated and freed, an array of:

 


typedef struct
{
   void * addr;
   size_t size;
} BlockEntry;

keeps track of all blocks currently in existence. Another array of:

 


typedef struct
{
   int count;
   size_t size;
} Counter;

keeps track of the total number of blocks of each size in existence.
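The two counter-maintenance helpers are declared but not shown in Listing 1. They might look like the following sketch, reproduced here with its own Counter table so it is self-contained:

```c
#include <assert.h>
#include <stddef.h>

#define NUM_SIZES 50

typedef struct
{
    int count;
    size_t size;
} Counter;

static Counter counters[NUM_SIZES];

/* Bump the count for this size, claiming the first empty slot if the
 * size has not been seen before. */
void incrementCountForSize(size_t size)
{
    for (int i = 0; i < NUM_SIZES; i++)
    {
        if (counters[i].size == size)
        {
            counters[i].count++;
            return;
        }
        if (counters[i].size == 0)
        {
            counters[i].size = size;    /* start a new counter */
            counters[i].count = 1;
            return;
        }
    }
    assert(0);   /* NUM_SIZES too small for this workload */
}

void decrementCountForSize(size_t size)
{
    for (int i = 0; i < NUM_SIZES; i++)
    {
        if (counters[i].size == size)
        {
            counters[i].count--;
            return;
        }
    }
}
```

Note that a size's slot is deliberately kept even when its count drops to zero, so the summary still shows sizes that were in use earlier in the run.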

The mDisplayTable() function allows us to output the results at the end of each main loop. If printf() is not available, you can halt the system with a debugger and examine the contents of the array.

The code shown needs to be tuned to set NUM_SIZES and NUM_BLOCKS large enough to cope with the number of allocations in your system, but not so large that you use up all of your RAM before you begin.

Analyzing memory usage

As a quick aside, we will look at the size of the Sensor structure. It is defined as:

 


typedef struct
{
   int offset;
   int gain;
   char name[10];
} Sensor;

Assuming an int is 32 bits, a reasonable guess at the size of this structure would be 18 (4 + 4 + 10). However, running our test showed it to be 20. The compiler is free to add padding between data members of a structure in order to force the alignment to be on a word boundary. In this particular case, each field starts on a word boundary already, so why the padding? The padding is added at the end of the structure. If we declare an array of Sensor, all elements of the array (and not just the first) will be word aligned. Depending on the processor, word alignment may make a speed difference, and sometimes those compilers will provide a switch that trades size against speed. In any case, it is best to make no assumptions about the size of a structure from reading the definition in the source code.
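The padding can be checked directly with sizeof and offsetof. The assertions below are hedged to what the C standard guarantees, plus one assumption (no padding between the two leading ints) that holds on mainstream ABIs:

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    int  offset;
    int  gain;
    char name[10];
} Sensor;

/* An array element's size is always a multiple of the type's
 * alignment, so any tail padding shows up in sizeof(Sensor). */
size_t sensor_size(void)     { return sizeof(Sensor); }
size_t sensor_name_off(void) { return offsetof(Sensor, name); }
```

On a typical platform with 32-bit, 4-byte-aligned ints, sensor_size() reports 20 rather than 18, with the two extra bytes sitting after name[].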

Let's see what type of output we get when we use these functions. Listing 2 shows a contrived example that demonstrates some ways to store dynamic memory. Listing 2 runs 10 iterations of what would normally be the main outer loop. At the end of each iteration, we call mDisplayTable() to output a summary of the blocks allocated.

 


#include "mmalloc.h"
int main(void)
{
   initialize();
   mClearTable();

   for (int i = 0; i < 10; i++)
   {
      replacer();
      growAndShrink();
      growForever();
      printf("\n End of iteration %d", i);
      mDisplayTable();
   }

   return 0;
}

Sensor *replacePtr = NULL;

void replacer(void)
{
   if (replacePtr != NULL) mFree(replacePtr);
   replacePtr = mMalloc(sizeof(Sensor));
}

MsgBuffer *manyBuffers[30];
int numBuf = 0;

void growAndShrink(void)
{
   if (numBuf > 20)
   {
      for (int i = 0; i < 5; i++)
      {
         numBuf--;
         mFree(manyBuffers[numBuf]);
         manyBuffers[numBuf] = NULL;
      }
   }
   else
   {
      for (int i = 0; i < 5; i++)
      {
         manyBuffers[numBuf] = mMalloc(sizeof(MsgBuffer));
         numBuf++;
      }
   }
}

MsgBuffer *manyBadBuffers[200];
int numBadBuf = 0;

void growForever(void)
{
   for (int i = 0; i < 5 ; i++)
   {
      manyBadBuffers[numBadBuf] = mMalloc(sizeof(BadBuffer));
      numBadBuf++;
   }
}

Listing 2. Example application code

Many blocks are allocated during initialization. We aren't concerned with them because that code does not repeat, and, therefore, does not threaten a leak. Since we don't want to clutter our table with those allocations, we clear it just before we start the iterations that interest us. To clear the table, we call mClearTable().

The main loop calls three different functions. Each one allocates memory in different ways.

The replacer() function demonstrates a pointer whose block is not freed until the following iteration of the loop. If we examine one iteration of the main loop, we see the block allocated and not freed. By monitoring the total number of blocks of size 20, we see, as shown in Table 1, that the total number of blocks at the end of each iteration is one. No leak has occurred.

End of       Frequency   Frequency   Frequency
iteration    (size 20)   (size 24)   (size 44)
    0            1           5           5
    1            1          10          10
    2            1          15          15
    3            1          20          20
    4            1          25          25
    5            1          20          30
    6            1          25          35
    7            1          20          40
    8            1          25          45
    9            1          20          50

Table 1. Allocated memory blocks by size

The growAndShrink() function manages a pool of size-24 structures that grows and shrinks over time. But we don't want the pool to grow indefinitely. Examining the number of blocks of size 24, we see that, while the number of blocks in existence at any one time varies, it does not exceed 25.

The growForever() function deals in blocks of size 44. Here we can see quite plainly that the number of blocks in existence keeps growing. When you first observe this in the table, you will not know the source. The first quick and dirty check you can perform is a conditional breakpoint on mMalloc() that only fires if the size parameter is 44. When you hit this breakpoint, examine the stack to see where the allocation took place. You should probably do this several times, since blocks of this size may be allocated in more than one place.

Strictly speaking, the memory consumed in growForever() is not a leak, because there are references to all blocks allocated and they could, theoretically, be freed later. It will usually be fairly obvious if a particular application is likely to do this.

Tracking the type of allocated memory

The preceding technique is less useful when many different types of objects share the same size. In practice, I have not found this much of an issue, but if it does cause problems, you have a couple of options.

One option is to store type information in each record. This is not difficult, but I shy away from it because it requires adding something new to the signature of mMalloc(). We could define an enumerated type that lists all of the types that may be allocated. In each call to the mMalloc() function, an extra argument that was one of the elements of the enumerated type could be passed. If this element were stored along with the address in our table, we could always identify the type of the object.

This would also allow us to link together allocations of different size, but related types, such as character arrays of varying length.

C++ facilitates this approach by allowing us to overload new and delete on a per-class basis. 5 While this is a useful path, I will not pursue it here; I would prefer to stick with techniques that can be used in a C environment.

Tracking where memory is allocated

Alternatively, we can collect information about where memory is allocated in a program. Fortunately, we can pick up this information without changing our signature if we make some clever use of macro definitions:

 


#define mMalloc(size) mMallocLineFile((size), __LINE__, __FILE__)

(My article "Assert Yourself" discusses the __LINE__ and __FILE__ macros, as well as some tricks to avoid the overhead from the __FILE__ string. 6 )

 


void *mMallocLineFile(size_t size, int line, char *file)
{
   int i;
   void *newAllocation = malloc(size);

   if (newAllocation == NULL)
   {
      return NULL;   // nothing to record
   }

   for (i = 0; i < NUM_BLOCKS; i++)
   {
      if (blocks[i].addr == 0)
      {
         // found empty entry; use it
         blocks[i].addr = newAllocation;
         blocks[i].size = size;
         blocks[i].line = line;
         blocks[i].file = file;
         incrementCountForSize(size);
         break;
      }
   }
   assert(i < NUM_BLOCKS);

   FS_totalAllocated += size;
   return newAllocation;
}

Listing 3. Allocations can be tracked by line number

The mMallocLineFile() function is a slight variation of the mMalloc() function from Listing 1. We now want to store line number and filename information as shown in Listing 3. To hold this extra information, our BlockEntry structure becomes:

 


typedef struct 
{
   void * addr;
   size_t size;
   int line;
   char * file;
} BlockEntry;

By storing the line number and file name for each block, we can know exactly where any block was allocated. I find it useful to add a function called mDisplayLocation() that outputs the line number and file name for all entries of a particular size. This makes it easy to identify the source of any blocks of the size that looks suspicious.
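mDisplayLocation() itself is not shown in the listings; a sketch might look like the following (the integer return count is my own addition, convenient for checking growth programmatically):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_BLOCKS 100

typedef struct
{
    void  *addr;
    size_t size;
    int    line;
    char  *file;
} BlockEntry;

static BlockEntry blocks[NUM_BLOCKS];

/* Print the allocation site of every live block of the given size and
 * return how many matched. */
int mDisplayLocation(size_t size)
{
    int matches = 0;

    for (int i = 0; i < NUM_BLOCKS; i++)
    {
        if (blocks[i].addr != NULL && blocks[i].size == size)
        {
            printf("\nline = %d, file = %s",
                   blocks[i].line, blocks[i].file);
            matches++;
        }
    }
    return matches;
}
```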

Returning to Table 1, we may worry about the blocks of size 44. To learn more about where this memory came from, we can put the following line at the end of main():

 


mDisplayLocation(44);

This leads to the output of 50 copies of the line:

line = 162, file = listing2.c

This shows unambiguously that all allocated blocks were allocated within the growForever() function.

Choosing a memory tracking strategy

The size of some allocations may vary greatly. For example:

 


char *p = malloc(strlen(name)+1);

is a common way to allocate a block of memory large enough to store a string name plus a string terminator. In embedded systems, string and file manipulation are uncommon; data structure allocations are not. For example:

 


Motor *m = malloc(sizeof(Motor));

If we assume Motor is a structure, this allocation will always result in a block of the same size. This makes it easier to identify these blocks in the output of the functions described above.

When allocations of varying sizes are a concern, the code can be modified to use the combination of line number and file name as the key for counting the number of allocations. In our examples, we stored the line number and filename, but the printed summaries were based on size. Using the line number and filename to group allocations would combine all allocations that occurred in the same place, regardless of size. In some cases, this analysis is more revealing even when varying sizes are not an issue.
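Counting by allocation site instead of size might look like this sketch (the names countSite and SiteCounter are illustrative, not from the listings):

```c
#include <assert.h>
#include <string.h>

#define NUM_SITES 32

typedef struct
{
    const char *file;
    int         line;
    int         count;
} SiteCounter;

static SiteCounter sites[NUM_SITES];

/* Bump the counter for this (file, line) pair, claiming the first
 * empty slot if the site has not been seen before. */
void countSite(const char *file, int line)
{
    for (int i = 0; i < NUM_SITES; i++)
    {
        if (sites[i].file == NULL)
        {
            sites[i].file = file;
            sites[i].line = line;
            sites[i].count = 1;
            return;
        }
        if (sites[i].line == line && strcmp(sites[i].file, file) == 0)
        {
            sites[i].count++;
            return;
        }
    }
    assert(0);   /* NUM_SITES too small */
}
```

Comparing filenames with strcmp(), rather than comparing the __FILE__ pointers, keeps the lookup correct even when the compiler does not merge identical string literals across translation units.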

Any code with a memory leak should cause the tables shown here to grow. Not all leaks will be identified as clearly or unambiguously as our growForever() example. Even if you use another technique for leak detection and removal, these output tables will help confirm whether a leak has been eliminated.

The loop shown here did not have varying input data. In real projects, I usually insert some calls to fake input, such as a call to simulate a sequence of keypresses. In your system, you will also have to create some suitable input. Unless you vary it, you may not visit the pieces of code that create the leak. For this reason, the examples shown here represent a good starting point for your hunt, but each leak is still likely to take some detective work. Happy hunting.


Endnotes

1. Murphy, Niall. "Safe Memory Utilization."

2. Henry, Ethan, and Ed Lycklama. "Java Q&A: How Do You Plug Java Memory Leaks?" Dr. Dobb's Journal, February 2001.

3. Dmalloc - Debug Malloc Library.

4. mpatrol library.

5. Eckel, Bruce. Thinking in C++. Upper Saddle River, NJ: Prentice Hall, 2000.

6. Murphy, Niall. "Assert Yourself."
