The safety and security of every embedded system is dependent upon proper operation of the stack (or stacks, if there are multiple). If a stack overflow occurs, a major or minor malfunction is very likely to follow. Despite this, the stack in the majority of embedded systems is sized based on nothing more than a hunch. As well, run-time stack monitoring is too seldom used.
The remainder of this page will be a transcript of a 1-hour webinar. A recording of the webinar is available at https://vimeo.com/182587785.
Slide 1: How to Prevent and Detect Stack Overflow
Moderator: Welcome and thank you for attending Barr Group’s webinar How to Prevent and Detect Stack Overflow. My name is Sherry, and I will be your moderator for the next hour. Today’s webinar presenter is Nigel Jones, Principal Engineer for Barr Group. Before we get started with Nigel’s presentation, I am pleased to introduce Michael Barr, our Chief Technical Officer who will provide a brief company overview.
Slide 2: About Barr Group
Hello everyone and thank you for joining us. My name is Michael Barr, I am a co-founder and the Chief Technical Officer of Barr Group. At Barr Group our mission is to help as many people as possible build safer, more reliable, and more secure embedded systems. We are an independent consulting firm, specializing in embedded systems and software. We regularly consult with numerous companies in many industries on process and re-architecture for systems and software. As well, we take on design and development work, where we participate in the software and sometimes also electrical and mechanical design of systems. We also train engineers in best practices through webinars like this, through public training events and also at private companies. And finally, our experts sometimes testify before judges and juries about issues relating to embedded systems and embedded software.
Slide 3: Barr Group Training Courses
Before we begin today’s webinar, I would just like to give you three URLs relating to our training courses. The first is for our Training Calendar for upcoming public courses. This is where one person or a couple of people from a company can join others from the industry at a hotel meeting room or our headquarters training room to learn about different topics. If you go to that first URL, you will find a list of upcoming courses, dates and locations. The second URL I would like to bring to your attention is for our Course Catalog. This is a list of all of the courses that we offer, including courses that are offered publicly but also courses that are only offered on-site at private companies. And of course any of the courses in the catalog can be brought to your company and can sometimes even be customized for your needs. Finally, we have an archive of all of our Past Webinars. After today’s webinar has been recorded and transcribed, you can find this one there as well. And we have a number of other interesting and valuable webinars there. So with that I turn the microphone over to today’s speaker.
Slide 4: Nigel Jones, Principal Engineer
Nigel Jones: I think that’s enough preliminaries. I’m Nigel Jones, today’s speaker. I’ve been designing embedded systems for over 30 years now and I’m equally comfortable doing hardware and firmware. I particularly enjoy the challenge of small low power systems.
For those of you that don’t know, I maintain a blog that is appropriately called “stack overflow” that you can find on embeddedgurus.com.
With that out of the way, let’s dive in.
Slide 5: Overview of Today’s Webinar
Stacks are ubiquitous in embedded systems. There is widespread recognition that a stack overflow is a really bad thing. Indeed, it’s fair to say that if a stack overflow occurs, the question isn’t, will my system malfunction, but rather, how much damage will be done to my property, reputation, et cetera.
Despite this, the vast majority of embedded systems: one, place the stacks in memory with little forethought as to what happens if they overflow; two, size the stacks using nothing more than a WAG; three, make no effort to detect an impending overflow; and four, take no remedial action if the stack does indeed overflow.
The objective of this webinar is to help you improve upon this sorry state of affairs. The approach I’m taking in this presentation is to introduce the material in roughly the order in which it becomes relevant during the product development cycle.
Slide 6: How Many Stacks?
Before we start talking about stack sizing and stack overflow, we need to recognize that many, and indeed most, embedded systems contain multiple stacks. If you don’t know, or are not sure, how many stacks your system uses, then you are doomed to failure.
Some common combinations are shown on this slide. Single stack. This is the classic approach where function return addresses, parameters, automatic variables and registers are all saved onto a single stack. Microchip’s 32-bit compiler, when used without an RTOS, has this architecture.
Two stacks, one for return addresses and one for everything else. This approach is used by IAR. IAR refers to these two stacks as the RSTACK and the CSTACK. You also effectively get this approach with microprocessors that have a hardware return stack. For example, on 8-bit PIC processors.
Two stacks, one for exception handling and one for everything else. This is often done with hardware that has a dedicated exception stack.
One stack per task. This is a common approach when using an RTOS. For example, Micrium’s µC/OS-III uses this approach. There are all sorts of variants on this basic approach. For example, a common variant is one stack per task plus a separate exception handling stack. RTEMS, for example, supports this.
In general, the techniques and analysis that I describe in this webinar need to be applied to every single stack in your system. As you’ll discover, this is a lot of work.
Slide 7: Growing Down?
Having determined how many stacks you have in your system, the next thing to know is in which direction each stack grows. If you don’t know which way your stacks grow, then you are once again pretty much doomed to failure.
There are two options, high address to low address or low address to high address. Be aware that there’s no guarantee that all of the stacks in your system grow in the same direction.
The picture here shows the stack growing down, that is, when something is pushed onto the stack, the stack pointer decreases. All of the examples in this webinar use this convention.
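If you are unsure which way a stack grows on your target, one rough way to check is to compare the address of a local variable in a caller with one in its callee. The sketch below is an illustration only and is not from the webinar: the function names are hypothetical, the `noinline` attribute assumes GCC or Clang, and the result is only a strong hint, since the C language itself says nothing about stack layout. Your processor and compiler documentation remain the authoritative source.

```c
#include <stdint.h>

/* Keep the callee in its own stack frame (GCC/Clang attribute; an
 * assumption, other tool chains need their own equivalent). */
#if defined(__GNUC__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

/* Compares the address of a local in this (deeper) frame with the
 * address of a local in the caller's frame. */
static NOINLINE int compare_with_caller(volatile char *caller_local)
{
    volatile char callee_local = 0;

    /* Convert to integers first: relational comparison of pointers to
     * unrelated objects is not well defined in C. */
    return ((uintptr_t)&callee_local < (uintptr_t)caller_local) ? -1 : +1;
}

/* Returns -1 if the stack appears to grow toward lower addresses,
 * +1 if it appears to grow toward higher addresses. */
int stack_growth_direction(void)
{
    volatile char caller_local = 0;

    return compare_with_caller(&caller_local);
}
```

Calling `stack_growth_direction()` once at start-up and logging the result is usually enough. In a system with several stacks, it would need to be run in a context that uses each stack of interest.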
Slide 8: Whither the Stack?
Having determined the number of stacks in your system and the direction in which they grow, the next step is to work out the best place for them in memory. You can of course just use the default placement that your tool chain vendor uses. However, like most things in embedded systems, not giving sufficient thought to an issue will usually result in, at best, a suboptimal design and, at worst, a completely disastrous design.
Here’s what I do, the basic premise is to place the stack or stacks in such a location that if overflow occurs, then damage is minimal. Let’s consider the simplest case of a system with a single stack that grows down, high address to low address. And it has a single heap that hopefully grows up, that is low address to high address.
For such a system, the optimal arrangement is as shown here, with the stack placed at the top of memory, the statically allocated data at the bottom of memory, and the heap placed immediately above the statically allocated data.
In this arrangement, if the stack overflows, then it will first overflow into unused memory and then, if things really get out of hand, into hopefully unused heap space. Note that interchanging the space for the heap and the statically allocated data will result in a suboptimal design, as a stack overflow would potentially corrupt data that is being used.
Incidentally, when I say the heap grows up, I recognize that I’m being a bit loose with my terminology. What I’m trying to convey is that when the first allocation is performed on the heap, it occurs at the lowest address, and subsequent allocations occur at the lowest possible address. Whether your heap works this way is highly implementation dependent. If it doesn’t work this way and you’d like it to, then there’s nothing stopping you from implementing your own design.
Of course there are also other schools of thought when it comes to stack placement. Another perfectly valid approach is to place a stack that grows down at the bottom of memory. The idea here is that if the stack overflows, you aren’t stomping on anything. Furthermore, the more advanced processors out there will almost certainly throw an exception if the stack pointer goes outside the bounds of physical memory. This gets us into the realm of what to do when your stack overflows, which I’ll be addressing shortly.
Slide 9: Multiple Stack Placement
What about when you have multiple stacks? Well, the same basic premise applies. Now many, or indeed most, RTOSes allow you to either statically or dynamically allocate stack space for tasks. Unless you’ll be creating and destroying tasks, I strongly recommend that you statically allocate the stacks for your tasks. I also strongly recommend that you use the appropriate pragmas to force the statically allocated stacks into a linker section such that you can control where the stacks get placed in memory. On the assumption that stacks grow from high to low, then a placement that looks like this is optimal.
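As a rough illustration of forcing statically allocated stacks into a named linker section, the sketch below uses the GCC/Clang `section` attribute rather than a vendor pragma. Everything named here is an assumption made for the example: the section name `.task_stacks`, the task names and the sizes are all hypothetical, and the actual placement in RAM is decided by your linker script, which is not shown.

```c
#include <stdint.h>

/* Hypothetical section name; it has to match a section that your
 * linker script places where you want the stacks to live. */
#if defined(__GNUC__) && defined(__ELF__)
#define TASK_STACK_ATTR __attribute__((section(".task_stacks"), aligned(8)))
#else
#define TASK_STACK_ATTR /* substitute your tool chain's pragma or attribute */
#endif

/* Hypothetical stack sizes, in 32-bit words. */
#define COMMS_TASK_STACK_WORDS 512u
#define UI_TASK_STACK_WORDS    256u

/* Statically allocated task stacks. The RTOS is handed these arrays
 * when the tasks are created, instead of allocating from the heap. */
TASK_STACK_ATTR uint32_t comms_task_stack[COMMS_TASK_STACK_WORDS];
TASK_STACK_ATTR uint32_t ui_task_stack[UI_TASK_STACK_WORDS];
```

The relative order of the arrays inside the section is something to confirm in the map file rather than assume. If the order matters, giving each stack its own section and ordering those sections in the linker script is the more explicit route.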
I should also mention that sometimes you want to place different RTOS stacks into different memory areas for performance reasons. For example, a task that has to run really fast might want to have its stack in zero-wait-state RAM, while other tasks are fine to use more pedestrian DRAM. If you do this, then you really need to think carefully about each memory space so as to minimize the potential damage.
Incidentally, if your RTOS vendor doesn’t give you any control over your stack placements, then you might take this as a hint to rethink your RTOS selection.
Slide 10: Arranging Task Stacks
The obvious problem with the previous slide is that it’s only the last task that has the benefit of being able to overflow into the unused heap area. So the question is, which task should be given this privileged position? Well, many RTOSes require either that all tasks use the same stack size or that the task stack size be one of a limited set such as 256, 512, 1024 or 2048 bytes.
As a result, many tasks will have a large surplus of stack space whilst others will be close to their limits. When ordering the task stacks in memory, place the task stack with the biggest margin at the top of memory and work down until the task with the smallest margin, and hence the one that is most likely to overflow, is immediately above the heap.
If you don’t know which task that is, then here is a good rule of thumb. It’s the one that handles TCP/IP communications or the one that handles the UI. To see the advantage of this approach, consider an allocation that reverses the stack allocation strategy such that Task 3 is at the highest memory location with Task 4 below it and so on.
Now if Task 3 overflows, it will trash the stack of Task 4. This in turn will cause Task 4 to malfunction, possibly causing it to overflow its stack such that it trashes the stack of Task 1 and so on. Thus, you will build a potentially more resilient system by ordering the stacks in the manner that I’ve advocated.
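The ordering rule can be sketched as a simple sort: compute each task's margin (allocated size minus estimated worst-case usage) and order the stacks from largest margin to smallest. This is only an illustration under assumed inputs; the structure, field names and the idea of doing this in code rather than by hand are not from the webinar.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    const char *name;    /* hypothetical task name */
    size_t allocated;    /* bytes allocated to the task's stack */
    size_t worst_case;   /* estimated worst-case usage in bytes */
} task_stack_t;

/* Spare bytes between the estimated worst case and the allocation. */
static size_t margin(const task_stack_t *t)
{
    return (t->allocated > t->worst_case) ? (t->allocated - t->worst_case) : 0u;
}

/* qsort comparator: largest margin sorts first. */
static int by_margin_descending(const void *a, const void *b)
{
    size_t ma = margin((const task_stack_t *)a);
    size_t mb = margin((const task_stack_t *)b);

    return (ma < mb) - (ma > mb);
}

/* After this call, tasks[0] is the stack to place highest in memory and
 * tasks[count - 1], the one most likely to overflow, is placed lowest,
 * next to the unused heap area (stacks assumed to grow down). */
void order_stacks_for_placement(task_stack_t *tasks, size_t count)
{
    qsort(tasks, count, sizeof tasks[0], by_margin_descending);
}
```

In practice this is a one-off calculation done when laying out the linker sections, so a spreadsheet does the same job. The point is only that the task with the least margin should be the one whose overflow lands in the least harmful place.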
Slide 11: Guestimating Your Initial Stack Sizes
So you’ve worked out how many stacks your project needs and where to place them in memory. So how do you go about assigning initial stack sizes? Well, this is an extraordinarily difficult thing to do. Here’s my approach.
One, never use the default values assigned by your tool chain. Trust me, they’re the wrong values. Next, as a minimum, allocate 10% of your available RAM to your stacks. If you are using sophisticated communication stacks or a complicated UI, then you should increase this to 20% or 25%. In dividing up the memory between stacks, use some common sense. A call stack such as IAR’s RSTACK will usually use a fraction of the memory used by a general-purpose stack such as IAR’s CSTACK. A 10 to one ratio isn’t a bad starting point.
Big tasks will normally need more stack space than small tasks. Tasks that use communication stacks such as TCP/IP typically need a lot of stack space, and tasks that perform formatted I/O also typically need a lot of stack space.
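As a worked example of the starting-point arithmetic, the sketch below takes the available RAM and a percentage, then splits the result 10 to one between a general-purpose stack and a return-address stack. The function and type names are made up for the illustration, and the numbers it produces are only initial guesses to be refined by the analysis discussed later.

```c
#include <stddef.h>

typedef struct {
    size_t general_bytes;  /* general-purpose stack, e.g. a CSTACK */
    size_t return_bytes;   /* return-address stack, e.g. an RSTACK */
} stack_budget_t;

/* percent: 10 as a minimum, 20 to 25 for heavy communications or UI work.
 * The 10:1 split between the two stacks is only a starting ratio. */
stack_budget_t initial_stack_budget(size_t ram_bytes, unsigned percent)
{
    stack_budget_t budget;
    size_t total = (ram_bytes * percent) / 100u;

    budget.return_bytes = total / 11u;                  /* 1 part in 11 */
    budget.general_bytes = total - budget.return_bytes; /* the other 10 parts */
    return budget;
}
```

For a part with 32 KB of RAM and the minimum 10%, this gives 3276 bytes in total: 2979 bytes for the general-purpose stack and 297 bytes for the return stack, before rounding to whatever alignment the tool chain requires.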
Let’s assume that you’ve got your “Hello World!” application compiled, linked and running on your target.
Slide 12: Monitoring Stack via Debugger
Before you run off and start writing all the necessary application code, it’s a great idea to set up your debugger to monitor your stacks. Now, different environments have different capabilities. Thus, I strongly recommend that you find out what your system can do and use it from day one. After all, there’s nothing as frustrating as hunting down a bug which turns out to be a stack overflow.
The slide here shows the IAR IDE configured to display the CSTACK usage. It’s quite handy, as the bar graph shows the percentage of the CSTACK that is being consumed, while the data shows the values that are being pushed onto the stack.
I find these tools very helpful as they can quickly show me if my initial guestimates of the required stack sizes are in the ballpark or not.
Slide 13: Stack Underflow
Before we go any further on this topic, I’d be remiss in not mentioning stack underflow. Stack underflow occurs when you pop more things off the stack than you put on it. If you are not doing any assembly language programming, then in theory, you should never have to worry about this. However, if you are doing assembly language programming, then there’s a very real possibility that you’ll be manipulating the stack pointer, which could lead to a stack underflow condition.
In the next few slides, I’ll be talking about stack monitoring. I’ll only be considering the stack overflow case. However, if you are using assembly language in your application, you should seriously consider making the stack monitoring code double-ended such that it can detect a stack underflow condition as well.
Slide 14: Runtime Stack Monitoring
It’s a great idea to build stack monitoring into your system at the start of the project rather than waiting till the end. The basic idea is that you first compute the worst case expected stack usage. I’ll be talking about how we do this in a few minutes.
Next, you add a safety margin which in practice is as big as you can make it with the memory you have available. You then arrange things such that if a stack pointer ever strays into the safety margin region, then you take remedial action. You can do the monitoring using software, hardware or a combination of the two.
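The check itself is simple address arithmetic. The sketch below assumes a stack that grows down and treats everything between the lowest address of the stack and the computed worst-case depth as the safety margin. The structure and function names are hypothetical, and how you obtain the current stack pointer is target specific and not shown.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uintptr_t lowest_addr;  /* lowest address of the stack region */
    uintptr_t size;         /* total size of the region in bytes */
    uintptr_t worst_case;   /* computed worst-case usage in bytes */
} stack_region_t;

/* For a stack that grows down from (lowest_addr + size): returns true
 * if sp has strayed below the computed worst-case depth, that is, into
 * the safety margin or beyond it. */
bool sp_in_safety_margin(const stack_region_t *s, uintptr_t sp)
{
    uintptr_t margin_top = s->lowest_addr + (s->size - s->worst_case);

    return sp < margin_top;
}
```

For a 1024-byte stack at 0x20000000 with a computed worst case of 768 bytes, the margin is the lowest 256 bytes, so any stack pointer below 0x20000100 would trigger remedial action.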
I’m a big fan of the hardware-based approach so I’ll talk about that first.
Slide 15: Hardware-Based Stack Monitoring
Many modern processors come with a memory management unit, an MMU, or a memory protection unit, an MPU. The MMU offers a lot more functionality, but MPUs are a lot easier to set up. Regardless, the basic idea is to configure the MMU or MPU such that an access to memory at the end of the stack is illegal. If this happens, indicating a stack overflow, then an exception is generated. Note that if the exception handler needs to use the same stack that just overflowed, then you’re screwed.
This technique can be extended to multiple stacks in a couple of ways. If the hardware supports multiple protected zones, then by all means, just configure the MPU once and you are done.
Alternatively, if you are in an RTOS environment, then as part of the task switch, one can reconfigure the MPU on the fly to protect the stack of the task to which you are switching. If this is of interest to you, then there’s a nice write-up of the technique on Micrium’s website. https://www.micrium.com/detecting-stack-overflows-part-2-of-2/
I’d also be remiss in not mentioning a neat trick for those of you using Cortex-Mx processors. You can set up the Cortex processor’s Data Watchpoint and Trace (DWT) unit to set a data watchpoint at the end of a stack. You then enable the debug monitor exception, which will be triggered when the programmed address is accessed. The DWT can be reprogrammed at runtime such that you can change it during a context switch. It also works without a debugger connected so that you can use it in a production build.
Again, I must point out that if the exception that is generated is relying on the integrity of the stack that has triggered the exception, then your wonderful stack monitoring system may be worthless. To put it another way, if you’re in a stack overflow condition, trust nothing. In particular, don’t rely on the stack, so that means no function calls.
Slide 16: Software-Based Stack Monitoring
If your microcontroller doesn’t have an MMU or MPU, or you find them too daunting to set up, then you can resort to software-based monitoring.
The basic idea is to fill the stacks with a known bit pattern, aka painting the stack, and then have code that monitors how much of that stack is being consumed by determining how much of the stack no longer contains the known bit pattern.
You’ll find plenty of papers on the web that describe the intimate details of the technique. So rather than describing the details, which are highly application dependent, I’ll address some of the issues.
This technique tends to suck up a lot of CPU cycles if you aren’t careful. One way to mitigate this is to keep track of the high water mark and to always start the search from there rather than searching the entire stack.
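A minimal sketch of stack painting with a remembered high water mark follows. It operates on an ordinary array standing in for a stack that grows down, so it can be tried on a host. The fill pattern, type and function names are assumptions for the example and are not taken from any particular RTOS.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_PAINT 0xC5C5C5C5u  /* arbitrary fill pattern (an assumption) */

typedef struct {
    uint32_t *base;       /* lowest address of the stack */
    size_t words;         /* stack size in 32-bit words */
    size_t lowest_used;   /* index of the lowest word seen overwritten;
                             equals words while the stack is untouched */
} painted_stack_t;

/* Fill the whole stack with the known pattern before it is used. */
void stack_paint(painted_stack_t *s, uint32_t *base, size_t words)
{
    s->base = base;
    s->words = words;
    s->lowest_used = words;
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_PAINT;
    }
}

/* Returns peak usage in words for a stack that grows down. The scan
 * resumes at the previous high water mark and walks toward lower
 * addresses, so only newly consumed words are examined. */
size_t stack_high_water(painted_stack_t *s)
{
    while (s->lowest_used > 0u &&
           s->base[s->lowest_used - 1u] != STACK_PAINT) {
        s->lowest_used--;
    }
    return s->words - s->lowest_used;
}
```

The price of the faster scan is that it stops at the first word that still holds the pattern. A large local buffer that was allocated but never written, or stacked data that happens to equal the pattern, can hide deeper usage, so a slower scan from the lowest address upward is the more conservative choice when CPU time allows.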
Another not so obvious problem with this approach is deciding in which context the stack monitoring code should run. For example, should it be contained within a task? If so, should it be the lowest priority task, the highest priority task, or something in between? You also need to consider the possibility that if a stack overflow occurs, it could disable the stack monitoring code.
For example, let’s say a stack overflow occurs that corrupts the TCBs of the RTOS. Could this result in the task that is supposedly monitoring the stacks never being run? If this issue makes you queasy, then an alternative is to run the stack monitoring code within an interrupt context. This works well right up until the stack monitoring interrupt needs to make use of the stack that has overflowed.
There are ways around this by way of writing code that does not use the stack. The bottom line though is that you need to write the code such that it’s guaranteed to work even in an overflow condition of a stack upon which it depends. It’s a nontrivial problem.
The third possibility, if you are using an RTOS, is to integrate the stack checking code into the task switching code. This is pretty neat as it allows you to check that the task you’re about to switch into does indeed have a sane stack. The way to do this varies between RTOS vendors. However, many of them offer a context switch hook, which is nothing more than a pointer to a function that you register with the RTOS.
You can place your stack checking code into the function that you register as the context switch hook. The downside to this approach is that it can have a significant impact on your context switching times. As a result, you may be tempted to conditionally compile in this functionality such that you can remove it in release mode.
Unless you absolutely have no choice, I strongly recommend that you don’t do this. I have multiple reasons for this. One, release code typically uses different optimization settings. Optimization changes the stack usage, so what works in debug mode may be wrong for release mode. Two, unless you are completely anal about doing a complete stack analysis for each release, removing this protection because you don’t think you need it is really foolish.
Three, finally, it’s really hard to get a worst case stack analysis right. Willingly removing a safety belt on the assumption that you’ll never crash is a silly thing to do in a car and it’s a silly thing to do here as well.
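To make the context switch hook idea concrete, here is a host-runnable sketch in which a hook checks a sentinel word at the overflow end of the incoming task's stack. The RTOS is simulated: `register_switch_hook` and `simulate_context_switch` are hypothetical stand-ins for whatever registration API and scheduler your RTOS actually provides, and setting a flag stands in for the real remedial action.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SENTINEL 0xDEADBEEFu  /* arbitrary marker value (an assumption) */

typedef struct {
    uint32_t *stack_base;  /* lowest address; holds the sentinel word */
    size_t stack_words;    /* stack size in 32-bit words */
} task_t;

typedef void (*switch_hook_t)(const task_t *next);

static switch_hook_t g_switch_hook;           /* stands in for the RTOS hook slot */
static volatile bool g_stack_fault_detected;  /* a real system would log and reset */

/* Plant the sentinel at the overflow end of a stack that grows down. */
void task_stack_init(task_t *t, uint32_t *stack, size_t words)
{
    t->stack_base = stack;
    t->stack_words = words;
    stack[0] = STACK_SENTINEL;
}

/* The hook: kept short because it runs on every context switch. */
void check_stack_hook(const task_t *next)
{
    if (next->stack_base[0] != STACK_SENTINEL) {
        g_stack_fault_detected = true;
    }
}

void register_switch_hook(switch_hook_t hook)
{
    g_switch_hook = hook;
}

/* What a hypothetical scheduler does just before switching to next. */
void simulate_context_switch(const task_t *next)
{
    if (g_switch_hook != NULL) {
        g_switch_hook(next);
    }
}
```

A single-word check like this costs only a load and a compare per switch, which is one way of keeping the overhead low enough that the check can stay in the release build.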
Finally, you’ll see proponents of this technique extolling an additional benefit, namely, that it allows you to determine the actual stack usage of your product. The basic idea is to let your product run for a certain amount of time and then examine the high water marks for each of your stacks. You then assume that this represents the maximum stack usage, so all you need to do is add a safety factor and you’re done.
In my humble opinion, this is a complete fallacy as I will demonstrate shortly.
Slide 17: I’ve Overflowed a Stack – Now What?
So what do you do if you detect either an impending stack overflow or an actual overflow? I think it depends on whether you are actively debugging code or whether it’s release code.
If you’re in debug mode and running on a debugger, then you simply go into an infinite loop with interrupts disabled. Note that this really needs to be an inline piece of code as you cannot rely on being able to make a function call.
If you’re in release mode, running on a real live system, then life is really bad. You have no option other than to record the fault and reset the micro.
As for the debug case, don’t forget that you are now dealing with a damaged system, and so assuming that function calls will work is foolhardy. Instead, do everything inline, including the microprocessor reset.
Slide 18: An Ounce of Prevention…
Up until now, I’ve been talking about how to mitigate the effects of a stack overflow. While these techniques are all well and good, shouldn’t we be able to guarantee that a stack overflow will not occur? The answer is absolutely. It is possible to determine the worst-case stack usage of a system. Unfortunately, it’s not easy and it’s prone to error. Regardless, it needs to be done.
As and when you go through this exercise, you will discover that the ratio of the absolute worst-case stack usage to the typical stack usage is enormous, and it goes a long way to explaining why an embedded system that has worked for years with no issues suddenly fails.
So with this in mind, let’s talk about how you set about computing the worst case stack usage. To do this, I’ll break it down into steps.
Slide 19: Stack Usage of a Single Function
In order to compute our stack size, we have to at the very least be able to compute the stack usage of a function. However, this is not as easy as it might appear. Consider the example code fragment here. How much stack space does it take? Well, the answer depends on the size of the address to which the function needs to return, the size of a pointer, the size of an int, how many of the automatic variables are stored in registers, how the compiler handles variables of limited scope, and so on.
Furthermore, the stack usage of this function will most certainly depend upon the compiler optimization settings and, if applicable, will depend upon the tool you are using and also the memory model.
Thus, the bottom line is that the only way to know what the stack usage of a function is, is to have the linker tell us. Fortunately, most linkers will do this for us today. The bad news is that you’ll typically have to hunt through the linker documentation to work out how to get this report. Regardless, if you are serious about doing a stack analysis, you absolutely have to work out how to get this information from your linker.
Slide 20: Stack Usage of a Recursive Function
The above is all well and good, but what about a recursive function such as the one shown here? In this case, the total stack usage is usually the stack usage of the standalone function multiplied by the maximum recursion depth, and this is relatively straightforward.
However, when you have a cycle, A calls B, B calls C, C calls A, then the analysis becomes a nightmare. Indeed, it’s so hard that I know of no stack tool that can handle recursion without considerable help. It’s for this reason that I strongly discourage people from using recursion in embedded systems. It’s also the reason that all safety standards I’m aware of forbid the use of recursion.
Slide 21: Stack Usage of a Call Tree
Having worked out how to get the stack usage for a function, the next step is to calculate the stack usage of a single call branch.
The slide here is showing part of the call tree for the Microchip TCP/IP stack. As you can see, it’s quite complicated.
At this stage, all we are interested in is computing the stack consumption along a single branch, for example, _DHCPSend to TCP_UDP_PutIsReady into _UDPSOCKETDcpt. To do this, we sum the stack usage of each of the functions that make up the call branch.
There are some exceptions to this, particularly if one of the called functions is inlined. However, typically these sorts of optimizations result in a reduction in stack size, and so we don’t care about them.
Having computed the stack usage of one call branch, you repeat the exercise for all of the other branches. Then the stack usage for a call tree such as the one that starts with _DHCPSend is simply the maximum of all these branches. Given that the linker knows what function calls what, it’s normal for the linker or related tools to be able to tell you the stack usage of a call tree such as the one here.
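The branch-sum and tree-maximum rule described above can be sketched as a small host-side calculation: the worst case for a function is its own frame plus the largest worst case among its callees. The data structure and names are hypothetical, the per-function frame sizes are assumed to come from your linker or compiler report, and the sketch assumes the call graph has no cycles.

```c
#include <stddef.h>

#define MAX_CALLEES 4

typedef struct fn_node {
    const char *name;                            /* hypothetical function name */
    size_t frame_bytes;                          /* stack used by this function alone */
    const struct fn_node *callees[MAX_CALLEES];  /* NULL-terminated unless full */
} fn_node_t;

/* Worst-case stack usage of the call tree rooted at fn: the function's
 * own frame plus the deepest of its callees' trees. Assumes the call
 * graph is acyclic, i.e. the analyzed code contains no recursion. */
size_t worst_case_stack(const fn_node_t *fn)
{
    size_t deepest = 0u;

    for (size_t i = 0; i < MAX_CALLEES && fn->callees[i] != NULL; i++) {
        size_t depth = worst_case_stack(fn->callees[i]);

        if (depth > deepest) {
            deepest = depth;
        }
    }
    return fn->frame_bytes + deepest;
}
```

This is a tool that runs on a development machine, so the fact that it is itself recursive is not a concern for the target. It simply mirrors what a linker's call graph report computes for you.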
Slide 22: Handling Indirect Function Calls
I’m a huge fan of using pointers to functions. Unfortunately, a lot of the automated stack analysis tools are unable to handle indirect function calls, i.e., calls made via a function pointer. To get around this, you can insert dummy calls as shown here. Then to perform a call tree analysis, you simply recompile with DO_STACK_ANALYSIS defined. At this point, the linker no longer sees it as an indirect call but rather as a whole series of direct calls, which it can now easily resolve into a call tree.
I hope it goes without saying that you should never ship a product with DO_STACK_ANALYSIS defined.
Thus with this technique, you should be able to get the linker to tell you the stack usage of a call branch even if it includes indirect calls.
Note that failure to do this will often give you a very misleading picture from a stack analysis tool. I’ve experienced cases where my computed worst-case stack size jumped by over 1K once I added in the indirect calls using this technique.
I’d also be remiss in not pointing out that this technique produces a maintenance nightmare, in that you need to keep the two blocks in synchronization. Unfortunately, I don’t know of any way around this.
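The slide's code is not reproduced in this transcript, so the following is only a guess at the general shape of the dummy-call technique. The macro name `DO_STACK_ANALYSIS` is inferred from the spoken text, and the handler names and dispatch function are invented for the illustration.

```c
#include <stddef.h>

typedef void (*handler_t)(void);

static int g_last_handler;  /* lets the example observe which handler ran */

static void handle_start(void) { g_last_handler = 1; }
static void handle_stop(void)  { g_last_handler = 2; }

/* Block 1: the table used for the real, indirect dispatch. */
static const handler_t handlers[] = { handle_start, handle_stop };
#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

void dispatch(size_t index)
{
    if (index >= HANDLER_COUNT) {
        return;
    }
#ifdef DO_STACK_ANALYSIS
    /* Block 2: analysis build only. Direct calls to every possible
     * target so the tool can resolve the call tree. This list has to
     * be kept in step with handlers[] by hand. Never ship this build. */
    handle_start();
    handle_stop();
#else
    handlers[index]();
#endif
}
```

In a normal build `dispatch(1)` runs only `handle_stop`. In an analysis build every handler appears as a direct callee of `dispatch`, so the tool reports the worst case across all of them.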
Many stack analysis tools seem to struggle with calls to assembly language routines. If your tool falls into this category, then you can use a similar technique to the last slide. The first step is to determine the stack usage of the assembly language routines. You can either manually analyze the code or, if you’re sure that you’re executing the worst-case path, simply use the debugger to see what the stack usage is.
Having determined what the stack usage is, you then add dummy code as shown here to emulate the stack usage. As before, whatever you do, don’t release code with DO_STACK_ANALYSIS defined.
Slide 24: Stack Usage of an ISR
Interrupt service routines can be an absolute nightmare to analyze. In general, the stack usage of an ISR, such as the one here, is the stack space required to save the processor context, plus the stack space for any automatic variables, plus the stack space of the worst-case call tree from within the ISR.
The trouble is, it’s really hard to tell how many registers will get stacked within an ISR. Furthermore, sometimes a linker will only tell you the stack usage of the handler itself and exclude the space required to stack the context registers.
In situations like this, there’s no substitute for looking at a disassembly listing of the ISR that shows all the registers being saved to determine the total stack usage. This is shown on the right-hand side of the slide, where the registers being pushed onto the stack are clearly visible.
Slide 25: Nested Interrupts
Interrupts are said to nest if an ISR that is currently running can be interrupted by another interrupt. This is often done, either accidentally or intentionally, when an ISR takes so long to run that it causes other interrupts to be missed.
From a stack analysis perspective, this is a similar problem to recursion. If the nesting is bounded, then the stack usage is simply the sum of the stack usage of the interrupts that get nested.
If the nesting is not bounded, then you have a nightmare on your hands and you can easily overflow the stack. For this reason, interrupt nesting should only be done with extreme caution and a complete understanding of what the worst case is.
Slide 26: Stack Usage of a Task
So what’s the stack usage of a task? Well, it’s the stack usage of the worst-case call tree in the task, plus the worst-case stack usage of any interrupts that can fire during this thread (assuming of course that the interrupt’s stack usage is added to the same stack as the thread), plus the stack usage associated with the task switch (again, assuming that the registers are, in fact, saved on the same stack as the thread).
Slide 27: Stack Usage of an Application
So what’s the stack usage of an application? Well, if each task has its own stack, then you don’t need to perform an application stack analysis. However, if all tasks use the same stack, such as when using OSEK, then you need to perform an application analysis. Typically, this involves combining the stack usage of each task. This can lead to some startling results as I will now demonstrate.
Slide 28: Stack Painting and Guessing
I think I’ve demonstrated that computing your stack size is hard work, but it can be done. But isn’t it a lot easier to simply use the stack painting technique to determine the typical worst-case stack size and then add a safety margin? Well, yes, it is. However, unless you really know what you are doing, it’s very easy to shoot yourself in the foot.
To illustrate what I mean, consider a system that is running a single-stack preemptive operating system. When the system starts, the task which is the lowest priority starts running and spawns all the other tasks.
At a certain point, one of these spawned higher-priority tasks will preempt the lowest-priority task. At this point, the stack will be at a level determined by where the lowest-priority task is in its execution. All the registers necessary to preserve the context of the lowest-priority task will then be pushed onto the stack, and the preempting task starts running.
It consumes stack space according to where it is in its execution. An interrupt may then occur, which causes the processor’s context to be pushed onto the stack. The ISR then runs and consumes more stack space and ultimately returns to the running task, which runs to completion and then returns to the task it preempted.
This cycle continues indefinitely with higher priority tasks preempting lower priority tasks with interrupts preempting even the highest priority tasks.
If you were to paint the stack and monitor it over, say, a 24-hour period, then most people would assume that the high water mark would be representative of the maximum stack usage, and would add say a 20% margin and declare victory.
Doing so would be so hard. To see why, let’s look at what the worst case stack usage actually is. To do this, let’s assume that we have 10 tasks numbered one to 10 and one is the lowest priority and 10 is the highest priority. Assume the lowest priority task is running and when it’s at its point of greater stack usage, it is preempted by Task 2. A context which is performed with Task 1’s registers being saved on the stack, thus the space taken up by the context switch needs to be considered.
Task 2 then runs into its point of greater stack usage where it is preempted by Task 3. Task 2’s registers are push on to the stack and Task 3 starts running. The above is repeated until Task 10 is running. When it reaches its point of greater stack usage, the interrupt that uses the most stack space then occurs. Thus, the worst case stack usage is the sum of the worst case stack usage for each task plus the worst case stack usage for the interrupts, plus the stack usage associated with the context switches.
Now, what is the probability that, one, each task is preempted at its point of maximum stack usage? And, two, that the tasks are preempted in strict ascending priority order such that all 10 tasks are sitting on the stack? I would say that the probability is very, very low. However, it is not zero. If you are making millions of cars, for example, each of which is being driven for many hundreds of hours a year, then sooner or later one of them will hit the worst case stack usage. If you foolishly just paint the stack and assume the high water mark is close to the actual maximum, then you are all but guaranteeing disaster.
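For reference, stack painting and the high water mark measurement can be sketched in C as below. It is a minimal illustration that operates on an ordinary byte array standing in for the stack region. The names stack_paint, stack_high_water_bytes and STACK_PAINT_BYTE are hypothetical, the fill pattern is arbitrary, and the stack is assumed to grow downward from the top of the region, so the untouched paint sits at the lowest addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_PAINT_BYTE 0xA5u   /* arbitrary fill pattern (assumption) */

/* Fill the whole stack region with the paint pattern before first use. */
void stack_paint(uint8_t *stack_base, size_t size)
{
    assert(stack_base != NULL);
    memset(stack_base, STACK_PAINT_BYTE, size);
}

/*
 * Return the number of bytes that have been overwritten so far.
 * Assumes a descending stack: usage starts at the highest address, so the
 * paint that is still intact is found by scanning up from the lowest address.
 */
size_t stack_high_water_bytes(const uint8_t *stack_base, size_t size)
{
    assert(stack_base != NULL);

    size_t untouched = 0u;
    while ((untouched < size) && (stack_base[untouched] == STACK_PAINT_BYTE)) {
        untouched++;
    }
    return size - untouched;
}
```

The result only reflects what has happened since painting. It says nothing about the worst case preemption sequence described above, and it can under-report slightly if the deepest stacked data happens to equal the paint pattern.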
Question & Answer
Q: What effect does the optimizer have on stack size?
The answer is that it can have an enormous impact on stack usage. What makes it hard is that whereas some optimizations decrease the stack size, there are others that may increase it, particularly when size optimization is used. Thus, you should always perform your stack analysis on optimized code.
Q: Do you know of any good stack analysis tools?
Well, I've tried to show that performing an accurate stack analysis requires information that is highly tool chain dependent, hence you are really dependent on your tool chain vendor. I know that IAR and Keil both provide stack analysis tools in their tool sets. With IAR, it's usually invoked from the Advanced linker settings tab of the IDE. With Keil, you should look at the callgraph linker option. Green Hills has a tool called gstack. That is okay. If you're using GCC, then you should investigate use of the -fstack-usage option. It will get you some of the information, such as the stack usage of each function, but it's not as good as the offerings from IAR, etc.
You can perform a stack analysis using PC-lint. However, PC-lint can't know about details of register usage, and so its responses tend to be conservative.
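As a small illustration of the GCC route, the C file below can be compiled with -fstack-usage to see what the option reports. The file name demo.c and the function checksum_with_scratch are hypothetical, chosen only to give the compiler a function with a local buffer. No figures are quoted here because the reported size depends on the target, compiler version and optimization level.

```c
/*
 * Hypothetical file demo.c. Compile with, for example:
 *
 *     gcc -c -fstack-usage demo.c
 *
 * GCC then writes demo.su next to the object file. Each line names a
 * function, gives its stack usage in bytes, and adds a qualifier
 * (static, dynamic or bounded).
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

uint32_t checksum_with_scratch(const uint8_t *data, size_t len)
{
    uint8_t scratch[64];        /* local buffer: contributes to the frame */
    uint32_t sum = 0u;

    assert(data != NULL);

    if (len > sizeof scratch) {
        len = sizeof scratch;   /* sketch only: clamp to the buffer size */
    }
    memcpy(scratch, data, len);

    for (size_t i = 0u; i < len; i++) {
        sum += scratch[i];
    }
    return sum;
}
```

The .su file covers individual functions only. Combining those figures along the call tree, and adding interrupts and context switches, is still up to you.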
Q: How often do I do a stack analysis?
Well, I tend to do some rough and ready ones early on during development. Then prior to the first release, I perform a full-blown analysis. Thereafter, it depends on the changes. If changes have a minimal impact on stack usage, then I likely won't redo it. Conversely, if major functionality is added to the product, then I have to bite the bullet and do the analysis all over again.
Q: What about the technique of just allocating a huge amount of memory for stack and not worrying about it?
Yeah, this is what I call the PC programming approach, since Visual Studio will automatically increase an application's stack size if it detects that it is running low on space.
Well, I suppose this is okay if you're running on a system with gigabytes of RAM. Still, today most embedded systems are resource constrained. Also, wasting memory because stack analysis is too hard just goes against the grain for me.
Q: Can you help me do a stack analysis?
Well, Barr Group does a lot of embedded consulting work, which covers a wide range of topics including stack analysis. So if you call the business development people, I'm sure they'll be happy to help lighten your bank account.
Q: Do you have any tips for minimizing stack usage?
Well, that's a great question. The short answer is yes. The complete answer would take another webinar. However, here are a few things to look out for. Formatted I/O such as sprintf often uses a lot of stack space. Making function calls within ISRs forces the compiler to stack a lot of registers, so avoid this if at all possible. Note that if this is impossible, you can possibly get around it by the use of a software interrupt. I wrote an article on this topic a number of years ago called, I think, "Minimize Interrupt Service Routine Overhead". So do a search on the web and you can find some useful techniques. Also bear in mind that to reduce stack usage, you only have to reduce the stack usage on the call tree that has the worst case usage. Very often, you can rearrange this call tree so as to significantly reduce the worst case. Another simple technique that can pay big dividends is to look at the data type sizes you're using. For example, declaring an array of 32-bit integers on the stack when an array of 8-bit integers would do the job can potentially cost you a lot of stack space. Ditto for doubles versus floats.
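The data type point can be made concrete with a short C sketch. The names NUM_SAMPLES, wide_buffer_bytes and narrow_buffer_bytes are hypothetical, and the element count of 64 is picked only for illustration. Each function declares a local array and reports how many bytes that array occupies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_SAMPLES 64u   /* illustrative element count (assumption) */

/* Local array of 32-bit integers: 4 bytes per element. */
size_t wide_buffer_bytes(void)
{
    uint32_t samples[NUM_SAMPLES];
    return sizeof samples;
}

/* Same element count with 8-bit integers: 1 byte per element. */
size_t narrow_buffer_bytes(void)
{
    uint8_t samples[NUM_SAMPLES];
    return sizeof samples;
}
```

For 64 elements the 32-bit array takes 256 bytes against 64 bytes for the 8-bit one, so the narrower type saves 192 bytes in that one function, provided the values actually fit in 8 bits.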
Well, apparently, I’m out of time. So thank you for attending. I hope you found it useful.