Learn practical and easy-to-apply process improvements that even the smallest design teams can use to make firmware easier to code, debug and test, with a tools cost of less than $600.
Susan: Welcome everyone. Thank you for joining us for the webinar on “Inexpensive Firmware Process Improvements for Small Teams.”
Slide #7 - Process Investments During Coding
What do we mean when we talk about process? In simplistic terms, "process" is all the different stages from start to finish that an engineering team follows when they are producing a product. If you are following a more traditional process, you'll start with requirements (hopefully, clear requirements), move to architecture and design, from there to the coding, or code construction, stages, and from there to test and eventual release. Depending on the size of your team and whether the industry you are working in is regulated, you could be using a very lightweight process or a very heavy-duty formal process. You might be using an iterative, more agile process where you cycle through all the different stages until eventual release. Regardless of the process you are using, there is one stage that cannot be skipped in any development process, and that is the coding stage.
Slide #8 - Why The Focus on Code?
In the webinar today, we're going to focus on the code construction stages of the software development process. Depending on the company that you are working for, you may, as an engineer, have no input into the requirements. You may be handed a spec and told to make it happen. If you are lucky, there will be a dialog with the engineering team when requirements are being gathered. And for some engineers, many engineers actually, coding is fun. So sometimes, we skip right by the architecture and design portion of a project and dive straight into code. Really all this is saying, I am not judging it … I mean, I am judging it actually, but I am acknowledging that coding has to happen for any team. So any improvements you make in this stage are going to help.
Slide #9 - Improvements For Every Team
Any team that does skip by architecture and design and doesn't get good requirements and clear requirements, and doesn't focus on test, absolutely this can spell disaster for a project. But for today, we're just going to ignore all of that. There are so many improvements that can be made in these other stages. For today, we're just going to act like those don't exist and focus on improvements in the coding stage. Really because at that level, every team small and large, regulated or not, you are going to code. So really this can touch every team, an improvement here can touch every team.
Slide #10 - The Big 4
So what are these techniques we can use to improve the quality of our code while we are coding? We are going to talk about four today. And each of these four has been around forever, for decades. The first is static analysis. These are tools that can be run on source or object code and look for potential problems with the code prior to executing that code on the target. We have asserts, which many of you have seen as an assert macro in your development environment. Asserts can be used to implement the Design by Contract philosophy. We have coding standards, just straight up simple coding standards. And code reviews, having someone else look at your code and look for potential problems. It is a surprisingly effective way of detecting things that, as the designer and coder yourself, are just really hard to see.
Slide #11 - Why These 4?
Why are we going to talk about these four techniques specifically as opposed to any of the other many techniques out there? Each one of the four that we are going to talk about can be adopted easily and inexpensively. So they are ideally suited for small teams with a limited budget, but they can also be used by large teams who are flush in cash. It's a great fit for any team.
They are architecture agnostic. This just means that no matter what your architecture is, you can use these techniques. You could be running a very simple foreground and background loop in your design, you could be running a massively multitasking system using an RTOS, you could be using UML state machines, event-driven behavior. Each of these techniques can be adopted into any existing process.
And finally, these techniques work. Don't underestimate the power of the combination of these four when it comes to producing robust high quality code.
Slide #12 - Why Oh Why?
If these techniques have been around for decades and they have been proven to be effective, why are we talking about them? Why are we still talking about them? The Barr Group's 2017 Embedded Systems Safety & Security Survey exposed how many engineers don't take advantage of these techniques. We didn't ask about asserts, but we did ask about code review, static analysis, and coding standards. Nearly half of the people that responded to the survey are not using static analysis of any kind. Thirty six percent don't perform code reviews and 34% don't use coding standards.
A significant number of the people who responded to this survey are working in industries that are safety-critical. A failure in their product could cause serious harm or death to the consumer. So it's dismaying that so many engineers choose not to take advantage of these techniques. And the goal of the webinar today, for people who are not using these techniques for whatever reason, is to try to convince you that they actually are easy to do, can be inexpensive to do, and are effective.
Slide #13 - Time Saving Techniques
I know that for some of you listening today, the idea of adding four new techniques into your already busy schedules just sounds ridiculous. But each of the techniques we are going to talk about today are intended to either keep bugs out of your code or to catch those bugs early in the process.
The reason for this is that bugs are expensive. They are expensive in time, energy, money, and frustration levels. So the earlier you catch a bug, the easier and cheaper it is to fix. The static analysis tools and the other techniques may potentially be tedious to install and configure. It's not that they’re hard to do, but it's not fun to do. I won't lie to you. But once things are set up for your environment, it's done. There is very little maintenance involved with these techniques. The only exception might be code reviews, which are going to take time from the people who are reviewing. But all of them are intended to eventually save you time. Once you get over the hump of setting them up and adopting them, you save time from that point forward for the entire duration of your project. And that adds up to a fairly significant amount of time.
Slide #14 - Static Analysis
So let's talk about one of my favorite secret weapons, and that's static analysis tools. These are tools that analyze your code, generally your source code, statically before it's executed on the target. And a great time to find a bug or a potential problem in your code is before you've taken the time to build it, download it, and run it on your target. What these tools look for is anything suspicious:
Suspicious use of the programming language being used, code that can be vulnerable, non-portable code if that's important to you.
It can identify unreliable programming practices.
It can look at individual functions or methods and analyze your complexity, the cyclomatic complexity.
High complexity functions tend to be harder to test and maintain.
If you are running it across your entire code base, it can identify unused constructs and redundant code. So you can clean those out of your code base that helps with test and maintenance.
And they do more.
These tools go beyond what a compiler needs or wants to do. They will identify things that your compiler will be completely happy with and that are also completely legal to do in the programming language. So if you are like most embedded software engineers, you are using C or C++. And while these languages are a great fit for the embedded environment, as we all know, they allow you to shoot yourself in the foot. So having a static analysis tool to help identify the more vulnerable parts of these languages is really helpful.
Slide #15 - Static Analysis In Action (1)
So take a quick look at this code and see if you see any issues with it. This idea of a structure overlay on memory-mapped I/O is really common for embedded software folks. And there are a couple of issues with this little snippet.
Slide #16 - Static Analysis In Action (1), Issues
Were you able to see the two issues that lint identified when it was run on this code? The loss of information – the 8 bits to 7 bits – and the too few initializers for aggregate could potentially cause you great grief if you downloaded and ran this code on the target.
And I am speaking from experience because I had a quite large memory-mapped I/O structure that I was statically initializing. This was years ago. And I left off one initialization value. I downloaded the code to the target and ran it and debugged it. It was really difficult to debug because that one value was random. It just depended on what value was in memory at that time. And I had not taken the time to configure and set up lint before I ran my code. I did take the time to set up lint after this, and it immediately found the problem. It was the too few initializers for aggregate. And I was dismayed that I had spent so much time in the lab trying to debug this when lint would have just told me what the issue was.
And it is the last time that I did not configure lint for my environment before I ran my code on a product. The compiler was completely fine with it because it's absolutely legal to do this in C or C++. Once you have those experiences, you realize how much time you can save even with these small issues. It's pretty tough to go back to not using these tools.
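The slide code is not reproduced in this transcript, so here is a minimal sketch of the two kinds of issue described above. The register layout, the names `timer_regs_t` and `clock_cfg_t`, and the values are all hypothetical. Note that standard C zero-fills members left out of an initializer list, so the exact symptom you see will depend on your toolchain and on how the structure reaches the hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timer peripheral with three 8-bit registers. On a target this
 * struct would typically be overlaid on a fixed address; here it is an
 * ordinary object so the sketch can run on a host. */
typedef struct {
    uint8_t control;
    uint8_t prescale;
    uint8_t reload;
} timer_regs_t;

/* Issue 1: only two of the three members are initialized. Standard C accepts
 * this and zero-fills the rest, which may not be what the hardware needs. */
static const timer_regs_t timer_defaults = { 0x01u, 0x08u };

/* Hypothetical configuration register with a 7-bit divider field. */
typedef struct {
    unsigned int enable  : 1;
    unsigned int divider : 7;
} clock_cfg_t;

/* Issue 2: an 8-bit value stored in a 7-bit field silently loses its top bit. */
static void set_divider(clock_cfg_t *cfg, uint8_t value)
{
    cfg->divider = value;
}
```

A compiler with default settings will typically accept both constructs without comment, which is exactly why a lint-like tool earns its keep here.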
Slide #17 - Static Analysis In Action (2)
Do you see any issues with the way that we're accessing the array in this code snippet?
Slide #18 - Static Analysis In Action (2), Issues
Lint has detected a potentially serious problem with this code, and that is an out-of-bounds pointer access: we are accessing one element beyond the array that we've allocated. Out-of-bounds access in C and C++ is a really serious problem and the cause of many very difficult bugs, because when you're accessing out-of-bounds, you have the potential to corrupt memory for any subsequent access to that memory location. Some of the worst bugs are when that corruption is difficult to detect, when the corrupted data looks like legal values but is actually incorrect. Out-of-bounds access is something that you should guard against, and using a static analysis tool is a fantastic way to detect these things before you've had to debug them.
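Again the slide itself is not in the transcript, so the following is only a sketch of the classic off-by-one pattern that produces this kind of warning. The array name and size are made up, and the buggy loop is kept in a comment so the example stays safe to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_COUNT 8u

static uint16_t samples[SAMPLE_COUNT];

/* Buggy version, shown only as a comment:
 *
 *     for (size_t i = 0; i <= SAMPLE_COUNT; ++i) { samples[i] = 0u; }
 *
 * The "<=" writes samples[8], one element past the end of the array, and
 * quietly overwrites whatever happens to live there. */

/* Corrected version: "<" keeps every access inside the array. */
static void clear_samples(void)
{
    for (size_t i = 0; i < SAMPLE_COUNT; ++i) {
        samples[i] = 0u;
    }
}
```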
Slide #19 - Cyclomatic Complexity
Tools that look at source code for potential misuses of the programming language, I call those lint-like tools. Lint was one of the original static analysis tools, created back in the '70s, I believe, for the Unix operating system. There are also tools that can look at cyclomatic complexity in code, and this is just a metric that counts the number of independent paths through a program's source code. This was created by Thomas McCabe back in the '70s. Even functions and methods that are lint free, if they're highly complex, are really tough to maintain and test. So keeping your eye on the complexity of these functions is really important to do as time goes by.
Slide #20 - Static Analysis In Action (4)
If you've been coding for any length of time, maybe particularly in the embedded space, I know you have seen code similar to what you're looking at now. These are functions that have a switch statement with a massive number of case statements. Within each case statement you might have another switch statement, many if / if else. These functions tend to have a life of their own and kind of grow unchecked in certain programs. So we know when we look at this code that it's not great code, but exactly what does that mean and why? It’s one thing to say it's not great, but to give specific details about why it's not great is something else.
Slide #21 - Static Analysis In Action (4), Issues
I've run a static analysis tool on this code that looks for, among other things, the cyclomatic complexity, which you see here highlighted in red. The cyclomatic complexity of this function, and it's not even a large function, is 29. So what this means from a testing standpoint is that, if you're trying to have a high degree of code coverage and exercise every independent path, you could need as many as 29 separate test cases to achieve that. Beyond the cyclomatic complexity, the tool was also looking at other things that you see here, such as the comment content and the number of function return points. All of these metrics are an indication of how tough this code is going to be to understand and maintain, to test and to review. And these are always good things to keep your eye on as you're developing. Otherwise you will have a function that is just going to turn into a monster before you even know it. So keeping your eye on these things by running a static analysis tool is a fantastic thing to do, particularly the cyclomatic complexity. That can turn into a big problem for your code as time goes by.
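To make the metric concrete, here is a much smaller, hypothetical function with its decision points marked. Under the common counting rule (decision points plus one), its cyclomatic complexity would be 5; individual tools may count slightly differently.

```c
#include <assert.h>

typedef enum { CMD_START, CMD_STOP, CMD_RESET, CMD_UNKNOWN } command_t;

/* Four decision points (three case labels and one if), so the cyclomatic
 * complexity under the usual rule is 4 + 1 = 5. */
int handle_command(command_t cmd, int armed)
{
    int result = 0;

    switch (cmd) {
    case CMD_START:          /* decision 1 */
        if (armed) {         /* decision 2 */
            result = 1;
        }
        break;
    case CMD_STOP:           /* decision 3 */
        result = 2;
        break;
    case CMD_RESET:          /* decision 4 */
        result = 3;
        break;
    default:
        break;
    }
    return result;
}
```

Five independent paths means five test cases to exercise each of them once: start while armed, start while not armed, stop, reset, and the default. Scale that up to a complexity of 29 and the testing cost becomes obvious.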
Slide #22 - Software Assertions (runtime)
If static analysis tools are my favorite secret weapon, software assertions run a really close second. If you're not familiar with runtime software assertions, they are statements that you add to your code that are always expected to evaluate to true. So, simple enough. The concept is really straightforward. Oftentimes these statements are implemented using the assert macro that's likely to be provided by your development environment, but they can also be custom statements defined by you. If you look at the example, the square root function, this function expects its input to be a positive real number, and we have added an assert statement at the top of this function to guarantee that this is true. So if this assert statement succeeds, great, smooth sailing. The subsequent code does not need to do any further checks for the number being positive.
If it fails, and this is one of the most powerful design benefits of using asserts, it never returns, so execution control never returns to the subsequent code in this function. If you embrace this aspect of asserts, the cleanliness it brings to your code and the cleanliness it brings to your design is profound. So instead of writing multiple lines of error checking code, possibly in more than one location in a function, you can have a single assert that guarantees a particular assumption. And that's it, it's in one location. So notice something here. Static analysis on this function would be completely fine with the input parameter being negative because it's a double. So the assert here is a fantastic complement to the static analysis tools because it can enforce assumptions about values that are otherwise completely legal in the language.
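The square root example from the slide is not in the transcript, so this is only a sketch of the idea. The function name `checked_sqrt` is made up, and Newton's method is swapped in purely so the example does not depend on a math library. The assert here also allows zero, which is a small assumption on my part.

```c
#include <assert.h>

/* Generous iteration bound so the loop always terminates for finite inputs. */
#define MAX_ITERATIONS 1100

double checked_sqrt(double x)
{
    assert(x >= 0.0);   /* contract: callers never pass a negative number */

    /* From here on, no further checks for a negative input are needed. */
    if (x == 0.0) {
        return 0.0;
    }

    double guess = (x > 1.0) ? x : 1.0;   /* start at or above the true root */
    for (int i = 0; i < MAX_ITERATIONS; ++i) {
        double next = 0.5 * (guess + (x / guess));
        if (next >= guess) {              /* no further progress: converged */
            break;
        }
        guess = next;
    }
    return guess;
}
```

The single assert at the top replaces any scattered "is it negative?" checks further down in the function.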
Slide #23 - Assert Macro
Let's go behind the scenes of the assert macro I keep referencing. In general assert macros are going to do the same thing. They're going to take an expression and evaluate whether or not it's true. If it's true, no action is taken. If the expression evaluates to false, some action is taken. This may be as simple as exiting your program, which a lot of desktop environments will do. In embedded systems, the action you take can be a little more complicated. Your development environment probably provides you with an assert macro, so take a look at that. If the action that is taken on the failed assertion suits your system, then by all means you're done, just use their assert macro. If the action that is taken does not suit your system, then feel free to write your own. So in this SYS_ASSERT macro, if the assertion fails, we're calling a system halt function. This system halt function is going to do whatever you need it to do in your system. You're actually going to write this function. The one thing it has to guarantee is that you do not return execution to the code subsequent to your assert macro.
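The slide's macro is not reproduced in this transcript, so here is one way a SYS_ASSERT macro along these lines might look. The exact signature of SYS_HALT, the `scale_reading` helper, and the host stand-in that prints and aborts are all assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Host stand-in for the halt routine. A real target would do something
 * system-specific instead; the one rule is that this function never returns. */
static void SYS_HALT(const char *file, int line)
{
    fprintf(stderr, "assertion failed at %s:%d\n", file, line);
    abort();
}

/* Evaluate the expression; take no action if true, halt if false. */
#define SYS_ASSERT(expr)                      \
    do {                                      \
        if (!(expr)) {                        \
            SYS_HALT(__FILE__, __LINE__);     \
        }                                     \
    } while (0)

/* Example use: by design, the divisor is never zero. */
static int scale_reading(int raw, int divisor)
{
    SYS_ASSERT(divisor != 0);
    return raw / divisor;
}
```

Wrapping the body in `do { ... } while (0)` is a common idiom that lets the macro behave like a single statement, for example inside an unbraced if/else.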
Slide #24 - Failed Assert Handling
Let's talk a little more about failed assert handling. So these are the actions that are taken once an assertion fails. So in the previous example, the SYS_ASSERT macro on a failure is calling a system halt function, the SYS_HALT function. In general what you want to do in these functions is halt all activity. Likely that is going to be by disabling interrupts. In this example, I am actually passing this SYS_HALT function the file name and line number. This isn't required. And I know particularly with filenames they can get quite big. So if you don't have the memory to support that, that's fine, you don't have to use these. What you will want to do though is leave some sort of breadcrumbs behind because we have just discovered a bug that we need to investigate and fix. So however you manage it, leave yourself some information that can point you to the offending code. In some embedded systems, you might need to shut down portions of your system safely. It may not be appropriate to just cause a reset at any point in time.
If you are working in a system where that is necessary, you probably have already written logic to go into safe mode, so just tap into that with your failed assert handling. Once you're ready to do it, cause a reset, maybe not a hard reset, but a high level of soft reset. You want to get your system back into a safe state. You've just discovered a bug, and you do not want to just proceed as if nothing has happened.
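As a rough sketch of that sequence (halt activity, leave breadcrumbs, go to a safe state, reset), something like the following could work. Every helper here is a hypothetical placeholder for target-specific code, and the host build approximates the reset with abort().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical breadcrumb record. On a target this would live somewhere that
 * survives a reset, such as a no-init RAM section or nonvolatile storage. */
typedef struct {
    uint32_t    magic;   /* marks the record as valid */
    const char *file;
    int         line;
} assert_record_t;

#define ASSERT_RECORD_MAGIC 0xA5A5A5A5u

static assert_record_t last_assert;

/* Placeholders for target-specific operations (assumptions, not a real API). */
static void disable_interrupts(void) { /* mask interrupts on the target */ }
static void enter_safe_state(void)   { /* e.g. switch outputs off safely */ }
static void system_reset(void)       { abort(); /* host stand-in for a reset */ }

static void leave_breadcrumb(const char *file, int line)
{
    last_assert.magic = ASSERT_RECORD_MAGIC;
    last_assert.file  = file;
    last_assert.line  = line;
}

void SYS_HALT(const char *file, int line)
{
    disable_interrupts();           /* halt all activity */
    leave_breadcrumb(file, line);   /* point future-you at the offending code */
    enter_safe_state();             /* reuse existing safe-mode logic */
    system_reset();                 /* get back to a known state */
    for (;;) { }                    /* never return to the caller */
}
```

If memory is tight, dropping the file name and keeping only the line number, or a short numeric ID per assert, would still leave a usable trail.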
Slide #25 - Exceptional Conditions vs. Bugs
In order to use asserts effectively, one of the things you need to get really clear on is the distinction between exceptional conditions and bugs. Exceptional conditions are those situations that can legitimately arise in your system. There may not be a high probability of them arising, but they are legitimate. For these conditions, you need to have a recovery strategy, and you need to be able to cope with those and adapt to those. You will not assert on exceptional conditions. Bugs on the other hand, programming errors – you will assert on bugs. You do not want to have a recovery strategy for bugs in your code when the only thing to fix them is to actually go back and fix the code. These two things – getting clear on this distinction – allows you to very aggressively assert on what can only be a bug and leaves you free to handle those things in your system that you need to handle, that are legitimate to handle.
Slide #26 - Asserts in Action, Boundary Checks
We are now armed with an assert macro, a system halt routine, and clarity about the difference between an exceptional condition and a bug. So let's look at some asserts in action. One of my favorite uses of asserts is boundary checks. This is always going to be a bug. There is never a legitimate reason to access a variable out of range. So using asserts to stop these accesses in their tracks is a great thing to do.
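A boundary check of this kind might look like the sketch below. The channel table and the function name are made up, and SYS_ASSERT/SYS_HALT are minimal host stand-ins for the macro and halt routine described earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal host stand-ins for the talk's SYS_HALT and SYS_ASSERT. */
static void SYS_HALT(const char *file, int line)
{
    (void)file;
    (void)line;
    abort();   /* never returns */
}

#define SYS_ASSERT(expr) \
    do { if (!(expr)) { SYS_HALT(__FILE__, __LINE__); } } while (0)

#define CHANNEL_COUNT 4u

static uint16_t channel_gain[CHANNEL_COUNT];

void set_channel_gain(size_t channel, uint16_t gain)
{
    /* An out-of-range index can only be a programming error, so stop here
     * rather than corrupt whatever sits next to the array. */
    SYS_ASSERT(channel < CHANNEL_COUNT);
    channel_gain[channel] = gain;
}
```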
Slide #27 - Asserts in Action, Hardware Registers
Another way to use asserts effectively in an embedded system is to look at the hardware registers that are provided. Oftentimes the microcontrollers that we are using are loaded with registers with many more bits than we would care to know about, but some of these bits hold invaluable information. They may indicate a potential problem in the system. Some of these problems may be those exceptional conditions we talked about earlier, but some of them may indicate bugs. If they do, using an assert to guarantee a particular state, a state that we expect in the hardware, is a great way to detect problems that might be really subtle, really subtle timing problems.
For example, if you are using a UART and you are controlling the transmission of a packet and you overrun your transmit buffer, if you control that and you're overrunning your buffer, assert on that. You may have a very small timing problem that shows up as a random glitch that you can never explain. If you can actually assert on those assumptions, you will be able to detect that early and solve the problem.
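Here is a sketch of the UART case. The register names and the overrun bit position are invented, and the "registers" are plain variables so the example runs on a host. On a real part you would use the definitions from the device header.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Minimal host stand-ins for the talk's SYS_HALT and SYS_ASSERT. */
static void SYS_HALT(const char *file, int line)
{
    (void)file;
    (void)line;
    abort();   /* never returns */
}

#define SYS_ASSERT(expr) \
    do { if (!(expr)) { SYS_HALT(__FILE__, __LINE__); } } while (0)

/* Hypothetical status bit; check the device datasheet for the real one. */
#define UART_STATUS_TX_OVERRUN (1u << 3)

/* Stand-ins for memory-mapped registers so the sketch runs on a host. */
static volatile uint32_t uart_status;
static volatile uint32_t uart_tx_data;

void uart_send_byte(uint8_t byte)
{
    /* Our own code paces the transmission, so an overrun here can only mean
     * our timing assumption is wrong: a bug, not an exceptional condition. */
    SYS_ASSERT((uart_status & UART_STATUS_TX_OVERRUN) == 0u);
    uart_tx_data = byte;
}
```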
Slide #28 - Be Assertive
Don't be shy about using asserts to enforce your design assumptions and also to detect behavior that you are sure could never happen. In this example, let's say you have a state machine and you're handling the states. You handle all the legal states, the states you've defined, and then you've got a default statement that you should never reach. Number one, of course you want to have this default statement there, but why ignore it if it's an indication that something has gone drastically wrong in your system? Catch it and assert. You can just force the assert to fail by passing zero, or you can have a special assert that directly calls your system halt routine. So don't be shy about using assertions for all of these assumptions. Wherever you're making one of these assumptions in your design, enforce that in the code.
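A small sketch of the state machine case, with made-up states, and with SYS_ASSERT/SYS_HALT again as minimal host stand-ins:

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal host stand-ins for the talk's SYS_HALT and SYS_ASSERT. */
static void SYS_HALT(const char *file, int line)
{
    (void)file;
    (void)line;
    abort();   /* never returns */
}

#define SYS_ASSERT(expr) \
    do { if (!(expr)) { SYS_HALT(__FILE__, __LINE__); } } while (0)

typedef enum { STATE_IDLE, STATE_RUNNING, STATE_DONE } state_t;

state_t next_state(state_t current)
{
    state_t next = current;

    switch (current) {
    case STATE_IDLE:
        next = STATE_RUNNING;
        break;
    case STATE_RUNNING:
        next = STATE_DONE;
        break;
    case STATE_DONE:
        next = STATE_IDLE;
        break;
    default:
        /* Every legal state is handled above, so reaching this point means
         * something has gone badly wrong. Force the assert to fail. */
        SYS_ASSERT(0);
        break;
    }
    return next;
}
```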
Slide #29 - No to NDEBUG
If you are already familiar with asserts, one of the things you may have found missing in the SYS_ASSERT macro is a conditional compilation on the NDEBUG constant. I advocate that you leave asserts in your production code, in your release code, so you use them throughout development, you use them during test, and you leave them enabled in your production code. Here is the reason. Let's assume that you've been effectively using asserts while you develop, and you've had a long and successful testing strategy. You've caught a lot of tough bugs and you've solved those, and then you go to release. When you release the production code, you disable your asserts. Let's walk through the logic of that.
What you're saying, which I don't think anybody would really claim, is that you either have no bugs left in your code or the bugs that are left in your code are trivial and can't cause you any particular problems. Neither one of these things makes sense. You are going to have bugs in your production release code. It's just a fact of life, at least at this point in time. And the bugs that you actually are going to find in the field are going to be those very bugs that you either couldn't imagine to test, or you literally just could not create the circumstances to get them to fail.
The consumers of your product, whether they are humans or another system, are going to be running the code much more than you ever have during your testing. So you are going to remove the asserts that would potentially catch the bugs that are some of the worst bugs and absolutely are the most expensive bugs. These bugs have made it all the way through your development and testing. And they are in the field. So to find them and fix them there is very expensive. If you're working in a safety-critical field, not only can they be expensive, they can be deadly. So there is no logic that leads you to removing asserts from your production code that makes sense to me.
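To see what the NDEBUG switch does in practice, here is a small sketch. The counter exists only to show which checks actually execute. Putting side effects inside an assert expression is not something to copy into real code, and the function names are made up.

```c
#define NDEBUG            /* many release builds define this, often via the build system */
#include <assert.h>
#include <stdlib.h>

/* Minimal host stand-ins for the talk's SYS_HALT and SYS_ASSERT. */
static void SYS_HALT(const char *file, int line)
{
    (void)file;
    (void)line;
    abort();              /* never returns */
}

#define SYS_ASSERT(expr) \
    do { if (!(expr)) { SYS_HALT(__FILE__, __LINE__); } } while (0)

static int checks_run = 0;

static int buffer_is_sane(void)
{
    ++checks_run;         /* count how often the check actually executes */
    return 1;
}

void run_checks(void)
{
    assert(buffer_is_sane());       /* removed by NDEBUG: the check never runs */
    SYS_ASSERT(buffer_is_sane());   /* no NDEBUG switch: the check always runs */
}
```

After one call to `run_checks()`, `checks_run` is 1, not 2: the standard assert has vanished from the build along with whatever it was guarding.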
Slide #30 - Coding Standards
Now that you have embraced static analysis tools and runtime software assertions, let's discuss coding standards. If you are working in an embedded system, you are probably using the C and C++ programming languages. And these languages are a great fit for the embedded space, but if you have read the specifications for these languages, you know that they allow for wild and crazy behavior. You really can shoot yourself in the foot with these languages. So choosing and using a good coding standard helps you avoid the pitfalls of these languages. A good coding standard that is used by your whole team can result in fewer bugs and code that is easier to maintain and to test.
Slide #31 - Top Priority, Keep Bugs Out
What actually constitutes a good coding standard? It is important to get clear on this because discussions about coding standards can quickly devolve into arguments about style and personal preference, when the discussion should be about creating robust maintainable code. So, in choosing a standard, get clear that your top priority is keeping bugs out of your code. A coding standard that contains rules that help to do this is one that you want to be using.
An easy example of this is curly braces in C and C++. It is legal to omit the curly braces: let's say you have an if statement followed by a single statement of code, you can omit the curly braces. And in the short term that might seem convenient to do, but over the long run that code has to be maintained and potentially enhanced, which means someone else, or yourself six months to a year from now, has to go back to that code and add statements. At that point you need to be aware that you have to add the curly braces, or you may forget, which can turn into a pretty nasty bug to debug. By always using curly braces, you just avoid this whole class of problems. A coding standard that has that in mind, where its top priority is creating rules that help you avoid bugs, is definitely something you want to be looking for.
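Here is a sketch of how that bug creeps in. The names are invented, and the brace-free versions are kept in a comment so the example itself stays correct.

```c
#include <assert.h>

/* Brace-free original (comment only):
 *
 *     if (over_temp)
 *         out->fan_on = 1;
 *
 * A later edit adds a line at the same indentation:
 *
 *     if (over_temp)
 *         out->fan_on = 1;
 *         out->heater_on = 0;   // NOT guarded by the if; runs every time
 */

typedef struct {
    int fan_on;
    int heater_on;
} outputs_t;

/* Braced version: both statements are clearly inside the condition, and the
 * next person to add a line cannot get it wrong by indentation alone. */
void update_outputs(outputs_t *out, int over_temp)
{
    if (over_temp) {
        out->fan_on = 1;
        out->heater_on = 0;
    }
}
```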
Slide #32 - Second Priority, Easy To Review
Your second priority when choosing a coding standard is one that helps create code that is easy to read. You're not going to be the only person to ever touch your code over its lifetime. And if you're doing it right, you will have other people reviewing your code and providing you feedback. If you're on a larger team, there may be other people that are doing the testing of your code and potentially the maintenance of that code. So over the lifetime of a code base, there are going to be many different people who come in to look at the code, to try and understand the code, whether it's just to review it and provide design feedback or to fix it in the case of bugs. Having a team use the same coding standard and the same style makes all of that review and maintenance easier and more effective to do.
Slide #33 - Third Priority, Automated Enforcement
Your third priority when choosing a coding standard is one that can be automatically enforced. A good coding standard that isn't actually used by a team is not a very effective coding standard. So when possible, choose a standard that contains rules that can be automatically enforced. This enforcement may very well come from the static analysis tools that we discussed earlier. These tools can oftentimes be configured to detect and report on specific coding styles. And for those working in regulated industries, using automatic enforcement provides the documentation trail and proof that the design practice is being followed. This can even be rolled into the release process for the code, so the code actually cannot pass its release parameters unless it passes the coding standards and the coding styles that are being used.
Slide #34 - Last Priority, Personal Preference
When discussing coding standards, please resist the urge to think that your personal preferences are important; if you cannot justify your personal preference with one of the other priorities – either keeping bugs out of code, making the code easier to read and review, or automated enforcement – then you need to take your personal preference and make it a low priority because it is. Having discussions about coding standards that keep in mind those top three priorities is going to allow your team to choose a coding standard that's going to work and work for the long run, not just for the short term and not for the personal preferences or convenience of individual programmers.
Slide #35 - Code Reviews
I'm sure that most people attending today have performed one or two code reviews in their lifetime and you probably have strong feelings about reviews one way or another. Some people love them because they find bugs that are tough to find otherwise, a certain class of bugs, and some people hate them. And I understand that, you are putting your code out into the world for other people to see and to potentially criticize. Unless you have an impervious ego, that can be unpleasant.
Here's the good news. If done properly, code reviews are well documented to be effective at finding bugs and great at transferring knowledge among team members. The simple act of preparing your code to be reviewed by other people can have a positive effect. If you have any pride, you are going to make sure your code is in good shape before putting it out there for review. Things that you may have tolerated in your code as the only viewer of your code, you may remedy if you know other people are going to see it.
Here's the bad news about code reviews. If you do not choose a code review process that works for your team, it can turn into a lot of wasted time and effort and have a bad effect on morale of the team.
Slide #36 - Code Review Treasures
If you're using static analysis tools and following a good coding standard, it's legitimate to ask why code reviews are even necessary. What are they going to find that the tools do not? Reviews are unique in that a good reviewer is going to know something about the system being developed. They are going to be familiar with the requirements and possibly the architecture and design of the system that they're reviewing, the portion of code that they're reviewing. This is an invaluable viewpoint. Static analysis tools can only tell you if the code is good, solid, robust code, it cannot tell you if there is: missing functionality, or a misinterpretation of a requirement, or a violation of the architecture, an architectural assumption. Let's say your team has spent a lot of time creating an architecture for handling errors in your system, when individual programmers begin coding, they may not uphold this architecture or they may misunderstand the architecture. Even if they write code that is lint free and follows a coding standard, it may be incorrect code. Reviewers familiar with the system will be able to see these very issues.
Slide #37 - Formal vs. Lightweight
Now comes the tough part: what code review process is going to work for your team? There is no single good answer to this. It depends on the individual team and the industry that you are in. If you are in a regulated industry, you may be required to perform formal, documented code reviews. Your needs are going to be different than a small team working in an unregulated environment. You may have to try several different approaches to doing code reviews before you find one that works, but do whatever it takes to get a good code review process in place. You cannot bypass the well-documented advantages to performing code reviews. My recommendation is to use the lightest weight process possible that works. And when I say works, I mean it is adopted and regularly used by your team. I have been to companies that have a well-documented code review process in place, that can hand you the process document that tells you all about it, and not a single engineer is using it ever. So stop kidding yourself if this is your team. Ditch the review process that isn't working, and come up with a new one. This can be anything from over-the-shoulder reviews to pair programming if you're into that, because any reviews are better than none.
And I'm a huge fan of the online tools that are becoming more popular. This allows a reviewer to be halfway around the world and able to do a review on your code. If your company has multiple offices, you don't have to rely only on the team in your office. You can extend that invitation to people across the globe. And some of these tools are quite inexpensive, and for small teams, you may be able to get free versions of these tools that have a maximum number of developers. There are so many ways to do code reviews. Find one that actually works for your team and take advantage of it.
Slide #38 - Best Practices
Regardless of the process you decide to use for code reviews, there are some best practices which you can follow. First things first. Avoid reviewing code that has not been run through a static analysis tool and which does not adhere to a coding standard. Reviewing is time consuming for the reviewer. You do not want to waste their time finding problems that static analysis could catch or that a good coding standard wouldn't allow. Get that done before you pass off your code for review.
It's essential to keep reviews short and sweet; all the research points in that direction. So keep reviews to, if possible, less than an hour, one and a half hours at the most, and review somewhat slowly. You do not want to be speeding through a review. So the general idea is that about 500 lines of code per hour is about right. Speeding through a review just to say you've done it and check off, you know, your checklist, kind of defeats the whole purpose of having the code review in the first place. And also do not have a multi-hour code review of any kind. Just avoid that. You're going to make everybody hate those code reviews, which is going to make them less effective.
As a reviewer, obviously, you're going to be providing feedback and hopefully, you'll be kind when you provide this feedback, but resist the urge to solve the problem. You need to provide details of the issues that you identified, but just refrain from solving the problem. That is for the designer and the programmer of the code to do after they have considered your feedback. They may discuss a potential solution with you, but don't get caught up in solving the problem in the midst of the review. Your job is simply to review and report any issues.
It is also really useful to track the resolution of issues that are raised during a code review. For a review process to be highly effective, people need to see the difference it makes, so tracking changes that come about because of their feedback is very useful.
And finally, I prefer reviews that don't include any managers; any managers of the people being reviewed. If you want honest feedback from the peers of the person being reviewed, having a manager in the room can really dampen that. So, sorry, managers, if you want effective reviews, you're just not invited to the party. You can see the results of the reviews which hopefully will be fewer bugs, but you have to stay away from the actual review time.
Slide #39 - Apple's Epic Fail (#gotofail)
I have been giving you examples of these techniques in action. A nice way to put these in perspective is to look at some real world bugs that caused a lot of damage and that were extremely expensive for the companies involved and see how these techniques would have worked on some of those issues. I'm sure many of you have heard of Apple's goto fail problem. It was in their Secure Socket Layer code, and it is horrifying that the word "Secure" was involved in any way with the code you're about to look at. They released a security update, and it was a big deal.
Slide #40 - #gotofail Root Cause
It was eventually discovered that the code you're looking at right now was the offending code. This is a function that was attempting to do some error checking using goto fail. There was an extra goto fail statement afterwards. So as you can see, it is, honestly, an easy failure to see here in the code.
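The slide's code is not reproduced in this transcript. A minimal sketch of the pattern being described might look like the following. It is a simplified illustration and not Apple's actual source: `check_step` and `verify_signature` are hypothetical names, and the three "steps" stand in for whatever checks the real function performed.

```c
/* Hypothetical stand-in for one verification step:
   returns 0 on success, nonzero on failure. */
static int check_step(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Simplified sketch of the "goto fail" pattern (hypothetical function). */
int verify_signature(int step1_fails, int step2_fails, int final_fails)
{
    int err;

    if ((err = check_step(step1_fails)) != 0)
        goto fail;
    if ((err = check_step(step2_fails)) != 0)
        goto fail;
        goto fail;   /* BUG: duplicated line, not guarded by the if above */
    if ((err = check_step(final_fails)) != 0)   /* never reached */
        goto fail;

fail:
    return err;      /* err is still 0 if the first two steps passed */
}
```

In this sketch, `verify_signature(0, 0, 1)` returns 0 (success) even though the final step is set to fail, because the unguarded `goto fail` skips that step. Some recent compilers can flag this shape when warnings are enabled (for example, GCC's misleading-indentation warning), though that depends on the toolchain and the flags in use.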
Slide #41 - #gotofail with Lint
I mimicked the code, the failed code, so I could run a static analysis on it, and I did. And as you can see, lint screams bloody murder about this code in every way. There's a lot of offending statements here. The gist of it is that lint never would've allowed this code to go unnoticed. In fact, even the compiler would have complained about this code, to tell you the truth. So somebody was not paying attention to their code, but lint, it just makes it so obvious that this is a problem that needs to be fixed.
Slide #42 - #gotofail with Coding Standard
If you use a decent coding standard, it also is going to talk about multiple things in this little snippet of code. It’s going to say, use the curly braces. It's going to talk about not using the goto statement. Assignments shall not be made within an expression. So all of this code would have been just much higher quality code if you were going to follow any basic decent coding standard.
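As a rough illustration of those three rules (braces on every block, no goto, no assignment inside an expression), here is a hedged sketch of how a chain of checks could be written. The names `check_step` and `verify_signature_checked` are hypothetical and the logic is a simplified stand-in, not a fix to Apple's actual source.

```c
/* Hypothetical stand-in for one verification step:
   returns 0 on success, nonzero on failure. */
static int check_step(int should_fail)
{
    return should_fail ? -1 : 0;
}

/* Each step runs only if every earlier step succeeded. */
int verify_signature_checked(int step1_fails, int step2_fails, int final_fails)
{
    /* The assignment is its own statement, not buried in a condition. */
    int err = check_step(step1_fails);

    if (0 == err)
    {
        err = check_step(step2_fails);
    }

    if (0 == err)
    {
        err = check_step(final_fails);
    }

    return err;
}
```

With braces around every body, an accidentally duplicated line inside a block repeats an assignment rather than silently changing the control flow, and a duplicated line outside a block is easier to spot in review.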
Slide #43 - #gotofail with Code Review
And any reviewer worth her salt is going to notice probably just the gotos honestly, but the goto fail, goto fail? If they even do a cursory code review on this code, they're going to detect this.
Slide #44 - #gotofail Conclusions
So what happened here? How does such a severe bug in an extremely important part of the code, Secure Socket Layer, find its way all the way into the field, where it absolutely affected consumers? Was management to blame? Were the engineers to blame? Testers? What happened here?
Everyone was to blame for this. I personally think that engineers hold a special responsibility for this. When there are such simple inexpensive techniques that exist that have been around for decades, and they're not being used and these bugs find their way into the field, it's on us, that's on us. So, yes, you can blame management, you can blame the testers, but really, take responsibility for bugs of this kind.
Slide #45 - Obstacles
We have talked about four techniques today, none of which are technically challenging to use. Static analysis tools may take some effort to configure, but using them is simple. You run them, you review the output and make changes to your code to improve it.
The one big obstacle to adopting any of these techniques is going to be psychological. If you have never run static analysis tools on your code base, they may report hundreds or even thousands of errors and warnings. This can be overwhelming and unpleasant. These tools may report that coding habits you have been using for years – and successfully for years, probably – are wrong and should be changed. That's a tough pill to swallow, and if you cannot overcome this obstacle, you will not be able to take full advantage of the benefits of these techniques. If you stick with your old habits, you may end up in a situation where you do not have a choice in the matter.
Software engineering and certainly embedded software lately is getting more scrutiny as it finds its way into more and more devices in the field, and a lot of these devices are safety critical. So, courts are now allowing code reviews, hostile code reviews. If you're familiar with the Toyota unintended acceleration lawsuit, Toyota had to allow their code to be reviewed by the opposing counsel. So this evidence is now finding its way into courtrooms, so you're going to have engineers on the stand, testifying about code quality.
And if you're working on medical devices, you are probably familiar with the FDA's forensic lab. They have a lot of static analysis tools that they use to analyze code in a device. The FDA cannot require that you use static analysis, but they can analyze your code if something goes wrong in the field, and if they detect the problem with their static analysis tools, then you're in trouble.
And at a certain point, it's just a competitive edge. As a company, if you're producing high quality code and avoiding expensive bugs, you're going to be more competitive than your rivals who are releasing high profile bugs into the field. These bugs are expensive, but they can also affect your reputation. So just from a competitive standpoint, embracing techniques that allow you to produce higher quality code makes sense.
Slide #46 - Next Steps
If you are going to adopt any of the techniques we have discussed today – and I hope that you do – you may need to do it in stages. I recommend bringing in any new technique and making it a policy that is used on all new code and without exception. So get over the hump quickly on any new code that's in development. And I do recommend taking a hard line on this because you want this technique to just be adopted as quickly as possible, so you can see the benefits, so the benefits become apparent quickly.
On an older codebase – and I'm sure a lot of you are dealing with legacy systems – I suggest running static analysis tools on this codebase. It'll help you identify high risk modules. You can look at each of these and decide whether or not it's going to be worth it to go back and fix these modules, to re-factor these modules. You might find some severe bugs that you really do want to fix or you might find some issues in a particularly important part of the system, Secure Socket Layer perhaps. You can selectively make those decisions. This is a tough one. It's scary to go back into old code that no one is familiar with and make changes there, but you may find that you need to if you run into some pretty significant issues.
Something else that is really helpful in making changes to a process is to find a champion or two who really embrace the new techniques and the changes. Let them get good at the tools and the techniques so they can share it with the team, they can be the champions for this change. If you are going to have champions, it's a really good idea to empower them. They are going to need that backup because some people are going to resist these changes, and it's just the truth. So it will be a challenge for these champions to make changes to that process. They'll need some backup, they'll need to be empowered to do this.
Slide #47 - Tools
There are lots of great companies that work on static analysis tools and online code review tools. Some of these tools will fit into your existing development environment. They can be a plug-in to your IDE. Some can be tightly integrated into your source code control system. And some are just stand-alone tools. So lots of great options, you just need to find the tools that suit your team and your system.
Slide #48 - My Tools, $597.95
Before we wrap up, I'm going to tell you what my tool kit is. This is the tool kit I own. No matter what situation I go into, if I go into a company and they do not have any static analysis tools, I always bring my own. They may have some and I will certainly use theirs. If people don't have them, I use my own. I just never go into a situation where I'm hobbled by not having access to static analysis tools. My lint tool is from Gimpel Software, it's called PC-Lint. I have a one-loc license, it's $389. It is the best deal I have ever … it is the best money I've ever spent on any tool or software. Resource Standard Metrics does cyclomatic complexity, the code comment quality, all of those things that I talked about earlier. And I use the Embedded C Coding Standard from Barr Group. It has those priorities that I like, the keeping bugs out, et cetera. This tool kit? $597.95. It is tough to argue with that price tag for the benefits that you can get.
Webinar Q&A
Attendees at the live webinar had the opportunity to ask technical questions related to the presentation. Below are a few of the questions from that session.
Q: Instead of forcing the CONSTANT to be on the LEFT of comparators, why not set LINT (or COMPILER) to ERROR when an assignment is found in a comparison instruction? That is, fail to compile code with 'secret' assignments in the IF statement…
A: This is a great idea. I would still recommend using the coding standard recommendation (constant on left-side of comparison). If people are not using lint or paying attention to compiler warnings, they can still benefit from the practice.
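For readers who have not seen the constant-on-the-left practice, a small sketch follows. `MODE_IDLE` and `is_idle` are hypothetical names chosen for illustration.

```c
#include <stdbool.h>

#define MODE_IDLE 0   /* hypothetical mode value */

bool is_idle(int mode)
{
    /* Constant on the left: if '==' is mistyped as '=', the line becomes
       (MODE_IDLE = mode), which does not compile because a constant
       cannot be assigned to.

       With the variable on the left, the same typo gives (mode = MODE_IDLE).
       That is valid C: it overwrites mode with 0 and the condition is
       always false, so the mistake only shows up if a warning is enabled
       and someone reads it. */
    return (MODE_IDLE == mode);
}
```

The two approaches complement each other: the coding-standard rule turns the typo into a hard compile error with no configuration, while the lint or compiler setting from the question also catches assignments where neither operand is a constant.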
Q: Does PC lint allow for easy configuration to enable/disable warnings?
A: Yes. There are many options for enabling/disabling warnings/errors. This is part of what you will do when you configure static analysis tools for your environment - determine what warnings/errors make sense for your project.
Q: Do you suggest reviews before or after unit tests?
A: If you are practicing test-driven development and writing your unit tests prior to writing your code, I think you can legitimately wait to do reviews until after the unit tests run successfully. If you are not consistently writing unit tests, or writing them after you code, I would not wait for these to be complete before getting reviews going.
Q: Are static analysis tools useful for interrupt based architecture?
A: Yes. Static analysis tools are architecture agnostic and are run on code before it is executed on a target. Because of this, they generally cannot alert you to “runtime” problems (memory leaks, deadlocks, etc.) but you may be able to find additional tools (dynamic analysis) that do just that.
Q: Just a note: turning on more or all compiler warnings and setting warnings as errors is a great pre-lint step. Highly recommended.
A: I could not agree more.
Q: Do you see a future when "software engineer" becomes a licensed professional title, so these best practices are enforced?
A: This may very well happen. Particularly if the software engineering profession does not improve its track record for reducing bugs before releasing products. High profile bugs, some of which have been deadly, are finding their way into courts and into the public consciousness.
Q: Any thoughts on TDD/unit testing as part of this base set of tools? We've been finding it quite effective finding errors early in the process. (Lint, Unit test, then compile)
A: First, I am impressed. If this is your process, you are way ahead of the game. I did not address unit tests today only because I had to narrow the scope to fit within an hour. I highly recommend TDD and unit testing!
Q: What do you think about anonymous code reviews?
A: I stick with my “any review is better than no review” statement. The only drawback to anonymous code reviews is that the reviewer may not know enough about the system to find the “code review treasures”. But they still may be able to find bugs and that is worth it.
Q: Code reviews of unit / other tests
A: I would first use a code reviewer’s time to review the code under design. If you have a well-functioning code review process, and the reviewers are willing, definitely have them review the unit/system tests.
Q: Can the asserts be used in place of - or to supplement - a test driven development approach?
A: Asserts and TDD are great companions! One will not replace the other. You want to be able to test that your asserts can actually fail by running tests that cause them to fail.
Q: In terms of coding standards, how do you deal with third party code/libraries?
A: This is a very tough question. Not just with coding standards but with static analysis tools. I have found very few vendors who consistently follow coding standards and use static analysis tools. If you have the option, choose vendors who do follow these practices. They will advertise this since it distinguishes them from the competition. If your vendors don’t support these practices, put pressure on them and ask them to. In the meantime, at least be aware of the issues with any vendor code you are using. Assuming you have source code, you can attempt to run your static analysis tools on it. This is a big order, but something you may need to do depending on the industry you are in. If you find severe issues with vendor code you may need to take steps to remedy this: 1) avoid using this code or 2) fix it. Both of these options are less than ideal. Changing vendor code, especially if you are going to need to get updates for this code, is not something to do lightly.
Q: What about MISRA coding standards?
A: MISRA C/C++ are guidelines whose sole purpose is keeping bugs out of code. I highly recommend reviewing/following them. They do not provide stylistic rules which is something that is also important (code that is easy to read/review/etc.). If you follow MISRA, you should also choose a style guideline for your team.
Q: Why does the failed assert cause a system reset?
A: This is part of the “fail hard”, “fail fast” or “preemptive programming” design philosophy. If you encounter a bug (or a hacker!), you want to fail in a controlled, known manner. You do not want to be at the mercy of meandering/malicious bugs. In many embedded systems you may not be able to simply reset at any time. You will likely need to take steps to put the system into safe state first. This “safe state” is unique to each system and has to be carefully designed for each project.
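A minimal sketch of that sequence (detect the failed condition, enter a safe state, then reset) is shown below. Everything here is hypothetical: `FW_ASSERT`, `enter_safe_state`, `request_reset` and `scale_duty` are illustrative names, and the two hooks only set flags so the sketch can run on a host. On a real target they would do project-specific work and the reset would not return.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flags that record what the hypothetical hooks did (host-side stand-ins). */
static bool g_safe_state_entered = false;
static bool g_reset_requested = false;

/* Hypothetical hook: on a target this might disable outputs or motors. */
static void enter_safe_state(void)
{
    g_safe_state_entered = true;
}

/* Hypothetical hook: on a target this might trigger a watchdog reset. */
static void request_reset(void)
{
    g_reset_requested = true;
}

/* Called when an assertion fails: safe state first, then reset. */
static void assert_failed(void)
{
    enter_safe_state();
    request_reset();
}

#define FW_ASSERT(cond) \
    do { if (!(cond)) { assert_failed(); } } while (0)

/* Example use: convert a duty cycle in percent to an 8-bit PWM value. */
uint8_t scale_duty(uint8_t percent)
{
    FW_ASSERT(percent <= 100U);   /* callers must never pass more than 100 */

    /* On a real target execution would not get here after a failed assert. */
    return (uint8_t)((percent * 255U) / 100U);
}
```

Because the hooks are ordinary functions, a unit test can call `scale_duty` with an out-of-range value and confirm that the safe-state and reset paths are actually taken, which is one way to check that an assert can fail.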
Q: Asserts in release code: just allow device to reset?
A: Yes. If you have embraced the idea of using asserts and you are using them properly (to safeguard against bugs or malicious activity) you want your system to take quick, controlled action. The alternative is to either ignore these bugs or write code that tries to “accommodate” these bugs (i.e. defensive programming). Neither of which I recommend.
Q: Do you recommend Design by Contract?
A: I do recommend it. For those not familiar with it, the “design by contract” theory was developed by Bertrand Meyer along with the Eiffel programming language, which I believe has native support for it. The idea is to design formal, precise contracts between software modules. These contracts consist of pre-conditions, post-conditions and invariants. These contracts are then guaranteed to be upheld in the code. This guarantee can be accomplished by using the asserts we talked about in the webinar. There is a great blog by Miro Samek on the Barr Group website called “Design by Contract (DbC) for Embedded Software” if you are interested in this. You can also visit the Eiffel software website for more information.
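As a rough sketch of how pre-conditions, post-conditions and invariants can be expressed with asserts in C, consider the small fixed-size queue below. The type and function names (`byte_queue_t`, `queue_init`, `queue_push`) are hypothetical. The standard `assert` macro is used only for brevity; it is compiled out when `NDEBUG` is defined, so a firmware project would likely substitute its own assert macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_CAPACITY 4U

typedef struct
{
    uint8_t items[QUEUE_CAPACITY];
    size_t  count;
} byte_queue_t;

/* Invariant: the stored count never exceeds the capacity. */
static void queue_check_invariant(const byte_queue_t *q)
{
    assert(q != NULL);
    assert(q->count <= QUEUE_CAPACITY);
}

void queue_init(byte_queue_t *q)
{
    assert(q != NULL);                          /* pre-condition */
    q->count = 0U;
    queue_check_invariant(q);                   /* invariant holds on exit */
}

void queue_push(byte_queue_t *q, uint8_t value)
{
    queue_check_invariant(q);                   /* invariant holds on entry */
    assert(q->count < QUEUE_CAPACITY);          /* pre-condition: not full */

    const size_t old_count = q->count;
    q->items[q->count] = value;
    q->count++;

    assert(q->count == (old_count + 1U));       /* post-condition */
    assert(q->items[q->count - 1U] == value);   /* post-condition */
    queue_check_invariant(q);                   /* invariant holds on exit */
    (void)old_count;   /* avoids an unused-variable warning under NDEBUG */
}
```

Here the contract for `queue_push` is that the caller guarantees the queue is not full, and in return the function guarantees that exactly one item was added and that the queue is still within its capacity.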
Q: If you had to choose 1 of the 4 to use, which one would it be?
A: That is a cruel question because I think these 4 techniques work so well together. But if I had to choose, I would say static analysis tools. If you have any momentum from watching the webinar, use it to choose and configure static analysis tools for your project. They can take more effort than the other 3 techniques to get going, so getting over the hump quickly with these tools is a great thing to do. Once you have your static analysis tools in place, quickly adopt the other 3.