One of the biggest challenges when architecting an embedded system is partitioning the design into its hardware and software components. Partitioning decisions must typically be made early in the design of a product. The consequences of hasty or biased decisions, or of a lack of proper analysis, can include higher BOM cost, time-to-market delays, or, in the worst case, an inability to meet requirements.

Slide 1: To C or Not to C?

Good afternoon and thank you for attending Barr Group's webinar "To C Or Not To C: Software Partitioning In Embedded Devices". My name is Jennifer and I will be moderating today's webinar. Our presenters today will be Michael Barr, Chief Technical Officer at Barr Group, and Tom Brooks, Principal Engineer at Barr Group.

Today's presentation will be about 45 minutes in length after which, there will be a moderated question and answer session. To send a question during the event, please type your question into the rectangular space near the bottom right of your screen and then click the ‘Send’ button. Please hold your technical questions until the end when the presenters will be able to address them.

At the start of the webinar check to be sure that your audio speakers are on and your volume is turned up. Please make sure to close out all background programs and turn off anything that could affect your audio feed.

I am pleased to present Michael Barr and Tom Brooks' webinar, "To C Or Not To C: Software Partitioning in Embedded Devices".

Slide 2: Michael Barr, CTO

Thank you Jennifer, and thank you everyone who is attending today's webinar. We're glad to have you and hope you will learn a lot. I'm Michael Barr, the CTO, or Chief Technical Officer, of Barr Group and one of its cofounders.

I am an experienced embedded software developer myself, having degrees in electrical engineering and more than 20 years of experience writing embedded software and consulting with and training others who do.

I've also been a university professor and an Editor-in-Chief and columnist for Embedded Systems Programming magazine, and I speak at conferences around the world, such as the Embedded Systems Conference.

I'm also the author of three books and more than 70 articles about how to write reliable embedded software and related topics. And I'm an expert witness who has provided courtroom testimony on a number of topics, including patents and software copyrights, as well as in the Toyota unintended acceleration litigation.

Slide 3: Barr Group - The Embedded Systems Experts

Barr Group, our company, is the Embedded Systems Experts in a nutshell. We help companies around the world make their embedded systems both safer and more secure. One aspect of that is making them more reliable, so that they perform as they're supposed to, repeatedly; another aspect is making sure that the result is a safe system and a secure system.

We do this in three different ways. One is by training others, as we are doing here today. Another is by providing consulting services. And finally, we help companies build their products, taking on everything from the mechanics, electronics, and software to just assisting with one or more of those.

You can find us on the web at

Slides 4-5: Upcoming Public Courses

Before we get to today's course, I just want to alert you that we regularly deliver trainings at private companies, and you can see on our website a list of all the courses that we offer. We also have public trainings, including some upcoming boot camps. These are our weeklong (four and a half day), hands-on, intensive software training courses.

The course titles, dates, and other details for all of our upcoming public courses are available at the training calendar area of our website.

Slide 6: Tom Brooks, Principal Engineer

And without further ado, I want to turn the presentation over in a moment to today's course trainer, and that's Tom Brooks. Tom is a Principal Engineer at Barr Group. He has an engineering degree from Virginia Tech and more than 15 years of experience designing embedded systems, with a particular focus on the software-hardware interface. He'll be talking to you today about when to put certain functionality into hardware and when to put it into software. Tom, please take it away, and thank you.

Slide 7: Overview of Today’s Webinar

Thank you Michael. I'm excited about presenting today's webinar. Let's get started. Today we're going to be talking about partitioning embedded designs into their hardware and software components. While some functionality is obviously hardware and some is obviously software, there are many cases where a function could be implemented as either hardware or software, and that is the subject of today's webinar.

We're going to examine what factors should be considered when making these crucial decisions, and these decisions are crucial. They are often made at the onset of a project, and their effects persist throughout the lifecycle of the project. Too often they're made for the wrong reasons. I've often seen designs put into hardware simply because a developer was more familiar with HDL, or put into software because the developer wanted to reuse a piece of code.

We're going to use a couple of case studies to illustrate some concrete guidelines for making these decisions the right way. This presentation is geared towards embedded software engineers, but it won't hurt if you have a little HDL or programmable logic background.

And just so we're all talking the same language: when I talk about software, I'm really talking about low-level software, commonly referred to as firmware. I use "software" here because HDL is often referred to as firmware as well, and I want to make the distinction clear. Software, for the purposes of this presentation, is C or assembly-level code running on a processor. Hardware refers to gates, either in programmable logic or cast in silicon. Hardware is typically described in a hardware description language (HDL) such as VHDL or Verilog.

Slide 8: What is programmable logic?

And since we are using programmable logic to represent our hardware, this slide gives a little background on what exactly we mean by programmable logic. It's essentially hardware that can be programmed. It has the same functionality as hardware that is cast in silicon in a traditional IC such as a microprocessor. These devices tend to be expensive, often very expensive, depending on functionality, volumes, etc.

There are four main vendors: Xilinx and Altera tend to be the higher-end vendors, whereas Lattice and Microsemi tend to focus on the lower-end, higher-volume market. There are two main types of programmable logic, CPLDs and FPGAs. For simplification purposes, CPLDs are very low-end programmable logic intended for what's called "glue logic" applications. FPGAs are higher-end, with much more logic and capability, ranging from a few thousand logic gates to tens of millions of logic gates.

CPLDs tend to be non-volatile, meaning they retain their program between power cycles, whereas FPGAs are usually volatile and have to be reloaded with their program after every power cycle.

Slide 9: Why Use Programmable Logic?

This slide gives a bit of foreshadowing for the rest of the presentation. Why would we want to use an FPGA? We know it has some major drawbacks. Cost: FPGAs are very expensive, and cost is always an issue in embedded design. An FPGA typically has to use an external flash device to hold its program, which has security as well as implementation consequences. And they're sort of a pain to develop. We'll talk about this in more detail later on, but generally speaking, FPGA development requires a very specialized skill set that is not necessarily intuitive.

But FPGAs, and programmable logic in general, have some undeniable advantages. First and foremost is performance. Dedicated hardware performing complicated computations is really at the heart of the value-add of programmable logic. It also has a flexibility advantage: FPGAs allow embedded designs to be adapted to many different hardware architectures.

A processor, and consequently the software that runs on it, is limited by its hardware platform. For example, a processor may be limited because it only has two I²C ports. With an FPGA, you can add as many I²C ports as you have I/O available. FPGAs have dedicated signal processing capabilities that far exceed those of off-the-shelf DSPs, and another big advantage of FPGAs is ASIC prototyping. Just about every ASIC that is designed is prototyped in an FPGA for testing purposes.

Slide 10: What does HDL look like?

The next two slides give a quick overview of HDL basics and lay the groundwork for the hardware side of our discussion. Here we see what HDL code looks like. Remember that we are designing hardware.

One thing that is difficult for software engineers to grasp when transitioning to hardware development is that hardware is, for lack of a better phrase, always running. Software starts at the top of a routine and works its way downward. With hardware, you are designing pieces that run in parallel with each other. It's sort of like a massively parallel processing application.

Here we have two different hardware blocks: an And_Gate and a Flip_Flop. The And_Gate is always running, so if either of the inputs A or B changes, the output of the And_Gate changes. Similarly, with the Flip_Flop, as soon as there is a rising clock edge, the input is clocked in and placed on the output. This gives an overview of what HDL, in this case VHDL, looks like.
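For software engineers, it can help to model those two blocks in C. This is only a rough sketch of the semantics (the type and function names are illustrative, not from the slide): the combinational And_Gate conceptually re-evaluates continuously, while the Flip_Flop output only changes on a rising clock edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Rough C model of the two always-running hardware blocks described above.
   In real HDL these evaluate in parallel; here we model one clock tick. */
typedef struct {
    bool a, b;      /* And_Gate inputs */
    bool and_out;   /* combinational output: follows a AND b continuously */
    bool ff_d;      /* Flip_Flop data input (D) */
    bool ff_q;      /* Flip_Flop registered output (Q) */
} hw_state_t;

static void clock_tick(hw_state_t *s)
{
    s->ff_q = s->ff_d;         /* rising edge: capture D onto Q */
    s->and_out = s->a && s->b; /* combinational logic re-evaluated */
}
```

The key difference from software: in hardware, both assignments in `clock_tick` happen simultaneously every cycle, not one after the other.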

Slide 11: How is HDL “Compiled”?

So how is HDL code "compiled" (to use the software term)? The goal is to start with an HDL file and end up with a programming image that we can load into our programmable logic device.

The first step is to run the HDL through a process called synthesis, which takes our human-readable HDL program and turns it into a netlist. The netlist is a file that contains logic gates, connections, and hard IP.

After that, the netlist is mapped onto the target device: its contents are mapped into look-up tables, embedded memory, and the other various pieces that are specific to the target device. Then the connections between the components are routed, and a static timing analysis is performed. The static timing step is an iterative process: if we don't meet timing, we go back and re-route and re-place in order to improve timing.

Once that step is complete, we create a programming image, which is then loaded into flash. The other thing I'll mention here is that the HDL is simulated prior to the synthesis step. This is sort of akin to unit testing in software: we test the HDL and verify that it is functional before synthesis. The simulation step requires a dedicated piece of simulation software. And in the image there you can see what the resulting design looks like: that's the placed-and-routed design inside the programmable logic device.

Slide 12: Why use Processor/Software?

So we've talked a bit about why you would use HDL, or programmable logic, for a particular application. Let's ask the converse question: why would you use a processor, and software, for a particular application?

One of the big advantages software has is that it is ubiquitous. There is a wealth of software available in the form of operating systems, freeware, etc. Software engineers are also more available than hardware engineers. Another big advantage that processors have over programmable logic is that processors are cheap, comparatively speaking, relative to programmable logic devices.

And software has a couple of drawbacks: limited performance compared with HDL, and limited flexibility. Software is limited by the hardware that it runs on.

Slide 13: So how do we decide which to use?

So how do we decide which one to use for a particular application? This presentation intends to address that very question. We will examine several case studies implemented in both FPGA logic and software running on an embedded microprocessor. For the FPGA implementation we use a Xilinx Virtex-6, and for the microprocessor we use the Xilinx MicroBlaze processor, which is a soft core implemented in the FPGA logic.

We use several metrics to evaluate the results: performance, development schedule, cost, maintainability, and power. Just as a disclaimer here, we're not endorsing a specific FPGA vendor. We use Xilinx in this case, but the presentation is intended to be vendor agnostic, so we could easily have used another FPGA vendor such as Altera.

The MicroBlaze is a soft processor, which means it is a processor that is compiled into the FPGA logic. It is certainly not an overly powerful processor, but it does well for the purposes of comparing hardware versus software. It also provides easy integration of hardware and software components.

Slide 14: MicroBlaze Processor

A bit more about the MicroBlaze processor for those who may not be familiar with it: it's a 32-bit RISC processor based on the DLX architecture. As I mentioned before, it is implemented in the FPGA fabric, and it runs at a maximum frequency of 200 Megahertz, although in our case we'll be running it at about 100 Megahertz.

It's not as powerful as a COTS hard microprocessor, such as an ARM processor, in an embedded design, but the general hardware-versus-software comparison is still valid; you can roughly scale the performance characteristics of a hard processor by a constant factor. Because it is a soft core in the FPGA fabric, we do have an interface that can be used for hardware offload compiled into the FPGA device. And just to mention the alternatives, since we are being vendor agnostic: the Altera alternative to the MicroBlaze processor is the Nios II product.

Slide 15: Case Studies

So we are going to examine two case studies in this webinar. The first one is a state machine. We will be controlling a stepper motor with a state machine, with very specific, well-defined timing for this particular application.

And the second one is going to be a network stack: a UDP network stack running on top of the Internet Protocol, where the timing is not strictly defined. The network stack certainly has performance requirements; however, the hard real-time deadlines are less strict. A network implementation cannot have hard deadlines because network traffic is, by its very nature, not deterministic.

Slide 16: Motor Controller

Our first case study is the motor controller. We have a PWM, a Pulse Width Modulated signal, that controls the speed of the motor. The motor is monitored by an analog signal that's converted into a digital signal through an A/D converter and then fed back into our PWM controller. The signal that drives the motor is 100 Kilohertz, with a duty cycle varying from 10% to 90%, which controls the speed.

The design considerations that we have are responsiveness to our feedback circuit, the cost of our implementation, reliability, and maintainability.

Slide 17: HW Design Block Diagram

Here we see the details of the hardware implementation of our PWM controller. We have two major states: the PWM is either high or it's low. On the left-hand side we see the transition from the low to the high state: when our counter, which is constantly running, equals our clock cycle period, we transition from low to high.

Then, to transition back from high to low, we take the feedback from our ADC, multiply it by a scale factor, and check whether it equals our counter. This sets our duty cycle to values between 10% and 90% of the total clock period.

Slide 18: HW Design Implementation Details


The details of this hardware implementation: it takes around 200 flip-flops and around 300 look-up tables inside the FPGA, which accounts for approximately 0.5% of the particular Virtex-6 device that we are using, the 130T.

In terms of performance, we can run this design at up to around 250 Megahertz, so the feedback loop updates within 4 nanoseconds. And this particular implementation takes around 2 mW of power.

Slide 19: SW Design Flow Diagram

Here we see the software implementation of the same function. We initially set the PWM output high and wait for the counter to be equal to the percentage set by the A/D feedback circuit. Once it is equal to that percentage, we set our PWM output low and wait for the counter to equal the period of our 100 Kilohertz clock.

Once it does, we get our data from the ADC, set our percentage based on the A/D feedback, set our PWM output high, and then wait for the counter to reach that percentage again. The loop continues, constantly updating our PWM output based on the ADC feedback.
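As a rough illustration, the duty-cycle calculation at the heart of that loop might look like this in C. This is only a sketch under stated assumptions: the register names, the 8-bit ADC width, and the counter period are hypothetical, since the webinar doesn't give them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: 100 kHz PWM period measured with a free-running counter. */
#define PWM_PERIOD_COUNTS 1000u

/* Convert an 8-bit ADC reading to a duty-cycle threshold in counter ticks,
   clamped to the 10%-90% range described on slide 16. */
static uint32_t duty_counts(uint8_t adc, uint32_t period)
{
    uint32_t counts = ((uint32_t)adc * period) / 255u;
    uint32_t lo = period / 10u;        /* 10% duty cycle floor   */
    uint32_t hi = (period * 9u) / 10u; /* 90% duty cycle ceiling */
    if (counts < lo) counts = lo;
    if (counts > hi) counts = hi;
    return counts;
}

/* Main polling loop (sketch; on real hardware the PWM output, counter,
   and ADC would be memory-mapped registers):
     set PWM output high
     wait until counter == duty_counts(adc, PWM_PERIOD_COUNTS); set output low
     wait until counter == PWM_PERIOD_COUNTS; read ADC and repeat
*/
```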

Slide 20: SW Design Implementation Details

For the software implementation of this particular function, the routine took 112 instructions. Because we were running the MicroBlaze at 100 Megahertz, a 10-nanosecond instruction cycle, the response time is therefore 1.12 microseconds. The power consumed by the MicroBlaze for this implementation is 24 milliwatts. It is necessary to point out that this is the power the entire MicroBlaze processor consumes; if the MicroBlaze were doing other functionality as well, the power consumption would not increase.

Slide 21: Performance Analysis

Examining the performance of the two implementations: our hardware design can update based on the feedback within 4 nanoseconds, because the design can run at 250 Megahertz. The software takes a bit longer; it's on the order of a microsecond, in this particular case 1.12 microseconds.

Slide 22: Development Time Analysis

Examining the development time, or the schedule it takes to implement these two options: the hardware took me around 4.5 days in total, with 4 hours of coding, 8 hours of simulation (which, as we discussed, is the equivalent of unit test), and 3 days of implementation. Implementation was longer because of the build time the FPGA tools took.

The software development time was around 2 days: around 8 hours of coding and then 8 hours of unit test, which included my compile time. I want to point out here that the hardware coding time of 4 hours is only less than the software coding time because hardware development is a bit fresher in my mind, so it took roughly half the time.

If software were fresher in my mind, then I think the software and hardware coding times would actually be equivalent. There is nothing inherent about hardware coding that makes it less time consuming than software development.

Slide 23: Cost Analysis

So let's look now at the cost of the two implementations. NRE cost is directly proportional to development time, so it follows that our hardware, because it took longer to implement, has a higher NRE cost than the software. In our example, the hardware took 4.5 days to implement, whereas the software took around 2 days.

And let's also look at the unit cost of the software design. We can take software and port it to a less expensive microprocessor. Just to give you a few numbers here: FPGA prices range from $50 to $10,000, whereas a typical embedded processor ranges from $5 to $200.

Slide 24: Lessons Learned

So what have we learned from this particular case study? The software implementation was quicker to implement, and it therefore follows that it's cheaper to implement. Hardware is more expensive but better performing, and in this particular case it actually results in a lower-power design.

So I think the conclusion we draw from this case study is that, unless the hardware solution is absolutely required for performance reasons, we would take the software implementation in this particular design.

Slide 25: Second Case Study – Network Stack

Let's move on to our second case study, which is a network stack design. What we have here is a data acquisition that has to occur every so often; we'll say, in this case, every 13 microseconds. A UDP Ethernet packet is sent to the data acquisition logic, and it contains parameters that are used for that acquisition: things like sample rate, sample size, etc.

Slide 26: Network Stack – HW Implementation

Let's first look at the hardware implementation of the network stack. We are going to literally use a giant state machine to handle the network protocol. This is going to be a very cumbersome, large design, but it will also be very efficient. We can easily saturate a throughput of 10 Gigabits/second, and even higher rates of 40 or 100 Gigabits/second can be saturated with a hardware implementation such as this.

Slide 27: Network Stack – HW Block Diagram

Here we see a block diagram of the hardware implementation. On the right we have our Ethernet interface. The data comes in from the wire and goes through the Ethernet MAC, where it is then passed to an embedded memory. The state machine then pulls the data from the memory by addressing into it for the correct data. The parameters that we're going to use for the data acquisition are then passed down to a register holding the data acquisition parameters used to actually perform the acquisition itself.

Slide 28: HW Design Implementation Details

This particular hardware implementation took around 1,000 flip-flops, around 1,500 look-up tables, and an embedded block memory to hold the data as it came in off the Ethernet, which amounts to approximately 2% of the Virtex-6 130T device.

On performance, again we see a very high-performance design, running at up to 150 Megahertz and easily handling throughputs of 10 Gigabits/second. In our case we don't need 10 Gigabits/second; we use a 1 Gigabit Ethernet link.

Slide 29: Network Stack - SW Implementation

Turning our attention now to the software implementation of our network stack, we can state up front that software networking is a well-covered problem. There are lots of network stacks available, often through an OS. FreeRTOS has one, which is what we used in this case.

But it's not terribly efficient. A normal off-the-shelf network stack, at the very high end, is going to get around 30% of the available bandwidth. In the MicroBlaze case, running FreeRTOS, we see significantly less than that. I have seen cases running on an ARM Cortex-A9 core where we were able to get around 300 Megabits/second on a 1 Gigabit Ethernet interface using an off-the-shelf network stack.

Slide 30: SW Implementation Details

So in our implementation we are going to be using FreeRTOS running on the MicroBlaze processor. Using this, we get around 100 Megabits/second of throughput. As for footprint, the UDP stack adds around 2 kilobytes to the memory requirement. This throughput allows our parameters to be updated every 8 microseconds.

So our software implementation meets our requirement of updating every 13 microseconds, but just barely. If the microprocessor is doing something else, such as running other RTOS tasks or servicing interrupts, our requirement may be in jeopardy.
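Once the stack delivers a UDP payload, the software still has to unpack the acquisition parameters from it. A minimal sketch of that step is shown below; the field layout (a big-endian 32-bit sample rate followed by a 16-bit sample size) is entirely hypothetical, since the webinar does not specify the packet format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical acquisition parameters carried in the UDP payload. */
typedef struct {
    uint32_t sample_rate_hz;
    uint16_t sample_size_bits;
} acq_params_t;

/* Parse big-endian (network byte order) fields out of a received UDP
   payload. Returns 0 on success, -1 if the payload is too short. */
static int parse_acq_params(const uint8_t *buf, size_t len, acq_params_t *out)
{
    if (len < 6) {
        return -1;
    }
    out->sample_rate_hz = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                          ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
    out->sample_size_bits = (uint16_t)(((uint16_t)buf[4] << 8) | buf[5]);
    return 0;
}
```

In the hardware implementation, this same unpacking is what the state machine does when it addresses into the embedded memory and loads the parameter register.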

Slide 31: Network Stack Performance Analysis

Analyzing the performance of the two implementations: our hardware design can easily handle the performance requirement; we could even go much faster. It takes approximately 4 clock cycles, or 32 nanoseconds, for the hardware to pull out the relevant fields and feed them to our data acquisition.

Software obviously isn't as fast. However, most embedded applications don't need a very high-throughput network stack, and software in most cases will work fine for a networking application.

In our specific example, software alone can handle our throughput, but just barely. Assuming 100-byte packets, the rough throughput we need is 60 Megabits/second, which is under the 100 Megabits/second we were getting with FreeRTOS on the MicroBlaze. The critical number for these sorts of applications is really 10 to 20 microseconds. If your deadline is much faster than that, you are going to need a hardware implementation; if it's slower than that, software will probably be fine.

Slide 32: Development Time

Turning our attention now to development time: hardware design again takes longer than software development. This is particularly true for iterative designs, which networking implementations tend to be. You make a mistake, you go back, fix it, and rebuild, and that's where hardware really eats up a lot of time.

This particular design took a week to do. It was a very straightforward design, but it still took a week. Using the software implementation with the off-the-shelf RTOS, we were up and running within a day.

Slide 33: Lessons Learned

The lessons learned from our networking example: basically, unless blazing-fast performance is needed, we should use software for networking applications. There is lots of off-the-shelf software, iterating in software is much easier than iterating in hardware, and lots of folks have software networking expertise.

Slide 34: Performance Conclusions

So let's try to draw some conclusions from our case studies, looking first at performance. Performance is hardware's biggest advantage, and perhaps its only advantage, although in some cases you may get a bit lower power out of a hardware design. In general, if your timing requirements are under 100 nanoseconds, if they're in the tens of nanoseconds, you're going to want to consider a hardware implementation.

But software, particularly well-written software, can have good enough performance, and software performance problems are well understood. For example, if you need a network interface in the hundreds or tens of Megabits/second, software is probably your best bet. And then there are standard interfaces that are slow and low-throughput, such as I2C, SPI, CAN, and MDIO; those are prime candidates for a software implementation.

Slide 35: Development Time Conclusions

So looking at development time for hardware: it is heavily influenced by build time. For FPGAs, particularly as they get more complex, build times increase dramatically. It's not uncommon to have build times of several hours; I have even seen build times stretching into the 12-to-24-hour range. And build times can vary from run to run, because they depend on a random seed value, so in many cases a build that takes 2 hours one time may take 5 hours the next.

Software development time is a better-understood problem. It isn't dominated by build time, because builds are usually very quick for embedded systems, on the order of several minutes, and build times are usually very consistent. On this slide I have an example from a design I did recently, and this is actually one of the quicker ones I have done: even for a simple FPGA, the build takes about two hours.

Slide 36: NRE Cost Conclusions

Looking at NRE cost, hardware development cost can be much higher than software development cost, often two to three times as much, since, as we discussed, cost is proportional to development time.

In addition, hardware development tools can be very expensive. Synthesis, place-and-route, and simulation tools can run you $5,000 to $10,000.

Software development tools are less expensive. You can get software development tools for free, although the best embedded engineers often buy their tools. Green Hills has a saying: "you can't afford free software", and that's something I very much believe in. Nonetheless, software tools tend to be cheaper than hardware development tools.

Slide 37: Recurring Cost Conclusions

Looking at recurring cost: generally speaking, the same functionality written in software takes fewer gates than a similar implementation in hardware. This results in a smaller programmable device being used, which in turn results in a cheaper cost. It's a common cost-reduction technique to move functionality from hardware to software and then shrink your die size, which results in a cheaper BOM cost.

In addition, FPGA designs can be hardened into silicon as an ASIC, and software can be offloaded to a microprocessor. As we discussed earlier, hard microprocessors are much cheaper than FPGAs.

Slide 38: Maintainability

So when considering maintainability, or the ability to maintain your design through production and into the future, one thing to consider is who is going to be maintaining the design. If you put something in software, it is typically going to be easier to support in the future. Software talent is more available than hardware talent; a recent study found that embedded software engineers outnumber hardware engineers 4 to 1.

The other thing I'll point out here is that hardware designs are very tightly coupled to the tools, and that's less so with software. A tool change can force costly, unexpected design changes in the future; well-written software is much less vulnerable to tool churn.

Slide 39: Reliability

Reliability is a big push now. When we compare programmable logic with hardened silicon, generally speaking programmable logic isn't as reliable as cast silicon. Programmable logic devices are essentially very big arrays of RAM bits, which means they are susceptible to bit errors. There are some FPGA build options to protect against this, but at the expense of using extra resources inside the FPGA.

One thing I'll point out here is that software running on a soft core in an FPGA is even less reliable than plain logic running in an FPGA, because it is susceptible both to RAM bit errors and to software bugs.

Slide 40: Power

Power is a bit tricky to draw hard-and-fast conclusions on. As we saw in our first case study, the hardware implementation actually used less power than the software implementation. Generally speaking, power is directly proportional to the number of gates. So if you can implement a function in fewer gates than it takes a processor to do the same function, you are going to get better power performance in hardware.

The flip side of that coin is that if your processor can do many things at once, then you are going to get a power savings by doing them in software rather than having dedicated hardware functions everywhere to do them. So it is design specific as to which implementation will have the better power characteristics. Usually, software running on a microprocessor is going to have better power characteristics than many different dedicated hardware implementations in the same device.

The other thing I'll point out is that software can be put into an idle mode a little more easily than hardware can. There are techniques to do this in hardware, such as clock gating, but they are very tricky. Otherwise, hardware always runs and therefore consumes more power.

Slide 41: Why do HW Builds Take so Long?

We've talked a lot in this presentation about FPGA builds taking a very long time. So why do they take so long? Well, first and foremost, FPGA build processes have a lot to do. They have to turn the human-readable HDL into gates and netlists; they have to place and route thousands, if not tens or hundreds of thousands or even millions, of components inside an FPGA; then they have to run through an iterative timing analysis, tweaking the design to meet timing; and of course they run design rule checks.

The other thing I would say is that the FPGA vendors haven’t invested heavily in their tools to attack these issues. That is changing somewhat. Xilinx just released their Vivado product, which is a more efficient way of doing place and route than the older ISE tools.

And the other thing I will say is that tactics do exist to decrease build times. There are things such as floorplanning, effective timing constraints, etc., that you can apply to decrease your build times, but that’s a very large topic and perhaps we will discuss it in another webinar or training session in the future.

Slide 42: What about combining HW and SW?

So we’ve talked a lot about hardware and software as mutually exclusive options, but they can be combined, especially when you’re developing inside FPGAs. You can start with a software implementation and then, if it proves inadequate, add a hardware offload.

Examples of things you might implement as a hardware offload are DMA engines, special instructions, or multipliers. FPGAs have hardened multipliers inside them intended for DSP applications, but they can be used for anything you want.

Slide 43: FPGA Architecture

So here we see a typical architecture of an FPGA that makes use of an embedded microprocessor. The microprocessor is instantiated to handle only slow IO operations and a debug interface, and to control the hardware intellectual property, which is responsible for the very high performance IO. And there is nothing prohibiting you, inside of an FPGA, from adding multiple microprocessors. If your microprocessor is at its maximum capacity and you have logic utilization to spare, add another one.

Slide 44: Hard Solutions

The FPGA vendors are recognizing that this software/hardware interaction is becoming a preferred solution for solving embedded problems, and they are adapting to this reality by providing hardened processors within their FPGA devices. Xilinx has their Zynq products, Altera has their SoC products, and Microsemi has their ProASIC product. All of these FPGAs have hardened ARM processors in the IC.

Here we see the block diagram for Altera’s Cyclone SoC product, showing a hardened ARM Cortex-A9 processor surrounded by FPGA fabric. The same guidelines that we’ve discussed throughout this presentation hold with a hardened microprocessor solution. You will just get a little more performance out of the hardened ARM processors than you would with a soft processor embedded in the FPGA fabric.

Additionally, the processor being hardened frees up the FPGA fabric to do dedicated hardware tasks. Hardened cores are also generally a bit more reliable than soft cores. The other thing I’d point out is that these hardened cores inside FPGA devices aren’t entirely new; Xilinx had a PowerPC core as part of their Virtex-4 family some number of years ago.

Slide 45: Design for Change

So whether you are using hardware or software to implement a particular function, it’s a good idea to design it for change, and the way we go about that is to add an abstraction layer. In the case of software you would do that through an API; if you do it in hardware, you do it using a hardware abstraction layer. This way you can change the function down the line, even switching it between software and hardware late in the game, at the end of development, and even after production.

This also aids in your testing: you can do solid unit testing of your hardware or your software. It is generally considered to be a good design practice.
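As a minimal sketch of the abstraction-layer idea in C (all names here are hypothetical, not from the slides): callers go through an ops table rather than calling an implementation directly, so a software routine can later be swapped for a hardware offload without touching the callers.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical abstraction layer: callers use checksum_ops_t, never a
 * concrete implementation, so SW and HW versions are interchangeable. */
typedef struct {
    uint32_t (*checksum)(const uint8_t *buf, size_t len);
} checksum_ops_t;

/* Pure-software implementation: a simple additive checksum. */
static uint32_t sw_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/* A hardware-offloaded version would feed buf to a DMA engine and read
 * the result from a device register; same signature, so swapping it in
 * is a one-line change to this table. */
static const checksum_ops_t ops = { .checksum = sw_checksum };

uint32_t compute_checksum(const uint8_t *buf, size_t len)
{
    return ops.checksum(buf, len);
}
```

Because every caller sees only `compute_checksum()`, unit tests written against it remain valid regardless of which implementation sits behind the ops table.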

Slide 46: General Guidelines

Just to conclude here with some general guidelines: slow-speed interfaces should absolutely be done in software. General networking should be done in software unless performance is extremely critical.

Hardware can be used to offload software to increase efficiency where more performance is needed; particular functions that lend themselves to hardware offload include DMA engines, IP checksums, multipliers, etc. And hardware should only be used where absolutely necessary. Some example applications are very high-end signal processing and very high-throughput, high-performance applications.
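The IP checksum mentioned above is a classic offload candidate precisely because the software reference is small and well specified. A sketch of the standard RFC 1071 Internet checksum in C, the routine a hardware engine would replace:

```c
#include <stdint.h>
#include <stddef.h>

/* RFC 1071 Internet checksum: sum 16-bit big-endian words with
 * end-around carry, then return the one's complement of the sum. */
uint16_t ip_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    while (len > 1) {                       /* full 16-bit words */
        sum  += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len  -= 2;
    }
    if (len == 1)                           /* odd trailing byte */
        sum += (uint32_t)data[0] << 8;

    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A hardware offload computes exactly this function in parallel as packet bytes stream through, which is why it pays off only when throughput demands it.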

Slide 47: Final Conclusions

If you take anything away from today’s webinar, take this: if you can do it in software, you should do it in software. We’ve discussed the myriad reasons why software is generally preferable to a hardware implementation.

Another way of saying this is: only put into hardware what absolutely needs to be put into hardware. You should have a very concrete and justifiable reason for putting a particular function in hardware. If you do not have that concrete, hard reasoning, just do it in software.

Question & Answer

Q: Wouldn’t moving to a smaller FPGA result in smaller build times? These build times seem ridiculously long.

Actually, the opposite often happens. With a smaller FPGA, the tools have to work harder to fit a particular function into the available logic. And yes, the build times are ridiculously long, which is why software is often the preferable solution.

Q: What can be done in FPGAs to address safety-critical requirements?

There are three major ways FPGAs fail: logic errors, asynchronous timing issues, and single-bit errors. Each has its own mitigation techniques, which could fill an entire webinar, but a few are:

  • The tools provide a method for duplicating logic with a voting scheme on the outputs. If a single bit error occurs in one copy, it is outvoted and ignored. This has the obvious drawback of much higher logic utilization.
  • Proper state machine design and handling of asynchronous signaling can prevent unintended behavior. Encoding state machines as one-hot with default-condition handling is one such technique.
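The one-hot-with-default idea can be modeled in C (a real design would express this in HDL; the state names here are hypothetical): legal states have exactly one bit set, and any corrupted state word falls through to the default arm and recovers rather than locking up.

```c
#include <stdint.h>

/* One-hot state encodings: exactly one bit set per legal state. */
enum { ST_IDLE = 1u << 0, ST_RUN = 1u << 1, ST_DONE = 1u << 2 };

/* Next-state logic with a default arm: any state word that is not a
 * legal one-hot code (e.g. after a single-event upset flips a bit)
 * falls through to the default and recovers to ST_IDLE. */
uint32_t fsm_next(uint32_t state, int start, int finished)
{
    switch (state) {
    case ST_IDLE: return start    ? ST_RUN  : ST_IDLE;
    case ST_RUN:  return finished ? ST_DONE : ST_RUN;
    case ST_DONE: return ST_IDLE;
    default:      return ST_IDLE;   /* illegal encoding: safe recovery */
    }
}
```

Without the default arm, a single flipped bit could leave the machine in an undecoded state forever; with it, the fault costs at most one cycle of incorrect behavior.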

Q: Can software design processes be applied to HDL design? Specifically, things such as code reviews, revision control, and parallel development?

Yes! In general, if a process is considered a best practice for software, it’s a best practice for HDL.

A code review can even yield benefits in the case where the reviewers don’t know HDL. The act of presenting and reviewing the design can be very valuable.

Q: What is the difference between VHDL and Verilog? What should I learn?

VHDL is more strongly typed than Verilog. It is therefore often the language that people learn with, because it won’t allow you to make mistakes that Verilog will.

That said, Verilog and VHDL tend to be used in different markets. VHDL tends to be used in many government and military applications, whereas Verilog tends to be used in large silicon development. There is also a bit of a geographical difference: Silicon Valley and the West Coast use more Verilog, while VHDL is used more on the East Coast.

Q: What do you think of high level synthesis tools such as Vivado HLS or Altera OpenCL?

In general, I’m not a big fan of these for embedded designs. They end up causing “gate bloat,” a phenomenon where more logic than required is used because of the additional abstraction. Because FPGAs are so expensive, this ends up also bloating the cost of your design.

They do have their place, however. They are good for algorithm development and research-oriented projects, but I wouldn’t recommend using them for embedded designs.

Q: Is hardware typically more secure than software/firmware?

FPGAs have the inherent issue that their programming images are stored in off-chip flash. However, the vendors have done a good job of providing security options to encrypt those images. So in general, I’d say yes, hardware is more secure than software/firmware.

Q: Are there "open source" hardware design tools? e.g., something like "gcc for hardware"?

Unfortunately no. FPGA vendors have kept a tight lid on their internal architectures, so you are locked into using their tools to develop.

Q: What is a "killer app" (use case) for programmable logic in hardware?

FPGAs are very good at signal processing and very high performance computing. Some examples include DNA mapping, high-end radar, massively parallel supercomputing, and parallel sonograms.

They are also very good at ASIC prototyping, embedded systems with tons of IO, and data acquisition.

Q: As you mentioned, it's easy to find off-the-shelf software libraries/components, including many free & open source offerings for networking, crypto, operating systems, graphics, etc. Are there lots of easy to find and use open-source hardware building blocks?

In general, no. Some resources do exist, but you need to be pretty wary of using them, as they are often designs submitted by users and vary in quality. Many large corporations have their own libraries but are reluctant to share them for intellectual property reasons. Individual designers will often have their own toolkits they use as well.

Q: Have you (personally) ever been in the middle of a design, and swapped out hardware for software, or vice versa? If so, can you talk about that a bit?

The one example that comes to mind is from some years ago, when a design needed a DHCP client to obtain an IP address. This was before embedded soft cores were prevalent, so the DHCP engine was implemented in hardware. This was enormously expensive in terms of logic and took a long time to develop. Soon thereafter, the MicroBlaze was released, so I migrated the DHCP design to software running on the MicroBlaze. This reduced the logic utilization to the point where we were able to shrink the die size and save around $100/chip.

Remember from the presentation that this is the type of design that is ideal for a software implementation: it’s network based, not performance intensive, and iterative in nature.

Q: What recommendations do you have for a software person who wants to dive into programmable hardware? Can you recommend a good book? A good evaluation board/kit? A good online course? A website?

The best VHDL book (and HDL book in general) is "The Designer's Guide to VHDL" by Peter J. Ashenden. It is a fantastic reference but can be a bit hard to get into for a beginner.

For a true beginner, there is an open-source book called "Free Range VHDL".

My key advice for a beginner is to jump right in with a development kit and start playing. I personally like the Avnet MicroBoard because it has an Ethernet interface, which really opens up a lot of options.