Designing firmware-accessible debugging resources into embedded systems provides a valuable supplement to hardware test and analysis tools. Think ahead about what could go wrong during hardware testing or firmware integration, and add the hardware resources needed to troubleshoot those issues.
After a presentation at a conference, an attendee told me about an ASIC design team of young engineers that he led. When they had completed their design, he posed this question: "Six months from now, when you get the silicon back and it does not work, what are you going to need to diagnose and solve the problems?" The team went back to their desks to review the design in this new light and found much to reconsider.
Too many engineers assume that nothing can go wrong. This is the mindset that produces software functions that do not check the validity of their parameters and hardware modules that do not synchronize incoming signals to the clock.
The team leader with whom I met had taught his team a very important principle: Think ahead about what could go wrong during hardware testing or firmware integration, and add the hardware resources needed to troubleshoot those issues.
Such built-in debugging resources would supplement, rather than replace, hardware test and analysis tools. Hardware engineers are good at troubleshooting chips by mounting them to test fixtures and attaching probes. However, once the chip is mounted inside a prototype device, it is often no longer possible to attach those test fixtures. Unfortunately, some problems will not reveal themselves until the chip is inside the actual device running the actual firmware. For situations like this, hardware designs need to anticipate troubleshooting during firmware integration.
This article will discuss some useful firmware-accessible debugging resources that can be designed into Field-Programmable Gate Arrays (FPGAs). Such resources are like having a logic analyzer built into the hardware. These troubleshooting resources can also be used with other flavors of chips (ASICs, ASSPs, and SoCs), but for those devices they should be considered during the design phase to avoid the overhead of chip respins. FPGAs, however, offer the flexibility to add debugging resources to the chip at the time you encounter a problem.
Signal Level Registers
Tip: Provide read access to view the current state of key input and output signal pins.
FPGAs contain input, output, and bi-directional pins for a variety of purposes, including data, addressing, control, power, and ground. Many times, those signal pins are connected to logic within the block to control handshaking, timing, proper waveform manipulation, and other tasks as necessary. Sometimes, signal pins connect a block to an external device to allow the two to communicate.
When interface problems turn up in testing, it's often helpful to be able to determine if the expected signal levels are being applied to the interfacing pins. To address this need, registers can be added to each FPGA block to track the current levels of signal pins. These signal level registers would allow firmware to monitor what is happening at the interface. Providing registers that track both output pins driven by a block and input pins driven by the external device aids in pinpointing the source of errant signals. Registers that track bi-directional pins are particularly useful for analyzing protocol problems.
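As a rough illustration of how firmware might use such a register, the following C sketch checks pin levels from a single snapshot. The bit names and positions (`SIG_REQ_OUT`, `SIG_ACK_IN`, `SIG_DATA_BIDIR`) and the handshake they imply are hypothetical and not taken from any particular FPGA design. The register is passed in as a pointer so the same code works with a memory-mapped address on the target or an ordinary variable in a host-side test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments in a block's signal level register.
 * The names and positions are illustrative assumptions only. */
#define SIG_REQ_OUT    (1u << 0)  /* output pin driven by the FPGA block     */
#define SIG_ACK_IN     (1u << 1)  /* input pin driven by the external device */
#define SIG_DATA_BIDIR (1u << 2)  /* current level of a bi-directional pin   */

/* Report whether the pin(s) selected by mask currently read high. */
static inline bool signal_is_high(const volatile uint32_t *sig_level_reg,
                                  uint32_t mask)
{
    return (*sig_level_reg & mask) != 0u;
}

/* Example diagnosis: the block is asserting its request line, but the
 * external device is not acknowledging. Both pins are tested against one
 * read of the register so they reflect the same instant. */
static inline bool handshake_stalled(const volatile uint32_t *sig_level_reg)
{
    uint32_t levels = *sig_level_reg;  /* single snapshot of all pins */
    return (levels & SIG_REQ_OUT) != 0u && (levels & SIG_ACK_IN) == 0u;
}
```

Taking one snapshot and testing several bits against it avoids comparing levels that were sampled at different times, which matters when the goal is to decide which side of the interface is at fault.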
State Machine Registers
Tip: Provide a register that shows the current state of each state machine.
As a general rule, firmware does not need to know the current state of a state machine in an FPGA. However, knowing the state can prove useful under certain circumstances. For example, one particular device my project team was testing did not respond to aborts issued by the firmware. Lacking tools to determine the state of the device, we had to resort to reviewing the Verilog code and making educated guesses about where the device might be stuck. We eventually determined that a state machine was hung up waiting for an external interrupt without also checking for the abort bit.
In this situation, if we had had a register showing the current state of that state machine, we could have discovered much sooner that the device was stuck waiting for an external interrupt that was never going to occur. Faster diagnosis of the problem would have let us start working on a resolution sooner.
The range of states for a typical state machine can be represented with only a few bits, so a single 32-bit register can track multiple state machines in an FPGA block.
State Machine Register (MSB at left, LSB at right)
Bits | ... | 16 | 15 | 14 | 13 | 12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
R/W  | ... | -  | Z  | Z  | Z  | Z  | -  | -  | - | Y | Y | Y | Y | Y | - | X | X | X
X = Current state of X state machine
Y = Current state of Y state machine
Z = Current state of Z state machine
Figure 1. Register for three state machines
Figure 1 shows how three state machines can be represented in one register. In this example, the X state machine has three bits, the Y has five bits, and the Z has four. Note that the state bits for each state machine are nibble-aligned. This makes it much easier to read the hex value and get the current state for each state machine without any mental bit shifting.
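For firmware that logs or acts on these states, the fields can be pulled out with shifts and masks. The sketch below uses the bit positions from Figure 1. The macro, struct, and function names are hypothetical.

```c
#include <stdint.h>

/* Field positions taken from Figure 1. */
#define SM_X_SHIFT 0u
#define SM_X_MASK  0x07u  /* 3 bits: 2..0   */
#define SM_Y_SHIFT 4u
#define SM_Y_MASK  0x1Fu  /* 5 bits: 8..4   */
#define SM_Z_SHIFT 12u
#define SM_Z_MASK  0x0Fu  /* 4 bits: 15..12 */

/* Decoded states of the three state machines (hypothetical type). */
struct sm_states {
    uint8_t x;
    uint8_t y;
    uint8_t z;
};

/* Split a raw register value into its three state fields.
 * Bits not assigned to a state machine are ignored. */
static inline struct sm_states decode_sm_register(uint32_t reg)
{
    struct sm_states s;
    s.x = (uint8_t)((reg >> SM_X_SHIFT) & SM_X_MASK);
    s.y = (uint8_t)((reg >> SM_Y_SHIFT) & SM_Y_MASK);
    s.z = (uint8_t)((reg >> SM_Z_SHIFT) & SM_Z_MASK);
    return s;
}
```

For a register value of 0x0000A135, this yields Z = 0xA, Y = 0x13, and X = 0x5. These are the same values a person would read directly off the hex digits, which is the practical benefit of starting each field on a nibble boundary.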
As a further enhancement, an FPGA block could maintain a buffer of the last few machine states so that firmware could determine the path that led to the current state. Rather than filling up a buffer when the state machine is spinning in the same state, a counter could keep track of the number of clock cycles elapsed while in each state. With this additional detail, firmware could retrieve a trace that looks something like this:
State | Count
1     | 1
2     | 1
3     | 104
5     | 1
8     | 23
Table 1. Sample state machine trace
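A minimal sketch of how firmware might consume such a trace follows. It assumes, purely for illustration, that the hardware exposes each entry as one 32-bit word with the state in bits 31..24 and the cycle count in bits 23..0, and that entries are stored oldest first. The article does not specify a layout, so the packing and all names here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded trace entry: which state, and how many clocks spent there. */
struct sm_trace_entry {
    uint8_t  state;
    uint32_t cycles;
};

/* Assumed packing (hypothetical): state in bits 31..24, count in bits 23..0. */
static inline struct sm_trace_entry decode_trace_word(uint32_t word)
{
    struct sm_trace_entry e;
    e.state  = (uint8_t)(word >> 24);
    e.cycles = word & 0x00FFFFFFu;
    return e;
}

/* Copy n raw words out of the trace buffer and decode them. */
static inline void read_trace(const volatile uint32_t *buf, size_t n,
                              struct sm_trace_entry *out)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = decode_trace_word(buf[i]);
    }
}

/* Return the index of the entry with the largest cycle count, which is a
 * natural first place to look when a state machine appears to stall.
 * Returns 0 when n is 0, and the earliest entry on a tie. */
static inline size_t longest_dwell(const struct sm_trace_entry *t, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (t[i].cycles > t[best].cycles) {
            best = i;
        }
    }
    return best;
}
```

Applied to the trace in Table 1, `longest_dwell` would point at the third entry, state 3 with a count of 104.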
Event Counters
Tip: Provide firmware-readable and resettable event counters to track the occurrences of key events in the hardware.
For troubleshooting purposes, firmware sometimes needs to peek under the hood of an FPGA to monitor internal events. For example, a laser printer generates a horizontal sync pulse for each scan line, so the number of pulses reflects the length of the page being printed. When the printer is operating properly, firmware has no need to know how many pulses occurred. But if something is wrong, useful information may be gleaned from knowing how many pulses were generated. Perhaps only enough pulses for a Letter-size sheet are generated when you are trying to print on a Legal-size sheet. In this case, a pulse counter on that signal, which firmware can read and reset, would help solve the problem.
Providing registers to track the occurrence of particular internal events on a chip can answer crucial questions during troubleshooting such as "Did the event occur?" and "How many times did it occur?" Each counter should allow firmware to reset it before the designated operation starts and then read it afterward to determine whether, and how many times, the event of interest occurred.
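The reset-then-read pattern might look like the following in firmware. This is a sketch under stated assumptions: the counter is taken to be a 32-bit register that clears when firmware writes zero to it, and the function names are hypothetical. Real hardware might instead clear on read or use a separate reset bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: writing zero to the counter register clears it. */
static inline void event_counter_reset(volatile uint32_t *counter)
{
    *counter = 0u;
}

/* Read how many events the hardware has counted since the last reset. */
static inline uint32_t event_counter_read(const volatile uint32_t *counter)
{
    return *counter;
}

/* Compare the observed count with what the operation should have produced,
 * e.g. the number of sync pulses expected for the selected paper size. */
static inline bool event_count_matches(const volatile uint32_t *counter,
                                       uint32_t expected)
{
    return event_counter_read(counter) == expected;
}
```

In use, firmware would call `event_counter_reset` just before starting the operation, let it run, and then call `event_count_matches` with the expected pulse count. A mismatch answers both questions at once: the event did occur, but not the right number of times.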