Debugging: Being a detective

Debugging is like detective work. It’s an iterative process of eliminating suspects using clues until the cause is reached. The detective work requires one to think from multiple angles: search for clues, take the route the available clues suggest, sometimes hit a dead end, come back and restart, and take a newer route guided by new clues.

Prerequisites for productive debugging

Debugging is tough because it takes multidisciplinary understanding to crack open the root cause of a failure. When a test fails, the knowledge required to debug it productively can be overwhelming. In reality, many manage with far less knowledge than desired, resulting in longer debug cycles.

A debug engineer needs to understand:

  • Design specific
    • The requirement specification, whether a standard specification or a custom one
    • The design under test (DUT)’s implementation; treat every DUT as a transfer function transforming input into some other form of useful output
    • The test bench architecture
    • The test’s intent and expectations
    • The test bench logging
  • Generic
  • Environment specific
    • The simulator’s user interface
    • The basics of the operating system’s user interface
    • Makefiles, scripts and utilities

Preparation for debugging

Test command lines in regression use minimum logging verbosity, so the log files contain very little information; not enough for debug. Rerun the failing test with full debug verbosity to get the additional information required for debugging.
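
As a minimal sketch of how this works in a UVM-style test bench (the class and message text below are illustrative, not from any specific environment), messages guarded at higher verbosity levels stay silent in regression and become visible when the same simulation is rerun with +UVM_VERBOSITY=UVM_DEBUG on the simulator command line, with no recompilation needed:

    // Sketch, assuming a UVM-based test bench; "my_driver" and the message
    // text are illustrative. Regression runs at the default UVM_MEDIUM keep
    // the log lean; rerunning with +UVM_VERBOSITY=UVM_DEBUG exposes the
    // guarded messages.
    import uvm_pkg::*;
    `include "uvm_macros.svh"

    class my_driver extends uvm_component;
      `uvm_component_utils(my_driver)

      function new(string name, uvm_component parent);
        super.new(name, parent);
      endfunction

      task run_phase(uvm_phase phase);
        // Visible in normal regression runs.
        `uvm_info("DRV", "starting stimulus", UVM_MEDIUM)
        // Visible only in the debug rerun at UVM_DEBUG verbosity.
        `uvm_info("DRV", "low-level details of the transfer", UVM_DEBUG)
      endtask
    endclass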

Logging also needs to be architected into the test environment. Logging architecture is more than message verbosity classification. It is one of the most ignored areas of test bench architecture, and that neglect takes a heavy toll on debug productivity.

The benchmark of good logging is that it makes it possible to accomplish the first goal of debugging: isolating the issue to either the DUT or the test bench. The information required for this isolation all comes from the requirement specification. Waveform dumps should be needed only when the issue is in the DUT, because generating them costs longer simulation cycles.
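
As an illustration of what such a log message can look like (a sketch only; the function, field names, "SCB" message ID and "rsp_if" interface are assumptions), a check that reports the expected value from the specification, the actual value, and where and when it was observed lets this isolation begin from the log alone:

    // Sketch: a check whose failure message carries enough context
    // (expected vs. actual, location, simulation time) to start the
    // DUT-vs-test-bench isolation without waveforms. Names are illustrative.
    import uvm_pkg::*;
    `include "uvm_macros.svh"

    function automatic void check_data(bit [31:0] addr,
                                       bit [31:0] exp_data,
                                       bit [31:0] act_data);
      if (act_data !== exp_data)
        `uvm_error("SCB", $sformatf(
          "data mismatch at addr=0x%0h: expected=0x%0h (per requirement spec), actual=0x%0h on rsp_if at %0t",
          addr, exp_data, act_data, $time))
    endfunction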

The basic preparation for debug, then, is to have logs at debug verbosity and waveform dumps available.
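
A common way to keep both available without slowing every regression run is to gate the dump generation on a command-line switch (a sketch; the "+dump_waves" plusarg and "tb_top" module name are assumptions, and simulators also offer their own faster proprietary dump formats):

    // Sketch: waveform dumping gated by a plusarg, so regressions run fast
    // and a dump is produced only for the debug rerun.
    module tb_top;
      // ... DUT instance and test bench live here ...
      initial begin
        if ($test$plusargs("dump_waves")) begin
          $dumpfile("debug.vcd"); // VCD output file
          $dumpvars(0, tb_top);   // dump all signals under tb_top
        end
      end
    endmodule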

If the regression had achieved some level of stability in the past and it’s only recent changes that have led to failures, one can also follow a comparative debug approach: compare logs from a recent passing run of the test with the current failing run to spot the differences. There can be many differences. Carefully eliminate the ones that do not matter and focus on those that can potentially signal the cause of the failure. Look through the commit history in the version control system to spot the culprit changes. This can be a very effective technique for mature verification solutions with fewer changes, and new tools are being developed to automate this process. If there is no luck here, proceed to some real debugging.

Definition of debugging

A test fails because of an unmet expectation. The short definition of debugging is finding out why the expectation is not met. Expectations are not met because of mistakes.

The debug engineer’s task is to find those mistakes. To find them, one first needs to know: what are the different types of mistakes possible? It’s almost impossible to annotate them all. That’s the genius of the human mind; it keeps inventing newer ways of committing mistakes. Only an attempt can be made to classify them into broad categories for a domain. These categories act as the usual suspects.

In order to zero in on the culprits among the suspects, clues are required. Clues guide the debug engineer towards the right suspects. The first clue is in the log file, in the form of the error signature. It’s the first visible manifestation of the mistake. An error signature is the result of a failing check in the test bench, an assertion in the DUT, or the simulator tool itself. This discussion will focus on checks in the test bench because they contribute the bulk of the failures. These checks can be broadly classified into three different types, and these check categories are the clues.
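
As an illustration of how one of these sources leaves its signature, here is a sketch of an assertion checking a DUT interface (the valid/ready handshake rule is a hypothetical protocol requirement, and all names are made up); its $error message is the error signature that shows up in the log:

    // Sketch: a DUT-side assertion whose failure message becomes the error
    // signature in the log. The valid/ready rule is purely illustrative.
    module handshake_checker (input logic clk, rst_n, valid, ready);
      // Once valid is asserted it must stay high until ready is seen.
      a_valid_stable: assert property (
        @(posedge clk) disable iff (!rst_n)
          valid && !ready |=> valid)
        else $error("valid dropped before ready was asserted");
    endmodule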

Process of debugging

Now that we have suspects and clues, there are two ways to go about it: from the failure error message at the end of the log file back to the cause (back tracing), or from the test intent forward to the cause (forward tracing). Tracing is done using the log files, waveforms and debugger tools. It involves traversing information across different layers of abstraction to reach the actual mistake.

The first step of debugging a test failure is to isolate the source of the mistake between the test bench and the design under test (DUT). Mistakes in the test bench result in generation of incorrect stimulus and/or incorrect implementation of checks. Mistakes in the DUT implementation result in an incorrect response from the DUT to the stimulus from the test bench.

To perform this first step, the debug engineer should understand the test intent and, for the given test, the possible legal stimulus and expected response according to the requirement specification.

Using this understanding, the debug engineer has to figure out whether the stimulus is generated and applied correctly to the DUT interfaces. Stimulus generation correctness is confirmed using the stimulus-related information in the log files. Stimulus application correctness is confirmed by looking at the corresponding interface signals in the waveform. If either is incorrect, use the understanding of the test bench architecture and the stimulus flow through its components to isolate the mistake to one of the test bench components. Interactive debuggers can also be used to trace the stimulus flow through the test bench, isolate it to one of the components and pinpoint the issue within the component.
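
One logging convention that eases this tracing (a sketch; the transaction class and component message IDs are illustrative) is to tag every stimulus item with a unique id that each component logs, so the item’s path through the test bench can be followed by searching the log for that id:

    // Sketch: every stimulus item carries a unique id that each test bench
    // component logs, making its path through the components traceable in
    // the log file. All names are illustrative.
    import uvm_pkg::*;
    `include "uvm_macros.svh"

    class my_txn extends uvm_sequence_item;
      `uvm_object_utils(my_txn)
      static int unsigned next_id = 0;
      int unsigned id;
      rand bit [31:0] addr, data;

      function new(string name = "my_txn");
        super.new(name);
        id = next_id++;
      endfunction
    endclass

    // Each component then logs the same id at its own layer, for example:
    //   sequence: `uvm_info("SEQ", $sformatf("generated txn id=%0d", req.id), UVM_HIGH)
    //   driver  : `uvm_info("DRV", $sformatf("driving txn id=%0d", req.id), UVM_HIGH)
    //   monitor : `uvm_info("MON", $sformatf("observed txn id=%0d", txn.id), UVM_HIGH)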

If the stimulus is both generated and applied correctly to the DUT interfaces, the next step is to check the correctness of the response from the DUT. If the response is not correct, use the understanding of the DUT micro-architecture, its data path and control path, to isolate the mistake to one of the blocks of the DUT. Internal RTL debug is mostly based on waveforms.

If the response from the DUT is correct and it’s collected correctly by the test bench, the next step is to figure out why the check is misfiring in the test bench. This debug is similar to debugging incorrect stimulus problems in the test bench, as described above.

While checking for correctness, the debug engineer has to trace stimulus and response through multiple abstractions and relate them across abstractions. For example, a stimulus generated at the application level may need to be traced through the transaction level down to the physical signal level. The logging of the test bench components should be thought out and planned to ease this correlation and traversal. Recent developments in protocol-aware debuggers are easing this for standards-based protocol debug. The bottom line is that the test bench should be architected for debugging and be self-sufficient for debug.
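
One small piece of such planning (a sketch; the monitor class and message ID are assumptions) is for the monitor to log each observed item in its uniform string form together with the simulation time, so a signal-level observation in the waveform can be matched back to the transaction- and application-level log entries:

    // Sketch: the monitor logs the observed item's uniform string form
    // (convert2string) along with the simulation time, linking the signal
    // level back to the higher abstraction levels. Names are illustrative.
    import uvm_pkg::*;
    `include "uvm_macros.svh"

    class my_monitor extends uvm_component;
      `uvm_component_utils(my_monitor)

      function new(string name, uvm_component parent);
        super.new(name, parent);
      endfunction

      // Called by the monitor's collection code for every observed item.
      function void log_observed(uvm_sequence_item item);
        `uvm_info("MON", $sformatf("%s observed at %0t",
                                   item.convert2string(), $time), UVM_HIGH)
      endfunction
    endclass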

Can this understanding of mistake categories (suspects) and error message types (clues) lead the debug engineer directly to the root cause? No. Let’s not create any false hope. These are not going to pinpoint the mistakes. However, they will help the debug engineer during the detective work of debug. So let’s look at the categories of suspects and clues in a bit more detail.
