Definition of sensible error handling verification plan


It’s very easy to identify loads of error injection scenarios. The key is to focus on the right ones. As we have seen in “Error handling verification of serial communication designs”, failure to select the right error scenarios can waste valuable engineering resources and time on this activity.

Remember, this activity has a high probability of going off track. A good plan is like a map that keeps it on track.

Next question: how do you know which scenarios are the right scenarios?

Defining the right scenarios for error handling verification requires asking and answering the following questions. The first four qualify a scenario; the fifth decides the order of execution:

  1. Is this error scenario defined in the specification?
  2. Does the DUT support this error scenario?
  3. What is the impact of this error scenario on the end application?
  4. What is the probability of this error scenario happening in real-life usage?
  5. How to prioritize the execution?

The objective of these questions is to direct the thought process towards the right scenarios by minimizing noise and enabling the right prioritization of scenarios in the verification plan. Scenarios with more “Yes” answers are higher in relevance.

In the next few paragraphs, let’s look at why each of these questions is important, how to use the answers to build the right verification plan, and how to use that plan to prioritize execution.
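The idea of ranking by the number of “Yes” answers can be sketched in a few lines of Python. This is only an illustration: the `Scenario` class, its field names and the three example scenarios are hypothetical, and it assumes the impact and probability questions have been reduced to a simple yes/no judgement.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One candidate error injection scenario (hypothetical structure)."""
    name: str
    in_spec: bool          # Q1: defined in the specification?
    dut_supports: bool     # Q2: supported by the DUT?
    impacts_app: bool      # Q3: meaningful impact on the end application?
    likely_in_field: bool  # Q4: reasonably probable in real-life usage?

    def yes_count(self) -> int:
        # Booleans sum as 0/1, so this counts the "Yes" answers.
        return sum([self.in_spec, self.dut_supports,
                    self.impacts_app, self.likely_in_field])


# Example scenarios are invented for illustration only.
scenarios = [
    Scenario("reserved_field_nonzero", True, True, False, False),
    Scenario("peer_violates_handshake", False, False, False, False),
    Scenario("crc_error_on_data_packet", True, True, True, True),
]

# More "Yes" answers means higher relevance, so sort in descending order.
ranked = sorted(scenarios, key=lambda s: s.yes_count(), reverse=True)

for s in ranked:
    print(s.yes_count(), s.name)
```

In a real plan the answers would come from a review with designers and architects rather than from hard-coded values.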

1. Is the error scenario defined in the specification?

One can go wild and list exotic error injection scenarios. But if you are working on a standard specification, first answer this question for every error injection scenario identified: “Is the error scenario defined in the specification?”

Our focus should be on meeting the intent of the specification first. This means an identified error injection scenario is valid only if both the error detection and the action on detection are defined in the specification.

If the error is mentioned in the specification, check whether any reporting or recovery action is defined. In many cases the specification may indicate that the system state becomes undefined when the error scenario takes place. These cases are not as important as the ones for which a reporting and recovery mechanism is specifically defined. A design state that becomes unknown cannot be verified meaningfully.

Scenarios that lead to the system state becoming undefined are usually caused by a clear violation of the specification in the peer design implementation. One general rule of thumb to remember is that error injection in communication protocols is meant for detecting, reporting and recovering from errors caused by imperfections in the physical line. The physical line is unreliable, and we have no control over its imperfect nature. Protection against this unreliability is built through error detection, reporting and optional recovery mechanisms. Typically, specifications do not intend to protect against flaws caused by incorrect logic implementations. The logic is expected to be done right.

Hence, unless specifically requested, error scenarios for which the detection and the action on detection are not defined in the specification should not be added to the verification plan.

If such scenarios are added to the verification plan at the request of designers or architects, they should be tagged as implementation-specific error handling scenarios.
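The rule for the first question can be written down as a small decision function. This is a minimal sketch, assuming each scenario has already been reviewed against the specification; the `PlanTag` names and the `tag_scenario` function are hypothetical and only mirror the rule described above.

```python
from enum import Enum


class PlanTag(Enum):
    """How a scenario is treated in the verification plan (hypothetical tags)."""
    SPEC_DEFINED = "spec-defined"
    IMPLEMENTATION_SPECIFIC = "implementation-specific"
    EXCLUDED = "excluded"


def tag_scenario(detection_and_action_in_spec: bool,
                 requested_by_design_team: bool) -> PlanTag:
    # Detection and the action on detection are both specified:
    # the scenario belongs in the plan as a specification-driven case.
    if detection_and_action_in_spec:
        return PlanTag.SPEC_DEFINED
    # Not covered by the specification, but designers or architects
    # asked for it: keep it, clearly tagged as implementation specific.
    if requested_by_design_team:
        return PlanTag.IMPLEMENTATION_SPECIFIC
    # Otherwise it stays out of the plan.
    return PlanTag.EXCLUDED


print(tag_scenario(True, False).value)
print(tag_scenario(False, True).value)
print(tag_scenario(False, False).value)
```

Keeping the tag explicit makes it easy to report specification coverage separately from implementation-specific coverage later.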

2. Does the DUT support this error scenario?

The DUT may not implement all the features defined in the specification. In such cases, although the error scenario is defined in the specification along with its detection, recovery and/or reporting, it may not be relevant to the current DUT verification.

Make a list of the error scenarios for which the answer to both of the above questions is YES. These are the scenarios applicable to the current DUT.

Among these scenarios, prioritize the ones that involve complex detection and recovery mechanisms. A classic example of a complex error scenario is one that involves retransmission-based recovery logic. Retransmission logic can be fairly complicated, and it typically interacts with the normal data transmission logic as well. So it’s better to exercise early the complex logic and the logic that touches multiple parts of the design. This gives the designers ample time to fix the issues found. Such fixes typically have a higher potential to break other functionality that is not directly touched, so verifying these cases early also leaves time for the regression effort involved. It goes without saying that prioritization becomes more effective when it takes inputs from the designers into consideration.

3. What is the impact of this error scenario on the end application?

The specification may have defined it and the DUT may have implemented it, but the end application may never use the DUT in a mode where this specific error scenario is applicable. Such error scenarios can be deprioritized for execution. Make a list of the error scenarios that have answered YES to all three questions above.

Among these scenarios, prioritize based on the impact of the error scenario on the application. Some applications may have a high tolerance for errors, while for others reliability, availability and serviceability (RAS) may be critical. For example, compare a desktop computer with a server: for the server, RAS is of utmost importance. Based on the type of application, the impact of each error scenario on the targeted application should be evaluated to prioritize the error injection cases for execution.

4. What is the probability of this error scenario happening in real-life usage?

This is another angle that is useful in prioritizing the cases. The specification may have defined it, the DUT may have implemented it and it may be applicable to the end application usage, but some errors still have a very low probability of occurring in real life. Such cases can be scheduled for later execution.

5. How to prioritize the execution?

Execution should move from the cases in the innermost circle towards those in the outermost circle. Each inner circle is a subset of the circle that encloses it and is higher in relevance. The circles group the scenarios by the criteria described above.

[Figure: Error injection verification plan organization for execution]
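The inner-to-outer execution order can be sketched as follows. Since the figure is not reproduced here, the nesting order is an assumption: the outermost circle holds everything defined in the specification, and each further “Yes” (DUT support, application relevance, real-life probability) moves a scenario one circle inwards. The `circle_depth` function and the scenario names are hypothetical.

```python
def circle_depth(in_spec: bool, dut_supports: bool,
                 used_by_app: bool, likely_in_field: bool) -> int:
    """Count consecutive 'Yes' answers, stopping at the first 'No'.

    0 means outside the plan, 4 means the innermost circle
    (assumed nesting order, see the lead-in).
    """
    depth = 0
    for answer in (in_spec, dut_supports, used_by_app, likely_in_field):
        if not answer:
            break
        depth += 1
    return depth


# Invented example scenarios: (Q1, Q2, Q3, Q4) answers.
answers = {
    "rare_framing_error": (True, True, True, False),
    "crc_error_on_data_packet": (True, True, True, True),
    "unsupported_mode_error": (True, False, False, False),
    "unused_mode_timeout": (True, True, False, False),
}

# Execute the innermost circle first: sort by depth, deepest first.
execution_order = sorted(answers,
                         key=lambda name: circle_depth(*answers[name]),
                         reverse=True)

for name in execution_order:
    print(circle_depth(*answers[name]), name)
```

Stopping at the first “No” is what makes the circles nested: a scenario the DUT does not support never reaches an inner circle, however probable it might be in the field.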

 
