Task lists, detailed task lists, are the anchors of the development phase. A good task list provides the most accurate forecast of the development phase schedule.
The prerequisites for starting the task list are to have in place:
Verification plan consisting of test plan, checks plan and functional coverage plan
Test bench architecture
With these two documents in hand, it's time to build a detailed task list. A simple benchmark for task granularity is maximum man days: no task should exceed 3 man days, and the ideal granularity is tasks of 1 man day. Create as detailed a task list as possible. It is as important as the verification plan and architecture. A good detailed task list is half the job done.
The task list should contain:
Each test should be part of the task list. Each test has to be tracked through: written and compiling, all unique variants of the test passing, and seeded runs of the test variants passing
Coding of every check, the infrastructure needed around the check for its implementation, and verification that the check works should be part of the task list
Coding, compiling, integration and integration sanity checking of each functional coverage group should be items in the task list
For each test bench component to be developed or enhanced, tasks with class or algorithm details should be added
Tool flows, regression flows and any automation requirements should also be part of the task list
Each task should be given a priority and an estimate in man days
The sum total of the man days of all tasks is the minimum total duration required. A simple division of this total by the number of engineers available for implementation provides the initial insight into the timeline by which the verification project can be completed.
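As a rough illustration with made-up numbers: suppose the task list adds up to 120 man days and 4 engineers are available for implementation. 120 / 4 = 30 working days, or about 6 calendar weeks. Treat this as the floor rather than the forecast; ramp-up, vacations, task dependencies and debug support will all stretch it.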
Verification planning is the process of translating the verification requirement specifications into a verifiable description. The verifiable description has to be aligned to the technology being used for verification. This alignment helps improve the chances of success.
The verification plan is one of the key deliverables of functional verification. Starting verification without a good verification plan is a strict no. The test bench architecture depends on the verification plan, so cutting corners on the verification plan will directly affect the test bench architecture, and that will impact functional verification quality.
One of the currently popular approaches to functional verification is coverage driven constrained random verification. A verification plan aligned to this approach is primarily driven by three plans: test plan, coverage plan and checks plan.
Fundamentally, functional verification is about two things. The first is stimulus generation. The second is doing the response checks. In the constrained random verification approach, there is uncertainty about the stimulus generated. Hence there is a third dimension, functional coverage, to confirm that all interesting scenarios have indeed been generated.
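As a minimal sketch of that third dimension (not from the original article; the class and field names are hypothetical), a SystemVerilog covergroup can confirm that the constrained random stimulus really produced the interesting packet lengths:

class pkt_coverage;
  // Embedded covergroup, sampled explicitly with the generated length
  covergroup len_cg with function sample(bit [7:0] length);
    coverpoint length {
      bins min    = {0};        // smallest packet
      bins max    = {255};      // largest packet
      bins mid[4] = {[1:254]};  // everything in between, in 4 buckets
    }
  endgroup

  function new();
    len_cg = new();  // embedded covergroups are constructed in new()
  endfunction
endclass

Each time a packet is generated, calling len_cg.sample() with its length records it; the coverage report then shows whether the corner sizes ever occurred.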
Verification project management gave an overview of the six essential parameters and three phases of the verification project. This article will focus on applying the six essential parameters to the regression phase, which follows the planning and development phases.
This is the final phase and climax of verification. It's a highly intense phase. Regression phase duration cannot be controlled; it's extremely difficult to plan for it and make it schedule bound. A general rule of thumb is that it will take about 1x-3x of the time taken by the development phase, depending on the quality of the planning and development phases. Regression is a highly reactive phase. Only quick response with dynamic adaptation is the way to closure.
Verification project management gave an overview of the six essential parameters and three phases of the verification project. This article will focus on applying the six essential parameters to the development phase, which follows the planning phase.
Development is a steady phase of the verification project. The development schedule will remain mostly in control as long as a good job is done in planning. A good task management system is central to the development activity. The verification team expands during this phase, so a process should be put in place for productive absorption of engineers into the team. A development phase well executed will make the verification team look forward to the regression phase. The development and regression phases repeat for every milestone.
Key objectives of this phase:
Get the internal BFM development done
Get the test bench development and DUT integration done
Get most of the test writing completed
Get the sanity tests passing for all major features
Development phase: Clarity
Verification team size starts growing. New members come in. Make sure the team is aware of the relevant parts of the specifications, verification plans and test bench architecture
Clarity about the verification strategy for every team member is extremely important
Specification experts and technical leads should present the specifications, verification plans and test bench architecture to accelerate ramp-up
The test bench architect should have one-to-one sessions with the major block implementers to explain the abstract classes. Interfaces and functionality should be covered, and the expectations clearly called out.
The technical lead or verification manager should then discuss the milestones and the tasks lined up for them. The tasks should be assigned out to engineers. Schedule buy-in should be obtained from engineers at a ballpark level
Discussions about the abstract classes the architect has implemented to guide the implementation, and about the milestones, should continue as a series throughout this phase to maintain clarity between architect, engineers and manager (a minimal sketch of such an abstract class follows this list)
Various processes, simulators, proprietary tools, pointers to documentation, the task tracking system, bug modules, code repositories, recorded trainings and FAQs should be introduced, and pointers passed to the verification team coming on board.
More importantly, any internal methodology wrappers, reusable code libraries and code reuse from other projects should be clearly called out to the verification team
"Our engineers are smart, they will figure everything out." Sure, and it will waste their finite, precious energy on trivial things; then don't complain about the delayed, poor results where it matters, or about broken rules and silly mistakes. Make it easy wherever possible so that engineers can stretch where it matters. Please do
High level verification languages and verification methodologies are projected as the big challenges in implementation. They are not as big a challenge as establishing clarity on the above points
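As a hedged sketch of the kind of abstract class an architect might hand to block implementers (the class and method names are hypothetical, purely for illustration):

// Abstract contract published by the architect; implementers extend it
virtual class scoreboard_base;
  int unsigned mismatch_count;  // common bookkeeping lives in the base

  // Implementers must supply the compare policy for their block
  pure virtual function bit compare(int unsigned expected,
                                    int unsigned actual);

  // Shared checking flow built on top of the abstract compare()
  function void check(int unsigned expected, int unsigned actual);
    if (!compare(expected, actual))
      mismatch_count++;
  endfunction
endclass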
Development phase: Metrics
Full development
Total milestones
Completed milestones
Percentage completion in current milestone
Total development man days pending
Tasks
Total tasks
Completed tasks
Pending tasks
Current milestone
Total tasks of this milestone
Completed tasks of this milestone
Total tasks per engineer this milestone
Total completed tasks per engineer this milestone
New tasks added to milestone
Development phase: People
Test bench development has some areas where you want your expert engineers handling them: in stimulus generation, the constrained random control event generation; in checks implementation, the scoreboard; and the bus functional models, if they are being developed in house
An enthusiastic engineer should be encouraged to ensure a productive development environment for everyone. This role can also be played partly by the technical lead along with a junior engineer. It involves hooking everything up and getting simulations running, keeping check-in regressions healthy, setting up flows for compile, simulation and full regression ownership, and any productivity enhancements. This is a key role and a tough one. It should be well supported and appreciated by everyone in the team
When the team grows beyond four members, create smaller teams with an enlarged scope of tasks under a theme, for example ownership of a layer in a protocol-based BFM. Smaller teams, call them tribes, work very well. This also creates a support system by creating knowledge pipelines. It helps in covering for one another during vacations or attrition
Group the related tasks and help engineers build the expertise. They love to meaningfully develop themselves. Help them to develop mastery and become better
The verification lead should be available any time the verification team needs them for discussions, brainstorming sessions, helping engineers when they get stuck on a problem, clearing day-to-day issues or answering working knowledge related questions. This will ensure the development engine keeps running at full steam
Load balance based on the tasks per engineer to avoid burnout
Development phase: Tracking
Development is also creative work. Some blocks need time to be built well and to make the code reusable. Use a medium level of follow-up for best results
A good task tracking system is key to development management. The task management system should be able to generate most of the metrics listed above automatically
It's not just sufficient to set up the tasks; a periodic scrub of the tasks is equally essential. Scrub to discover whether tasks are blocked on information or dependencies. Keep clearing obstacles to ensure continued progress
Task lists are mainly owned by the verification manager or verification lead. Encourage the team to add newly discovered tasks to the task management list. Task lists are also assets: they provide a good understanding of the work to be completed and the load on engineers. Tasks worked on without being in the task management system can misrepresent the total project work and an individual engineer's workload
Weekly meetings should be utilized as an opportunity to assess how closure on the current milestone is coming along, to bring up surprises and to identify the areas that need more attention
Not every development task or engineer requires equal attention. Identify the critical and complex tasks that need it; the verification lead should pay more attention to those tasks
Verification leads should invest more time with the engineers who need it rather than disturbing star engineers. Star engineers will take work to closure faster if left undisturbed, and they will appreciate the autonomy
Blocks that are slipping schedules or gathering more new tasks than the average across other blocks are also candidates for special attention. This is an area for the architects to review: what was missed in the initial assessment? It is possible a major chunk was missed in the architecture and engineers are discovering it in small pieces at a time. Architects should analyze and get to the root of it
Development phase: Review
Code reviews for every code check-in are great but may be overkill. Of course, a nuclear weapon can also kill a mosquito, if you have that luxury. Code reviews at certain milestone points are very productive. Invest in the code once it has proven to be useful by meeting the functionality of the milestones. Premature code reviews are a waste of precious energy
Code reviews are best done as code walkthroughs between the developer and reviewers. This gives the developer the opportunity to provide the context behind the functionality of the code. It ensures the time is optimally utilized for everyone and the best can be extracted from the review
Code reviews are also opportunities for providing short, customized feedback to developers on areas for improvement and learning
Code review actions should not be left floating in email or the code review system. Actions that require more than a couple of hours to fix should go into the task tracking system
Alignment with the architectural intent, functional correctness, adherence to standard coding practices, ease of understanding and maintainability should be assessed as part of the review
One of the ignored parts of verification planning is verification of the checks themselves. This should be questioned as part of the review
Development phase: Closure
All the developed code should pass the sanity tests. It's understood that detailed verification will be carried out in the regression phase, but all code developed should compile and be sanitized with the basic tests.
All critical code review actions should be addressed. Any that are not completed should be added to the task tracking system
Compile warnings and run time warnings should be periodically reviewed and fixed unless approved as exceptions. Left alone, these lead to failures that have to be painfully debugged as programming errors
Any code limitations or parts of the features planned but not implemented should be documented. Tasks to fix these later should be added to the task tracking system
Sometimes, even as the milestone approaches, not all the planned tasks may be completed. Evaluate whether the milestone needs to be extended or the remaining tasks moved to the next milestone. Sometimes it makes more sense to move the remaining tasks to the next milestone and adjust the schedule accordingly rather than squeezing them in and doing a poor quality job in a rush. This provides closure and a sense of progress
Verification project management gave an overview of the six essential parameters and three phases of the verification project. This article will focus on applying the six essential parameters to the planning phase.
Planning is a very important phase of the verification project. The verification plan and test bench architecture are created during this phase. It sets the foundation for the development and regression phases. Entering execution with a weak plan is like taking a slippery slope: once taken, there isn't much chance of recovery, and it takes you down for a deep fall.
Get the detailed task list for the test bench development
Milestone list and division of the tasks per milestone
Next let’s look at what each of the key management principles identified in the verification project management translate to in the planning phase.
Planning phase: Clarity
The following four elements should be clear during the planning phase; together they establish overall work scope clarity.
Scope understanding: The scope of verification is based on the specifications, so the key prerequisite for the planning phase is the specifications. Specifications mean both the standard requirement specifications and the implementation specifications. Most of today's designs deal with multiple specifications. It's very important to identify the list of all applicable specifications.
Specifications understanding: Clear understanding of the specifications, both the requirement specifications and the micro-architecture specifications
Verification strategy understanding: A clear verification strategy needs to be established based on the specification scope and understanding. This forms the basis for the verification plan and test bench architecture creation
Team understanding: Clear list of the contact persons in architecture, design and verification team at one place
Planning phase: Metrics
Metrics during the planning phase are meant to give an idea of the completeness of the verification plan, test bench architecture and development task list creation.
Total number of features
Verification plan completeness percentage
Number of tests identified
Number of checks identified
Number of functional cover points identified
Test bench architecture completeness percentage
Overall architecture completeness
Major blocks of test bench
Total number of major blocks identified
Total number of major blocks for which interface and functionality is defined
Abstract classes
Total number of classes identified
Total number of abstract classes coded and compiling
Development task lists
Total number of blocks for which the task list is identified
Total development tasks
Planning phase: People
The best of your experienced engineers should be working on the verification plan and test bench architecture development. There is a misconception that this can be handed to relatively inexperienced engineers, on the perception that all we are producing, in the end, is a document. This is a totally wrong perception to follow
The technical leader of the project should be involved in this activity very actively. He should be one of the key contributors to the verification plan and test bench architecture
Engineers with a natural attraction to engineering depth are well suited for this activity. Both the verification plan and the test bench architecture require someone to put their heart into the activity, go deep into the details and figure out as many things as possible
Engineers working on the plan should be allowed to think theoretically, enumerating all the verification scenarios without worrying about resource and schedule constraints. A "think theoretical, execute practical" approach should be followed
Planning phase: Tracking
The planning phase involves one of the highest levels of creative work. Out-of-the-box thinking may be required to set the verification strategy. Good test bench architecture requires fairly deep and creative ideas. Make sure it's not rushed through. It's easy to rush and create half-baked plans, as it's difficult to judge completeness in the early stages of the project
Put your best team in action and trust them to use their best judgment. Agreed, we do not want a paralysis-by-analysis situation, but at the same time high pressure to close can result in premature closure. Premature closing of the planning phase will result in costly rework later. It's pay now, or pay later with huge interest
Put the team working on planning and architecture on an island. Talk to them only when needed. Allow the team members to get into a flow state for their best work to emerge. Instead of weekly, consider biweekly or monthly tracking of the metrics identified.
Planning phase: Review
Verification strategy should be carefully reviewed. It sets the basis for the verification plan and test bench architecture
Verification plan structure review has to be conducted periodically. This ensures no big ticket items are missed
Review of the verification strategy, verification plan and test bench architecture should be conducted with the architecture and design teams
The verification plan should be executable, not limited to a very high level plan. A junior verification engineer should be able to pick it up and write a test from the test plan, implement a check from the checks plan and write cover points from the functional coverage plan with very little interaction with the verification plan owner
Verification managers should pick a feature and question it for consistency of thought process across all three: test plan, checks plan and coverage plan
The architecture should be questioned from the point of view of normal operation flow, error injection, control event generation and processing, implementation of the various checks and ease of functional coverage collection
Alignment of the verification plan and test bench architecture to the verification strategy should be reviewed
Review actions should be tracked and closed. It's possible that certain areas will remain open for a while till further understanding is developed. This is fine as long as they are clearly documented and tracked
Planning phase: Closure
All the specifications applicable are identified and made accessible in a global shared location for verification team
Code repositories for development, email aliases for communication and discussions, the task tracking system for the project and the bug database are set up
All three plans (test plan, checks plan and coverage plan) constituting the verification plan should be checked in, in executable form, under the code repository
Detailed test bench architecture in place and checked in under the code repository
Reviews of the verification strategy, verification plans and test bench architecture completed. Major review items addressed, and open items documented and put in the task tracking system for closure
Task lists for all the development activities captured in the task management system. Tasks grouped under the functional milestones
Schedule forecasts created based on the task lists. Based on the forecasted schedule, deadlines and current team availability, the hiring or external contractors required for the next phases should be identified
Third party bus functional models (BFMs), verification solutions or tools for exploration and evaluation should be shortlisted
Simulator licenses and compute resource requirements should be checked for sufficiency against the forecasted verification plan requirements. If additional capacity is needed, the procurement process should be started
It's possible the verification plan and test bench architecture are not 100% complete. That's hard to achieve unless it's a repetition. Focus during the planning phase should be on completeness of the major sections in the verification plan and the major blocks in the test bench. The second level of sub-sections in the verification plan and the details of blocks in the test bench architecture can be complete anywhere between 60-80%. This provides good visibility, and the rest can be figured out as part of the development phase. This achieves the balance between waiting too long vs. starting too early
Verification project management is a set of processes and principles to achieve high quality verification results. These processes and principles are anchored around six essential generic parameters: Clarity, People, Metrics, Tracking, Review and Closure. A verification project consists primarily of three phases: planning, development and regression. All six essential generic parameters are applicable to all three phases.
This article is organized as:
Brief introduction to six essential generic parameters
Brief overview of three phases
Brief introduction on people management
Brief introduction on role of Verification manager
Applying these management concepts in all three phases
Clarity: Identifying the aspects that should be clear. It can be clarity of goals, tasks, timelines and process. Clarity is not the absence of ambiguity but a constant battle against ambiguity. This is the single most important factor that keeps team motivation levels up and translates to results.
People: Right person for right job is half the job done. Bad engineers are rare. Most are just mismatches in assignments.
Metrics: Management guru Peter Drucker quoted “If you can’t measure it, you can’t manage it”. Metrics provide indication of how the project execution is progressing. Right metrics provide insights to improve the execution.
Tracking: Tracking is a process of collecting metrics and using them to tune the processes to achieve the desired results. Tracking provides the push required for closure.
Review: A popular Russian proverb says "Trust, but verify". Review is a process to guard the quality of each of the tasks to achieve the overall quality goals.
Closure: The ABC of verification project management is "Always be closing". Functional verification is a never-ending process. Unless you close it, it will go on.
ABC of verification project management
Three phases of the verification project:
The three major phases of the verification project are planning, development and regressions.
Planning phase primarily focuses on creating the verification plan and test bench architecture. The verification plan consists of the test plan, checks plan and coverage plan. Based on the verification plan and test bench architecture, detailed task lists are created.
Development phase focuses on building the bus functional models (BFMs), test benches and tests. HVLs and verification methodologies play a major role in building these.
Regression phase is about executing all tests and their variants to catch issues and meet the coverage goals.
These phases do not have very precise boundaries. Only the focus moves from one phase to another during project cycle. The phase under focus gets more attention and resources.
The planning phase gets the highest attention during the initial part of the project. Planning activity completes to about 70-80% during the planning phase. The remaining part completes within the first 50% of the development cycle.
After the initial planning phase completes, the verification project execution is divided into multiple milestones to make feature verification time bound. Every milestone starts with a brief planning activity, mainly consisting of scheduling of tasks; subsequently it's dominated by the development and regression phases. Milestones early in the verification project are development dominated; towards the end, the regression phase dominates.
People management
People management aspects are not covered in detail as part of this article; this is the only paragraph on them. Quoting some points from Dan Pink's TED talk on the puzzle of motivation: extrinsic motivation works great for left-brain tasks made up of a clear set of rules with a single solution to the problem. Functional verification does not fall in that category. That does not mean there aren't any tasks of this nature, but it's dominated by tasks that require some level of creativity and the ability to deal with ambiguity.
Extrinsic motivators of the carrot-and-stick type alone cannot deliver results. Management is for compliance; self-direction is for engagement. The new age mantra for driving teams is to create an environment of autonomy, mastery and purpose.
Ending this topic by quoting Sahil Gupta: aspire to be a leader whose legacy is not just the products delivered but also the teams created. These form the principles for people management.
Verification manager
The verification manager plays a key role in verification project management. Look at the composition of successful verification teams to understand the roles and responsibilities of the verification manager further.
One of the key responsibilities of the verification manager is to define, refine and improvise processes to achieve sustained high quality verification results. We have developed a framework to help you do a quick audit of your constrained random stimulus. This will help you make your stimulus work hard for you and help you meet your verification quality goals.
Take care of process and results will take care of themselves. – MS Dhoni (Captain of Indian cricket team)
The project manager is an owner: an owner who needs to ensure the quality of verification meets the requirements and results are delivered on time. The project should not be organised around the project manager; the project manager should organise the project around sound principles and processes.
There are multiple phases in a verification project. Each phase has its own challenges and its own set of best practices suited to it. Each of the generic essential parameters identified should be viewed in the light of the requirements of each verification project phase. Be it tracking, metrics used, reviews conducted or follow-up styles, all should be adapted to meet the objectives of the respective phase.
In spite of all the efforts, sometimes it's challenging to meet the verification quality goals. Our services can help you quickly spot the coverage gaps in your constrained random verification.
Let’s now look at how to apply six essential generic parameters in each of the planning, development and regression phases.
Debugging is like being a detective. It is an iterative process of using the following clues to close in on one of the suspects. Error messages in the log files act as the clues.
An error message is the result of a check failure. There are three broad categories of checks in the test bench and BFM; accordingly, there are three types of error messages relating to them. It's the error message which acts as the first clue to start the debug process.
The term event below is used to mean any form of information exchange.
The three error message types are error messages from:
Immediate check failure resulting from event in test bench or BFM or DUT
Timeout check failure waiting for event in test bench or BFM or DUT
Global watchdog timeout check failure waiting for the end of test
Ideally, failures in the third category are a sign of inadequate checks in the test bench and BFM. The price for this weakness is increased debug complexity.
1. Immediate check failure
This is a check failure immediately following an event, like checks done after receiving a packet. This category of failure provides a clear clue about the mistake. The check failure message calls out the expectation from the event and what the actual event was. This can ease the debug significantly. Sometimes this type of failure can point directly to a bug in the design. For example, consider a BFM flagging a CRC failure in a packet received from the DUT. Assuming the BFM has clean CRC logic, it's directly pointing at an incorrect CRC implementation inside the DUT.
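A minimal sketch of such an immediate check, continuing the CRC example (signal widths and names are assumptions, not from the article):

function void check_crc(bit [31:0] expected_crc, bit [31:0] actual_crc);
  // Fires at the moment of the event (packet reception) and names
  // both the expectation and the actual value, easing the debug
  if (actual_crc != expected_crc)
    $error("CRC check failed: expected 'h%0h, actual 'h%0h",
           expected_crc, actual_crc);
endfunction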
In spite of this clarity of direction, it's still advisable to check the configuration and stimulus for correctness. For example, from a configuration check one may find out that CRC was disabled in the DUT but the BFM was not configured for it.
Before filing the DUT bug:
Check if the configuration is legal
Check if the stimulus is as per the specification
Check the response is correctly detected by the BFM
2. Timeout check failure
This failure is not the result of an immediate event; it is the result of some event in the past. For example, consider stimulus generation which has to wait for a response before generating the next stimulus. When the response never turns up, a timeout check failure error results. This check is failing for a stimulus event generated in the past.
Ideally every wait should be covered with a timeout, whether it's required by the specification or not, because any wait can potentially become an infinite wait. As an additional safety measure, also put in a debug verbosity print indicating what event is being awaited.
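A minimal sketch of such a guarded wait (the event, timeout and description arguments are hypothetical), using the common SystemVerilog fork/join_any pattern:

task automatic wait_for_event(event evt, time timeout, string what);
  // Debug verbosity print so a hung wait is identifiable in the log
  $display("[DEBUG] %0t: waiting for %s", $time, what);
  fork begin
    fork
      @(evt);  // the event being awaited
      begin
        #(timeout);  // the wait can never become infinite
        $error("Timeout after %0t waiting for %s", timeout, what);
      end
    join_any
    disable fork;  // kill whichever branch lost the race
  end join
endtask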
On timeout check failures waiting for an event, things to do before filing the bug are:
Check if the timeout value configured was correct; most of the time a too-short value leads to false timeouts, while a too-large timeout value wastes simulation cycles
Sometimes it's possible the event being awaited happened before the wait thread started; look for it in the logs and waveforms
Check if the stimulus provided was timed correctly
3. Global test bench watchdog timeout
Ideally only timeouts due to end of test (EOT) conditions should be covered by this timeout. EOT is made up of end of stimulus generation and end of stimulus execution.
The timeout for "end of stimulus generation" should be implemented in the stimulus generators or the test. When that's not done, it will be caught by the watchdog timeout instead; the penalty is a longer simulation time to failure and harder debug.
The timeout due to "end of stimulus execution" is the right candidate for this watchdog. For an end of stimulus execution which involves multiple interfaces, it may not be possible to predict and set up a specific timeout. This type of waiting for multiple interacting interfaces to settle down can be caught by this timeout. For example, waiting for the scoreboard to signal the end of test.
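A minimal sketch of such a global watchdog (the end_of_test event and the budget value are assumptions for illustration):

`timescale 1ns/1ps
module tb_watchdog_sketch;
  event end_of_test;  // hypothetical: signaled by the scoreboard at EOT

  initial begin
    fork
      @(end_of_test);  // the clean end of test wins the race
      begin
        #2ms;  // global budget; will have to grow as development progresses
        $fatal(1, "Watchdog timeout: end of test never signaled");
      end
    join_any
    disable fork;
    $finish;
  end
endmodule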
On watchdog timeout failures, things to do before filing the bug:
Check if the timeout value configured is sufficient. As development progresses, this timeout value will have to grow
Check if the timeout is due to end of stimulus generation. This can be done by checking the expected stimulus from the test and cross-checking in the log files whether the specified stimulus was generated. If not, look for wait conditions in the stimulus generation sequences which are not guarded by a timeout. Add the timeout and follow the steps suggested in the timeout check failure debug above
If the stimulus generation has completed and the timeout is due to stimulus execution, buckle up for a hard ride. Take it one step at a time. Understand the test intent and the DUT responses expected for the stimulus provided. Check them one by one to see where the link in the chain was lost. In a well-architected test bench these failures will be multi-interface interaction issues. Adding specific timeout checks may be complicated and may not have sufficient ROI, but if a check can be added to prevent the pain of this debug, go ahead and add it in the appropriate test bench component.
Debugging is like being a detective. It is an iterative process of using the clues to close in on one of the suspects. Error messages in the log files act as the clues.
The goal of the debugger is to find the mistakes that are manifesting as a failure. Although not all mistakes can be annotated, the following are the three major categories of mistakes. These are the usual suspects. Closing in on the culprit among the suspects using the clues is the goal of the debug process.
Following are the three broad categories of mistakes, or the usual suspects:
Debugging: The usual suspects
1. Misunderstanding of requirements
Misunderstanding of the requirements can lead to a mistake in the DUT, test bench or bus functional model implementation. A misunderstanding in either the design or the verification area will result in failure.
The misunderstanding could be as simple as the byte packing order in a packet or as complex as the behavior during a corner-case error recovery scenario. If the same misunderstanding exists in both design and verification, it will not result in an error. That is the reason there is emphasis on keeping the design and verification teams separate.
Misunderstanding of requirements, apart from resulting in incorrect implementation, can also result in missing or partial implementations.
Many a time, at the start of development, not all the possible cases are thought out and only some of them are implemented. As development progresses, the rest are rediscovered through painful debugs.
Misunderstanding of a requirement can manifest in many forms. Some of them are incorrect constraints, a flag set and forgotten to be reset, inconsistent updates to a data structure, a missing condition due to incomplete understanding of the possible cases, one extra or one less iteration, etc.
Interactive debuggers bundled with the simulators are also very useful in debugging this type of error.
Sometimes, misunderstandings of the requirements have to be resolved through discussions between the design and verification teams. Resolutions should be viewed from the point of view of how they affect the final application. A resolution, in case of ambiguity, should help the end application meet its objective.
2. Programming errors
The bulk of failures is contributed by this type of mistake. It's close to impossible to annotate all the programming mistakes. It could be as simple as incorrect data type usage leading to data loss, which may be simple to spot. Others can be premature termination of threads, which can almost seem like a well-planned conspiracy against the developer.
Programming errors are due to misuse of the language constructs, verification methodologies and reusable components. A currently popular HVL like SystemVerilog has an LRM spanning over five hundred pages; it takes a long time to master. SystemVerilog, an HVL built on an HDL with OOP support, poses its own challenges in understanding when HDL-domain constructs interact with HVL-domain constructs. For example, SystemVerilog's thread concept is from the HDL world and does not behave in an OOP-friendly way.
HVL programming also involves dealing with concurrency and the notion of time. So even simple programming, such as setting a flag variable, is no longer just about setting a flag: it should be set at the right time by the right thread. Add to it another dimension from object-oriented programming, with dynamic objects getting created and destroyed: setting it at the right time, using the right thread, in the right object. Too many rights make it difficult to get it right.
A currently popular verification methodology such as UVM has more than three hundred reusable classes to digest. It's certainly not easy to master these. Concepts like phasing become complicated due to legacy phasing and new phasing concepts operating together: some concurrent, some bottom-up, some top-down, which can only make one fall down.
Most of the code written is by copy and paste, because a lot of it is just boilerplate code. This also increases the chances of mistakes, which are hard to notice.
Incorrect usage of reusable verification components is another source. Insufficient documentation and examples for the reusable code make reuse highly buggy in nature.
Even when there is a programming error, it does not jump out as a programming error. It's hidden behind layers of translation.
The thought process for a verification engineer starts with understanding the application world. The application world is abstracted into the test bench. The test bench implementation is mapped to verification methodology base classes and HVL code. By now a series of transformations has taken place.
The debugger will have to peel these layers one by one to discover the issue. It requires one to map the problem symptom showing up at one level of abstraction to a programming mistake buried deep somewhere else.
Typically, programming error debugging can be done effectively with the interactive debuggers provided by the simulator. They offer the classic software debug environment: the ability to put breakpoints, single stepping, seeing variable values and object contents, active thread visualization, etc.
Simulators also provide switches that dump additional debug information to give insight into the problem. For example, debug of incorrect constraint usage failures is assisted by dumping the values of the various class properties into the log at the point of constraint failure.
3. Operational environment problems
These are mistakes in using the operational environment setup. They could be mistakes committed in the Makefiles used for building, compiling and simulating code, in scripts for productivity, in setting up libraries of reusable internal or third party vendor components, in simulators and other tools, etc.
GNU make issues can manifest as new code changes not being reflected in simulation, leading to the same error showing up again even after the fix. Check the code picked up by the compile to see if the new changes are reflected. Linking issues can show up at times due to unknown causes; that's why a good clean target is as important as the build targets. It will keep many unproductive issues away. Makefile and rule organization can reach crazy levels of complication. One simple point to keep in mind: inside all the make black magic, two important commands can guide the debug, the command for compile and the command for simulation. The make utility also provides special switches to gain additional insight. Make is a different world by itself.
Perl, Python or TCL scripts used for productivity can report incorrect data or do an incorrect generation. Always know a way to create the results or generate them manually. Manual results can be matched against the data reported or generated by the scripts to gain insight for debug.
Rarely, but at times, the simulator's own mistakes may also get discovered. Simulator behavior may not be in compliance with the LRM. These can be hard to debug and lengthy to resolve.
Debugging is like detective work. It's an iterative process of eliminating the suspects using the clues to reach the cause. The detective work requires one to think from multiple angles. Search for clues. Take a route guided by the clues available. Sometimes it hits a dead end. Come back, restart, and take a newer route guided by new clues.
Prerequisites for productive debugging
Debugging is tough because it takes multidisciplinary understanding to crack open the root cause of a failure. When a test fails, the knowledge required to debug it productively can be overwhelming. In reality many manage with far less knowledge than desired, resulting in longer debug cycles.
The debugger needs to understand:
Design specific
The requirement specification, either in the form of a standard specification or custom specifications
The design under test (DUT)'s implementation. Treat every DUT as a transfer function transforming inputs into some other form of useful output
Test command lines in regression run with minimum logging verbosity. This implies the log files contain very little information, which is not sufficient for debug. Run the test with full debug verbosity. This will provide the additional information required for debugging.
Logging also needs to be architected in the test environment. Logging architecture is not just message verbosity classification. Most often this is a highly ignored area of test bench architecture, and it takes a heavy toll on debug productivity.
The benchmark of good logging is that it should be possible to accomplish the first goal of isolating the issue between the DUT and the test bench. The information required for this isolation is all based on the requirement specification. Only when there is an issue with the DUT should one need waveform dumps; waveform dump generation takes longer simulation cycles.
Basic preparation for the debug is to have the logs with debug verbosity and the waveform dumps.
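For UVM-based environments, a minimal sketch of debug logging that stays silent in regression but surfaces on demand (the component and message names are hypothetical):

import uvm_pkg::*;
`include "uvm_macros.svh"

class scb_logger extends uvm_component;
  `uvm_component_utils(scb_logger)

  function new(string name, uvm_component parent);
    super.new(name, parent);
  endfunction

  function void note(int expected, int actual);
    // Compiled in always, printed only at UVM_DEBUG verbosity
    `uvm_info("SCB", $sformatf("expected=%0d actual=%0d", expected, actual),
              UVM_DEBUG)
  endfunction
endclass

A regression command line would run with +UVM_VERBOSITY=UVM_LOW; rerunning the failing test with +UVM_VERBOSITY=UVM_DEBUG brings these messages out without recompiling.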
If the regression status had achieved some level of stability in the past and it's just recent changes that have led to failures, then one can also follow a comparative debug option, where logs from a recent passing run are compared with the current failing run to spot the differences. There can be many differences. Carefully eliminate the differences that do not matter and focus on those that can potentially signal the cause of the failure. Look through the code commit history in the version control system to spot the culprit changes. This can be a very effective technique for mature verification solutions with fewer changes, and new tools are being developed to automate this process. If there is no luck here, proceed to some real debugging.
Definition of debugging
There is an unmet expectation leading to the failure of a test. The short definition of debugging is to find out why it's not met. Expectations are not met because of mistakes.
The debugger's task is to find those mistakes. To find them, one first needs to know what types of mistakes are possible. It's almost impossible to annotate them all; that's the genius of the human mind, it keeps on inventing newer ways of committing mistakes. Only an attempt can be made to classify them into broad categories for a domain. These categories act as the usual suspects.
To zero in on the culprits among the suspects, clues are required. Clues guide the debugger towards the right suspects. The first clue is in the log file, in the form of the error signature. It's the first visible manifestation of the mistake. The error signature is the result of the failure of a check in the test bench, an assertion in the DUT or the simulator tool. This discussion will focus on checks in the test bench because they contribute the bulk of the failures. These checks can be broadly classified into three different types, and these check categories are the clues.
Process of Debugging
Now that we have suspects and clues, there are two ways to go about it: from the failure error message at the end of the log file back to the cause (back tracing), or from the test intent forward to the cause (forward tracing). Tracing is done using the log files, waveforms and debugger tools. It involves traversing information at different layers of abstraction to reach the actual mistake.
The first step in debugging a test failure is to isolate the source of the mistake between the test bench and the design under test (DUT). Mistakes in the test bench result in generation of incorrect stimulus and/or incorrect implementation of checks. Mistakes in the DUT implementation result in incorrect response from the DUT to the stimulus from the test bench.
To perform this first step, the debug engineer should understand the test intent and, for the given test, the possible legal stimulus and expected response according to the requirement specification.
Using this understanding, the debug engineer will have to figure out whether the stimulus is generated and applied correctly to the DUT interfaces. Stimulus generation correctness is confirmed using stimulus related information in the log files. Stimulus application correctness is confirmed by looking at the corresponding interface signals in the waveform. If it is not correct, then utilizing the understanding of the test bench architecture and the stimulus flow through its components, isolate it to one of the test bench components. Interactive debuggers can also be used to trace the stimulus flow through the test bench to isolate it to one of the components and pinpoint the issue within the component.
If the stimulus is both generated and applied correctly to the DUT interfaces, the next step is to check the correctness of the response from the DUT. If the response is not correct, then using the understanding of the DUT micro-architecture, data flow path and control path, isolate it to one of the blocks of the DUT. Internal RTL debug is mostly based on the waveforms.
If the response from the DUT is correct and it's collected correctly by the test bench, the next step is to figure out why the check is misfiring in the test bench. This debug is similar to debugging incorrect stimulus problems in the test bench, as described above.
While checking for correctness, the debug engineer will have to trace stimulus and response through multiple abstractions and relate them across abstractions. For example, a stimulus generated at the application level may need to be traced through the transaction level down to the physical signal level. The logging of the test bench components should be thought out and planned to ease this correlation and traversal. Recent developments in protocol-aware debuggers are easing this for standards-based protocol debug. The bottom line is that the test bench should be architected for debugging and be self-sufficient in debug.
Can this understanding of mistake categories (suspects) and error message types (clues) lead the debug engineer directly to the root cause? No. Let's not create any false hope. These are not going to pinpoint the mistakes. However, they will help the debug engineer during the detective debug work. So let's look at the categories of suspects and clues in a bit more detail.
Ideally, a full regression is required for every change, but that's not practical in many real-life use cases. The role of regression is to keep all passing tests in a passing state: they have to be run periodically and their status checked. If some tests are found failing, fix them and get back to the passing state. This process is called regression. Typically a combination of full regressions and check-in regressions is used to maintain the health of the overall regression.
Before a test is declared passing, it should be exercised in all the applicable configurations and test benches. If there are multiple test benches, care should be taken to exercise it in all the different test bench areas. After all these qualifications, if the test is still passing, it becomes part of the periodic regressions. Basically, regression is a process of protecting what has already been built.
A full regression is run with a regress list. A regress list contains the series of commands that can be run in batch mode to run all the qualified tests. It will contain both the directed and constrained random tests. Unless it's a strictly directed test, it should be seeded appropriately. A strictly directed test has fixed configurations and a fixed stimulus sequence. In a constrained random environment it's rather hard to write a strictly directed test; directed tests of a constrained random environment randomize the configurations but keep the stimulus sequence fixed. A test should be seeded based on the state space coverage it provides. Seeding constrained random tests helps extract full value from them.
Typically full regressions are run with seeds. The number of seeds across tests can be scaled according to the value of the test and the availability of licenses and compute resources.
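A hedged sketch of what a regress list fragment might look like; run_test here stands for a hypothetical wrapper script around the simulator, not a real tool, and the test names are made up:

run_test --test=reg_access_directed                # strictly directed: one run
run_test --test=pkt_rand_traffic   --seed=1
run_test --test=pkt_rand_traffic   --seed=87234    # high-value test: more seeds
run_test --test=pkt_rand_traffic   --seed=5150321
run_test --test=err_injection      --seed=940211   # narrower test: fewer seeds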
Who should run the full regression?
Typically a single owner in the verification team owns full regressions. The results published by the regression owner should be used as the official numbers for tracking status. Regression phase project management is really challenging; key decisions on exceptions to the rules laid down have to be made to meet the critical milestones.
Considering the sheer size of full regressions, it's not feasible for everyone to run them. The only exception to this rule is for major code changes or close to critical milestones, when even the developers making the code changes can run full regressions to gain additional confidence in their changes.
The full regression result reference should come from a single source to maintain consistency and validity. Like a legal document, it should be stamped and signed by the full regression owner.
How frequently should full regressions be run?
The compute capacity, total licenses available and total number of tests in the regression set the limit on the minimum duration required to run the complete regression. This is a real constraint; let's call it the physics limit. Everything else should be worked out around it to meet the requirements.
The next important factor in deciding frequency is the rate of new development. If new code changes are happening once every month, it's sufficient to run the regression once every month. This may happen in the later phase of the project, close to tape-out. During the peak of development, the rate of code change will be far faster, with several code changes coming in every hour. When the rate of code change is far faster than the full regression time, it becomes challenging to decide the frequency of full regressions.
During the development phase, an attempt should be made to run as frequently as possible, guided by the full regression physics limit. What happens when full regressions are delayed? The check-in regression will attempt to hold the fort, but only on a best effort basis. It's possible that some check-in damages the full regression results. The earlier this damage is detected, the better the chances that it can be contained and recovered from. A very simple solution might be to just back out the culprit changes. But as time passes and more code changes are committed to the development branch, the impact of the damage increases; it becomes complicated even to back out the changes.
When the physics limit for the full regression is too large to be useful, the full regression can be broken into multiple levels. Group the tests based on the importance of the configurations and features. These subsets should be divided until their run times reach a useful point. Thereafter these subsets can be run at acceptable frequencies; the important levels should be run more frequently. It's like choosing to save the most valuable possessions in case of an emergency.
When to enable coverage in full regressions?
The answer depends on the phase of the project. In the early development phases it is not required: coverage of any form is not seriously analyzed during the early phases of development, so coverage need not be enabled in early-phase full regressions. Enabling coverage hits run-time performance. Enable it when it's planned to be looked into. Generating coverage reports that are not analyzed is a waste of time and resources.
At the later stage, when development matures, coverage should be enabled. The run-time hit it takes has to be taken into consideration when planning the regression duration. The frequency of enabling coverage can be reduced compared to the frequency of full regression, based on the granularity of coverage tracking.
A simple rule for when to start enabling coverage: unless the bug rate starts showing a downward trend, coverage analysis is an unaffordable luxury. When the bug rate comes under control, the coverage convergence process can be started. That's when it makes sense to enable coverage to find holes and tune the stimulus to uncover bugs in those areas. Bottom line: do not enable it unless there is bandwidth available to use the coverage data generated.
What to do with the failures of full regression?
Debug. Considering the volume of failures, a process has to be defined to make debug productive.
The sheer size of the failure count can be scary at times. The point to keep in mind is that it's not the total failure count but the debug effort involved that matters. The debug effort is not directly proportional to the total failing count in many cases, because the number of unique failures within the total may be much smaller. Debug effort is required for the unique failures, not all the failures.
It's important to group the failures with the same failure signature. Within each group, debug the failure that shows up earliest in simulation time. Picking the earliest one makes it faster to re-simulate the failing scenario with debug verbosity for logging and waveform dumps.
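A hedged sketch of the grouping step, assuming one sim.log per failing test under the regression directory (in practice, timestamps may need to be stripped from the messages first so that signatures match):

grep -h "UVM_ERROR" */sim.log | sort | uniq -c | sort -rn

This counts how many failures share each error signature, so the debug effort can be sized by unique failures rather than the raw failure count.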
Unique failures have to be assigned out. Till they are debugged, they can be excluded either from further regressions or from the analysis for failure debug assignment. Failures debugged should be classified as design, test bench or bus functional model issues. Bugs should be tracked through a bug tracking system. The failing tests should be re-enabled in the regression after the fixes are committed to the development branch.
Extra points for making a list of failures expected to pass before starting the next full regression. This gives an idea of what to expect. It also gives a window to pull in additional fixes to make the best of the next full regression.