Author: admin

  • Functional coverage for Micro-architecture – Why?

    To be effective, a functional coverage plan has to take two specifications into consideration:

    • Requirements specification
    • Micro-architecture implementation specification

    In principle, verification teams readily agree to the above. But when it comes to defining the coverage plan, this agreement is not reflected.

    In general, functional coverage itself receives less attention than it deserves. On top of that, among the above two, requirements specification coverage ends up getting the lion's share. Micro-architecture implementation coverage gets very little attention or is almost ignored.

    To some, this may look like an issue out of nowhere. They may argue that as long as the requirements specification is covered through functional coverage, micro-architecture coverage should be taken care of by code coverage.

    Why do we need functional coverage for micro-architecture?

    We need functional coverage for micro-architecture specifications as well because interesting things happen at the intersection of requirements specification variables and micro-architecture implementation variables.

     

    Requirements and implementation variable intersection

    Many of the tough bugs hide at this intersection and, due to the above thought process, are caught very late in the verification flow or, worse, in silicon.

    How? Let’s look at some examples.

    Example#1

    For a design with a pipeline, the combination of states across the pipeline stages is an important coverage metric for the quality of stimulus. Interface-level stimulus covering all types of inputs alone cannot tell whether all the interesting state combinations have been exercised in the pipeline.
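    For illustration, a covergroup crossing the per-stage states might look like the minimal sketch below. The module, signal and state names are hypothetical; in practice the coverpoints would be bound to the actual pipeline state registers of the design.

    module pipe_state_cov (input logic clk);
      // Hypothetical 3-stage pipeline; state encoding assumed for illustration
      typedef enum logic [1:0] {IDLE, BUSY, STALL, FLUSH} stage_state_e;
      stage_state_e stage1_state, stage2_state, stage3_state; // would be bound to RTL internals

      covergroup cg_pipe_states @(posedge clk);
        cp_s1 : coverpoint stage1_state;
        cp_s2 : coverpoint stage2_state;
        cp_s3 : coverpoint stage3_state;
        // The interesting part: combinations of states across the stages
        x_stages : cross cp_s1, cp_s2, cp_s3;
      endgroup

      cg_pipe_states cg = new();
    endmodule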

    Example#2

    For a design with a series of FIFOs in the data path, back pressure occurring at different points and in different combinations is interesting to cover. Don't wait for random stimulus delays to uncover it.
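    A minimal sketch of such coverage, assuming a hypothetical data path with three FIFOs whose full flags are probed from the RTL:

    module fifo_backpressure_cov (input logic clk);
      // Hypothetical design with three FIFOs in the data path; the full flags
      // would typically be bound to the RTL internals of the design under test.
      logic fifo0_full, fifo1_full, fifo2_full;

      covergroup cg_fifo_bp @(posedge clk);
        cp_f0 : coverpoint fifo0_full;
        cp_f1 : coverpoint fifo1_full;
        cp_f2 : coverpoint fifo2_full;
        // Back pressure at different points and in different combinations,
        // including all three FIFOs pushing back at the same time
        x_bp  : cross cp_f0, cp_f1, cp_f2;
      endgroup

      cg_fifo_bp cg = new();
    endmodule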

    Example#3

    For a design implementing scatter-gather lists for a communication protocol, not only are random packet sizes important, but packet sizes colliding with the internal buffer allocation sizes are very important.

    For example, let's say the standard communication protocol allows a maximum payload of up to 1 KB. If the design internally manages buffers in multiples of 256 bytes, then packets whose sizes are less than, equal to, or multiples of 256 bytes, and combinations of such packets, are especially interesting to this implementation.

    Note that from the protocol point of view this scenario is of the same importance as any other random combination of sizes. If the design changes its buffer management to 512 bytes, the interesting packet size combinations change again. One can argue that constrained random will hit it. Sure, it may, but its probabilistic nature can make it miss as well. If it misses, it's an expensive miss.

    Covering these cases, and tuning the constraints a bit based on the internal micro-architecture, can go a long way in helping find issues faster. This does not mean other sizes should not be exercised, but pay attention to the sensitive sizes because there is a higher likelihood of hard issues hiding there.
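    As a rough sketch, assuming the 1 KB payload limit and the 256-byte buffer granularity from the example above (the class and field names are made up for illustration):

    class pkt_size_cov;
      // Hypothetical packet descriptor: protocol allows payloads up to 1 KB,
      // design manages internal buffers in 256-byte chunks.
      int unsigned pkt_size;

      covergroup cg_pkt_size;
        cp_size : coverpoint pkt_size {
          // Sizes sensitive to the 256-byte buffer allocation granularity
          bins exact_multiples[] = {256, 512, 768, 1024};
          bins just_below[]      = {255, 511, 767, 1023};
          bins just_above[]      = {257, 513, 769};
          // Everything else still gets exercised, just tracked coarsely
          bins others            = {[1:1024]};
        }
      endgroup

      function new();
        cg_pkt_size = new();
      endfunction

      function void sample_pkt(int unsigned size);
        pkt_size = size;
        cg_pkt_size.sample();
      endfunction
    endclass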

    This intersection is a mine of bugs, but it is often ignored in the functional coverage plan due to the following three myths.

    Myth#1: Verification engineers should not look into design

    There is a big taboo that verification engineers should not look into the design. The idea behind this age-old myth is to prevent verification engineers from limiting the scope of verification or getting biased by the design.

    This forces verification engineers to focus only on the requirements specification and ignore the micro-architecture details. But when tests fail, they look into the design anyway as part of the debug process.

    Myth#2: Code coverage is sufficient

    Many would say code coverage is sufficient for covering the micro-architecture specification. It does cover it, but only partially.

    Remember that code coverage is limited, as it has no notion of time. Concurrent occurrences and sequences will not be addressed by code coverage. If you agree the above examples are interesting, then ask yourself whether code coverage addresses them.

    Myth#3: Assertion coverage can do the trick

    Many designers add assertions in the design to check their assumptions. Some may argue that covering these assertions is sufficient to cover the micro-architecture details. But note that if the intent of an assertion is to check an assumption, it is not the same as functional coverage of the implementation.

    For example, let's say we have a simple request and acknowledgement interface between internal blocks of the design, and the acknowledgement is expected within 100 clock cycles of the request assertion. The designer may place an assertion capturing this requirement. Sure, the assertion will fire an error if the acknowledgement doesn't show up within 100 clock cycles.

    But does it cover the different timings at which the acknowledgement can come? No, it just means the acknowledgement came within 100 cycles. It does not tell whether the acknowledgement came immediately after the request, in the 2-30 clock cycle range, the 31-60 range, the 61-90 range, the 91-99 range, or exactly at 100 clock cycles.
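    One possible way to add that visibility is a covergroup along the following lines. This is a minimal sketch assuming the test bench maintains a hypothetical request-to-acknowledgement cycle counter; the assertion stays as the checker, this only adds coverage of the observed latencies.

    module req_ack_latency_cov (input logic clk);
      // Hypothetical: the test bench counts cycles from request assertion to
      // acknowledgement and pulses ack_seen when the acknowledgement arrives.
      int unsigned ack_latency;
      logic        ack_seen;

      covergroup cg_ack_latency @(posedge clk iff ack_seen);
        cp_latency : coverpoint ack_latency {
          bins immediate = {1};
          bins early     = {[2:30]};
          bins mid       = {[31:60]};
          bins late      = {[61:90]};
          bins very_late = {[91:99]};
          bins at_limit  = {100};
        }
      endgroup

      cg_ack_latency cg = new();
    endmodule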

    There are always exceptions, of course: a few designers do add explicit micro-architecture functional coverage for their RTL code. Even then, they restrict the scope to their own RTL sub-block. The holistic view of the complete micro-architecture is still missed.

    Micro-architecture functional coverage: No man’s land

    A simple question: who should take care of adding the micro-architecture functional coverage requirements to the coverage plan?

    Verification engineers would argue they lack understanding of the internals of the design. Design engineers lack understanding of functional coverage and the test bench.

    All this leads to micro-architecture functional coverage falling into no man's land. Due to the lack of clarity on who is responsible for implementing it, it is often missed or implemented to a very minimal extent.

    This leads to hard-to-fix bugs discovered only in silicon, like the Pentium FDIV bug. Companies end up paying a heavy price. The risk could have been minimized significantly with the help of micro-architecture functional coverage.

    Both verification and design engineers can generate it quickly using the library of built-in coverage models from our tool, curiosity.

  • PCI Express PIPE interface functional coverage

    What started off as the "PHY Interface for the PCI Express Architecture" was soon promoted to the "PHY Interface for the PCI Express, SATA and USB 3.1 Architectures". It was primarily designed to ease the integration of the digital MAC layer with the mixed-signal PHY.

    PIPE Interface for MAC and PHY Integration

    As of October 2017, the latest publicly available PIPE specification is version 4.4.1. All the waveforms and pictures here are sourced from this specification. If we go back 10 years to 2007, the PIPE specification was at version 2.0.

    Version 2.0 was for PCI Express only. It had just 5 contributors and 38 pages, whereas version 4.4.1 has 32 contributors and 114 pages and supports not just PCI Express but SATA and USB 3.1 as well. That is more than a 6x growth in contributors and a 3x growth in page count compared to version 2.0.

    It's also an indication of the rise in complexity of PCI Express itself. PCI Express has been one of the leading high-speed serial interface technologies, so there are multiple IP companies helping with the adoption of the technology. Apart from the Big 3, there are many vendors who provide PCI Express IP solutions. Some provide both the MAC layer and the PHY, while others provide one of them and partner with complementary vendors to provide the complete solution.

    Even at the PIPE interface level, PCI Express has quite a bit of configurability in terms of width, rate and PCLK frequencies. Some companies also make a certain level of customization to this interface to support custom features.

    To accelerate simulation, many verification environments for PCI Express controllers support both the PIPE and the serial interface. The parallel PIPE interface simulates faster than the serial PHY interface. The PHY would also require mixed-signal simulations to verify its analog parts.

    Considering the complexity, there is a lot of focus on verifying the MAC layer and the PHY independently. This creates a challenge: what should be verified when they are put together? Obviously it's not practical to rerun all the PHY and MAC layer tests on the integrated design.

    Vendors will attempt to verify all configurations, but whether all configurations get equal attention is difficult to assess.

    When PCI Express IPs are bought from third-party vendors, it's a challenge to decide what to cover for integration verification.

    PIPE interface coverage can be a very useful metric for deciding which tests to run for integration verification. This becomes even more important if the digital controller and the PHY are sourced from different vendors.

    One could say simple toggle coverage of all the PIPE signals should be sufficient to ensure correct integration. It is necessary, but not sufficient.

    Following are some of the reasons why toggle coverage alone will not suffice.

    • Toggling in all the speeds

    Some of the PIPE interface signals are used only at specific speeds of operation, so it's important to check for toggling at the appropriate speed. Simple toggle coverage will not indicate at which speed the signals toggled.

    Some examples are the link equalization signals of the command interface, such as RxEqEval and RxEqInProgress, or the 130b block related signals such as TxSyncHeader and RxSyncHeader, which are applicable only at the Gen3 (8 GT/s) or Gen4 (16 GT/s) speeds.
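    As an illustration, a covergroup along the following lines could qualify the equalization signal activity with the rate in effect. The rate encoding and the signal hookup here are assumptions to be checked against the PIPE specification and your configuration; this is a sketch, not a drop-in model.

    module pipe_speed_cov (input logic pclk);
      // Hypothetical hookup; consult the PIPE spec for the exact Rate encoding
      logic [1:0] rate;        // assumed: 0 = 2.5 GT/s, 1 = 5 GT/s, 2 = 8 GT/s, 3 = 16 GT/s
      logic       rx_eq_eval;  // RxEqEval from the command interface

      covergroup cg_eq_by_rate @(posedge pclk);
        cp_rate : coverpoint rate {
          bins gen3 = {2};
          bins gen4 = {3};
        }
        cp_rxeqeval : coverpoint rx_eq_eval {
          bins asserted = {1};
        }
        // RxEqEval assertion must be observed at each applicable rate,
        // which plain toggle coverage cannot tell us
        x_eq_rate : cross cp_rate, cp_rxeqeval;
      endgroup

      cg_eq_by_rate cg = new();
    endmodule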

    • Transition or sequence of operations

    Some signals require transition or sequence-of-events coverage, rather than just individual toggle coverage, to confirm correct integration.

    Some examples are the six legal power state transitions, the receiver detection sequence, the transmitter beacon sequence, etc.

    • Variation of timing

    Some signals can be de-asserted at different points in time; there can be multiple valid assertion or de-assertion points. Based on the design, it's important to confirm that the possible timing variations are covered.

    For example, in the following waveform, RxEqEval can de-assert along with the PhyStatus de-assertion or later.

    • Concurrency of interfaces

    PIPE provides five concurrent interfaces: the transmit data interface, receive data interface, command interface, status interface and message bus interface.

    Even simple concurrency, like transmit and receive taking place at the same time, cannot be indicated by toggle coverage.
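    A minimal sketch of such concurrency coverage, assuming hypothetical per-direction activity flags derived from the transmit and receive data interfaces (for example from TxDataValid and RxValid; the names and derivation are assumptions):

    module pipe_concurrency_cov (input logic pclk);
      // Hypothetical activity indicators for the two data directions
      logic tx_active, rx_active;

      covergroup cg_tx_rx @(posedge pclk);
        cp_tx : coverpoint tx_active;
        cp_rx : coverpoint rx_active;
        x_concurrent : cross cp_tx, cp_rx {
          // The bin we really care about: both directions busy in the same cycle
          bins both_active = binsof(cp_tx) intersect {1} && binsof(cp_rx) intersect {1};
        }
      endgroup

      cg_tx_rx cg = new();
    endmodule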

    • Combinations

    Some signals are vectors for which not all values are defined, so hitting toggle coverage on them does not mean much.

    LocalPresetIndex[4:0], for example, has only 21 valid values out of 32. Toggle coverage can indicate connectivity, but it does not confirm whether both the digital controller and the PHY can handle all the valid values together.

    The behavior of some key signals in different states also needs to be confirmed to ensure correct integration. TxDeemph[17:0] values, for instance, have to be covered based on the rate of operation, as toggling of all the bits may not be meaningful.
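    For illustration, a coverpoint for LocalPresetIndex[4:0] might look like the sketch below, assuming the 21 valid encodings are 0 through 20; the exact valid set should be taken from the PIPE specification.

    module pipe_preset_cov (input logic pclk);
      logic [4:0] local_preset_index;  // LocalPresetIndex[4:0]

      covergroup cg_preset @(posedge pclk);
        cp_preset : coverpoint local_preset_index {
          // Assumption: valid encodings are 0-20, the rest are reserved
          bins valid_presets[] = {[0:20]};
          ignore_bins reserved = {[21:31]};
        }
      endgroup

      cg_preset cg = new();
    endmodule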

    • Multiple lanes

    The PIPE interface allows certain signals to be shared across multiple lanes. In the multi-lane case, covering the combinations of active lanes becomes important.

    • Multiple times

    Some events or sequences are not sufficiently covered by occurring just once. Why?

    For example, in the following waveforms, InvalidRequest must de-assert at the next assertion of RxEqEval. So multiple RxEqEval assertions are required to complete the InvalidRequest sequence.

    Putting all of these together results in a functional coverage plan containing 88 cover items.

    PCIe PIPE functional coverage plan pivot table

    With our configurable coverage model, we can easily customize and integrate all of the above covergroups into your verification environment.

    We offer a service to tell you where your verification stands from a PIPE integration coverage point of view.

  • Verification plan debate – Part IV

    Continued from: Verification debate Part III

    Ananta could not sleep although he was really tired. His body was tired but his mind was still racing at full throttle. He started reflecting.

    He was pondering how the sheriff had been able to get his hands on the escaped convict so quickly, and how it had prevented a breach of the confidential informer data, which could have led to further crime.

    The action of the patrol teams in motion, the teams stationed at sensitive points, and the investigation and emergency response that handled this case had made a deep impression on Ananta's mind. Their contribution was evident and appreciated.

    However, the informers, who played an equally important role in the whole process, almost went unnoticed. This, he felt, was one of the missing pieces of the puzzle. He continued to reflect on this fact.

    What would be the informers in the context of functional verification? He had a sudden flash of insight: it was the coverage plan. Coverage is like your informer network. You need omnipresence of coverage, by combining both code and functional coverage.

    With coverage we gain clarity as to where we need to target our stimulus and checks. Without it we are blind.

    He wondered: is it just me who has ignored the coverage plan, or is the rest of the world doing the same?

    He thought of doing a quick check. It was already late in the night.

    He messaged his friend Achyuta: Are you awake? Hey, what's up Ananta, came back a quick reply.

    Ananta replied: can you send me the number of page views on the verification planning blogs you had pointed me to earlier, at once if convenient. If inconvenient, send all the same.

    Within the next 10 minutes, Achyuta sent out the following numbers. All these blogs went online around the same time and have been read by readers for over a year now. Here are the page view statistics:

    Page views for the period August 2016 – September 2017:

    Page title                   Page views
    Test plan                    799
    Checks or assertion plan     354
    Coverage plan                271

    The test plan has more than twice the views of the checks or assertion plan, and close to three times the views of the coverage plan.

    This data makes it clear, thought Ananta. He smiled: I am not alone.

    The coverage plan is given the least importance among the three. Remember, the coverage plan is doing the job of the informer. At times you can raid and even capture criminals without informers, but you would lack accuracy and speed of response.

    With all three verification plans getting the right level of attention, we should also be able to bring down the bug rate.

    Ananta got excited; he had to share these realizations with Achyuta. It was the debates with him that had aroused his curiosity in the first place, so he had to share it with him. He called him up, knowing it would interest him as well.

    Achyuta, it looks like I finally understand the riddle of the verification plan now. He shared his findings, connecting all the dots in a single breath.

    We are focusing excessively on stimulus. Our focus on checks or assertions is lacking. Our focus on functional coverage is even lower. That explains the bug rate. We are working hard, and we are making our machine farms work even harder by running regressions with thousands of tests, but we are not doing it effectively. What is happening in regression is staying in regression. We need more transparency into the effectiveness of our regressions.

    Many times our best players are fighting hard in the wrong battles. Our ad hoc approach is taking a toll on them. Let's accept it: we only have a handful of A+ players. We need to invest part of their bandwidth in the strategic task of writing and maintaining the verification plan.

    A verification plan is not something written once at the beginning of a project and finished for good. It evolves. It is a plan to cope with changes, not a line set in stone. We are innovating, and by definition that means things will change and evolve. No one knows the requirements completely from the start; we have to deal with ambiguity. Our only chance is to have a verification plan that can evolve and adapt to these changes. We need to give equal importance to all three plans.

    If we get these three plans right, they become three strings that can be pulled to control the entire execution of functional verification. They can greatly streamline execution and enable us to adapt to changing priorities without dropping any balls. They will also bring in a good level of predictability and make it easier to take informed decisions on which features to cut to meet schedules.

    Bravo! Bravo! shouted Achyuta, unable to control his excitement.

    This woke up his wife, who looked straight at him with red, half-asleep eyes.

    Achyuta's heart skipped a beat. There was a moment of silence. Ananta was quick to sense it.

    He said, I am coming back this weekend anyway, so let's catch up at our usual coffee place; we have a lot more to talk about. Good night, I mean good morning.

    See you soon…

     

  • Verification plan debate – Part III

    Continued from: Verification debate Part II

    A few months had already rolled by. It was a bright sunny day.

    Ananta sensed tense activity at the sheriff's office. He asked one of the officers, who indicated that a high-profile criminal had escaped during transit. The office was bustling with activity. There was a sense of urgency everywhere.

    Even amid this distressing situation, the sheriff seemed calm and composed. He was messaging and calling all the time while the other officers were busy collecting information about the incident and getting it under control.

    Within just a few hours, the sheriff called a meeting with his key deputy officers. He provided them with the exact location of the escaped convict. He also indicated that the house was in a secluded locality near the highway, with an underground passage from the house to the highway, so the team should first be set up on the highway and then move in on the house. They did that and grabbed the convict within a matter of hours. This impressed Ananta further. He thought: I am in the right place.

    Ananta went in and congratulated the sheriff on this success. He asked how he was able to pinpoint the location so quickly.

    The sheriff said: it's my informer network. We have a widespread and deep network of informers, and we get very good intelligence on various happenings in the county. This has been one of our key strengths in maintaining the low crime rate in our county. We invest in, grow and guard our valuable informer network.

    Interrogation of the recaptured convict went on for most of the day.

    In the early evening the sheriff came to Ananta with a melancholy expression and asked him if he knew anything about blocking hacking attempts.

    Ananta asked what had happened.

    The escaped convict had some confidential information, which he has passed on: the location of our hidden, guarded data center far away from the city limits. It looks like there has already been a physical breach, and they have got access to the serial and USB debug ports of our main machine. By the time we reach there, they will have hacked it. With the holiday season, our key specialists are out. Is there anything you can do to help us?

    Ananta asked what was stored on these machines.

    It's very confidential. We have information about our informers on that machine; we need to protect it at any cost. We cannot afford to lose it.

    Ananta said, let me see; please take me to the team working on it. He joined the team working on blocking the hacking attempt and got a quick brain dump. The experts indicated that the hacking was being done with an FPGA-based custom-designed system, which would let the hackers accelerate their hacking algorithms by running them on custom hardware.

    Ananta asked which port they were trying to break into first. The serial port, because that's the simplest. When he looked at the schematics of the machine, he found a small relay near the serial lines. He asked what it was used for. We had plans to allow remote control of the machine's power but we never used it; we just have an LED attached to it.

    Can we control the relay remotely?

    Yes, said the expert. Then let's start switching it on and off. What do we get by doing that? It might induce some EMI noise on the serial lines. They immediately acted and started a program to turn the relay on and off.

    Over in the data center, the mastermind Professor Buggart and his associates were trying to break in through the serial port. The glitches on the lines caused by the relay switching on and off sent their UART IP's receive state machine into unknown states, resulting in a hang. His assistant quickly generated waveforms from the embedded analyzer in his FPGA board and figured out that noise on the line was leading to false start bits. They had not completely covered these cases in verification. We have a bug here, said the assistant.

    We don't have time for these bug fixes, shouted the professor. You guys had said the code coverage was 100%. Why are we having this issue? The assistant said that's something we can discuss later. Now let's try the USB port.

    They reconfigured the FPGA with a USB IP and started trying the USB port. The link came up and Professor Buggart was very happy. Now quickly start copying the files.

    Back at the sheriff's office, the hack prevention team had figured out that the USB port was compromised. Ananta asked at what speed the USB port was operating. 3.0, replied one of the experts. Let's quickly create a large file with repeated control symbol values from the USB 3.x link training pattern. Through a script they quickly created a gigabyte-sized file containing these control symbols and uploaded it to the machine. Name it highly_confidential.txt.

    Back in the data center, when the hackers tried copying the file, their USB IP crashed. Professor Buggart was very upset. He shouted, what is happening here? His assistant replied: our LTSSM is getting falsely triggered and moving to the recovery state during data transfer. It looks like some of our control symbol decoding was not validated against the LTSSM states. We did randomize the data in our verification, but we are not sure why this is happening. Professor Buggart kept repeating: you guys said code coverage was 100%, even on the USB IP. Before the assistant could answer, the sheriff's team nabbed them.

    Sheriff thanked his team and Ananta for his help.

    One of the experts on the prevention team asked Ananta: how did you think of those cases to prevent the hacking? Oh! These are some of the corner cases that verification teams often ignore. I too have figured them out the hard way, replied Ananta. Also, when you said they were using FPGA-based systems, I guessed that FPGA-based IP developers generally rush to the FPGA rather than verifying sufficiently in simulation.

    Ananta was satisfied with the results and the contributions he had made. He was tired from the hustle of the day by the time he got back to his hostel.

    Ananta’s internship was also coming to an end. He reflected deeply on his experiences that night. Find out what insights he gained in the last part of the series.

    Conclusion: Part IV.

  • Verification plan debate – Part II

    Continued from: Verification debate Part I

    Good morning Sheriff.

    I am Ananta. I was selected for citizen internship of 6 months.

    Welcome Ananta. Come let’s catch up over a coffee.

    The sheriff was an old man in his early fifties, a tall guy with piercing sharp eyes. His gray hair and scars were proof of the experiences he had had. His presence was something you could not ignore.

    At the breakfast table, the sheriff asked: how do you earn your living?

    I verify hardware designs for my bread and cheese, replied Ananta.

    Well, I am curious what brings you here? I don’t think we do any of those things here.

    Ananta said, I have been following your work for quite some time. My interest is in understanding how you are able to bring down the crime rate so fast. I want to study your methods.

    How is it going to help you, young man? asked the perplexed sheriff.

    Ananta said, there is some similarity in what we both do. You catch criminals; we catch bugs.

    What bugs? exclaimed the sheriff with surprise.

    Ananta said, oh no! Not those bugs. When we say bugs in our professional world, we mean the parts of a design violating the requirements specification. They cause malfunctions in the operation of the systems using these designs.

    Umm! I get it, requirements specifications are like laws then, said the sheriff. Since bugs violate the laws, you guys treat them as criminals. So you want to bring down the bug rate in your designs by learning my methods.

    So that makes you the sheriff of verification land, he said jokingly. We enforce laws and you folks enforce requirements specifications.

    Exactly! said Ananta. He could not hide his appreciation for the old sheriff, who had caught on to the concepts so quickly. He was now convinced that he was in the right place, and the slight doubts he had had on seeing the old man in person had vanished.

    The sheriff introduced Ananta to all his staff and asked him to start by looking at some solved cases to see how his team had solved them.

    Alright… said Ananta half-heartedly. He was excited and eager to see something live and be part of it. He knew experience is the best teacher.

    The sheriff went back to his table and immersed himself in the files in front of him. He spoke over the phone with his staff about the status of various cases, planned out tasks for all the officers, and quickly scanned the newspaper for any other news of interest. He then went on rounds of his jurisdiction himself.

    The sheriff was busy with some important cases and couldn't give much time to Ananta.

    Ananta was not the kind of guy who would sit in a corner and wait for things to happen. He had a mission. He started observing and making notes.

    Ananta figured out that patrolling is the backbone of police operations; it consumed most of the resources. So he first spoke to the head of patrolling.

    Ananta asked what some of the patrolling strategies were. The officer answered: we have a few officers doing regular circuits or passes through key areas, called a beat.

    Hmm, Ananta's verification mind started correlating. This sounds like directed test cases.

    The officer said: we also have some police cars cruising randomly through the city streets, supposedly to create the feeling that the police are everywhere. This method is the most controversial and questionable in terms of its effectiveness.

    Ananta chuckled. The officer asked, what happened? We also do something similar in our profession, and we face the same challenge. It goes without saying that constrained random tests passed through his mind.

    Suddenly the phone started ringing. It was an emergency call. The officer excused himself along with his team to attend to their duties. Ananta was reminded of tanked weekly regressions, where his whole team would rush to debug and fix them.

    There was a curious constable looking at Ananta from a far corner.

    Ananta waved and walked over to him. He introduced himself and asked politely: what are your responsibilities, sir?

    The constable said: I and some of my colleagues are stationed at key locations. These include airports, railway stations, bus stands and some sensitive spots around the city. We patrol a small area around those key locations on foot, keeping vigil in those sensitive areas every day. We also keep an eye on the arrivals and departures of new folks in our county.

    Thank you, said Ananta. Okay, it looks like these folks are doing the job of the checkers and scoreboards of a verification environment.

    The days rolled by. Ananta was studying some of the cases. He was also studying the various operational plans and task lists made in the office by the sheriff. He was amazed at the level of detail and wanted to talk to the sheriff about it.

    Ananta finally caught up with the sheriff that evening and shared his learnings so far.

    The sheriff said: yes, we carefully plan our patrols and pick the locations where we station our people. Both are very important activities for us, and I am glad you have learnt about them. He showed how these plans were laid out on the map of his county. They were very detailed and well thought out.

    Ananta appreciated the details.

    The sheriff said: please note, detailed does not mean they are set in stone. We have designed our plans such that they can easily adapt to changes, which is just as important as the detail. It certainly made a great impression on Ananta's mind.

    It was clear to Ananta that these plans corresponded to the test plan and checks plan parts of a verification plan. But Ananta was more curious about whether the time invested in planning was worth it.

    He asked the sheriff: what is your general view on planning? Do things go according to plan?

    No, said the sheriff in his deep voice. The old man quoted Dwight D. Eisenhower: in preparing for battle I have always found that plans are useless, but planning is indispensable.

    We have learnt from our experience that no plan survives contact with reality.

    So our plans are really plans for adapting and improvising. It's only this type of meta-planning that allows us to respond quickly and achieve our objectives under dangerous circumstances.

    Grave danger? Ananta asked

    The sheriff shot back: is there another kind?

    Wow, that's a great insight. The importance of planning was now clear to Ananta.

    Now it was clear to him why executable, traceable and trackable verification plans had been emphasized so much by Achyuta. We must also build our verification plans as meta-plans with adaptability built into them, Ananta noted down.

    It was already late, and the sheriff drove Ananta back to the hostel where he was lodged.

    Something exciting that Ananta had long been waiting for was about to happen in the next few days.

    To be continued in: Part III

  • Verification plan debate – Part I

    Achyuta and Ananta had been close friends for decades, both practicing functional verification.

    Achyuta always takes a carefully planned approach, while Ananta believes in jumping into action as soon as possible. Except for this key difference, both were well matched in their capabilities. Both had their own share of successes and failures.

    Ananta had a shining career, and his team always appreciated his brave firefighting spirit of jumping right into execution. Not only did he jump in, he also produced results very quickly. In the past, Ananta had a good success rate of the ASICs he worked on making it to production on the first pass. He took pride in it.

    However, of late Ananta was not feeling the same high, and not without reason. A series of bugs had been found right after RTL freeze. This had upset Ananta. He had carefully tracked the bug trends, regression results and code coverage, and based on that he had decided they had reached RTL freeze quality. Now this rise in bugs, in spite of all these measures, had upset him. He worried: have I lost my Midas touch?

    Ananta called his old buddy Achyuta for dinner to talk about it. Achyuta, in his usual methodical way, asked Ananta: did you guys write a verification plan, or did you just jump right into execution in your usual style? Ananta said, dear Achyuta, I have changed my ways a bit from being around you for so long. I have learnt there is a thin line between being bold and being reckless. We did write the test plan before starting execution. That's quite a progress for Ananta, Achyuta thought to himself.

    Ananta, do you know a test plan alone does not make a complete verification plan? asked Achyuta.

    What else? said Ananta, slightly annoyed.

    A verification plan is made up of three plans: the test plan, the checks or assertion plan and the coverage plan. All three are equally important for successful verification closure.

    Oh Achyuta, now don't start that one again. Cut it short man, said Ananta.

    No, it's not just me; check out these blogs on the verification plan, test plan, checks plan and coverage plan if you want.

    I have looked at them. To me it's evidently the theory of some armchair lounger who evolves all these neat little paradoxes in the seclusion of his own study. It's not practical. Just put him on our current project and I would lay a thousand to one against him rescuing it. We are finding bug after bug. You may lose your money, Achyuta remarked calmly. And it's not about him; we are discussing methods here.

    Ananta, verification requirements have changed a lot in the last 10 years, said Achyuta. We cannot dismiss a methodical approach any more.

    Forget it, said Ananta. All this is easier said than done. All these theories melt in the heat of execution.

    Achyuta put the brakes on his tongue, which was about to slip into debate. He thought this was not the right time for it.

    Ananta also changed the topic and said, anyway, the reason I called you today is that I am thinking of taking a sabbatical for the next 6 months.

    Huh! What do you plan to do?

    You see, I have had my eye on this new sheriff in town for quite some time. After his arrival the crime rate has gone quite low. I have even heard that the rare criminals who do escape talk about their luck for the rest of their lives.

    What does all this have to do with verification and your sabbatical?

    I need some fresh thinking, so I am planning to take up an internship at his office.

    Have you gone crazy? exclaimed Achyuta.

    No, I have given it a fair bit of thought, and I am starting tomorrow. Ananta seemed very determined. Achyuta, being a good friend, understood him and said, call me if you need anything.

    Sure, will do, said Ananta.

    What does Ananta learn in the sheriff's office? Part II

  • Pentium FDIV bug and curious engineer

    According to Wikipedia, the Pentium FDIV bug affects the floating point unit (FPU) of the early Intel Pentium processors. Because of this bug, incorrect floating point results were returned. In December 1994, Intel recalled the defective processors. In January 1995, Intel announced a pre-tax charge of $475 million against earnings, the total cost associated with replacing the flawed processors.

    What went wrong here?

    According to the byte.com web page archive, Intel wanted to boost the execution speed of their floating-point unit. To do that, they moved from the traditional shift-and-subtract division algorithm to a new method called the SRT algorithm. The SRT algorithm uses a lookup table. The Pentium's SRT lookup table implementation is a matrix of 2048 cells, of which only 1066 actually contain valid values. Due to an issue in the script loading the lookup table, 5 of these 1066 entries were not programmed with valid values.

    The SRT algorithm is recursive. This bug leads to corruption only for certain pairs of divisors and numerators. The "buggy pairs" have been identified, and they always lead to corruption. At its worst, the error can rise as high as the fourth significant decimal digit; the chance of this happening randomly is about 1 in 360 billion. Usually the error appears around the 9th or 10th decimal digit; the chance of that happening randomly is about 1 in 9 billion.

    Isn’t randomization on inputs sufficient?

    Here you can see how low the probability of hitting a buggy pair is, even with constrained random generation of divisors and numerators. And even if we create functional coverage on the inputs, the probability that we would specifically write coverpoints for the buggy pairs is just as low.

    Of course, today we have formal methods to prove the correctness of such computationally intensive blocks.

    We are already convinced that constrained random has a very low probability of hitting a buggy pair. Even if it does hit one, it may take a long wall clock time. A pure floating-point arithmetic verification approach alone is not sufficient.

    For a moment, think about what could have helped maximize the probability of finding this issue in traditional simulation-based verification.

    The simulation world is always time limited. We cannot do exhaustive verification of everything. Even exhaustive verification of a simple 32-bit counter would mean 2^32 = 4G of state space. That's where the judgment of the verification engineer plays a major role in deciding what to abstract and what to focus on.

    Story of curious engineer

    Let's say a curious verification engineer like you looked inside the implementation. He tried to identify the major blocks of the design and spotted the lookup table. His mind suddenly flashed an insight: the floating-point inputs must use all the valid entries of the lookup table. This is one more critical dimension of the quality of the input stimulus.

    Let's say he thought of checking that by writing functional coverage on the input address of the lookup table, adding a bin for each valid cell of the matrix.

    This could have helped maximize the probability of catching this issue. There is a notion that verification engineers should not look into the design, as it can bias their thinking. While the intent is good, let's not forget we are verifying this specific implementation of the requirements. Some level of balancing act is required; the notion cannot be applied in its purest form.
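    As a purely illustrative sketch of what that coverage could look like: the table indexing, widths and signal names below are assumptions for the sake of the example, not the actual Pentium implementation.

    module srt_table_cov (input logic clk);
      // Assume the 2048-cell table is addressed by a row index and a column
      // index probed from inside the divider (names and widths are made up)
      logic [6:0] row_idx;   // hypothetical partial-remainder index
      logic [3:0] col_idx;   // hypothetical divisor index
      logic       lookup_en; // table access strobe

      covergroup cg_srt_table @(posedge clk iff lookup_en);
        cp_row : coverpoint row_idx;
        cp_col : coverpoint col_idx;
        // One cross bin per table cell; in a real plan the cells that are never
        // meant to be accessed would be excluded with ignore_bins so that 100%
        // corresponds to the 1066 valid cells.
        x_cell : cross cp_row, cp_col;
      endgroup

      cg_srt_table cg = new();
    endmodule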


  • Functional coverage – Value proposition

    Functional coverage complements code coverage by addressing its inherent limitations. This blog will help you understand the key value proposition of functional coverage, which in turn will help you achieve the maximum impact from it.

    Functional verification has to address verification requirements arising from both the requirements specification and the implementation.

    Functional coverage's primary purpose is finding out whether functional verification has done its job well. The means used by functional verification to achieve that goal doesn't matter in this context; it can be all directed tests, constrained random, or emulation.

    When functional verification relies on the constrained random approach, functional coverage helps in finding out the effectiveness of the constraints: whether they are correct and whether they are over-constraining. But this aspect of functional coverage has become so popular that its primary purpose has been overshadowed.

    Functional coverage focuses on the following three areas, with possible overlap between them, to figure out whether the functional verification objectives are met:

    • Randomization or stimulus coverage
    • Requirements coverage
    • Implementation coverage

    In the following sections we will briefly discuss what each of these means.

    Randomization or stimulus functional coverage

    The uncertainty of constrained random environments is both a boon and a bane. The bane part is addressed with functional coverage. It provides the certainty that randomization has indeed hit the important values we really care about.

    At a very basic level, it starts with ensuring each randomized variable covers its entire range of possible values. But that is not the end of it.

    Functional coverage's primary value for constrained random environments is to quantify the effectiveness of the randomization. This does not directly say anything about the effectiveness or completeness of functional verification; it just means we have an enabler that can help achieve the functional verification goals.
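    As a minimal illustration with made-up transaction fields, this is the kind of covergroup that quantifies whether randomization has really spread across the value space:

    class stim_cov;
      // Hypothetical randomized transaction fields
      rand bit [7:0] payload_len;
      rand bit [1:0] burst_type;

      covergroup cg_stim;
        cp_len : coverpoint payload_len {
          bins min_len = {0};
          bins small   = {[1:63]};
          bins medium  = {[64:191]};
          bins large   = {[192:254]};
          bins max_len = {255};
        }
        cp_burst : coverpoint burst_type;
        // Did randomization exercise all length ranges for every burst type?
        x_len_burst : cross cp_len, cp_burst;
      endgroup

      function new();
        cg_stim = new();  // call cg_stim.sample() after each randomize()
      endfunction
    endclass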

    It's the requirements and implementation coverage that really measure the effectiveness and completeness of functional verification.

    Requirements functional coverage

    The requirements specification view of functional verification looks at the design from the end application point of view.

    Broadly, it looks at whether the test bench is capable of covering the required scope of verification from the application scenario point of view. That includes:

    • All key atomic scenarios in all key configurations
    • Concurrency or decoupling between features
    • Application scenario coverage
      • Software type of interactions:
        • Like polling versus interrupt driven
        • Error recovery sequences
        • Virtualizations and security
      • Various types of traffic patterns
      • Real life interesting scenarios like:
        • Reset in middle of traffic
        • Zero length transfer for achieving something
        • Various types of low power scenarios with different entry and exit conditions
        • Various real life delays(can be scaled proportionately)

    The bottom line is that requirements specification functional coverage should be able to convince someone who understands the requirements that the design will work, without that person knowing anything about the implementation. This is one of the primary values functional coverage brings over code coverage.

    Implementation functional coverage

    Implementation-related functional coverage is a highly under-focused area, but remember it's an equally important one. Many verification engineers fall into the trap of assuming code coverage will take care of it, which is only partially true.

    Implementation here means the micro-architecture, clocking, reset, pads, and the interfaces to any other analog components.

    Micro-architecture details to be covered include internal FIFOs becoming full and empty multiple times, arbiters experiencing various scenarios, concurrency of events, multi-threaded implementations, pipelines experiencing various scenarios, etc.
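    As an example of the "multiple times" aspect, a sketch like the following (with hypothetical FIFO status flags) can require that full and empty are each observed several times, something code coverage cannot express:

    module fifo_occupancy_cov (input logic clk);
      // Hypothetical FIFO status flags bound to the RTL internals
      logic fifo_full, fifo_empty;

      covergroup cg_fifo_occupancy @(posedge clk);
        cp_full : coverpoint fifo_full {
          option.at_least = 3;            // bin must be hit at least 3 times
          bins became_full = (0 => 1);    // counts rising edges of the full flag
        }
        cp_empty : coverpoint fifo_empty {
          option.at_least = 3;
          bins became_empty = (0 => 1);   // counts rising edges of the empty flag
        }
      endgroup

      cg_fifo_occupancy cg = new();
    endmodule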

    Clocking coverage asks whether all the clock frequency ranges defined in the specification are covered, along with the key relations between clocks for multiple clock domains and special requirements such as spread spectrum, clock gating, etc.

    For resets: whether external asynchronous resets arrive at all the key states of any internal state machines. For multi-client or multi-channel designs where the clients are expected to operate independently: whether one can make progress while another is in reset, etc.

    Coverage of the pad and analog block interfaces is also critical. Making sure all possible interactions are exercised is important, whether or not their effects can be seen in the simulation.

    A combination of white box and black box functional coverage addresses both of the above.

  • Functional coverage debt

    We have heard of technical debt, but what is functional coverage debt? It's along the same lines as technical debt, but specific to functional coverage. Functional coverage debt is the absent or insufficient functional coverage accumulated over the years for silicon-proven IPs.

    Can we have silicon-proven IPs without functional coverage? Yes, of course. Designs have been taped out successfully without coverage-driven constrained random verification as well. Many implementations that are decades old and still going strong were born and brought up in older test benches.

    These were plain Verilog or VHDL test benches. Let's not underestimate their power. It's not always about the swords, but about who is holding them.

    Some of these legacy test benches may have migrated to the latest verification methodologies and high-level verification languages (HVLs). Sometimes these migrations do not exploit the full power the HVLs offer, and functional coverage may be one of the capabilities left behind.

    So now we have legacy IPs that have been around for some time, have been taped out successfully multiple times, and are now mostly undergoing bug fixes and minor feature updates. Let's call these silicon-proven mature IPs.

    Silicon proven IPs

    For silicon-proven mature IPs, is there value in investing in full functional coverage?

    This is a tough question, and the answer will vary on a case-by-case basis. Silicon-proven IPs are special because they have real field experience; they have faced real-life scenarios. Bugs in these IPs have been found over the years the hard way, through silicon failures and long debugs. They have paid a high price for their maturity. They have learnt from their own mistakes.

    The bottom line is that they have now become mature. Irrespective of whether proper test cases were written after each bug was discovered, they are now proven and capable of handling various real-life scenarios.

    So for such mature IPs, when a complete functional coverage plan and its implementation are done, the results can be a bit of a surprise. Initially there can be many visible coverage holes in the verification environment. This is because such an IP may have relied on silicon validation and a very specific set of directed test scenarios to reproduce and verify issues in the test bench.

    Even when many of these coverage holes are filled with enhancements to the test bench or tests, it may not translate into finding any real bugs. Many of the hard-to-hit bugs have already been found the hard way in silicon.

    Now the important question: what is the point of investing time and resources in implementing functional coverage for these IPs?

    It can only be justified by calling it paying back technical debt, or functional coverage debt in this case.

    Does every silicon proven mature IP need to pay back the functional coverage debt?

    It depends. If the IP is just going to stay as is without much change, then paying the debt can be pushed out further. It's better to time it to a discontinuity, which can come in the form of major specification updates or feature updates.

    When major feature upgrades take place, the older parts of the logic will interact with the newer parts. This opens up the possibility of new issues in the older logic, in the new logic, and in the cross-interaction between them.

    So why not just write functional coverage for the new features?

    When there isn't functional coverage at all, there is a lack of clarity on what has been verified and to what extent. This is like lacking GPS coordinates for our current position: without clearly knowing where we are, it's very difficult to reach our destination. When it's done in a patchy way, a heavy price may have to be paid depending on the complexity of the change. For cross-feature interaction functional coverage, the existing feature functional coverage is also required.

    Major updates are the right opportunity to pay back the functional coverage debt and set things right. Build a proper functional coverage plan and implement the functional coverage from the ground up.

    There will be larger benefits for the newer features immediately. For the older features, functional coverage will benefit future projects on this IP, as it will serve as a reference for the quality of the current test suite. Even when 100% functional coverage is not hit, there is clarity on what is being risked. Uncovered features become an informed risk rather than a gamble.

    Conclusion

    For silicon-proven IPs, the value of functional coverage is very low if the IP is not going to change any more. If what was taped out just stays as is, then there is no need to invest in functional coverage.

    If the IP is going to change, then the quality achieved is no longer guaranteed to be the same after the change.

    • Functional coverage ensures the functional verification quality carries over even with changes in the design
      • Ex: A directed test that was creating a scenario for a certain FIFO depth may not create the same scenario when the FIFO depth is changed
        • Unless this aspect is visible in code coverage, it can get missed in the next iteration without functional coverage
      • Also, for new feature updates it may be difficult to write functional coverage for cross-feature interactions unless there is some base layer of functional coverage for the existing features

    For silicon-proven IPs, implementing functional coverage is paying off "technical debt" within that project, but for new projects it:

    • Provides the ability to hold on to the same quality
    • Provides the base layer on which to build functional coverage for new features, so that its benefits can be utilized from early on
  • Why are Silicon proven IPs treated as special?

    Silicon-proven IPs do enjoy special attention, to the envy of the ones that aren't. Do they really deserve it? What is so special about them?

    Simulation, however much we love and adore it, is still simulation. It's not real. There are lots of aspects not modeled in simulation. We may take a lot of pride in our latest verification and simulation technologies, but there are still lots of real-world details that IPs are shielded from in the simulation environment. Let's call them simulation blind spots.

    Those blind spots can be categorized into three major areas:

    • Electrical characteristics of physical communication channels
    • Quirks of analog parts of the design
    • Software application interactions

    Even beyond all these, simulation has a speed challenge. For larger designs, the simulated time achievable is limited by the highest tolerable wall clock time.

    Though there are technologies in development to manage these problems independently and in some combinations, there isn't one clean solution that fits all. The current verification strategy for addressing these blind spots is still a work in progress.

    Silicon is the real world, where all of it comes together. If a design IP were a real person, it would be very stressful to handle all of these together. That's why many fail.

    First silicon is like a "meet the parents" moment for a design IP. A design IP that hasn't had this moment is still not in the circle of trust. There are many jinx moments awaiting it.

    Design IP's "meet the parents" moment. First silicon experience.

    Why are these blind spots such a big deal?

    A simple power-on reset value for a physical pad being incorrect can potentially leave a particular interface dead. Signal eye diagrams can be real eye openers.

    Suddenly a clock tolerance of 4% becomes highly meaningful when input sampling of the signal starts failing to detect the right data.

    That status register we never cared about becomes a hero overnight when software polls it to figure out why commands are failing.

    Crazy things happen in the real world. Suddenly those streams of all 0s and all 1s for a few milliseconds may seem like a cyclone. That line in the specification, which looked so harmless, isn't harmless any more when you start seeing wake signals being beaconed at a not-so-good time for your design. Clock gating and wake coming together is not so pleasant.

    That's exactly why silicon experience and exposure to the real world earn design IPs a special place over the ones that aren't silicon proven.