A Methodology for Timely Verification of a Complex SoC/CHIP

Guy Regev
May 1, 2019
Updated: May 8, 2019

This paper presents a novel, alternative methodology for logic (functional) verification of a system-on-a-chip integrated circuit. Our company used this methodology for a successful and timely tape-out of our SoC. We show a complete verification methodology that resulted in fully functional first silicon and a quick time to market. The methodology is intended only for a highly programmable system on a chip or a highly configurable ASIC, in which a hardware bug can be mitigated relatively easily by a software or firmware workaround.

Guy Regev

Peretz Landau

IEEE, ISOCC 2009

I. INTRODUCTION

As transistor sizes shrink exponentially in line with Moore’s law, silicon chips (ICs) are increasing in density, in complexity and in the amount of logic that can run at the same rate.

As a result, many current ICs incorporate huge amounts of logic, including logic that was previously spread over several different ICs. This growth introduced what is known as the system-on-a-chip, or SoC. The SoC is essentially a complete system integrated on a single silicon chip or IC.

Unfortunately, with the increase in density and integration, together with the shrinking sizes of ICs, come requirements for a reduced schedule and aggressive time-to-market constraints.

These requirements, along with cost constraints, pose a very difficult challenge to IC design teams. The teams now deal with huge amounts of logic to design, verify, and lay out, without the added head count or schedule time required to support it.

Due to these constraints, IC design projects face many challenges, one of which is verification. How to verify such very large scale integrated SoCs is still under debate in the worldwide IC verification community.

We present the verification methodology that we used to verify our current 3G UMTS/HSPA+ modem SoC for femtocells.

II. THE DIFFERENCE BETWEEN SoC AND ASIC

An ASIC (Application Specific Integrated Circuit) is usually a silicon chip that performs several dedicated functions, and is used for specific applications.

A SoC is a complete system on a chip. For example, a motherboard containing a central processor (CPU) that controls several peripherals, such as DDR, additional memories, USB, a graphics accelerator, etc., is integrated into a single chip called a SoC. A complex SoC usually contains at least one processor and many peripherals.

To make it even more complex, usually many of these peripherals are designed in-house and contain complex interfaces between each other (which may bypass the processor), in addition to a standard processor bus interface.

Unlike an ASIC, a SoC is usually not functional without dedicated software that controls the peripherals, configures them and activates the chip correctly.

In addition to the chip architecture group, a SoC design group requires a system architecture group, which defines and designs the system. The size and complexity of a SoC pose many difficulties for verification.

The regular methods of exhaustively checking every possible combination of hardware states are not applicable to a complex SoC because of the design size. Exhaustive checking translates into very long simulation run times, which conflict with the time-to-market pressure that usually puts stiff schedule requirements on the project.

III. THE PRC6000 SoC

The PRC6000 architecture block diagram is depicted in Figure 1. The PRC6000 SoC is a powerful, highly integrated baseband processor specifically designed for 3G femtocells.

The PRC6000 contains a powerful 3G layer 1 modem and a high speed MIPS CPU. These are surrounded by a large set of accelerators and peripherals. The PRC6000 SoC contains two major sub-systems:

1) FLE (Femto Layer Engine) sub-system which is powered by a fast DSP core controlling the communications accelerators designed specifically to support 3G cellular base-station communications.

2) A CPU sub-system powered by a powerful and fast CPU that is responsible for Layer 1 control, and higher layers of protocol stack and application.

The FLE sub-system is controlled by a fast DSP core. The PHY block handles the physical parts of the UMTS and GSM layer 1, while the Transport Layer block converts physical data to/from transport data. The data path of the Transport Layer block is directly connected to the CPU sub-system matrix (for shorter time delays). The DSP I/F (interface) block enables control of the FLE from the CPU sub-system and also allows for direct access of the DSP to the external DDR2 memories.

The CPU sub-system is controlled by a MIPS32 24Kc 600MHz processor. The CPU is aided by a DMA controller (DMAC) to control and pass data to/from a set of peripherals and external memory controllers. These peripherals are divided into fast peripherals operating on the AHB bus and slow peripherals operating on the APB bus.

The blocks inside the FLE PHY and transport sections are not described for intellectual property reasons. They make up the cellular modem and are very complex.

In addition, they are all connected to the DSP via the z-bus and the bus bridge. They interface with each other through proprietary interfaces. The CPU sub-system also interfaces directly with blocks inside the FLE sub-system through standard interfaces, as well as through a dedicated proprietary interface between the two sub-systems.

Due to the large size of the design, its complexity, limited engineering resources and demanding schedule, a non-traditional verification methodology had to be used.

We will describe the methodology and tools we developed in the following sections.

IV. THE METHODOLOGY

As the complexity and size of ICs in general, and of SoCs in particular, increased, many verification methodologies and dedicated industrial tools were developed. The question “Which methodology to pick?” still baffles many verification engineers and VLSI experts.

Each methodology comes with a cost in terms of resources, schedule, and computing resources [1]. The verification team has to factor in all of the constraints and still come up with a methodology that will lower the risk of taping out a chip with critical bug(s).

The traditional methodology is to write a test plan that covers all possible permutations of every state in the design and to write dedicated directed tests to cover all of these. Adding code and functional coverage, as well as exhaustive random tests, probably achieves almost 100% coverage.

The major drawback of this ideal approach is that it is not applicable to verifying SoCs. SoCs are too big and complex to cover all permutations of all states. In addition, most SoC projects are very limited in computing resources and engineering resources (manpower) and simply cannot afford this kind of methodology. However, they also cannot afford the impact of a re-spin (another tapeout) resulting from a critical bug found in first silicon.

For these reasons, we developed a system/SW based verification methodology. The following sub-sections describe in detail the methodology we used to verify and tape out a fully functional chip within a very tight schedule: a single calendar year from conception to tapeout.

A. System Based Verification

Unlike many verification methodologies which are either functional driven, feature driven, or both, we use a system driven methodology. The system driven methodology begins with the work of the verification team.

Our verification engineers are basically system design engineers. The same system design engineer who defined the sub-system of a single block will also be the one who verifies the block. The ASIC designer of the unit works hand in hand with the system engineer, who is responsible for writing the test plan and executing it. This is different from most silicon companies, in which a dedicated verification team is usually in charge of the verification and the system team is in charge only of defining the system.

Since the system engineer is system and software oriented, the test plan he is responsible for will aim to verify the system and not the hardware, i.e. functional or feature coverage is not the metric in this case, but system coverage is.

In other words, the tests aim directly to verify the system’s functionality and not the hardware functionality. We use this method to narrow down the permutations we need to check: instead of verifying every possible combination in our hardware, we aim to verify only those that are used in our system.

By using this method we significantly reduce the number of tests needed, the debug time, and the verification schedule. We also narrow down the “real” bug rate, since we are specifically aiming to find relevant bugs and not other bugs which we consider to be “non-real”, in the sense that they do not “really” impact the system functionality, nor will they ever occur under real system operating conditions.
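The narrowing described above can be illustrated with a minimal Python sketch. The configuration fields, their values and the list of system scenarios are hypothetical and are not taken from the PRC6000; the sketch only contrasts enumerating every combination with keeping the combinations that the system actually uses.

```python
from itertools import product

# Hypothetical configuration fields of one block. The names and values are
# illustrative only and do not come from the PRC6000 design.
CONFIG_SPACE = {
    "mode": ["idle", "rx", "tx", "loopback"],
    "data_width": [8, 16, 32],
    "interrupts": [False, True],
}

# Configurations that the operational system is assumed to program.
SYSTEM_SCENARIOS = [
    {"mode": "rx", "data_width": 32, "interrupts": True},
    {"mode": "tx", "data_width": 32, "interrupts": True},
    {"mode": "idle", "data_width": 32, "interrupts": False},
]


def all_permutations(space):
    """Enumerate every combination of field values (the exhaustive approach)."""
    names = list(space)
    return [dict(zip(names, values)) for values in product(*space.values())]


def system_permutations(space, scenarios):
    """Keep only the combinations that appear in a real system scenario."""
    return [cfg for cfg in all_permutations(space) if cfg in scenarios]


exhaustive = all_permutations(CONFIG_SPACE)
targeted = system_permutations(CONFIG_SPACE, SYSTEM_SCENARIOS)
print(f"exhaustive: {len(exhaustive)} configurations, system-based: {len(targeted)}")
```

Even in this toy case the system-based list is a small fraction of the exhaustive one, and the gap grows multiplicatively with every configuration field added.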

B. SW Driven Flow

Almost every SoC operation relies heavily on SW. This implies that the SoC consists of many hardware peripherals which the DSP or CPU programs and controls, and which serve as co-processor accelerators. Their purpose is to offload from the DSP or CPU those tasks which are very MIPS (million instructions per second) demanding and can be performed more efficiently by hardware.

Our verification methodology and tools were developed with real, operational software design in mind. Since the system designers know how to control and configure the system, and since the main vehicle for doing that is the DSP core in the FLE sub-system, the block level tests were written as close as possible to operational software/firmware. This provided our verification with two main advantages:

1) It was afterwards feasible to use some of the code written for verification as real operational code during FW development.

2) We could detect the higher priority bugs first: those which would prevent the system from working under operational conditions, i.e. bugs which would prevent us from using the operational firmware, which we write and control.

On top of making our verification flow FW/SW oriented, our system design is very robust in terms of preparing software hooks that help us circumvent hardware bugs using software. This gives us a higher probability of productizing the first tapeout of the SoC silicon even if we find bugs in silicon.

With this design methodology in mind, we will show how to prioritize the tests, in order to come up with a verification process which is highly manageable in risk, schedule, and effort.

C. The Test Plan

“The increasing complexity of today's designs has only served to further intensify the pain of functional verification. Any strategy for success here must include a verification test plan - one that trades brute force with finesse. In so doing, not only is the pain reduced, but additional benefits are quickly derived, such as greater predictability, more aggressive innovation and late stage spec changes which can be made with confidence.” [2]

For these very reasons, we decided to construct our test plan as follows: The system designer – who is now functioning as the block verification engineer – will first write a verification plan.

This verification plan will be divided into a number of sections, each denoting a different system mode/scenario. Each section contains as many tests as necessary to properly cover the particular system mode/scenario. Logically, the system engineer will emphasize in the test plan those tests which cover system scenarios that the SW cannot execute in case of a crucial bug.

Moreover, the test plan will include as many tests as possible which are based on real stimuli and reference vectors that comply with the standard on which the system is based (the 3GPP standard in our case). These vectors are taken from test equipment which is qualified for use specifically for 3GPP 3G communications.
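A reference-vector test of this kind might look roughly like the following Python sketch. It is an illustration rather than the test bench we used: the function names are hypothetical, and the device under test is represented by a plain callable, where in practice the stimulus and reference vectors would come from the qualified test equipment.

```python
def compare_with_reference(dut_output, reference):
    """Return the indices at which the DUT output differs from the reference."""
    if len(dut_output) != len(reference):
        raise ValueError("output and reference vectors differ in length")
    return [
        index
        for index, (got, want) in enumerate(zip(dut_output, reference))
        if got != want
    ]


def run_vector_test(dut, stimulus, reference):
    """Drive the stimulus through `dut` and report an exact-match pass/fail."""
    output = [dut(sample) for sample in stimulus]
    mismatches = compare_with_reference(output, reference)
    return {"passed": not mismatches, "mismatches": mismatches}


# Stand-in DUT (an assumption): doubles each sample and keeps the low 8 bits.
def good_dut(sample):
    return (sample * 2) & 0xFF


# Same block with a deliberate bug: the 8-bit wrap-around is missing.
def buggy_dut(sample):
    return sample * 2


stimulus = [1, 2, 200]
reference = [2, 4, 144]  # expected values for the 8-bit doubling block

good_result = run_vector_test(good_dut, stimulus, reference)
bad_result = run_vector_test(buggy_dut, stimulus, reference)
print(good_result["passed"], bad_result["mismatches"])
```

Reporting the mismatching indices, rather than a bare pass/fail flag, points the debug effort at the first sample where the hardware diverges from the standard-compliant reference.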

During the verification cycle, the test plan will go through several reviews and be approved prior to execution.

D. Setting The Test Priorities

After the test plan is completed, it goes through test prioritization. Setting the test priorities is never a simple task, and much depends on getting them right, since the priorities set the order of test plan execution and debug.

The priorities range from “must have” tests which are the highest priority tests to “nice to have” tests which will be written only if the schedule allows for it. We used the following guidelines to prioritize the tests.

The test is rendered “must have” if the following conditions are met:

1) The test is designated to check a crucial system scenario, process or feature.

2) This scenario must be hardware exclusive, i.e. if SW can replace the hardware in this scenario, or circumvent a bug in this hardware, this test will not be given the highest priority but one below it instead.

The test is categorized under “nice to have” if the test aims to check a non-crucial system scenario and this scenario can be executed in SW.

All the tests with priority above “nice to have” must be written and executed by order of priority, within the verification schedule.
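The guidelines above can be sketched as a small Python classifier. The sketch rests on several assumptions: the intermediate level is called `HIGH` here because the text does not name it, the `PlannedTest` fields are hypothetical, and a non-crucial but hardware-exclusive test, which the guidelines do not cover, is placed at the intermediate level.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    MUST_HAVE = 0     # named in the guidelines
    HIGH = 1          # the level "one below" must have; the name is an assumption
    NICE_TO_HAVE = 2  # named in the guidelines


@dataclass
class PlannedTest:
    name: str
    crucial: bool        # checks a crucial system scenario, process or feature
    sw_workaround: bool  # SW can replace the hardware or circumvent its bug


def classify(test):
    """Map a planned test to a priority following the two guideline conditions."""
    if test.crucial and not test.sw_workaround:
        return Priority.MUST_HAVE
    if not test.crucial and test.sw_workaround:
        return Priority.NICE_TO_HAVE
    # Crucial with a SW workaround is one level below must have (from the
    # guidelines). Non-crucial and hardware exclusive is not covered by the
    # guidelines; treating it as HIGH is an assumption of this sketch.
    return Priority.HIGH


def execution_order(tests):
    """Tests above nice to have, ordered by priority (stable within a level)."""
    mandatory = [t for t in tests if classify(t) < Priority.NICE_TO_HAVE]
    return sorted(mandatory, key=classify)


plan = [
    PlannedTest("dma_stress", crucial=False, sw_workaround=True),
    PlannedTest("crc_offload", crucial=True, sw_workaround=True),
    PlannedTest("phy_sync", crucial=True, sw_workaround=False),
]
for planned in execution_order(plan):
    print(planned.name, classify(planned).name)
```

Keeping the two guideline conditions as explicit fields makes each priority decision reviewable alongside the test plan itself.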

This method gives us the advantage of developing the tests in the right order: the most important and focused tests come first, so that we check those parts of the system’s hardware that cannot be worked around in the case of a silicon bug.

The resulting test suite (after test plan implementation) is very comprehensive, and covers the hardware not only in terms of the traditional code/functional metrics but also in terms of system coverage, which makes it more robust and more efficient both in test length and in test development effort.

E. Bottom Up Methodology

We decided to use a bottom up methodology for our verification environment. Therefore, we started by verifying a system block according to the test plan. We decided to use SystemC for our test-bench environment [3] as depicted in Figure 2 for the following reasons:

1) We could not afford to wait for system or chip level integration before we began the verification process. We wanted to start verifying the block immediately after (and sometimes even before) RTL0, our first RTL readiness milestone, was declared.

2) No less important than the first reason is an easy migration to system level (DSP level) tests. Since we write our tests in C++, we prepared several user functions for writing and reading system-block registers, and for driving and sampling interface signals. Any code that was related to DSP transactions was easily translated into real DSP code once we moved it into the integrated FLE sub-system verification environment.

3) We wanted the code that was used for verification to be as close as possible to the real operational FW code we would write for the DSP. Writing in C and C++ directly accomplished this goal, resulting in code that can later be used as the baseline for operational FW. We also used MATLAB vectors, which were validated against the standard using pre-validated vectors taken from 3GPP 3G compliant test equipment.
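The idea behind the register access functions in reason 2 can be sketched as follows. Python is used here only for brevity (our tests were written in C and C++ with SystemC), and every name, address and class in the sketch is hypothetical. The test body is written once against a small read/write interface, so moving it to another environment means swapping the object behind that interface.

```python
from typing import Protocol

MODE_ADDR = 0x04  # hypothetical register offsets
CTRL_ADDR = 0x00


class RegisterBus(Protocol):
    """What a test needs from its environment: register write and read."""

    def write(self, addr: int, value: int) -> None: ...

    def read(self, addr: int) -> int: ...


class BlockLevelBus:
    """Stand-in for the block-level test bench: registers held in a dict."""

    def __init__(self):
        self.regs = {}

    def write(self, addr, value):
        self.regs[addr] = value & 0xFFFFFFFF  # model 32-bit registers

    def read(self, addr):
        return self.regs.get(addr, 0)


class LoggingBus:
    """Stand-in for a second environment: forwards to another bus and logs."""

    def __init__(self, inner):
        self.inner = inner
        self.transactions = []

    def write(self, addr, value):
        self.inner.write(addr, value)
        self.transactions.append(("write", addr, value))

    def read(self, addr):
        value = self.inner.read(addr)
        self.transactions.append(("read", addr, value))
        return value


def check_mode_register(bus: RegisterBus) -> bool:
    """Test body written against the interface, not a specific environment."""
    bus.write(MODE_ADDR, 0x3)
    bus.write(CTRL_ADDR, 0x1)
    return bus.read(MODE_ADDR) == 0x3 and bus.read(CTRL_ADDR) == 0x1


block_env = BlockLevelBus()
second_env = LoggingBus(BlockLevelBus())
print(check_mode_register(block_env), check_mode_register(second_env))
```

The same `check_mode_register` body runs unchanged in both stand-in environments, which is the property that made the migration to DSP level tests easy.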

F. Integration Tests

As shown in Fig. 1, our SoC contains a few internal and proprietary interfaces which we needed to verify thoroughly. For interfaces between different system blocks inside the FLE sub-system, we used the same method as depicted in Figure 2, except that the two interfacing system blocks were wrapped together by a SystemC testbench. We tested these in the block level environment, and then easily ported the environment to system level in the same way used for a single system block.

The proprietary interfaces which cross the FLE sub-system boundaries, such as chip IO interfaces or interfaces between the FLE sub-system and the CPU sub-system, were verified directly in the full-chip level environment by writing dedicated tests which exhaustively check them.

G. Common Library

During RTL development, we used a rich set of common RTL library modules, which we developed in-house. Elements such as smart registers, up-dn counters, PN generators, barrel-shifters etc. were pre-designed and pre-verified on a stand-alone basis. During RTL coding, the use of this common library was mandatory for all designers. By pre-verifying the most common RTL building blocks, we limited the number of low-level hardware bugs. In addition, even if a bug was found in a low-level common module, it had to be fixed in only one place, the common library, in order to solve it across the entire design.
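As an illustration of stand-alone pre-verification, the Python sketch below models one kind of common module, a PN generator, and checks a property that can be verified in isolation. The choice of a 7-bit LFSR with taps at bits 7 and 6 (commonly called PRBS7) is an assumption made for the example; the polynomials used in our library are not described here.

```python
def prbs7_step(state):
    """Advance a 7-bit Fibonacci LFSR (taps at bits 7 and 6) by one bit."""
    new_bit = ((state >> 6) ^ (state >> 5)) & 1
    return ((state << 1) | new_bit) & 0x7F


def sequence_period(seed):
    """Count the steps until the LFSR state returns to a non-zero seed."""
    if not 0 < seed <= 0x7F:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = prbs7_step(seed)
    period = 1
    while state != seed:
        state = prbs7_step(state)
        period += 1
    return period


def visited_states(seed):
    """Collect every state reached in one full period, starting from the seed."""
    states = {seed}
    state = prbs7_step(seed)
    while state != seed:
        states.add(state)
        state = prbs7_step(state)
    return states


period = sequence_period(0x02)
print(period)
```

A maximal-length 7-bit generator must visit all 2^7 - 1 = 127 non-zero states before repeating, so a stand-alone check can compare `sequence_period` against 127 and confirm that the all-zero state never appears.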

H. Use Of Verified IPs

Our CPU sub-system was IP-core based. That is, we integrated the MIPS 24K processor core with pre-verified IP cores (the encryption blocks are the exception, as they were developed in-house), using standard industry bus architectures such as OCP/AHB/APB. This sub-system was easier to verify, since it resembled the “classical” system consisting of the integration of IP components on a standard bus architecture [4].

Our basic assumption for verification, which is true by definition, is that IP cores are pre-designed, pre-verified hardware blocks [5]. Our verification strategy in the CPU sub-system was therefore much less complicated than in the FLE sub-system case. We focused our efforts almost entirely on integration testing, verifying the inter-block integration, thus saving significant verification resources, time and effort.

V. SUMMARY AND CONCLUSIONS

The methodology we described in detail enabled us to efficiently verify our very complex SoC, which is composed of two major sub-systems and some complex proprietary interfaces. By utilizing this methodology, we succeeded in taping out our very complex SoC IC within one year of conception, and having it functionally correct the first time.

We attribute much of this success to our chosen verification methodology with emphasis on the following main aspects.

1) System-based verification, as opposed to functional or feature based verification. It is significant that the use of our system engineers as verification engineers forms the basis of our methodology. This is what makes the methodology truly practical and allows it to produce very efficient results.

2) Our test writing (as close to real FW as possible, and using standard based vectors as stimuli and reference) and test prioritization method, which enabled us to pin-point system bugs and detect them within our schedule.

3) The flexible SystemC based verification environment, which enabled a very easy and automatic migration from system-block level tests to sub-system and full-chip level environment.

REFERENCES

[1] Bacchini, Francine; Hu, Alan J.; Fitzpatrick, Tom; Ranjan, Rajeev; Lacey, David; Tan, Mercedes; Piziali, Andrew; Ziv, Avi, “Verification Coverage: When is Enough, Enough?,” Design Automation Conference, 2007. DAC '07. 44th ACM/IEEE, pp. 744 - 745.

[2] Bacchini, F.; Malik, S.; Bergeron, J.; Foster, H.; Piziali, A.; Mitra, R.S.; Ahlschlager, C.; Stein, D., “Building a verification test plan: trading brute force for finesse,” Design Automation Conference, 2006 43rd ACM/IEEE, pp. 805-806.

[3] Yarom, I.; Glasser, G., “SystemC opportunities in chip design flow,” in Proceedings of the 2004 11th IEEE International Conference on Electronics, Circuits and Systems, December 13-15, 2004, pp. 507-510.

[4] Deshpande, A., “Verification of IP-Core Based SoC’s,” in 9th International Symposium on Quality Electronic Design, March 17-19, 2008, pp. 433-436.

[5] Mosensoson, G., “Practical Approaches to SOC Verification,” Verisity Design, Inc.

About the Author:

Guy Regev is a co-founder and Managing Partner of AlephZero Consulting. He is a veteran of the IC design industry, with over 20 years of both management and hands-on expertise across all technical disciplines of chip and FPGA design. He has extensive experience managing cross-functional, international HW/SW/FW projects, as well as hands-on experience in all aspects of chip and FPGA design and productization, with a proven track record of successful, time-crunched tape-outs and market introductions of flagship products. He is also an Expert Witness for cases that involve hardware, chip, IC design, SoC design or FPGA design, as well as EDA tools, software, and embedded firmware. More about him at: https://www.guyregev.com/
