Reduction in ATE Test Time for Core-Wrapped Blocks by Avoiding Q->SI At-Speed Timing

Guy Regev
May 1, 2019

Updated: May 8, 2019

Sreenath Mandagani, Guy Regev

CSIG/NPG/CID/Wireless Access, Intel

December, 2016

What is the problem being solved by this work, or future challenge addressed?

Increasing design sizes and scan shift power limits are forcing designers to test physical blocks separately on ATE. Current DFT insertion tools offer a core wrapping feature that is used to wrap physical blocks in order to a) test them stand-alone and b) enable testing of faults between blocks. This feature is also used to avoid interface X sources. The main issue with current solutions is that designers must close at-speed timing on the Q->SI paths of all the wrapper registers to avoid X sources. This issue pertains to all projects whose blocks are core wrapped using current DFT insertion tools. Closing at-speed timing on Q->SI paths where the registers are physically far apart in the block can over-constrain a high-speed design and hurt the real functional timing arcs. Moreover, a Q->SI timing arc between two different synchronous clock domains is difficult to time. Not closing timing for the Q->SI paths creates X sources, which increase ATE vector count, test time and cost. Our main challenge was to implement an automatic workaround that current DFT insertion tools support, without disturbing any existing scan insertion flow.

What specifically is new in this work?

We present an innovative solution to this problem: an automatic flow that generates a customized structure of shared wrapper cells for the core wrapping solution. Current DFT insertion tools are also discussed in comparison to our solution.

Abstract

Current solutions have two built-in cell types that are used to wrap the ports of physical blocks. A “dedicated wrapper cell” is generally used to register slow input/output ports: the functional timing path is broken by a newly created register cell, which is then used to test the faults. A “shared wrapper cell” is used when the path must be tested at real speed: the existing functional registers are reused in the wrapper chains as shared wrapper cells.

Current tool Implementation of Shared Wrapper Cells

There is an enhanced core wrapping feature that reuses functional registers as shared wrapper cells, based on user-controlled parameters such as the combinational depth limit and the maximum fanout limit from the ports. This feature is also a must for SLOS designs. Functional registers that communicate with input ports are stitched into input wrapper chains with IWRAP_SE as scan enable, and functional registers that communicate with output ports are stitched into output wrapper chains with OWRAP_SE as scan enable.

Figure 1 - Current Implementation of Shared Wrapper Cells. (If IWRAP_SE is high, we have to close the unnecessary Q->SI path at high speed to avoid X sources)

Fig 1 shows the structure of shared wrapper cells implemented by the tool. Here we can see that when the block is in INTEST mode (the test mode targeting faults inside the block), we have to keep IWRAP_SE at 1’b1 during capture to avoid X sources from the ports. This creates a Q->SI timing arc that needs to be timed and closed at functional speed. For stuck-at testing (usually relatively slow), the capture Q->SI timing arc is similar to the shift Q->SI arc in most cases, so it is not an issue.

But for transition faults, Q->SI timing closure for a high-speed clock can be a big issue. Closing these non-functional but full-speed paths can prove difficult, especially if the chain registers are physically far apart for functional reasons, and touching or moving them for test purposes might hurt functional timing. If we instead mask the flops for transition testing or allow X sources into the system, we increase the pattern count and lose coverage, and the intended benefit of the core wrapping solution is not fully realized.
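The trade-off described above can be made concrete with a small three-valued behavioral model. The sketch below is an illustration only, not the tool's cell: the function name, the `q_si_timed_at_speed` flag and the string `"X"` for an unknown value are assumptions introduced here. It models what a default shared input wrapper cell captures on the at-speed capture edge in INTEST mode.

```python
X = "X"  # assumption: marker for an unknown value in three-valued simulation


def default_cell_capture(d_from_port, prev_q, iwrap_se, q_si_timed_at_speed):
    """Value a default shared input wrapper cell captures at speed (simplified)."""
    if iwrap_se == 1:
        # Scan side selected: the flop samples the previous cell's Q through SI.
        # If that Q->SI path was never closed at speed, the sample is unreliable.
        return prev_q if q_si_timed_at_speed else X
    # Functional side selected: in INTEST the port is not controlled, so D is unknown.
    return d_from_port


if __name__ == "__main__":
    # INTEST capture with an uncontrolled port (D = X) and the previous cell holding 1.
    print(default_cell_capture(X, 1, iwrap_se=1, q_si_timed_at_speed=True))
    print(default_cell_capture(X, 1, iwrap_se=1, q_si_timed_at_speed=False))
    print(default_cell_capture(X, 1, iwrap_se=0, q_si_timed_at_speed=True))
```

In this model only the first case yields a known value, and it is exactly the case that requires closing the Q->SI arc at functional speed.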

Custom Structure for Shared Wrapper Cells

Fig 2 shows the custom implementation of the shared wrapper cells. An additional mux in the SI path breaks the Q->SI timing arc and loops back the existing (previous) value of the flop in transition fault testing scenarios where X sources need to be avoided. The scan enables are used like a regular scan enable: they are 1’b0 in capture when we want to avoid the Q->SI at-speed arc. Depending on whether the design uses the synchronous launch-on-shift (SLOS) or launch-on-capture (LOC) method of at-speed testing, we can implement the changes next to each shared wrapper cell (SLOS), or use a single circuit for the whole wrapper chain and save gate count (LOC).
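A behavioral sketch of the hold idea follows. Fig 2 is not reproduced here, so the exact gating is an assumption: the model derives a hypothetical `hold` condition from `intest_mode` and `iwrap_se`, uses it as the select of the added SI mux, and also lets it steer the flop to its SI side. All names are illustrative.

```python
X = "X"  # assumption: marker for an unknown value


def custom_cell_next_q(own_q, prev_q, d, iwrap_se, intest_mode):
    """Next state of one shared input wrapper cell with the added SI-path mux."""
    # Hypothetical hold condition: a capture cycle (scan enable low) in INTEST.
    hold = bool(intest_mode) and not iwrap_se
    # Added mux in the SI path: loop back the cell's own Q instead of the
    # previous cell's Q, so no cell-to-cell Q->SI arc is exercised at speed.
    si = own_q if hold else prev_q
    # The flop takes its SI side while shifting or holding, otherwise D.
    return si if (iwrap_se or hold) else d


def clock_chain(chain, scan_in, port_values, iwrap_se, intest_mode):
    """Apply one clock edge to a whole input wrapper chain (index 0 is nearest scan-in)."""
    prev_qs = [scan_in] + chain[:-1]
    return [
        custom_cell_next_q(q, pq, d, iwrap_se, intest_mode)
        for q, pq, d in zip(chain, prev_qs, port_values)
    ]


if __name__ == "__main__":
    loaded = [1, 0, 1, 1]
    unknown_ports = [X] * 4
    # INTEST capture: every cell keeps its loaded value, no X enters the chain.
    print(clock_chain(loaded, 0, unknown_ports, iwrap_se=0, intest_mode=1))
    # Shift cycle: values move one position toward scan-out.
    print(clock_chain(loaded, 0, unknown_ports, iwrap_se=1, intest_mode=1))
```

The only at-speed path into SI during capture is the cell's own Q through the mux, which is local to the cell and does not depend on how far apart the chain registers are placed.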

There are a couple of ways to implement the custom shared wrapper cell discussed above:

1. By defining a custom cell to replace the default shared wrapper cell implemented by the tool. This method runs into a tool limitation, since the tool won’t allow the cell swap feature to be combined with its max_reuse feature. We are working with the tool vendors to incorporate the change in the next tool revisions.

Fig3 shows the structure and the commands for defining this.

2. To work around the tool limitation mentioned above, we developed the automatic insertion flow described below:

Figure 2 - Custom Modification of Shared Wrapper Cells. (Shared wrapper cell holds value avoiding both, need for closing Q-> SI at-speed timing arc & avoid X sources)

Figure 3 - Optimized Custom Shared wrapper cell. (Default structure used for SLOS designs from TCL script)

TCL Script Based Automated Solution to avoid Q->SI path:

A TCL procedure script was developed and is sourced after the execution of the <insert_dft> command in DFT Compiler. The rest of the existing scan insertion flow (scripts) needs no change for our solution to work. The TCL procedure automatically collects the shared wrapper cell information in the design and implements the fix shown in Fig 2, using the user-provided INTEST and EXTEST enable pin paths to make all the necessary connections. All new cells are implemented in the same hierarchy as the shared wrapper cell, which makes it easier to close timing. The scandef will automatically contain (“remember”) the new mux structure and can be used for scan chain reordering with no problems. The TCL script can support any process technology. In designs that have an existing false path to the SI pin of the shared wrapper cells in functional timing mode, the constraints are modified to enable the new loop-back timing arc and, in SLOS-based designs, the path from the scan enable pipeline register. The TCL script takes care of these requirements automatically with the modify_si_timing switch.

Usage: q2sifix # Fix Q->SI transition path for core wrapper (border sealed) block

-intest_enable {Required INTEST_MODE Enable Internal Pin/Port}

-extest_enable {Required EXTEST_MODE Enable Internal Pin/Port}

-test_mode {Optional test mode name where wrapper chains are active. Default: Internal_scan}

-modify_si_timing {Optional enable/disable where SI timing constraints are modified automatically. Default: disable}
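The collection and rewiring steps described above can be sketched on a toy netlist. This is not the authors' TCL procedure and it does not call any DFT Compiler command. The `Cell` class, the `_q2si_mux` suffix, the pin names and the single `hold_net` are all hypothetical. In particular, how the INTEST and EXTEST enables combine into the mux select is not stated in the text and is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    name: str                      # full hierarchical name, "/" separated
    kind: str                      # "flop" or "mux"
    pins: dict = field(default_factory=dict)  # input pin name -> driving net
    shared_wrapper: bool = False   # True for flops reused as shared wrapper cells


def insert_hold_muxes(cells, hold_net):
    """Put a 2:1 mux in front of SI of every shared wrapper cell; return the new muxes."""
    new_muxes = []
    for cell in cells:
        if not cell.shared_wrapper:
            continue
        # Name the mux next to the flop so it lands in the same hierarchy.
        mux = Cell(
            name=f"{cell.name}_q2si_mux",
            kind="mux",
            pins={
                "D0": cell.pins["SI"],    # original scan path from the previous cell
                "D1": f"{cell.name}/Q",   # loop-back of the cell's own output
                "S": hold_net,            # hypothetical hold control net
            },
        )
        cell.pins["SI"] = f"{mux.name}/Z"  # SI now comes from the mux output
        new_muxes.append(mux)
    cells.extend(new_muxes)
    return new_muxes


if __name__ == "__main__":
    design = [
        Cell("u_rx/data_reg_0", "flop", {"SI": "scan_in"}, shared_wrapper=True),
        Cell("u_rx/data_reg_1", "flop", {"SI": "u_rx/data_reg_0/Q"}, shared_wrapper=True),
        Cell("u_core/state_reg", "flop", {"SI": "u_rx/data_reg_1/Q"}),
    ]
    for mux in insert_hold_muxes(design, hold_net="u_dft/q2si_hold"):
        print(mux.name, mux.pins)
```

Keeping each mux beside its flop mirrors the point made in the text: the loop-back path stays short and local, so it is easy to time.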

Results

This new flow has already been tested using DFT Compiler, and we have successfully avoided Q->SI paths for at-speed testing. We see a major reduction in at-speed pattern count, and also a slight improvement in coverage, compared to the same design where X sources were allowed by opting not to close the Q->SI at-speed timing arc. In designs with scan compression, X sources reduce observability and increase the ATE vector count. We have seen a decrease of about 10% to 30% in pattern count, varying between designs, when using the custom shared wrapper cells. There is also an improvement in test coverage of about 0.05% to 0.15%. The pattern count reduction can be even higher if we forgo this additional coverage improvement and compare at the baseline coverage.
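To give a feel for what a 10% to 30% pattern reduction means on the tester, the sketch below applies a common first-order approximation of scan test length: each pattern costs one chain-length shift plus a few capture cycles, with one extra unload at the end. The baseline pattern count and chain length are hypothetical values chosen for illustration, not results from this work.

```python
def reduced_pattern_count(baseline_patterns, reduction_pct):
    """Pattern count after an integer-percent reduction."""
    return baseline_patterns * (100 - reduction_pct) // 100


def scan_test_cycles(patterns, max_chain_length, capture_cycles=2):
    """Approximate tester cycles: per pattern one chain-length shift plus capture,
    and one extra unload for the last response."""
    return patterns * (max_chain_length + capture_cycles) + max_chain_length


if __name__ == "__main__":
    baseline = 10_000      # hypothetical pattern count, not a measured result
    chain_length = 200     # hypothetical longest scan chain, in flops
    for pct in (10, 30):   # the range reported in the text
        patterns = reduced_pattern_count(baseline, pct)
        saved = scan_test_cycles(baseline, chain_length) - scan_test_cycles(patterns, chain_length)
        print(pct, patterns, saved)
```

Under this approximation tester cycles scale almost linearly with pattern count, so the pattern reduction translates nearly one-to-one into at-speed test time.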

The following table shows block coverage and pattern numbers as a comparison between the schemes.

The idea we are proposing is a redesign of the core wrapping shared cell in order to avoid the “parasitic” non-functional at-speed paths while not creating any X sources, thus keeping the pattern count as small as possible. On top of redesigning the wrapper cell, we have also implemented a flow that inserts these cells automatically in any design, with almost no interference with the current project scan insertion flow/scripts.

Results: We are seeing about a 10-30% reduction in at-speed pattern count, and also a slight improvement in coverage of about 0.05%-0.15%, compared to the design where we allowed X sources by opting not to close the Q->SI at-speed timing arc. Since the overall test content of any product is heavily dominated by at-speed content, the percentage savings in final test time may be higher.

This issue pertains to all projects whose blocks are core wrapped (border sealed) using current DFT insertion tools. Q->SI timing closure where the registers are physically far apart in the block can over-constrain a high-speed design and hurt the real functional timing arcs. Moreover, a Q->SI timing arc between two different synchronous clock domains is difficult to time properly. Not closing timing for the Q->SI paths creates X sources, which increase ATE vector count, test time and cost.

Conclusion

To summarize, we present a design and an automatic flow that solve a limitation in current DFT insertion tools. The limitation is that the implementation chosen for core wrapping (border sealing) introduces a “parasitic”, undesirable, non-functional at-speed timing path that, if left unclosed by the designer, hurts test coverage and pattern count.

By designing a new core-wrapping cell to be used instead of the DFT Compiler shared wrapper cell, we have eliminated the unwanted non-functional at-speed Q->SI paths. This improves two things: 1. it eliminates the need to close at-speed timing for the Q->SI paths, which lengthens project schedules and adds cost, and 2. it removes the penalty otherwise paid when timing for these paths is not closed: an increased pattern count and therefore more tester time and memory.

To close, we would like projects to consider our solution since it 1) is automatic, 2) is easy to implement and 3) reduces pattern count and tester time when compared to the same scan insertion flow without our solution. The solution is proven and is currently used by two projects.

About the Author:

Guy Regev is a co-founder and Managing Partner of AlephZero Consulting. He is a veteran of the IC design industry, with over 20 years of both management and hands-on expertise across all technical disciplines of chip and FPGA design. He has extensive experience managing cross-functional HW/SW/FW international projects, as well as hands-on experience in all aspects of chip/FPGA design and productization, with a proven track record of successful, time-crunched tape-outs and market introductions of flagship products. He is also an Expert Witness for cases that involve hardware, chip, IC design, SoC design or FPGA design, as well as EDA tools, software, and embedded firmware. More about him at: https://www.guyregev.com/
