Fault Localization

What is Fault Localization?

Fault localization is the process of tracing back signals through an integrated circuit to locate the first failing node. This process can be performed using either mechanical probing or electron beam probing. The task of fault localization is heavily dependent on the design of the IC and usually requires individuals knowledgeable about the design of the IC and the test patterns needed to stimulate the IC. Although some tools, such as Schlumberger's Diagnostic Assistant, exist to help automate this process, fault localization is usually a manual task with highly complex ICs.

Proper design and test considerations up front can help reduce the complexity of this task. Techniques such as scan design, IDDQ testing, and structured design and test principles can eliminate many hours of fault isolation later on during design validation and qualification.

Why Perform Fault Localization?

Fault localization is necessary to determine the location of the defect. In most cases, the location of the defect is necessary to determine the root cause of failure. It is worth noting, however, that fault localization may not be necessary to give a satisfactory response to the requester. Many times, the requester is satisfied with a determination as to whether the defect is a wafer fabrication problem, a packaging problem, a testing problem, or an end use problem such as EOS/ESD.

How is Fault Localization Performed?

Once the IC has been characterized electrically and the packaging material has been removed, the signal should be traced from the failing node back into the IC. At some point, the gate causing the incorrect output can be found. While this sounds simple, the complexity of modern ICs and the high fan-outs found on many nodes can make this process tedious very quickly. Non-contact methods, such as voltage contrast, are better suited to complicated ICs, but if they are not available, mechanical signal tracing is a very inexpensive alternative.

After the IC is placed in the failing electrical state and contact can be made to the metal or poly layer of interest, the procedure itself is straightforward. Start from the failing output and trace the signals backward until a node is found in the wrong state. A common way to determine if a node is in the wrong state is to probe the same node on a comparison IC that is in the same electrical state. If possible, placing the two ICs side by side on the same probe station will facilitate this process. Care must be taken at all times not to damage the IC with the probes.
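The backward trace described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fan-in map, the node names, and the two probe functions are hypothetical stand-ins for the schematic and for readings taken on the failing IC and the comparison IC. The sketch starts at the failing output and follows only those drivers that disagree with the comparison IC; a node whose drivers all agree is reported as a first failing node.

```python
from collections import deque

def trace_back(fanin, failing_output, probe_suspect, probe_reference):
    """Return nodes that mismatch the reference while all their drivers match.

    fanin           -- dict: node name -> list of nodes driving it (hypothetical netlist)
    failing_output  -- node already known to be in the wrong state
    probe_suspect   -- callable returning the state probed on the failing IC
    probe_reference -- callable returning the state probed on the comparison IC
    """
    origins = []
    seen = {failing_output}
    queue = deque([failing_output])
    while queue:
        node = queue.popleft()
        # Probe each driver on both ICs and keep the ones that disagree.
        bad_drivers = [d for d in fanin.get(node, [])
                       if probe_suspect(d) != probe_reference(d)]
        if not bad_drivers:
            # All inputs are correct but this node is wrong: first failing node.
            origins.append(node)
        for d in bad_drivers:
            if d not in seen:
                seen.add(d)
                queue.append(d)
    return origins

# Hypothetical example circuit and probe readings.
fanin = {
    "OUT": ["n3", "n4"],
    "n3": ["n1", "n2"],
    "n4": ["n2", "IN_C"],
    "n1": ["IN_A"],
    "n2": ["IN_A", "IN_B"],
}
reference = {"OUT": 1, "n3": 1, "n4": 0, "n1": 1, "n2": 0,
             "IN_A": 1, "IN_B": 0, "IN_C": 1}
suspect = dict(reference, OUT=0, n3=0, n1=0)  # assume n1 is stuck low

result = trace_back(fanin, "OUT", suspect.get, reference.get)
print(result)
```

Because only mismatching branches are followed, nodes with high fan-in but correct states are never probed further, which is the main saving when working by hand.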

The most cumbersome part of this process is dealing with complicated branching that takes place on many circuits. Remember that these ICs were not laid out for ease of signal tracing for failure analysis.

General Guidelines

  1. Utilize the IC designer or design team if available. These individuals will have the best understanding of how the circuit works and can help you choose probe points that will more efficiently isolate the failing circuitry.
  2. Utilize the test engineer or test engineering team if available. These individuals can most likely make modifications to the test program to help you isolate the failure. Most complex circuits are tested in a hierarchical fashion. Specific test routines should test certain blocks of the IC. The test engineer may be able to help you isolate the failure to a specific block or further by modifying the test program.
  3. Have an analyst who is familiar with this product or product line help you. Even if he or she has only limited experience with the product (i.e. has done it once before) that experience may still be better than nothing.

Specific Guidelines

  1. If your company has guidelines developed for troubleshooting certain ICs or classes of ICs, please refer to the guidelines. Better yet, incorporate those guidelines into this help document to provide a single source of information for the analyst. If your guidelines exist only in one or two analysts' minds, have them write down the guidelines for incorporation into this help document. You don't want to lose that information to retirement, transfer, or other calamity.
  2. The guidelines below assume that you can stimulate the IC with assembly code.

     For RISC (Reduced Instruction Set Computing) processors: Probe the address strobes to determine whether the failure is a data failure or a control failure. If the failure is a data failure, divide and conquer using a binary search strategy on the line. If the failure is a control failure, probe the status bits for clues as to the location of the failure. Use the status bits as a starting point for a backwards trace.

     For CISC (Complex Instruction Set Computing) processors: Probe the address strobe. Next, probe the micro address bus lines one at a time. Probe the clocks on the IC and group the signals together in time to evaluate them. Look at the micro address decode boxes to determine the transfer gates to probe. Probe the obvious failure point first (this is probably the bond pad). Next, probe the transfer gates, using layout software to navigate to the appropriate transfer gate if applicable. Start with the micro address box right before the failure is visible from the pins and work backwards through the vector set.

     For memory ICs such as RAMs, ROMs, EPROMs, EEPROMs, etc.: Obtain the physical layout to electrical data diagram from the designer or product engineer. Next, using the test data results, map the electrical failures into the physical address space. This should allow you to determine if the failure is a single bit, single row, single column, multiple columns, or a block failure. You should also be able to determine if the failure is potentially in the address control logic. Once you know where to look on the IC physically, you can begin examining waveforms using an electron beam probe or mechanical probing. The following is an example of this process.

SN746 failed functional testing with the following pattern. Data at locations (addresses) 0, 192, 256, 704, 768, 1216, 1280, etc. were failing when checkerboard, inverse checkerboard, and unique address patterns were written and then read. The failures occurred across all eight bits of the data byte in a random fashion. The failure mode indicated that incorrect data were being read from the IC, suggesting a possible problem with addressing.

After exhausting the rapid localization techniques, we then moved to the electron beam probe to begin troubleshooting the three ICs. An ATE provided stimulus to the IC in the electron beam probe. The diagram below, which describes the physical layout of the memory array, facilitates the following discussion (see Figure 1). The EEPROM uses X and Y decode signals to select rows. The X decode is a set of eight signals based on the values of address lines 6 through 8. The eight Y decode signals are based on the values of address lines 9 through 11. For example, if A8-A6 has the value 011, then the X3 signal is high (active); if A8-A6 has the value 101, then the X5 signal is active, and so on. The Y signals work in the same fashion as the X signals.

The failing addresses on SN746 (0, 192, 256, 704, 768, etc.) produce the binary sequence shown in the table below. The address lines of interest are A6-A8 (the X decode signals). In particular, when the X group of addresses switches from X2 to X3 or from X3 to X4, the data on the first address is incorrect. This would indicate a delay on signal line X3. When signal line X3 was examined in the electron beam probe, a delay was observed after a particular metal 1 to metal 2 contact just before the signal line goes out to the memory array (see Figure 2).

Bit sequence for SN746

              ----- Y -----  ----- X -----
Address  A12  A11  A10  A09  A08  A07  A06  A05  A04  A03  A02  A01  A00
      0    0    0    0    0    0    0    0    0    0    0    0    0    0
    192    0    0    0    0    0    1    1    0    0    0    0    0    0
    256    0    0    0    0    1    0    0    0    0    0    0    0    0
    704    0    0    0    1    0    1    1    0    0    0    0    0    0
    768    0    0    0    1    1    0    0    0    0    0    0    0    0
   1216    0    0    1    0    0    1    1    0    0    0    0    0    0
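The mapping from failing addresses to decode lines can be checked with a short script. The following is a minimal sketch that assumes only what the text states: A6 through A8 select the X line and A9 through A11 select the Y line. The function and variable names are hypothetical and chosen for illustration.

```python
# Failing addresses taken from the bit sequence table for SN746.
FAILING_ADDRESSES = [0, 192, 256, 704, 768, 1216]

def field(address, low_bit, width=3):
    """Extract `width` address bits starting at bit position `low_bit`."""
    return (address >> low_bit) & ((1 << width) - 1)

def decode(address):
    """Split an address into X select (A8-A6), Y select (A11-A9), and A5-A0."""
    return {
        "X": field(address, 6),
        "Y": field(address, 9),
        "low": address & 0x3F,
    }

def bit_row(address, n_bits=13):
    """Render the address as space-separated bits, A12 first and A00 last."""
    return " ".join(format(address, f"0{n_bits}b"))

rows = [(a, decode(a)) for a in FAILING_ADDRESSES]
for a, d in rows:
    print(f"{a:>5}  {bit_row(a)}  X{d['X']} Y{d['Y']}")

x_lines = [d["X"] for _, d in rows]
y_lines = [d["Y"] for _, d in rows]
low_bits = [d["low"] for _, d in rows]
```

For the six addresses in the table, A5 through A0 are always zero, so each failure falls on the first address of an X group. Apart from address 0, the X select alternates between X3 and X4 while the Y select steps through Y0, Y1, and Y2, which points at the X decode lines rather than at individual memory cells.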

When is Fault Localization Performed?

Fault localization is usually performed after faster isolation techniques have been tried. You should first use techniques such as light emission, liquid crystal, fluorescent microthermographic imaging, CIVA, and LIVA before performing fault localization. In addition, before setting up for a manual fault isolation effort, check whether there are any scan or IDDQ based techniques you can try first.

Fault localization using an e-beam probe should be performed before the top glass layer is removed. Fault localization using mechanical probes should be performed after the top glass layer is removed.

Figures

Figure 1 - Electron beam tester waveforms showing the internal node signals related to the fault.

References on Fault Localization

  1. S. L. Vaughan, "A Failure Analysis Approach to Fault Isolation on Complex Microprocessors and Microcomputers," Proc. Int. Symp. for Test. and Fail. Anal., 1991, pp. 145-155.
  2. R. P. Kunda, "Fault Localization in Full-Scan Designs," Proc. Int. Symp. for Test. and Fail. Anal., Nov. 1993, pp. 121-127. (More theoretical)
  3. L. Doucet, G. Billus, D. DePaolis, A. Mehta, D. Robertson, A. Malamy, M. Murphy, S. Rathman, S. Perrin, T. Shay, "Fault Isolation for Failure Analysis Using JTAG Scan," Proc. Int. Symp. for Test. and Fail. Anal., Nov. 1993, pp. 113-119. (Describes basic operation of a scan test tool for the SUN Microsystems SuperSparc processor)
  4. T. Gheewala, M. Hardikar, B. Hunsiker, R. Mehta, B. Murray, B. Plata, "Built-In Diagnostics," Proc. Int. Symp. for Test. and Fail. Anal., Nov. 1993, pp. 109-112. (Array Test technology designs)
  5. R. C. Aitken and P. C. Maxwell, "Design for Diagnosis with IDDQ Measurements," Proc. Int. Symp. for Test. and Fail. Anal., Nov. 1993, pp. 151-154. (IDDQ based fault localization)