# RADIATION-INDUCED SOFT ERROR HARDENED LATCH DESIGN TECHNIQUES FOR RELIABILITY AND ENERGY-EFFICIENCY IMPROVEMENTS

信頼性および電力効率向上のための放射線起因ソフトエラー耐性ラッチ回路設計に関する研究

February 2020

Saki TAJIMA

田島 咲季


Waseda University

Graduate School of Fundamental Science and Engineering

Department of Electronic and Physical Systems

Research on Integrated System Design


## Abstract

With complementary metal-oxide semiconductor (CMOS) technology shrinking and power supply voltages dropping greatly, reliability has become more critical than ever before. In general, reliability failures include systematic issues such as process/voltage/temperature (PVT) variations and aging effects (e.g., hot carrier injection (HCI) and bias temperature instability (BTI)), and random failures such as soft errors caused by radiation. Among these reliability failures, radiation-induced soft errors are becoming one of the most critical concerns in state-of-the-art very large-scale integration (VLSI) designs. A soft error is a temporary error caused by radiation particles, such as alpha particles and high-energy neutrons, striking a circuit. When a radiation particle strikes a sensitive node of a circuit, electron-hole pairs are generated, swept across the diffusion junction, and collected at the diffusion of a PMOS or NMOS transistor. If the collected charge exceeds the critical charge, the affected node temporarily flips: charge collected at a PMOS diffusion pulls the node from low to high, while charge collected at an NMOS diffusion pulls it from high to low. As process technology continues scaling down, the critical charge of internal nodes decreases along with node capacitance, which increases the soft error rate (SER) of modern integrated circuits (ICs), and charge sharing can cause multiple-node upsets (MNUs). Furthermore, soft errors generally occur in the field and, because they are temporary, can be recovered automatically given additional time or through reset operations. However, their unpredictable occurrence might lead to severe system failures in critical designs such as medical devices, aircraft and high-performance supercomputers. Consequently, radiation-induced soft error hardened design techniques have become essential to guarantee system reliability.

In the literature, several design techniques such as hardware redundancy methods (e.g., triple modular redundancy (TMR)), error-correcting code (ECC)-based memory, soft error aware physical designs, soft error tolerant designs, and error-detection-based methods have been proposed to guarantee the reliability of ICs. Since most existing works rely on hardware redundancy for error recovery or error tolerance, their soft error resilience usually comes at the price of increased area, power, and/or delay, which poses a challenge for next-generation energy-efficient MNU-tolerant designs. In addition, a soft error occurring in a latch can upset the latch output, and the erroneous value may propagate through the succeeding combinational logic and be captured by the next-stage storage element; however, this issue was often neglected in previous works. Therefore, it is desirable to develop architecture-level reliable radiation-hardened latch designs.

In this dissertation, three soft error hardened latch designs are proposed to improve architecture-level reliability and energy efficiency. The first is a low-cost soft error hardened latch (SHC) design using a novel Schmitt-trigger-based C-element (STC), which features small area overhead and low power consumption for single node upset (SNU) tolerance. Unlike state-of-the-art soft error tolerant latches, which are usually based on hardware redundancy or transistor upsizing, the proposed latch achieves soft error tolerance through double-sampling and node-checking using the STC.

The second latch design is a power-efficient single node upset hardened latch with in-situ error detection capability (EDSL), which combines SNU self-recovery ability with in-situ error detection capability to improve reliability against soft errors. Not only can the proposed EDSL recover from any incurred single event upset, but it can also provide in-situ error detection when the latch output is upset. The EDSL design consists of an error-tolerant part and an error detector part that detects any output transition caused by a soft error.

The third is an output transition detector-based radiation-hardened latch (TDRHL) that improves the power efficiency and reliability of critical designs against both single- and multiple-node upsets. The proposed TDRHL design contains a baseline latch, an error recovery assistant logic (ERAL) and an output transition detector (QTD). The ERAL is optimized from the existing single-event-induced-double-node-upset-tolerant latch (SEID) for performance improvement and power reduction, and the QTD is proposed to provide architecture-level recovery capability. TDRHL can 1) recover from any SNU or DNU to its correct state, and 2) generate a warning signal for architecture-level recovery only when the latch output is flipped. It should be noted that TDRHL is the only latch that can both recover from any SNU or DNU and provide architecture-level resiliency. Moreover, the power-delay product (PDP) results clearly illustrate the power efficiency of TDRHL.

In summary, three radiation-induced soft error hardened latch designs are proposed in this dissertation for reliability and energy-efficiency improvements; they can be viewed as i) an SNU hardened design, ii) an SNU hardened and detection-based design, and iii) an MNU hardened and detection-based design, respectively. As process technology continues scaling down, radiation-induced soft errors are becoming one of the most critical concerns in state-of-the-art IC designs. Because of the transient nature of soft errors, architecture-level radiation-hardened design techniques should be developed to guarantee system reliability. The work presented in this dissertation can therefore be extended by further improving the proposed latch designs for architecture-level protection of critical systems. In addition, during the course of this dissertation, a growing number of research works on next-generation non-volatile memories (NVMs) such as phase change memory (PCM) and spin transfer torque RAM (STT-RAM) have been published, and these memories are expected to gain increasing attention in the coming intelligent era. Therefore, another future research direction is to develop architecture-level soft-error tolerant NVMs to improve the reliability of future memory systems. The work described in Chapters 4 and 5 of this dissertation will serve as the starting point, by extending the latch design techniques to memory cores.

## Acknowledgments

I would like to express my profound gratitude and appreciation to my advisor, Professor Youhua Shi, for his constant guidance, support, encouragement and deep understanding throughout my years at Waseda University. I could not have written this thesis or continued my research without him. He models many of the high-quality characteristics that I aspire to emulate in my professional and personal life. Working with him has been and will continue to be a source of honor and pride for me.

I would like to express my sincere gratitude to Professor Masao Yanagisawa for being my associate advisor and serving on my dissertation reading committee. Without him, I could not have taken on the challenge of entering the doctoral program at Waseda University. I am very grateful for his invaluable help with the preparation of this dissertation and several papers since I was an undergraduate.

I would also like to thank Professor Takashi Tanii for serving on the reading committee of my dissertation. It will always be a source of honor for me to have the names of these world-class professors on my dissertation. I also express my greatest appreciation to all professors of the Department of Electronic and Physical Systems, and I thank Professor Shinji Kimura and Professor Toshihiko Yoshimasu for their valuable advice.

I would also like to express my appreciation to Professor Nozomu Togawa for being my associate advisor during my master's course.

I am greatly thankful to all of the students and colleagues at the Information System Lab.

Finally, I deeply appreciate my family and friends for their many years of support.

## List of Acronyms

| Acronym | Definition |
|---------|------------|
| BTI | Bias Temperature Instability |
| CMOS | Complementary Metal-Oxide Semiconductor |
| DEM | Double Exponential Model |
| DNU | Double-Node Upset |
| ECC | Error Correcting Code |
| ERAL | Error Recovery Assistant Logic |
| FERST | Feedback Redundant SEU-tolerant |
| FIT | Failure in Time |
| HCI | Hot Carrier Injection |
| HT | Hardware Trojan |
| IC | Integrated Circuit |
| MCU | Multiple Cell Upset |
| MNU | Multiple-Node Upset |
| MOSFET | Metal-Oxide-Semiconductor Field Effect Transistor |
| MTBF | Mean Time Between Failures |
| NBTI | Negative-Bias Temperature Instability |
| NMOS | n-channel MOSFET |
| NVM | Non-Volatile Memory |
| PCM | Phase Change Memory |
| PDP | Power-Delay Product |
| PMOS | p-channel MOSFET |
| PTM | Predictive Technology Model |
| PVT | Process/Voltage/Temperature |
| QTD | Output Transition Detector |
| SEID | Single Event Induced Double Node Upset Tolerant Latch |
| SER | Soft Error Rate |
| SET | Single Event Transient |
| SEU | Single Event Upset |
| SHC | Soft Error Hardened Latch |
| SNU | Single Node Upset |
| STC | Schmitt-trigger-based C-element |
| STT-RAM | Spin Transfer Torque RAM |
| TDRHL | Transition Detector-based Radiation-Hardened Latch |
| TFH | Transient Fault Hardened |
| TMR | Triple Modular Redundancy |
| TNU | Triple-Node Upset |
| VLSI | Very Large-Scale Integration |

## Contents

- Abstract
- Acknowledgments
- List of Acronyms
- 1 Introduction
  - 1.1 Motivation
  - 1.2 Contributions
  - 1.3 Dissertation Organization
- 2 Soft Errors and Existing Hardened Design Techniques
  - 2.1 Reliability Issues
  - 2.2 Radiation-induced Soft Errors
  - 2.3 Soft Error Hardened Design Techniques
    - 2.3.1 SNUs hardened methods
    - 2.3.2 DNUs hardened methods
    - 2.3.3 Error detection methods
  - 2.4 Chapter Conclusion
- 3 SHC: SNU Hardened Latch Design
  - 3.1 SHC Latch Design
  - 3.2 Evaluation of SNU Tolerance
  - 3.3 Comparison Results
  - 3.4 Chapter Conclusion
- 4 EDSL: SNU Hardened Latch with Error Detection
  - 4.1 EDSL Latch Design
  - 4.2 Evaluation Results
  - 4.3 Chapter Conclusion
- 5 TDRHL: MNU Hardened Latch with Error Detection
  - 5.1 TDRHL Latch Design
  - 5.2 Evaluation of MNU Tolerance
  - 5.3 Comparison Results
    - 5.3.1 Performance Comparison
    - 5.3.2 Power Consumption Evaluation
  - 5.4 Chapter Conclusion
- 6 Conclusions and Future Research
  - 6.1 Technical Summary
  - 6.2 Future Research
- Bibliography
- Publication List

## List of Tables

| Table | Caption |
|-------|---------|
| 2.1 | Area, delay and tolerant ability |
| 3.1 | Truth table of C-element |
| 3.2 | Comparison results with existing works |
| 3.3 | Comparisons on power consumption at various data activities ($\mu W$) |
| 4.1 | Comparison results with existing works |
| 5.1 | Comparison results with existing works at TT corner |

## List of Figures

| Figure | Caption |
|--------|---------|
| 2.1 | The increase of SER with technology scaling as presented in [11, Fig. 1] |
| 2.2 | Soft error mechanism |
| 2.3 | Generated pseudo soft error pulses |
| 2.4 | Error propagation |
| 2.5 | TFH latch [38] |
| 2.6 | FERST latch [39] |
| 2.7 | HiPeR latch [40] |
| 2.8 | SEH latch [13] |
| 2.9 | PDFSR latch [41] |
| 2.10 | SHST latch [42] |
| 2.11 | SEID latch [43] |
| 2.12 | HRDNUT latch [44] |
| 2.13 | SED circuit [35] |
| 2.14 | sPGTD circuit [36] |
| 3.1 | General C-element |
| 3.2 | The proposed SHC latch |
| 3.3 | Conventional unhardened C<sup>2</sup>MOS latch |
| 3.4 | Simulation waveform of SHC latch in normal operations |
| 3.5 | Soft error simulation |
| 3.6 | SNU at ND1 and ND2 |
| 3.7 | SNU at Q |
| 3.8 | Propagation delay ($T_{pL2H}$) comparisons in normal operations |
| 3.9 | Propagation delay ($T_{pH2L}$) comparisons in normal operations |
| 3.10 | Time of recovery from high to low (SNU at Q) |
| 3.11 | Time of recovery from low to high (SNU at Q) |
| 4.1 | Proposed EDSL design |
| 4.2 | The operation of EDSL with the holding value 0 in the state-holding phase |
| 4.3 | The operation of EDSL with the holding value 1 in the state-holding phase |
| 4.4 | Simulation waveform of EDSL with various SNUs |
| 4.5 | Comparison of power consumption under various activity ratios |
| 5.1 | Proposed TDRHL latch |
| 5.2 | The operation of TDRHL with the holding value 0 |
| 5.3 | The operation of TDRHL with the holding value 1 |
| 5.4 | Simulation waveform of SNUs |
| 5.5 | Simulation waveform of DNUs |
| 5.6 | Simulation waveform of TDRHL with triple-node upsets |
| 5.7 | Clock-to-Q delay at different process corners |
| 5.8 | Data-to-Q delay at different process corners |
| 5.9 | Comparisons on power consumption with various activity ratios |
| 5.10 | Power-delay product (PDP) at different process corners |

## Chapter 1

### Introduction

As semiconductor technology continues scaling down, the reliability of integrated circuits (ICs) has become more critical than ever before. Unlike traditional hard errors, which are caused by permanent physical damage and cannot be recovered in the field, soft errors are caused by radiation or voltage/current fluctuations that lead to transient changes of internal node states; thus, they can be viewed as temporary errors. However, because soft errors occur unpredictably, it is desirable to develop soft error tolerant designs, and soft error tolerant design techniques have therefore attracted great research interest.

This dissertation first overviews the reliability problems in state-of-the-art IC designs, in particular explaining the soft error mechanism and presenting existing soft error tolerant design techniques. Then, to address the energy efficiency and soft error tolerance problems, three soft error hardened latch designs are presented.

This chapter explains the motivation for this research and then presents the contributions and organization of this dissertation. Section 1.1 introduces the motivation of this research. Section 1.2 outlines the main contributions of this work. Finally, Section 1.3 presents the organization of the dissertation.

#### 1.1 Motivation

With complementary metal-oxide semiconductor (CMOS) technology shrinking and power supply voltages dropping greatly, reliability has become more critical than ever before. In general, reliability failures include systematic issues such as process/voltage/temperature (PVT) variations and aging effects (e.g., hot carrier injection (HCI) and bias temperature instability (BTI)), and random failures such as soft errors caused by radiation. Among these reliability failures, radiation-induced soft errors are becoming one of the most critical concerns in state-of-the-art very large-scale integration (VLSI) designs. A soft error is a temporary error caused by radiation particles, such as alpha particles or high-energy neutrons, striking a circuit. When a radiation particle strikes a metal-oxide-semiconductor field-effect transistor (MOSFET), electron-hole pairs are generated and collected at the diffusion of the p-channel MOSFET (PMOS) or the n-channel MOSFET (NMOS). If the collected charge exceeds the critical charge, the affected node temporarily flips: charge collected at a PMOS diffusion pulls the node from low to high, while charge collected at an NMOS diffusion pulls it from high to low.
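The collected-charge versus critical-charge criterion can be illustrated with the double exponential model (DEM) listed among the acronyms, a current-pulse model commonly used to inject pseudo soft error pulses in circuit simulation. The sketch below is a minimal behavioral illustration under stated assumptions, not the simulation setup used in this dissertation: the time constants, collected charge and critical charge are hypothetical placeholder values, and the function names are illustrative.

```python
import math


def dem_current(t, q_coll, tau_fall, tau_rise):
    """Double exponential current pulse I(t) in amperes.

    I(t) = q_coll / (tau_fall - tau_rise) * (exp(-t/tau_fall) - exp(-t/tau_rise)),
    which integrates to q_coll over [0, inf) when tau_fall > tau_rise.
    """
    if t < 0.0:
        return 0.0
    scale = q_coll / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def integrate_charge(q_coll, tau_fall, tau_rise, t_end, steps=20000):
    """Integrate the pulse over [0, t_end] with the trapezoidal rule (coulombs)."""
    dt = t_end / steps
    total = 0.0
    prev = dem_current(0.0, q_coll, tau_fall, tau_rise)
    for k in range(1, steps + 1):
        cur = dem_current(k * dt, q_coll, tau_fall, tau_rise)
        total += 0.5 * (prev + cur) * dt
        prev = cur
    return total


def node_upset(collected_charge, q_crit):
    """First-order criterion: the node flips if collected charge exceeds Q_crit."""
    return collected_charge > q_crit


# Hypothetical values for illustration only (not from this dissertation).
TAU_FALL = 200e-12  # fall (collection) time constant: 200 ps
TAU_RISE = 50e-12   # rise time constant: 50 ps
Q_COLL = 20e-15     # total collected charge: 20 fC
Q_CRIT = 10e-15     # assumed critical charge of the struck node: 10 fC

q = integrate_charge(Q_COLL, TAU_FALL, TAU_RISE, t_end=5e-9)
print(f"collected charge = {q * 1e15:.2f} fC, upset = {node_upset(q, Q_CRIT)}")
```

Integrating over 25 fall time constants recovers essentially all of the injected charge, so the upset decision reduces to comparing the chosen collected charge with the assumed critical charge.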

As process technology continues scaling down, the critical charge of internal nodes decreases along with node capacitance, which increases the soft error rate (SER) of modern integrated circuits, and charge sharing can cause multiple-node upsets (MNUs). Furthermore, soft errors generally occur in the field and, because they are temporary, can be recovered automatically given additional time or through reset operations. However, their unpredictable occurrence might lead to severe system failures in critical designs such as medical devices, aircraft and high-performance supercomputers. Consequently, radiation-induced soft error hardened design techniques have become essential to guarantee system reliability.
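The link between scaling and critical charge can be made concrete with a common first-order estimate, used here only as an assumption: $Q_{crit} \approx C_{node} \cdot V_{DD}$, which ignores the restoring current of the transistor driving the node. The sketch below applies it to two hypothetical nodes (the capacitance and voltage values are placeholders, not data from any specific technology) to show why smaller node capacitance and lower supply voltage both reduce $Q_{crit}$.

```python
def critical_charge(c_node, vdd):
    """First-order critical charge estimate Q_crit ~= C_node * V_DD (coulombs).

    Ignores the restoring current of the driving transistor, so it is only a
    rough estimate, not a substitute for circuit simulation.
    """
    return c_node * vdd


# Hypothetical node parameters, not taken from any specific technology.
older_node = {"c_node": 2.0e-15, "vdd": 1.2}  # 2 fF at 1.2 V
newer_node = {"c_node": 0.5e-15, "vdd": 0.8}  # 0.5 fF at 0.8 V

q_old = critical_charge(**older_node)
q_new = critical_charge(**newer_node)
print(f"Q_crit old = {q_old * 1e15:.2f} fC, new = {q_new * 1e15:.2f} fC")
print(f"reduction factor = {q_old / q_new:.1f}x")
```

Under this estimate, a 4X smaller node capacitance combined with a lower supply voltage cuts the critical charge by 6X, so a particle strike that was harmless at the older node can upset the newer one.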

In the literature, several design techniques such as hardware redundancy methods (e.g., triple modular redundancy (TMR)), error-correcting code (ECC)-based memory, soft error aware physical designs, soft error tolerant designs, and error-detection-based methods have been proposed to guarantee the reliability of ICs. Since most existing works rely on hardware redundancy for error recovery or error tolerance, their soft error resilience usually comes at the price of increased area, power, and delay, which poses a challenge for next-generation energy-efficient MNU-tolerant designs. In addition, a soft error occurring in a latch can upset the latch output, and the erroneous value may propagate through the succeeding combinational logic and be captured by the next-stage storage element; however, this issue was often neglected in previous works. Therefore, it is desirable to develop architecture-level reliable radiation-hardened latch designs.
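The hardware redundancy idea behind TMR can be sketched behaviorally: three copies of a value are stored and a bitwise 2-out-of-3 majority voter produces the output. The minimal sketch below, with hypothetical function names, shows both why a single upset is masked and why the scheme is costly (three copies plus a voter), and why an upset hitting the same bit in two copies, for example through charge sharing between physically adjacent copies, defeats it.

```python
def majority_vote(a, b, c):
    """Bitwise 2-out-of-3 majority of three redundant copies."""
    return (a & b) | (b & c) | (a & c)


def flip_bit(word, bit):
    """Model a single-node upset by inverting one bit of one copy."""
    return word ^ (1 << bit)


stored = 0b1011_0010
copies = [stored, stored, stored]

copies[1] = flip_bit(copies[1], 4)       # upset in one copy only
assert majority_vote(*copies) == stored  # masked by the voter

copies[2] = flip_bit(copies[2], 4)       # same bit upset in a second copy
print(majority_vote(*copies) == stored)  # the voter now outputs the wrong value
```

The triplicated storage and the voter are exactly the area and power overhead that motivate the lower-cost latch-level techniques proposed in this dissertation.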

#### 1.2 Contributions

In this dissertation, three soft error hardened latch designs are proposed to improve architecture-level reliability and energy efficiency.

First, a low-cost soft error hardened latch (SHC) design using a novel Schmitt-trigger-based C-element (STC) is proposed, which features small area overhead and low power consumption for single node upset (SNU) tolerance. Unlike state-of-the-art soft error tolerant latches, which are usually based on hardware redundancy or transistor upsizing, the proposed latch achieves soft error tolerance through double-sampling and node-checking using the novel Schmitt-trigger-based C-element. The implementation results show that the proposed SHC latch requires only two more transistors than the conventional unhardened C<sup>2</sup>MOS latch, and that up to 20.35% and 82.96% power reduction can be achieved compared to the conventional unhardened C<sup>2</sup>MOS latch and the existing soft error tolerant HiPeR design, respectively.
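The C-element at the core of SHC follows its inputs when they agree and holds its previous output when they disagree (its truth table is given as Table 3.1). The behavioral sketch below, with a hypothetical class name, shows why double-sampling a value into two nodes and checking them with a C-element masks a single node upset. It is only a logic-level abstraction: it does not model the Schmitt-trigger structure, transistor sizing or timing of the proposed STC.

```python
class CElement:
    """Behavioral Muller C-element: the output changes only when both inputs agree."""

    def __init__(self, initial=0):
        self.out = initial

    def evaluate(self, a, b):
        if a == b:
            self.out = a   # inputs agree: output follows them
        return self.out    # inputs disagree: output keeps its stored value


c = CElement(initial=1)
print(c.evaluate(1, 1))  # both sampled nodes hold 1: output 1
print(c.evaluate(0, 1))  # SNU flips one sampled node: output stays 1
print(c.evaluate(1, 1))  # transient has recovered: output still 1
print(c.evaluate(0, 0))  # both nodes change (normal write of 0): output 0
```

Because a single upset can only disturb one of the two sampled nodes, the inputs disagree during the transient and the output is held, which is the node-checking behavior described above.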

The second proposed latch design is a power-efficient single node upset hardened latch with in-situ error detection capability (EDSL) for reliability improvement against SNUs. Because architecture-level reliable radiation-hardened latch designs are desirable, EDSL combines SNU self-recovery ability with in-situ error detection capability to improve reliability against soft errors. Not only can the proposed EDSL recover from any incurred single event upset, but it can also provide in-situ error detection when the latch output is upset. The implementation results show that, compared with state-of-the-art error-detection-based designs and SNU resilient designs, the proposed EDSL latch achieves up to 72.25% and 79.74% reduction in power-delay product (PDP), respectively, which clearly shows the effectiveness of the proposed EDSL design.
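The detection principle can be summarized behaviorally: during the state-holding phase the latch output should not change, so any output transition observed while the latch is holding indicates an upset. The sketch below assumes a hypothetical sampled trace in which `clk = 1` means transparent and `clk = 0` means holding; the function name and trace are illustrative, and the code captures only this principle, not the EDSL circuit itself.

```python
def detect_hold_phase_flips(clk, q):
    """Return sample indices where Q toggles while the latch is holding.

    clk[i] == 1: transparent phase (Q may legitimately follow D).
    clk[i] == 0: holding phase (any Q transition is flagged as an error).
    """
    if len(clk) != len(q):
        raise ValueError("clk and q traces must have equal length")
    errors = []
    for i in range(1, len(q)):
        holding = clk[i] == 0 and clk[i - 1] == 0
        if holding and q[i] != q[i - 1]:
            errors.append(i)
    return errors


# Hypothetical trace: Q legitimately rises while transparent (index 1),
# then flips during the holding phase (index 4) and is reported.
clk = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]
q = [0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
print(detect_hold_phase_flips(clk, q))
```

Transitions while the latch is transparent are ignored because they are normal writes; only a hold-phase flip raises the error flag that an architecture-level recovery mechanism could act on.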

The third design is an output transition detector-based radiation-hardened latch (TDRHL) that improves the power efficiency and reliability of critical designs against both single- and multiple-node upsets. The proposed TDRHL design contains a baseline latch, an error recovery assistant logic (ERAL) and an output transition detector (QTD). The ERAL is optimized from the existing single-event-induced-double-node-upset-tolerant latch (SEID) for performance improvement and power reduction, and the QTD is proposed to provide architecture-level recovery capability. The evaluation results show that the proposed TDRHL outperforms state-of-the-art double-node upset (DNU) tolerant designs while offering additional error detection capability, achieving up to 5.0X PDP improvement. On the other hand, compared with the existing detection-based design, TDRHL provides self-recovery from SNUs and DNUs at the cost of only 4.4% more power consumption. As for power efficiency, the PDP improvements of TDRHL at the typical-typical (TT) corner over SEID, HRDNUT and SHST are 1.9X, 2.5X and 5.0X, respectively. Since TDRHL is the only latch that can both recover from any SNU or DNU and provide architecture-level resiliency, these PDP results clearly illustrate its power efficiency.
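The PDP results in this section are quoted both as improvement factors (e.g., 5.0X) and as percentage reductions (e.g., 72.25%). Since PDP is average power multiplied by delay, an improvement factor $k = \mathrm{PDP}_{ref}/\mathrm{PDP}_{new}$ corresponds to a reduction of $1 - 1/k$. The short sketch below converts between the two forms using only the numbers reported above; the helper names are illustrative.

```python
def reduction_from_factor(factor):
    """Fractional PDP reduction implied by an improvement factor PDP_ref / PDP_new."""
    return 1.0 - 1.0 / factor


def factor_from_reduction(reduction):
    """Improvement factor PDP_ref / PDP_new implied by a fractional PDP reduction."""
    return 1.0 / (1.0 - reduction)


# Improvement factors of TDRHL at the TT corner, as reported above.
for name, factor in [("SEID", 1.9), ("HRDNUT", 2.5), ("SHST", 5.0)]:
    print(f"TDRHL vs {name}: {factor}X -> {reduction_from_factor(factor):.1%} PDP reduction")

# PDP reductions reported for EDSL, expressed as improvement factors.
for reduction in (0.7225, 0.7974):
    print(f"{reduction:.2%} reduction -> {factor_from_reduction(reduction):.2f}X")
```

For example, the 5.0X improvement over SHST is an 80% PDP reduction, and the 72.25% reduction reported for EDSL corresponds to roughly a 3.6X improvement.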

#### 1.3 Dissertation Organization

This dissertation summarizes my work at Waseda University on soft error hardened latch designs for reliability and energy efficiency improvements of next-generation ICs. Detailed descriptions of each topic can be found in the corresponding chapters or in my previous scientific publications.

This dissertation is organized into six chapters. A brief description of each chapter is provided below.

- Chapter 2 reviews the research background on the reliability problems of IC designs. In particular, the mechanism of random and transient soft errors is explained, and existing soft error hardened design techniques are summarized.
- Chapter 3 proposes a low-power soft error hardened latch with a Schmitt-trigger-based C-element (SHC), together with the corresponding evaluation and comparison results.
- Chapter 4 proposes a power-efficient soft error hardened latch design with in-situ error detection capability (EDSL), in which reliability is improved by incorporating error detection capability.
- Chapter 5 presents an output transition detector-based radiation-hardened latch (TDRHL) for both single- and multiple-node upsets.
- Chapter 6 draws conclusions and presents several future research directions that build on this dissertation.

## Chapter 2

### Soft Errors and Existing Hardened Design Techniques

In this chapter, the reliability problems in IC design are discussed, and then the radiation-induced soft error mechanism is explained along with existing hardened design methods.

This chapter is organized as follows. Section 2.1 discusses the reliability problems. Section 2.2 introduces the soft error mechanism. Section 2.3 then reviews existing soft error hardened design techniques, and Section 2.4 concludes the chapter.

#### 2.1 Reliability Issues

The reliability of integrated circuits used in medical instruments, communication systems, financial systems, infrastructure systems and other applications is a critical factor for human life. Reliability issues can be categorized into physical problems and security threats.

A hardware Trojan (HT) is a type of security threat: a malicious modification of an integrated circuit design, for example to leak sensitive information. Recently, ICs have increasingly been designed by third-party vendors to reduce cost; however, this makes it easier for attackers to insert HTs into the designed ICs. HTs can launch serious attacks such as disabling or changing the functionality of ICs or leaking users' sensitive information [1] [2].

On the other hand, with CMOS technology shrinking to the nanoscale and power supply voltage dropping greatly, threats to ICs from natural phenomena have become critical issues in today's LSI designs. In general, reliability failures include systematic issues such as PVT variations or aging effects caused by NBTI, and random failures such as soft errors caused by radiation particles striking a circuit. PVT variations can greatly degrade the performance of ICs. Moreover, process-related variations lead to low manufacturing yield, which is the ratio of the number of ICs meeting the specified performance to the total number of manufactured ICs. To improve yield, a circuit needs to be designed with additional margins, which introduces additional area overhead and power consumption [3] [4] [5]. NBTI is one of the main causes of aging, in which the threshold voltage ($V_{th}$) increases over time under voltage and temperature stress. Depending on the amount of $\Delta V_{th}$, it might cause circuit malfunction. The effects of NBTI have also become more pronounced as process technology scales down [6] [7]. However, aging effects only affect the late lifetime of ICs.

Unlike NBTI, which only affects the late lifetime of ICs, radiation-induced soft errors can occur at any time during the lifetime of an IC. A soft error is a temporary error that occurs when alpha particles or high-energy neutrons strike an integrated circuit, and it is emerging as a serious problem for critical systems [8]. Several years ago, only soft errors that occurred in and affected memory systems attracted attention; however, due to the increased soft error rate, soft errors that occur in logic circuits have become significant and can no longer be ignored. As technology continues to scale down, the integration density of ICs keeps increasing while the critical charge of a transistor decreases, which leads to an increased soft error rate [9].

The soft error rate can be expressed in terms of Failures-In-Time (FIT) or Mean Time Between Failures (MTBF). One FIT is equivalent to one soft error occurrence per billion hours of operation (i.e., $1 \text{ FIT} = \text{one error}/10^9 \text{ hours}$), while the MTBF, for an exponential failure distribution, is the inverse of the failure rate (i.e., $MTBF = 10^9/FIT$ hours). It should be noted that the terms failure and soft error are not equivalent: a failure might be caused by one or more soft errors, while the occurrence of soft errors might cause no system failure at all. As illustrated in [10] and [11], the soft error rate per chip (SER/chip) increases with technology scaling. As shown in Fig. 2.1, which was presented in [11, Fig. 1], the SER increases by 100X when the technology node changes from 180nm to 16nm, and this increase is even more significant for data centers. Therefore, soft errors are becoming one of the most critical concerns in state-of-the-art LSI designs, especially due to the reduction of node capacitance and the lower operating voltage.
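To make the two units concrete, the following Python sketch converts between FIT and MTBF using exactly the relations given above. It assumes a constant failure rate (exponentially distributed time between failures); the function names and the 1000 FIT figure are illustrative only and are not taken from [10] or [11].

```python
# Minimal sketch of the FIT/MTBF relations stated above (assumes a constant
# failure rate, i.e. exponentially distributed times between failures).

HOURS_PER_FIT_UNIT = 1e9    # 1 FIT = one error per 1e9 hours
HOURS_PER_YEAR = 24 * 365   # ignores leap years


def fit_to_mtbf_hours(fit: float) -> float:
    """MTBF in hours for a failure rate expressed in FIT (MTBF = 1e9 / FIT)."""
    if fit <= 0:
        raise ValueError("FIT must be positive")
    return HOURS_PER_FIT_UNIT / fit


def mtbf_hours_to_fit(mtbf_hours: float) -> float:
    """Failure rate in FIT for an MTBF given in hours (FIT = 1e9 / MTBF)."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return HOURS_PER_FIT_UNIT / mtbf_hours


if __name__ == "__main__":
    # Hypothetical rate of 1000 FIT -> MTBF of 1e6 hours (about 114 years).
    mtbf = fit_to_mtbf_hours(1000)
    print(f"MTBF = {mtbf:.3g} h = {mtbf / HOURS_PER_YEAR:.1f} years")
```

Because the two quantities are reciprocal, halving the FIT rate doubles the MTBF.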

Unlike traditional hard errors, which are caused by permanent physical damage and cannot be recovered in the field, soft errors are generally caused by radiation and lead to transient changes in internal node states; therefore, they can be viewed as temporary errors. Because of this temporary property, a soft error can be corrected if enough time is given. However, because soft errors occur unpredictably, they can cause serious problems when they affect a critical part of a circuit. It is therefore desirable to develop soft error tolerant design methods, and such techniques have recently attracted considerable research interest. In the literature, several soft error tolerant techniques ranging from the transistor level to the system level have been proposed.



Figure 2.1: The increase of SER with technology scaling as presented in [11, Fig. 1].

#### 2.2 Radiation-induced Soft Errors

In general, two kinds of errors might occur in a circuit, caused by different mechanisms. The first is the hard error, which is caused by physical damage such as manufacturing defects. These errors cannot be (fully) recovered in the field but might be detected through manufacturing tests. The other kind, the soft error, is usually caused by cosmic radiation or voltage/current fluctuations that lead to transient changes in internal node states. Soft errors generally occur in the field and, due to their temporary property, can be recovered automatically if given some time. It has been shown that 80% of system failures are due to such transient errors [12].

The mechanism of soft error occurrence is shown in Fig. 2.2. When neutrons collide with the circuit, nuclear reactions occur with silicon atoms, and electron-hole pairs are generated and then swept across the diffusion junction if an electric field exists across it. As a result, holes are collected at the PMOS diffusion while electrons are collected at the NMOS diffusion. If the collected charge exceeds the critical charge, the data of a PMOS transistor is temporarily changed from low to high, and the data of an NMOS transistor is temporarily changed from high to low [13, 14]. This is called a soft error. In other words, a soft error changes the data of either a PMOS or an NMOS transistor in only one direction.

Figure 2.2: Soft error mechanism.
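The unidirectional rule described above can be written as a small behavioral sketch. It is not a device model: the names `Diffusion` and `node_after_strike` are hypothetical, charges are in arbitrary but consistent units, and the only behavior encoded is the one stated in the text (an upset happens only when the collected charge exceeds the critical charge, and only in the direction set by the struck diffusion).

```python
from enum import Enum


class Diffusion(Enum):
    PMOS = "pmos"  # holes collected: the node can only rise (0 -> 1)
    NMOS = "nmos"  # electrons collected: the node can only fall (1 -> 0)


def node_after_strike(stored: int, diffusion: Diffusion,
                      q_collected: float, q_crit: float) -> int:
    """Return the (temporary) node value right after a particle strike."""
    if stored not in (0, 1):
        raise ValueError("stored must be 0 or 1")
    if q_collected <= q_crit:
        return stored  # not enough collected charge: no upset
    if diffusion is Diffusion.PMOS:
        return 1  # 0 -> 1 upset; a node already at 1 is unaffected
    return 0      # 1 -> 0 upset; a node already at 0 is unaffected
```

For example, a node holding 1 that is driven only by PMOS transistors cannot be flipped at all under this rule, which is the property the SEH latch described later exploits.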

In the LSI design phase, a soft error can be modeled as a current or voltage source, as shown in Fig. 2.3, which was cited from [15]. The figure shows the current pulse model for a technology node with various charge injection levels (in pC). Using such a soft error model, SPICE simulation can be conducted with specified parameters such as the rise delay, the fall delay, or the peak current.
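The exact pulse shapes of [15] are not reproduced here. As an illustration only, the sketch below uses the widely used double-exponential current model, which is one common way to build such a current source; whether it matches the model of [15] is an assumption. The function names, time constants, and charge values are hypothetical. With charge in fC and time in ps, the current is in mA (1 fC/ps = 1 mA).

```python
import math
from typing import List, Tuple


def double_exp_current(t: float, q_total: float, tau_rise: float, tau_fall: float) -> float:
    """Double-exponential injected current for a particle strike at t = 0.

    I(t) = Q / (tau_fall - tau_rise) * (exp(-t / tau_fall) - exp(-t / tau_rise))
    for t >= 0 and 0 otherwise; the waveform integrates to q_total.
    """
    if not 0 < tau_rise < tau_fall:
        raise ValueError("require 0 < tau_rise < tau_fall")
    if t < 0:
        return 0.0
    scale = q_total / (tau_fall - tau_rise)
    return scale * (math.exp(-t / tau_fall) - math.exp(-t / tau_rise))


def peak_time(tau_rise: float, tau_fall: float) -> float:
    """Time at which I(t) is maximal (obtained from dI/dt = 0)."""
    return tau_rise * tau_fall / (tau_fall - tau_rise) * math.log(tau_fall / tau_rise)


def pwl_points(q_total: float, tau_rise: float, tau_fall: float,
               t_stop: float, n: int) -> List[Tuple[float, float]]:
    """Sample I(t) at n + 1 evenly spaced times in [0, t_stop]."""
    step = t_stop / n
    return [(k * step, double_exp_current(k * step, q_total, tau_rise, tau_fall))
            for k in range(n + 1)]
```

The sampled (time, current) pairs can be written out as the points of a piecewise-linear (PWL) current source attached to the struck node in a SPICE netlist; sweeping `q_total` then reproduces a family of pulses with different injected charges.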

As noted in Section 2.1, only soft errors occurring in memory systems used to attract attention; however, due to the increased soft error rate, soft errors occurring in logic circuits have also become a critical reliability problem. As technology continues to scale down, the critical charge of a transistor and the supply voltage have decreased drastically, which increases the soft error rate even in logic circuits [16]. In other words, the data in storage elements such as latches and flip-flops becomes easier to flip due to soft errors, so the SER increases as technology scales down. Soft errors become critical when they occur in devices related to human life, such as aircraft or medical equipment, where the influence of radiation is significant. Therefore, soft error tolerance is becoming a critical design challenge.

Soft errors can be classified into three types: single event upset (SEU), single event transient (SET), and multiple cell upset (MCU). An SEU occurs in a latch or a flip-flop and flips the stored value. An SET occurs in logic circuits and generates a transient pulse at the output of the logic. Finally, an MCU can flip the data of multiple storage elements.

In nanoscale technologies, a single radiation particle can affect multiple nodes at once [17, 18, 19, 20]. Double-node upsets (DNUs) are mainly caused by charge sharing [21]. Normally, when a soft error affects a transistor, only one node value is flipped. However, as the distance between nodes becomes shorter, adjacent nodes might be flipped as well; this phenomenon is called a multiple-node upset (MNU) [22].

In addition, as shown in Fig. 2.4, a soft error occurring in a latch can cause an upset at the latch output, which may propagate through the succeeding combinational logic and be captured by the next-stage storage element; however, this issue was often neglected in previous works.



Figure 2.3: Generated pseudo soft error pulses. Various current pulses generated as a result of an  $\alpha$ -particle strike at time t=0, which was presented in [15, Fig.1].



Figure 2.4: Error propagation. A single-node upset at a latch output might cause an error in the following latch.

#### 2.3 Soft Error Hardened Design Techniques

Existing soft error tolerant latch design methods can be classified into detection-based methods, hardware-redundancy-based recovery methods, and node-reuse-based recovery methods. Most existing soft error tolerant methods are based on hardware redundancy, in which errors are detected and then corrected by comparing the results of redundant copies. The DICE family [23, 24, 25, 26, 27] and TMR [28, 29, 30, 31] are representative works. Node-reuse-based methods use the existing nodes in a latch design for soft error detection and recovery; the SEH family [32, 33, 34] is representative. Detection-based methods detect soft errors and generate warning signals when any soft error occurs. These methods are generally implemented by monitoring the value saved in the specified latch. If any transition is detected during the storage phase, a soft error is assumed to have occurred, and a warning signal is generated for architecture-level error recovery. This works because soft errors are temporary errors, which can be recovered by reprocessing the corresponding data. SED [35] and sPGTD [36] are recently proposed examples.
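As a minimal illustration of hardware-redundancy-based recovery, the sketch below models the voting step of TMR at the bit level: three copies are stored and a 2-out-of-3 majority vote masks a single corrupted copy. It is a behavioral model, not a transcription of any of the cited TMR or DICE latch circuits, and the function names are hypothetical.

```python
from typing import Tuple


def majority3(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority: an output bit is 1 iff at least two inputs have it set."""
    return (a & b) | (b & c) | (a & c)


def tmr_read(a: int, b: int, c: int) -> Tuple[int, bool]:
    """Return (voted value, mismatch flag); the flag is set if any copy disagrees."""
    mismatch = (a != b) or (b != c)
    return majority3(a, b, c), mismatch


if __name__ == "__main__":
    stored = 0b1011
    upset = stored ^ 0b0100  # hypothetical single-bit upset in one copy
    print(tmr_read(stored, upset, stored))
```

The cost visible even in this toy model is the tripled storage plus the voter, which corresponds to the large area and power overhead listed for redundancy-based methods.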

Table 2.1 provides an overview of existing soft error tolerant latch design techniques. As the table shows, node-reuse-based recovery methods offer smaller area overhead, lower power consumption, and smaller performance overhead than redundancy-based recovery methods.

|                               | Soft-error detection only   | Hardware-redundancy-based recovery                      | Node-reuse-based recovery |
|-------------------------------|-----------------------------|---------------------------------------------------------|---------------------------|
| Representative works          | Razor II [37]               | DICE family [23, 24, 25, 26, 27], TMR [28, 29, 30, 31] | SEH family [32, 33, 34]   |
| Soft error detection/recovery | Detection only              | Recovery                                                | Recovery                  |
| Area overhead                 | Large                       | Large                                                   | Small                     |
| Performance overhead          | Normal: Small; Error: Large | Small                                                   | Small                     |
| Power consumption             | Large                       | Large                                                   | Small                     |

Table 2.1: Area, delay, and error tolerance of existing soft error tolerant latch design techniques

In the following, several related works based on the above methods are described, grouped by the reliability level they provide: SNU hardening, DNU hardening, and error detection.

#### 2.3.1 SNU hardened methods

**Transient fault hardened (TFH) latch**: The TFH latch shown in Fig. 2.5 was proposed in [38]. The structure of the TFH latch is simple: it consists of one C-element (P1, P2, N1, N2), one transmission gate, and two inverters. In normal operation (i.e., when no soft error occurs), it works like a normal latch. When ND1 (or ND2) is affected by a soft error in the holding mode (i.e., CK is low), the C-element keeps the output Q at its previously saved value. Although this latch is very simple and can recover from a soft error on ND1 or ND2 with small area overhead, its main problem is that, when ND3 (Q) is upset by a soft error, it cannot recover until new data is loaded in the next clock cycle.

**Feedback redundant SEU-tolerant (FERST) latch**: The FERST latch [39], shown in Fig. 2.6, contains three C-elements, four transmission gates, and two inverters. Here, the C-elements are used to form a redundant feedback path and act as a filtering unit to mask possible soft errors. When one of the nodes (ND1-ND4) is flipped by a soft error, the C-elements prevent the error from propagating to the other nodes, including the output Q. Moreover, the flipped node can recover by referring to the data of the other nodes. When a soft error affects the output Q, Q is immediately pulled up or down to the correct value. Therefore, FERST can be called fully tolerant compared with TFH; however, because FERST uses three C-elements, it introduces larger area overhead and correspondingly higher power consumption.

Figure 2.5: TFH latch[38].

**HiPeR latch**: Figure 2.7 shows the HiPeR latch [40], which is based on the conventional unhardened $C^2MOS$ latch and a C-element. In a HiPeR latch, duplicating the internal nodes helps to filter soft errors with the help of the inserted C-elements; moreover, for errors affecting the output node, a special feedback that activates two independent paths during the holding mode is inserted for error tolerance. When a soft error flips the data of Q, it is immediately recovered through the C-element formed by $M_{P4}, M_{P5}, M_{N4}$ and $M_{N5}$. When any other node is affected by a soft error, the correct data can be restored by referring to the other correct nodes. The HiPeR latch provides a very low propagation delay, but it incurs large area overhead and high power consumption; therefore, it is suitable only for storage elements on critical paths.

Figure 2.6: FERST latch[39].

Figure 2.7: HiPeR latch[40].

**SEH latch**: As mentioned above, a soft error changes data in only one direction, in either a PMOS or an NMOS transistor. The SEH latch [32] exploits this property to achieve high error tolerance. The SEH latch is shown in Fig. 2.8 [32]. This latch is composed of three nodes: PDH, NDH, and DH. To exploit the soft error property, PDH is connected only to PMOS transistors, while NDH is connected only to NMOS transistors. In other words, a soft error can change the value of PDH only from low to high and the value of NDH only from high to low. By limiting the direction of soft errors in this way, high soft error tolerance can be maintained.

When CK is high and D is low, P4, P5, P6, N4, and N6 turn on; PDH and NDH are low and DH is high. At this time, Q stores low, according to the value of DH passed through the inverter. P1 then turns on and keeps DH high. Furthermore, P2 and P3 turn on according to the values of Q and NDH, and PDH is kept high. When CK becomes low, the first stage stops latching, and the latter part keeps holding the value.

When PDH is affected by a soft error, its value changes from 0 to 1. In this case, P1 turns off, but because P2 and P3 remain on, PDH can be restored to low. In particular, when PDH is affected by a soft error, no error propagates to the output Q. NDH recovers in a similar way when it is affected by a soft error. When DH is flipped from high to low by a soft error, PDH keeps its correct value (low), so P1 remains on and DH is restored to high. At this time, the error temporarily propagates to the output Q and inverts its value; however, Q recovers as soon as DH recovers. When the output Q is affected by a soft error, P3 turns off and N3 turns on. However, because DH keeps the correct value, Q recovers by referring to the correct value through the inverter. The SEH latch achieves high soft error tolerance, and compared with existing hardware-redundancy-based recovery methods, its area overhead is relatively small because it uses internal nodes for recovery instead of duplicated cells. However, compared with the base $C^2MOS$ latch, its power consumption and performance overhead are still higher.

Figure 2.8: SEH latch[13].

**PDFSR latch**: The low-power SEU immune (PDFSR) latch was proposed in [41]. As shown in Fig. 2.9, dedicated efforts are made to protect the sensitive nodes of a conventional latch by using cascaded transistors for SEU tolerance. This latch consists of three C-elements, three transmission gates, one clock-gated inverter, and one Schmitt-trigger inverter. When a soft error flips the data of N2 or the output Q, it is immediately recovered through the C-elements. When an upset occurs on any node other than N2 and Q, the C-elements prevent the error from propagating to other nodes, and the upset node can be recovered from the other nodes. The latch is therefore fully soft error tolerant. However, because the input must pass through C-elements to reach the output Q, it introduces not only a large delay but also high power consumption.

**Soft error hardened with Schmitt-trigger (SHST) latch**: As proposed in [42], SHST also provides full SNU tolerance. Like the DICE latch, it has a Schmitt-trigger inverter (ST) and four PMOS-NMOS pairs ($[M_{P1}, M_{N1}]$-$[M_{P4}, M_{N4}]$), with adjacent nodes holding different data. The input data from D is split into Int1, connected directly to the C-element, and Int3, connected to ST. Due to the double-sampling mechanism, this design can mask input SET pulses. In normal mode, the input data from D passes through transmission gates TG1 and TG2 to the C-element and propagates to both the latch part and the inter-latching part. The data from the inter-latching part is stored in the C-element. In hold mode, when a soft error flips the data of any node in the inter-latching part, the node can recover by referring to the data of adjacent nodes. When a soft error affects a node before the C-element, the faulty data does not propagate to the output Q. When Q stores flipped data, it recovers the correct data by referring to the data of the other nodes. With the inter-latching structure and the double-sampling technique for SET filtering, the SHST latch provides self-recovery from SNUs at the cost of additional area overhead.







Figure 2.10: SHST latch[42].

#### 2.3.2 DNU hardened methods

In this section, several existing DNU hardened latches, namely the SEID latch and the HRDNUT latch, are described.

**Single-event-induced double-node-upset-tolerant (SEID) latch**: Fig. 2.11 shows the SEID latch [43]. The SEID latch is optimized for power reduction while providing both SNU and DNU tolerance. The SEID latch has two inputs, D and CK (CKB), and 7 main nodes: LP1, LN1, LP2, LN2, LQ, and the output Q. Almost all the transistors in this latch are used for self-error-recovery. In particular, (P2, P4) are used for the recovery of LP1, (N2, N4) for LN1, (P5, P6, N7) for LP2, (N5, N6, P7) for LN2, (P8, N9) for Q, and (P9, N8) for LQ.

The operation of the SEID latch is described below. When CK is high, the input data from D propagates to LP1 and LN1. The values of LP2 and LN2 are determined by the input data (LP1 and LN1): the inverted D is stored in LP2 and LN2. Thus, in the normal mode, D = LP1 = LN1 = LQ = Q and $\overline{D}$ = LP2 = LN2.

Figure 2.11: SEID latch[43].

When CK=0, assuming that LP1=LN1=LQ=Q=low and LP2=LN2=high, the corresponding operations are as follows. First, the SNU tolerance of the SEID latch is explained.

- Case I: When a soft error affects LP1, which is driven only by PMOS transistors, LP1 can only change from low to high; P5 then turns off and N4 turns on. In this situation, the other nodes are not affected by the soft error because P5 is off. LP1 is discharged to the correct value through P2 and P4. LN1 recovers in a similar way when it is flipped from high to low.
- Case II: When a soft error affects LN2, LN2 is flipped from high to low. Because the error does not affect the other nodes, LN2 is recovered immediately through N5, before the next clock edge. LP2 recovers in a similar way when it is flipped from low to high.
- Case III: When a transient error occurs on LQ, it only changes the state of P3. Because the error does not propagate to any other node, it is recovered immediately through N8.
- Case IV: If a soft error affects the output Q, Q is flipped from low to high. In this case, the error does not propagate to the other nodes, and Q recovers immediately by referring to the data of LQ.

From the above description, it can be observed that the SEID latch is fully SNU tolerant. Its DNU tolerance is explained next.

Here, let us assume CK=LP1=LN1=LQ=Q=low and LP2=LN2=high. In this situation, the nodes that can be upset are LP1, LN2, LQ, and Q. Therefore, the corresponding four DNU cases are discussed below.

- Case I: When LP1 and Q are flipped from low to high, Q recovers immediately because N8 turns on. In response, LP1 is pulled down through P2-P4 and recovers.
- Case II: When LP1 and LQ are affected by a soft error, LQ is restored to low by referring to LN2, and LP1 then recovers through P2 and P4.
- Case III: When Q and LQ are flipped from low to high, both nodes recover through LP2 and LN2.
- Case IV: When LP1 and LN2 are affected by a soft error, the errors do not affect any other nodes. Thus, LP1 recovers through P2 and P4, while LN2 is restored through P7 and N6.

It should be mentioned that, when CK=LP1=LN1=LQ=Q=high and LP2=LN2=low, the SEID latch works in a similar way. Therefore, it is fully DNU tolerant.

**Highly robust double node upset tolerant (HRDNUT) latch**: The HRDNUT latch shown in Fig. 2.12 was proposed in [44] and is fully SNU and DNU tolerant. HRDNUT consists of three 3-input C-elements (C1, C2, C7), four 2-input C-elements (C3-C6), and three transmission gates (TG1-TG3). It was designed so that no C-element drives itself, which allows it to be applied in clock-gating designs.

During normal operation (i.e., no error), the input data (D) passes through the three transmission gates (TG1-TG3) to n1, n2, and Q when CK is high. Since the input D directly drives the output Q, the D-to-Q delay is minimized; however, the driving ability should be considered when the latch output drives a large load. When an SNU occurs in the holding phase (i.e., CK is low), the HRDNUT latch takes advantage of the built-in C-elements for error recovery. If Q is upset from high to low, it is recovered by the 3-input C-element C7, which still holds the correct state. In this case, the soft error does not affect other nodes because its propagation is blocked by the built-in C-elements (C5 and C6).

If the internal node, n1, is upset, the C-elements C2, C5 and C7 will block the error propagation so that the error would not affect any other nodes. Moreover, because C1 still operates correctly, n1 can be restored to its error-free state. Similarly, other nodes (n2-n6) can also recover from errors through the adjacent C-elements.

The operation of the HRDNUT latch under DNUs can be categorized into 8 cases according to the possible DNU combinations, as follows.

- Case I: When a DNU affects n2 and Q, the error on n2 propagates to C4. Since C1 and C3 block the error propagation and n1, n3, n5, and n6 still hold the correct data, Q and n2 recover from the upsets through C2 and C7.
- Case II: When n1 and n2 are upset by a DNU, the error on n1 propagates to C5 and C7. However, the error does not affect any other nodes because the built-in C-elements block the error propagation. In this situation, these nodes are restored to the correct states through the adjacent C-elements.
- Case III: When a DNU affects n3 and n4, the errors on n3 and n4 propagate to (C6, C7) and (C1, C3), respectively. Although error propagation occurs in this case, the built-in C-elements do not pass any erroneous transition on to the nodes they drive. Therefore, since the other nodes hold the correct states, the C-elements operate correctly and n3 and n4 recover to the correct state.
- Case IV: When a DNU affects n1 and n5, the errors on n1 and n5 both propagate to C7. In this case, Q is not affected because C7 stops the error propagation to the output. n1 is restored to the error-free state with the help of C1, and n5 recovers immediately through C5.
- Case V: When a DNU affects n4 and Q, the error propagates to C1 and C3-C6. Because the inputs of C7 are not affected, Q recovers immediately through C7. Moreover, n4 is restored to its previous error-free state through C4.
- Case VI: When n1 and n6 are affected by a DNU, the error on n6 propagates to C1 and C7, while the error on n1 also propagates to C7. However, because n3 still holds the correct state, C7 blocks the error propagation and the output keeps its error-free state.
- Case VII: When a DNU affects n1 and n3, the error propagates to C2, C5, C6, and C7. Since C7 blocks the error propagation, Q is not upset, while n1 and n3 recover from the error through the connected C-elements.
- Case VIII: When a DNU affects n5 and Q, the error propagates to C2-C7. Although the built-in C-elements receive the erroneous inputs, the errors do not propagate to any other nodes. By referring to n1 and n2, n5 recovers its previous error-free state. In turn, C7 operates correctly, and the output Q is restored from the upset.

For the combinations (n1, out), (n3, out), and (n5, n6), the errors do not flip the data at the input of any C-element, so the previous correct data is recovered. From the above, it can be observed that HRDNUT provides full SNU and DNU tolerance with the use of built-in C-elements; however, the large number of C-elements results in significant power and area overhead.

#### 2.3.3 Error detection methods

As introduced in Section 2.3, detection-based methods detect soft errors and generate warning signals. They are generally implemented by monitoring the value saved in the specified latch: if any transition is detected during the storage phase, a soft error is assumed to have occurred, and a warning signal is generated so that the error can be recovered at the architecture level by reprocessing the corresponding data, since soft errors are temporary. In the following, the existing SED latch and sPGTD circuit are introduced.

**SED latch**: The schematic of the SEU tolerant latch based on Error Detection (SED) is shown in Fig. 2.13. It consists of a base latch and an error detector, where the error detector is used to detect data flipping in the latch during the holding mode [35].

In the normal mode (i.e. no error), SED latch operates as a conventional latch. When CK=high, the input (D) propagates to the output (Q) through two transmission gates and two inverters. When CK changes from high to low, the saved data can be kept in the latch.

When CK is low and node A is changed from low to high by a soft error, E0 becomes low. Because $M_{P1}$ turns off, the error cannot propagate to the output. Thus, the SED latch prevents soft errors occurring at A or B from propagating to the output; however, it cannot restore the correct state when the output itself is upset by a soft error.

**sPGTD circuit**: Another error-detection-based design method, sPGTD, which consists of a delay-based pulse generator and a dynamic buffer with weak keepers as shown in Fig. 2.14, was proposed in [36]. Although sPGTD was originally proposed as a PVT variation tolerant design technique that provides in-situ error detection, the idea can be extended to soft error detection, though without error recovery ability. If the sPGTD design is combined with a typical $C^2MOS$ latch, soft errors that occur at the output (Q) can be detected. When a transition of Q occurs after the falling edge and before the rising edge of the clock signal (CK), the output of the dynamic gates is discharged, and the ERROR signal goes high to flag the error at Q. In this way, a soft error at Q can be detected. It should be mentioned that the sPGTD design cannot provide self-recovery capability; therefore, it leads to a large error recovery cost when soft errors occur.
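The hold-phase monitoring idea described above (flag any transition of the stored value between a falling edge of CK and the next rising edge) can be sketched behaviorally on sampled waveforms. This is not the dynamic-gate circuit of [36]; it assumes a latch that is transparent while CK is high, equally spaced 0/1 samples, and the function names are hypothetical.

```python
from typing import List, Sequence


def hold_phase_transitions(ck: Sequence[int], q: Sequence[int]) -> List[int]:
    """Indices k where q changes between samples k-1 and k while ck stays low.

    A change of q with ck low at both samples lies after a falling edge of CK
    and before the next rising edge, i.e. in the hold phase. Changes that
    coincide with a clock edge are not flagged.
    """
    if len(ck) != len(q):
        raise ValueError("ck and q must have the same length")
    return [k for k in range(1, len(q))
            if ck[k - 1] == 0 and ck[k] == 0 and q[k] != q[k - 1]]


def error_flag(ck: Sequence[int], q: Sequence[int]) -> bool:
    """ERROR-style flag: asserted if any hold-phase transition was observed."""
    return bool(hold_phase_transitions(ck, q))
```

As in the text, such a flag only reports the upset; recovering the correct data is left to the architecture level, for example by re-executing the affected operation.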



Figure 2.12: HRDNUT latch[44].



Figure 2.13: SED circuit[35].



Figure 2.14: sPGTD circuit[36].

#### 2.4 Chapter Conclusion

This chapter presented an introduction to the mechanism of radiation-induced soft errors and overviewed the existing soft error hardened design techniques.

As shown in this chapter, with CMOS technology shrinking and the power supply voltage dropping greatly, reliability issues such as PVT variations, aging effects, and random transient soft errors have become much more critical than ever before. Among these reliability failures, radiation-induced soft errors, owing to their transient and unpredictable nature, are becoming one of the most critical concerns in state-of-the-art IC designs.

To ensure the correct and reliable operation of next-generation digital circuits, soft error hardened design techniques must be taken into account. Accordingly, various existing latch designs were reviewed in this chapter, and the advantages and disadvantages of each design were discussed.

## Chapter 3

# SHC: SNU Hardened Latch Design

To deal with the reliability issue caused by SNUs, a low power soft error hardened latch design using a novel Schmitt-trigger-based C-element is proposed for reliable low power applications in this chapter. Unlike state-of-the-art soft error tolerant latches that are usually based on hardware redundancy with large area overhead and high power consumption, the proposed SHC latch is implemented through double-sampling and node-checking using a novel Schmitt-trigger-based C-element, which can help to reduce the area overhead and the corresponding power consumption as well.

This chapter is organized as follows. Section 3.1 presents the proposed SHC latch design. Section 3.2 shows the evaluation and simulation results. The comparison results with existing SNU hardened designs are then presented in Section 3.3. Finally, Section 3.4 concludes this chapter.

#### 3.1 SHC Latch Design

In this section, a low-cost soft error hardened latch design is proposed for high-speed circuits; the structure of the proposed SHC latch is shown in Fig. 3.2. The structure of the SHC latch is simple and symmetric, containing only 14 transistors, including the inverter that generates the local CKB signal. Unlike state-of-the-art soft error tolerant latches, which are usually based on hardware redundancy, the proposed latch is implemented through double-sampling and node-checking using a novel Schmitt-trigger-based C-element.

As shown in the previous chapter, most existing soft error hardened latch designs (e.g., TFH, FERST, HiPeR) are built on the C-element. The basic structure of the C-element is shown in Fig. 3.1, and the corresponding truth table is shown in Table 3.1. The C-element was introduced by David E. Muller in 1959 and is often referred to as the "Muller C-element". Today, the C-element is widely used as a control unit in asynchronous circuits. Its characteristic behavior, keeping the output at its previous state when the two inputs differ, can be used for data comparison (i.e., error filtering), which makes the C-element suitable for filtering transient pulses caused by soft errors. Hence, designing low-cost C-element-based soft error hardened latches is worthwhile.
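The behavior summarized in Table 3.1 can be captured in a few lines of Python. In that table the output is the complement of the common input value (00 gives 1, 11 gives 0), as in an inverting CMOS C-element, and the sketch follows the same convention. The function names are hypothetical, and the model ignores timing and drive strength.

```python
from typing import Iterable, List


def c_element(a: int, b: int, prev_q: int) -> int:
    """Inverting C-element following Table 3.1.

    If both inputs are equal, the output is their complement (00 -> 1,
    11 -> 0); if they differ, the output keeps its previous value.
    """
    if a == b:
        return 1 - a
    return prev_q


def run_c_element(a_seq: Iterable[int], b_seq: Iterable[int], q0: int) -> List[int]:
    """Apply the C-element to paired input samples, starting from output q0."""
    q, out = q0, []
    for a, b in zip(a_seq, b_seq):
        q = c_element(a, b, q)
        out.append(q)
    return out


if __name__ == "__main__":
    # A hypothetical one-sample glitch on input B alone does not reach the output.
    print(run_c_element([0, 0, 0, 0], [0, 1, 0, 0], q0=1))
```

The "keep" rows of the table are what make the C-element an error filter: a transient on only one input never satisfies the "both inputs equal" condition with the wrong value.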

| A | B | Q                     |
|---|---|-----------------------|
| 0 | 0 | 1                     |
| 0 | 1 | Keep (previous value) |
| 1 | 1 | 0                     |
| 1 | 0 | Keep (previous value) |

Table 3.1: Truth table of C-element

The input of the SHC latch is double-sampled to improve soft error tolerance. The idea of double-sampling is to sample the latch input twice, through different transistors and/or at different times. This idea is not new and has been used in many existing works (e.g., [28], [39], [40], [45], to name a few). One reason for double-sampling is to protect storage cells from soft errors occurring in combinational circuits. Soft errors occurring in combinational circuits are usually referred to as single event transients (SETs); if they are loaded into a storage cell, they can corrupt the data stored in it. To prevent this kind of soft error, the latch input can be sampled at different times, for example, once with a small delay ($\tau$) and once without delay. In this way, any transient pulse caused by a soft error in the combinational circuit that arrives at the latch input with a duration shorter than $\tau$ can be filtered by double-sampling, which protects the storage cell from soft errors occurring in the combinational circuit. Because only soft errors occurring in the storage cell are considered here, no delay is inserted in Fig. 3.2. It should be mentioned that the proposed SHC latch can be further improved by inserting small delay elements when soft errors in combinational circuits are considered.
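The filtering argument above can be checked with a discrete-time sketch. For readability it uses a non-inverting comparison (the stored value follows the input only when both samples agree), the delay $\tau$ is an integer number of time steps, the input is assumed steady before t = 0, and the function name is hypothetical. It models the general double-sampling idea, not the SHC circuit, which, as stated above, inserts no delay.

```python
from typing import List, Sequence


def double_sampled_capture(d: Sequence[int], tau: int, q0: int = 0) -> List[int]:
    """Stored value when d[t] and d[t - tau] feed a C-element-like comparator.

    The output follows the input only when the undelayed and the delayed
    samples agree; otherwise it keeps its previous value.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    q, out = q0, []
    for t in range(len(d)):
        direct = d[t]
        delayed = d[t - tau] if t >= tau else d[0]  # steady input before t = 0
        if direct == delayed:
            q = direct
        out.append(q)
    return out


if __name__ == "__main__":
    set_pulse = [0] * 5 + [1] * 3 + [0] * 12   # 3-step transient (SET)
    real_change = [0] * 5 + [1] * 10 + [0] * 10  # 10-step genuine input change
    print(double_sampled_capture(set_pulse, tau=4))
    print(double_sampled_capture(real_change, tau=4))
```

In this model, a pulse lasting at most $\tau$ steps never appears on both samples at the same time and is therefore filtered, whereas a genuine input change reaches the stored value only after it has been stable for $\tau$ steps, which is the delay cost of this kind of filtering.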

In the proposed SHC latch, the double-sampled inputs (ND1 and ND2) are sent to the Schmitt-trigger-based C-element for internal node checking and error filtering.



Figure 3.1: General C-element

The proposed Schmitt-trigger-based C-element consists of 8 transistors. A special feedback path from the output (Q) is inserted into the conventional C-element of Fig. 3.1. The proposed circuit can be viewed as two C-elements. The first is formed by P1, P2, N1 and N2 and is used for input filtering. The second is formed by P3, P2, N1 and N3 and is used for input-output node checking. Unlike the traditional C-element shown in Fig. 3.1, the structure is rearranged so that only 6 transistors (P1, P2, P3, N1, N2 and N3) implement the two C-elements (P1, P2, N1, N2) and (P3, P2, N3, N1), with P2 and N1 shared between them. In addition, the weak keeper required by the traditional C-element can be removed. As a result, fewer transistors are needed to implement the two C-elements, which in turn reduces power consumption.
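The transistor arrangement can be checked with a switch-level sketch. The gate connections are an assumption inferred from the operation described in this section:

- P1 and N1 are gated by ND1;
- P2 and N2 are gated by ND2;
- P3 and N3 are gated by Q.

This gives a pull-up network of P2 in series with (P1 in parallel with P3), and a pull-down network of N1 in series with (N2 in parallel with N3), both driving node X. The helper names are hypothetical.

```python
from itertools import product


def pull_up(nd1: int, nd2: int, q: int) -> bool:
    # PMOS conducts with gate = 0: P2 (ND2) in series with (P1 (ND1) parallel to P3 (Q))
    return nd2 == 0 and (nd1 == 0 or q == 0)


def pull_down(nd1: int, nd2: int, q: int) -> bool:
    # NMOS conducts with gate = 1: N1 (ND1) in series with (N2 (ND2) parallel to N3 (Q))
    return nd1 == 1 and (nd2 == 1 or q == 1)


def next_x(nd1: int, nd2: int, q: int, x_prev: int) -> int:
    up, down = pull_up(nd1, nd2, q), pull_down(nd1, nd2, q)
    assert not (up and down), "pull-up and pull-down both conducting"
    if up:
        return 1
    if down:
        return 0
    return x_prev  # both networks off: X floats and keeps its stored charge


# Check all 8 combinations of (ND1, ND2, Q): the two networks never fight.
no_contention = all(not (pull_up(*v) and pull_down(*v))
                    for v in product((0, 1), repeat=3))
print(no_contention)  # True
# Loading D = 0 while the old output Q is 1: X is pulled up, so Q becomes 0.
print(next_x(0, 0, 1, x_prev=0))  # 1
```

Under these assumptions, agreeing inputs always set X to their inverse regardless of Q. The Q-gated transistors only matter when ND1 and ND2 disagree.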

Now let us explain how the SHC latch works in normal operation (i.e., when no soft error occurs). When CK is high (i.e., in the transparent mode), the input D is loaded into ND1 and ND2 through the transmission gates. Owing to double-sampling, the output of the C-element updates only when ND1 and ND2 have the same logic value; only then do the internal node X and the output Q receive the new data. If the new input differs from the previous latch output (Q), the feedback paths from Q through both (P3, P2) and (N3, N1) are OFF. For example, assume that the previous latch output is Q=1 and the new input is D=0 while CK is high. ND1 and ND2 are loaded with 0, so P1 and P2 turn on, the internal node X changes to 1, and the output Q receives the new data 0. At the same time, the previous output Q=1 turns N3 on. However, because ND1 is 0, N1 is off, so the feedback path from Q is cut off. In this way, the new data can be loaded safely provided that the input data is stable. Therefore, in normal operation the proposed SHC latch works like a normal latch. As a side benefit, the SHC latch needs no clock-controlled transmission gates on the feedback path, unlike the existing FERST latch shown in Fig. 2.6.

Next, let us explain how the proposed SHC latch works when a soft error affects an internal node in the holding mode. The critical nodes of the SHC latch that can be affected by soft errors reduce to the internal nodes ND1 and ND2 and the output Q. An upset at the drain of P1/P3, P2, N1 or N2/N3 can be analyzed equivalently as an upset at ND1, ND2 or Q, so only these three nodes are considered here. The following paragraphs illustrate how the SHC latch behaves when a soft error occurs in the holding mode and affects ND1, ND2 or Q.

When CK is low, the two transmission gates are OFF, so the internal nodes ND1 and ND2 maintain the previously loaded data. Suppose a soft error upsets ND1 (or ND2). ND1 and ND2 now differ, so P1 and P2 (or N1 and N2) cannot both be ON, and the C-element keeps the output Q at its previously saved value. In addition, the feedback path from Q (through P3 or N3) and the unaffected node ND2 (or ND1) help keep the output stable. ND1 (or ND2) cannot recover from the soft error directly. The output of the SHC latch nevertheless does not change, which guarantees correct operation of the circuit.

Figure 3.2: The proposed SHC latch.

If Q is upset by a soft error, the erroneous value is not saved. The feedback path of the C-element formed by (P3, P2) and (N3, N1) that would reinforce the error remains OFF, because ND1 and ND2 still hold the correct value and keep N1 (or P2) off. At the same time, ND1 and ND2 drive the C-element to restore the correct value at Q. As a result, a soft error at Q upsets the output only temporarily: the proposed SHC latch recovers to the correct state, provided that ND1 and ND2 hold the correct state.
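The holding-mode cases above can be replayed under the same switch-level assumptions: gate connections are inferred from the text, and the output Q is taken as the inverse of X. The sketch below is a hypothetical behavioral check rather than a circuit simulation. It flips one node, re-evaluates X and Q until they settle, and reports the final output.

```python
def next_x(nd1: int, nd2: int, q: int, x_prev: int) -> int:
    # Assumed networks: P2 in series with (P1 || P3) pulls X up,
    # N1 in series with (N2 || N3) pulls X down; P1/N1 <- ND1, P2/N2 <- ND2, P3/N3 <- Q.
    if nd2 == 0 and (nd1 == 0 or q == 0):
        return 1
    if nd1 == 1 and (nd2 == 1 or q == 1):
        return 0
    return x_prev  # X floating: keeps its value


def settle(nd1: int, nd2: int, x: int, q: int, steps: int = 3) -> int:
    """Re-evaluate X and the output inverter (Q = NOT X) a few times; return Q."""
    for _ in range(steps):
        x = next_x(nd1, nd2, q, x)
        q = 1 - x
    return q


def strike(held: int, node: str) -> int:
    """Latch holds `held` (ND1 = ND2 = Q = held, X = NOT held); flip one node."""
    nd1, nd2, x, q = held, held, 1 - held, held
    if node == "ND1":
        nd1 = 1 - nd1          # stays flipped: ND1 is a floating dynamic node
    elif node == "ND2":
        nd2 = 1 - nd2
    elif node == "Q":
        q = 1 - q              # Q is momentarily wrong, X is untouched
    return settle(nd1, nd2, x, q)


for held in (0, 1):
    print(held, [strike(held, n) for n in ("ND1", "ND2", "Q")])
# 0 [0, 0, 0]
# 1 [1, 1, 1]
```

A flipped ND1 or ND2 stays flipped in this model, matching the dynamic behavior described above. Q, by contrast, is restored by the C-element.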

Owing to the dynamic operation of the proposed SHC latch, ND1 and ND2 are floating when CK is low. These nodes may therefore be vulnerable to soft errors if the clock frequency is not high. At a high clock frequency, the dynamic nodes are charged and discharged frequently, and the operation of the proposed SHC latch can be guaranteed. At a low clock frequency, floating nodes such as ND1 and ND2 may lose the saved data due to leakage current or soft errors; this is a general problem of dynamic latches. Moreover, clock gating cannot be applied to the proposed SHC latch, and its standby power may be higher than that of static latches. The purpose of this chapter, however, is to develop a low-power soft error tolerant latch for high-speed circuits, and the proposed SHC latch meets this requirement. Compared with static latches, the total number of transistors is reduced, which leads to low dynamic power consumption. The proposed dynamic SHC latch has two limitations:

1) its leakage power is greater than that of static latches;
2) its floating nodes may be vulnerable to soft errors if the clock frequency is not high.

These issues require further improvement.

#### 3.2 Evaluation of SNU Tolerance

The proposed SHC latch was simulated in ROHM 180nm CMOS process technology with Vdd=1.8V and a clock frequency of 125MHz. The existing C-element-based latch designs were also implemented. For a clear comparison, the conventional unhardened C<sup>2</sup>MOS latch shown in Fig. 3.3 was also implemented; it has no specific soft error tolerance. PMOS and NMOS transistors are implemented using pcell design. The width (W) and length (L) of all transistors in the SHC latch and the related works are set to 420nm and 180nm, respectively. The inverted clock (CKB) is generated locally by a minimum-sized symmetric inverter.

First, transistor-level simulations are performed with a supply voltage of 1.8V and a clock period of 8ns. Fig. 3.4 shows the simulation waveform of the proposed SHC latch in normal operation. The waveform confirms that the proposed SHC latch works correctly as a normal latch.

Figure 3.3: Conventional unhardened  $C^2MOS$  latch.

Next, simulation results and comparisons on soft error tolerance are presented. For soft error simulation, the critical charge $(Q_{crit})$ is defined as the minimum charge that must be deposited by a particle strike to cause a circuit malfunction [46]. It can be characterized by injecting current pulses into the sensitive nodes of a circuit in transistor-level simulators. Several current models have been proposed in the literature to emulate the effect of soft errors so that $Q_{crit}$ can be characterized through SPICE simulation. Among them, the double exponential model (DEM) is the most commonly used [46]. In the DEM, a current pulse with a rapid rise and a slow fall emulates a soft error, as given by the following equation:



Figure 3.4: Simulation waveform of SHC latch in normal operations.

$$I(t) = \frac{Q}{\tau_1 - \tau_2} \cdot \left[ \exp\left(-\frac{t}{\tau_1}\right) - \exp\left(-\frac{t}{\tau_2}\right) \right] \tag{3.1}$$

where $\tau_1$ and $\tau_2$ are the rise and fall time constants, respectively. $Q$ sets the pulse amplitude: it equals the total charge carried by the pulse (the integral of Eq. (3.1) from $0$ to $\infty$), so the peak current scales in proportion to $Q$. Following the results shown in [46], the rise and fall time constants are set to 90ps and 200ps, respectively, in this work. The peak current is adjusted for each node to find the minimum value that upsets the corresponding node. The critical charge is then calculated by integrating the current pulse. Following [46], the integration interval for $Q_{crit}$ runs from the start of the pulse until time $T$, at which the pulse has decayed to 80% of its peak:

$$Q_{crit} = \int_0^{T} I(t)\, dt \tag{3.2}$$
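Eqs. (3.1) and (3.2) can be evaluated numerically as a sketch. The code uses the 90ps and 200ps time constants of this work and a hypothetical 1fC amplitude. It locates the peak analytically, finds the 80%-of-peak point on the falling edge by bisection, and integrates Eq. (3.1) in closed form up to that point. The function names are illustrative and are not taken from any simulator.

```python
import math


def dem_current(t: float, q: float, tau1: float, tau2: float) -> float:
    """Double exponential current pulse, Eq. (3.1)."""
    return q / (tau1 - tau2) * (math.exp(-t / tau1) - math.exp(-t / tau2))


def charge_until(t_end: float, q: float, tau1: float, tau2: float) -> float:
    """Closed-form integral of Eq. (3.1) from 0 to t_end, i.e. Eq. (3.2) with T = t_end."""
    return q / (tau1 - tau2) * (tau1 * (1 - math.exp(-t_end / tau1))
                                - tau2 * (1 - math.exp(-t_end / tau2)))


def time_at_80_percent(q: float, tau1: float, tau2: float) -> float:
    """Time after the peak at which the pulse has decayed to 80% of its peak."""
    t_peak = math.log(tau1 / tau2) * tau1 * tau2 / (tau1 - tau2)  # where dI/dt = 0
    target = 0.8 * dem_current(t_peak, q, tau1, tau2)
    lo, hi = t_peak, t_peak + 50 * max(tau1, tau2)  # current is far below target at hi
    for _ in range(100):                            # bisection on the falling edge
        mid = 0.5 * (lo + hi)
        if dem_current(mid, q, tau1, tau2) > target:
            lo = mid
        else:
            hi = mid
    return hi


tau1, tau2 = 90e-12, 200e-12   # rise and fall time constants used in this work
q = 1e-15                       # hypothetical pulse amplitude of 1 fC
t_end = time_at_80_percent(q, tau1, tau2)
print(t_end, charge_until(t_end, q, tau1, tau2))  # T and the charge counted up to T
print(charge_until(1e-6, q, tau1, tau2) / q)      # close to 1: the whole pulse carries Q
```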

An example of the current pulse used for soft error emulation is shown in Fig. 3.5. Its rise and fall times are kept constant, while its peak current is adjusted for each node to find the minimum value that upsets that node.

Fig. 3.6 and Fig. 3.7 show the simulation waveforms of the proposed SHC latch when a soft error affects only one of the internal or output nodes (ND1, ND2 or Q). The current pulse shown in Fig. 3.5 is used, with various peak currents for each node.

Fig. 3.6 shows that when soft errors affect ND1 and ND2 during the holding mode, the affected node is not recovered until the next clock cycle. Even so, the output of the proposed SHC latch is always correct and glitch-free, which clearly shows that the SHC latch filters soft errors affecting its internal nodes. Moreover, Fig. 3.7 shows that the output Q is temporarily affected by soft errors, but the SHC latch recovers and returns to the correct state immediately. When Q is temporarily affected by a soft error, a glitch is generated at Q and propagates through the following combinational logic. Depending on when the soft error occurs and on the duration of the glitch, the glitch may be captured by the latch or flip-flop in the next stage. An erroneous value is then stored and may cause a circuit failure. For a soft error occurring at the output Q, the proposed SHC latch cannot eliminate the generated glitch. This is a common problem in existing works and requires further research.





Figure 3.5: The current source used for soft error emulation, in which the rise and fall times are constant while the peak current is adjusted for each internal node for $Q_{crit}$ calculation.



Figure 3.6: SNU at ND1 and ND2. Simulation waveform of the SHC latch when soft errors occur at ND1 and ND2, in which the current pulse has fixed rise and fall times of 90ps and 200ps, respectively, while the peak current is adjusted.



Figure 3.7: SNU at Q. Simulation waveform of the SHC latch when soft errors occur at Q, in which the current pulse has fixed rise and fall times of 90ps and 200ps, respectively, while the peak current is adjusted.

|            | Number of transistors | CK-Q delay (ps) | Propagation delay $T_{pL2H}$ (ps) | Propagation delay $T_{pH2L}$ (ps) | $T_{setup}$ (ps) | $T_{hold}$ (ps) | $Q_{crit}$ (fC) |
|------------|-----------------------|-----------------|-----------------------------------|-----------------------------------|------------------|-----------------|-----------------|
| $C^2MOS$   | 12                    | 169.5           | 102.78                            | 182.87                            | 94.67            | 166.93          | -               |
| TFH [38]   | 12                    | 117.34          | 103.86                            | 28.97                             | 46.15            | 61.43           | -               |
| FERST [39] | 26                    | 184.54          | 145.83                            | 169.54                            | 55.85            | 97.37           | 0.43            |
| HiPeR [40] | 20                    | 119.77          | 108.08                            | 22.64                             | 50.41            | 97.37           | 0.54            |
| SHC        | 14                    | 120.23          | 100.37                            | 115.25                            | 24.10            | 28.65           | 0.70            |

Table 3.2: Comparison results with existing works.

#### 3.3 Comparison Results

Next, the C<sup>2</sup>MOS, TFH, FERST and HiPeR latches were implemented in the same way as the proposed SHC latch, for a fair comparison with existing C-element-based soft error hardened latch designs. In all designs, PMOS and NMOS transistors use pcell design, and the width (W) and length (L) of all transistors are set to 420nm and 180nm, respectively. The inverted clock (CKB) is generated locally by a minimum-sized symmetric inverter. Table 3.2 shows the implementation results in terms of the number of transistors, delay, setup and hold times, and critical charge. As for area overhead, the proposed SHC latch requires only 2 more transistors than the conventional unhardened C<sup>2</sup>MOS latch, even including the inverter for local CKB generation. The area overhead introduced by SHC is therefore very small.

For delay measurement, the proposed SHC latch achieves a balance between propagation delay and CK-Q delay, even with identically sized transistors. Table 3.2 shows that the delay of the proposed SHC latch is comparable to that of existing works. The corresponding waveforms in normal operation are shown in Fig. 3.8 and Fig. 3.9 for the existing TFH, FERST and HiPeR designs and the proposed SHC latch. The proposed SHC is faster than FERST in both propagation delays ($T_{pL2H}$ and $T_{pH2L}$). However, it is slower than TFH and HiPeR in $T_{pH2L}$ in the transparent mode. In addition, Table 3.2 shows that the proposed SHC latch has shorter setup and hold times than the existing works, which makes it suitable for high-speed circuits. In the TFH and HiPeR latches, the output Q is driven directly by the input through the transmission gate without buffering, so these latches have limited driving ability for large fan-out. The critical charge ($Q_{crit}$) of the proposed SHC is about 1.3X that of the HiPeR latch and 1.63X that of the FERST latch. In Table 3.2, critical charge is measured only for the designs that are able to recover from errors occurring at Q.
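The critical-charge ratios quoted above follow directly from the $Q_{crit}$ column of Table 3.2, as this short check shows. The dictionary name is illustrative.

```python
q_crit = {"FERST": 0.43, "HiPeR": 0.54, "SHC": 0.70}  # fC, from Table 3.2
for ref in ("HiPeR", "FERST"):
    print(ref, round(q_crit["SHC"] / q_crit[ref], 2))
# HiPeR 1.3
# FERST 1.63
```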

To investigate the recovery operations, Fig. 3.10 and Fig. 3.11 show the recovery times of SHC, FERST and HiPeR when a soft error affects the output Q. Here, the recovery time is the time required for the output Q to return to the correct value after a soft error occurs at Q. The figures show that the proposed SHC latch recovers from the erroneous state as fast as the existing works while using fewer transistors and less power. A shorter recovery time means a narrower glitch. This is desirable because the glitch caused by a soft error at Q may then be electrically masked in the following logic before it reaches the next latch. For a soft error occurring at the output Q, however, the proposed SHC latch cannot eliminate the generated glitch at Q. This is a common problem in existing works and should be addressed in the future.

Figure 3.8: Propagation delay  $(T_{pL2H})$  comparisons in normal operations.

Finally, from the viewpoint of power consumption, Table 3.3 compares the proposed SHC latch with the existing C-element-based soft error hardened latch designs. Because power consumption depends on data activity, results are given for data activities of 100%, 50% and 25%, and for all-zero and all-one inputs. A data activity of 100% corresponds to the input pattern 101010..., with one transition at the output Q in every clock cycle. A data activity of 50% corresponds to the input pattern 11001100.... Furthermore, the all-one and all-zero input patterns involve no data activity, so their power dissipation characterizes the static operation of the latch designs. For all switching input patterns (25%, 50% and 100%), the proposed SHC latch consumes the lowest total power, lower even than the unhardened C<sup>2</sup>MOS latch. For the static patterns, TFH consumes less power in both cases, and C<sup>2</sup>MOS also consumes less for all-one inputs. The power advantage of SHC generally grows as data activity increases. At 100% data activity, SHC reduces power by 20.35% compared with the conventional unhardened C<sup>2</sup>MOS latch and by 82.96% compared with the existing soft error tolerant HiPeR design.

Figure 3.9: Propagation delay  $(T_{pH2L})$  comparisons in normal operations.

Figure 3.10: Time of recovery from high to low (SNU at Q).



Figure 3.11: Time of recovery from low to high (SNU at Q).

|            | 0% (all-zero) | 0% (all-one) | 25%  | 50%   | 100%  |
|------------|---------------|--------------|------|-------|-------|
| $C^2MOS$   | 1.49          | 1.38         | 2.11 | 2.99  | 4.52  |
| TFH [38]   | 1.08          | 1.20         | 2.80 | 4.34  | 5.98  |
| FERST [39] | 2.05          | 1.97         | 3.45 | 4.85  | 8.06  |
| HiPeR [40] | 2.87          | 3.54         | 9.74 | 15.64 | 21.13 |
| SHC        | 1.11          | 1.53         | 2.04 | 2.43  | 3.60  |

Table 3.3: Comparisons on power consumption at various data activities ( $\mu W$ ).
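The relative power savings can be recomputed from Table 3.3 as a quick check. The sketch below (dictionary layout and names are illustrative) prints the saving of SHC over each reference at every data activity. The 100% column reproduces the 20.35% and 82.96% figures quoted in the text. The other columns show how the saving varies with the input pattern; a negative value means SHC consumes more power.

```python
# Table 3.3 (µW), columns: 0% all-zero, 0% all-one, 25%, 50%, 100% activity
power = {
    "C2MOS": [1.49, 1.38, 2.11, 2.99, 4.52],
    "HiPeR": [2.87, 3.54, 9.74, 15.64, 21.13],
    "SHC":   [1.11, 1.53, 2.04, 2.43, 3.60],
}


def savings_vs(ref: str) -> list[float]:
    """Relative power saving (%) of SHC versus `ref` for each data activity."""
    return [100 * (r - s) / r for r, s in zip(power[ref], power["SHC"])]


for ref in ("C2MOS", "HiPeR"):
    print(ref, [f"{x:.2f}" for x in savings_vs(ref)])  # last entry: 100% activity
```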

### 3.4 Chapter Conclusion

In this chapter, a low-power soft error hardened latch design using a Schmitt-trigger-based C-element has been proposed for reliable low-power applications.

The proposed SHC latch requires only 2 more transistors than the conventional unhardened $C^2MOS$ latch. At the same time, it reduces power by 82.96% at 100% data activity compared with the existing soft error tolerant HiPeR design. Soft errors occurring at the internal nodes of the proposed SHC are filtered internally and do not upset the output Q. When soft errors affect the output, the proposed SHC latch recovers to the correct state as fast as the existing works. Those works, however, require up to about 2X the transistors of the proposed SHC, which clearly shows the effectiveness of the proposed low-cost SHC design.

## Chapter 4

# EDSL: SNU Hardened Latch with Error Detection

A power-efficient single node upset hardened latch design with in-situ error detection capability, EDSL, is proposed in this chapter for reliability improvement against soft errors.

This chapter is organized as follows. Section 4.1 presents the proposed EDSL latch design. The evaluation and comparison results with the existing SNU hardened designs are presented in Section 4.2. Finally, Section 4.3 concludes this chapter.

#### 4.1 EDSL Latch Design

As shown in Fig. 2.4, if a soft error affects either the PMOS or the NMOS in a latch, the output (Q) will be upset. The latch output may still be temporarily flipped even in soft-error resilient latch designs. When Q is temporarily flipped, a glitch is generated at Q and propagates through the following combinational logic. Depending on when the soft error occurs and on the width of this glitch, the glitch may be captured by the storage element in the next stage. This probability increases as the clock frequency is raised to meet higher-speed processing requirements. This issue, however, was neglected in previous works. Therefore, it is desirable to develop architecture-level reliable radiation-hardened latch designs that combine SNU self-recovery with in-situ error detection to improve reliability against soft errors.

According to the soft error occurrence mechanism described in [32], an SNU causes a high-to-low transition at a node driven by an NMOS transistor and a low-to-high transition at a node driven by a PMOS transistor. Based on this mechanism, this work proposes the error detection-based single-node-upset hardened latch shown in Fig. 4.1. The proposed EDSL design consists of an error tolerant part and an error detector part. Together they allow it not only to recover from any SNU but also to detect output transitions caused by errors. The proposed EDSL latch can be viewed as a performance-improved version of the SEH latch [32], with an additional error detector integrated to improve reliability.

The EDSL design has four main nodes: PDH, NDH, BQ and Q. PDH is driven only by PMOS transistors, so an SNU can only flip it from low to high; $M_{P2}$ and $M_{P3}$ are used to recover PDH from high to low. Similarly, NDH is driven only by NMOS transistors, so a soft error can only flip it from high to low. Therefore, EDSL provides self-recovery ability for PDH and NDH. Q and BQ, on the other hand, are driven by both PMOS and NMOS transistors, so they can be flipped in either direction. As shown in Fig. 4.1, BQ is driven by PDH and NDH. If PDH and NDH are correct, BQ therefore remains correct, and so does Q. Because Q and BQ may be temporarily upset, a glitch may be generated at the latch output (Q) and propagate through the following combinational logic. To deal with this problem, the error detection logic in EDSL generates an ERROR flag signal for architecture-level reliability improvement.
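The polarity rule of [32] can be encoded as a small lookup that lists which EDSL nodes an SNU can actually flip in each holding state. This is a hypothetical behavioral sketch. The driver sets follow the description above: PDH is PMOS-only, NDH is NMOS-only, and BQ and Q are driven by both types. The holding values follow the operation described in this section: PDH, NDH and Q equal D, and BQ holds the inverse.

```python
# Transistor types driving each EDSL node, as described in the text.
DRIVERS = {"PDH": {"P"}, "NDH": {"N"}, "BQ": {"P", "N"}, "Q": {"P", "N"}}


def can_flip(node: str, value: int) -> bool:
    """Low-to-high upsets occur at PMOS-driven nodes, high-to-low at NMOS-driven nodes."""
    drivers = DRIVERS[node]
    return ("P" in drivers and value == 0) or ("N" in drivers and value == 1)


def vulnerable_nodes(held: int) -> list[str]:
    """Nodes an SNU can flip while the latch holds `held` (PDH = NDH = Q = held, BQ = NOT held)."""
    state = {"PDH": held, "NDH": held, "BQ": 1 - held, "Q": held}
    return [node for node, value in state.items() if can_flip(node, value)]


print(vulnerable_nodes(0))  # ['PDH', 'BQ', 'Q']
print(vulnerable_nodes(1))  # ['NDH', 'BQ', 'Q']
```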

Next, the operation of the proposed EDSL design is explained. Figs. 4.2 and 4.3 show the operation of EDSL with holding values 0 and 1, respectively. During the transparent mode (CK=1), the input D propagates to Q through the transmission gate. At the same time, PDH and NDH take the same value as D and Q, and BQ stores the inverted value of D through $M_{P4}$ or $M_{N4}$.

During the hold mode (CK=0), every internal node keeps its saved data in error-free operation, so EDSL works as a normal latch. When a soft error affects a node driven only by PMOS (or NMOS), that node is flipped from low to high (or from high to low). For ease of explanation, assume that D, PDH, NDH and Q hold low and BQ holds high. If an SNU flips PDH from low to high, the upset PDH does not affect the other internal nodes, because $M_{P4}$ turns off and $M_{N2}$ is off. In this case, the detector part keeps outputting low, indicating no transition at the latch output (Q). The case where NDH is flipped from high to low is handled in a similar way.

Now assume that a soft error flips BQ from high to low. The erroneous value temporarily propagates to the output Q, so Q is flipped from low to high. Because PDH holds low, $M_{P6}$, $M_{P9}$ and $M_{N9}$ turn on and node X is discharged to low, so an error warning signal is generated. Meanwhile, in the recovery part, BQ is driven by $M_{P4}$ and PDH holds the correct value, so BQ is recovered. Once BQ is restored to the correct value, Q also recovers to the correct state immediately.

When an SNU flips the output Q from high to low, Q recovers immediately because PDH/NDH and BQ hold the correct values. In this case, $M_{N4}$ stays on, and PDH/NDH and BQ hold high and low, respectively. Therefore $M_{N6}$ and $M_{P8}$ turn on, node X is discharged to low, and the ERROR signal is charged high to indicate the transition at the latch output Q.


Therefore, the proposed EDSL design can recover from any single event upset. It also provides in-situ error detection when the latch output is upset, which clearly shows the reliability improvement of EDSL over the existing works.



Figure 4.1: Proposed EDSL design.




Figure 4.2: The operation of EDSL with the holding value 0 in the state-holding phase.



Figure 4.3: The operation of EDSL with the holding value 1 in the state-holding phase.

#### 4.2 Evaluation Results

In this work, the proposed EDSL was designed and simulated using ROHM 180nm CMOS technology with $V_{dd} = 1.8V$, T=25°C and f = 125 MHz. Minimum-sized transistors are used wherever the circuit still operates correctly. The inverted clock (CKB) is generated locally by a minimum-sized symmetric inverter.

First, the operation of the proposed EDSL under various SNUs is evaluated; Fig. 4.4 shows the corresponding simulation waveform. The proposed EDSL design has four critical nodes (PDH, NDH, BQ and Q), and in the waveform of Fig. 4.4 only one of them is upset at a time. The figure shows that the proposed EDSL works as a normal latch in error-free operation. It is also restored to the correct state immediately when an SNU occurs at Q or BQ. Moreover, when an SNU at Q or BQ upsets the latch output, a high ERROR signal is generated for architecture-level recovery.

The proposed EDSL design was compared with existing soft error hardened latch designs of two kinds: self-recovery methods (SEH [32], PDFSR [41] and SHST [42]) and error-detection methods (SED [35] and sPGTD [36]). For each design, the number of transistors, propagation delay, hold time, setup time, power consumption, power-delay product (PDP) and reliability were evaluated. The results are shown in Table 4.1; the original sPGTD design was adapted for soft error detection. The propagation delays (CQ delay and DQ delay) were measured from the VDD/2 crossing of the D (or CK) transition to the VDD/2 crossing of the Q transition. The average power consumption was measured in error-free operation with a typical activity ratio of 66.7%.

According to Table 4.1, EDSL reduces the CK-Q delay by up to 59.75% compared with the state-of-the-art detection-based sPGTD latch and by up to 72.39% compared with the SNU resilient SHST latch. For the worst-case D-Q delay (the larger of the rise and fall delays), the corresponding reductions are 60.12% and 70.71%, respectively. This significant reduction in propagation delay comes from the short D-to-Q path of EDSL, which contains only one transmission gate. As for power efficiency, the proposed EDSL reduces the power-delay product by up to 72.25% and 79.74%, respectively, which clearly shows the effectiveness of the proposed method.
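The reductions quoted above can be reproduced from Table 4.1. In the sketch below (names are illustrative), the D-Q comparison uses the worst case of the rise and fall delays. The final check shows that each reported PDP equals power multiplied by the worst of the three listed delays, in µW × ps = aJ. This reading of the PDP column is an inference from the table values, not a definition stated in the text.

```python
# Table 4.1: power (µW), CQ, DQ rise, DQ fall (ps), reported PDP
table = {
    "sPGTD": (10.75, 221.79, 149.97, 225.58, 2424.99),
    "SHST":  (10.27, 323.37, 307.14, 300.02, 3321.01),
    "EDSL":  (7.48, 89.27, 89.97, 26.87, 672.98),
}


def reduction(new: float, old: float) -> float:
    """Relative reduction (%) of `new` with respect to `old`."""
    return 100 * (old - new) / old


_, e_cq, e_rise, e_fall, e_pdp = table["EDSL"]
for ref in ("sPGTD", "SHST"):
    _, cq, rise, fall, pdp = table[ref]
    print(ref,
          round(reduction(e_cq, cq), 2),                              # CK-Q delay
          round(reduction(max(e_rise, e_fall), max(rise, fall)), 2),  # worst-case D-Q delay
          round(reduction(e_pdp, pdp), 2))                            # PDP
# sPGTD 59.75 60.12 72.25
# SHST 72.39 70.71 79.74

# Reported PDP = power x worst of the three delays (µW x ps = aJ), within rounding.
for name, (p, cq, rise, fall, pdp) in table.items():
    assert abs(p * max(cq, rise, fall) - pdp) < 0.01, name
```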

Fig. 4.5 also shows the power consumption under data activities ranging from 0% (static all-zero or all-one input) to 100% (the input toggles every clock cycle). EDSL is the only latch that can recover from any single event upset while also providing in-situ error detection when the latch output is upset. The results in Fig. 4.5 therefore further illustrate the power efficiency of the proposed EDSL design.



Figure 4.4: Simulation waveform of EDSL with various SNUs.



Figure 4.5: Comparison of power consumption under various activity ratios.

|                        | Detection-based latch |            | SNU resilient latch |            |           | Proposed |
|------------------------|-----------------------|------------|---------------------|------------|-----------|----------|
|                        | SED [35]              | sPGTD [36] | SEH [32]            | PDFSR [41] | SHST [42] | EDSL     |
| # of transistors       | 21                    | 31         | 16                  | 26         | 28        | 24       |
| Power (µW)             | 5.31                  | 10.75      | 5.03                | 6.48       | 10.27     | 7.48     |
| CQ delay (ps)          | 159.06                | 221.79     | 127.42              | 309.96     | 323.37    | 89.27    |
| DQ delay, rise (ps)    | 134.94                | 149.97     | 73.77               | 257.77     | 307.14    | 89.97    |
| DQ delay, fall (ps)    | 128.99                | 225.58     | 140.82              | 295.21     | 300.02    | 26.87    |
| Hold time (ps)         | 37.30                 | 177.04     | 44.99               | 48.85      | 89.71     | 46.86    |
| Setup time (ps)        | 50.14                 | 225.58     | 65.09               | 142.16     | 71.50     | 32.46    |
| PDP (aJ)               | 844.61                | 2424.99    | 708.32              | 2008.54    | 3321.01   | 672.98   |
| Self-recovery of SNUs  | No                    | No         | Yes                 | Yes        | Yes       | Yes      |
| Output error detection | Yes                   | Yes        | No                  | No         | No        | Yes      |

Table 4.1: Comparison results with existing works.

### 4.3 Chapter Conclusion

In this chapter, an error detection-based single-node-upset hardened latch was proposed for reliability improvement against soft errors. The proposed EDSL can recover from any single event upset while also providing in-situ error detection when the latch output is upset. Moreover, EDSL achieves up to 72.25% PDP reduction compared with state-of-the-art detection-based designs and up to 79.74% compared with SNU resilient designs. The proposed EDSL will be applied to critical designs for architecture-level implementation.

## Chapter 5

# TDRHL: MNU Hardened Latch with Error Detection

To improve the reliability of critical designs, this chapter presents an output transition detector-based radiation-hardened latch design for both single- and multiple-node upsets.

The proposed TDRHL combines an error recovery assistant logic with an in-situ transition detector. For any radiation-induced single- or double-node upset, it can:

1) provide full self-recovery capability;
2) generate a warning signal for architecture-level recovery only when soft errors flip the latch output.

The evaluation results show significant reliability and energy efficiency improvements of the proposed TDRHL design.

This chapter is organized as follows. Section 5.1 presents the proposed TDRHL latch design. The evaluation and comparison results with the existing DNU hardened and detection-based designs are presented in Sections 5.2 and 5.3, respectively. Finally, Section 5.4 concludes this chapter.

### 5.1 TDRHL Latch Design

To mitigate the effects of soft errors, various radiation-hardened design methods have been proposed [23, 24, 32, 34, 35, 36, 37, 43, 44]. Most of them are based on hardware redundancy, in which the results of redundant copies are compared and used for error recovery; the DICE family is a representative example. The original DICE [23] works well for single-node upsets, but it cannot handle multiple-node upsets such as double-node upsets (DNUs) and triple-node upsets (TNUs). SEH latches [32, 34, 43] belong to another category of radiation-hardened design techniques. They are inherently based on double sampling and take advantage of the soft-error mechanism for recovery. These low-cost latch designs are very effective for SNU mitigation; for MNUs, however, they require a trade-off between the area overhead of error recovery and the incurred delay. In addition, several error-detection-based methods [35, 36, 37] can be adapted for soft error detection. Some of them were originally developed for variation-resilient designs, but the basic idea extends to soft error detection. Unfortunately, they usually cannot recover from soft errors by themselves. Furthermore, a few methods for DNUs have recently been proposed [43, 44]. These methods are inherently based on hardware redundancy, so their DNU resilience comes at the price of increased area, power and/or delay.

In addition to the above-mentioned challenge of low-cost MNU mitigation, it should be noted that a soft error occurring in a latch can upset the latch output; the upset may propagate through the succeeding combinational logic and be captured by the next-stage storage element. However, this issue was often neglected in previous works. As shown in Fig. 2.4, if a soft error occurs and affects either the PMOS or the NMOS in a latch, the output (Q) will be upset. Note that the latch output may still be temporarily flipped even in soft-error resilient latch designs. Because Q is temporarily flipped, a glitch is generated at Q and propagates through the following combinational logic. Depending on when the soft error occurs and on the width of this glitch, the glitch might be captured by the storage element in the next stage, and this probability may increase as the clock frequency is raised to meet higher-speed processing requirements. Therefore, it is desirable to develop radiation-hardened latch designs that are reliable at the architecture level, taking into account both DNUs and possible latch output flipping.

As shown in [32], due to the failure mechanism of soft errors, a high-to-low transition occurs at a node driven by an NMOS, and a low-to-high transition occurs at a node driven by a PMOS. The error polarity depends on the direction of the electric field, which is determined by the P/N diffusion region and the type of substrate. Therefore, if a critical node is driven only by PMOS or only by NMOS transistors, transistors of the same type can be used to assist that node in error recovery. In addition, unlike most detection-based methods, in which delay buffers are inserted for transition detection, using the existing nodes in a latch for comparison may reduce the required area overhead as well as the corresponding power consumption.

Based on the above idea, TDRHL is proposed as shown in Fig. 5.1. It contains a baseline latch, an error recovery assistant logic (ERAL) and an output (Q) transition detector (QTD). TDRHL has six critical nodes: LP1, LN1, LP2, LN2, Q and LQ. Because these nodes drive the gates of all the transistors, self-recovery ability and transition detection should be provided for them. Among these six critical nodes, LP1 and LP2 are driven only by PMOS transistors, so a soft error can only change them from low to high; LN1 and LN2 can only be changed from high to low; and Q and LQ can be changed in both directions, following the principle of the soft error mechanism illustrated in [32].

Figure 5.1: Proposed TDRHL latch.
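The polarity rule above can be captured in a few lines. The following Python sketch is a behavioral abstraction, not a circuit model, and all of its names are illustrative. The driver labels follow the text (LP1/LP2 PMOS-only, Q/LQ driven by both types); treating LN1/LN2 as NMOS-only is an assumption inferred from their high-to-low-only behavior. The holding-0 node values come from Case 4 below, and the holding-1 values are assumed to be their complement. For each holding value, the sketch lists which nodes a single strike can flip and in which direction.

```python
# Behavioral sketch (not a circuit simulation) of the soft-error polarity rule
# applied to the six TDRHL critical nodes. All names are illustrative.

# Driver type of each critical node. LN1/LN2 are assumed NMOS-only because
# they can only fall; Q and LQ are driven by both transistor types.
DRIVERS = {"LP1": "pmos", "LP2": "pmos", "LN1": "nmos", "LN2": "nmos",
           "Q": "both", "LQ": "both"}


def holding_state(v):
    """Error-free node values for holding value v.

    Holding 0 follows Case 4 (Q = LP1 = LN1 = LQ = 0, LP2 = LN2 = 1);
    holding 1 is assumed to be the complement.
    """
    return {"Q": v, "LP1": v, "LN1": v, "LQ": v, "LP2": 1 - v, "LN2": 1 - v}


def possible_flip(driver, stored):
    """Direction in which a strike can flip a node, or None if it cannot."""
    if driver == "pmos":      # PMOS-driven nodes can only rise
        return "up" if stored == 0 else None
    if driver == "nmos":      # NMOS-driven nodes can only fall
        return "down" if stored == 1 else None
    return "up" if stored == 0 else "down"  # driven by both: either way


def susceptible_nodes(v):
    """Map each node that a single strike can upset to its flip direction."""
    state = holding_state(v)
    result = {}
    for node, driver in DRIVERS.items():
        direction = possible_flip(driver, state[node])
        if direction is not None:
            result[node] = direction
    return result


for v in (0, 1):
    print(v, susceptible_nodes(v))
```

In each holding state only four of the six nodes can be upset; for example, with holding value 0, LN1 is already low and LP2 is already high, so a strike cannot flip either of them.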

In ERAL, $M_{p2}$, $M_{p3}$ and $M_{p4}$ are used for the recovery of LP1, while $M_{n2}$, $M_{n3}$ and $M_{n4}$ are used for LN1. When soft errors occur and affect the driving PMOS, LP1 can only be flipped from low to high because LP1 is driven only by PMOS transistors. If Q, LQ and LN1 hold the correct values, $M_{p2}$, $M_{p3}$ and $M_{p4}$ remain ON and help the floating node LP1 recover to low. Similarly, $M_{n2}$, $M_{n3}$ and $M_{n4}$ are used for the recovery of LN1. ERAL combined with the baseline latch can be considered an enhancement of the SEID latch [43] for performance improvement, with three main improvements. First, a clock-controlled transmission gate is added between D and Q, forming a short path that shortens the setup time and reduces the propagation delay. Second, the logic and wiring are optimized for leakage power reduction. Third, a clock-controlled inverter is inserted for performance improvement and to support clock gating.

The QTD unit in TDRHL generates an architecture-level warning signal indicating an upset of the latch output in the state-holding phase. As shown in Fig. 5.1, the clock-controlled PMOS header ($M_{p12}$) pre-charges node X when CKB is low. Thus, X is initialized to high regardless of any input transitions in the transparent phase, and the ERROR signal is low. When CKB is high (assuming the error detection phase coincides with the state-holding phase), if X is discharged through any one of the conducting paths ($M_{p13}$ and $M_{n11}$), ($M_{n13}$ and $M_{n11}$), ($M_{p14}$ and $M_{p11}$), or ($M_{n14}$ and $M_{p11}$), the output inverter switches and raises a warning signal to flag the transition at the output (Q). The two pairs of complementary transistors ($M_{p13}$ and $M_{n13}$) and ($M_{p14}$ and $M_{n14}$) can be viewed as two transmission gates, gated by LP1/LN2 and by LP2/LN1, respectively; LP1 and LN2 hold complementary values in normal operation, and so do LP2 and LN1. The operations of TDRHL with different holding values are shown in Figs. 5.2 and 5.3. Because LP1 and LN1 hold values complementary to LP2 and LN2, in either holding state the transmission gate that is ON is in series with a Q-gated transistor ($M_{n11}$ or $M_{p11}$) that is OFF. There is therefore no conducting path from X to GND when the latch is error-free, and the ERROR signal stays low. It should be noted that, unlike most previous detection-based designs in which delay elements are inserted for transition detection, the proposed TDRHL uses the existing critical nodes for transition detection, which can drastically reduce the power and area overhead.
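To make the detector condition above concrete, the following Python sketch models the QTD as a static switch network during the state-holding phase, with X pre-charged high. Each transistor is treated as an ideal switch. The gate assignments ($M_{p13}$/LP1, $M_{n13}$/LN2, $M_{p14}$/LP2, $M_{n14}$/LN1, and $M_{n11}$/$M_{p11}$ gated by Q) are read from the case descriptions in this section, and the holding-1 node values are assumed to be the complement of the holding-0 values given in Case 4. The sketch ignores timing, charge sharing and the PMOS threshold-voltage loss, so it only indicates whether a discharge path exists; `qtd_error` and `holding_state` are illustrative names.

```python
def holding_state(v):
    """Error-free values; holding 1 is assumed to be the complement of holding 0."""
    return {"Q": v, "LP1": v, "LN1": v, "LQ": v, "LP2": 1 - v, "LN2": 1 - v}


def qtd_error(s):
    """Return True if a path from the pre-charged node X to GND conducts.

    Transistors are ideal switches: a PMOS conducts when its gate is 0,
    an NMOS when its gate is 1.
    """
    tg1 = s["LP1"] == 0 or s["LN2"] == 1  # M_p13 (gate LP1) or M_n13 (gate LN2)
    tg2 = s["LP2"] == 0 or s["LN1"] == 1  # M_p14 (gate LP2) or M_n14 (gate LN1)
    mn11 = s["Q"] == 1                    # NMOS gated by Q
    mp11 = s["Q"] == 0                    # PMOS gated by Q
    return (tg1 and mn11) or (tg2 and mp11)


for v in (0, 1):
    ok = holding_state(v)
    upset = dict(ok, Q=1 - ok["Q"])       # single-node upset at Q
    print(v, qtd_error(ok), qtd_error(upset))
```

In each error-free state the transmission gate that is ON is blocked by the Q-gated transistor that is OFF, so the model reports no error; flipping Q turns that transistor ON and completes the discharge path.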

In the following, the error recovery and detection operations of TDRHL for SNUs will be illustrated.

- Case 1: If LP1 is flipped from low to high, the PMOS transistors driven by LP1 ($M_{p5}$ and $M_{p7}$) turn OFF and the NMOS transistor ($M_{n5}$) turns ON. Because no other critical node, including Q, is affected, ERROR stays low. At the same time, because LQ, Q and LN1 are unaffected and still hold the correct values, the error recovery assistant transistors $M_{p2}$, $M_{p3}$ and $M_{p4}$ are kept on, which forces LP1 to be restored to low. It works in a similar way when LN1 is flipped from high to low.
- Case 2: If LN2 is flipped from high to low, although the glitch at LN2 turns on $M_{p6}$, all the other critical nodes are not affected and still hold the correct values. Therefore, $M_{p7}$ and $M_{n6}$ are kept on, and LN2 can be restored to high. In this case, ERROR stays low, indicating no transition at Q. It works in a similar way when LP2 is flipped from low to high.

Figure 5.2: The operations of TDRHL with the holding value 0.

- Case 3: When an SNU affects LQ, LQ can be flipped in both directions because it is driven by both NMOS and PMOS transistors. Since LP2 and LN2 hold the correct values, LQ can be restored through either $M_{p10}$ or $M_{n10}$. Because LQ is used only for recovery assistance in TDRHL, the ERROR signal stays low.
- Case 4: If a soft error affects Q in the state-holding phase, there are two cases to consider, depending on the holding value. For simplicity, assume that Q = LP1 = LN1 = LQ = 0 and LP2 = LN2 = 1. In the proposed QTD, both $M_{p14}$ and $M_{n14}$ are OFF; therefore, even though $M_{p11}$, driven by the 0-holding Q, is ON, node X stays high and ERROR stays low. If Q is flipped from low to high in the state-holding phase, $M_{p11}$ turns OFF and $M_{n11}$ turns ON. Because LP1 and LN2 stay low and high, respectively, node X is discharged through ($M_{p13}$, $M_{n13}$) and $M_{n11}$, and ERROR goes high, indicating that a transition has occurred at Q. At the same time, because all the other critical nodes in the proposed TDRHL hold the correct values, Q can be recovered through $M_{n8}$ and $M_{n9}$. Error recovery and transition detection work in a similar way when Q is flipped from high to low.

Figure 5.3: The operation of TDRHL with the holding value 1.

Next, let us consider the cases when DNUs occur in TDRHL. As mentioned above, there are six critical nodes, and there are 8 possible DNUs in total in the proposed TDRHL: $(Q \nearrow \text{ and } LP1 \nearrow)$, $(Q \searrow \text{ and } LP2 \nearrow)$, $(LQ \nearrow \text{ and } LP1 \nearrow)$, $(Q \nearrow \text{ and } LQ \nearrow)$, $(Q \searrow \text{ and } LQ \searrow)$, $(LQ \searrow \text{ and } LN1 \searrow)$, $(Q \searrow \text{ and } LN1 \searrow)$, and $(Q \nearrow \text{ and } LN2 \searrow)$. Because each of LP1, LP2, LN1 and LN2 can be flipped in only one direction, and their stored values are fixed by the holding value (LP1 and LN1 are equal, and complementary to LP2 and LN2), the pairs (LP1 and LN1), (LP2 and LN2), (LP1 and LP2), and (LN1 and LN2) cannot be flipped at the same time. In TDRHL, even if a DNU flips two critical nodes, the flipped nodes can still be restored through the error recovery assistant logic, which makes the proposed TDRHL DNU resilient. Moreover, 6 of the 8 DNU patterns flip Q; among them, $(Q \nearrow \text{ and } LQ \nearrow)$ and $(Q \searrow \text{ and } LQ \searrow)$ can be treated as an SNU at Q because a flipped LQ affects neither the other critical nodes nor the transition detector. TDRHL handles the other 4 Q-flipping DNU patterns in a similar way, so $(Q \nearrow \text{ and } LP1 \nearrow)$ is taken here as an example. If Q and LP1 are flipped from low to high, Q can be restored through $M_{n8}$ and $M_{n9}$ because LN2 and LN1 hold the correct values. Once Q is restored, $M_{p4}$, $M_{p3}$ and $M_{p2}$ turn ON and help LP1 recover to low. At the same time, in the detector, $M_{n13}$ is ON because LN2 is high, and the flipped Q turns $M_{n11}$ ON, so X is discharged. As a result, the ERROR signal goes high to flag the transition at Q.
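The SNU cases and the DNU enumeration above can be cross-checked with a static abstraction of the latch. The Python sketch below is illustrative only: it assumes ideal-switch transistors in the QTD, NMOS-only LN1/LN2, and a holding-1 state that is the complement of the Case 4 holding-0 state, and all names are hypothetical. It checks that every single-node upset raises ERROR exactly when it hits Q, that each of the 8 listed DNU patterns is consistent with exactly one holding value, that the four excluded pairs can never occur, and that the detector condition is met exactly for the 6 patterns that flip Q. It evaluates node values right after the upset and does not model the recovery transients.

```python
UP, DOWN = "up", "down"

# Flip directions allowed by the soft-error polarity rule.
ALLOWED = {"LP1": {UP}, "LP2": {UP}, "LN1": {DOWN}, "LN2": {DOWN},
           "Q": {UP, DOWN}, "LQ": {UP, DOWN}}

# The 8 DNU patterns listed in the text.
DNUS = [(("Q", UP), ("LP1", UP)), (("Q", DOWN), ("LP2", UP)),
        (("LQ", UP), ("LP1", UP)), (("Q", UP), ("LQ", UP)),
        (("Q", DOWN), ("LQ", DOWN)), (("LQ", DOWN), ("LN1", DOWN)),
        (("Q", DOWN), ("LN1", DOWN)), (("Q", UP), ("LN2", DOWN))]

# Pairs that the text says cannot be flipped at the same time.
EXCLUDED = [("LP1", "LN1"), ("LP2", "LN2"), ("LP1", "LP2"), ("LN1", "LN2")]


def holding_state(v):
    """Error-free values; holding 1 assumed to be the complement of holding 0."""
    return {"Q": v, "LP1": v, "LN1": v, "LQ": v, "LP2": 1 - v, "LN2": 1 - v}


def qtd_error(s):
    """Static QTD model: True if X has a conducting path to GND."""
    tg1 = s["LP1"] == 0 or s["LN2"] == 1
    tg2 = s["LP2"] == 0 or s["LN1"] == 1
    return (tg1 and s["Q"] == 1) or (tg2 and s["Q"] == 0)


def feasible(v, pattern):
    """True if every (node, direction) flip in pattern can occur in holding v."""
    s = holding_state(v)
    return all(d in ALLOWED[n] and s[n] == (0 if d == UP else 1)
               for n, d in pattern)


def upset(v, pattern):
    """Node values right after the flips in pattern, before any recovery."""
    s = holding_state(v)
    for n, _ in pattern:
        s[n] = 1 - s[n]
    return s


def snu_results():
    """(holding, node, error) for every single-node upset allowed by polarity."""
    out = []
    for v in (0, 1):
        for n in ALLOWED:
            d = UP if holding_state(v)[n] == 0 else DOWN
            if d in ALLOWED[n]:
                out.append((v, n, qtd_error(upset(v, [(n, d)]))))
    return out


def dnu_results():
    """(pattern, holdings, flips_q, error) for each listed DNU pattern."""
    out = []
    for p in DNUS:
        holdings = [v for v in (0, 1) if feasible(v, p)]
        flips_q = any(n == "Q" for n, _ in p)
        err = qtd_error(upset(holdings[0], p)) if holdings else None
        out.append((p, holdings, flips_q, err))
    return out


def excluded_never_feasible():
    """True if no excluded pair can be flipped together in either holding state."""
    return all(not feasible(v, ((a, da), (b, db)))
               for a, b in EXCLUDED for da in ALLOWED[a] for db in ALLOWED[b]
               for v in (0, 1))


for row in dnu_results():
    print(row)
```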

By combining the output transition detector with the error recovery assistant logic, TDRHL can recover from any SNU or DNU and also provide architecture-level recovery capability through error detection, which improves reliability over state-of-the-art techniques.

### 5.2 Evaluation of MNU Tolerance

In our evaluation, all the latch designs are designed and simulated using the 32nm Predictive Technology Model (PTM) [47], and timing variations at various process corners (SS: 0.8V/125°C, TT: 0.9V/25°C, FF: 1.0V/-40°C) are also considered. For each latch, the minimum transistor sizes that guarantee correct operation at all process corners are used.

The operations of TDRHL under various soft error occurrences are evaluated, and the simulation waveforms of SNUs and DNUs are shown in Fig. 5.4 and Fig. 5.5, respectively. The proposed TDRHL successfully self-recovers from all the injected SNUs and DNUs. Moreover, whenever Q is upset, a high ERROR signal is successfully generated as a warning signal to flag the transition at the latch output.

In addition to the above SNU and DNU evaluations, simulations are also conducted to check the robustness of TDRHL against triple-node upsets. The corresponding simulation waveform is shown in Fig. 5.6. In total, there are 4 possible TNU patterns for the six critical nodes in TDRHL: $(Q \nearrow \& LP1 \nearrow \& LQ \nearrow)$, $(Q \searrow \& LN1 \searrow \& LQ \searrow)$, $(Q \searrow \& LP2 \nearrow \& LQ \searrow)$, and $(Q \nearrow \& LN2 \searrow \& LQ \nearrow)$. Even for these untargeted TNU patterns, TDRHL still has a 50% probability of successfully generating the ERROR signal. For TNUs, critical nodes that are not directly struck may also be flipped, which makes both error detection and error recovery more difficult. In addition, the threshold voltage loss of the PMOS in QTD makes it difficult for node X to be discharged. Nevertheless, even with its low-cost implementation, TDRHL can still improve reliability against these untargeted soft errors, which is a side benefit desirable in critical designs.



Figure 5.4: Simulation waveform of SNUs.

Figure 5.5: Simulation waveform of DNUs.

Figure 5.6: Simulation waveform of TDRHL with triple-node upsets.

### 5.3 Comparison Results

As in Section 5.2, all the latch designs were designed and simulated using the 32nm PTM [47] at the SS, TT and FF process corners, with the minimum transistor sizes that guarantee correct operation of each latch at all process corners.

#### 5.3.1 Performance Comparison

The comparison results of the proposed TDRHL with the existing DNU resilient latches [43, 44], the SNU resilient latch [42] and the detection-based latches [35, 36] are presented in Table 5.1, where the original variation-tolerant sPGTD latch [36] was optimized for soft error detection. The number of transistors, propagation delay, setup time, hold time and average power consumption are evaluated. Clock-to-Q (CQ) and Data-to-Q (DQ) delays were measured between the VDD/2 crossings of the input and output transitions. The average power was measured with the typical 30% activity ratio while the latch was error-free.
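As an illustration of the VDD/2 delay definition, the Python sketch below extracts a propagation delay from sampled waveforms by linearly interpolating the first 50% crossing of the input (CK for the CQ delay, D for the DQ delay) and of Q. The waveform samples in the usage example are synthetic ramps, not simulation results from this work, and the first-crossing search is a simplification that assumes each trace contains only the transition of interest; the function names are illustrative.

```python
def crossing_time(t, v, level, rising):
    """Linearly interpolated time of the first crossing of `level` in one direction."""
    for i in range(1, len(t)):
        v0, v1 = v[i - 1], v[i]
        if (rising and v0 < level <= v1) or (not rising and v0 > level >= v1):
            return t[i - 1] + (level - v0) * (t[i] - t[i - 1]) / (v1 - v0)
    raise ValueError("waveform never crosses the requested level")


def propagation_delay(t, v_in, v_out, vdd, in_rising, out_rising):
    """Delay between the VDD/2 crossings of an input and an output waveform."""
    half = vdd / 2
    return (crossing_time(t, v_out, half, out_rising)
            - crossing_time(t, v_in, half, in_rising))


# Synthetic example at the TT supply (0.9 V); times in ps.
t = [0.0, 10.0, 20.0, 30.0, 40.0]
ck = [0.0, 0.0, 0.9, 0.9, 0.9]
q = [0.0, 0.0, 0.0, 0.9, 0.9]
print(propagation_delay(t, ck, q, 0.9, True, True))
```

A real measurement would restrict the search window to the clock or data edge of interest, but the 50%-to-50% definition is the same.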

According to Table 5.1, compared with the detection-based SED [35], TDRHL provides self-recovery from SNUs and DNUs at the cost of only 4.4% more power consumption. Unlike sPGTD [36], in which delay elements are inserted to generate the signals for comparison, TDRHL uses its internal nodes for comparison, so up to 70.6% power saving can be achieved. On the other hand, compared with SEID, on which the proposed TDRHL was built, TDRHL achieves not only full SNU and DNU resilience but also error detection. Although 14 additional transistors and 61.7% power overhead are introduced by the QTD in TDRHL, the CQ delay and DQ delay are reduced by 81.3% and 88.5%, respectively, owing to the short path introduced in ERAL. Moreover, TDRHL incurs an area overhead similar to that of the state-of-the-art DNU resilient latch HRDNUT [44] (34 vs. 36 transistors). Notably, compared with HRDNUT, TDRHL achieves 27.7% lower average power.
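The percentages quoted above follow directly from the TT-corner values in Table 5.1. The short Python check below recomputes them; the dictionary simply transcribes the table, and the helper names are illustrative.

```python
# TT-corner values transcribed from Table 5.1 (power in nW, delays in ps).
TABLE = {
    "SED":    {"transistors": 21, "cq": 38.61,  "dq": 31.15, "power": 253.36},
    "sPGTD":  {"transistors": 31, "cq": 46.51,  "dq": 44.52, "power": 900.05},
    "SEID":   {"transistors": 20, "cq": 109.07, "dq": 95.09, "power": 163.64},
    "HRDNUT": {"transistors": 36, "cq": 23.93,  "dq": 18.70, "power": 365.95},
    "SHST":   {"transistors": 28, "cq": 81.26,  "dq": 77.89, "power": 448.65},
    "TDRHL":  {"transistors": 34, "cq": 20.41,  "dq": 10.95, "power": 264.61},
}


def increase_pct(metric, ref, new="TDRHL"):
    """Percentage by which `new` exceeds `ref` for the given metric."""
    a, b = TABLE[new][metric], TABLE[ref][metric]
    return 100.0 * (a - b) / b


def reduction_pct(metric, ref, new="TDRHL"):
    """Percentage by which `new` is below `ref` for the given metric."""
    a, b = TABLE[new][metric], TABLE[ref][metric]
    return 100.0 * (b - a) / b


claims = {
    "power overhead vs SED": increase_pct("power", "SED"),
    "power saving vs sPGTD": reduction_pct("power", "sPGTD"),
    "power overhead vs SEID": increase_pct("power", "SEID"),
    "CQ delay reduction vs SEID": reduction_pct("cq", "SEID"),
    "DQ delay reduction vs SEID": reduction_pct("dq", "SEID"),
    "power saving vs HRDNUT": reduction_pct("power", "HRDNUT"),
}
for name, value in claims.items():
    print(f"{name}: {value:.1f}%")
print("extra transistors vs SEID:",
      TABLE["TDRHL"]["transistors"] - TABLE["SEID"]["transistors"])
```

Running it reproduces the 4.4%, 70.6%, 61.7%, 81.3%, 88.5% and 27.7% figures and the 14-transistor difference quoted in the text.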

To examine the timing variations at different process corners, Figs. 5.7 and 5.8 illustrate the corresponding CQ delay and DQ delay results, respectively. SEID shows the greatest fluctuations, followed by SHST, while the timing of HRDNUT and TDRHL is the most consistent across corners. Specifically, the proposed TDRHL outperforms the existing works, especially in DQ delay, which is partially due to the short path introduced in TDRHL.

#### 5.3.2 Power Consumption Evaluation

The power consumption is measured with data activities ranging from 0% (static all-zero or all-one input) to 100% (input toggling at every clock cycle), and the results are shown in Fig. 5.9. Among the four soft-error-tolerant designs, SEID [43], HRDNUT [44], SHST [42] and TDRHL, the proposed TDRHL outperforms HRDNUT and SHST, while SEID consumes the least power, which is also one of the reasons why the proposed TDRHL was built on SEID. Compared with SEID, the power overhead of TDRHL is caused by the output transition detector. On the other hand, among the detection-based designs SED, sPGTD and TDRHL, TDRHL and SED show similar results, and TDRHL achieves 72.5% power savings compared with sPGTD, which clearly shows the low-power implementation of the proposed QTD.
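The activity sweep can be reproduced with a simple stimulus generator. In the Python sketch below, activity is taken to be the fraction of clock cycles in which D toggles (0% is a constant input, 100% a toggle every cycle), matching the range described above. Spreading the toggles evenly over the run is an assumption of this sketch, not necessarily how the stimuli in this work were generated, and the function names are illustrative.

```python
def data_pattern(activity, n_cycles, start=0):
    """Per-cycle D values with round(activity * n_cycles) evenly spread toggles."""
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must lie in [0, 1]")
    n_toggles = round(activity * n_cycles)
    # Evenly spaced toggle cycles; a spacing of n_cycles / n_toggles >= 1
    # keeps them distinct (the set is empty when n_toggles == 0).
    toggles = {i * n_cycles // n_toggles for i in range(n_toggles)}
    value, pattern = start, []
    for cycle in range(n_cycles):
        if cycle in toggles:
            value ^= 1
        pattern.append(value)
    return pattern


def measured_activity(pattern, start=0):
    """Fraction of cycles whose D value differs from the previous cycle."""
    prev, toggles = start, 0
    for value in pattern:
        toggles += value != prev
        prev = value
    return toggles / len(pattern)


print(measured_activity(data_pattern(0.3, 100)))
```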

Power-delay-product (PDP) comparison results at different process corners are shown in Fig. 5.10, in which the power was measured with the typical 30% activity ratio while the latch was error-free, and all the results are normalized to the PDP of TDRHL at the TT corner. Among the detection-based methods, TDRHL achieves the lowest and the most consistent PDP at the three process corners. On the other hand, the PDP improvements of TDRHL at the TT corner over SEID, HRDNUT and SHST are 1.9X, 2.5X and 5.0X, respectively. It should be noted that TDRHL is the only latch among those compared that can both recover from any SNU or DNU and provide architecture-level resiliency; therefore, the PDP results clearly illustrate the power efficiency of the proposed TDRHL design.



Figure 5.7: Clock-to-Q delay at different process corners.



Figure 5.8: Data-to-Q delay at different process corners.



Figure 5.9: Comparisons on power consumption with various activity ratio.

Table 5.1: Comparison results with existing works at TT corner.

|                        | SED [35]              | sPGTD [36]            | SEID [43]           | HRDNUT [44]         | SHST [42]           | TDRHL                                            |
|------------------------|-----------------------|-----------------------|---------------------|---------------------|---------------------|--------------------------------------------------|
| Category               | Detection-based latch | Detection-based latch | DNU resilient latch | DNU resilient latch | SNU resilient latch | Proposed                                         |
| Self-recovery of SNUs  | No                    | No                    | Yes                 | Yes                 | Yes                 | Yes                                              |
| Self-recovery of DNUs  | No                    | No                    | Yes                 | Yes                 | No                  | Yes                                              |
| Output error detection | Yes for SNUs          | Yes for SNUs\*        | No                  | No                  | No                  | Fully yes for SNUs + DNUs; partially yes for TNUs |
| # of transistors       | 21                    | 31                    | 20                  | 36                  | 28                  | 34                                               |
| Clock-to-Q delay (ps)  | 38.61                 | 46.51                 | 109.07              | 23.93               | 81.26               | 20.41                                            |
| Data-to-Q delay (ps)   | 31.15                 | 44.52                 | 95.09               | 18.70               | 77.89               | 10.95                                            |
| Setup time (ps)        | 27.51                 | 31.30                 | 29.02               | 37.38               | 36.25               | 27.36                                            |
| Hold time (ps)         | 16.20                 | 42.71                 | 25.31               | 25.78               | 32.23               | 24.65                                            |
| Average power (nW)     | 253.36                | 900.05                | 163.64              | 365.95              | 448.65              | 264.61                                           |

\* Error may escape from detection in sPGTD due to the inserted delay elements.




Figure 5.10: Power-delay-product (PDP) at different process corners.

### 5.4 Chapter Conclusion

In this chapter, a power-efficient TDRHL latch design, which contains a baseline latch, an error recovery assistant logic and an output transition detector, was proposed. The proposed TDRHL latch can recover from any SNU or DNU and also provide architecture-level recovery capability through error detection, improving reliability over existing techniques. Compared with state-of-the-art soft-error-tolerant designs, up to 5.0X PDP improvement can be achieved. Moreover, even with its low-cost implementation, TDRHL is still able to partially detect a flipped output and flag it with a warning signal for untargeted TNUs, further increasing the potential reliability of critical designs.

# Chapter 6

# **Conclusions and Future Research**

This chapter summarizes the technical achievements of this dissertation. In addition, based on the achievements of this work, several future research directions that may contribute toward reliable IC designs are briefly outlined.

### 6.1 Technical Summary

To solve the radiation-induced soft error problem, three soft error hardened latch designs were proposed in this dissertation for reliability and energy-efficiency improvements, which can be viewed as i) SNU hardened design, ii) SNU hardened and detection-based design, and iii) MNU hardened and detection-based design, respectively.

The proposed SHC latch design presented in Chapter 3 is an SNU-hardened latch based on the utilization of a Schmitt-trigger-based C-element. In addition to SNU tolerance, up to 82.96% power savings can be achieved compared with the existing soft-error-tolerant HiPeR design.

The EDSL design presented in Chapter 4 was proposed for reliability improvement with error detection ability; it can recover from any SNU while still providing in-situ error detection when the latch output is flipped. As a result, up to 72.25% and 79.74% PDP improvements can be obtained compared with the existing detection-based and SNU-hardened designs, respectively.

To further improve reliability against MNUs, Chapter 5 presented the TDRHL design, which improves the power efficiency and reliability of critical designs against both single- and multiple-node upsets. Among the compared designs, TDRHL is the only latch that can both recover from any SNU or DNU and provide architecture-level resiliency. Moreover, up to 5.0X PDP improvement could be obtained compared with existing works.

### 6.2 Future Research

As process technology continues scaling down, radiation-induced soft errors are becoming one of the most critical concerns in state-of-the-art IC designs. Due to the transient property of soft errors, architecture-level radiation-hardened design techniques should be developed to guarantee system reliability. Therefore, the work presented in this dissertation can be further studied to improve the proposed latch designs for architecture-level protection of critical systems. On the other hand, during the development of this dissertation, it was observed that more and more research on next-generation non-volatile memories (NVMs) such as phase change memory (PCM) and spin transfer torque RAM (STT-RAM) has been published recently, and these memories are expected to gain increasing attention in the coming intelligent era. Therefore, another future research direction involves developing architecture-level soft-error-tolerant NVMs to improve the reliability of future memory systems. The work described in Chapters 4 and 5 of this dissertation will serve as the starting point, extending the latch design techniques to memory cores.

# Bibliography

- [1] H. Li, Q. Liu, and J. Zhang, "A survey of hardware Trojan threat and defense," *Integration*, vol. 55, pp. 426–437, Sep. 2016.
- [2] K. Hasegawa, Y. Shi, and N. Togawa, "Hardware Trojan detection utilizing machine learning approaches," *17th IEEE Int. Conf. on Trust, Security and Privacy in Computing and Communications / 12th IEEE Int. Conf. on Big Data Science and Engineering*, pp. 1891–1896, Aug. 2018.
- [3] K. Kobayashi, "How to mitigate reliability-related issues on nano-scaled LSIs," *IEICE Technical Report*, vol. 71, no. 112, pp. 25–30, May 2012.
- [4] M. Yabuuchi, R. Kishida, and K. Kobayashi, "Correlations between BTI-induced degradations and process variations on ASICs and FPGAs," *IEICE Trans. on Fundamentals of Electronics Communications and Computer Sciences*, vol. E97-A, no. 12, pp. 2367–2372, 2014.
- [5] M.-Y. Kim, H. Lee, and C. Kim, "PVT variation tolerant current source with on-chip digital self-calibration," *IEEE Trans. on Very Large Scale Integr. Syst.*, vol. 20, no. 4, pp. 737–741, Apr. 2012.
- [6] T. Grasser, B. Kaczer, W. Goes, H. Reisinger, T. Aichinger, P. Hehenberger, P. J. Wagner, F. Schanovsky, M. T. L. J. Franco, and M. Nelhiebel, "The paradigm shift in understanding the bias temperature instability: from reaction-diffusion to switching oxide traps," *IEEE Trans. on Electron Devices*, vol. 58, no. 11, pp. 3652–3666, Dec. 2011.

- [7] V. Huard, C. Parthasarathy, C. Guerin, T. Valentin, M. M. E. Pion, N. Planes, and L. Camus, "NBTI degradation: from transistor to SRAM arrays," *Reliability Physics Symp.*, vol. 46, no. 1, pp. 289–300, Jan. 2008.
- [8] S. Almukhaizim, F. Shi, E. Love, and Y. Makris, "Soft-error tolerance and mitigation in asynchronous burst-mode circuits," *IEEE Trans. on Very Large Scale Integr. Syst.*, vol. 17, no. 7, pp. 869–882, Jul. 2009.
- [9] N. Seifert, P. Slankard, M. Kirsch, B. Narasimham, V. Zia, C. Brookreson, A. Vo, S. Mitra, B. Gill, and J. Maiz, "Radiation-induced soft error rates of advanced CMOS bulk devices," *IEEE Int. Reliability Physics Symp.*, vol. 4, pp. 217–225, Mar. 2006.
- [10] P. Hazucha, T. Karnik, J. Maiz, S. Walstra, B. Bloechel, J. Tschanz, G. Dermer, S. Hareland, P. Armstrong, and S. Borkar, "Neutron soft error rate measurements in a 90-nm CMOS process and scaling trends in SRAM from 0.25-μm to 90-nm generation," *IEEE Int. Electron Devices Meeting 2003*, pp. 21.5.1–21.5.4, Dec. 2003.
- [11] H. Liu, M. Cotter, S. Datta, and V. Narayanan, "Soft-error performance evaluation on emerging low power devices," *IEEE Trans. on Device and Materials Reliability*, vol. 14, no. 2, pp. 732–741, Jun. 2014.
- [12] R. Iyer and D. Rosetti, "A statical load dependency of CPU errors at SLAC," *25th Int. Symp. on Fault-Tolerant Computing*, p. 373, Jun. 1995.
- [13] Y. Arima, T. Yamashita, Y. Komatsu, T. Fujimoto, and K. Ishibashi, "Cosmic-ray immune latch circuit for 90nm technology and beyond," *IEEE Int. Solid-State Circuits Conf.*, pp. 492–493, Feb. 2004.
- [14] L. Freeman, "Critical charge calculations for a bipolar SRAM array," *IBM Journal of Research and Development*, vol. 40, no. 1, pp. 119–120, Jan. 1996.
- [15] A. Maheshwari, W. Burleson, and R. Tessier, "Trading off transient fault tolerance and power consumption in deep submicron (DSM) VLSI circuits," *IEEE Trans. on Very Large Scale Integr. Syst.*, vol. 12, no. 3, pp. 299–311, Mar. 2004.

- [16] N. Seifert, S. Jahinuzzaman, J. Velamala, R. Ascazubi, N. Patel, B. Gill, J. Basile, and J. Hicks, "Soft error rate improvements in 14-nm technology featuring second-generation 3D tri-gate transistors," *IEEE Trans. on Nuclear Science*, vol. 62, no. 6, pp. 2570–2577, Dec. 2015.
- [17] Y. Tosaka, H. Ehara, M. Igeta, T. Uemura, H. Oka, N. Matsuoka, and K. Hatanaka, "Comprehensive study of soft errors in advanced CMOS circuits with 90/130 nm technology," *IEEE Int. Electron Device Meeting Technical Digest*, pp. 38.3.1–38.3.4, Dec. 2004.
- [18] E. Ibe, C. Sung, W. ShiJie, H. Yamaguchi, Y. Yahagi, H. Kameyama, S. Yamamoto, and T. Akioka, "Spreading diversity in multi-cell neutron-induced upsets with device scaling," *Custom Integr. Circuits Conf.*, pp. 437–444, Sep. 2006.
- [19] W. Wu and N. Seifert, "MBU-Calc: A compact model for multi-bit upset (MBU) SER estimation," *IEEE Int. Reliability Physics Symp.*, pp. SE.1.1–SE.2.6, Apr. 2015.
- [20] E. Ibe, S. Chung, S. Wen, H. Yamaguchi, Y. Yahagi, H. Kameyama, S. Yamamoto, and T. Akioka, "Spreading diversity in multi-cell neutron-induced upsets with device scaling," *IEEE Custom Integr. Circuits Conf.*, Sep. 2006.
- [21] N. Seifert, V. Ambrose, B. Gill, Q. Shi, R. Allmon, C. Recchia, S. Mukherjee, N. Nassif, J. Krause, J. Pickholtz, and A. Balasubramanian, "On the radiation-induced soft error performance of hardened sequential elements in advanced bulk CMOS technologies," *IEEE Int. Reliability Physics Symp.*, pp. 3A.1.1–3A.1.10, May 2010.
- [22] K. Katsarou and Y. Tsiatouhas, "Double node charge sharing SEU tolerant latch design," *IEEE 20th Int. on-line Testing Symp.*, pp. 122–127, Jul. 2014.
- [23] T. Calin, M. Nicolaidis, and R. Velazco, "Upset hardened memory design for submicron CMOS technology," *IEEE Trans. on Nuclear Science*, vol. 43, no. 6, pp. 2874–2878, Dec. 1996.

- [24] A. Yan, Z. Huang, M. Yi, X. Xu, Y. Ouyang, and H. Liang, "Double-node-upset-resilient latch design for nanoscale CMOS technology," *IEEE Int. Test Conf.*, vol. 25, no. 6, pp. 1978–1982, Jun. 2017.
- [25] K. Kobayashi, K. Kubota, M. Masuda, Y. Manzawa, J. Furuta, S. Kanda, and H. Onodera, "A low-power and area-efficient radiation-hard redundant flip-flop, DICE ACFF, in a 65 nm Thin-BOX FD-SOI," *IEEE Trans. on Nuclear Science*, vol. 61, no. 4, pp. 1881–1888, Jun. 2014.
- [26] T. Uemura, Y. Tosaka, H. Matsuyama, K. Shono, C. J. Uchibori, K. Takahisa, M. Fukuda, and K. Hatanaka, "SEILA: Soft error immune latch for mitigating multi-node-SEU and local-clock-SET," *IEEE Int. Reliability Physics Symp.*, May 2010.
- [27] C. Aishwarya and R. Vijayabhasker, "A low power delay product SEU tolerant ISO-DICE latch circuit design," *Int. Archive of Applied Sciences and Tech.*, vol. 5, no. 2, pp. 24–30, Jun. 2014.
- [28] D. Mavis and P. Eaton, "Soft error rate mitigation techniques for modern microcircuits," *IEEE Int. Reliability Physics Symp.*, pp. 216–225, Apr. 2002.
- [29] R. E. Lyons and W. Vanderkulk, "The use of triple-modular redundancy to improve computer reliability," *IBM Journal of Research and Development*, vol. 6, no. 2, pp. 200–209, Apr. 1962.
- [30] C. Ramamurthy, S. Chellappa, V. Vashishtha, A. Gogulamudi, and L. T. Clark, "High performance low power pulse-clocked TMR circuits for soft-error hardness," *IEEE Trans. on Nuclear Science*, vol. 62, no. 6, pp. 3040–3048, Dec. 2015.
- [31] R. C. Lacoe, "Improving integrated circuit performance through the application of hardness-by-design methodology," *IEEE Trans. on Nuclear Science*, vol. 55, no. 4, pp. 1903–1925, Sep. 2008.
- [32] Y. Komatsu, Y. Arima, T. Fujimoto, T. Yamashita, and K. Ishibashi, "A soft-error hardened latch scheme for SoC in a 90 nm technology and beyond," *IEEE Custom Integr. Circuits Conf.*, pp. 329–332, Oct. 2004.

- [33] S. Tajima, N. Togawa, M. Yanagisawa, and Y. Shi, "Soft error tolerant latch designs with low power consumption," *IEEE Int. Conf. on ASIC*, pp. 52–55, Oct. 2017.
- [34] S. Tajima, N. Togawa, M. Yanagisawa, and Y. Shi, "A low power soft error hardened latch with Schmitt-trigger-based C-element," *IEICE Trans. on Fundamentals of Electronics Communications and Computer Sciences*, vol. E101-A, no. 7, pp. 1025–1034, Jul. 2018.
- [35] X. She, N. Li, and J. Tong, "SEU tolerant latch based on error detection," *IEEE Trans. on Nuclear Science*, vol. 59, no. 1, pp. 211–214, Feb. 2012.
- [36] J. Wang and S. Wei, "Process/voltage/temperature-variation-aware design and comparative study of transition-detector-based error-detecting latches for timing-error-resilient pipelined systems," *IEEE Trans. on Very Large Scale Integr. Syst.*, vol. 25, no. 10, pp. 2893–2906, Oct. 2017.
- [37] S. Das, C. Tokunaga, S. Pant, W. Ma, S. Kalaiselvan, K. Lai, D. M. Bull, and D. T. Blaauw, "RazorII: In situ error detection and correction for PVT and SER tolerance," *IEEE J. Solid-State Circuits*, vol. 44, no. 1, pp. 32–48, Jan. 2009.
- [38] M. Omana, D. Rossi, and C. Metra, "Novel transient fault hardened static latch," *IEEE Int. Test Conf.*, pp. 886–892, Oct. 2003.
- [39] M. Fazeli, A. Patooghy, S. G. Miremadi, and A. Ejlali, "Feedback redundancy: A power-aware efficient SEU-tolerant latch design for deep sub-micron technologies," *IEEE/IFIP Int. Conf. on Dependable Syst. Networks*, pp. 276–285, Jul. 2007.
- [40] M. Omana, D. Rossi, and C. Metra, "High-performance robust latches," *IEEE Trans. on Computers*, vol. 59, no. 11, pp. 1455–1465, Nov. 2010.
- [41] A. Yan, H. Liang, Y. Lu, and Z. Huang, "A transient pulse dually filterable and online self-recoverable latch," *IEICE Electronics Express*, vol. 14, no. 2, pp. 20160911–20160911, Jul. 2017.
- [42] H. Li, L. Xiao, J. Li, and X. Cao, "High robust and low cost soft error hardened latch design for nanoscale CMOS technology," *IEEE Int. Conf. on Solid-State Integr. Circuit Technology*, pp. 1–3, Oct. 2018.
- [43] K. Namba, M. Sakata, and H. Ito, "Single event induced double node upset tolerant latch," *IEEE Int. Symp. on Defect and Fault Tolerance of VLSI Syst.*, pp. 280–288, Oct. 2010.
- [44] A. Watkins and S. Tragoudas, "Radiation hardened latch designs for double and triple node upsets," *IEEE Trans. on Emerging Topics in Computing*, p. 1, Nov. 2017.
- [45] B. I. Matush, T. J. Mozdzen, L. T. Clark, and J. E. Knudsen, "Area efficient temporally hardened by design flip-flop circuits," *IEEE Trans. on Nuclear Science*, vol. 57, no. 6, pp. 3588–3595, Dec. 2010.
- [46] V. Chandra and R. Aitken, "Impact of technology and voltage scaling on the soft error susceptibility in nanoscale CMOS," *IEEE Int. Symp. on Defect and Fault Tolerance of VLSI Syst.*, pp. 114–122, Oct. 2008.
- [47] W. Zhao and Y. Cao, "Predictive technology model for nano-CMOS design exploration," *Int. Conf. on Nano-Networks and Workshops*, Sep. 2006. [Online]. Available: http://doi.acm.org/10.1145/1229175.1229176

# **Publication List**

#### Journal Papers (Original Articles)

- S. Tajima, M. Yanagisawa and Y. Shi, "Transition detector-based radiation-hardened latch for both single- and multiple-node upsets," IEEE Trans. on Circuits and Syst. II: Express Briefs, pp. 1-5, Jul. 2019. (Early Access) DOI: 10.1109/TCSII.2019.2926498.
- S. Tajima, N. Togawa, M. Yanagisawa and Y. Shi, "A low power soft error hardened latch with Schmitt-trigger-based C-element," IEICE Trans. on Fundamentals, vol. E101-A, no. 7, pp. 1025-1034, Jul. 2018. DOI: 10.1587/transfun.E101.A.1025.

#### International Conferences (Invited Talks and Oral Presentations)

- S. Tajima, M. Yanagisawa and Y. Shi, "A power-efficient soft error hardened latch design with in-situ error detection capability," IEEE Asia Pacific Conf. on Postgraduate Research in Microelectronics and Electronics, pp. 53-56, Nov. 2019.
- S. Tajima, N. Togawa, M. Yanagisawa and Y. Shi, "Soft error tolerant latch designs with low power consumption," IEEE Int. Conf. on ASIC, pp. 52-55, Oct. 2017.

1. <u>S. Tajima</u>, N. Togawa, M. Yanagisawa and Y. Shi, "Design of a soft-error-tolerant SHC latch using a C-element," IEICE 30th Workshop on Circuits and Systems, pp. 214-219, May 2017 (in Japanese).

#### Awards

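1. FY2018 34th Telecommunications Advancement Foundation Award, Telecom System Technology Student Award
2. FY2018 8th VDEC Design Award, Idea Contest Category
3. FY2018 IEICE 30th Workshop on Circuits and Systems, Encouragement Award

#### Research Funding and Grants

1. FY2018 Waseda University Research Institute for Science and Engineering and Kioxia (formerly Toshiba Memory) Young Researcher Encouragement Research Grant
2. FY2018 Waseda University Research Institute for Science and Engineering, Early Bird Program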