- [4] Y. Wang, H. Ahn, U. Bhattacharya, T. Coan, F. Hamzaoglu, W. Hafez, C.-H. Jan, R. Kolar, S. Kulkarni, J. Lin, Y. Ng, I. Post, L. Wel, Y. Zhang, K. Zhang, and M. Bohr, "A 1.1 GHz 12 μA/Mb-leakage SRAM design in 65 nm ultra-low-power CMOS with integrated leakage reduction for mobile applications," in *Proc. ISSCC*, Feb. 2007, pp. 324–606.
- [5] Y. Cao, T. Sato, M. Orshansky, D. Sylvester, and C. Hu, "New paradigm of predictive MOSFET and interconnect modeling for early circuit simulation," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, 2000, pp. 201–204.
- [6] J. Wang and B. Calhoun, "Canary replica feedback for near-DRV standby $V_{\rm DD}$ scaling in a 90 nm SRAM," in *Proc. CICC*, 2007, pp. 29–32.
- [7] J. Wang and B. H. Calhoun, "Techniques to extend canary-based standby $V_{\rm DD}$ scaling for SRAMs to 45 nm and beyond," *IEEE J. Solid-State Circuits*, vol. 43, no. 11, pp. 2514–2523, Nov. 2008.
- [8] H. Qin, A. Kumar, K. Ramchandran, J. Rabaey, and P. Ishwar, "Error-tolerant SRAM design for ultra-low power standby operation," in *Proc. ISQED*, 2008, pp. 30–34.
- [9] J. Lohstroh, "Static and dynamic noise margins of logic circuits," *IEEE J. Solid-State Circuits*, vol. 14, no. 3, pp. 591–598, Mar. 1979.

# Design of Sequential Elements for Low Power Clocking System

Peiyi Zhao, Jason McNeely, Weidong Kuang, Nan Wang, and Zhongfeng Wang

Abstract—Power consumption is a major bottleneck of system performance and is listed as one of the top three challenges in the International Technology Roadmap for Semiconductors (ITRS) 2008. In practice, a large portion of the on-chip power is consumed by the clock system, which is made up of the clock distribution network and flip-flops. In this paper, various design techniques for a low power clocking system are surveyed. Among them, an effective approach is to reduce the capacitance of the clock load by minimizing the number of clocked transistors. To this end, we propose a novel clocked-pair-shared flip-flop which reduces the number of local clocked transistors by approximately 40%. A 24% reduction of clock driving power is achieved. In addition, low swing and double edge clocking can be easily incorporated into the new flip-flop to build clocking systems.

Index Terms—Flip-flop, low power.

## I. INTRODUCTION

System-on-chip (SoC) designs integrate hundreds of millions of transistors on one chip, whereas packaging and cooling have only a limited ability to remove the excess heat. As a result, power consumption has become the bottleneck in achieving high performance, and it is listed as one of the top three challenges in ITRS 2008. The clock system, which consists of the clock distribution network and sequential elements (flip-flops and latches), is one of the most power consuming components in a VLSI system [1], [2]. It accounts for 30% to 60% of the total power dissipation in a system [1]. Consequently, reducing the power consumed by flip-flops will have a deep impact on the total power consumed. A large portion of the on-chip power is consumed by the clock drivers, so care must be taken to reduce the clock load when designing a clocking system.

Manuscript received July 25, 2009; revised November 24, 2009. First published January 19, 2010; current version published April 27, 2011.

P. Zhao is with the Integrated Circuit Design and Embedded System Laboratory, School of Computational Science, Schmid College of Science, Chapman University, Orange, CA 92604 USA (e-mail: zhao@chapman.edu).

J. McNeely is with The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504 USA.

N. Wang was with The Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504 USA. He is now with the Department of Electrical and Computer Engineering, West Virginia University, Montgomery, WV 25136 USA.

W. D. Kuang is with the Department of Electrical Engineering, Pan American University, Edinburg, TX 78539 USA.

Z. Wang is with Broadcom Corporation, Irvine, CA 92602 USA (e-mail: zfwang@broadcom.com).

Digital Object Identifier 10.1109/TVLSI.2009.2038705

There is a wide selection of flip-flops in the literature [1]–[18]. Many contemporary microprocessors selectively use master-slave and pulse-triggered flip-flops [2]. Traditional master-slave single-edge flip-flops, for example, the transmission-gate flip-flop [3], are made up of two stages, one master and one slave. Another edge-triggered flip-flop is the sense amplifier-based flip-flop (SAFF) [4]. All of these hard-edged flip-flops are characterized by a positive setup time, which causes large D-to-Q delays. Alternatively, pulse-triggered flip-flops reduce the two stages to one stage and are characterized by the soft edge property. On the Itanium 2 processor, 95% of all static timing latching uses pulsed clocking [5]. Pulse-triggered flip-flops can be classified into two types, implicit-pulsed and explicit-pulsed, for example, the implicit pulse-triggered data-close-to-output flip-flop (ip-DCO) [6] and the explicit pulse-triggered data-close-to-output flip-flop (ep-DCO) [6].

This paper surveys various low power techniques for the clocking system in Section II. We then elaborate on reducing the clock capacity to achieve low power in Section III, and propose a novel clocked-pair-shared flip-flop in Section IV. Section V presents simulation results. Section VI concludes this paper.

## II. SURVEY OF LOW POWER DESIGN OF A CLOCKING SYSTEM

Power consumption is determined by several factors including frequency $f$, supply voltage $V$, data activity $\alpha$, capacitance $C$, leakage, and short circuit current:

$$P = P_{\rm dynamic} + P_{\rm short\ circuit} + P_{\rm leakage}. \tag{1}$$

In the above equation, the dynamic power $P_{\rm dynamic}$ is also called the switching power, $P_{\rm dynamic} = \alpha C V^2 f$.

$P_{\rm short\ circuit}$ is the short circuit power, which is caused by the finite rise and fall times of input signals: for a short while, both the pull-up network and the pull-down network are ON. $P_{\rm short\ circuit} = I_{\rm short\ circuit} V_{dd}$.

$P_{\rm leakage}$ is the leakage power. As the supply voltage scales down, the threshold voltage also decreases to maintain performance. However, this leads to exponential growth of the subthreshold leakage current, which is now the dominant leakage component. $P_{\rm leakage} = I_{\rm leakage} V_{dd}$.
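As a rough illustration of how the terms in (1) combine, the following minimal sketch evaluates the three components numerically. The function names and all numeric values (a 10 fF clocked load, 1 µA short circuit and leakage currents) are assumptions chosen for illustration; only the formulas come from the text.

```python
def dynamic_power(alpha, c_farads, vdd, f_hz):
    """Switching power: alpha * C * Vdd^2 * f (result in watts)."""
    return alpha * c_farads * vdd ** 2 * f_hz


def total_power(alpha, c_farads, vdd, f_hz, i_short=0.0, i_leak=0.0):
    """Total power as in (1): dynamic + short circuit + leakage."""
    p_dynamic = dynamic_power(alpha, c_farads, vdd, f_hz)
    p_short = i_short * vdd   # average short circuit current times Vdd
    p_leak = i_leak * vdd     # leakage current times Vdd
    return p_dynamic + p_short + p_leak


# Hypothetical clocked node: activity 1.0, 10 fF load, 1.8 V supply, 250 MHz.
p_clk = dynamic_power(alpha=1.0, c_farads=10e-15, vdd=1.8, f_hz=250e6)
p_all = total_power(1.0, 10e-15, 1.8, 250e6, i_short=1e-6, i_leak=1e-6)
print(f"dynamic: {p_clk * 1e6:.2f} uW, total: {p_all * 1e6:.2f} uW")
```

Because a clocked node switches every cycle, its activity factor is set to 1.0 here, which is why the clocked load dominates in this toy calculation.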

Based on these factors, there are various ways to lower the power consumption, as follows.

- 1) Double Edge Triggering: Using half the clock frequency on the clock distribution network saves approximately half of the power consumed there. However, the flip-flop must be able to trigger on both clock edges. For example, the clock branch shared implicit pulsed flip-flop (CBS-ip DEFF) [7] is a double edge triggered flip-flop. Double edge triggering reduces the power by decreasing the frequency *f* in the dynamic power expression.
- 2) Using a low swing voltage on the clock distribution network can reduce the clocking power consumption, since power is a quadratic function of voltage. To use low swing clock distribution, the flip-flop should be a low swing flip-flop. For example, the low swing double-edge flip-flop (LSDFF) [8] is a low swing flip-flop. In addition, the level converter flip-flop is a natural candidate for use in a low swing environment. For example, CD-LCFF-ip [9] could be used as a low swing flip-flop, since incoming signals only drive nMOS transistors. The low swing method reduces the power consumption by decreasing the voltage in the dynamic power expression.
- 3) There are two ways to reduce the switching activity: conditional operation, which eliminates redundant data switching (conditional discharge flip-flop (CDFF) [10], conditional capture flip-flop (CCFF) [11]), or clock gating.
  - a) Conditional Operation: For dynamic flip-flops, such as the hybrid latch flip-flop (HLFF) [12] and the semidynamic flip-flop (SDFF) [13], there are redundant switching activities at the internal node. When the input stays at logic one, the internal node keeps charging and discharging without performing any useful computation. The conditional operation technique is needed to control this redundant switching. For example, in CDFF, a feedback transistor is inserted in the discharging path of the first stage, and it turns off the discharging path when D stays at 1, so the internal node is not discharged at every clock cycle. CCFF uses a clocked NOR gate to control an nMOS transistor in the discharging path when Q stays at 1. The redundant switching activity is removed in both cases. This reduces the power consumption by decreasing the data activity in the dynamic power expression.
  - b) Clock Gating: When a certain block is idle, we can disable the clock signal to that block to save power. A gated master-slave flip-flop was proposed in [14]. Both conditional operation and clock gating reduce power by decreasing switching activity.
- 4) Using dual-Vt/MTCMOS to reduce the leakage power in standby mode. With shrinking feature size, the leakage current increases rapidly. The MTCMOS technique [15], as well as transistor stacking, dynamic body biasing, and supply voltage ramping, could be used to reduce leakage standby power consumption [16]. A data retention flip-flop is proposed in [17].
- 5) Reducing Short Circuit Power: A split path can reduce the short circuit power, since the pMOS and nMOS transistors are driven by separate signals.
- 6) Reducing Capacity of Clock Load: 80% of nonclocked nodes have a switching activity of less than 0.1. This means that reducing the power of clocked nodes is important, since a clocked node has 100% activity. One effective way to design a low power clocking system is to reduce the clock capacity load by minimizing the number of clocked transistors. Any local clock load reduction will also decrease the global power consumption. This method reduces power by decreasing the clock capacity in the dynamic power expression. We elaborate on it in Section III.
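The techniques surveyed above each act on one factor of the dynamic power expression $\alpha C V^2 f$. A minimal sketch of that mapping follows; the scale factors are hypothetical round numbers, the idealized quadratic voltage dependence is taken from the text, and scaling capacitance by 4/7 assumes (for illustration only) that clock load is proportional to the number of clocked transistors.

```python
def relative_dynamic_power(alpha_scale=1.0, c_scale=1.0, v_scale=1.0, f_scale=1.0):
    """Dynamic power relative to a baseline, using P = alpha * C * V^2 * f."""
    return alpha_scale * c_scale * v_scale ** 2 * f_scale


# All scale factors are hypothetical and chosen only to illustrate the trend.
scenarios = {
    "baseline": relative_dynamic_power(),
    "double edge triggering (f halved)": relative_dynamic_power(f_scale=0.5),
    "low swing clock (V halved)": relative_dynamic_power(v_scale=0.5),
    "conditional operation (alpha halved)": relative_dynamic_power(alpha_scale=0.5),
    "fewer clocked transistors (C scaled by 4/7)": relative_dynamic_power(c_scale=4 / 7),
}

for name, rel in scenarios.items():
    print(f"{name}: {rel:.2f}x baseline")
```

The sketch ignores leakage and short circuit power, so it only reflects the first term of (1); items 4) and 5) in the list address the other two terms.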

## III. REDUCING CLOCK CAPACITY BY MINIMIZING THE NUMBER OF CLOCKED TRANSISTORS

A large part of the on-chip power is consumed by the clock drivers [18]. It is desirable to have less clocked load in the system. CDFF and CCFF in Section II both have many clocked transistors: CCFF uses 14 clocked transistors, and CDFF uses 15. In contrast, the conditional data mapping flip-flop (CDMFF, Fig. 1) [19] uses only seven clocked transistors, about a 50% reduction in the number of clocked transistors; hence CDMFF uses less power than CCFF and CDFF. (Note that CDFF used double edge clocking. For simplicity, we did not include the power savings of double edge triggering on the clock distribution network.) This shows the effectiveness of reducing the number of clocked transistors to achieve low power. Since CDMFF outperforms CCFF and CDFF in terms of power consumption [19], we do not discuss CCFF or CDFF further in this paper.



Fig. 1. CDMFF.

However, there is redundant clocking capacitance in CDMFF. When the data remains at 0 or 1, the precharging transistors, P1 and P2, keep switching without useful computation, resulting in redundant clocking. Clearly, it is necessary to reduce this redundant power consumption. Further, CDMFF has a floating node on the critical path because its first stage is dynamic. When the clock signal CLK transitions from 0 to 1, CLKDB stays at 1 for a short while, which produces an implicit pulse window for evaluation. During that window, both P1 and P2 are off. In addition, if D transitions from 0 to 1, the pull-down network is disconnected by N3 using the data mapping scheme (N6 turns off N3); if D is 0, the pull-down network is disconnected from GND too. Hence, internal node X is connected to neither Vdd nor GND during most pulse windows, so it is essentially floating periodically. As feature sizes shrink, such an undriven dynamic node becomes more prone to noise. If nearby noise discharges node X, pMOS transistor P3 will be partially on, and a glitch will appear on output node Q. In a nanoscale circuit, a glitch not only consumes power but could also propagate to the next stage, which makes the system more vulnerable to noise. Hence, CDMFF cannot be used in a noise intensive environment. Unlike CDMFF, other dynamic flip-flops employ structures that prevent a floating node. For example, SDFF [13] has a keeper at node X, while HLFF [12] and CCFF [11] each have a transistor connecting the node to Vdd when D = 0. Both methods serve to increase the noise robustness of node X.

Finally, it is difficult to apply the low power techniques introduced in the previous section to CDMFF. For example, the clock structure with precharging transistors P1 and P2 in CDMFF makes it difficult to apply double edge triggering. Nor can CDMFF be used in a low swing clock environment. (Note that the incoming low swing clock signal cannot drive the pMOS transistors P1 and P2 in the high voltage block (VDDH), because the pMOS transistors will not be turned off by a low swing voltage, resulting in short circuit power consumption.)

## IV. PROPOSED CLOCKED-PAIR-SHARED IMPLICIT PULSED FLIP FLOP

CDFF and CCFF use many clocked transistors. CDMFF reduces the number of clocked transistors, but it has redundant clocking as well as a floating node. To ensure an efficient and robust implementation of a low power sequential element, we propose the clocked-pair-shared flip-flop (CPSFF, Fig. 2), which uses fewer clocked transistors than CDMFF and overcomes the floating node problem in CDMFF.

Fig. 2. Proposed clocked-pair shared flip-flop.

In the clocked-pair-shared flip-flop, the clocked pair (N3, N4) is shared by the first and second stages. An always-on pMOS, P1, is used to charge the internal node X, rather than the two clocked precharging transistors (P1, P2) in CDMFF. Further, the transistor N7 in the clocked inverter in CDMFF is removed. Compared with CDMFF, a total of three clocked transistors are removed, so the clock load seen by the clock driver is decreased, resulting in an efficient design. CPSFF uses four clocked transistors rather than the seven in CDMFF, an approximately 40% reduction in the number of clocked transistors.
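The clocked transistor counts quoted in Sections III and IV can be checked with a short calculation. The counts below are taken from the text; the helper name `percent_reduction` is introduced here for illustration only.

```python
# Clocked transistor counts as stated in the text.
clocked_transistors = {"CCFF": 14, "CDFF": 15, "CDMFF": 7, "CPSFF": 4}


def percent_reduction(old, new):
    """Percentage reduction when going from `old` to `new`."""
    return 100.0 * (old - new) / old


comparisons = [("CCFF", "CDMFF"), ("CDFF", "CDMFF"), ("CDMFF", "CPSFF")]
for reference, design in comparisons:
    pct = percent_reduction(clocked_transistors[reference],
                            clocked_transistors[design])
    print(f"{design} vs {reference}: {pct:.1f}% fewer clocked transistors")
```

Going from seven to four clocked transistors is a reduction of 3/7, or about 42.9%, which the text rounds to approximately 40%.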

Furthermore, the internal node X is connected to Vdd by the always-on P1, so X is not floating, which enhances the noise robustness of node X. This solves the floating node problem in CDMFF. The always-on P1 is a weak pMOS transistor (length = $3\lambda$). This scheme combines pseudo-nMOS [16] with a conditional mapping technique [19] in which a feedback signal, *comp*, controls nMOS N1. When input D stays at 1, Q = 1, N5 is on, and N1 shuts off to avoid redundant switching activity at node X as well as any short circuit current. pMOS P2 should pull Q up when D transitions to 1. The second nMOS branch (N2) is responsible for pulling down the output Q if D = 0 and Y = 1 when the clock pulse arrives. The pMOS in I1 should turn on nMOS N2 when D = 0.

Although P1 is always ON, a short circuit occurs only once, when D makes a $0 \rightarrow 1$ transition, and the discharge path is disconnected after two gate delays by *comp* (turning off N1). After that, if D remains at 1, the discharge path is already disconnected by N1, so there is no short circuit. The clocked-pseudo-nMOS scheme differs from conventional pseudo-nMOS logic in that we use clocked transistors in the pull-down branch. P1, N1, N3, and N4 should be properly sized to ensure a correct noise margin [20].

Several low power techniques in Section II can be easily incorporated into the new flip-flop. Unlike CDMFF, CPSFF can operate with a low swing clock, since the incoming low voltage clock does not drive pMOS transistors. Low swing clock signals can be connected to the nMOS transistors N3 and N4. In addition, it is easy to build a double edge triggered flip-flop based on the simple clocking structure in CPSFF. Further, CPSFF can be used directly as a level converter flip-flop, because the incoming clock and data signals drive only nMOS transistors.

Fig. 3. Setup used for the flip-flop simulations. Inputs are driven by inverters, and the output drives a capacitive load of 14 minimum-sized inverters (FO14).

## V. SIMULATION RESULTS

The simulation results were obtained from HSPICE simulations in 0.18- $\mu$ m CMOS technology at room temperature. VDD is 1.8 V. The parasitic capacitances were extracted from the layouts. The setup used in our simulations is shown in Fig. 3. In order to obtain accurate results, we have simulated the circuits in a real environment, where the flip-flop inputs (clock, data) are driven by the input buffers, and the output is required to drive an output load. An inverter is placed after output Q, providing protection from direct noise coupling [6]. The value of the capacitance load at node Qb is 21 fF, which is selected to simulate a fan out of 14 minimum sized inverters (FO14) [21]. Assuming uniform data distribution, we have supplied input D with 16-cycle pseudorandom input data with an activity factor of 18.75% to reflect the average power consumption. A clock frequency of 250 MHz is used.

Each design is simulated using the circuit at the layout level. All capacitances were extracted from the layout so that we can simulate the circuit more accurately. This is because the internal gate capacitance, parasitic capacitance, and wiring capacitance heavily affect the power consumption in deep submicrometer technology. Further, the delay strongly depends on these capacitances.

Circuits were optimized for power delay product (PDP). Delay is the data-to-output delay (D-to-Q delay), which is the sum of the setup time and the clock-to-output delay. The D-to-Q delay [22], [23] is obtained by sweeping the $0 \rightarrow 1$ and $1 \rightarrow 0$ data transition times with respect to the clock edge; the minimum data-to-output delay, corresponding to the optimum setup time, is recorded. This optimization methodology is similar to that in [6], [22]. Transistor widths are shown in Figs. 1 and 2.
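The delay measurement described above amounts to a one-dimensional search: for each data arrival time relative to the clock edge, add the setup time to the resulting clock-to-output delay and keep the minimum. A minimal sketch, assuming the sweep has already produced (setup time, clock-to-output delay) pairs; the sample values below are hypothetical and are not simulation results from this paper.

```python
def min_d_to_q(samples):
    """Return (optimum setup time, minimum D-to-Q delay), both in ps.

    `samples` is a list of (setup_ps, clk_to_q_ps) pairs, one per point
    of the data-arrival sweep. D-to-Q delay = setup time + clock-to-Q delay.
    """
    best_setup, best_clk_to_q = min(samples, key=lambda s: s[0] + s[1])
    return best_setup, best_setup + best_clk_to_q


# Hypothetical sweep: clock-to-Q delay grows as data arrives closer to the edge.
sweep = [(200, 250), (150, 255), (100, 270), (50, 340), (25, 450)]

setup_opt, d_to_q = min_d_to_q(sweep)
print(f"optimum setup time: {setup_opt} ps, minimum D-to-Q: {d_to_q} ps")
```

In practice the sweep would be run separately for the $0 \rightarrow 1$ and $1 \rightarrow 0$ data transitions, and the PDP would then be the product of the measured power and this delay.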

Table I shows a comparison of the flip-flop characteristics in terms of delay, total power and PDP as well as clock power, data driving power, latching power, number of transistors, number of clocked transistors, area, and the total transistor width.

The power consumed in the data and clock drivers is measured in our simulation. In this way, the load that the flip-flop imposes on the driving logic is included in the total power consumption. The clock power is the power consumed by the clocked transistors. It is a very important parameter, since it determines the potential power saving in the clock distribution network from reducing the clock load [22]. Fig. 4 shows the power breakdown chart. CPSFF uses three fewer clocked transistors, which is about a 40% reduction in the number of clocked transistors. It achieves 24% less clock driving power than CDMFF, which improves power efficiency considerably. CPSFF improves overall power consumption over CDMFF by about 9%.

CPSFF and CDMFF are simulated at different design corners, and CPSFF shows lower power consumption in all four corners, as shown in Fig. 5. Furthermore, Fig. 6 shows the power consumption comparison under different switching activities. The power improvement of CPSFF over CDMFF is larger when the switching activity is smaller. In view of the clocking load in the latch, the proposed clocked-pair-shared flip-flop is more efficient than other designs such as CCFF, CDFF, and CDMFF. It uses the fewest clocked transistors of any flip-flop in published papers so far.

TABLE I

COMPARING THE FLIP-FLOPS IN TERMS OF DELAY, POWER, AND POWER DELAY PRODUCT

| Design | Number of transistors | Transistors with 100% switching activity \*<sup>1</sup> | Area ($\lambda^2$) | Total transistor width (µm) | Low swing | Double edge | DQ delay (ps) \*<sup>2</sup> | Clock power (µW) | Data driving power (µW) | Latching power (µW) | Total power (µW) \*<sup>3</sup> | PDP (fJ) |
|--------|-----------------------|----------------------------------------------------------|--------------------|-----------------------------|-----------|-------------|------------------------------|------------------|-------------------------|---------------------|---------------------------------|----------|
| CDMFF  | 22                    | 7                                                        | 23407              | 23.2                        | N         | difficult   | 387                          | 6.25             | 0.42                    | 5.29                | 11.98                           | 4.63     |
| CPSFF  | 19                    | 4                                                        | 23144              | 21.1                        | Y         | easy        | 392                          | 4.74             | 0.47                    | 5.72                | 10.9                            | 4.28     |

\*<sup>1</sup> Includes clocked transistors that switch with the clock.

\*<sup>2</sup> Delay uses DQb.

\*<sup>3</sup> Note that total power = clock power + data driving power + latching power.

Fig. 4. Power breakdown.

Fig. 5. Power consumption at process corners.

Fig. 6. Power consumption under different switching activity.

In terms of PDP, an improvement of about 7.6% is achieved. Note that CPSFF has a slightly larger delay than CDMFF. Although there is contention between the always-on P1 and the pull-down path in the first stage, its negative effect on speed is alleviated by the reduced capacitive load on internal node X, where two precharging clocked transistors are removed. One thing to note is that pulsed flip-flops might need a larger hold time than conventional flip-flops.
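The headline percentages can be reproduced from the Table I entries. The sketch below uses only values printed in the table; the dictionary keys and helper names are introduced here for illustration. Since 1 µW × 1 ps = 1 aJ, the product is divided by 1000 to express the PDP in fJ.

```python
# Values copied from Table I (delay in ps, power in uW, PDP in fJ).
table = {
    "CDMFF": {"dq_ps": 387, "clock_uw": 6.25, "total_uw": 11.98, "pdp_fj": 4.63},
    "CPSFF": {"dq_ps": 392, "clock_uw": 4.74, "total_uw": 10.9, "pdp_fj": 4.28},
}


def percent_reduction(old, new):
    """Percentage reduction when going from `old` to `new`."""
    return 100.0 * (old - new) / old


def pdp_fj(total_uw, dq_ps):
    """Power delay product in fJ: uW * ps gives aJ, and 1000 aJ = 1 fJ."""
    return total_uw * dq_ps / 1000.0


ref, new = table["CDMFF"], table["CPSFF"]
clock_saving = percent_reduction(ref["clock_uw"], new["clock_uw"])
total_saving = percent_reduction(ref["total_uw"], new["total_uw"])
pdp_saving = percent_reduction(ref["pdp_fj"], new["pdp_fj"])

print(f"clock power reduction: {clock_saving:.1f}%")   # 24.2%
print(f"total power reduction: {total_saving:.1f}%")   # 9.0%
print(f"PDP improvement:       {pdp_saving:.1f}%")     # 7.6%
print(f"recomputed PDP: CDMFF {pdp_fj(ref['total_uw'], ref['dq_ps']):.2f} fJ, "
      f"CPSFF {pdp_fj(new['total_uw'], new['dq_ps']):.2f} fJ")
```

The recomputed PDP values agree with the tabulated ones to within rounding of the printed power figures.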

CDMFF suffers from the periodically floating node problem if it is in a noise sensitive environment. The clocked-pair-shared scheme resolves this issue effectively.

With feature size shrinking, the leakage current increases rapidly, and the MTCMOS technique could be used to reduce the leakage power consumption [15]. In addition, with technology scaling, process variation tolerant techniques such as combinations of adaptive body bias and adaptive VDD may be used to improve the functionality and performance of the die [24]. As CMOS technology continues scaling, integrated circuits become more susceptible to soft errors, so soft-error-tolerant techniques could be used [25].

## VI. CONCLUSION

In this paper, a variety of design techniques for low power clocking systems are reviewed. One effective method, reducing the capacity of the clock load by minimizing the number of clocked transistors, is elaborated. Following this approach, a novel CPSFF is proposed, which reduces the number of local clocked transistors by about 40%. In terms of clock driver power consumption, the new CPSFF outperforms prior art in flip-flop design by about 24%. Furthermore, several low power techniques, including low swing and double edge clocking, can be incorporated into the new flip-flop to build clocking systems.

## ACKNOWLEDGMENT

P. Zhao would like to thank Mr. J. Tschanz from Intel for his valuable technical help and also Dr. M. Fahy for his help.

## REFERENCES

- [1] H. Kawaguchi and T. Sakurai, "A reduced clock-swing flip-flop (RCSFF) for 63% power reduction," *IEEE J. Solid-State Circuits*, vol. 33, no. 5, pp. 807–811, May 1998.
- [2] A. Chandrakasan, W. Bowhill, and F. Fox, *Design of High-Performance Microprocessor Circuits*, 1st ed. Piscataway, NJ: IEEE Press, 2001.
- [3] G. Gerosa, "A 2.2 W, 80 MHz superscalar RISC microprocessor," *IEEE J. Solid-State Circuits*, vol. 29, no. 12, pp. 1440–1454, Dec. 1994.
- [4] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K. Chiu, and M. M. Leung, "Improved sense-amplifier-based flip-flop: Design and measurements," *IEEE J. Solid-State Circuits*, vol. 35, no. 6, pp. 876–883, Jun. 2000.

- [5] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Sullivan, and T. Grutkowski, "The implementation of the Itanium 2 microprocessor," *IEEE J. Solid-State Circuits*, vol. 37, no. 11, pp. 1448–1460, Nov. 2002.
- [6] J. Tschanz, S. Narendra, Z. P. Chen, S. Borkar, M. Sachdev, and V. De, "Comparative delay and energy of single edge-triggered & dual edge-triggered pulsed flip-flops for high-performance microprocessors," in *Proc. ISLPED*, Huntington Beach, CA, Aug. 2001, pp. 207–212.
- [7] P. Zhao, J. McNeely, P. Golconda, M. A. Bayoumi, W. D. Kuang, and B. Barcenas, "Low power clock branch sharing double-edge triggered flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 3, pp. 338–345, Mar. 2007.
- [8] C. L. Kim and S. Kang, "A low-swing clock double edge-triggered flip-flop," *IEEE J. Solid-State Circuits*, vol. 37, no. 5, pp. 648–652, May 2002.
- [9] P. Zhao, J. McNeely, S. Venigalla, G. P. Kumar, M. Bayoumi, N. Wang, and L. Downey, "Clocked-pseudo-NMOS flip-flops for level conversion in dual supply systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, to be published.
- [10] P. Zhao, T. Darwish, and M. Bayoumi, "High-performance and low-power conditional discharge flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 5, pp. 477–484, May 2004.
- [11] B. Kong, S. Kim, and Y. Jun, "Conditional-capture flip-flop for statistical power reduction," *IEEE J. Solid-State Circuits*, vol. 36, no. 8, pp. 1263–1271, Aug. 2001.
- [12] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D. Draper, "Flow-through latch and edge-triggered flip-flop hybrid elements," in *ISSCC Dig.*, Feb. 1996, pp. 138–139.
- [13] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A. Mehta, R. Heald, and G. Yee, "Semi-dynamic and dynamic flip-flops with embedded logic," in *Symp. VLSI Circuits, Dig. Tech. Papers*, Jun. 1998, pp. 108–109.
- [14] D. Markovic, B. Nikolic, and R. Brodersen, "Analysis and design of low-energy flip-flops," in *Proc. Int. Symp. Low Power Electron. Des.*, Huntington Beach, CA, Aug. 2001, pp. 52–55.
- [15] J. Tschanz, Y. Ye, L. Wei, V. Govindarajulu, N. Borkar, S. Burns, T. Karnik, S. Borkar, and V. De, "Design optimizations of a high performance microprocessor using combinations of dual-Vt allocation and transistor sizing," in *IEEE Symp. VLSI Circuits, Dig. Tech. Papers*, Jun. 2002, pp. 218–219.
- [16] J. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*. Englewood Cliffs, NJ: Prentice-Hall, 2003.
- [17] S. Shigematsu, S. Mutoh, Y. Matsuya, Y. Tanabe, and J. Yamada, "A 1-V high-speed MTCMOS circuit scheme for power-down application circuits," *IEEE J. Solid-State Circuits*, vol. 32, no. 6, pp. 861–869, Jun. 1997.
- [18] T. Sakurai, "Low-power CMOS design through Vth control and low-swing circuits," in *Proc. ISLPED*, 1997, pp. 1–6.
- [19] C. K. Teh, M. Hamada, T. Fujita, H. Hara, N. Ikumi, and Y. Oowaki, "Conditional data mapping flip-flops for low-power and high-performance systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 14, no. 12, pp. 1379–1383, Dec. 2006.
- [20] D. A. Hodges, H. G. Jackson, and R. A. Saleh, *Analysis and Design of Digital Integrated Circuits*, 3rd ed. New York: McGraw-Hill, 2004.
- [21] V. G. Oklobdzija, "Clocking in multi-GHz environment," in *Proc. 23rd IEEE Int. Conf. Microelectron.*, 2002, vol. 2, pp. 561–568.
- [22] V. Stojanovic and V. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low power system," *IEEE J. Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, Apr. 1999.
- [23] N. Weste and D. Harris, *CMOS VLSI Design*. Reading, MA: Addison Wesley, 2004.
- [24] J. Tschanz, K. Bowman, and V. De, "Variation-tolerant circuits: Circuit solutions and techniques," in *Proc. IEEE Symp. Des. Autom. Conf.*, Jun. 2005, pp. 762–763.
- [25] S. Lin, H. Z. Yang, and R. Luo, "High speed soft-error-tolerant latch and flip-flop design for multiple VDD circuit," in *Proc. IEEE Int. Comput. Soc. Annu. Symp. VLSI (ISVLSI)*, Mar. 2007, pp. 273–278.

# Placement for Immunity of Transient Faults in Cell-Based Design of Nanometer Circuits

Koustav Bhattacharya and Nagarajan Ranganathan

Abstract—The rate of soft errors has been increasing significantly due to the aggressive scaling trends in the nanometer regime. Several circuit optimization techniques have been proposed in the literature for preventing such transient faults; however, to the best of our knowledge, the reduction of the soft error rate at the layout level has not been attempted in logic circuits. In this work, we show that transient glitches due to cosmic strikes can be sufficiently reduced by intelligently modifying the placement stage in cell-based designs to selectively assign larger wirelengths to certain critical nets. Towards this, we propose a computationally efficient placement algorithm based on quadratic programming that significantly reduces the soft error rates of logic circuits. The algorithm tries to assign higher wirelengths to nets with low glitch masking probabilities for a greater reduction in soft error rate (SER), while maintaining a low delay and area penalty for the overall circuit. Experimental results on the ISCAS'85 benchmark circuits indicate that such a placement algorithm can significantly improve the soft error immunity of logic circuits without much delay and area overhead.

*Index Terms*—Cell placement, quadratic programming, soft errors, transient faults.

## I. INTRODUCTION

Aggressive scaling trends have significantly impacted the susceptibility of nanometer designs to transient faults. Transient faults occur for several reasons, such as soft errors, power supply and interconnect noise, and electromagnetic interference. Soft errors occur when energetic neutrons coming from space or alpha particles arising from packaging materials hit the transistors. The primary sources of soft errors are: 1) alpha particle emission from chip packaging materials; 2) cosmic rays from outer space creating energetic neutrons and protons; and 3) the generation of thermal neutrons. The magnitude of the glitch generated by a radiation strike is determined by the impinging neutron flux, its duration, and its density. The neutron flux depends on the altitude and on various other environmental factors. A voltage glitch appears at a circuit node due to strikes of energetic neutrons. Such a glitch is referred to as a single event transient (SET). The voltage glitch, if it appears on a feedback node, may change the states of the memory bits. The glitch may also appear at the internal nodes of combinational logic and may finally propagate to the register boundaries and create a soft error. Although soft errors have been a greater concern for memory elements, technology trends such as smaller feature sizes, lower voltage levels, higher operating frequencies, and reduced logic depth are projected to increase the soft error rate (SER) in combinational logic beyond that of unprotected memory elements [5], [8]. Several approaches have been proposed in the literature to protect logic circuits against soft errors [3], [6], [8], [10]. However, to the best of our knowledge, the reduction of the soft error rate at the layout level has never been attempted previously. Towards this, we have developed a new placement algorithm for radiation immunity of logic circuits using a standard cell-based design flow.

Manuscript received May 17, 2009; revised September 17, 2009 and November 30, 2009. First published February 22, 2010; current version published April 27, 2011. This work was supported in part by a grant from the Semiconductor Research Corporation (SRC) under Contract 2007-HJ-1596.

The authors are with the Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620 USA (e-mail: kbhattac@cse.usf.edu; ranganat@cse.usf.edu).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TVLSI.2010.2040295