# Low-Power Clocked-Pseudo-NMOS Flip-Flop for Level Conversion in Dual Supply Systems

Peiyi Zhao, *Member, IEEE*, Jason B. McNeely, *Student Member, IEEE*, Pradeep K. Golconda, Soujanya Venigalla, Nan Wang, Magdy A. Bayoumi, *Fellow, IEEE*, Weidong Kuang, and Luke Downey, *Student Member, IEEE* 

*Abstract*—Clustered voltage scaling (CVS) is an effective way to decrease power dissipation. One of the design challenges is the design of an efficient level converter with low power and delay overheads. In this paper, level-shifting flip-flop topologies are investigated. Different level-shifting schemes are analyzed and classified into three groups: differential style, n-type metal–oxide–semiconductor (NMOS) pass-transistor style, and precharged style. An efficient level-shifting scheme, the clocked-pseudo-NMOS (CPN) level conversion scheme, is presented. A novel level conversion flip-flop (CPN-LCFF) is proposed, which combines the conditional discharge technique and the pseudo-NMOS technique. In terms of power and delay, the new CPN-LCFF outperforms a previous LCFF by over 8% and 15.6%, respectively.

Index Terms-Dual supply, flip-flop, level conversion, low power.

Manuscript received September 15, 2007; revised April 11, 2008. First published March 10, 2009; current version published August 19, 2009. This work was supported by the grant from Broadcom, Inc. and Emulex, Inc., the U.S. Department of Energy (DoE), EETAPP program, DE97ER12220, and the Governor's Information Technology Initiative.

P. Zhao and L. Downey are with the Integrated Circuit Design and Embedded System Lab, Math and Computer Science Department, Chapman University, Orange, CA 92604 USA (e-mail: zhao@chapman.edu).

J. B. McNeely and M. A. Bayoumi are with the Center for Advanced Computer Studies, University of Louisiana at Lafayette, Lafayette, LA 70504 USA (e-mail: jbm8240@cacs.louisiana.edu; mab@cacs.louisiana.edu).

N. Wang is with the Department of Electrical and Computer Engineering, Institute of Technology, West Virginia University, Montgomery, WV 25136 USA.

P. K. Golconda and S. Venigalla are with Intel Corporation, Folsom, CA 95630 USA.

W. Kuang is with the Department of Electrical Engineering, Pan American University, Edinburg, TX 78539 USA.

Digital Object Identifier 10.1109/TVLSI.2008.2002426

# I. INTRODUCTION

The system-on-chip (SoC) design will integrate hundreds of millions of transistors on one chip, whereas packaging and cooling have only a limited ability to remove the excess heat. As a result, power consumption is one of the main problems in achieving high-performance design. Because power consumption depends quadratically on supply voltage, reducing the supply voltage is a very effective way to decrease power dissipation. A clustered voltage scaling (CVS) scheme was developed in [1]. In the CVS scheme, speed-insensitive gates on noncritical paths are supplied by a low supply voltage (VDDL), while speed-sensitive paths use a high supply voltage (VDDH), so the power consumption of the whole system can be reduced without degrading performance. To implement the CVS scheme in a chip, a level converter must be used wherever a gate supplied by VDDL connects to a gate supplied by VDDH. The reason is that the data or clock from a low-supply-voltage block cannot connect directly to a p-type metal-oxide-semiconductor (PMOS) transistor in a VDDH block, since the PMOS cannot be shut off with the low supply voltage VDDL. Note that a dual-VDD system carries overheads: it needs an extra power supply line for VDDL, which costs area, in addition to the delay and power penalty of the level converters. One of the main challenges in a CVS system is to design level converters with less power and latency overhead [2] to interface low-voltage blocks with high-voltage blocks.
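As a rough illustration of the quadratic dependence mentioned above, the standard first-order expression for dynamic switching power can be evaluated for the supply voltages used later in Section IV (VDDH = 1.8 V, VDDL = 1.25 V). The symbols $\alpha$ (activity factor), $C_L$ (switched capacitance), and $f$ (clock frequency) are notation assumed here rather than taken from the paper, and the estimate ignores leakage and short-circuit power.

$$
P_{\mathrm{dyn}} = \alpha\, C_L\, V_{DD}^{2}\, f
\quad\Longrightarrow\quad
\frac{P_{\mathrm{dyn}}(V_{DDL})}{P_{\mathrm{dyn}}(V_{DDH})}
= \left(\frac{V_{DDL}}{V_{DDH}}\right)^{2}
= \left(\frac{1.25}{1.8}\right)^{2} \approx 0.48
$$

Under these assumptions, a gate moved from VDDH to VDDL would dissipate roughly half the dynamic power for the same activity, load, and frequency.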

Different level converters have been published [3], [4]. To alleviate the delay overhead of an inserted level converter, integrating the level conversion into the flip-flop is a good choice, which results in the level conversion flip-flop (LCFF). LCFF designs appeared in [1], [5]–[8]. This paper surveys various level-shifting schemes in LCFFs and classifies them into three types: differential style, n-type metal–oxide–semiconductor (NMOS) pass-transistor style, and precharged style. We also propose a novel LCFF design with lower power consumption overhead. This paper is organized as follows. Section II reviews the published LCFFs. Section III introduces the proposed level-shifting scheme with the new CPN-LCFF. Section IV presents the simulation results, and Section V concludes the paper.

# II. LCFF SURVEY

## A. Differential Level-Shifting Scheme

One type of LCFF uses a differential level conversion structure in which the inputs of a differential cascode voltage switch logic (DCVSL) circuit [9] do not connect to any PMOS device, as can be seen in Fig. 1(a), where the low-voltage inputs do not drive PMOS P1 and P2.

One master–slave LCFF, the slave latch level-shifting (SLLS) flip-flop [Fig. 1(b)], was proposed in [1]. (Devices and signals in dotted-line boxes use VDDL; the same holds in the other figures.) SLLS uses a DCVSL-style level conversion scheme in the slave latch. The clock and data signals inside the dotted line use the low voltage VDDL and do not connect to the PMOS devices directly, which makes the circuit suitable as a level converter.

Fig. 1. (a) DCVSL level conversion scheme. (b) SLLS (devices in dotted-line boxes use VDDL; the same holds in the other figures).

However, the SLLS flip-flop has drawbacks. There is a relatively large crossover current at the internal nodes, causing large delay and power consumption [10]. The contention is aggravated when the clock and input are low swing: the low voltage reduces the NMOS transistors' ability to pull down the internal node, so the cross-coupled fight between the PMOS pull-up devices and the NMOS pull-down devices becomes worse [4]. This makes it difficult for the circuit to switch its logic state at transition time, and the delay is therefore larger. Moreover, SLLS has many gates on its critical path.

Another flip-flop using the differential level-shifting scheme is the clock level shifted sense amplifier (CSSA) flip-flop [1]. It consists of a sense amplifier latch [11] and a set–reset latch. It suffers from large crossover contention that costs power and delay, particularly when the clock is low swing. Moreover, CSSA uses the dynamic precharge style: even if *D* remains stable, one of the internal nodes is charged and discharged every clock cycle, so there is redundant internal switching that adds a further power penalty. An alternative LCFF from [7], the pulsed sense amplifier (PSA), uses a similar differential level-shifting scheme.

## B. NMOS Pass-Transistor Level-Shifting Scheme

Another level-shifting scheme is the NMOS pass-transistor level-shifting scheme (Fig. 2), where one end of NMOS transistor N1 connects to the low-voltage input signal, and the level-shift point "*sf*" is lifted to (VDDL - Vthn) through N1, where Vthn is the NMOS threshold voltage. Keeper I2 then pulls "*sf*" up to VDDH. One NMOS transistor (N1) and one inverter (I2) thus implement the level shifting.

Fig. 2. NMOS level-shifting scheme.

Fig. 3. Pulsed half-latch (PHL).

The pulsed half-latch (PHL) LCFF, proposed in [7] (Fig. 3; inverters drawn in dark use VDDL, as in the other figures), uses the NMOS pass-transistor level-shifting scheme. However, the data-driving inverter *I*1 works at the low voltage VDDL. The keeper *I*2, which works at the high voltage, fights with *I*1 during level shifting, so *I*2 cannot be too strong. In addition, there is a threshold voltage drop across the pass transistor N1: the voltage at node "*sf*" must be restored from (VDDL - Vthn) to VDDH when D = 0, and the difference VDDH - (VDDL - Vthn) is quite large. Together, these factors slow the switching significantly. The two series NMOS transistors N2 and N3 must be strong enough to pull Q down quickly to help lift "*sf*," but doing so takes a two-gate delay.
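To give a feel for the size of the voltage gap described above, the following worked example uses the supply voltages from Section IV together with an assumed NMOS threshold voltage of $V_{thn} = 0.45$ V. That threshold is purely illustrative and is not a value reported in the paper; body effect in the pass transistor would raise the effective threshold further.

$$
V_{sf} = V_{DDL} - V_{thn} = 1.25 - 0.45 = 0.80\ \text{V},
\qquad
V_{DDH} - V_{sf} = 1.8 - 0.80 = 1.00\ \text{V}
$$

Under this assumption, the keeper would have to restore more than half of the VDDH swing at node "*sf*", which is consistent with the observation that the threshold drop weighs heavily on switching speed.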

One master-slave LCFF, the master-slave half latch (MSHL), is also proposed in [7] and likewise uses the NMOS pass-transistor level-shifting scheme. It shares the drawback of a threshold voltage drop on node "sf," which considerably reduces speed. Furthermore, it has one more gate in VDDL, resulting in a larger delay and slightly higher power consumption than PHL [7].

## C. Precharged Level-Shifting Schemes

Unlike the differential and pass-transistor level-shifting schemes, several LCFFs achieve level conversion by precharging the circuit [Fig. 4(a)]. In this scheme, the precharging device keeps switching.

An elegant design, the pulse precharged (PPR) LCFF, is proposed in [7] [see Fig. 4(b)]. A low-swing clock signal drives the gate of NMOS transistor N1, which is connected to VDDH; when N1 turns on, it lifts the voltage of node X to (VDDL - Vthn), and PMOS transistor P1 in the clocked keeper then pulls node X up to VDDH. However, the clocked transistors used for level conversion, N1 and N2, keep switching even when the input D remains stable. Since this portion of the power does not contribute to necessary level conversion, it is a redundant power overhead.

Fig. 4. (a) Precharged level-shifting scheme (the precharging device keeps switching at different times). (b) PPR (total number of transistors: 31; clocked transistors: 13).

Another elegant level-shifting scheme is the self-precharged level-shifting scheme [6] (see Fig. 5), where the gate of PMOS P1 connects to a high-voltage signal from a NOR gate, and the last two inverters (I1, I2) at the output node are employed to enable the self-precharging. However, these two inverters cause delay and power consumption overhead [7].

Fig. 5. SPFF (total number of transistors: 32; clocked transistors: 11).

# III. PROPOSED CLOCKED-PSEUDO-NMOS LCFF

The differential level-shifting scheme normally has large delay and power overhead due to crossover contention, and SLLS has larger delay and power consumption than PHL. CSSA consumes dynamic power in addition to suffering from the crossover contention problem. MSHL has a larger power-delay product (PDP) than PHL because its long critical path (five gates) results in large delay, and it dissipates more power than PHL [7]. PPR dissipates more energy than PHL, thus losing its advantage in CVS systems [7]. SPFF has the overhead of the last two inverters in the critical path [7], as well as eight more transistors than PHL (a 33% increase in the total number of transistors), and it consumes more power than PHL [22]. In terms of power consumption, PHL is the most efficient design among the LCFFs reviewed, which also include SLLS, CSSA, MSHL, PPR, and SPFF. We will not discuss SLLS, CSSA, MSHL, PPR, and SPFF further in this paper because of their higher power consumption relative to PHL.

Balanced reduction of both power and delay of an LCFF is the main method to reach improved power savings in a CVS system. PHL is the best example of this [7] in comparison with other previous designs such as PPR. However, PHL has a threshold drop problem aggravated by the low voltage of the input, and it has an explicit pulse generator, which normally consumes more power. Furthermore, it has four gates on the critical path.

To further improve power, the clocked-pseudo-NMOS (CPN) level-shifting scheme is proposed [Fig. 6(a)]. In this level-shifting scheme, the PMOS (P1) is always ON. The scheme combines pseudo-NMOS [10] with the conditional discharge technique [13], where a feedback signal Q\_fdbk controls NMOS N5. When input D stays high, N5 shuts off to avoid unnecessary short-circuit current as well as redundant switching activity at node X. The low-swing signals, namely the input signal (D) and the clock signal (CLK\_pulse), are connected to the NMOS transistors N1 and N3, respectively.

Fig. 6. (a) Proposed clocked-pseudo-NMOS level-shifting scheme. (b) Proposed clocked-pseudo-NMOS level-converting flip-flop (CPN-LCFF).

A level-converting flip-flop, the clocked-pseudo-NMOS level-converting flip-flop (CPN-LCFF), is proposed [Fig. 6(b)]. Q\_fdbk is connected to transistor N5 to disconnect the discharge path when Q = 1 and Q\_fdbk = 0; the second NMOS branch (N2, N4, N6) is responsible for pulling down the output Q.

We use a weak pull-up PMOS device P1 (length L = 5) to precharge the internal node X rather than the clocked precharge device used in PPR. Although P1 is always ON, a short-circuit current occurs only once, when D makes a 0->1 transition, and the discharge path is disconnected after a two-gate delay by Q\_fdbk (turning off N5). After that, if D remains at 1, the discharge path is already disconnected by N5 and there is no short circuit. This pseudo-NMOS technique is also used in [14].
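The conditional discharge behavior described above can be illustrated with a small logic-level sketch. This is not a circuit simulation and not the authors' model: the function names, the input pattern, and the "unconditional" reference (a hypothetical variant without the N5 feedback, in which node X discharges on every pulse with D = 1) are assumptions made only for illustration.

```python
"""Logic-level sketch of conditional discharge in a CPN-style flip-flop.

Assumption: node X discharges during a clock pulse only when D = 1 and
Q_fdbk = 1 (that is, Q was 0 before the pulse), as stated in the text.
Timing, sizing, and analog effects are ignored.
"""


def cpn_step(d, q):
    """Return (next_q, x_discharged) for one clock pulse."""
    q_fdbk = 1 - q                           # feedback is the complement of Q
    x_discharged = d == 1 and q_fdbk == 1    # N1, N3, and N5 all conduct
    next_q = d                               # the flip-flop captures D
    return next_q, x_discharged


def count_discharges(data, q0=0):
    """Count discharges of X with and without the N5 feedback."""
    q = q0
    conditional = 0
    unconditional = 0   # hypothetical reference: discharge whenever D = 1
    for d in data:
        q, discharged = cpn_step(d, q)
        conditional += int(discharged)
        unconditional += int(d == 1)
    return conditional, unconditional


if __name__ == "__main__":
    sample = [0, 1, 1, 1, 0, 0, 1, 1]   # made-up input pattern
    print(count_discharges(sample))
```

In this sketch, node X is discharged only on the cycles in which D rises from 0 to 1, whereas the reference without feedback would discharge it in every cycle with D = 1.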

P1, N1, N3, N5, and N7 should be properly sized to ensure a correct noise margin [15]. The NMOS in inverter I4 should not be too strong, otherwise it can disconnect N5 before the pulse window is closed. P2 should pull Q up when D = 1, and PMOS in I1 should turn on N2 when D = 0. The discharge control transistor N5 is placed at the bottom of the NMOS stack to speed up the design, because Q\_fdbk is ready before the next clock edge to sample the data D.

The clocked-pseudo-NMOS scheme differs from conventional pseudo-NMOS logic in that it uses clocked transistors in the pull-down branch as well as conditional discharge feedback to control transistor N5. Compared with previously published level-shifting schemes, the proposed scheme employs only a single PMOS, P1, resulting in an efficient design. One thing to note is that pulsed flip-flops might need more hold time than conventional flip-flops.

Fig. 7. Setup used for the flip-flop simulations. Inputs are driven by the inverters, and the output drives a capacitive load of 14 minimum inverters (FO14).

# IV. SIMULATION RESULTS

The simulation results were obtained from HSPICE simulation in a 0.18-µm complementary metal-oxide-semiconductor (CMOS) technology at room temperature. VDDH is 1.8 V and VDDL is 1.25 V, about 70% of VDDH (the optimal VDDL-to-VDDH ratio is 60%-70% to yield the best power consumption [7]). The parasitic capacitances were extracted from the layouts. The setup used in our simulations is shown in Fig. 7. To obtain accurate results, we simulated the circuits in a realistic environment, where the flip-flop inputs (clock, data) are driven by input buffers and the outputs are required to drive an output load. The capacitive load at node Q is 21 fF, selected to model a fan-out of 14 minimum-sized inverters (FO14) [16]. Assuming a uniform data distribution, we supplied D with 16-cycle pseudorandom input data with an activity factor of 18.75% to reflect the average power consumption. A clock frequency of 250 MHz is used.
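The setup numbers above can be cross-checked with a short sketch. It assumes that the activity factor is the number of data transitions divided by the number of clock cycles, a definition the paper does not spell out, and it uses a made-up 16-cycle pattern rather than the authors' actual stimulus.

```python
def activity_factor(bits):
    """Fraction of cycles in which the data differs from the previous cycle."""
    transitions = sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)
    return transitions / len(bits)


# Hypothetical 16-cycle pattern with three transitions (not the authors' data).
PATTERN = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1, 1]

LOAD_FF = 21.0   # load at node Q in fF, from the text
FANOUT = 14      # FO14, from the text
VDDH = 1.8       # volts, from the text
VDDL = 1.25      # volts, from the text

if __name__ == "__main__":
    print("activity factor:", activity_factor(PATTERN))
    print("load per minimum inverter (fF):", LOAD_FF / FANOUT)
    print("VDDL/VDDH ratio:", round(VDDL / VDDH, 3))
```

With three transitions in 16 cycles the pattern gives the 18.75% activity quoted in the text, and the 21-fF load corresponds to 1.5 fF per minimum-sized inverter.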

Each design is simulated at the layout level. All capacitances were extracted from the layouts so that the circuits could be simulated more accurately, because the internal gate capacitance, parasitic capacitance, and wiring capacitance heavily affect power consumption in deep-submicron technology. Further, the delay strongly depends on these capacitances.

The power consumed in the data and clock drivers is included in our measurements. Circuits were optimized for power-delay product (PDP). Delay is the data-to-output delay (*D*-to-*Q* delay), which is the sum of the setup time and the clock-to-output delay. The *D*-to-*Q* delay [17] is obtained by sweeping the 0->1 and 1->0 data transition times with respect to the clock edge; the minimum data-to-output delay, corresponding to the optimum setup time, is recorded. This optimization methodology is similar to that in [17] and [18].
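The sweep described above can be summarized by a minimal sketch: for each data arrival time, add the setup time to the measured clock-to-output delay and keep the smallest sum. The function name and the sweep values are hypothetical; in practice each pair would come from a separate transient simulation.

```python
def min_d_to_q(samples):
    """Pick the sweep point with the smallest D-to-Q delay.

    samples: iterable of (setup_ps, clk_to_q_ps) pairs, one per swept
    data arrival time. D-to-Q delay = setup time + clock-to-Q delay.
    Returns (optimum_setup_ps, min_d_to_q_ps).
    """
    best_setup, best_cq = min(samples, key=lambda s: s[0] + s[1])
    return best_setup, best_setup + best_cq


# Made-up sweep data in ps, for illustration only. A negative setup time
# means the data edge arrives after the clock edge.
SWEEP = [(-40, 400), (-20, 300), (0, 250), (20, 225), (40, 215), (60, 212)]

if __name__ == "__main__":
    setup, d_to_q = min_d_to_q(SWEEP)
    print(f"optimum setup = {setup} ps, minimum D-to-Q = {d_to_q} ps")
```

Moving the data edge closer to the clock shortens the setup time but lengthens the clock-to-output delay, so the sum has a minimum; that minimum is the delay figure used for comparison.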

Table I compares the flip-flop characteristics in terms of delay, power, and power-delay product, as well as level-shifting scheme, number of transistors, number of clocked transistors, number of gates on the critical path, area, and transistor width. The waveform of CPN-LCFF when D makes a 0->1 transition is shown in Fig. 8.

TABLE I. Comparing the flip-flop characteristics in terms of the delay, power, and power-delay product.

| Flip-flop | Level-shifting scheme | # of transistors | # of clocked transistors | # of gates on critical path | Transistor width (µm) | Area ($\lambda^2$) | DQb delay (ps) | Power (µW) | PDP (fJ) |
|-----------|-----------------------|------------------|--------------------------|-----------------------------|-----------------------|--------------------|----------------|------------|----------|
| PHL       | NMOS pass transistor  | 23               | 14                       | 4                           | 17.5                  | 15670              | 643            | 8.29       | 5.33     |
| CPN-LCFF  | Clocked pseudo-NMOS   | 23               | 10                       | 3                           | 22.8                  | 18009              | 541            | 7.61       | 4.17     |

- The number of clocked transistors includes transistors that switch with the clock both in the pulse generator and in the latch part.
- CPN\_ip and PHL use DQb as the delay.
- All the designs are implemented using layout.

Fig. 8. Waveform of CPN-LCFF: D 0->1 transition.

PHL suffers from threshold voltage drop and contention problems, which are aggravated by the low voltage VDDL of the input when switching. Further, it uses an explicit pulse generator. CPN-LCFF, on the other hand, uses an implicit pulse, has four fewer clocked transistors than PHL, and has one less gate on the critical path. Hence CPN-LCFF improves power and delay over PHL by 8.2% and 15.6%, respectively. In terms of PDP, a 22.7% improvement is achieved. Note that CPN-LCFF uses more area than PHL because of the upsizing of the series NMOS transistor stacks. PHL has lower power consumption than PPR, SPFF, SLLS, CSSA, and MSHL. CPN-LCFF further improves power dissipation over PHL, so it is suitable for use in low-power systems.
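The relative improvements can be recomputed from the delay and power entries of Table I with a few lines of arithmetic. The sketch below is only a cross-check under the assumption that PDP is the plain product of the tabulated delay and power. Because it works from rounded table entries, its results can differ from the quoted percentages and from the tabulated PDP by a few tenths of a percentage point or a few hundredths of a femtojoule.

```python
# Delay (ps) and power (uW) copied from Table I.
TABLE = {
    "PHL": {"delay_ps": 643.0, "power_uw": 8.29},
    "CPN-LCFF": {"delay_ps": 541.0, "power_uw": 7.61},
}


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ (1 ps * 1 uW = 1e-18 J = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


def improvement(baseline, new):
    """Relative reduction of `new` with respect to `baseline`."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    phl, cpn = TABLE["PHL"], TABLE["CPN-LCFF"]
    pdp_phl = pdp_fj(phl["delay_ps"], phl["power_uw"])
    pdp_cpn = pdp_fj(cpn["delay_ps"], cpn["power_uw"])
    print(f"power improvement: {improvement(phl['power_uw'], cpn['power_uw']):.1%}")
    print(f"delay improvement: {improvement(phl['delay_ps'], cpn['delay_ps']):.1%}")
    print(f"PDP (fJ): PHL {pdp_phl:.2f}, CPN-LCFF {pdp_cpn:.2f}")
    print(f"PDP improvement: {improvement(pdp_phl, pdp_cpn):.1%}")
```

The power figure reproduces the quoted 8.2%, while the delay and PDP figures land close to, but not exactly on, the quoted values, which may reflect rounding in the published numbers.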

In terms of the level-shifting scheme, the proposed clocked-pseudo-NMOS level-shifting scheme is more efficient than the other approaches such as the DCVSL style, the NMOS pass-transistor scheme, and the precharged schemes.

The clocked-pseudo-NMOS technique in combination with the conditional-discharge technique could be used on other flip-flops like ip-DCO [18], single-transistor-clocked flip-flop [20], etc., as well, because replacing the precharging clocked transistor with a pseudo-NMOS transistor (a weak always-on-PMOS) will gain power improvement.

CPN-LCFF has a small delay; besides level-shifting applications, it could also be used directly in a critical path (VDDH blocks in CVS systems), which may simplify the structure of the dual-voltage system. Further, CPN-LCFF could be used in a low-swing clock system, since the clock signals connect only to NMOS transistors (N3, N4, N6, N7).

As CMOS technology continues scaling, integrated circuits become more susceptible to soft errors, and soft-error-tolerant techniques can be used [21]. With shrinking feature size, the leakage current increases rapidly, and the multithreshold metal–oxide–semiconductor (MTMOS) technique can be used to reduce leakage power consumption [5], [22]. In addition, with technology scaling, process-variation-tolerant techniques such as combinations of adaptive body bias and adaptive VDD may be used to reduce the variation in frequency of fabricated dies [23].

# V. CONCLUSION

In this paper, previous LCFFs are surveyed and their level-shifting schemes are analyzed. A novel level-shifting scheme, the clocked-pseudo-NMOS scheme, is proposed. A clocked-pseudo-NMOS level-converting flip-flop is introduced, which uses the clocked-pseudo-NMOS technique.

CPN-LCFF combines the clocked-pseudo-NMOS technique with the conditional-discharge technique, and it uses an implicit pulse. In terms of power and delay, CPN-LCFF improved by 8.2% and 15.6% over PHL, respectively. In view of PDP, CPN-LCFF outperforms PHL by 22.7%. Hence, CPN-LCFF is suitable for low-power high-performance systems.

# ACKNOWLEDGMENT

The authors would like to thank J. Tschanz, Intel, for his valuable help.

# REFERENCES

- [1] M. Hamada, M. Takahashi, H. Arakida, A. Chiba, T. Terazawa, T. Ishikawa, M. Kanazawa, M. Igarashi, K. Usami, and T. Kuroda, "A top-down low power design technique using clustered voltage scaling with variable supply-voltage scheme," in *Proc. IEEE Custom Integr. Circuits Conf.*, 1998, pp. 495–498.
- [2] L. Benini, E. Macii, and G. De Micheli, "Designing low power circuits: Practical recipes," *IEEE Circuit Syst. Mag.*, vol. 1, no. 1, pp. 6–25, 2001.
- [3] S. Kulkarni and D. Sylvester, "High performance level conversion for dual VDD design," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 9, pp. 926–936, Sep. 2004.
- [4] R. Krishnamurthy, S. Hsu, M. Anders, and B. Bloechel, "Dual supply voltage clocking for 5 GHz, 130 nm integer execution core," in *Proc. IEEE Very Large Scale Integr. (VLSI) Symp.*, 2002, pp. 128–129.
- [5] M. Bai and D. Sylvester, "Analysis and design of level-converting flip-flops for dual-Vdd/Vth integrated circuits," in *Proc. IEEE Int. Symp. System-on-Chip*, 2003, pp. 151–154.
- [6] H. Mahmoodi-Meimand and K. Roy, "Self-precharging flip-flop (SPFF): a new level converting flip-flop," in *Proc. Eur. Solid-State Circuits Conf.*, Sep. 2002, pp. 407–410.
- [7] F. Ishihara, F. Sheikh, and B. Nikolic, "Level conversion for dual-supply systems," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 2, pp. 185–195, Feb. 2004.
- [8] P. Zhao, G. P. Kumar, and M. Bayoumi, "Contention reduced/conditional discharge flip-flops for level conversion in CVS systems," in *Proc. IEEE Int. Symp. Circuits Syst.*, Vancouver, BC, Canada, May 23–26, 2004, pp. 669–672.
- [9] L. G. Heller, W. R. Griffin, J. W. Davis, and N. G. Thoma, "Cascode voltage switch logic: a differential CMOS logic family," in *Proc. IEEE Solid-State Circuits Conf.*, 1984, pp. 16–17.
- [10] J. Rabaey, A. Chandrakasan, and B. Nikolic, *Digital Integrated Circuits*. Englewood Cliffs, NJ: Prentice-Hall, 2003.
- [11] B. Nikolic, V. G. Oklobdzija, V. Stojanovic, W. Jia, J. K. Chiu, and M. M. Leung, "Improved sense-amplifier-based flip-flop: design and measurements," *IEEE J. Solid-State Circuits*, vol. 35, no. 6, pp. 876–883, Jun. 2000.
- [12] A. Chandrakasan, W. Bowhill, and F. Fox, *Design of High-Performance Microprocessor Circuits*, 1st ed. New York: IEEE Press.

- [13] P. Zhao, T. Darwish, and M. Bayoumi, "High-performance and low-power conditional discharge flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 12, no. 5, pp. 477–484, May 2004.
- [14] P. Zhao, J. McNeely, P. Golconda, M. A. Bayoumi, W. D. Kuang, and B. Barcenas, "Low power clock branch sharing double-edge triggered flip-flop," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 15, no. 3, pp. 338–345, Mar. 2007.
- [15] D. A. Hodges, H. G. Jackson, and R. A. Saleh, *Analysis and Design of Digital Integrated Circuits*, 3rd ed. New York: McGraw-Hill, 2004.
- [16] N. Weste and D. Harris, *CMOS VLSI Design*. Reading, MA: Addison-Wesley, 2004.
- [17] V. Stojanovic and V. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low power system," *IEEE J. Solid-State Circuits*, vol. 34, no. 4, pp. 536–548, Apr. 1999.
- [18] J. Tschanz, S. Narendra, Z. P. Chen, S. Borkar, M. Sachdev, and V. De, "Comparative delay and energy of single edge-triggered and dual edge-triggered pulsed flip-flops for high-performance microprocessors," in *Proc. Int. Symp. Low-Power Electron. Design*, Huntington Beach, CA, 2001, pp. 207–212.
- [19] B. Kong, S. Kim, and Y. Jun, "Conditional-capture flip-flop for statistical power reduction," *IEEE J. Solid-State Circuits*, vol. 36, no. 8, pp. 1263–1271, Aug. 2001.
- [20] P. Zhao, T. Darwish, and M. Bayoumi, "Low power and high speed explicit-pulsed flip-flops," in *Proc. 45th IEEE Int. Midwest Symp. Circuits Syst. Conf.*, Tulsa, OK, Aug. 4–7, 2002, vol. 2, pp. 477–480.
- [21] S. Lin, H. Z. Yang, and R. Luo, "High speed soft-error-tolerant latch and flip-flop design for multiple VDD circuit," in *Proc. IEEE Int. Comput. Soc. Annu. Symp. Very Large Scale Integr. (VLSI)*, Mar. 2007, pp. 273–278.
- [22] J. Tschanz, Y. Ye, L. Wei, V. Govindarajulu, N. Borkar, S. Burns, T. Karnik, S. Borkar, and V. De, "Design optimizations of a high performance microprocessor using combinations of dual-Vt allocation and transistor sizing," in *Proc. IEEE Symp. VLSI Circuits, Dig. Tech. Papers*, Jun. 13–15, 2002, pp. 218–219.
- [23] J. Tschanz, K. Bowman, and V. De, "Variation-tolerant circuits: Circuits solutions and techniques," in *Proc. IEEE Symp. Design Autom. Conf.*, Jun. 13–17, 2005, pp. 762–763.



**Peiyi Zhao** (S'02–M'05) received the B.Sc. degree in electronic engineering from Zhejiang University, Hangzhou, China, in 1987, and the Ph.D. degree in computer engineering from the University of Louisiana, Lafayette, in 2005.

He worked with Ningbo Radio Factory, Ningbo, China, from 1987 to 1995, designing FM/AM radios, televisions, and tape cassette recorders. From 1995 to 1999, he was with Ningbo Huaneng Corporation. Since 2001, he has been a graduate student researcher in the VLSI research group at The Center for Advanced Computer Studies at University of Louisiana, Lafayette. Since 2005, he has been an Assistant Professor at Chapman University, Orange, CA. He has one patent pending. His research areas include digital/analogue circuit design, low-power design, and digital VLSI design.



**Jason B. McNeely** (S'99) received the B.S. degree in electrical engineering and the M.S. degree in computer engineering from The University of Louisiana, Lafayette, in 2001 and 2003, respectively, where he is currently working towards the Ph.D. degree in computer engineering at The Center for Advanced Computer Studies (CACS).

His research interests include low-power VLSI design, video compression, and sensor fusion.



**Pradeep K. Golconda** received the B.S. degree in electronics and communications engineering from Osmania University, Hyderabad, Andhra Pradesh, India, in 2002 and the M.S. degree in computer engineering from University of Louisiana Lafayette, in 2004.

He has been with Intel Corporation, Folsom, CA, since 2005, where his work includes implementation and validation of low-power and high-performance mobile chipset designs.

**Soujanya Venigalla** received the M.S. degree in computer engineering from University of Louisiana, Lafayette, in 2004.

She has been with Intel Corporation, Folsom, CA, since 2005.



**Nan Wang** received the B.S. degree in computer science from Xiamen University, Xiamen, China, in 1990, and the M.S. and Ph.D. degrees in computer engineering from University of Louisiana, Lafayette, in 2000 and 2008, respectively.

Currently, he is an Assistant Professor at the Department of Electrical and Computer Engineering, Institute of Technology, West Virginia University, Montgomery. His research interests include SOC/NOC communication architecture design, embedded system design, and low-power VLSI design.



**Magdy A. Bayoumi** (S'80–M'84–SM'87–F'99) received the B.Sc. and M.Sc. degrees in electrical engineering from Cairo University, Cairo, Egypt, in 1973 and 1977, respectively, the M.Sc. degree in computer engineering from Washington University in St. Louis, MO, in 1981, and the Ph.D. degree in electrical engineering from the University of Windsor, Windsor, ON, Canada, in 1984.

Currently, he is the Director of the Center for Advanced Computer Studies (CACS), Department Head of the Computer Science Department, the Edmiston Professor of Computer Engineering, and the Lamson Professor of Computer Science at The Center for Advanced Computer Studies, University of Louisiana, Lafayette, where he has been a faculty member since 1985. He has edited and coedited three books in the area of VLSI signal processing. He has one patent pending. His research interests include VLSI design methods and architectures, low-power circuits and systems, digital signal processing architectures, parallel algorithm design, computer arithmetic, image and video signal processing, neural networks, and wideband network architectures.

Dr. Bayoumi was an Associate Editor of the Circuits and Devices Magazine and is currently an Associate Editor of Integration, the VLSI Journal, and the Journal of VLSI Signal Processing Systems. He is a Regional Editor for the VLSI Design Journal and on the Advisory Board of the Journal on Microelectronics Systems Integration. He received the University of Louisiana at Lafayette 1988 Researcher of the Year Award and the 1993 Distinguished Professor Award. He was an Associate Editor of the IEEE CIRCUITS AND DEVICES MAGAZINE, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, the IEEE TRANSACTIONS ON NEURAL NETWORKS, and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS-II: ANALOG AND DIGITAL SIGNAL PROCESSING. From 1991 to 1994, he served on the Distinguished Visitors Program for the IEEE Computer Society, and currently, he is on the Distinguished Lecture Program of the Circuits and Systems Society. He was the Vice President for the technical activities of the IEEE Circuits and Systems Society. He was the Cochairman of the Workshop on Computer Architecture for Machine Perception in 1993, and currently, he is a member of the Steering Committee of this workshop. He was the General Chairman of the 1994 IEEE International Mid West Symposium on Circuits and Systems (MWSCAS) and is a member of the Steering Committee of this symposium. He was the General Chairman for the Eighth Great Lake Symposium on VLSI in 1998. He has been on the Technical Program Committee for IEEE International Symposium on Circuits and Systems (ISCAS) for several years and he was the Publication Chair for ISCAS'99. He was also the General Chairman of the 2000 Workshop on Signal Processing Design and Implementation. He was a founding member of the VLSI Systems and Applications Technical Committee and was its Chairman. He is currently the Chairman of the Technical Committee on Circuits and Systems for Communication and the Technical Committee on Signal Processing Design and Implementation. He is a member of the Neural Network and the Multimedia Technology Technical Committees. Currently, he is the faculty advisor for the IEEE Computer Student Chapter at the University of Louisiana at Lafayette.



**Weidong Kuang** received the B.S. and M.S. degrees from Nanjing University of Aeronautics and Astronautics, Nanjing, China, and the Ph.D. degree from the University of Central Florida, Orlando, in 1991, 1994, and 2003, respectively, all in electrical engineering.

From April 1994 to June 1999, he was with Beijing Institute of Radio Measurement, Beijing, China, where his work involved the development of phased-array radar systems. Since August 2004, he has been with the Department of Electrical Engineering, University of Texas–Pan American, Edinburg, where he is now an Assistant Professor. His research interests include asynchronous circuits, low-power IC design, and fault tolerance in digital VLSI circuits.



**Luke Downey** (S'07) is working towards the B.S. degree in computer science at the Department of Mathematics and Computer Science, Chapman University, Orange, CA.

During summer 2008, he participated in a summer research program at North Carolina State University, working on embedded applications in an effort to create a wireless, multiaxis control interface for 3-D environments.