

# Communication at the Speed of Light (CaSoL): A New Paradigm for Designing Global Wires

Reza Sarvari, *Member, IEEE*, Amin Rassekh, and Sina Shahhosseini

Abstract— In this paper, we argue that communication at the speed of light (CaSoL) through on-chip copper interconnects will be possible in the near future with giga-scale integration (GSI) technologies. A three-step algorithm is introduced to design the optimum buffers in such systems. HSPICE simulations show that a delay of 1.3 × the time of flight ($T_F$) is reachable in 7-nm FinFET technology. It is also shown that such a design is, by nature, robust and immune to process variations and crosstalk noise.

Index Terms—Buffer insertion, global on-chip interconnect, integrated circuit interconnections, repeater insertion.

#### I. INTRODUCTION

DURING the past decade, as transistors benefited from scaling, interconnects became the limiter for giga-scale integration (GSI) [1]. Although the minimum feature size shrinks with technology scaling, the die size increases because more functionality is placed on a chip. Hence, technology scaling results in an increase in both the length and number of global lines. In fact, in nanometer technologies, the global interconnects play a dominant role in performance, particularly in high-performance chips [2], [3]. Inserting buffers (repeaters) along a long interconnect is a well-known method to reduce the delay [4]. The size and number of repeaters are commonly optimized to minimize the total delay [4]. However, in some works, optimal power is considered in the repeater insertion technique [2]. There are also other works in the literature that design repeaters to reduce delay and power [5], minimize area and power [6], and minimize power with delay and bandwidth constraints [7].

With scaling, critical dimension (CD) process variations become a significant concern since they strongly affect performance, power, and yield [8]. Moreover, interconnect variability worsens with technology scaling [9], [10]. To avoid these effects, either the fluctuations should be controlled or the design should be robust.

In this paper, we show that for upcoming technologies, "global wires" could potentially benefit from scaling. A simple buffer insertion technique would guarantee communication at the speed of light (CaSoL) along copper wires. To keep up with the trend of Moore's law while clock frequencies remain almost constant, the use of multicore chips has been the customary approach for the past decade. Communication between cores becomes affordable using a network-on-chip (NoC) architecture.

Fig. 1. (a) Representation of a TL with capacitive load $C_L$ and driver resistance $R_D$. (b) Proposed "tapered buffer" structure. (c) Cross-section of global wires.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TED.2019.2919629

Wireless and optical NoC architectures have been proposed to overcome the speed and power issues of copper wires [11]. The buffer insertion technique presented in this paper for CaSoL not only enables data communication at the material's speed limit with a high bit rate but is also robust and immune to crosstalk noise and process variations. Low-swing signaling under these conditions restores electrical wires as a competitor to their optical counterparts, in terms of both power and delay.

#### II. DESIGN ALGORITHM

Bakoglu proposed buffer insertion to reduce propagation delay over on-chip metal wires [4]. Since then, several modifications have been made. Most of the previous works in this area are based on minimizing the propagation delay through the structure shown in Fig. 1(a), where $R_D$ represents the driver's resistance; r, l, and c are the resistance, inductance, and capacitance per unit length of the wire, respectively; x is the length of the wire; and $C_L$ is the load capacitance. A stripline, as shown in Fig. 1(c), is used for this work. The HSPICE U-model is used for transmission lines (TLs) throughout this work. The dc resistance, inductance, and capacitance per unit length of a single wire, extracted from HSPICE simulation, are $r = 9.4 \text{ k}\Omega/\text{m}$, l = 292 nH/m, and c = 84 pF/m, respectively.

Manuscript received May 3, 2019; accepted May 24, 2019. The review of this paper was arranged by Editor M. S. Bakir. *(Corresponding author: Reza Sarvari.)* 

The authors are with the School of Electrical Engineering, Sharif University of Technology, Tehran, Iran (e-mail: sarvari@sharif.edu).

### A. Problem Definition and Design Algorithm

Historically, the *rc* model has been accurate enough for on-chip wires. However, for the global wires in recent technologies, the TL effect becomes crucial, and the wire inductance should be considered in modeling and optimization [12]. What shifts the optimization paradigm in this work is not the TL effect by itself but the load capacitance, which becomes smaller as the technology nodes are scaled down [13]; thus, the wire can be considered an open-ended TL. More generally, this argument applies whenever the load impedance (at the frequency of interest) is much larger than the characteristic impedance of the line. This means $C_L \ll T_F/Z_0$, where $Z_0$ is the characteristic impedance of the TL and $T_F = x\sqrt{lc}$ is the time of flight, whose reciprocal represents the frequency of operation.

We propose "tapered buffer" insertion, as shown in Fig. 1(b), where the first cascaded buffer is minimum-sized, and h is the taper ratio (i.e., the buffer at each stage is h times larger than that of the previous stage). For a buffer in FinFET technology, h is determined by the number of fins, as shown in Fig. 1(b). Throughout this work, 7-nm FinFET technology is used, and HSPICE models are derived from PTM [14]. The design algorithm can be summarized in the following steps.

Step I (Determining technology parameters, i.e., the characteristics of the global wire and the minimum-sized buffer): For 7-nm FinFET technology, the input capacitance and driver resistance of the minimum-sized inverter (which is used as a buffer) are $C_0 \approx 60$ aF and $R_0 \approx 7$ k$\Omega$, respectively. The stripline structure of the interconnects dictates l and c of the wires. Based on the International Technology Roadmap for Semiconductors (ITRS) projections, the global wiring pitch and its aspect ratio remain almost constant across technologies [3]. As a result, $Z_0$ and $T_F$ (for a specific length) remain the same in different technologies, except for a small change arising from the change in the wires' aspect ratio. For the stripline shown in Fig. 1(c), $Z_0 \approx 60 \Omega$. As technology scales down, $C_{\rm L}$ (which is the input capacitance of the minimum-sized buffer, $C_0$) becomes much smaller than $T_F/Z_0$, and this causes a jump in the voltage level at the end of the line. For the chosen technology node, the ratio $C_{\rm L} Z_0 / T_{\rm F}$ is on the order of $10^{-4}$ for a 1-mm-long global wire.
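The Step I check can be reproduced numerically from the parameters quoted in this section. The following is a minimal Python sketch rather than part of the HSPICE flow; it assumes the lossless-line expression $Z_0 = \sqrt{l/c}$, and the constant and function names are chosen here for illustration only.

```python
import math

# Wire and buffer parameters quoted in the text.
L_PER_M = 292e-9   # inductance per unit length [H/m]
C_PER_M = 84e-12   # capacitance per unit length [F/m]
C0 = 60e-18        # input capacitance of the minimum-sized buffer [F]


def characteristic_impedance(l=L_PER_M, c=C_PER_M):
    """Z0 of the line, assuming the lossless approximation sqrt(l/c)."""
    return math.sqrt(l / c)


def time_of_flight(x, l=L_PER_M, c=C_PER_M):
    """T_F = x * sqrt(l * c) for a wire of length x [m]."""
    return x * math.sqrt(l * c)


def load_ratio(x, c_load=C0):
    """Dimensionless ratio C_L * Z0 / T_F; CaSoL needs this to be << 1."""
    return c_load * characteristic_impedance() / time_of_flight(x)


x = 1e-3  # 1-mm global wire
print(f"Z0 = {characteristic_impedance():.1f} ohm")
print(f"T_F = {time_of_flight(x) * 1e12:.2f} ps")
print(f"C_L * Z0 / T_F = {load_ratio(x):.2e}")
```

For a 1-mm wire this gives $Z_0$ close to 59 Ω, $T_F$ close to 5 ps, and a ratio of roughly $7 \times 10^{-4}$, in line with the values stated above.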

Step II (Finding the size of the driver): The voltage jump at the end of the line is given by [15]

$$V(x, x\sqrt{lc}) = \frac{2Z_0 V_{\rm DD}}{Z_0 + R_{\rm D}} e^{-rx/2Z_0} \tag{1}$$

so $R_{\rm D}$ should be smaller than $3Z_0$ to enable CaSoL. In Fig. 2, the delay normalized to the time of flight ($\tau_{\rm AB}/T_{\rm F}$), where $\tau_{\rm AB}$ is the delay from point A to point B, is plotted for a 1-mm-long global wire (shown in Fig. 1) versus $R_{\rm D}$ normalized to $Z_0$. Step responses are also plotted for two different values of $R_{\rm D}$. The plot shows that for $R_{\rm D}$ less than $3Z_0$, the delay is simply equal to $T_{\rm F}$. Note that the optimal length found later is around 10 mm; (1) implies that CaSoL is possible for $R_{\rm D} < Z_0$ at such a length. One possible design strategy is to start from a matched driver ($R_{\rm D} = Z_0$) and design the global wiring pitch accordingly.
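One way to see where the $3Z_0$ and $Z_0$ bounds come from is to require that the first incident jump in (1) reach half of $V_{\rm DD}$. The sketch below makes that assumption explicit; the 50% switching threshold, the normalized supply of 1 V, and the function names are illustrative choices, not values taken from the HSPICE setup.

```python
import math

R_PER_M = 9.4e3    # wire resistance per unit length [ohm/m]
L_PER_M = 292e-9   # [H/m]
C_PER_M = 84e-12   # [F/m]
Z0 = math.sqrt(L_PER_M / C_PER_M)  # lossless approximation of Z0


def voltage_jump(x, r_d, vdd=1.0):
    """Far-end voltage jump at t = T_F according to (1)."""
    return 2.0 * Z0 * vdd / (Z0 + r_d) * math.exp(-R_PER_M * x / (2.0 * Z0))


def max_driver_resistance(x, threshold=0.5):
    """Largest R_D for which the jump still reaches threshold * V_DD.

    Assumption: the receiver switches at `threshold` of the supply.
    Obtained by solving (1) for R_D.
    """
    attenuation = math.exp(-R_PER_M * x / (2.0 * Z0))
    return Z0 * (2.0 * attenuation / threshold - 1.0)


for length_mm in (1.0, 10.0):
    bound = max_driver_resistance(length_mm * 1e-3)
    print(f"x = {length_mm:4.1f} mm: R_D < {bound / Z0:.2f} * Z0")
```

Under this assumption the bound is roughly $2.7Z_0$ for a 1-mm wire and roughly $0.8Z_0$ for a 10-mm wire, which is consistent with the $R_{\rm D} < 3Z_0$ and $R_{\rm D} < Z_0$ statements in the text.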



Fig. 2. Delay (normalized to the time of flight) for a 1-mm-long global wire versus $R_{\rm D}$ (normalized to $Z_0$). Step responses are also plotted for two cases, $R_{\rm D} = 2.5Z_0$ and $R_{\rm D} = 3.5Z_0$.

 TABLE I

 RESULTS FOR R<sub>ON</sub> FROM HSPICE SIMULATION COMPARED TO (2)

| h | n | $R_D$ (simulation) [Ω] | $R_D$ (Equation (2)) [Ω] | Error (%) |
|---|---|------------------------|--------------------------|-----------|
| 1 | 1 | 7.0641 k                  | 7.0641 k                    | 0         |
| 2 | 4 | 890.4562                  | 883.0125                    | -0.8359   |
| 3 | 4 | 261.2815                  | 261.6333                    | +0.1347   |
| 4 | 4 | 108.3818                  | 110.3766                    | +1.8405   |
| 5 | 4 | 57.1208                   | 56.5128                     | -1.0644   |
| 2 | 5 | 428.5443                  | 441.5063                    | +3.0246   |
| 3 | 5 | 86.2123                   | 87.2111                     | +1.1585   |
| 4 | 5 | 27.9243                   | 27.5941                     | -1.1823   |
| 2 | 6 | 217.5186                  | 220.7531                    | +1.4870   |
| 3 | 6 | 29.3818                   | 29.0704                     | -1.0599   |
| 2 | 7 | 110.2716                  | 110.3766                    | +0.0952   |
| 3 | 7 | 10.1887                   | 9.6901                      | -4.8934   |

The driver resistance of the tapered buffer can be written empirically as

$$R_{\rm D} = \frac{R_0}{h^{n-1}}.\tag{2}$$

HSPICE simulations support this expression, as shown in Table I. For the chosen technology node, $R_{\rm D} < 3 Z_0$ translates into $h^{n-1} > 100$.
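The comparison in Table I can be re-derived directly from (2). The short Python sketch below takes $R_0$ from the h = 1, n = 1 row of Table I and recomputes the error column; the helper names are illustrative, and the error is assumed to be defined relative to the simulated value.

```python
R0 = 7064.1  # simulated resistance of the minimum-sized buffer [ohm], Table I

# (h, n, simulated R_D in ohm) copied from Table I.
TABLE_I = [
    (2, 4, 890.4562), (3, 4, 261.2815), (4, 4, 108.3818), (5, 4, 57.1208),
    (2, 5, 428.5443), (3, 5, 86.2123), (4, 5, 27.9243),
    (2, 6, 217.5186), (3, 6, 29.3818),
    (2, 7, 110.2716), (3, 7, 10.1887),
]


def driver_resistance(h, n, r0=R0):
    """Empirical driver resistance of an n-stage buffer tapered by h, per (2)."""
    return r0 / h ** (n - 1)


def percent_error(h, n, simulated):
    """Error of (2) relative to the simulated value, in percent (assumed definition)."""
    return 100.0 * (driver_resistance(h, n) - simulated) / simulated


for h, n, simulated in TABLE_I:
    print(f"h={h} n={n}: eq.(2) = {driver_resistance(h, n):9.4f} ohm, "
          f"error = {percent_error(h, n, simulated):+.4f} %")
```

With these inputs, every row stays within 5% of the simulated value, the largest deviation being the h = 3, n = 7 case.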

Step III (Finding the optimal distance between buffers, $x_{opt}$): As shown in Fig. 1(b), a line is divided into equal segments of length x. The ratio of delay to time of flight, $\tau_{AB}/T_F$, should be minimized, where $\tau_{AB} = \tau_{AB}(h, n, r, l, c, x)$ and $T_F = T_F(x, l, c)$. As discussed in Step II, if $R_D < 3Z_0$, then the dependence of $\tau_{AB}$ on h and n is very small. As shown in [16], the four line parameters r, l, c, and x can be combined into a single, independent normalized parameter. Hence, the optimal solution can be found by scanning different lengths and finding the wire length $x_{opt}$ at which $\tau_{AB}/T_F$ is minimized.
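The length scan itself relies on HSPICE and is not reproduced here. As a rough first-order feel for the length scale involved, one can ask how long a segment may be before the first incident jump in (1) drops below half of $V_{\rm DD}$. The sketch below is only that crude estimate: the 50% threshold, the use of (2) with $R_0 \approx 7$ kΩ, the lossless $Z_0$, and the function names are all assumptions made for illustration, and the result is not a substitute for the simulated $x_{opt}$.

```python
import math

R_PER_M = 9.4e3    # [ohm/m]
L_PER_M = 292e-9   # [H/m]
C_PER_M = 84e-12   # [F/m]
R0 = 7e3           # driver resistance of the minimum-sized buffer [ohm]
Z0 = math.sqrt(L_PER_M / C_PER_M)


def driver_resistance(h, n):
    """Tapered-buffer driver resistance from (2)."""
    return R0 / h ** (n - 1)


def max_first_incidence_length(r_d, threshold=0.5):
    """Longest x [m] for which the jump in (1) is still >= threshold * V_DD.

    Obtained by solving (1) for x; `threshold` is an assumed switching level.
    """
    return (2.0 * Z0 / R_PER_M) * math.log(2.0 * Z0 / (threshold * (Z0 + r_d)))


for h, n in [(5, 4), (4, 5), (3, 6), (3, 7)]:
    x_max = max_first_incidence_length(driver_resistance(h, n))
    print(f"h={h} n={n}: first-incidence limit ~ {x_max * 1e3:.1f} mm")
```

For h = 3 and n = 6 this estimate lands near 12 mm, which is in the range of segment lengths where attenuation starts to dominate.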

### B. Results

Fig. 3(a)–(d) show the dependence of $\tau_{AB}/T_F$ on x. For $x > x_{opt}$, line attenuation becomes dominant, while for $x < x_{opt}$, the buffer delay becomes dominant. Choosing $x > x_{opt}$ saves silicon area and reduces power dissipation at the cost of a delay penalty, while choosing x around or smaller than $x_{opt}$ could result in a more robust design. As $x_{opt}$ is around 10 mm, we expect the impact of vias to be negligible. However, a full-wave simulation is needed to investigate that.

Fig. 3. $\tau_{AB}/T_F$ versus length of the segment for different buffer sizes. (a) n = 4. (b) n = 5. (c) n = 6. (d) n = 7. Step response of the segment for the different buffer sizes shown in Fig. 4. (e) n = 4, h = 5, $x_{opt} = 11.7$ mm. (f) n = 5, h = 4, $x_{opt} = 13.2$ mm. (g) n = 6, h = 3, $x_{opt} = 13.3$ mm. (h) n = 7, h = 3, $x_{opt} = 14.3$ mm.

 TABLE II

 RESULTS FOR OPTIMAL DESIGN FOR DIFFERENT VALUES OF *h* AND *n*. *x*<sub>OPT</sub> IS THE OPTIMAL DISTANCE BETWEEN REPEATERS AS SHOWN IN FIG. 1

| h | n | $x_{opt}$ [mm] | $T_F$ [psec] | $\tau_{AB}/T_F$ | $N_{Fin}$ (max) | $A/A_0$ | $E_S$ [fJ] | $E_D$ [fJ] | $E_{D-line}$ [%] |
|---|---|----------------|--------------|-----------------|-----------------|---------|------------|------------|------------------|
| 5 | 4 | 11.7           | 57.84        | 1.3630          | 125             | 156     | 0.21       | 501.94     | 95.6             |
| 4 | 5 | 13.2           | 65.26        | 1.3127          | 256             | 341     | 0.47       | 572.60     | 94.4             |
| 3 | 6 | 13.3           | 65.75        | 1.3115          | 243             | 364     | 0.49       | 582.77     | 93.5             |
| 3 | 7 | 14.3           | 70.69        | 1.3115          | 729             | 1093    | 1.50       | 646.51     | 90.4             |

Fig. 4. Optimum length for different tapered buffers.

Both process variations and crosstalk noise become a concern for global wires in GSI. CaSoL buffer insertion, which results in a delay close to $T_{\rm F}$, automatically guarantees a robust design, as $T_{\rm F}$ is, by nature, nearly independent of process variations. Crosstalk-induced delay could also be overcome by the same phenomenon. The optimization results are summarized in Table II. Fig. 4 shows different buffer sizes with their $x_{opt}$ values. The step responses of these segments are shown in Fig. 3(e)–(h). For h = 3 and n = 6, $\tau_{AB} = 1.3115 T_F$ can be obtained, which equals the delay in an optical system with a refractive index of 1.9 and zero delay for electrical-to-optical and optical-to-electrical conversion. The same table shows the total dynamic energy ($E_{\rm D}$) per bit per segment, the static energy ($E_{\rm S}$) per bit per segment, and the percentage of the dynamic energy dissipated in the line ($E_{D-line}$). The results show that more than 90% of the total dynamic energy arises from the interconnects. Also, the maximum number of fins ($N_{\text{Fin}}$) and the area consumed (A) with respect to the area of a minimum-sized buffer ($A_0$) are summarized in Table II. Note that we assumed the global wires are switching at the maximum clock frequency ($F_{max}$). Hence, the contribution of static power to the "energy per bit" is negligible.

TABLE III

RESULTS FOR INTERCONNECT PROCESS VARIATION OF OPTIMAL DESIGNS SHOWN IN TABLE II. $\sigma$ STANDS FOR THE STANDARD DEVIATION OF DELAY DISTRIBUTION

| h | n | $x_{opt}$ [mm] | $\tau_{AB_{min}}/T_F$ | $\tau_{AB_{max}}/T_F$ | $3\sigma$ |
|---|---|----------------|-----------------------|-----------------------|-----------|
| 5 | 4 | 11.7           | 1.3573                | 1.4992                | 0.015055  |
| 4 | 5 | 13.2           | 1.3051                | 1.4415                | 0.019462  |
| 3 | 6 | 13.3           | 1.3038                | 1.3232                | 0.009633  |
| 3 | 7 | 14.3           | 1.3034                | 1.4274                | 0.018804  |
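Several columns of Table II can be cross-checked with simple arithmetic. The sketch below assumes that the largest stage of an n-stage chain tapered by h has $h^{n-1}$ fins and that the chain area scales as the sum of the stage sizes; both are readings of the tapered-buffer structure that happen to reproduce the tabulated values, not formulas stated in the paper. It also converts a delay ratio into the refractive index of an ideal optical link with the same delay, taking the wave velocity on the wire as $1/\sqrt{lc}$.

```python
import math

C_LIGHT = 299_792_458.0  # speed of light in vacuum [m/s]
L_PER_M = 292e-9         # [H/m]
C_PER_M = 84e-12         # [F/m]

# (h, n, N_Fin max, A/A0, tau_AB/T_F) copied from Table II.
TABLE_II = [
    (5, 4, 125, 156, 1.3630),
    (4, 5, 256, 341, 1.3127),
    (3, 6, 243, 364, 1.3115),
    (3, 7, 729, 1093, 1.3115),
]


def max_fins(h, n):
    """Fin count of the last (largest) stage, assuming a 1-fin first stage."""
    return h ** (n - 1)


def relative_area(h, n):
    """Chain area over minimum-buffer area, assumed to be 1 + h + ... + h^(n-1)."""
    return sum(h ** i for i in range(n))


def equivalent_refractive_index(delay_ratio):
    """Index of an ideal optical link whose delay equals delay_ratio * T_F."""
    return delay_ratio * C_LIGHT * math.sqrt(L_PER_M * C_PER_M)


for h, n, fins, area, ratio in TABLE_II:
    print(h, n, max_fins(h, n) == fins, relative_area(h, n) == area,
          round(equivalent_refractive_index(ratio), 2))
```

For h = 3 and n = 6 the equivalent index comes out slightly above 1.9, which agrees with the optical comparison made in the text.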

Table III shows delay variations due to the interconnect process variations. It is assumed that all interconnect geometrical values shown in Fig. 1 are subject to Gaussian-distributed variations with $3\sigma = 10\%$ of the CD [8], [10], [17], [18], where the CD is the ITRS microprocessor (MPU) half-pitch [3] and $\sigma$ is the standard deviation. In the same manner, the results of the device process variations are summarized in Table IV. It is assumed that the gate length ($L_g$), fin thickness ($T_{\rm Fin}$), fin height ($H_{\rm Fin}$), and oxide thickness ($T_{\rm ox}$) are 11, 6.5, 18, and 1.15 nm, respectively, for 7-nm FinFET technology, with a $3\sigma = 10\%$ variation around their nominal values, except for the oxide thickness, for which $3\sigma = 5\%$. All of these are modeled by Gaussian distributions [19].

Fig. 5. Histogram of $\tau_{AB}/T_F$ due to the interconnect process variation of the segment for the different buffer sizes shown in Fig. 4. (a) n = 4, h = 5, $x_{opt} = 11.7$ mm. (b) n = 5, h = 4, $x_{opt} = 13.2$ mm. (c) n = 6, h = 3, $x_{opt} = 13.3$ mm. (d) n = 7, h = 3, $x_{opt} = 14.3$ mm. Histogram of $\tau_{AB}/T_F$ due to the device process variation of the segment for the different buffer sizes shown in Fig. 4. (e) n = 4, h = 5, $x_{opt} = 11.7$ mm. (f) n = 5, h = 4, $x_{opt} = 13.2$ mm. (g) n = 6, h = 3, $x_{opt} = 13.3$ mm. (h) n = 7, h = 3, $x_{opt} = 14.3$ mm.

TABLE IV

RESULTS FOR SIMULATION OF DEVICE VARIATIONS FOR OPTIMAL DESIGNS SHOWN IN TABLE II. $\sigma$ STANDS FOR THE STANDARD DEVIATION OF DELAY DISTRIBUTION

| h | n | $x_{opt}$ [mm] | $\tau_{AB_{min}}/T_F$ | $\tau_{AB_{max}}/T_F$ | $3\sigma$ |
|---|---|----------------|-----------------------|-----------------------|-----------|
| 5 | 4 | 11.7           | 1.3204                | 1.4917                | 0.071841  |
| 4 | 5 | 13.2           | 1.2830                | 1.3787                | 0.041946  |
| 3 | 6 | 13.3           | 1.2831                | 1.3773                | 0.040769  |
| 3 | 7 | 14.3           | 1.2838                | 1.3723                | 0.040369  |
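The device-parameter sampling described above can be illustrated outside HSPICE. The sketch below draws 1000 Gaussian samples per parameter with the stated $3\sigma$ fractions of the nominal values; it only generates the input distributions and does not compute any delay. The dictionary keys, the fixed seed, and the independence of the four parameters are assumptions of this illustration.

```python
import random
import statistics

# Nominal 7-nm FinFET dimensions [nm] and 3-sigma fractions from the text.
NOMINAL_NM = {"L_g": 11.0, "T_fin": 6.5, "H_fin": 18.0, "T_ox": 1.15}
THREE_SIGMA_FRACTION = {"L_g": 0.10, "T_fin": 0.10, "H_fin": 0.10, "T_ox": 0.05}


def sample_device(rng):
    """Draw one set of device dimensions from independent Gaussians."""
    return {
        name: rng.gauss(nominal, nominal * THREE_SIGMA_FRACTION[name] / 3.0)
        for name, nominal in NOMINAL_NM.items()
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
samples = [sample_device(rng) for _ in range(1000)]

for name, nominal in NOMINAL_NM.items():
    values = [s[name] for s in samples]
    print(f"{name}: mean = {statistics.mean(values):.3f} nm, "
          f"3*sigma = {3 * statistics.stdev(values):.3f} nm "
          f"(target {nominal * THREE_SIGMA_FRACTION[name]:.3f} nm)")
```

In an actual flow, each sampled set would be passed to the circuit simulator to obtain one entry of the $\tau_{AB}/T_F$ histogram.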

We also consider threshold voltage variation ($\sigma V_T$) due to random dopant fluctuation because this is an important issue in sub-10-nm CMOS technologies [9], [20]–[23]. The analytical equation for $\sigma V_T$ is given in (3) [9], [22], [23], where q is the electron charge, $W_{\rm eff}$ is the effective width, $\varepsilon_{\rm ox}$ is the oxide permittivity, and $N_{\rm ch}$ is the total channel doping concentration. Threshold voltage variations from other sources, such as work-function variation and extension resistance, are also significant [9]. All of these threshold variations are lumped together in Pelgrom's rule [9], [21], as given by (4)

$$\sigma V_{\rm T} = q \cdot \sqrt{\frac{T_{\rm Fin} \cdot H_{\rm Fin}}{W_{\rm eff}}} \cdot \frac{T_{\rm ox}}{\varepsilon_{\rm ox}} \cdot \frac{\sqrt{N_{\rm ch}}}{\sqrt{L_g \cdot W_{\rm eff}}} \tag{3}$$

$$\sigma V_{\rm T} = \frac{A_{\rm VT}}{\sqrt{2 \cdot L_g \cdot W_{\rm eff}}} \tag{4}$$

where  $A_{VT}$  is the slope of the Pelgrom plot [21], [23]. Some recent measurements suggest an  $A_{VT}$  value for scaled FinFETs at about 1 mV ·  $\mu$ m [9], [23]–[26].
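To give a sense of magnitude, (4) can be evaluated with the device dimensions quoted earlier and $A_{VT} = 1$ mV · μm. The sketch below assumes the common FinFET approximation $W_{\rm eff} = N_{\rm Fin}(2H_{\rm Fin} + T_{\rm Fin})$, which is not stated in the paper; the function names are illustrative.

```python
import math

A_VT = 1e-3        # Pelgrom slope [V * um]
L_G_UM = 0.011     # gate length [um]
H_FIN_UM = 0.018   # fin height [um]
T_FIN_UM = 0.0065  # fin thickness [um]


def effective_width_um(n_fins=1):
    """Assumed tri-gate effective width: n_fins * (2 * H_fin + T_fin)."""
    return n_fins * (2.0 * H_FIN_UM + T_FIN_UM)


def sigma_vt(n_fins=1):
    """Threshold-voltage standard deviation [V] from Pelgrom's rule (4)."""
    return A_VT / math.sqrt(2.0 * L_G_UM * effective_width_um(n_fins))


for n_fins in (1, 3, 243):
    print(f"{n_fins:3d} fin(s): sigma_VT = {sigma_vt(n_fins) * 1e3:.2f} mV")
```

Under these assumptions a single-fin device shows a $\sigma V_T$ of roughly 33 mV, and the value falls as the inverse square root of the fin count, so the large final stages of a tapered buffer are far less affected than the minimum-sized first stage.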

Fig. 5 shows histograms of  $\tau_{AB}/T_F$  variations for a segment of different tapered buffer sizes (which are shown in Fig. 4) under interconnect and device process variations. Process variations are studied using Monte Carlo simulations of HSPICE with 1000 iterations.

Fig. 6 shows how the voltage propagates along a 40-mm global wire. A long global interconnect has been chosen [27]. The wire is divided into three segments with buffers tapered by h = 3 and n = 6. Probing the voltages helps to show how CaSoL is made possible by proper buffer insertion. As shown, the voltage at the end of each segment jumps at the time of flight ($T_F$), and the propagation delay inside each chain of tapered buffers is a small fraction of $T_F$. Implementing a repeater with many fins might add parasitic capacitances, owing to layout constraints, that are not captured by the models we used. We investigated this effect by adding some adjustments to our model, and the results show that it does not cause a significant change in delay or energy. The histograms of $\tau_{AB}/T_F$ variations due to interconnect process variations and device process variations are shown in Fig. 7.

The older model in [28]–[30] minimizes the delay of a TL by

$$k_{\text{opt}} = \sqrt{\frac{0.56\, rL\, cL}{h\, n_{\text{opt}} R_0 C_0}}, \quad n_{\text{opt}} = \frac{\ln\left(1 + \frac{cL}{k_{\text{opt}} C_0}\right)}{\ln(h)} \tag{5}$$

where *L* is the TL length, $k_{opt}$ is the optimum number of segments, and $n_{opt}$ is the optimum number of stages of a tapered buffer. The optimum value of *h* is equal to *e*. For a 40-mm global wire, (5) results in $k_{opt} = 8$ with buffers tapered by $h \approx 3$ and $n_{opt} = 8$, which increases the total delay from 1.3 $T_{\rm F}$ to 1.8 $T_{\rm F}$. But the delay is not the main point. In fact, the old buffer insertion technique results in a large overshoot at the terminal point, as shown in Fig. 8, due to choosing a nonoptimal length. The buffer area of the conventional *rc*-based design is 24 times larger than that of the CaSoL design. Hence, the CaSoL algorithm saves silicon area too.

Fig. 6. (a) A 40-mm global wire divided into three segments with buffers tapered by h = 3 and n = 6. (b) Propagation of voltage along a 40-mm global wire with CaSoL buffer insertion. The propagation from $i_1$ to $o_1$ (over the wire) and per segment (from $o_1$ to $o_2$) is "at the speed of light," and hence, the technique is called CaSoL.

Fig. 7. (a) Histogram of $\tau_{AB}/T_F$ due to the interconnect process variation for a 40-mm global wire with the CaSoL buffer insertion shown in Fig. 6 ($\tau_{AB_{max}}/T_F = 1.4721$, $\tau_{AB_{min}}/T_F = 1.3378$, and $3\sigma = 0.017429$). (b) Histogram of $\tau_{AB}/T_F$ due to the device process variation for a 40-mm global wire with the CaSoL buffer insertion shown in Fig. 6 ($\tau_{AB_{max}}/T_F = 1.4349$, $\tau_{AB_{min}}/T_F = 1.3109$, and $3\sigma = 0.054266$).
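The two expressions in (5) depend on each other, so they have to be solved together. The sketch below does this by simple fixed-point iteration for the 40-mm wire and then compares buffer areas. It reads the numerator of the first expression as $0.56 \cdot rL \cdot cL$, uses $R_0 \approx 7$ kΩ and $C_0 \approx 60$ aF, rounds the results to the nearest integers, and assumes the chain area scales as $1 + h + \dots + h^{n-1}$; all of these are assumptions of this illustration rather than details given in the paper.

```python
import math

R_PER_M = 9.4e3   # [ohm/m]
C_PER_M = 84e-12  # [F/m]
R0 = 7e3          # minimum-buffer driver resistance [ohm]
C0 = 60e-18       # minimum-buffer input capacitance [F]


def conventional_design(length, h, iterations=100):
    """Solve the coupled expressions in (5) by fixed-point iteration."""
    n = 1.0
    k = 1.0
    for _ in range(iterations):
        k = math.sqrt(0.56 * (R_PER_M * length) * (C_PER_M * length)
                      / (h * n * R0 * C0))
        n = math.log(1.0 + C_PER_M * length / (k * C0)) / math.log(h)
    return k, n


def chain_area(h, n):
    """Relative area of one n-stage chain tapered by h (assumed geometric sum)."""
    return sum(h ** i for i in range(n))


k_opt, n_opt = conventional_design(40e-3, h=3)
k_int, n_int = round(k_opt), round(n_opt)

# Conventional design: k_int chains of n_int stages.
# CaSoL design in the text: 3 chains of 6 stages, both tapered by h = 3.
area_ratio = (k_int * chain_area(3, n_int)) / (3 * chain_area(3, 6))
print(f"k_opt ~ {k_opt:.2f}, n_opt ~ {n_opt:.2f}, area ratio ~ {area_ratio:.1f}")
```

With h = 3 the iteration settles at about eight segments and eight stages, and the resulting area ratio is close to the factor of 24 quoted in the text.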



Fig. 8. (a) A 40-mm global wire divided into eight segments with the older standard tapered buffer insertion technique. (b) Step response of a system optimized as in [28]–[30] (older standard tapered buffer insertion technique based on the *rc* model for wires). The voltage at the far end suffers from an unreasonably large overshoot.



Fig. 9. (a) 40-mm global wire is divided into three segments with tapered buffers by h = 4 and n = 6 with F = 1.7 at 22-nm CMOS. (b) Propagation of voltage along a 40-mm global wire with CaSoL buffer insertion at 22-nm CMOS (compared to 7-nm FinFET shown in Fig. 6). Total delay will be 1.7183  $T_{\rm F}$ . Slower rise time is due to the smaller drivability for CMOS compared to that of FinFET.

The CaSoL algorithm was applied to a 22-nm CMOS process; the results are reported in Table V. $W_{\text{max}}$ is the maximum width of a buffer in a repeater chain, and $W_0$ is the width of the minimum-sized buffer. Although the input gate capacitances are as small as those of the 7-nm FinFET, at best, $\tau_{AB} = 1.7193 T_F$ can be achieved by h = 3 and n = 6. This is because of the limited drivability of conventional CMOS compared to that of FinFET. Voltage propagation along a 40-mm wire in this technology is shown in Fig. 9.

 TABLE V

 REPETITION OF TABLE II FOR 22-NM CMOS PROCESS. F IS THE RATIO OF WIDTH OVER LENGTH OF PMOS TO THAT OF NMOS

| h | n | F   | $x_{opt}$ [mm] | $T_F$ [psec] | $\tau_{AB}/T_F$ | $W_{max}/W_0$ | $A/A_0$ | $E_S$ [fJ] | $E_D$ [fJ] | $E_{D-line}$ [%] |
|---|---|-----|----------------|--------------|-----------------|---------------|---------|------------|------------|------------------|
| 5 | 4 | 3.1 | 9.7            | 47.95        | 2.0161          | 125           | 156     | 4.55       | 694.29     | 94.4             |
| 4 | 5 | 3.2 | 11.3           | 55.86        | 1.7131          | 256           | 341     | 8.95       | 828.16     | 91.5             |
| 3 | 6 | 3.0 | 12.2           | 60.31        | 1.7193          | 243           | 364     | 10.15      | 920.20     | 88.9             |
| 3 | 7 | 2.8 | 15.0           | 74.16        | 1.6477          | 729           | 1093    | 26.12      | 1152.8     | 86.0             |

Fig. 10. (a) Proposed global wiring system, which eliminates the crosstalk-induced delay for a CaSoL system (Table VI). Worst and best cases refer to the out-of-phase and in-phase signal patterns, respectively, as shown in the figure. The per-unit-length parameters of such a bus are $r = 11 \text{ k}\Omega/\text{m}$, l = 150 nH/m, $l_m = 11.5 \text{ nH/m}$, $c_g = 140 \text{ pF/m}$, and $c_m = 12 \text{ pF/m}$, where *l* is the self-inductance, *l*<sub>m</sub> is the mutual inductance of adjacent wires, $c_g$ is the ground capacitance, and $c_m$ is the mutual capacitance of adjacent wires. (b) Step response of the structure shown in (a) for the two cases. It shows how reducing the aspect ratio guarantees a transition at the time of flight in the worst case, while no overshoot is seen in the best case.

We have also shown that by reducing the wire's aspect ratio to one-half, the crosstalk noise is canceled. That means CaSoL is guaranteed for out-of-phase switching, and there is no overshoot for in-phase signals.

The structure of the wires, voltage plots for the in- and out-of-phase cases, and the results are summarized in Fig. 10 and Table VI. As CaSoL is also immune to process variations, we propose adding such a layer to the top metal level for low-swing NoC/global on-chip communication. Such a system could surpass any on-chip optical interconnect system in terms of delay and power.

 TABLE VI

Results for Optimal Design for Different Values of h and n. $x_{OPT}$ Is the Optimal Distance Between Repeaters as Shown in Fig. 1(b) for a Bus of Five Global Wires With Wire Pitch of 4 μm and Aspect Ratio of 1/2 as Shown in Fig. 10(a). Here Worst Case and Best Case Refer to Out-of-Phase and In-Phase Signaling of Adjacent Wires, Respectively [As Shown in Fig. 10(a)]

| h | n | $x_{opt}$ [mm] | $T_F$ [psec] | $\tau_{AB}/T_F$ (worst case) | $\tau_{AB}/T_F$ (best case) |
|---|---|----------------|--------------|------------------------------|-----------------------------|
| 4 | 5 | 5.0            | 24.72        | 1.6981                       | 1.6021                      |
| 5 | 5 | 7.0            | 34.60        | 1.6592                       | 1.5554                      |
| 3 | 6 | 5.0            | 24.72        | 1.7330                       | 1.6263                      |
| 4 | 6 | 7.5            | 37.08        | 1.6377                       | 1.5356                      |
| 5 | 6 | 7.5            | 37.08        | 1.6934                       | 1.6058                      |
| 3 | 7 | 7.0            | 34.60        | 1.6272                       | 1.5362                      |
| 4 | 7 | 8.0            | 39.55        | 1.6848                       | 1.5830                      |

#### III. CONCLUSION

A new paradigm for the design of global on-chip wires is proposed. As an example, a three-step algorithm for buffer insertion for CaSoL over global on-chip wires is presented. *Step I* verifies the possibility of CaSoL over a wire with a capacitive load: if $C_L \ll T_F/Z_0$, then CaSoL is possible. Because $T_F$ (set by the chip size) and $Z_0$ are almost constant for future technologies, scaling (which reduces $C_L$) makes CaSoL possible. *Step II* finds the structure of the tapered buffer. We have shown that the effective driving resistance of the buffer should be less than $3Z_0$. Simply by using a larger buffer, the crosstalk noise can be overcome. *Step III* completes the design algorithm by finding the optimal distance between buffers. As the time of flight dominates the delay, we expect the delay to be nearly independent of interconnect and device process variations.

The simulations are in accordance with our expectations. The CaSoL buffer insertion technique pushes the global wire to its physical limits and yields a robust design. Hence, it can be used for low-swing signaling to overcome the power dissipation obstacle of next-generation GSI chips.

#### REFERENCES

- [1] J. A. Davis *et al.*, "Interconnect limits on gigascale integration (GSI) in the 21st century," *Proc. IEEE*, vol. 89, no. 3, pp. 305–324, Mar. 2001. doi: 10.1109/5.915376.
- [2] K. Banerjee and A. Mehrotra, "A power-optimal repeater insertion methodology for global interconnects in nanometer designs," *IEEE Trans. Electron Devices*, vol. 49, no. 11, pp. 2001–2007, Nov. 2002. doi: 10.1109/TED.2003.820651.
- [3] (2005) International Technology Roadmap for Semiconductors (ITRS). [Online]. Available: http://www.itrs2.net/
- [4] H. B. Bakoglu and J. D. Meindl, "Optimal interconnection circuits for VLSI," *IEEE Trans. Electron Devices*, vol. 32, no. 5, pp. 903–909, May 1985. doi: 10.1109/T-ED.1985.22046.

- [5] V. Adler and E. G. Friedman, "Repeater design to reduce delay and power in resistive interconnect," *IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process.*, vol. 45, no. 5, pp. 607–616, May 1998. doi: 10.1109/82.673643.
- [6] A. Nalamalpu and W. Burleson, "A practical approach to DSM repeater insertion: Satisfying delay constraints while minimizing area and power," in *Proc. IEEE 14th Annu. Int. ASIC/SOC Conf.*, Sep. 2001, pp. 152–156. doi: 10.1109/ASIC.2001.954689.
- [7] G. Chen and E. G. Friedman, "Low-power repeaters driving RC and RLC interconnects with delay and bandwidth constraints," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 14, no. 2, pp. 161–172, Feb. 2006. doi: 10.1109/TVLSI.2005.863750.
- [8] G. G. Lopez, "The impact of interconnect process variations and size effects for gigascale integration," Ph.D. dissertation, School ECE, Georgia Inst. Technol., Atlanta, GA, USA, 2009. [Online]. Available: http://hdl.handle.net/1853/31781
- [9] T. Huynh-Bao et al., "Statistical timing analysis considering device and interconnect variability for BEOL requirements in the 5-nm node and beyond," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 25, no. 5, pp. 1669–1680, May 2017. doi: 10.1109/TVLSI.2017.2647853.
- [10] H. Kitada et al., "The influence of the size effect of copper interconnects on RC delay variability beyond 45 nm technology," in *Proc. IEEE Int. Interconnect Technol. Conf.*, Jun. 2007, pp. 10–12. doi: 10.1109/IITC.2007.382333.
- [11] L. P. Carloni, P. Pande, and Y. Xie, "Networks-on-chip in emerging interconnect paradigms: Advantages and challenges," in *Proc. 3rd ACM/IEEE Int. Symp. Netw.-Chip (NOCS)*, 2009, pp. 93–102. doi: 10.1109/NOCS.2009.5071456.
- [12] Y. I. Ismail and E. G. Friedman, "Effects of inductance on the propagation delay and repeater insertion in VLSI circuits: A summary," *IEEE Circuits Syst. Mag.*, vol. 3, no. 1, pp. 24–28, Sep. 2003. doi: 10.1109/MCAS.2003.1228505.
- [13] C. Pan and A. Naeemi, "A paradigm shift in local interconnect technology design in the era of nanoscale multigate and gate-all-around devices," *IEEE Electron Device Lett.*, vol. 36, no. 3, pp. 274–276, Mar. 2015. doi: 10.1109/LED.2015.2394366.
- [14] Predictive Technology Model. Accessed: Feb. 1, 2018. [Online]. Available: http://ptm.asu.edu/
- [15] J. A. Davis and J. D. Meindl, "Compact distributed RLC interconnect models. I. Single line transient, time delay, and overshoot expressions," *IEEE Trans. Electron Devices*, vol. 47, no. 11, pp. 2068–2077, Nov. 2000. doi: 10.1109/16.877168.
- [16] R. Sarvari, "Impact of size effects and anomalous skin effect on metallic wires as GSI interconnects," Ph.D. dissertation, School ECE, Georgia Inst. Technol., Atlanta, GA, USA, 2008. [Online]. Available: http://hdl.handle.net/1853/31636

- [17] G. Lopez, J. Davis, and J. Meindl, "A new physical model and experimental measurements of copper interconnect resistivity considering size effects and line-edge roughness (LER)," in *Proc. IEEE IITC*, Jun. 2009, pp. 231–234. doi: 10.1109/IITC.2009.5090396.
- [18] G. Lopez, R. Murali, R. Sarvari, K. Bowman, J. Davis, and J. Meindl, "The impact of size effects and copper interconnect process variations on the maximum critical path delay of single and multi-core microprocessors," in *Proc. IEEE Int. Interconnect Technol. Conf.*, no. 404, Jun. 2007, pp. 40–42. doi: 10.1109/IITC.2007.382346.
- [19] V. B. Kleeberger, H. Graeb, and U. Schlichtmann, "Predicting future product performance: Modeling and evaluation of standard cells in FinFET technologies," in *Proc. DAC*, 2013, Art. no. 33. doi: 10.1145/2463209.2488775.
- [20] C. Shin *et al.*, "Random dopant fluctuation-induced threshold voltage variation-immune Ge FinFET with metal-interlayer-semiconductor source/drain," *IEEE Trans. Electron Devices*, vol. 63, no. 11, pp. 4167–4172, Nov. 2016. doi: 10.1109/TED.2016.2606511.
- [21] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," *IEEE J. Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, Oct. 1989. doi: 10.1109/JSSC.1989.572629.
- [22] D. D. Lu, C.-H. Lin, A. M. Niknejad, and C. Hu, "Compact modeling of variation in FinFET SRAM cells," *IEEE Des. Test Comput.*, vol. 27, no. 2, pp. 44–50, Mar./Apr. 2010. doi: 10.1109/MDT.2010.39.
- [23] K. J. Kuhn *et al.*, "Process technology variation," *IEEE Trans. Electron Devices*, vol. 58, no. 8, pp. 2197–2208, Aug. 2011. doi: 10.1109/TED.2011.2121913.
- [24] Y. X. Liu et al., "On the gate-stack origin threshold voltage variability in scaled FinFETs and Multi-FinFETs," in Proc. Symp. VLSI Technol. Tech. Dig., Jun. 2010, pp. 101–102. doi: 10.1109/VLSIT.2010.5556187.
- [25] Q. Zhang *et al.*, "Experimental study of gate-first FinFET threshold-voltage mismatch," *IEEE Trans. Electron Devices*, vol. 61, no. 2, pp. 643–646, Feb. 2014. doi: 10.1109/TED.2013.2295715.
- [26] C.-H. Lin *et al.*, "Channel doping impact on FinFETs for 22 nm and beyond," in *Proc. Symp. VLSI Technol. Tech. Dig.*, Jun. 2012, pp. 15–16. doi: 10.1109/VLSIT.2012.6242438.
- [27] A. Naeemi, R. Venkatesan, and J. D. Meindl, "Optimal global interconnects for GSI," *IEEE Trans. Electron Devices*, vol. 50, no. 4, pp. 980–987, Apr. 2003. doi: 10.1109/TED.2003.812104.
- [28] A. Alizadeh and R. Sarvari, "On temperature dependency of delay for local, intermediate, and repeater inserted global copper interconnects," *IEEE Trans. Very Large Scale Integr. (VLSI) Syst.*, vol. 23, no. 12, pp. 3143–3147, Dec. 2015. doi: 10.1109/TVLSI.2014.2379954.
- [29] S. Dhar and M. A. Franklin, "Optimum buffer circuits for driving long uniform lines," *IEEE J. Solid-State Circuits*, vol. 26, no. 1, pp. 32–40, Jan. 1991. doi: 10.1109/4.65707.
- [30] B. S. Cherkauer and E. G. Friedman, "Design of tapered buffers with local interconnect capacitance," *IEEE J. Solid-State Circuits*, vol. 30, no. 2, pp. 151–155, Feb. 1995. doi: 10.1109/4.341744.