# A 2.5-Gbps De-Skew Chip for Very Short Reach (VSR) Interconnects

Wei Tang, Student Member, IEEE, and David V. Plant, Senior Member, IEEE, Fellow, OSA

Dept. of Electrical and Computer Engineering, McGill University, 3480 University St., Montreal, Quebec, Canada, H3A 2A7 (wei.tang, david.plant)@mcgill.ca

### I. INTRODUCTION

Parallel optical interconnects have drawn considerable attention over the past several years. As the cost of optical transceivers keeps decreasing, parallel optical interconnects are gradually replacing traditional copper wire connections in high-speed digital systems[1][2]. However, because of unavoidable transceiver latencies and mismatched transmission distances, the data bits received on the parallel channels are not always aligned in time; that is, they are skewed. For parallel data processing, the data has to be synchronized, or de-skewed.

The method used in this work to de-skew the received data is simple and effective[3]. As shown in Fig. 1, the clock is recovered by a phase-locked loop (PLL) from a master channel, channel 1 for example. The other n channels are retimed by the recovered clock through a group of latches. At the output, all the channel outputs are de-skewed.
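The retiming idea can be illustrated with a small behavioral model. The sketch below is not the authors' circuit: it assumes ideal edges (zero rise and fall times), a sampling instant at the center of each master-channel bit, and hypothetical function names and skew values chosen only for illustration.

```python
import math

def sampled_bit_index(sample_time_ps, skew_ps, bit_period_ps):
    """Index of the bit present on a channel at the sampling instant.

    The channel is modeled as the master data stream delayed by skew_ps
    (assumption: ideal, instantaneous edges).
    """
    return math.floor((sample_time_ps - skew_ps) / bit_period_ps)

def retime(channel_skews_ps, bit_period_ps, n_bits):
    """Sample every channel with one shared clock.

    The clock edge for cycle k is placed at the center of master bit k.
    Returns, per cycle, the bit index captured on each channel.
    """
    rows = []
    for k in range(n_bits):
        t = (k + 0.5) * bit_period_ps
        rows.append([sampled_bit_index(t, s, bit_period_ps)
                     for s in channel_skews_ps])
    return rows

if __name__ == "__main__":
    # Hypothetical skews relative to the master channel, in ps.
    for row in retime([0.0, 100.0, -150.0], 400.0, 4):
        print(row)  # every channel captures the same bit index per cycle
```

In this idealized model, any skew smaller than half a bit period leaves all latches holding the same bit index, while a larger skew produces a one-bit offset between channels.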



### II. THEORY

We use a half-rate clock and data recovery (CDR) circuit to recover the clock[4]. A distinctive advantage of the half-rate CDR is that the rising and falling edges of the recovered clock are automatically aligned to the center of the data bits of the master channel for optimum sampling. In addition, because the voltage-controlled oscillator (VCO) runs at only half of the data rate, the stringent requirements on designing a high-performance oscillator are relaxed. Data is sampled at both the rising and falling edges of the clock signal. A 2-to-1 multiplexer is used at the output, which serves the functions of both the decision circuit and the latch.
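As a purely behavioral illustration of sampling on both clock edges, the sketch below splits a full-rate bit stream into two half-rate streams and recombines them with a 2-to-1 selection. It is a simplified model, not the circuit in [4]; the function name and the assignment of even bit slots to the rising edge are assumptions made for the example.

```python
def half_rate_retime(bits):
    """Model half-rate sampling followed by a 2-to-1 multiplexer.

    Assumption: even bit slots are captured on rising clock edges and
    odd bit slots on falling edges.
    """
    rising = bits[0::2]   # bits latched on rising edges
    falling = bits[1::2]  # bits latched on falling edges
    out = []
    for slot in range(len(bits)):
        # The multiplexer alternates between the two latches each half cycle.
        source = rising if slot % 2 == 0 else falling
        out.append(source[slot // 2])
    return rising, falling, out

if __name__ == "__main__":
    data = [1, 0, 0, 1, 1, 1, 0]
    rising, falling, out = half_rate_retime(data)
    print(rising, falling, out)
```

Each latch only has to toggle at half the line rate, yet the multiplexed output reproduces the full-rate sequence.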

As shown in Fig. 2, a typical timing diagram is used to study the limit of this de-skew scheme.  $\pm D$  is the maximum channel skew that the de-skew method can correct. We can derive the relationship of the channel skew D in terms of T,  $t_r$ , and  $t_f$ . The relation is given in equation (1),

$$D = \frac{1}{2} \times \left( T - \frac{t_r + t_f}{2} \right) \tag{1}$$

where T is the bit width, and $t_r$ and $t_f$ are the rising and falling times. The details of D as a function of data rate and rising/falling time are shown in Fig. 3. Typical values of D at several specific line rates are listed in Table I. The maximum allowed channel skew is directly related to the rising and falling times.

Fig. 2. The timing diagram.

Fig. 3. Maximum allowed skew D vs. data rate.

TABLE I. MAXIMUM ALLOWED CHANNEL SKEW (D) AT SEVERAL SPECIFIC DATA RATES

| Speed | 1.25Gbps | 2Gbps   | 2.5Gbps |
|-------|----------|---------|---------|
| a.    | 385ps    | 235ps   | 185ps   |
| b.    | 332.5ps  | 182.5ps | 132.5ps |

a: when $t_r = t_f = 30$ ps; b: when $t_r = 140$ ps, $t_f = 130$ ps.
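The entries of Table I can be checked numerically. The sketch below evaluates equation (1), read as $D = \frac{1}{2}\left(T - \frac{t_r + t_f}{2}\right)$ with $T$ taken as the reciprocal of the data rate; the function name is hypothetical and the script is only a sanity check on the tabulated values.

```python
def max_skew_ps(data_rate_gbps, t_rise_ps, t_fall_ps):
    """Maximum correctable skew D in ps, following equation (1).

    The bit width T is assumed to be 1 / data rate (1000 / Gbps in ps).
    """
    bit_period_ps = 1000.0 / data_rate_gbps
    return 0.5 * (bit_period_ps - (t_rise_ps + t_fall_ps) / 2.0)

if __name__ == "__main__":
    for rate in (1.25, 2.0, 2.5):
        case_a = max_skew_ps(rate, 30.0, 30.0)    # t_r = t_f = 30 ps
        case_b = max_skew_ps(rate, 140.0, 130.0)  # t_r = 140 ps, t_f = 130 ps
        print(f"{rate} Gbps: a = {case_a} ps, b = {case_b} ps")
```

For example, at 2.5Gbps the bit width is 400 ps, so case a gives $\frac{1}{2}(400 - 30) = 185$ ps and case b gives $\frac{1}{2}(400 - 135) = 132.5$ ps.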

### III. EXPERIMENTAL RESULTS

The chip is designed and fabricated in the IBM CMRF8SF-DM 0.13µm CMOS process. For testing purposes, a single data input is split into 3 channels (Ch 0 to Ch 2). These 3 channels go through a skew generation block and pass through different combinations of delay blocks. The delays among the 3 channels can be adjusted through an external differential control voltage.

The phase relationship among the 3 channels is described in Fig. 4, where $t_{d1}$ is the delay between Ch 0 and Ch 1; $t_{d2}$ is the delay between Ch 1 and Ch 2; $f_{d1}$ and $f_{d2}$ are the fixed (constant) delays between channels; and $t_{delay}$ is the tunable delay of the delay cell, which is continuously adjustable over a range of 44ps.



Fig. 4. Phase relationship among the 3 channels.

The measured $f_{d1}$ and $f_{d2}$ values are 140.8ps and 84.8ps respectively. The 10% to 90% rising and falling times are 140ps and 130ps respectively. At 2.5Gbps, we can correct a skew of at most 132.5ps (Table I). We therefore find the following relationship.

$$f_{d2} < 132.5 \, \text{ps} < f_{d1} < 182.5 \, \text{ps} \tag{2}$$

Therefore, we can only de-skew Ch 1 and Ch 2 at 2.5Gbps. To successfully de-skew all 3 channels, the data rate has to be decreased. This is confirmed by the measurements below.
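The comparison above can be reproduced with a few lines of code. The sketch below is a rough check that applies equation (1) to the measured fixed delays; it considers only $f_{d1}$ and $f_{d2}$ and ignores the tunable delay $t_{delay}$, jitter, and other non-idealities, and its function and variable names are hypothetical.

```python
def max_skew_ps(data_rate_gbps, t_rise_ps=140.0, t_fall_ps=130.0):
    """Maximum correctable skew D in ps from equation (1).

    Defaults are the measured 10%-90% rising and falling times.
    """
    bit_period_ps = 1000.0 / data_rate_gbps
    return 0.5 * (bit_period_ps - (t_rise_ps + t_fall_ps) / 2.0)

# Measured fixed delays between channels, in ps.
FIXED_DELAYS_PS = {"f_d1": 140.8, "f_d2": 84.8}

def correctable(data_rate_gbps):
    """Report which fixed delays fall inside the de-skew range."""
    limit = max_skew_ps(data_rate_gbps)
    return {name: delay < limit for name, delay in FIXED_DELAYS_PS.items()}

if __name__ == "__main__":
    for rate in (2.32, 2.5):
        print(rate, round(max_skew_ps(rate), 1), correctable(rate))
```

Under these assumptions, $f_{d1}$ exceeds the limit at 2.5Gbps but falls within it at 2.32Gbps, while $f_{d2}$ is within the limit at both rates.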

We tested the de-skew function at two data rates, 2.32Gbps and 2.5Gbps. We found that 2.32Gbps is the upper limit for de-skewing all 3 channels, a limit set by $f_{d1}$. At 2.5Gbps, only Ch 1 and Ch 2 can be successfully de-skewed. The data on Ch 0 is still recovered correctly, but it is one bit ahead of Ch 1 and Ch 2 since $f_{d1} > 132.5$ ps.



Fig. 5. The recovered clock and data eye diagram for PRBS $2^{31}-1$ NRZ data at 2.5Gbps. (a) Recovered clock; (b) recovered data.

The data eye diagrams on all 3 channels before and after de-skew are shown in Fig. 6 and Fig. 7 for 2.32Gbps and 2.5Gbps respectively. Note that although the CDR circuit is able to lock to PRBS $2^{31}-1$ NRZ data (Fig. 5), the input data pattern is Manchester encoded for two purposes:

- 1) It is easier to visualize the phase differences between channels with an encoded data pattern than with raw PRBS data;
- 2) Manchester encoding is convenient for testing the bit error rate (BER) with the recovered half-rate clock.
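For readers unfamiliar with Manchester encoding, a minimal sketch follows. The source does not state which polarity convention was used in the measurements; the mapping chosen here (a 1 encoded as low-to-high, a 0 as high-to-low) is one of the two common conventions and is an assumption, as are the function names.

```python
def manchester_encode(bits, one=(0, 1)):
    """Encode each data bit as two half-bit line symbols.

    Assumed convention: 1 -> (0, 1) low-to-high, 0 -> (1, 0) high-to-low.
    Pass one=(1, 0) for the opposite convention.
    """
    zero = (one[1], one[0])
    symbols = []
    for bit in bits:
        symbols.extend(one if bit else zero)
    return symbols

def manchester_decode(symbols, one=(0, 1)):
    """Invert manchester_encode; reject pairs without a mid-bit transition."""
    if len(symbols) % 2 != 0:
        raise ValueError("symbol stream must contain whole bit pairs")
    zero = (one[1], one[0])
    bits = []
    for i in range(0, len(symbols), 2):
        pair = (symbols[i], symbols[i + 1])
        if pair == one:
            bits.append(1)
        elif pair == zero:
            bits.append(0)
        else:
            raise ValueError(f"invalid Manchester pair at symbol {i}: {pair}")
    return bits

if __name__ == "__main__":
    data = [1, 0, 1, 1]
    line = manchester_encode(data)
    print(line)
    print(manchester_decode(line))
```

Because every encoded bit contains a transition at its midpoint, the line signal never stays at one level for more than two half-bit intervals, which makes a phase offset between channels easy to see on an oscilloscope.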

From Fig. 6 we can see that at 2.32Gbps, all 3 channels are successfully de-skewed. The maximum skew among the channels is about 140ps. The BERs measured with the recovered clock on the 3 channels are all less than $1 \times 10^{-11}$, which satisfies SONET specifications.

At 2.5Gbps, $f_{d1}$ is too large for the de-skew block to correct, so Ch 0 and Ch 1 are still skewed at the output. The phase difference between them is one bit, because the output MUX samples Ch 0 and Ch 1 in adjacent bit cycles. $f_{d2}$ is still within the de-skew range, and Ch 2 is de-skewed successfully, as shown in Fig. 7(b).



Fig. 6. Received data before and after de-skew at 2.32Gbps. (c) Ch0 & Ch1 after de-skew; (d) Ch1 & Ch2 after de-skew.



Fig. 7. Received data after de-skew at 2.5Gbps.

### IV. CONCLUSION

We designed and implemented a de-skew chip for parallel optical interconnects. The de-skew ability is limited by the line rate and the rising/falling times. We modeled and tested the upper bound of the channel skew that can be corrected with this method. The method proves to be simple and effective at relatively low line rates of around 2.5Gbps. For higher line rates, a per-pin de-skew scheme is necessary to successfully de-skew all channels when the channel skew is excessive.

### V. ACKNOWLEDGEMENT

This work is supported by NSERC and by industrial and government partners through the Agile All-Photonics Network. The authors gratefully acknowledge A. Li, J.-P. Thibodeau, and J. Faucher.

### VI. REFERENCES

- [1] L.A.B. Windover, "Parallel-optical interconnects and their applications," Technical Digest of Optical Fiber Communication Conference, OFC 2005, vol. 3, Mar. 2005.
- [2] D. Kuchta, "100 Gb/s-class parallel optical interconnects for high productivity computing systems," Proceedings of Lasers and Electro-Optics Society Annual Meeting, LEOS 2005, pp. 583-584, Oct. 2005.
- [3] K. Kim, et al., "Design of 250 Mb/s 10-channel CMOS optical receiver array for computer communication," The First IEEE Asia Pacific Conference on ASICs, AP-ASIC '99, pp. 29-32, Aug. 1999.
- [4] J. Savoj and B. Razavi, "A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase detector," IEEE J. Solid-State Circuits, vol. 36, no. 5, pp. 761-768, May 2001.