# Low-Power Hybrid Structure of Digital Matched Filters for Direct Sequence Spread Spectrum Systems

Sung-Won Lee and In-Cheol Park

Department of Electrical Engineering and Computer Science, KAIST
373-1 Guseong-Dong Yuseong-Gu, Daejeon 305-701, Republic of Korea
Email: swlee@ieee.org, icpark@ee.kaist.ac.kr

Abstract -- This paper presents a low-power structure of digital matched filters (DMFs) for direct sequence spread spectrum systems. Traditionally, low-power approaches for DMFs are based on either the transposed-form structure or the direct-form one. A new hybrid structure that employs the direct-form structure for local addition and the transposed-form structure for global addition is used to take advantage of both structures. For a 128-tap DMF, the proposed DMF that processes 32 addends a cycle consumes 46 % less power at the expense of 6 % area overhead as compared to the state-of-the-art low-power DMF [7].

#### I. Introduction

Direct sequence spread spectrum (DSSS) systems are the most popular wireless communication systems, and are being used for WLAN, IS-95, and WCDMA because of their low power spectral density and good resistance to multipath fading [1][2]. In DSSS systems, matched filters (MFs) are widely employed for code acquisition, as they provide fast synchronization as well as reliable operation. Other techniques such as non-correlation methods [3] or threshold decoding methods [4] do not provide competitive performance in real environments.

Growing demand for portable, battery-powered systems necessitates low-power MFs to reduce the power consumption of code acquisition circuits. Low-power MFs can be implemented by using either analog or digital technology. For short and fast MFs, analog implementation is more power efficient than its digital counterpart [5].
As CMOS technology advances rapidly and the reference code length in modern wireless communication systems increases, significant attention has focused on low-power digital matched filters (DMFs) [6-8].

Traditional low-power approaches for DMFs are based on either the transposed-form structure or the direct-form structure. Because of its high speed, low-power techniques such as the differential coefficient scheme were first developed for the transposed-form structure [6]. Although these schemes save considerable computation power, it is hard to reduce the large power consumed by memory elements that switch every cycle. As a result, the direct-form structure was later considered as an alternative. A register file that stores the input replaces the tapped delay line, resulting in a large saving of switching power. To further reduce power by minimizing the number of additions, a prefilter [7] is employed. Pipelined and parallel additions [7][8] are also used to compensate for the speed penalty induced by the direct-form structure.

Because the low-power approaches have been developed separately for the two structures, each of which covers only one of two extreme cases (one addition a cycle or all additions a cycle), there is room to reduce power further by exploring intermediate structures that combine the two. To take advantage of both direct-form and transposed-form structures, we propose in this paper a low-power DMF based on a new hybrid structure. Compared to the state-of-the-art low-power scheme [7], power is reduced by 46 % at an area overhead of 6 %.

#### II. LOW-POWER DIGITAL MATCHED FILTER

An *n*-tap direct-form DMF is expressed as

$$y[k] = \sum_{i=0}^{n-1} c_i x[k-i] \tag{1}$$

where $c_i$, $0 \le i < n$, denotes the filter coefficients and $x[i]$, $i \ge 0$, denotes the received signal.
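
As a minimal software sketch of (1), not part of the original hardware design, the direct-form sum can be evaluated sample by sample as below. The function name `direct_form_dmf`, the example coefficients, and the example samples are hypothetical, and samples before the start of the stream are assumed to be zero.

```python
def direct_form_dmf(c, x):
    """Evaluate y[k] = sum_{i=0}^{n-1} c[i] * x[k-i] for every k.

    Assumption of this sketch: x[k-i] is treated as 0 when k - i < 0.
    """
    n = len(c)
    y = []
    for k in range(len(x)):
        acc = 0
        # All n addends are accumulated for one output (the direct-form case).
        for i in range(n):
            if k - i >= 0:
                acc += c[i] * x[k - i]
        y.append(acc)
    return y


# Hypothetical 4-tap example with +/-1 coefficients and small integer samples.
coeffs = [1, -1, 1, 1]
samples = [3, -2, 5, 1, -4, 2]
print(direct_form_dmf(coeffs, samples))  # prints [3, -5, 10, -3, -2, 12]
```

Each output requires all *n* products to be summed at once, which is the "n addends a cycle" extreme.
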
The transposed-form structure corresponding to the *n*-tap DMF is expressed as

$$\begin{aligned} y[k] &= v_n[k] \\ v_i[k] &= c_{n-i}\,x[k] + v_{i-1}[k-1], \quad 1 < i \le n \\ v_1[k] &= c_{n-1}\,x[k] \end{aligned} \tag{2}$$

where $v_i[k]$ stores an intermediate result of the output $y[k]$. Based on (1) and (2), the direct-form and the transposed-form implementations of a 4-tap DMF are derived as shown in Figure 1. The transposed-form structure accumulates the output every cycle. As the number of addends per cycle is much smaller than in the direct-form structure, the transposed-form structure can operate at a higher clock frequency. On the other hand, the large fanout and the registers updated every cycle lead to more power consumption. The two structures represent extreme cases, i.e., either one addend a cycle or *n* addends a cycle. A number of addends between one and *n* may produce more power-efficient results. Nevertheless, this issue has not been discussed.

**Figure 1.** Conventional digital matched filter: (a) 4-tap direct-form digital matched filter; (b) 4-tap transposed-form digital matched filter.

## A. Hybrid Structure

To explore the middle ground between the two extreme structures, we modify the transposed-form equation (2) to control the number of addends in a cycle. Let *s* be the summation degree, i.e., the number of addends added in a cycle. Then *n* can be represented as $s \times t + r$ such that $1 < s < n$, $0 < t$, and $0 \le r < s$, where *s*, *t*, and *r* are integers. Equation (2) can be rewritten as

$$\begin{aligned} y[k] &= \begin{cases} \sum_{j=0}^{r-1} c_{j}\,x[k-j] + v_{t}[k-r], & r \neq 0 \\ v_{t}[k], & r = 0 \end{cases} \\ v_{i}[k] &= \sum_{j=0}^{s-1} c_{n-s \times i+j}\,x[k-j] + v_{i-1}[k-s], \quad 1 < i \le t \\ v_{1}[k] &= \sum_{j=0}^{s-1} c_{n-s+j}\,x[k-j] \end{aligned} \tag{3}$$

Based on (3), we propose a new hybrid structure for the DMF. As each $v_i[k]$ is computed by using the direct-form equation, we call this a local direct-form structure.
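
As a sanity check on (2) and (3), the following sketch evaluates the direct-form, transposed-form, and hybrid recursions in software and prints their outputs for comparison. It is only an illustration under stated assumptions: the function names and the test data are hypothetical, all state and all samples before time zero are taken as zero, and no attempt is made to model the register file or adder hardware.

```python
def x_at(x, k):
    """Return x[k], or 0 outside the recorded stream (assumption of this sketch)."""
    return x[k] if 0 <= k < len(x) else 0


def direct_form(c, x):
    """Equation (1): n addends per output."""
    n = len(c)
    return [sum(c[i] * x_at(x, k - i) for i in range(n)) for k in range(len(x))]


def transposed_form(c, x):
    """Equation (2): one addend per register per cycle."""
    n = len(c)
    v = [0] * (n + 1)  # v[1..n]; v[0] stays 0 so v_1 has no carried-in term
    y = []
    for xk in x:
        # Update from i = n down to 1 so v[i-1] still holds the previous cycle.
        for i in range(n, 0, -1):
            v[i] = c[n - i] * xk + v[i - 1]
        y.append(v[n])
    return y


def hybrid_form(c, x, s):
    """Equation (3): s addends summed locally, partial sums passed on globally."""
    n = len(c)
    assert 1 < s < n
    t, r = divmod(n, s)  # n = s * t + r with 0 <= r < s
    length = len(x)
    prev = [0] * length  # plays the role of v_0, identically zero
    for i in range(1, t + 1):
        cur = []
        for k in range(length):
            local = sum(c[n - s * i + j] * x_at(x, k - j) for j in range(s))
            carried = prev[k - s] if k - s >= 0 else 0  # v_{i-1}[k-s]
            cur.append(local + carried)
        prev = cur  # prev now holds v_i
    y = []
    for k in range(length):
        head = sum(c[j] * x_at(x, k - j) for j in range(r))  # empty when r == 0
        tail = prev[k - r] if k - r >= 0 else 0  # v_t[k-r]
        y.append(head + tail)
    return y


# Hypothetical 4-tap example with s = 2.
coeffs = [1, -1, 1, 1]
samples = [3, -2, 5, 1, -4, 2]
print(direct_form(coeffs, samples))
print(transposed_form(coeffs, samples))
print(hybrid_form(coeffs, samples, 2))
```

In this model, choosing `s` close to 1 approaches the transposed-form recursion and choosing `s` close to `n` approaches the direct-form sum.
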
All $v_i[k]$ share the input $x[k-j]$, and thus we call this a global transposed-form structure. For example, a 4-tap DMF based on the hybrid structure with $s=2$ is shown in Figure 2.

**Figure 2.** Proposed hybrid structure of digital matched filter.

Compared to the conventional DMFs, the hybrid structure has several advantages, as can be seen in Figure 2. For an output computation, the addition takes place every *s* cycles, so a register file can replace the power-hungry memory elements and reduce switching activity. In addition, carry-save addition can be accommodated without extra memory elements, and the input fanout is reduced to 1/*s* of its initial value. Although the hybrid DMF is slower than the transposed-form DMF, it is still faster than the direct-form DMF.

### B. Applying the Differential Coefficient Scheme

To reduce the number of computations in a DMF, the differential coefficient scheme was proposed to exploit the run property of PN sequences [6]. By taking the difference between two consecutive filter coefficients as new coefficients, the output function in (1) can be modified as follows

$$\begin{aligned} y[n] &= \{-c_{n-1}x[0] + (c_{n-1} - c_{n-2})x[1] + \cdots + (c_1 - c_0)x[n-1] + c_0x[n]\} + y[n-1] \\ &= \sum_{i=0}^{n} d_i x[n-i] + y[n-1] \end{aligned} \tag{4}$$

where $d_i$, $0 \le i \le n$, is the filter coefficient of the new DMF. From (4), $d_0 = c_0$, $d_i = c_i - c_{i-1}$ for $0 < i < n$, and $d_n = -c_{n-1}$. Figure 3 shows the hybrid structure obtained by applying the differential coefficient scheme to the DMF shown in Figure 2.

**Figure 3.** Hybrid digital matched filter employing differential coefficient scheme.

If the oversampling rate of a DMF is *m*, the number of multiplications and additions is reduced to approximately $1/(2m)$ by applying the differential coefficient scheme [6]. This reduction is very attractive in achieving low power consumption.
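
The coefficient transformation in (4) can be illustrated with a short sketch. This is a hedged software model rather than the circuit of Figure 3: the helper names, the 8-chip pattern, the oversampling rate of 4, and the sample values are all made up for illustration, and the filter is assumed to start from zero state.

```python
def oversample(chips, m):
    """Repeat every chip m times to form the oversampled tap coefficients."""
    return [chip for chip in chips for _ in range(m)]


def differential_coefficients(c):
    """d_0 = c_0, d_i = c_i - c_{i-1} for 0 < i < n, d_n = -c_{n-1}."""
    n = len(c)
    return [c[0]] + [c[i] - c[i - 1] for i in range(1, n)] + [-c[n - 1]]


def direct_form(c, x):
    """Reference output from equation (1), with zero samples before time 0."""
    return [
        sum(c[i] * x[k - i] for i in range(len(c)) if k - i >= 0)
        for k in range(len(x))
    ]


def differential_dmf(c, x):
    """Equation (4): accumulate only the nonzero differential taps onto y[k-1]."""
    d = differential_coefficients(c)
    nonzero = [(i, di) for i, di in enumerate(d) if di != 0]
    y, prev = [], 0  # assumes the output before time 0 is 0
    for k in range(len(x)):
        increment = sum(di * x[k - i] for i, di in nonzero if k - i >= 0)
        prev = prev + increment
        y.append(prev)
    return y


# Hypothetical chip pattern, oversampled by m = 4 to give 32 taps.
taps = oversample([1, 1, -1, 1, -1, -1, 1, -1], 4)
d = differential_coefficients(taps)
samples = [((7 * k) % 15) - 7 for k in range(60)]  # values in -7..7

print(len(taps), "taps,", sum(1 for v in d if v != 0), "nonzero differential taps")
print(differential_dmf(taps, samples) == direct_form(taps, samples))
```

Within a chip the oversampled coefficients are equal, so their differences vanish; only chip boundaries where the sign changes, plus the two end taps, contribute an addition.
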
Unfortunately, the differential coefficient scheme has not been employed in current low-power DMFs because it draws more power in the direct-form structure [7][8]. However, as the proposed hybrid structure is globally a transposed-form structure, it is well suited to the differential coefficient scheme.

#### C. Optimal Summation Degree s for Low Power

The specification of a target DMF is needed to investigate the power characteristics of the proposed hybrid structure as *s* varies. Three key design parameters of the DMF are listed in Table 1. The 4-bit input word length induces only a 0.1 dB performance loss in a scalar Gaussian channel at a marginal circuit complexity, and the oversampling rate of 4 is conservative and accurate enough for most applications [2]. The 128-tap DMF performs coherent integration over a 32-chip interval, which results in a processing gain of 15 dB ($10\log_{10} 32 \approx 15$ dB).

**Table 1.** Key design parameters of the digital matched filter.

| Design parameter  | Value          |
|-------------------|----------------|
| Input word length | 4 bits         |
| Oversampling rate | 4 samples/chip |
| Tap number        | 128 taps       |

**Figure 4.** Reference architecture of hybrid digital matched filter with summation degree *s*.

To determine the optimal summation degree *s* in terms of low power, a reference architecture that can be configured with varying *s* is designed. Figure 4(a) shows the overall architecture, which mainly consists of an input receiving unit, an add unit, and a register file. The input receiving unit shown in Figure 4(b) exploits the fact that *m* consecutive filter coefficients are equal in a DMF with an oversampling rate of *m*. Under the differential coefficient scheme, the last *m*-1 of these coefficients become zero. Therefore, the part that stores the first *m* inputs is implemented with a register file instead of shift registers to avoid unnecessary switching power consumption.
The rest of the input storing part is implemented using shift registers that are activated every *m* cycles. The add unit employs a carry-save adder structure, which adds a number of addends as a whole at the cost of a small timing overhead and also has a regular layout. Figure 4(c) shows how 2's complement signed addition is performed in the add unit. Among the *s*/*m* addends, those with a zero coefficient are omitted to save hardware and power. The sign bit inversion technique is used to reduce the hardware complexity of sign extension. The complementary constant, which can be determined in advance, is subtracted from the result at the last addition step. Filter coefficients with a minus sign require negation, which is implemented by first taking the 1's complement and then adding 1s in the empty locations as in Figure 4(c) or in the last steps. By managing the empty locations that would be assigned 1s, we can make this constant disjoint with the 4-bit value. Finally, register files are used to reduce the number of switching activities of flip-flops (FFs).

**Figure 5.** Power consumption and area (gate count) of the hybrid digital matched filter with varying *s* = 8, 16, 32, or 64.

The dynamic power consumption of the DMF is estimated by using the following equation from [8]

$$P = \frac{V_{dd}^{2}}{2} \sum_{\forall net(i)} Cl_{i} \cdot Pn_{i} + \sum_{\forall cell(j)} E_{j} \cdot Pc_{j} \tag{5}$$

where $Cl_i$ and $Pn_i$ are the total load capacitance and the switching probability for net *i*, and $E_j$ and $Pc_j$ are the internal energy and the switching probability of the output node for logic cell *j*, respectively. In the estimation, $Cl_i$ and $E_j$ are characterized in the technology library [9], while $Pn_i$ and $Pc_j$ are obtained from gate-level simulation. In Figure 5, the power consumption and the gate count of the hybrid DMF are compared for varying *s*. The gate count becomes smaller as the summation degree *s* increases.
On the other hand, the power consumption is optimal at $s=32$ with a little area overhead. Therefore, the power-optimal *s* is set to 32 for the hybrid DMF. The final architecture is shown in Figure 6.

**Figure 6.** The proposed low-power hybrid digital matched filter (4 bits, 128-tap, 4× oversampling).

# III. EVALUATION

To evaluate the performance of the proposed low-power DMF, we compare it with the state-of-the-art low-power DMFs found in the literature. Since the other DMFs are designed in different technologies and have different optimization levels, their architecture characteristics are extracted and redesigned with a standard cell library [9] in order to make the comparison fair. As the control part occupies a very small area and consumes relatively little power, it is not included in the comparison results summarized in Table 2.

In [7], the prefilter is employed to eliminate repeated identical additions. As a result, the total number of additions is reduced to 1/*m*. A pipelined adder tree is used to boost the throughput. In [8], almost the same architecture as [7] is adopted, except that *m* identical adder trees are used instead to enhance the throughput. In [6], the differential coefficient scheme is proposed and applied to the transposed-form DMF. However, the result is neither power efficient nor area efficient. Compared to the proposed DMF, it occupies 21 % more area and consumes about 7 times as much power. Note that Liu's and the proposed DMF (in part) are both based on the transposed-form structure, but the number of switching FFs of the proposed DMF is much smaller than that of Liu's. Compared to Liou's, the proposed scheme dissipates 46 % less power with 6 % area overhead. On the other hand, Goto's has 35 % area overhead and achieves 42 % power reduction relative to Liou's. In addition, the proposed DMF has a critical path delay comparable to that of Liou's and Goto's without applying throughput-boosting schemes such as pipelining or parallel computing.

**Table 2.** Performance evaluation of the proposed hybrid low-power digital matched filter (4 bits, 128-tap, 4× oversampling).

|                              | Liou [7]†       | Goto [8]‡      | Liu [6]                  | Proposed                 |
|------------------------------|-----------------|----------------|--------------------------|--------------------------|
| Area                         | 10623 gates     | 14372 gates    | 13624 gates              | 11261 gates              |
| # of switching FFs           | 225             | 53             | 1306                     | 79                       |
| Power dissipation            | 0.908 mW/MHz    | 0.526 mW/MHz   | 3.384 mW/MHz             | 0.488 mW/MHz             |
| Architecture characteristics | Direct-form     | Direct-form    | Transposed-form          | Hybrid-form              |
|                              | Register file   | Register file  | Shift register           | Shift reg. + reg. file   |
|                              | Prefilter       | Prefilter      | Differential coefficient | Differential coefficient |
|                              | Pipelined adder | Parallel adder | Carry-select adder       | Carry-save adder         |

<sup>†</sup> 6-3 compressors are replaced by full adders.

<sup>‡</sup> Threshold judgment is not considered for its loss of accuracy.

### IV. CONCLUSIONS

In this paper, a low-power DMF structure has been proposed for direct sequence spread spectrum communication systems. The proposed DMF is based on a hybrid structure that employs the direct-form structure for local addition and the transposed-form structure for global addition to take advantage of both structures. The proposed structure reduces the number of switching memory elements by adopting register files. In addition, a substantial number of computations is eliminated by employing the differential coefficient scheme. Using the proposed hybrid structure, we achieved a 46 % power reduction with 6 % area overhead compared to the state-of-the-art low-power DMF. Further power reduction is expected by applying transistor-level circuit techniques to the adders and the register files.

## ACKNOWLEDGEMENT

This work was supported by the Korea Science and Engineering Foundation through the MICROS center and IC Design Education Center (IDEC).

## REFERENCES

[1] R. L. Peterson, R. E. Ziemer, and D. E. Borth, *Introduction to Spread Spectrum Communications*. Englewood Cliffs, NJ: Prentice-Hall, 1995.

[2] S. S. Rappaport and D. M. Grieco, "Spread spectrum signal acquisition: method and technology," *IEEE Commun. Mag.*, vol. 22, pp. 6-21, June 1984.

[3] R. B. Ward and K. P. Yiu, "Acquisition of pseudo-noise signal by recursion aided sequential estimation," *IEEE Trans. Commun.*, vol. 25, pp. 778-784, Aug. 1977.

[4] G. L. Stuber, J. W. Mark, and I. F. Blake, "Sequential estimation using bit estimation techniques," *Inform. Sci.*, vol. 32, pp. 217-229, 1984.

[5] M. D. Hahm, E. G. Friedman, and E. L. Titlebaum, "A comparison of analog and digital circuit implementations of low power matched filters for use in portable wireless communication terminals," *IEEE Trans. Circuits Syst. II*, vol. 44, pp. 498-506, June 1997.

[6] K. Liu, W. Lin, and C. Wang, "A pipelined digital differential matched filter FPGA implementation & VLSI design," in *Proc. IEEE Custom Integrated Circuit Conf.*, 1996, pp. 75-78.

[7] M. Liou and T. Chiueh, "A low-power digital matched filter for direct-sequence spread-spectrum signal acquisition," *IEEE J. Solid-State Circuits*, vol. 31, pp. 933-943, June 2001.

[8] S. Goto, T. Yamada, N. Takayama, Y. Matsushita, Y. Harada, and H. Yasuura, "A low-power digital matched filter for spread-spectrum systems," in *Proc. IEEE Int. Symp. Low Power Electronics and Design*, 2002, pp. 301-306.

[9] *0.25 µm 2.5 V CMOS Standard Cell Library Data Book*. Samsung Electronics Co., Ltd., 2000.