A Bit-Serial Iterative CORDIC

Center for New Music and Audio Technologies



Both the unrolled and the iterative bit-parallel designs suffer from complexity and long path delays caused by the large number of cross connections between stages. To reduce this complexity, the design can be changed into a completely bit-serial iterative architecture. Bit-serial means that only one bit is processed at a time, so the cross connections become one-bit-wide data paths. Clearly, the throughput becomes a function of

$\displaystyle \frac{\text{clock rate}}{\text{number of iterations} \times \text{word width}}.$

In spite of this, the output rate can be almost as high as that achieved with the unrolled design. The reason is the structural simplicity of a bit-serial design and the correspondingly high clock rate achievable. Figure 1.6 shows the basic architecture of the bit-serial CORDIC processor as implemented in a XILINX Spartan.
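The throughput relation can be checked with a short calculation. The following is a minimal sketch; the function name and the numbers (100 MHz clock, 16 iterations, 16-bit words) are assumptions chosen only to illustrate the formula, not figures from the implemented design.

```python
def bit_serial_throughput(clock_hz, iterations, word_width):
    """Results per second: each result needs iterations * word_width clock cycles."""
    return clock_hz / (iterations * word_width)


# Hypothetical figures, used only to illustrate the formula.
rate = bit_serial_throughput(clock_hz=100e6, iterations=16, word_width=16)
print(f"{rate:.0f} results/s")  # prints "390625 results/s"
```

Doubling the word width or the iteration count halves the result rate, which is why the achievable clock rate matters so much for this architecture.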

**Figure 1.6:** *Bit-serial CORDIC*

In this architecture the bit-serial adder-subtractor component is implemented as a full adder, where subtraction is performed by adding the 2's complement of the subtrahend [13]. The subtraction is again indicated by the sign bit of the angle accumulator, as described in section 1.2.1. A single bit of state is stored at the adder to realize the carry chain [14], which in turn requires the LSB to be fed in first. The shift-by-i operation can be realized by reading bit i-1, counted from the right end, of the serial shift registers. A multiplexer can be used to change the tap position according to the current iteration. The initial values

and

are fed into the array at the left end of the serial-in, serial-out register, and as the data enters the adder component the multiplexers at the input switch and map the results of the bit-serial adder back into the registers. The constant LUT for this design is implemented as a multiplexer with hardwired choices. Finally, when all iterations have completed, the input multiplexers switch again and new initial values enter the bit-serial CORDIC processor as the computed sine values exit.
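The behaviour of the bit-serial adder-subtractor and of the shift tap can be sketched in software. This is a minimal Python model under stated assumptions, not the FPGA implementation: the helper names (`to_bits`, `from_bits`, `serial_add_sub`, `serial_shift_tap`) are hypothetical, words are modelled as lists of bits with the LSB first, and the tap index convention is an assumption that may differ from the indexing used in the hardware.

```python
def to_bits(value, width):
    """Two's-complement bits of value, LSB first."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """Interpret LSB-first bits as a signed two's-complement integer."""
    raw = sum(b << k for k, b in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def serial_add_sub(a_bits, b_bits, subtract):
    """One full adder plus a single carry bit of state, one bit per clock."""
    # A carry-in of 1 together with inverted b bits adds the 2's complement of b.
    carry = 1 if subtract else 0
    out = []
    for a, b in zip(a_bits, b_bits):  # the LSB must arrive first
        if subtract:
            b ^= 1
        out.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return out


def serial_shift_tap(bits, i):
    """Arithmetic right shift by i: start reading i positions in,
    then repeat the sign bit for the remaining i clocks."""
    return bits[i:] + [bits[-1]] * i


print(from_bits(serial_add_sub(to_bits(37, 8), to_bits(100, 8), True)))  # prints -63
```

Because the carry is the only state the adder keeps, the word width affects only the number of clock cycles per operation, not the size of the adder.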

The design as implemented runs at a much higher clock rate than the bit-parallel architectures described earlier and fits easily in a XILINX Spartan device. The reason is the high ratio of sequential components to combinatorial components. The performance is constrained by the use of multiplexers for the shift operation and, even more so, for the constant LUT. The latter could be replaced by a RAM or serial ROM whose values are read by simply incrementing the memory's address. This would clearly improve performance, but since optimization for one particular FPGA device falls outside the scope of this paper, we will not consider it further.
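The serial ROM alternative mentioned above can be illustrated with a small model. This is only a sketch of the idea, not part of the implemented design: the word width, iteration count, fixed-point scaling and the names `ROM` and `read_constants` are all assumptions made for the example. The arctangent constants are stored bit by bit, LSB first, so that a plain address counter delivers them in exactly the order a bit-serial adder consumes them.

```python
import math

WIDTH = 16       # hypothetical word width
ITERATIONS = 8   # hypothetical number of iterations
FRAC_BITS = 13   # hypothetical fixed-point scaling (radians * 2**13)

# Flat, bit-addressed ROM: constant i occupies addresses
# i*WIDTH .. i*WIDTH + WIDTH - 1, stored LSB first.
ROM = []
for i in range(ITERATIONS):
    const = round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
    ROM.extend((const >> k) & 1 for k in range(WIDTH))


def read_constants(rom, width):
    """Stream the ROM with an incrementing address, reassembling each word."""
    address = 0
    while address < len(rom):
        word = 0
        for k in range(width):
            word |= rom[address] << k
            address += 1
        yield word


for i, c in enumerate(read_constants(ROM, WIDTH)):
    print(i, c / (1 << FRAC_BITS))
```

No selection logic is needed in this scheme; the only control is the address counter, which is the property that would remove the multiplexer from the critical path.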


Norbert Lindlbauer
2000-01-19