Instead of buffering the output of one iteration and using the same
resources again, one could simply cascade the iterative CORDIC, which
means rebuilding the basic CORDIC structure for each
iteration. Consequently, the output of one stage is the input of the
next one, as shown in Figure 1.5, and with
separate stages two simplifications become possible. First, the shift
operations for each step can be performed by wiring the
connections between stages appropriately. Second, the constant values
no longer change within a stage and can therefore be hardwired as well.
Figure 1.5: Unrolled CORDIC
The purely unrolled design consists only of combinatorial components
and computes one sine value per clock cycle. Input values find their
path through the architecture on their own and do not need to be
controlled.
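The unrolled structure described above can be sketched in software as follows. This is a minimal behavioural model, not the design from the text: the fixed-point format (14 fraction bits inside a 16-bit word), the stage count of 13 and all names (`stage`, `unrolled_sin`, `ATAN`, `X0`) are assumptions made for illustration. Each stage has its own fixed shift amount and its own fixed arctangent constant, which is what allows both to be hardwired.

```python
import math

FRAC = 14        # assumed format: 14 fraction bits, fits a signed 16-bit word
N_STAGES = 13    # assumed number of unrolled stages

# Per-stage constants are known in advance, so hardware can hardwire them.
ATAN = [round(math.atan(2.0 ** -i) * (1 << FRAC)) for i in range(N_STAGES)]
# CORDIC gain of N_STAGES rotations; pre-scaling x avoids a final multiply.
GAIN = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_STAGES))
X0 = round((1 << FRAC) / GAIN)


def stage(i, x, y, z):
    """One rotation-mode stage; shift amount i and ATAN[i] are fixed per stage."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ATAN[i]
    return x + (y >> i), y - (x >> i), z + ATAN[i]


def unrolled_sin(angle):
    """Sine of an angle in [-pi/2, pi/2], passed through all stages in sequence."""
    x, y, z = X0, 0, round(angle * (1 << FRAC))
    for i in range(N_STAGES):   # the loop stands for N_STAGES distinct stages
        x, y, z = stage(i, x, y, z)
    return y / (1 << FRAC)


print(unrolled_sin(math.pi / 6), math.sin(math.pi / 6))
```

In hardware the `>> i` of each stage costs no logic, since it is just a fixed offset in the wiring between stages; in this model it is an ordinary arithmetic shift.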
The resources in an FPGA, however, are not well suited to this kind
of architecture. For a bit-parallel unrolled design with a 16-bit word
length, each stage has 48 inputs and 48 outputs
plus a great number of cross-connections between stages. Those
cross-connections from the x-path through the shift components to the
y-path and vice versa make the design difficult to route in an FPGA and
cause additional delay. Table 1.1 shows
how performance and resource usage change with the
number of iterations when the design is implemented in a XILINX
XC4010E FPGA. Naturally, the area and with it the maximum path delay
increase as stages are added to the design; the path delay determines
the maximum speed at which the application can run.
Table 1.1: Performance and CLB usage in an XC4010E

No. of iterations       8       9       10      11      12      13
Complexity [CLB]        184     208     232     256     280     304
Max. path delay [ns]    163.75  177.17  206.9   225.72  263.86  256.87
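The figures in Table 1.1 can be turned into two derived quantities: the CLB cost of each added iteration and the clock rate that the path delay would allow. The short calculation below uses only the table values; treating the reciprocal of the maximum path delay as an upper bound on the clock frequency is an assumption of this sketch, and the variable names are illustrative.

```python
iterations = [8, 9, 10, 11, 12, 13]
clbs = [184, 208, 232, 256, 280, 304]
delay_ns = [163.75, 177.17, 206.9, 225.72, 263.86, 256.87]

# CLBs added by each further iteration (difference between neighbouring columns)
clb_steps = [b - a for a, b in zip(clbs, clbs[1:])]

# Assumed bound: one result per clock, clock period >= maximum path delay
f_max_mhz = [1000.0 / d for d in delay_ns]

print("CLBs per added iteration:", clb_steps)
for n, c, d, f in zip(iterations, clbs, delay_ns, f_max_mhz):
    print(f"{n:2d} iterations: {c} CLBs, {d:.2f} ns, about {f:.2f} MHz")
```

Under this assumption the unpipelined design stays in the range of a few MHz for all listed iteration counts.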
As described earlier, the area in FPGAs can be measured in CLBs, each
of which consists of two lookup tables as well as storage cells with
additional control components [12]. For the purely
combinatorial design the CLBs' function generators perform the add and
shift operations and no storage cells are used. This means registers
could be inserted easily without significantly increasing the
area. Pipelining adds some latency, of course, but the
application needs to output values at 48 kHz, and the latency for 14
iterations equals 312.5 µs, which is known to be imperceptible. In return,
inserting registers between stages reduces the maximum path
delays, and correspondingly a higher maximum speed can be
achieved. Table 1.2 shows how the area versus speed trade-off
is affected by different pipelining methods.
Table 1.2: Performance and CLB usage for various methods of pipelining
in an XC4010E

No. of iterations between registers   1      2      3      4      8      13
Complexity [CLB]                      313    308    304    304    304    304
Max. frequency [MHz]                  24.4   18.3   14.2   9.7    6.2    3.7
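The effect of inserting registers after every k stages can be modelled cycle by cycle. The sketch below is a behavioural illustration under assumed parameters (13 stages, 14 fraction bits, hypothetical names `stage` and `run_pipeline`); it shows that the results are identical for every register spacing, that one result leaves the pipeline per clock once it is filled, and that only the latency in clock cycles changes.

```python
import math

FRAC = 14        # assumed fixed-point fraction bits
N_STAGES = 13    # assumed number of CORDIC stages
ATAN = [round(math.atan(2.0 ** -i) * (1 << FRAC)) for i in range(N_STAGES)]
GAIN = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(N_STAGES))
X0 = round((1 << FRAC) / GAIN)


def stage(i, x, y, z):
    """One rotation-mode CORDIC stage with fixed shift i and constant ATAN[i]."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ATAN[i]
    return x + (y >> i), y - (x >> i), z + ATAN[i]


def run_pipeline(angles, stages_per_register=1):
    """Clock a stream of angles through the pipeline.

    Returns a list of (cycle, sine) pairs, where cycle is the index of the
    clock edge at which the value is captured by the last register.
    """
    k = stages_per_register
    # Each group is the combinatorial logic sitting in front of one register.
    groups = [range(s, min(s + k, N_STAGES)) for s in range(0, N_STAGES, k)]
    regs = [None] * len(groups)                    # None marks an empty slot
    results = []
    stream = list(angles) + [None] * len(groups)   # extra cycles flush the pipe
    for cycle, angle in enumerate(stream):
        first_in = None if angle is None else (X0, 0, round(angle * (1 << FRAC)))
        sources = [first_in] + regs[:-1]           # values before the clock edge
        next_regs = []
        for idxs, value in zip(groups, sources):
            if value is not None:
                for i in idxs:
                    value = stage(i, *value)
            next_regs.append(value)
        regs = next_regs                           # all registers update together
        if regs[-1] is not None:
            results.append((cycle, regs[-1][1] / (1 << FRAC)))
    return results


for k in (1, 4, 13):
    print(k, run_pipeline([0.1, 0.2, 0.3], stages_per_register=k))
```

The model says nothing about achievable clock frequency; that depends on the delay of the logic between two registers, which is what the synthesis reports measure.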
The values in Table 1.2 are taken from report files generated by the XILINX
Foundation Series software when implementing the unrolled designs. The
number of CLBs stays almost the same while the
maximum frequency increases as registers are inserted, because the
amount of combinatorial logic between
sequential cells decreases. The gain in speed from inserting
registers clearly exceeds the cost in area, which makes the fully
pipelined CORDIC a suitable solution for generating a sine wave in
FPGAs. This type of architecture becomes more and more attractive
when a sufficient number of CLBs is available,
as is the case in high-density devices like XILINX's Virtex or
ALTERA's FLEX families.
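The area versus speed trade-off can be quantified directly from the Table 1.2 values. The calculation below takes the column with 13 iterations between registers (no internal pipelining) as the baseline; that choice of baseline and the variable names are assumptions of this sketch.

```python
stages_between_regs = [1, 2, 3, 4, 8, 13]
clbs = [313, 308, 304, 304, 304, 304]
f_max_mhz = [24.4, 18.3, 14.2, 9.7, 6.2, 3.7]

base_clb, base_f = clbs[-1], f_max_mhz[-1]   # baseline: no internal registers

for k, c, f in zip(stages_between_regs, clbs, f_max_mhz):
    speedup = f / base_f
    extra_area_pct = 100.0 * (c - base_clb) / base_clb
    print(f"register every {k:2d} stage(s): "
          f"speed x{speedup:.2f}, area +{extra_area_pct:.1f} %")

speedup_full = f_max_mhz[0] / base_f          # fully pipelined vs. baseline
area_full = clbs[0] / base_clb
```

For the fully pipelined case the maximum frequency rises by a factor of roughly 6.6, while the CLB count grows by about 3 percent.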
Norbert Lindlbauer 2000-01-19