In the previous sections, we described several ways of implementing
the CORDIC algorithm on an FPGA. The resulting structures differ in
how they use the resources available in the target FPGA
device. Table 1.3 shows how the
iterative bit-serial and iterative bit-parallel
designs for 16-bit resolution compare in terms of speed and area. The
bit-serial design stands out for its low area usage and high
achievable clock speed. However, its latency is higher, and hence its maximum
throughput rate is much lower, than that of the bit-parallel designs. The
bit-parallel unrolled and fully pipelined design (see Table 1.2)
uses considerably more resources
but achieves the best latency per sample and the highest maximum throughput rate. The
prototyping environment limited the implementation of the unrolled
design to 13 iterations. The iterative bit-parallel design offers a
compromise between the unrolled and bit-serial designs and makes the best
use of the resources in a XILINX XC4010E.
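
The throughput trade-off described above can be illustrated with a rough cycle-count model. This is a minimal sketch under stated assumptions, not a description of the measured designs: it assumes an idealized bit-serial datapath that needs one clock cycle per bit per iteration, an iterative bit-parallel datapath that needs one clock cycle per iteration, and a fully pipelined unrolled datapath that accepts one new sample per clock cycle. Load and unload overhead is ignored, and the function names and any clock frequency passed in are hypothetical.

```python
# Idealized cycle-count model for three CORDIC architectures.
# Assumption: no load/unload or control overhead cycles are counted.

def cycles_per_result(architecture: str, width: int = 16, iterations: int = 16) -> int:
    """Clock cycles between successive results for the given architecture."""
    if architecture == "bit-serial iterative":
        # One bit is processed per clock, so each iteration takes `width` clocks.
        return width * iterations
    if architecture == "bit-parallel iterative":
        # A whole word is processed per clock, one iteration per clock.
        return iterations
    if architecture == "bit-parallel unrolled pipelined":
        # Every pipeline stage works on a different sample in each clock.
        return 1
    raise ValueError(f"unknown architecture: {architecture}")


def throughput(architecture: str, f_clk_hz: float,
               width: int = 16, iterations: int = 16) -> float:
    """Results per second at a given (hypothetical) clock frequency."""
    return f_clk_hz / cycles_per_result(architecture, width, iterations)


if __name__ == "__main__":
    for arch in ("bit-serial iterative",
                 "bit-parallel iterative",
                 "bit-parallel unrolled pipelined"):
        print(arch, cycles_per_result(arch))
```

In this model the bit-serial design can only match the throughput of the iterative bit-parallel design if its clock runs `width` times faster, which is why a high achievable clock rate alone does not translate into a high sample rate.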
In practice, the choice of architecture is better guided by the resources
available in the specific target device than by the specific
needs of the application. The bit-serial
structure is the best choice for relatively small devices,
but for FPGAs with sufficient CLBs one might choose the
bit-parallel, fully pipelined architecture, since its latency is minimal
and no control unit is needed.
Table 1.3: Performance and CLB usage for the bit-parallel and
bit-serial iterative designs.