A PARALLEL AND PIPELINED ARCHITECTURE FOR CORDIC ALGORITHM

The COordinate Rotation DIgital Computer (CORDIC) algorithm is an efficient iterative algorithm for calculating phase and magnitude, or vector rotations, in linear, hyperbolic and circular coordinate systems. Existing CORDIC implementations run at a low clock frequency and suffer from high delay. To overcome this problem, an updated parallel and pipelined architecture is designed without degrading performance. It achieves the highest maximum clock frequency with less delay by splitting the critical path into several shorter delay paths, which improves the circuit processing time. The architecture designed in this study can be used in navigation applications. The method is implemented in the Xilinx ISE tool.


I. INTRODUCTION
In June 1956, the CORDIC algorithm and a computerized design for executing it were introduced to Convair management in a technical report. During the preparation of the report, Volder realized that the same computerized design could readily be altered to generate logarithmic functions, hyperbolic coordinate rotations and exponential expressions. In [1], different characteristics of the CORDIC algorithm are described, and the algorithm is implemented on a Field Programmable Gate Array (FPGA). In [2], a CORDIC rotator algorithm is described in which the scale factor is kept constant across the iterations. It offers a 50% reduction in the number of iterations.
In [3], a modified CORDIC algorithm with a new Fast Fourier Transform (FFT) is described. It is used at the opposite ends of the computing power spectrum. In [4], an FPGA implementation architecture and its optimization measures are described in terms of hardware resources, angular coverage and computing precision of the algorithm. The speed and accuracy of this algorithm are high.
In [5], a serial pipelined FFT on an FPGA using the CORDIC algorithm is described. To enhance the performance of the FFT, it uses a pipelined structure, a dual-port structure and radix-2 decimation in time. In [6], three reconfigurable CORDIC designs are described: one that works in either hyperbolic or circular coordinates in rotation mode, one that works in both hyperbolic and circular coordinates in vectoring mode, and one that works in both hyperbolic and circular coordinates in either mode.
In [7], the design of a direct digital synthesizer using the CORDIC algorithm is discussed. It is programmed into an FPGA for verification. In [8], an enhanced mixed-scaling-rotation CORDIC algorithm is described. It offers higher signal-to-noise ratio performance by scaling the factor through multiplication by a rational sequence, which is expressed in equivalent signed-power-of-two terms.
In [9], a reduced-memory pipelined CORDIC architecture is described. It can be used for an FFT of any radix size and avoids storing the angles and twiddle factors. In [10], a complex 128-point FFT processor using rolling-back and parallel methods is described. This method provides high speed with low power.
In [11], a CORDIC-based power-of-two-point discrete cosine transform algorithm is described. It overcomes the lack-of-synchronization problem by reusing a uniform processing cell. In [12], an FFT design using radix 2/4/8 with a single-path delay feedback structure is described. It includes complex multipliers that contain three real multiplications and reduced cosine/sine tables.
In this paper, a parallel-pipelined architecture for the CORDIC algorithm is presented. The paper is organized as follows: the methods and materials used in this study are explained in Section II, Section III gives the results and discussion, and Section IV presents the conclusion.

II. METHODS AND MATERIALS
The hardware implementation of CORDIC arithmetic is shown in Fig. 1. It consists of three inputs X, Y and Z, a look-up table that stores the values of $\tan^{-1}(2^{-i})$, and two shifters that supply the values $2^{-i}X$ and $2^{-i}Y$. All multiplication operations are thereby converted to simple shift operations. At the start of a calculation, the initial values are fed into the registers through the multiplexers, and the MSB of the value stored in the Z branch determines the operation mode of the adder-subtractors. The signals in the X and Y branches pass through the shift units and are then added to or subtracted from the non-shifted signal of the other branch. Sine and cosine waveforms are obtained directly from the CORDIC algorithm, which acts as a quadrature phase-to-amplitude converter [13].

Fig. 1 Iterative Architecture of CORDIC
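The shift-and-add datapath of Fig. 1 can be sketched in software as a rough behavioural model. The listing below is only an illustration, not the authors' VHDL: the names `FRAC_BITS`, `to_fixed` and `cordic_iterative`, the 14 fractional bits and the 16 iterations are all assumptions made for the example.

```python
import math

FRAC_BITS = 14            # assumed number of fractional bits (illustrative)
SCALE = 1 << FRAC_BITS

def to_fixed(value):
    """Quantize a real number to a signed fixed-point integer."""
    return int(round(value * SCALE))

def cordic_iterative(angle_rad, n_iter=16):
    """Behavioural model of the iterative datapath: one shift-add per pass."""
    # Look-up table holding atan(2^-i) in fixed point.
    atan_lut = [to_fixed(math.atan(2.0 ** -i)) for i in range(n_iter)]
    # Accumulated stretch of the micro-rotations; starting from 1/gain
    # removes the need for a final multiplication.
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = to_fixed(1.0 / gain), 0, to_fixed(angle_rad)
    for i in range(n_iter):
        # The sign (MSB) of z selects add or subtract in all three branches.
        # ">>" on Python integers is an arithmetic shift, like the shifters.
        if z >= 0:
            x, y, z = x - (y >> i), y + (x >> i), z - atan_lut[i]
        else:
            x, y, z = x + (y >> i), y - (x >> i), z + atan_lut[i]
    return x / SCALE, y / SCALE

cos_val, sin_val = cordic_iterative(math.radians(30.0))
print(cos_val, sin_val)
```

For a 30 degree input the two results should lie close to cos 30° and sin 30°, within the quantization error of the assumed word length.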
The hardware structure of the parallel CORDIC architecture is shown in Fig. 2. It is a data-driven circuit. The number of iterations performed depends on the required bit precision: "n" iterations are required for "n" bits of precision. Moreover, "n-1" sets of hardware are required for "n" bits of precision, so 3(n-1) adder/subtractor circuits and 2(n-1) shifter circuits are required in total. Each set, i.e. each iteration, therefore contains three adder/subtractors and two shifters.

Fig. 2 Parallel Architecture of CORDIC
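The component totals quoted for the parallel architecture can be tabulated with a small helper. This is only arithmetic on the 3(n-1) and 2(n-1) counts stated above; the function name `parallel_resources` is hypothetical.

```python
def parallel_resources(n_bits):
    """Component totals for n_bits of precision, using the counts in the text."""
    sets = n_bits - 1                      # "n-1" sets for "n" bits
    return {
        "sets": sets,
        "adder_subtractors": 3 * sets,     # 3(n-1)
        "shifters": 2 * sets,              # 2(n-1)
    }

print(parallel_resources(16))
# prints {'sets': 15, 'adder_subtractors': 45, 'shifters': 30}
```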
The proposed parallel-pipelined architecture is shown in Fig. 3. Previously, the circuit was data-driven. With registers now placed between the stages, it becomes a clock-driven circuit. Thus, the stages are now independent of one another rather than directly coupled to their neighbours.

Fig. 3 Parallel-Pipelined Architecture of CORDIC
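The effect of the inter-stage registers can be sketched with a small clocked model. It is a floating-point illustration under assumed parameters (12 stages, the hypothetical names `stage` and `run_pipeline`), not a description of the synthesized circuit. Each clock cycle moves every operand one stage forward, so after an initial latency of `N_STAGES` cycles one result leaves the pipeline per cycle.

```python
import math

N_STAGES = 12  # assumed number of CORDIC stages (illustrative)
ATAN = [math.atan(2.0 ** -i) for i in range(N_STAGES)]
INV_GAIN = 1.0
for i in range(N_STAGES):
    INV_GAIN /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def stage(i, state):
    """Combinational logic of stage i: one CORDIC micro-rotation."""
    x, y, z = state
    d = 1.0 if z >= 0 else -1.0
    return (x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * ATAN[i])

def run_pipeline(angles):
    """Clock the pipeline; return (cycle, cos, sin) for each result."""
    regs = [None] * N_STAGES      # regs[i] is the register after stage i
    stream = list(angles)
    results = []
    cycle = 0
    while stream or any(r is not None for r in regs):
        cycle += 1
        new_input = (INV_GAIN, 0.0, stream.pop(0)) if stream else None
        # All registers load on the same clock edge, so the next values
        # are computed from the current ones before anything is overwritten.
        prev = [new_input] + regs[:-1]
        regs = [stage(i, s) if s is not None else None
                for i, s in enumerate(prev)]
        if regs[-1] is not None:
            results.append((cycle, regs[-1][0], regs[-1][1]))
    return results

for cycle, c, s in run_pipeline([0.1, 0.5, 1.0]):
    print(cycle, round(c, 3), round(s, 3))
```

With three input angles fed on consecutive cycles, the results appear on cycles 12, 13 and 14, which shows the latency of one cycle per stage and the throughput of one result per clock.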
If the first iteration begins at the first clock cycle, the second iteration begins at the second clock cycle, the third at the third cycle, and so on. As a result, the delay is reduced and the speed of computation increases. However, the registers increase the area.
Consider a unit-length vector with one end point at the origin and the other at $(x, y) = (1, 0)$. If it is rotated by an angle $\theta$, its new end point will be $(\cos\theta, \sin\theta)$, as shown in Fig. 4. Thus, $\cos\theta$ and $\sin\theta$ can be calculated by finding the coordinates of the new point [14].

Fig. 4 Vector in Cartesian coordinates
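If the vector length is not equal to unity and the vector $(x, y)$ is rotated by an angle $\theta$, the new coordinates $(x', y')$ of its end point after rotation (Fig. 5) are given by the Cartesian geometry formulas:
$$x' = x\cos\theta - y\sin\theta = \cos\theta\,(x - y\tan\theta), \qquad y' = y\cos\theta + x\sin\theta = \cos\theta\,(y + x\tan\theta) \qquad (2)$$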

Fig. 5 Rotation of a vector
It is clear from the above equations that the $\cos\theta$ term provides scaling: since $|\cos\theta| \le 1$, it reduces the magnitude of the vector. So, by removing the $\cos\theta$ term from the above equations, the magnitude of the vector is increased by the factor $1/\cos\theta$, as shown in Fig. 6.

Fig. 6 Pseudo Rotations
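The growth factor of a pseudo rotation can be checked in a few lines, assuming the standard rotation $x' = \cos\theta\,(x - y\tan\theta)$, $y' = \cos\theta\,(y + x\tan\theta)$. The symbols $\hat{x}$ and $\hat{y}$ for the pseudo-rotated point are introduced here only for illustration. Dropping the $\cos\theta$ factor gives

$$\hat{x} = x - y\tan\theta, \qquad \hat{y} = y + x\tan\theta$$

$$\hat{x}^2 + \hat{y}^2 = (x^2 + y^2)(1 + \tan^2\theta) = \frac{x^2 + y^2}{\cos^2\theta}$$

so the length of the vector grows by $1/|\cos\theta|$ while its direction is still turned by $\theta$. For $\theta = 45^\circ$, for example, the growth factor is $\sqrt{2} \approx 1.414$.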
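The rotation angles are restricted to $\theta_i = \tan^{-1}(2^{-i})$, so that multiplication by $\tan\theta_i = 2^{-i}$ reduces to a shift. As i increases, the values of $\tan\theta_i$ and $\theta_i$ keep decreasing.
After each iteration, the angle $\theta_i$ is added to or subtracted from the angle accumulator. Let z represent the angle accumulator.
Thus, the equations become
$$x_{i+1} = x_i - d_i\, y_i\, 2^{-i}, \qquad y_{i+1} = y_i + d_i\, x_i\, 2^{-i}, \qquad z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$$
Here the $d_i$ term acts as the deciding factor for performing addition or subtraction in the equations. The value of $d_i$ is simply a sign, $+1$ or $-1$, set by the mode of operation. The different modes of the CORDIC algorithm [15] are used to calculate different functions. There are two modes: rotation and vectoring.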

• CORDIC Rotation Mode
In this mode, the sign of $d_i$ follows the sign of the current angle accumulator value $z_i$ ($d_i = +1$ if $z_i \ge 0$, otherwise $d_i = -1$), which makes z converge to 0. This is known as "rotation mode".
In conventional CORDIC, the angles 45°, 26.6°, 14°, 7.1°, 3.6°, 1.8°, 0.9° and 0.4° are combined to form all other angles. An angle of 30° is taken as an example, as listed in Table 1 and illustrated in Fig. 7.

Fig. 7 Rotations for 30 degrees
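The way the elementary angles combine to reach 30 degrees can be traced with a short script. It is a sketch that assumes the usual rule of rotating towards the target, i.e. adding the next angle while the residual is non-negative and subtracting it otherwise. The name `rotation_steps` is hypothetical, and exact values of $\tan^{-1}(2^{-i})$ are used instead of the rounded angles listed in the text.

```python
import math

def rotation_steps(target_deg, n_iter=8):
    """Return (i, d_i, angle reached so far) for each micro-rotation."""
    z = target_deg          # residual angle still to be rotated
    reached = 0.0
    steps = []
    for i in range(n_iter):
        step_deg = math.degrees(math.atan(2.0 ** -i))
        d = 1 if z >= 0 else -1
        reached += d * step_deg
        z -= d * step_deg
        steps.append((i, d, round(reached, 3)))
    return steps

for i, d, reached in rotation_steps(30.0):
    print(i, "+" if d > 0 else "-", reached)
```

For 30 degrees the directions come out as +, -, +, -, +, +, -, +, and after eight steps the reached angle is within the last elementary angle (about 0.4 degrees) of the target.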
To save area by not storing the scaling constant K, the iteration starts with the pre-scaled values $x_0 = 1/K = 0.6072529$ and $y_0 = 0$. For a result with "n" bits of precision, "n" CORDIC iterations are necessary [16]. As $z_n$ tends to 0, $x_n$ and $y_n$ tend to $\cos z_0$ and $\sin z_0$, where $z_0$ is the input angle. The range of angles covered is -99.7° ≤ z ≤ 99.7°, where 99.7° is the sum of all the angles in the look-up table.
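Both constants quoted above can be reproduced numerically. The sketch assumes the usual definitions, namely that K is the product of the per-iteration stretch factors $\sqrt{1 + 2^{-2i}}$ and that the angular range is the sum of the table entries $\tan^{-1}(2^{-i})$. The function names are hypothetical.

```python
import math

def cordic_gain(n_iter):
    """Product of the per-iteration stretch factors sqrt(1 + 2^(-2i))."""
    gain = 1.0
    for i in range(n_iter):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain

def angle_range_deg(n_iter):
    """Sum of the look-up-table angles atan(2^-i), in degrees."""
    return sum(math.degrees(math.atan(2.0 ** -i)) for i in range(n_iter))

for n in (8, 9, 10, 16):
    print(n, round(1.0 / cordic_gain(n), 7), round(angle_range_deg(n), 2))
```

The reciprocal gain settles at about 0.6072529 after a handful of iterations. The angle sum depends on how many table entries are included: it is close to the quoted 99.7° for nine entries and approaches roughly 99.9° as more are added.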

• CORDIC Vectoring Mode
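In this mode, $d_i$ is chosen from the sign of $y_i$ ($d_i = +1$ if $y_i < 0$, otherwise $d_i = -1$) so that y converges to 0, and the CORDIC equations become:
$$x_{i+1} = x_i - d_i\, y_i\, 2^{-i}, \qquad y_{i+1} = y_i + d_i\, x_i\, 2^{-i}, \qquad z_{i+1} = z_i - d_i \tan^{-1}(2^{-i})$$
Fig. 8 shows the vectoring mode for a point inclined at 30 degrees, and the iterative values of the process are listed in Table 2. At the beginning we take X = 1 and Z = 0. However, one can take advantage of a formula to limit the range of fixed-point numbers encountered.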
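As an illustration of vectoring mode, the sketch below drives y towards zero and reads the magnitude from x and the phase from z. It is a floating-point model under the usual convention ($d_i = +1$ when $y_i < 0$, otherwise $-1$), and it assumes the input lies in the right half-plane (x > 0). The function name `cordic_vectoring` and the 16 iterations are assumptions for the example.

```python
import math

def cordic_vectoring(x, y, n_iter=16):
    """Return (magnitude, phase in radians) of the vector (x, y), x > 0."""
    z = 0.0
    gain = 1.0
    for i in range(n_iter):
        d = 1.0 if y < 0 else -1.0           # rotate towards the x axis
        x, y, z = (x - d * y * 2.0 ** -i,
                   y + d * x * 2.0 ** -i,
                   z - d * math.atan(2.0 ** -i))
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    # x has been stretched by the accumulated gain; divide it out here.
    return x / gain, z

mag, phase = cordic_vectoring(math.cos(math.radians(30.0)),
                              math.sin(math.radians(30.0)))
print(mag, math.degrees(phase))
```

For a unit vector inclined at 30 degrees, the magnitude should come out close to 1 and the phase close to 30 degrees.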

III. RESULTS AND DISCUSSION
To verify the results, the CORDIC algorithm is simulated in Xilinx, and the output is shown in Fig. 9. All implementations are designed in VHDL using the ISE environment and the ISim simulator. The circuits described above are synthesized for the Xilinx Virtex-5 XC5VTX240T device.

Fig. 9 Simulation Output
The input vectors are generated so that all four quadrants are covered, and they are normalized to a magnitude of 1. For 16-bit input, the fixed-point format is A(1,6). As the CORDIC convergence range is limited, each input vector has to be rotated by an angle of $\pm\pi/2$ where necessary, moving every vector into the fourth or first quadrant so that the usable range is extended. Table 3 compares the three methods: sequential, parallel, and parallel-pipelined. It is inferred from Table 3 that the parallel-pipelined method has the highest maximum clock frequency, approximately 464.563. Pipelining reduces the length of the critical path of the circuit. Thus, this method has the lowest delay of the three.
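The quadrant pre-rotation mentioned above for the input vectors can be sketched as follows. This is one plausible reading, assuming that vectors in the second and third quadrants are turned by $-\pi/2$ and $+\pi/2$ respectively and that the offset is added back to the computed phase afterwards. The function name `pre_rotate` is hypothetical.

```python
import math

def pre_rotate(x, y):
    """Move (x, y) into quadrant I or IV; return (x', y', angle to add back)."""
    if x >= 0:
        return x, y, 0.0                 # already inside the range
    if y >= 0:
        # Quadrant II: rotate by -pi/2, i.e. (x, y) -> (y, -x).
        return y, -x, math.pi / 2
    # Quadrant III: rotate by +pi/2, i.e. (x, y) -> (-y, x).
    return -y, x, -math.pi / 2

# Usage: a unit vector at 2.5 rad (second quadrant).
xr, yr, offset = pre_rotate(math.cos(2.5), math.sin(2.5))
angle = math.atan2(yr, xr) + offset      # atan2 stands in for the CORDIC core
print(xr >= 0, angle)
```

A pure 90 degree rotation only swaps the coordinates and changes a sign, so it needs no multiplier and fits in front of the CORDIC stages at little cost.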

IV. CONCLUSION
In this paper, a parallel-pipelined architecture for the CORDIC algorithm is discussed. Although the added registers increase the area, the delay is reduced drastically. Compared with the existing methods, the parallel-pipelined architecture has the highest maximum clock frequency. Since real-time data acquisition is increasingly required, this method has considerable scope in real-time processing. It can be used in navigation applications, radar signal processors and unmanned aerial vehicles (UAVs) that require high computational speed. CORDIC also remains promising because it is used in supercomputers, which are an evolving technology.