

## Key Design Features

- Synthesizable, technology independent VHDL Core
- 32-bit floating-point arithmetic
- IEEE 754 compliant<sup>1</sup>
- High-speed fully pipelined architecture
- Only 4 clock-cycles of latency

## Applications

- Floating-point pipelines and arithmetic units
- Floating-point processors

### Pin-out Description

| Pin name    | I/O | Description                        | Active state |
|-------------|-----|------------------------------------|--------------|
| clk         | in  | Synchronous clock                  | rising edge  |
| en          | in  | Clock enable                       | high         |
| v1 [31:0]   | in  | Input operand 1 in IEEE 754 format | data         |
| v2 [31:0]   | in  | Input operand 2 in IEEE 754 format | data         |
| vout [31:0] | out | Output result in IEEE 754 format   | data         |

# Functional Specification

| Operand v1    | Operand v2    | Result                                                                    |
|---------------|---------------|---------------------------------------------------------------------------|
| Standard IEEE | Standard IEEE | v1 * v2                                                                   |
|               |               | If \|v1 * v2\| > MaxFloat then result is:<br>[sign(v1) xor sign(v2)] Inf  |
|               |               | If \|v1 * v2\| ≤ MinFloat then result is:<br>[sign(v1) xor sign(v2)] 0    |
| NaN           | Anything      | NaN                                                                       |
| Anything      | NaN           | NaN                                                                       |
| +/- Inf       | +/- 0         | NaN                                                                       |
| +/- 0         | +/- Inf       | NaN                                                                       |
| +/- Inf       | Standard IEEE | [sign(v1) xor sign(v2)] Inf                                               |
| Standard IEEE | +/- Inf       | [sign(v1) xor sign(v2)] Inf                                               |
| +/- 0         | Standard IEEE | [sign(v1) xor sign(v2)] 0                                                 |
| Standard IEEE | +/- 0         | [sign(v1) xor sign(v2)] 0                                                 |
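
The special-operand rows of the table can be sketched as a small software reference model. This is an illustrative Python sketch, not the core's VHDL: the names `classify` and `special_case` are hypothetical, and the handling of Inf × Inf and 0 × 0 (not listed in the table) is an assumption that follows the usual IEEE 754 behaviour.

```python
NAN = 0xFFC00000   # NaN pattern the datasheet states the core generates
INF = 0x7F800000   # exponent all ones, mantissa zero


def classify(word):
    """Classify a 32-bit word as 'nan', 'inf', 'zero' or 'std'."""
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    if exponent == 0xFF:
        return "nan" if mantissa else "inf"
    if exponent == 0:
        return "zero"          # denormals are treated as zero
    return "std"


def special_case(v1, v2):
    """Return the result word for NaN/Inf/0 operands, or None when both
    operands are standard numbers and a real multiplication is needed."""
    c1, c2 = classify(v1), classify(v2)
    sign = (v1 ^ v2) & 0x80000000      # sign(v1) xor sign(v2)
    if "nan" in (c1, c2):
        return NAN
    if {c1, c2} == {"inf", "zero"}:
        return NAN
    if "inf" in (c1, c2):              # includes Inf * Inf (assumption)
        return sign | INF
    if "zero" in (c1, c2):             # includes 0 * 0 (assumption)
        return sign
    return None


print(hex(special_case(0xFF800000, 0x3F800000)))  # -Inf * 1.0 -> 0xff800000
print(hex(special_case(0x7F800000, 0x80000000)))  # +Inf * -0  -> 0xffc00000
```

A model like this is mainly useful for checking simulation output against the table row by row.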







### **General Description**

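IEEE\_MULT (Figure 1) is a high-speed, fully pipelined 32-bit floating-point multiplier based on the IEEE 754 standard. The arrangement of the 32-bit floating-point number is summarized below:

| Bits    | Field | Description                |
|---------|-------|----------------------------|
| [31]    | S     | Sign bit                   |
| [30:23] | E     | 8-bit biased exponent      |
| [22:0]  | M     | 23-bit mantissa (fraction) |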

All input and output values comply with the IEEE 754 specification. The real number representation is calculated according to the formula:

$$Value = (-1)^{S} \times 2^{(E-127)} \times 1.M$$
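
As an illustration, the formula above can be evaluated in software for any normalized 32-bit word. The sketch below is a minimal Python example; the function name `decode` is hypothetical, and it only applies to standard numbers (exponent field between 1 and 254).

```python
def decode(word):
    """Convert a normalized IEEE 754 single-precision word to a real value."""
    s = (word >> 31) & 1          # sign bit
    e = (word >> 23) & 0xFF       # biased exponent
    m = word & 0x7FFFFF           # 23-bit mantissa
    # 1.M means an implicit leading one followed by the mantissa bits
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2 ** 23)


print(decode(0xC0000000))             # -2.0
print(decode(0x40407000))             # 3.0068359375
print(round(decode(0x40C0880E), 4))   # 6.0166
```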

There are two exceptions to the IEEE 754 specification. First, denormalized numbers are treated as zero throughout the implementation. Second, symmetric arithmetic rounding (round half-up) is employed rather than the IEEE 754 default rounding mode.
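
The effect of these two exceptions can be shown with a behavioural sketch of the multiplication of two standard numbers. This is a Python model written under assumptions, not the core's actual datapath: the function name `mult_normal` is hypothetical, rounding is assumed to be applied to the magnitude after normalization, and the exact boundary behaviour at MinFloat is assumed. NaN and Inf operands are outside its scope.

```python
def mult_normal(v1, v2):
    """Multiply two 32-bit words, flushing denormals to zero and
    rounding the magnitude half-up (assumed model, see lead-in)."""
    s1, e1, m1 = v1 >> 31, (v1 >> 23) & 0xFF, v1 & 0x7FFFFF
    s2, e2, m2 = v2 >> 31, (v2 >> 23) & 0xFF, v2 & 0x7FFFFF
    sign = (s1 ^ s2) << 31
    if e1 == 0 or e2 == 0:            # zero or denormal operand -> zero
        return sign
    # 24-bit significands with the implicit leading one: 48-bit product
    prod = ((1 << 23) | m1) * ((1 << 23) | m2)
    exp = e1 + e2 - 127
    shift = 23
    if prod >> 47:                    # product in [2, 4): renormalize
        shift, exp = 24, exp + 1
    sig = (prod + (1 << (shift - 1))) >> shift   # add half an LSB, truncate
    if sig >> 24:                     # rounding carried out to 2.0
        sig, exp = sig >> 1, exp + 1
    if exp >= 255:                    # magnitude above MaxFloat -> Inf
        return sign | 0x7F800000
    if exp <= 0:                      # magnitude below MinFloat -> zero
        return sign
    return sign | (exp << 23) | (sig & 0x7FFFFF)


print(hex(mult_normal(0x3FC00000, 0x40000000)))  # 1.5 * 2.0 -> 0x40400000
print(hex(mult_normal(0x3F800003, 0x3FC00000)))  # exact tie  -> 0x3fc00005
```

The second call is an exact tie: round half-up gives a mantissa ending in 5, whereas IEEE 754 round-to-nearest-even would give 0x3FC00004.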

Note that NaN is always generated as the value 0xFFC00000. The maximum floating-point value that may be represented is 0x7F7FFFFF or 0xFF7FFFFF (+/- MaxFloat). Likewise, the minimum non-zero floating-point value that may be represented is 0x00800000 or 0x80800000 (+/- MinFloat). This means that the magnitude of any non-zero real number lies in the range:

$$2^{-126} \le |Value| \le 2^{127} (2 - 2^{-23})$$
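
The two limits can be cross-checked against the hexadecimal patterns quoted above. The following Python sketch uses the standard `struct` module to reinterpret each word as a single-precision value; the helper name `word_to_float` is hypothetical.

```python
import struct


def word_to_float(word):
    """Reinterpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


MAX_FLOAT = word_to_float(0x7F7FFFFF)
MIN_FLOAT = word_to_float(0x00800000)

# Both comparisons are exact: every quantity is representable in a double.
print(MAX_FLOAT == 2.0 ** 127 * (2 - 2.0 ** -23))   # True
print(MIN_FLOAT == 2.0 ** -126)                     # True
```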

All values are sampled on the rising clock-edge of clk when en is high. The function has a 4 clock-cycle latency<sup>2</sup>.
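
The sampling and latency behaviour can be pictured with a simple cycle-based model. The Python sketch below is an assumption-laden illustration rather than a description of the core's internals: the class name `PipelineModel` is hypothetical, `en` is assumed to stall every pipeline stage when low, and the exact cycle on which `vout` becomes valid should be confirmed against the timing diagram.

```python
from collections import deque


class PipelineModel:
    """Cycle-based model of a fixed-latency pipelined function."""

    def __init__(self, func, latency=4):
        self.func = func
        # One slot per pipeline register; None marks "not yet valid"
        self.stages = deque([None] * latency, maxlen=latency)

    def clock(self, v1, v2, en=True):
        """Apply one rising clock edge and return the value on vout."""
        if en:
            # New result enters the first stage, the oldest one drops out
            self.stages.appendleft(self.func(v1, v2))
        return self.stages[-1]


pipe = PipelineModel(lambda a, b: a * b)
outputs = [pipe.clock(a, 2.0) for a in (1.0, 2.0, 3.0, 4.0, 5.0)]
print(outputs)  # [None, None, None, 2.0, 4.0]
```

In this model the first result appears on the fourth enabled clock edge, counting the edge on which the operands were sampled, and one new result follows on every enabled edge after that.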

<sup>1</sup> Some minor features diverge from the IEEE 754 specification

<sup>2</sup> The design may be optimized for higher-speed or lower latency on request. Please contact Zipcores.



### **Functional Timing**

Figure 2 demonstrates the multiplication 0x40001000 \* 0x40407000 = 0x40C0880E (or 2.0010 \* 3.0068 = 6.0166 in real numbers). The result has a latency of 4 clock cycles.



Figure 2: Multiplication of two floating-point numbers

#### Source File Description

All source files are provided as text files coded in VHDL. The following table gives a brief description of each file.

| Source file         | Description          |
|---------------------|----------------------|
| ieee_mult.vhd       | Top-level component  |
| ieee_mult_bench.vhd | Top-level test bench |

### **Functional Testing**

An example VHDL testbench is provided for use in a suitable VHDL simulator. The compilation order of the source code is as follows:

1. ieee\_mult.vhd
2. ieee\_mult\_bench.vhd

The simulation must be run for at least 2 ms, during which time an input stimulus of randomized floating-point numbers will be generated at the multiplier input.

The simulation generates two text files, ieee\_mult\_in.txt and ieee\_mult\_out.txt, which capture the input and output floating-point numbers, respectively, during the course of the test.

# Synthesis

The source file 'ieee\_mult.vhd' is the only file required for synthesis. There are no sub-modules in the design.

The VHDL core is designed to be technology independent. However, as a benchmark, synthesis results have been provided for the Xilinx Virtex 5 and the Altera Stratix III series of FPGA devices. The lowest and highest speed grade devices have been chosen in both cases for comparison.

Resource usage is specified after Place and Route.

#### VIRTEX 5

| Resource type                | Quantity used |
|------------------------------|---------------|
| Slice register               | 96            |
| Slice LUT                    | 154           |
| Block RAM                    | 0             |
| DSP48                        | 6             |
| Clock frequency (worst case) | 170 MHz       |
| Clock frequency (best case)  | 220 MHz       |

#### STRATIX III

| Resource type                | Quantity used |
|------------------------------|---------------|
| Register                     | 214           |
| ALUT                         | 210           |
| Block Memory bit             | 0             |
| DSP block 18                 | 6             |
| Clock frequency (worst case) | 210 MHz       |
| Clock frequency (best case)  | 270 MHz       |

### **Revision History**

| Revision | Change description                                                                          | Date       |
|----------|---------------------------------------------------------------------------------------------|------------|
| 1.0      | Initial revision                                                                            | 30/04/2008 |
| 1.1      | Updated synthesis results                                                                   | 19/08/2009 |
| 1.2      | Updated functional specification. Updated synthesis results in line with minor code changes | 16/09/2011 |