

# Introduction to High Level Synthesis (HLS)

## What is High Level Synthesis (HLS)?

Develop with C++

- User develops FPGA logic with C++
  - HLS automatically creates an HDL version of the C++ code
  - The HDL module becomes an **IPCore**
- User generates firmware with the **IPCore**



## Difficulty with HLS

- C++ was not developed for FPGAs
  - Missing features: arbitrary bit widths, parallel logic, ...
  - HLS adds its own C++ extensions to handle the missing features. These are only used in HLS:
    - `ap_int<N>` for arbitrary bit widths
    - `#pragma` directives for parallel logic
    - ...

## Data type: Arbitrary Precision integers

- C++ only provides a limited set of integer types
  - char (typically 8 bits), short (16 bits), int (32 bits), ...
- HLS provides a types library (`ap_int.h`)
  - It has templated classes: `ap_int<N>` (signed) and `ap_uint<N>` (unsigned)

```cpp
#include <ap_int.h>
ap_uint<9> x;  // 9-bit unsigned
ap_int<10> y;  // 10-bit signed
```

The template parameter `N` is the number of bits.
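To see what the bit width means in practice, here is a plain C++ sketch (no HLS headers needed) that emulates what an N-bit register keeps when a wider value is assigned to it. The helper names `wrap_unsigned` and `wrap_signed` are hypothetical; the sketch assumes the default behavior of `ap_uint<N>`/`ap_int<N>`, which keeps only the low N bits (two's complement for signed types).

```cpp
#include <cstdint>

// Hypothetical helpers (not part of ap_int.h): emulate an N-bit register, 1 <= n <= 62.
// Unsigned: keep only the low n bits, so ap_uint<9> covers 0 .. 511.
uint64_t wrap_unsigned(uint64_t v, unsigned n) {
    return v & ((uint64_t{1} << n) - 1);
}

// Signed (two's complement): keep the low n bits, then sign-extend bit n-1,
// so ap_int<10> covers -512 .. 511.
int64_t wrap_signed(int64_t v, unsigned n) {
    const uint64_t low = static_cast<uint64_t>(v) & ((uint64_t{1} << n) - 1);
    const uint64_t sign_bit = uint64_t{1} << (n - 1);
    if (low & sign_bit) {
        return static_cast<int64_t>(low) - (int64_t{1} << n);  // negative value
    }
    return static_cast<int64_t>(low);
}
```

For example, `wrap_unsigned(512, 9)` returns 0 and `wrap_signed(512, 10)` returns -512: the value wraps around once it no longer fits in N bits.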

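## Data type: Arbitrary Precision fixed-point numbers

`ap_fixed<Total bits, Integer bits, Rounding mode, Overflow mode>` (signed) and `ap_ufixed<...>` (unsigned). For signed types, the integer bits include the sign bit.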

```cpp
#include <ap_fixed.h>

// 9-bit unsigned: 4 integer bits, 5 fraction bits
ap_ufixed<9, 4, AP_TRN, AP_WRAP> x;

// 11-bit signed: 4 integer bits (including the sign bit), 7 fraction bits
ap_fixed<11, 4, AP_TRN, AP_WRAP> y;
```

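Rounding mode: `AP_TRN` (truncation, the default), `AP_RND`, ...

Overflow mode: what to do when a value does not fit in the available integer bits.

- `AP_WRAP` (the default): discards the MSBs that do not fit, so the value wraps around. Needs no extra logic.
- `AP_SAT`: saturates the value to the minimum or maximum representable value. Needs some extra logic.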
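The effect of the rounding and overflow modes can be tried without the HLS headers. Below is a plain C++ sketch, assuming a signed `ap_fixed<W, I, AP_TRN, ...>` behaves as described above: truncation toward minus infinity onto a grid of 2^-(W-I), then wrap-around or saturation to W bits. `quantize_fixed` and `Overflow` are hypothetical names; this is an emulation, not the `ap_fixed` implementation.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical emulation of a signed ap_fixed<W, I, AP_TRN, AP_WRAP or AP_SAT>.
enum class Overflow { Wrap, Sat };

// Returns the value that an ap_fixed<W, I> would hold (assumes 2 <= W <= 62).
double quantize_fixed(double v, int W, int I, Overflow mode) {
    const int F = W - I;  // number of fraction bits
    // AP_TRN: scale by 2^F and truncate toward minus infinity
    int64_t raw = static_cast<int64_t>(std::floor(std::ldexp(v, F)));
    const int64_t min_raw = -(int64_t{1} << (W - 1));
    const int64_t max_raw = (int64_t{1} << (W - 1)) - 1;
    if (mode == Overflow::Sat) {
        // AP_SAT: clamp to the representable range
        if (raw < min_raw) raw = min_raw;
        if (raw > max_raw) raw = max_raw;
    } else {
        // AP_WRAP: keep the low W bits and reinterpret them as two's complement
        const uint64_t low = static_cast<uint64_t>(raw) & ((uint64_t{1} << W) - 1);
        raw = (low > static_cast<uint64_t>(max_raw))
                  ? static_cast<int64_t>(low) - (int64_t{1} << W)
                  : static_cast<int64_t>(low);
    }
    return std::ldexp(static_cast<double>(raw), -F);  // back to a real value
}
```

With `ap_fixed<11, 4>` (range -8 to 7.9921875), 1.7 becomes 1.6953125 (217/128), and 9.0 becomes 7.9921875 with saturation but -7.0 with wrap-around.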

## Designing multiple HDL modules with C++

- C++ functions are usually translated into HDL modules.
  - The arguments of a function become the ports of the HDL module.
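A minimal, hypothetical example of this mapping (not from the original project; plain `int` stands in for the `ap_int` types so it compiles anywhere). In HLS, `scale` and `add_offset` would normally become their own HDL modules (unless HLS chooses to inline them), and the arguments of each function become that module's ports.

```cpp
// Sub-function: would become an HDL module with input port `a` and a return-value port.
int scale(int a) {
    return 3 * a;
}

// Another sub-function / HDL module.
int add_offset(int a) {
    return a + 5;
}

// Top function: `in` becomes an input port; `out`, passed by reference,
// becomes an output port.
void top_chain(int in, int &out) {
    out = add_offset(scale(in));
}
```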



#### C++ arrays in HLS

HLS assumes that a C++ array is a memory component, for example:

`ap_uint<16> higgs[4];`

- The default memory component is BRAM (block RAM).
  - With a single address port, only one element can be read or written per clock cycle.



Memory:

| Address | Data     |
|---------|----------|
| 0       | higgs[0] |
| 1       | higgs[1] |
| 2       | higgs[2] |
| 3       | higgs[3] |

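## C++ arrays in HLS (partitioning)

Array memory can be split up into parts (array partitioning):

- block, cyclic: still use BRAM, but split the array data across several smaller memories. Block puts consecutive elements together; cyclic deals the elements out in turn.
- complete: use CLB resources (registers and LUTs) instead of BRAM for the memory. Every element can be accessed at any time.

*Figure 57: Array Partitioning (elements 0, 1, 2, ..., N-3, N-2, N-1 split by block, cyclic, and complete partitioning)*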
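A small plain C++ sketch of where each array element lands under block and cyclic partitioning. `Location`, `block_partition` and `cyclic_partition` are hypothetical helpers for illustration, and the pragma in the comment is only an example of how the split is requested in recent Vitis HLS versions. Complete partitioning is the limit where every element gets its own storage (factor equal to the array size).

```cpp
// Where element i of an n-element array ends up when the array is
// split into `factor` smaller memories (banks).
// Requested in HLS with a directive along the lines of:
//   #pragma HLS ARRAY_PARTITION variable=higgs type=cyclic factor=2
struct Location {
    int bank;    // which smaller memory
    int offset;  // address inside that memory
};

// block: consecutive chunks; bank 0 gets elements 0..size-1, bank 1 the next chunk, ...
Location block_partition(int i, int n, int factor) {
    const int bank_size = (n + factor - 1) / factor;  // ceiling division
    return {i / bank_size, i % bank_size};
}

// cyclic: round-robin; bank 0 gets elements 0, factor, 2*factor, ...
Location cyclic_partition(int i, int factor) {
    return {i % factor, i / factor};
}
```

For an 8-element array split in two, element 5 goes to bank 1, address 1 with block partitioning, but to bank 1, address 2 with cyclic partitioning.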



## C++ arrays for interface of TOP function

A C++ TOP function can take an array as an argument:

```cpp
void top(ap_uint<16> input[4], ap_uint<16> &output);
```

HLS assumes that the input array is an external memory that feeds data to the function.

The data is read in sequentially, one element per memory access.
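A sketch of a possible body for the TOP function above, assuming its job is to sum the four inputs (the original does not show the body). It uses `uint16_t` in place of `ap_uint<16>` so it compiles with a regular C++ compiler; in HLS, each `input[i]` read would be one access to the external memory.

```cpp
#include <cstdint>

// Hypothetical body: sum the four array elements into a 16-bit output.
void top(const uint16_t input[4], uint16_t &output) {
    uint16_t sum = 0;
    for (int i = 0; i < 4; ++i) {
        // one memory read per element; the cast wraps like a 16-bit register
        sum = static_cast<uint16_t>(sum + input[i]);
    }
    output = sum;
}
```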

## Loops in HLS

- A C++ loop can be translated into many types of HLS loops.
- A loop with multiple statements can be pipelined.
- Some loops cannot be pipelined.
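A minimal sketch of a loop with multiple statements, assuming plain `int` data. `#pragma HLS PIPELINE II=1` asks HLS to start a new iteration every clock cycle; a regular C++ compiler ignores pragmas it does not recognize, so the same code still runs in C++ simulation. `dot4` is a hypothetical example function.

```cpp
// Dot product of two 4-element vectors with a pipelined loop.
int dot4(const int a[4], const int b[4]) {
    int acc = 0;
    for (int i = 0; i < 4; ++i) {
#pragma HLS PIPELINE II=1
        int p = a[i] * b[i];  // statement 1: multiply
        acc += p;             // statement 2: accumulate
    }
    return acc;
}
```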





## Loops in HLS (Unrolling)

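Some loop code can be executed entirely in parallel.

Such loops can be "unrolled": each iteration gets its own copy of the hardware. By default, HLS keeps loops rolled, reusing one copy of the hardware for all iterations.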
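A hypothetical example of a loop whose iterations are independent. With `#pragma HLS UNROLL`, HLS can build 8 multipliers that work in parallel; without it (the default), one multiplier is reused over 8 iterations. For all 8 accesses to happen in the same cycle, the arrays also need enough memory ports, for example by partitioning them completely. The pragma is ignored by a regular C++ compiler.

```cpp
// Doubles each of the 8 inputs; the iterations do not depend on each other.
void scale8(const int in[8], int out[8]) {
    for (int i = 0; i < 8; ++i) {
#pragma HLS UNROLL
        out[i] = 2 * in[i];
    }
}
```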



## Concepts/Terms that HLS uses

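- Iteration latency: number of clock cycles for one iteration (one input) to pass through the logic. Same as the pipeline latency.
- Initiation Interval (II): number of clock cycles between new inputs.
- Total latency: number of clock cycles to process all iterations.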
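The terms above combine into a simple estimate. A sketch, assuming a loop of N iterations and ignoring the few extra cycles HLS may spend entering and leaving the loop: without pipelining, the total latency is N times the iteration latency; with pipelining, it is the iteration latency plus II times (N - 1). The function names are hypothetical.

```cpp
// Not pipelined: each iteration must finish before the next one starts.
long total_latency_sequential(long trip_count, long iteration_latency) {
    return trip_count * iteration_latency;
}

// Pipelined: a new iteration starts every `ii` cycles, and the last
// iteration still needs `iteration_latency` cycles to finish.
long total_latency_pipelined(long trip_count, long iteration_latency, long ii) {
    return iteration_latency + ii * (trip_count - 1);
}
```

For 8 iterations of 10 cycles each, that is 80 cycles without pipelining versus 17 cycles pipelined with II = 1.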



Let's say we have the design shown below:

- There are many ways to implement it...
- The implementation is chosen using `#pragma`.



Figure 7: Task Parallelism within a Run



Figure 8: Task Parallelism with Pipelining



#### What does HLS need to make HDL?

- C++ function/module source files
- C++ testbench source files
- Project configuration: FPGA model, clock frequency, ...

#### What can HLS do?

- C++ simulation: Using testbench C++ code, simulate C++ function code.
- C++ synthesis: Translates C++ into HDL
- C++/HDL co-simulation: Drive the HDL simulation with the stimuli from the C++ testbench and compare its output with the golden C++ output.

#### C++ simulation

- Write the function file.
- Write the testbench file.
- Simulate!



```
INFO: [vitis-run 60-791] Total elapsed time: 0h 0m 15s
 C-simulation finished successfully
INFO: [SIM 211-210] Code Analyzer finished
```

```cpp
#include <cstdio>
#include "ffn.h"

int main() {
  // Get input
  ap_fixed<18,4,AP_TRN,AP_SAT> x_in = 1;
  ap_fixed<18,4,AP_TRN,AP_SAT> y_in = 2;
  ap_fixed<18,4,AP_TRN,AP_SAT> z_out;
  // Run function
  ffn(x_in, y_in, z_out);
  // Get gold output
  ap_fixed<18,4,AP_TRN,AP_SAT> gold;
  // Could get gold values from elsewhere (e.g., a reference model)
  ffn(x_in, y_in, gold);
  // Compare with gold output
  int result = 0;
  if (gold == z_out) {
    printf("Good");
    result = 0;
  } else {
    printf("Error");
    result = 1;
  }
  return result;
}
```

#### FFN function file

Feedforward network with a 2-8-1 node structure (2 inputs, 8 hidden nodes with ReLU activation, 1 output).

```cpp
#include "ffn.h"

void ffn(
    ap_fixed<18,4,AP_TRN,AP_SAT> x_in,
    ap_fixed<18,4,AP_TRN,AP_SAT> y_in,
    ap_fixed<18,4,AP_TRN,AP_SAT> &z_out) {
    #pragma HLS pipeline II=1
    // Layer 1 weights and biases
    static const ap_fixed<18,4,AP_TRN,AP_SAT> w1_x[8] = {1.7,1.5,0.3,0.13,1.11,0.17,0.19,0.23};
    static const ap_fixed<18,4,AP_TRN,AP_SAT> w1_y[8] = {0.13,1.23,0.11,1.17,0.19,0.17,1.27,0.23};
    static const ap_fixed<18,4,AP_TRN,AP_SAT> b1[8] = {0.3,0.27,1.19,0.17,0.13,1.37,0.29,0.1};
    // Layer 2 weights and bias
    static const ap_fixed<18,4,AP_TRN,AP_SAT> w2[8] = {0.23,1.11,0.27,1.3,0.11,0.17,1.13,0.7};
    static const ap_fixed<18,4,AP_TRN,AP_SAT> b2 = 3;
    // Intermediate values
    ap_fixed<18,4,AP_TRN,AP_SAT> s0_inputs[2];
    ap_fixed<18,4,AP_TRN,AP_SAT> l1_mul_products[2][8];
    ap_fixed<18,4,AP_TRN,AP_SAT> l1_pre_acts_wide[8];
    ap_fixed<18,4,AP_TRN,AP_SAT> l1_hidden_act_q[8];
    ap_fixed<18,4,AP_TRN,AP_SAT> l2_mul_products[8];
    ap_fixed<18,4,AP_TRN,AP_SAT> l2_pair_sums_stage4[4];
    ap_fixed<18,4,AP_TRN,AP_SAT> l2_group_sums_stage5[2];
    ap_fixed<18,4,AP_TRN,AP_SAT> z_reg = 0;
    ap_fixed<18,4,AP_TRN,AP_SAT> acc = 0;

    s0_inputs[0] = x_in;
    s0_inputs[1] = y_in;
    for (int i = 0; i < 8; ++i) {
        // Layer 1: weighted sum of both inputs plus bias
        l1_mul_products[0][i] = s0_inputs[0] * w1_x[i];
        l1_mul_products[1][i] = s0_inputs[1] * w1_y[i];
        l1_pre_acts_wide[i] = l1_mul_products[0][i] + l1_mul_products[1][i] + b1[i];
        // ReLU activation
        if (l1_pre_acts_wide[i] <= 0) l1_hidden_act_q[i] = 0;
        else l1_hidden_act_q[i] = l1_pre_acts_wide[i];
        // Layer 2: weight the hidden output and accumulate
        l2_mul_products[i] = l1_hidden_act_q[i] * w2[i];
        acc += l2_mul_products[i];
    }
    z_out = acc + b2;
}
```
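The testbench notes that the gold values could come from elsewhere. One option is an independent floating-point reference model. Below is a sketch with the same 2-8-1 structure and weights as the FFN function file; `ffn_ref` and `close_enough` are hypothetical names, and a tolerance is used because fixed-point rounding makes exact equality unlikely. For the testbench inputs (x_in = 1, y_in = 2) this reference gives about 16.57, which lies outside the range of `ap_fixed<18,4>` (from -8 to just under 8). With `AP_SAT`, the fixed-point `z_out` would therefore be clamped just under 8. A testbench that compares `ffn` with itself cannot catch this; a comparison against an independent reference would.

```cpp
#include <cmath>

// Hypothetical double-precision reference model of the 2-8-1 network.
double ffn_ref(double x, double y) {
    const double w1_x[8] = {1.7, 1.5, 0.3, 0.13, 1.11, 0.17, 0.19, 0.23};
    const double w1_y[8] = {0.13, 1.23, 0.11, 1.17, 0.19, 0.17, 1.27, 0.23};
    const double b1[8] = {0.3, 0.27, 1.19, 0.17, 0.13, 1.37, 0.29, 0.1};
    const double w2[8] = {0.23, 1.11, 0.27, 1.3, 0.11, 0.17, 1.13, 0.7};
    const double b2 = 3.0;
    double acc = 0.0;
    for (int i = 0; i < 8; ++i) {
        const double pre = x * w1_x[i] + y * w1_y[i] + b1[i];  // layer 1
        const double act = pre > 0.0 ? pre : 0.0;              // ReLU
        acc += act * w2[i];                                    // layer 2
    }
    return acc + b2;
}

// Compare a hardware (fixed-point) result with the reference within a tolerance.
bool close_enough(double hw, double ref, double tol) {
    return std::fabs(hw - ref) <= tol;
}
```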

## C++ synthesis

HLS translates the C++ code into HDL.

Below is an excerpt of the automatically generated HDL (VHDL) code.

```vhdl
add_ln32_6_fu_1194_p2 <= std_logic_vector(signed(sext_ln32_10_fu_1191_p1) + signed(sext_ln32_9_fu_1188_p1));
add_ln32_7_fu_1200_p2 <= std_logic_vector(unsigned(add_ln32_6_fu_1194_p2) + unsigned(ap_const_lv19_AE1));
add_ln32_8_fu_1220_p2 <= std_logic_vector(signed(sext_ln32_13_fu_1217_p1) + signed(sext_ln32_12_fu_1214_p1));
add_ln32_9_fu_1226_p2 <= std_logic_vector(unsigned(add_ln32_8_fu_1220_p2) + unsigned(ap_const_lv19_851));
add_ln32_fu_1029_p2 <= std_logic_vector(signed(sext_ln32_1_fu_1026_p1) + signed(sext_ln32_fu_1023_p1));
add_ln37_1_fu_2046_p2 <= std_logic_vector(signed(sext_ln37_fu_2036_p1) + signed(zext_ln37_fu_2033_p1));
add_ln37_2_fu_2162_p2 <= std_logic_vector(unsigned(zext_ln37_1_fu_2154_p1) + unsigned(sext_ln37_1_fu_2151_p1));
add_ln37_3_fu_2238_p2 <= std_logic_vector(signed(sext_ln37_3_fu_2228_p1) + signed(sext_ln37_2_fu_2224_p1));
add_ln37_4_fu_2305_p2 <= std_logic_vector(unsigned(zext_ln37_2_fu_2296_p1) + unsigned(sext_ln37_4_fu_2292_p1));
add_ln37_5_fu_2402_p2 <= std_logic_vector(unsigned(zext_ln37_3_fu_2394_p1) + unsigned(sext_ln37_5_fu_2391_p1));
add_ln37_6_fu_2478_p2 <= std_logic_vector(signed(sext_ln37_7_fu_2468_p1) + signed(sext_ln37_6_fu_2464_p1));
add_ln37_7_fu_2564_p2 <= std_logic_vector(unsigned(zext_ln37_4_fu_2555_p1) + unsigned(sext_ln37_8_fu_2551_p1));
add_ln40_fu_2619_p2 <= std_logic_vector(signed(sext_ln40_fu_2615_p1) + signed(ap_const_lv19_C000));
and_ln30_1_fu_457_p2 <= (tmp_reg_2697 and or_ln30_1_fu_451_p2);
and_ln30_2_fu_524_p2 <= (xor_ln30_2_fu_519_p2 and or_ln30_3_fu_514_p2);
and_ln30_3_fu_546_p2 <= (tmp_9_reg_2720 and or_ln30_4_fu_540_p2);
and_ln30_4_fu_805_p2 <= (xor_ln30_4_fu_800_p2 and or_ln30_6_fu_795_p2);
and_ln30_5_fu_827_p2 <= (tmp_35_reg_2789 and or_ln30_7_fu_821_p2);
and_ln30_fu_435_p2 <= (xor_ln30_fu_430_p2 and or_ln30_fu_425_p2);
          fu_619_p2 <= (tmp_12_reg_2743 and or_ln31_1_fu_613_p2)
```

HLS also reports the latency of the logic and the estimated resource usage.

| Modules & loops | Latency (cycles) | Latency (ns) | Interval | Pipelined | BRAM | DSP | FF   | LUT  | URAM |
|-----------------|------------------|--------------|----------|-----------|------|-----|------|------|------|
| ffn             | 10               | 80.000       | 1        | yes       | 0    | 23  | 1385 | 2609 | 0    |
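The report lists 10 cycles and 80 ns, which implies the clock period used for this run (an inference from the two numbers; the clock setting itself is not shown here):

$$
T_{\text{clk}} = \frac{80\ \text{ns}}{10\ \text{cycles}} = 8\ \text{ns}, \qquad f_{\text{clk}} = \frac{1}{8\ \text{ns}} = 125\ \text{MHz}
$$

With an interval (II) of 1, the module accepts a new input every clock cycle, i.e. every 8 ns, even though each result takes 10 cycles to come out.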

## C++/HDL co-simulation

Compares the HDL simulation output with the golden output.

```
INFO: [Common 17-206] Exiting xsim at Thu Aug 21 13:02:01 2025...
INFO: [COSIM 212-316] Starting C post checking ...
GoodINFO: [COSIM 212-1000] *** C/RTL co-simulation finished: PASS ***
```

Co-simulation also reports the latency measured in the HDL simulation.



#### Make into IPCore

```
INFO: calling package_hls_ip ip_types=vitis sysgen json_file=/home/hepdream/Work/FPGA_class/HLS_FFN/hls/hls/hls/hls_data.json
INFO: Copied 1 ipmisc file(s) to /home/hepdream/Work/FPGA_class/HLS_FFN/hls/hls/impl/ip/misc
INFO: Copied 10 verilog file(s) to /home/hepdream/Work/FPGA_class/HLS_FFN/hls/hls/impl/ip/hdl/verilog
INFO: Copied 10 vhdl file(s) to /home/hepdream/Work/FPGA_class/HLS_FFN/hls/hls/impl/ip/hdl/vhdl
ipx::create_core: Time (s): cpu = 00:00:08; elapsed = 00:00:08. Memory (MB): peak = 1642.391; gain = 39.836; free physical
INFO: Import ports from HDL: /home/hepdream/Work/FPGA_class/HLS_FFN/hls/hls/hls/impl/ip/hdl/vhdl/ffn.vhd (ffn)
INFO: Add clock interface ap_clk
INFO: Add reset interface ap_rst
INFO: Add ap_ctrl interface ap_ctrl
INFO: Add data interface x_in
INFO: Add data interface y_in
INFO: Add data interface z_out
INFO: [IP Flow 19-234] Refreshing IP repositories
INFO: [IP Flow 19-1704] No user IP repositories specified
INFO: [IP Flow 19-2313] Loaded Vivado IP repository '/opt/Xilinx/Vivado/2024.1/data/ip'.
INFO: Calling post_process_vitis to specialize IP
INFO: Calling post_process_sysgen to specialize IP
INFO: Created IP /home/hepdream/Work/FPGA_class/HLS_FFN/hls/hls/hls/impl/ip/component.xml
INFO: Created IP archive /home/hepdream/Work/FPGA_class/HLS_FFN/hls/hls/hls/impl/ip/xilinx_com_hls_ffn_1_0.zip
INFO: [Common 17-206] Exiting Vivado at Thu Aug 21 13:04:34
INFO: [HLS 200-802] Generated output file hls/ffn.zip
```

#### Use the IPCore in Vivado

After adding your HLS IPCore repository to Vivado, you can use your IP in Vivado!


