

# Introduction to HLS4ML

#### HLS4ML

- HLS4ML converts python neural networks to HLS C++.
- HLS converts C++ to HDL.
- Vivado converts HDL to firmware.
- This talk goes through this pipeline.



# Example case used in HLS4ML tutorial [link]

- Train a network to identify what produced an AK8 jet
  - An AK8 jet could come from $W^+W^-, ZZ, t\bar{t}, q\bar{q}, gg, ...$
- Dataset: DOI: 10.5281/zenodo.3602254
  - Generated with Madgraph5, Pythia8
  - 5 labels $(W^+W^-, ZZ, t\bar{t}, q\bar{q}, gg)$ for AK8 jet

#### **Observables**

- $m_{\mathrm{mMDT}}$
- $N_2^{\beta=1,2}$
- $M_2^{\beta=1,2}$
- $C_1^{\beta=0,1,2}$
- $C_2^{\beta=1,2}$
- $D_2^{\beta=1,2}$
- $D_2^{(\alpha,\beta)=(1,1),(1,2)}$
- $\sum z \log z$
- Multiplicity

### Dataset

- 16 features of AK8 jet
- 160k events per class






Jaebak Kim (Korea University)

August 28, 2025 (1st Korea HEP FPGA Forum)

# Making model



- The network can be trained with Keras.
- The trained model is saved in HDF5 format.

## Convert python model with HLS4ML

- HLS4ML generates C++ code for the network.
  - It integerizes the weights and inputs, that is, it represents them with fixed-point types.

```
// hls-fpga-machine-learning insert layer-precision
typedef ap_fixed<16,6> input_t;
typedef ap_fixed<16,6> model_default_t;
// (slide label on this block: weights precision)
typedef ap_fixed<16,6> layer2_t;
typedef ap_uint<1> layer2_index;
typedef ap_fixed<16,6> layer4_t;
typedef ap_fixed<18,8> relu1_table_t;
typedef ap_fixed<16,6> layer5_t;
typedef ap_uint<1> layer5_index;
typedef ap_fixed<16,6> layer7_t;
typedef ap_fixed<18,8> relu2_table_t;
```
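To make the `ap_fixed<16,6>` types above concrete, the sketch below emulates one in plain Python: 16 bits in total, 6 of them integer bits (sign included), which leaves 10 fractional bits. It assumes the default `ap_fixed` modes (truncation and wrap-around). The function name `to_ap_fixed` is made up for this illustration and is not part of HLS4ML.

```python
import math

def to_ap_fixed(x, width=16, integer=6):
    """Emulate ap_fixed<width,integer>, assuming default truncation and wrap-around."""
    frac_bits = width - integer            # 10 fractional bits for <16,6>
    scale = 1 << frac_bits                 # one step is 1/1024
    raw = math.floor(x * scale)            # truncation: round toward minus infinity
    raw &= (1 << width) - 1                # wrap-around: keep only the low `width` bits
    if raw >= 1 << (width - 1):            # reinterpret as a signed two's-complement value
        raw -= 1 << width
    return raw / scale

for value in (0.5, 0.0004, -0.0004, 31.999, 32.0):
    print(f"{value!r:>8} -> {to_ap_fixed(value)!r}")
```

Values smaller than one step (1/1024) are lost, and values outside the range from -32 up to just below 32 wrap around, which is why the precision has to match the range of the weights and inputs.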

```
layer2_t layer2_out[N_LAYER_2];
#pragma HLS ARRAY_PARTITION variable=layer2_out complete dim=0
nnet::dense<input_t, layer2_t, config2>(fc1_input, layer2_out, w2, b2); // fc1
layer4_t layer4_out[N_LAYER_2];
#pragma HLS ARRAY_PARTITION variable=layer4_out complete dim=0
nnet::relu<layer2_t, layer4_t, relu_config4>(layer2_out, layer4_out); // relu1
layer5_t layer5_out[N_LAYER_5];
#pragma HLS ARRAY_PARTITION variable=layer5_out complete dim=0
nnet::dense<layer4_t, layer5_t, config5>(layer4_out, layer5_out, w5, b5); // fc2
layer7_t layer7_out[N_LAYER_5];
#pragma HLS ARRAY_PARTITION variable=layer7_out complete dim=0
nnet::relu<layer5_t, layer7_t, relu_config7>(layer5_out, layer7_out); // relu2
layer8_t layer8_out[N_LAYER_8];
#pragma HLS ARRAY_PARTITION variable=layer8_out complete dim=0
nnet::dense<layer7_t, layer8_t, config8>(layer7_out, layer8_out, w8, b8); // fc3
layer10_t layer10_out[N_LAYER_8];
#pragma HLS ARRAY_PARTITION variable=layer10_out complete dim=0
nnet::relu<layer8_t, layer10_t, relu_config10>(layer8_out, layer10_out); // relu3
layer11_t layer11_out[N_LAYER_11];
#pragma HLS ARRAY_PARTITION variable=layer11_out complete dim=0
nnet::dense<layer10_t, layer11_t, config11>(layer10_out, layer11_out, w11, b11); // output
nnet::softmax<layer11_t, result_t, softmax_config13>(layer11_out, layer13_out); // softmax
```

### HLS4ML tools to check the network

- How good is the network?
  - HLS4ML can check how good the integerization is (dotted lines)
  - The result is compared with the floating-point calculation.
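The idea behind this check can be sketched without HLS4ML: run the same layer once in floating point and once with values rounded to the fixed-point grid, then look at the difference. The code below is a simplified stand-in, not the HLS4ML tool itself. It uses a single dense layer with random, hypothetical weights, quantizes only the inputs, weights, biases, and outputs, and ignores overflow.

```python
import math
import random

def quantize(x, width=16, integer=6):
    """Round x down to the ap_fixed<width,integer> grid (no overflow handling)."""
    scale = 1 << (width - integer)
    return math.floor(x * scale) / scale

def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]

rng = random.Random(0)   # fixed seed so the sketch is repeatable
n_in, n_out = 16, 5      # 16 features in, 5 classes out (hidden layers omitted)
x = [rng.uniform(-1, 1) for _ in range(n_in)]
w = [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]
b = [rng.uniform(-1, 1) for _ in range(n_out)]

float_out = dense(x, w, b)
fixed_out = [
    quantize(v)
    for v in dense(
        [quantize(v) for v in x],
        [[quantize(v) for v in row] for row in w],
        [quantize(v) for v in b],
    )
]
max_err = max(abs(f - q) for f, q in zip(float_out, fixed_out))
print(f"largest float vs fixed-point difference: {max_err:.5f}")
```

A small difference here suggests the chosen precision is adequate. A large one suggests more bits are needed, or a different split between integer and fractional bits.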



#### Convert HLS C++ to HDL

- Using Vitis HLS, C++ can be converted to HDL.
- The resulting design uses too many DSPs for the small Pynq-Z2 FPGA.

```
entity myproject is
port (
    ap_clk : IN STD_LOGIC;
    ap_rst : IN STD_LOGIC;
    ap_start : IN STD_LOGIC;
    ap_done : OUT STD_LOGIC;
    ap_idle : OUT STD_LOGIC;
    ap_ready : OUT STD_LOGIC;
    fc1_input_ap_vld : IN STD_LOGIC;
    fc1_input : IN STD_LOGIC_VECTOR (255 downto 0);
    layer13_out_0 : OUT STD_LOGIC_VECTOR (15 downto 0);
    layer13_out_0_ap_vld : OUT STD_LOGIC;
    layer13_out_1 : OUT STD_LOGIC_VECTOR (15 downto 0);
    layer13_out_1_ap_vld : OUT STD_LOGIC;
    layer13_out_2 : OUT STD_LOGIC_VECTOR (15 downto 0);
    layer13_out_2_ap_vld : OUT STD_LOGIC;
    layer13_out_3 : OUT STD_LOGIC_VECTOR (15 downto 0);
    layer13_out_3_ap_vld : OUT STD_LOGIC;
    layer13_out_4 : OUT STD_LOGIC_VECTOR (15 downto 0);
    layer13_out_4_ap_vld : OUT STD_LOGIC );
end;
```
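The 256-bit `fc1_input` port is consistent with 16 features of 16 bits each, and the five 16-bit outputs match the five classes. A minimal Python sketch of such packing follows. The ordering (feature 0 in the lowest 16 bits) and the helper names are assumptions for illustration only, so the generated testbench should be checked before relying on them.

```python
def pack_features(raw_values, bits=16):
    """Pack raw two's-complement integers into one wide integer, element 0 in the lowest bits."""
    mask = (1 << bits) - 1
    word = 0
    for i, raw in enumerate(raw_values):
        word |= (raw & mask) << (i * bits)   # mask turns negatives into their 16-bit pattern
    return word

def unpack_features(word, count=16, bits=16):
    """Inverse of pack_features: split the wide integer back into signed values."""
    mask = (1 << bits) - 1
    values = []
    for i in range(count):
        raw = (word >> (i * bits)) & mask
        if raw >= 1 << (bits - 1):           # restore the sign
            raw -= 1 << bits
        values.append(raw)
    return values

raw = [i - 8 for i in range(16)]             # 16 hypothetical raw fixed-point values
word = pack_features(raw)
print(f"{word:064x}")                        # 64 hex digits = 256 bits
```

The raw integers are the fixed-point values scaled by $2^{10}$ in the `ap_fixed<16,6>` case, so a feature value of 0.5 would be stored as 512.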



#### Convert HLS C++ to HDL

The design can be exported as an IP core with Vitis HLS.

INFO: Created IP /home/hepdream/Work/FPGA\_class/ML4HLS\_helloworld/src/model\_1/hls4ml\_prj/hls/hls/hls/impl/ip/component.xml

INFO: Created IP archive /home/hepdream/Work/FPGA\_class/ML4HLS\_helloworld/src/model\_1/hls4ml\_prj/hls/hls/hls/impl/ip/xilinx\_com\_hls\_myproject\_1\_0.zip

The IP core can then be used in Vivado.



# **HLS4ML** scripts

- HLS4ML has scripts that use Vitis HLS and Vivado.
- Firmware can be made using only the HLS4ML python script.
  - Sometimes it is easier to do things with python.
  - Sometimes it is easier to do things with the Vitis HLS and Vivado GUIs.

## Optimization

- The network can be optimized.
- There are many types of optimization:
  - Performance optimization (Ex: Better AUC)
  - Resource reduction optimization (Ex: Fewer DSPs)
  - Latency reduction optimization
  - Initiation Interval (II) reduction optimization
- But these goals often work against each other.

# Many tricks in optimization

- The integerization precision could be changed.
- The weights can be restricted to integerized values during training.
  - Then there is no difference between the python model and the integerized C++ model.
- Small weights could be set to 0 to reduce resources.
- Resources could be reused using "rolling" loops.
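As an illustration of the "small weights set to 0" trick, the sketch below zeroes every weight whose magnitude is under a threshold and counts how many multiplications remain. The weight values, the threshold, and the function names are hypothetical. In practice pruning is usually done gradually during training rather than in one step afterwards.

```python
def prune_small_weights(weights, threshold):
    """Return a copy of the weight matrix with small-magnitude entries set to 0."""
    return [[0.0 if abs(w) < threshold else w for w in row] for row in weights]

def count_nonzero(weights):
    """Number of weights that still need a multiplier."""
    return sum(1 for row in weights for w in row if w != 0.0)

# Hypothetical 2 x 3 weight matrix
weights = [
    [0.50, -0.01, 0.20],
    [0.003, -0.75, 0.02],
]
pruned = prune_small_weights(weights, threshold=0.05)
print(pruned)
print(f"multiplications: {count_nonzero(weights)} -> {count_nonzero(pruned)}")
```

A multiplication by a weight that is exactly 0 does not need to be built in hardware, which is where the resource saving comes from.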

# Example of optimization

Previously we saw that too many resources were used.

Pynq-Z2 resources: BRAM(140), DSP(220), FF(106k), LUT(54k)

We can try to "roll" loops to "reuse" resources and reduce the bit size to fit the design on the FPGA. (But the performance is bad, and the II is large.)
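The trade-off can be estimated with simple arithmetic. If each multiplier is reused `reuse_factor` times, a dense layer with `n_in * n_out` multiplications needs roughly that many multipliers divided by the reuse factor, while the II grows roughly with the reuse factor. The sketch below applies this to the DSP budget of 220 quoted above. The hidden layer sizes are hypothetical (the slides only fix 16 inputs and 5 outputs), and it assumes one DSP per multiplier, which is only an approximation.

```python
import math

def multipliers_needed(layer_sizes, reuse_factor):
    """Rough multiplier count for a chain of dense layers at a given reuse factor."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += math.ceil(n_in * n_out / reuse_factor)
    return total

DSP_BUDGET = 220                     # Pynq-Z2 DSP count from the slide
LAYER_SIZES = [16, 64, 32, 32, 5]    # hypothetical: only 16 inputs and 5 outputs are given

first_fit = None
for reuse_factor in (1, 2, 4, 8, 16, 32):
    needed = multipliers_needed(LAYER_SIZES, reuse_factor)
    fits = needed <= DSP_BUDGET
    if fits and first_fit is None:
        first_fit = reuse_factor
    print(f"reuse factor {reuse_factor:>2}: ~{needed:>4} multipliers, fits: {fits}")
```

Under these assumptions the fully unrolled network needs far more multipliers than the board has DSPs, and only a large reuse factor brings it under budget, at the cost of a larger II.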



How much did you understand? www.kahoot.it