Hello! It is now time to create our first implementation of the Smith-Waterman algorithm using Xilinx SDAccel. I will start by opening a terminal on my machine and sourcing the “settings64.sh” script in the SDAccel installation folder. Now, simply typing “sdx” launches SDAccel. The first thing we need to specify is a workspace for our projects. For this example, I will call it workspaceTutorial. Those of you who are familiar with Eclipse will notice that the SDAccel GUI is based on it. On the left side of the screen we have the Project Explorer, along with a tab called Reports. In the lower central part there are three tabs: Problems, Console and Properties. Finally, on the right side of the screen we have the Outline tab. To start a new project, we click on New > Xilinx SDx Project and provide a name for the project. Now we have to choose our target platform. Let’s start with the FPGA board based on the Xilinx Virtex-7. Let’s click on Next, and then Finish, as we want an empty project. Now we have to import the kernel and host files for our application. Right-click on src in the Project Explorer and then choose Import > File System. Browse to the folder where the files are located, and then click OK. Let’s ignore the Makefile for now; I will show you later how to use it. Now we have the compute_matrices.cpp file, which we used before to calculate the operational intensity, and a file called maincl.cpp. Next, we need to tell SDAccel which function is our kernel. On the left side, let’s click on project.sdx and then on Add Hardware Function in the central part of the screen. This button will automatically create a binary container for our kernel. In order to see our kernel function, we need to tick “show non-qualified function”.
This is necessary because we are providing SDAccel with a C/C++ kernel rather than an OpenCL one, and SDAccel cannot automatically detect the kernel function unless we add some instructions to the code, which we will see later on. So, let’s pick our compute_matrices function, without changing the number of compute units or the width of the memory ports. Now let’s try to build our kernel for Software Emulation. As you can see from the Console, SDAccel is complaining and the compilation of the kernel fails. This is because we are missing some key instructions in our kernel file. It is possible to modify the kernel file directly from the SDAccel GUI; however, I suggest using Vivado HLS, as it helps users specify the different pragmas inside the code. To launch Vivado HLS, we just need to click on the hardware function name and then on the Vivado HLS symbol. This button automatically creates a Vivado HLS project. If we now look at the compute_matrices.cpp file, we can see that the framework has automatically added some comments telling us that a C/C++ kernel needs specific memory interfaces in order to be compliant with SDAccel. In particular, we need to create an AXI master interface for each pointer and an AXI Lite interface for each scalar. Finally, we need to provide an interface for the return, which is used to control the kernel. To specify the interfaces, we can either use the Directive tab on the right side of the screen or write them manually in the code. To use the Directive tab, click on each input/output parameter and fill in all the required fields. In this example, since the tool is already suggesting the pragma to put into the source file, we will just copy and paste it and then adjust the relevant fields. Let’s specify a Master AXI port and an AXI Lite port for each of our inputs. Finally, let’s specify the pragma for the return, and delete the comment.
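To make these interface requirements concrete, here is a minimal sketch of what an annotated C/C++ kernel could look like. The function name compute_matrices_sketch, its arguments, the bundle names gmem and control, and the scoring constants are hypothetical choices for illustration, not the contents of the tutorial’s compute_matrices.cpp. The body is a plain Smith-Waterman score-matrix fill with a linear gap penalty. Each pointer gets a Master AXI port plus an AXI Lite port (through which the host passes the buffer address), each scalar gets an AXI Lite port, and the return gets the AXI Lite control port. Wrapping the function in extern "C" is a common convention for SDAccel C++ kernels, assumed here.

```cpp
// Hypothetical Smith-Waterman kernel annotated with the interfaces SDAccel expects.
// Scoring constants are illustrative assumptions.
const int MATCH = 2;
const int MISMATCH = -1;
const int GAP = -1;

extern "C" {
// Fills h, an (n+1) x (m+1) row-major score matrix, and writes the best local score to *best.
void compute_matrices_sketch(const char *query, const char *database,
                             int *h, int *best, int n, int m) {
// Pointers: Master AXI port for the data, AXI Lite port for the address.
#pragma HLS INTERFACE m_axi port=query offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=database offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=h offset=slave bundle=gmem
#pragma HLS INTERFACE m_axi port=best offset=slave bundle=gmem
#pragma HLS INTERFACE s_axilite port=query bundle=control
#pragma HLS INTERFACE s_axilite port=database bundle=control
#pragma HLS INTERFACE s_axilite port=h bundle=control
#pragma HLS INTERFACE s_axilite port=best bundle=control
// Scalars: AXI Lite only.
#pragma HLS INTERFACE s_axilite port=n bundle=control
#pragma HLS INTERFACE s_axilite port=m bundle=control
// Return: the control interface used to start the kernel and detect completion.
#pragma HLS INTERFACE s_axilite port=return bundle=control

  const int cols = m + 1;
  // First row and first column are zero in Smith-Waterman.
  for (int j = 0; j <= m; ++j) h[j] = 0;
  for (int i = 1; i <= n; ++i) h[i * cols] = 0;

  int max_score = 0;
  for (int i = 1; i <= n; ++i) {
    for (int j = 1; j <= m; ++j) {
      int diag = h[(i - 1) * cols + (j - 1)] +
                 (query[i - 1] == database[j - 1] ? MATCH : MISMATCH);
      int up = h[(i - 1) * cols + j] + GAP;
      int left = h[i * cols + (j - 1)] + GAP;
      int score = 0;  // local alignment: scores never drop below zero
      if (diag > score) score = diag;
      if (up > score) score = up;
      if (left > score) score = left;
      h[i * cols + j] = score;
      if (score > max_score) max_score = score;
    }
  }
  *best = max_score;
}
}
```

An ordinary C++ compiler ignores the HLS pragmas (at most emitting an unknown-pragma warning), so a file like this can also be compiled and checked on the host.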
You may now observe that the Directive tab on the right lists all the interfaces we have specified in our kernel file. Let’s try to synthesize our kernel inside Vivado HLS to see whether we specified all the pragmas correctly. Synthesis is a good way to check that everything is correct and to get insight into whether the pragmas we insert are being applied as intended. The result of synthesis is a “Synthesis Report”, which provides the estimated latency at different levels as well as information about the loops in the code. In this example, Vivado HLS automatically tried to pipeline the for loop in the code, obtaining an initiation interval (the number of clock cycles between the start of two consecutive iterations) of 20. If we look at the Console, we can see that Vivado HLS cannot achieve an initiation interval of 1 for this loop because it found multiple dependencies inside it. For each of these dependencies, it points to the corresponding location in the code to help the designer remove the dependency, where possible. For now, let’s save and close Vivado HLS so that we can go back to SDAccel to run the emulations and then build the kernel to execute on the FPGA.
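As an illustration of the kind of dependency the Console reports, and of one possible way to restructure a loop around it, here is a sketch under stated assumptions. In a straightforward version of the recurrence, every cell reads its up, left and diagonal neighbours back from the same external array it has just written, so an iteration may have to wait for the previous write to complete. In this variant the previous row is kept in a local buffer and the left and diagonal values are carried in scalars. The function name sw_best_score_rowbuf, the MAX_M bound and the scoring constants are hypothetical, and this code has not been synthesized, so the initiation interval it actually achieves must be read from the Synthesis Report. Note that the left neighbour H[i][j-1] remains a genuine loop-carried dependency of Smith-Waterman itself.

```cpp
#include <algorithm>

// Illustrative assumptions: scoring constants and on-chip row buffer size.
const int MATCH = 2;
const int MISMATCH = -1;
const int GAP = -1;
const int MAX_M = 1024;

// Returns the best local alignment score, or -1 if m exceeds the row buffer.
int sw_best_score_rowbuf(const char *query, const char *database, int n, int m) {
  if (m > MAX_M) return -1;  // hypothetical guard for the fixed-size buffer

  int prev_row[MAX_M + 1];   // holds row i-1, overwritten in place with row i
  for (int j = 0; j <= m; ++j) prev_row[j] = 0;

  int best = 0;
  for (int i = 1; i <= n; ++i) {
    int left = 0;  // H[i][j-1], starting from H[i][0] = 0
    int diag = 0;  // H[i-1][j-1], starting from H[i-1][0] = 0
    for (int j = 1; j <= m; ++j) {
#pragma HLS PIPELINE II=1
      int up = prev_row[j];  // H[i-1][j], read before it is overwritten
      int s = diag + (query[i - 1] == database[j - 1] ? MATCH : MISMATCH);
      s = std::max(s, up + GAP);
      s = std::max(s, left + GAP);
      s = std::max(s, 0);
      diag = up;          // H[i-1][j] becomes the diagonal for column j+1
      prev_row[j] = s;    // store H[i][j] for the next row
      left = s;           // carried to the next iteration: the inherent dependency
      best = std::max(best, s);
    }
  }
  return best;
}
```

The PIPELINE pragma only requests II=1; whether the tool meets it depends on the target clock and the remaining dependencies, which is exactly what the Console messages help diagnose.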