Intel FPGA SDK for OpenCL: Intel Stratix 10 GX FPGA Development Kit Reference Platform Porting Guide
OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission of the Khronos Group™.
The Intel® FPGA SDK for OpenCL™ is based on a published Khronos specification, and has passed the Khronos Conformance Testing Process. Current conformance status can be found at www.khronos.org/conformance.
Introduction to Intel Reference Platform
The Intel® FPGA SDK for OpenCL™ allows you to target Intel® FPGA devices either on reference platforms provided by Intel® or Intel® board partners, or on your own custom platforms. A typical setup for using the SDK is illustrated in the following image:
The setup consists of a host application that runs on the host processor and offloads kernel tasks to the FPGA. The SDK compiler converts the OpenCL kernel to a hardware circuit. Leveraging this capability for your FPGA platform requires an Intel® FPGA SDK for OpenCL™-compatible Board Support Package (BSP). The BSP describes the reference platform to the SDK.
The following illustration depicts segments of the Intel® FPGA SDK for OpenCL™ solution:
Your host application communicates with the BSP layers through the Hardware Abstraction Layer (HAL). A typical Intel® BSP consists of software layers and a hardware project created using the Intel® Quartus® Prime Pro Edition software. The hardware project consists of FPGA board peripheral IPs and custom IPs.
The following illustration depicts the hardware project components, and how these components communicate with software layers:
In Figure 3, the left side depicts the host application running on the host processor, while the right side depicts the FPGA hardware acceleration board. If you are developing a custom platform to run your software applications, you must create a custom BSP for your platform. In that case, your custom BSP must include everything that appears in blue in the image, as follows:
- On the FPGA side, your custom BSP must include all hardware necessary to communicate with the host and the memory, that is, DDR and/or QDR memory interfaces, the DMA host interface (which can be PCIe), and any streaming interfaces to be implemented as channels. The Intel® FPGA SDK for OpenCL™ compiles your OpenCL kernel into a data flow circuit, connects to the BSP hardware components, and generates an FPGA image for this combined circuit, which is used to configure the FPGA.
- On the host side, your custom BSP must provide the Memory Mapped Device layer (MMD) (in the form of a library) to facilitate communication between OpenCL libraries and your hardware. When you compile your host application, the host application links with both the Intel® FPGA SDK for OpenCL™ and MMD libraries to form a host executable.
Intel® provides reference BSPs for Intel® FPGA development kits. Most of these reference BSPs are included in the installation directory of the Intel® FPGA SDK for OpenCL™:
Reference BSP | Install Path |
---|---|
Intel® Arria® 10 GX FPGA Reference Platform | Installed with the Intel® FPGA SDK for OpenCL™ in INTELFPGAOCLSDKROOT/board/a10_ref. |
Intel® Stratix® 10 GX FPGA Reference Platform | Installed with the Intel® FPGA SDK for OpenCL™ in INTELFPGAOCLSDKROOT/board/s10_ref. |
Intel® Arria® 10 SoC FPGA Reference Platform | Installed with the Intel® FPGA SDK for OpenCL™ in INTELFPGAOCLSDKROOT/board/a10soc. |
You can use one of the above BSPs as a reference to get started with custom BSP development. You can also use a reference platform from one of Intel® FPGA's preferred board partners and download the BSP from the partner's website if it matches your hardware requirements.
Intel Stratix 10 GX FPGA Development Kit Reference Platform: Prerequisites
Prerequisites for the s10_ref Reference Platform:
- An Intel® Stratix® 10-based accelerator card with working PCI Express* (PCIe*) and memory interfaces
Attention:
The native Stratix 10 GX FPGA Development Kit does not automatically work with the SDK. Before using the Stratix 10 GX FPGA Development Kit with the SDK, you must set up the board by following the steps in the bring-up guide (included in the INTELFPGAOCLSDKROOT/board/s10_ref/bringup directory), or contact your field applications engineer or regional support center representative to configure the development kit for you.
Alternatively, contact Intel® Premier Support for assistance.
- Intel® Quartus® Prime Pro Edition software
- Designing with Logic Lock regions
General prerequisites:
- FPGA architecture, including clocking, global routing, and I/Os
- High-speed design
- Timing analysis
- Platform Designer design and Avalon® interfaces
- Tcl scripting
- PCIe
- DDR4 external memory
This document also assumes that you are familiar with the following Intel® FPGA SDK for OpenCL™-specific tools and documentation:
- Custom Platform Toolkit and the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide
- Intel® Arria® 10 Reference Platform (a10_ref) and the Intel® FPGA SDK for OpenCL™ Intel® Arria® 10 GX FPGA Development Kit Reference Platform Porting Guide
The whole software stack in the s10_ref is derived from the a10_ref Reference Platform.
Features of the Intel Stratix 10 GX FPGA Development Kit Reference Platform
The Intel® Stratix® 10 GX FPGA Development Kit Reference Platform targets a subset of the hardware features available in the Intel® Stratix® 10 GX FPGA Development Kit.

Features of the s10_ref Reference Platform:
- OpenCL Host
The s10_ref Reference Platform uses a PCIe-based host that connects to the Intel® Stratix® 10 PCIe Gen3x8 hard IP core.
- OpenCL Global Memory
The hardware provides one 2-gigabyte (GB) DDR4 SDRAM daughtercard that is mounted on the HiLo connector and instantiated in the design using Intel® Stratix® 10 External Memory Interface IP (J14 in Figure 4).
- FPGA Programming
Via external cable and the Intel® Stratix® 10 GX FPGA Development Kit's on-board Intel® FPGA Download Cable II interface.
- Guaranteed Timing
The s10_ref Reference Platform relies on the Intel® Quartus® Prime Pro Edition compilation flow to provide guaranteed timing closure. The timing-clean s10_ref Reference Platform is preserved in the form of a precompiled post-fit netlist (that is, the base.qdb Intel® Quartus® Prime Database Export File). The Intel® FPGA SDK for OpenCL™ Offline Compiler imports this preserved post-fit netlist into each OpenCL kernel compilation.
Intel Stratix 10 GX FPGA Development Kit Reference Platform Board Variants
To compile your OpenCL kernel for a specific board variant, include the -board=<board_name> option in your aoc command (for example, aoc -board=s10gx myKernel.cl).
Contents of the Intel Stratix 10 GX FPGA Development Kit Reference Platform
Windows File or Folder | Linux File or Directory | Description |
---|---|---|
board_env.xml | board_env.xml | eXtensible Markup Language (XML) file that describes the Reference Platform to the Intel® FPGA SDK for OpenCL™ . |
bringup | bringup | Contains initialization binaries and Intel® Stratix® 10 Development Kit Initialization guide (S10_DevKit_Initialization). |
hardware | hardware | Contains the Intel® Quartus® Prime project templates for the s10gx board variant. See Table 3 for a list of files in this directory. |
windows64 | linux64 | Contains the MMD library, kernel mode driver, and executable files of the SDK utilities (that is, install, uninstall, flash, program, diagnose) for your 64-bit operating system. |
source | source | Contains the source code for the MMD library and SDK utilities. The compiled MMD library and SDK utilities are in the windows64 folder (Windows) or the linux64 directory (Linux). |
scripts | scripts | Contains the find_jtag_cable.tcl script that is useful in identifying the cable and index number required for FPGA programming. |
File | Description |
---|---|
mem.qsys | Platform Designer system that, together with the .ip files in the ip/mem/ sub-directory, implements the mem component. It can be recognized as the memory subsystem instantiated in the board.qsys. It contains EMIF hard IP, AVMM S10 CCBs (Clock Crossing Bridge), ACL Uniphy Status and ACL SW Reset (Calibration) components. |
board.qsys | Platform Designer system that implements the board interfaces (that is, the static region) of the OpenCL hardware system. |
device.tcl | Tcl file that is included in all revisions and contains all device-specific information (for example, device family, ordering part number (OPN), voltage settings, pin assignments and so on). It is sourced in other QSF files, such as opencl_bsp_ip.qsf and flat.qsf. |
opencl_bsp_ip.qsf | Intel® Quartus® Prime Settings File that collects all the required .ip files in a unique location. During flat and base revision compilations, the board.qsys and mem.qsys related IP files are added to the opencl_bsp_ip.qsf file. |
flat.qsf | Intel® Quartus® Prime Settings File for the flat project revision. This file includes all common settings, such as VID and global signal settings, that are used in other revisions of the project (that is, base and top). The base.qsf and top.qsf files include, by reference, all settings in the flat.qsf file. The Intel® Quartus® Prime software compiles the flat revision with minimal constraints. The flat revision compilation does not generate a base.qar file that you can use for future import compilations and does not implement the guaranteed timing flow. It is used to make edits and check functionality of the design. |
base.qsf | Intel® Quartus® Prime Settings File for the base project revision. This file includes, by reference, all the settings in the flat.qsf file. The Intel® Quartus® Prime Pro Edition software compiles the base project revision from source code, unlike the top revision compilation, which uses the base.qar output. |
top.qsf | Intel® Quartus® Prime Settings File for the SDK-user compilation flow (import compilation flow). |
top.v | Top-level Verilog Design File for the OpenCL hardware system. |
top.sdc | Synopsys Design Constraints File that contains board-specific timing constraints. |
top_post.sdc | Platform Designer and Intel® FPGA SDK for OpenCL™ IP-specific timing constraints. |
ip/mem/<file_name> | Directory containing the .ip files that the Intel® Quartus® Prime Pro Edition software needs to parameterize the mem component. Along with mem.qsys, files in this directory are required for flat and base revision compiles. These are added to the flow by pre_flow_pr.tcl. |
ip/board/<file_name> | Directory containing the .ip files that the Intel® Quartus® Prime Pro Edition software needs to parameterize the board instance. Along with board.qsys, files in this directory are required for flat and base revision compiles. These are added to the flow by pre_flow_pr.tcl. |
ip/freeze_wrapper.v | Verilog Design File that implements the freeze logic. Freeze logic allows you to construct the building blocks for a design that is suitable for Partial Reconfiguration. |
ip/pr_region.v | Verilog Design File that contains the Partial Reconfiguration (PR) region logic. |
ip/temperature/<file_name> | A wrapper to the actual temperature IP that has an Avalon streaming interface. The wrapper sets the necessary default values and converts the interface to an AVMM interface so that it can be used by other IPs. |
ip/irq_controller/<file_name> | IP that receives interrupts from the OpenCL kernel system and sends message signaled interrupts (MSI) to the host. Refer to the Message Signaled Interrupts section for more information. |
compile_script.tcl | Tcl script for SDK compilation flows. |
scripts/create_fpga_bin_pr.tcl | Tcl script that generates the ELF binary file, fpga.bin from .sof, .rbf, and pr_base.id files. The fpga.bin file contains all files necessary for configuring the FPGA. |
scripts/qar_ip_files.tcl | Tcl script that packages up base.qdb, pr_base.id, base.sdc, board and mem Platform Designer system generation output during base revision compile. |
scripts/helpers.tcl | Tcl script with helper functions used by qar_ip_files.tcl. |
scripts/post_flow_pr.tcl | Tcl script that runs after every Intel® Quartus® Prime Pro Edition software compilation. It facilitates the guaranteed timing flow by setting the kernel clock PLL, generating a small report in the acl_quartus_report.txt file, and rerunning STA with the modified kernel clock settings. |
scripts/pre_flow_pr.tcl | Tcl script that executes before the invocation of the Intel® Quartus® Prime software compilation. Running the script generates the Platform Designer HDL for board.qsys. |
scripts/get_static_region_kernel_fmax.tcl | Tcl script that generates reports for the worst kernel clock paths in the static region. Note: The static region is the region where the precompiled BSP hardware design is placed. |
scripts/regenerate_cache.tcl | Helper scripts for bak flow. |
scripts/base_write_sdc.tcl | Tcl script to save the SDC from a base revision compile. |
scripts/create_acds_ver_hex.tcl | Tcl script to burn Intel® Quartus® Prime software version to ROM during compile. |
adjust_plls.tcl | Tcl script that is not part of the scripts directory, but it is an important script to know about. This PLL adjustment script for the kernel clock PLL guarantees timing closure on the kernel clock by setting it to the maximum allowed frequency. |
base.qar | Intel® Quartus® Prime Archive File that contains base.qdb, pr_base.id, base.sdc, and the board and mem Platform Designer generation output from the base revision compile. This Intel® Quartus® Prime Archive file is generated by the scripts/post_flow_pr.tcl file during base revision compile and is used during top revision compilation. |
top.qpf | Intel® Quartus® Prime Project File for the OpenCL hardware system. |
quartus.ini | Contains any special Intel® Quartus® Prime software options that you need to compile OpenCL kernels for the s10_ref Reference Platform. |
iface.ipx | Specifies the relative paths of directories to search for IP cores. In general, IP Index (.ipx) files facilitate faster searches. iface.ipx is a top-level .ipx file that references the hw_iface.iipx and sw_iface.iipx intermediate IP Index files. |
hw_iface.iipx | Intermediate IP Index file. It is an XML file that consists of <component> elements with attributes to define some of the BSP components. |
sw_iface.iipx | Intermediate IP Index file. |
board_spec.xml | XML file that provides the definition of the board hardware interfaces to the SDK. |
Intel Stratix 10 GX FPGA Development Kit Reference Platform Design Architecture
- Host-to-Intel Stratix 10 FPGA Communication over PCIe
The Intel® Stratix® 10 GX FPGA Development Kit Reference Platform instantiates the Intel® Stratix® 10 PCIe* hard IP in the board.qsys file to implement a host-to-device connection over PCIe*.
- DDR4 as Global Memory for OpenCL Applications
The Intel® Stratix® 10 GX FPGA Development Kit has one bank of 2GB x72 DDR4-1866 SDRAM.
- Host Connection to OpenCL Kernels
The PCIe® host needs to pass commands and arguments to the OpenCL™ kernels via the control register access (CRA) Avalon® slave port that each OpenCL kernel generates.
- Partial Reconfiguration
The Intel® Stratix® 10 GX FPGA Development Kit Reference Platform uses partial reconfiguration (PR) as the default mechanism to reconfigure the OpenCL kernel-related partition of the design without altering the static board interface, which remains in a running state.
- Other Components in the Reference Design
- Intel Stratix 10 FPGA System Design
To integrate all components, close timing, and deliver a post-fit netlist that functions in the hardware, you must first address several additional FPGA design complexities.
- Guaranteed Timing Closure of the Intel Stratix 10 GX FPGA Development Kit Reference Platform Design
One of the key features of the Intel® FPGA SDK for OpenCL™ is that it abstracts away hardware details, such as timing closure, for software developers.
- Intel FPGA SDK for OpenCL Compilation Flows
The BSP contains scripts that facilitate all kernel compiles based on the revision, that is, flat, base, or top. These scripts are located in the hardware/s10gx/scripts and hardware/s10gx directories.
- Addition of Timing Constraints
In the Intel® Stratix® 10 FPGA Development Kit Reference Platform, the top.sdc file contains all timing constraints applicable before IP instantiation in Platform Designer. The top_post.sdc file contains timing constraints applicable after Platform Designer generation is run.
- Connection of the Intel Reference Platform to the Intel FPGA SDK for OpenCL
A Custom Platform must include a board_env.xml file to describe its general contents to the Intel® FPGA SDK for OpenCL™ Offline Compiler.
- Intel Stratix 10 FPGA Programming Flow
There are two ways to program the Intel® Stratix® 10 FPGA for the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform: Flash and quartus_pgm.
- Implementation of Intel FPGA SDK for OpenCL Utilities
The Intel® Stratix® 10 GX FPGA Development Kit Reference Platform includes a set of Intel® FPGA SDK for OpenCL™ utilities for managing the FPGA board.
- Considerations in Intel Stratix 10 GX FPGA Development Kit Reference Platform Implementation
The implementation of the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform includes some workarounds that address certain known issues in the Intel® Quartus® Prime Pro Edition software.
Host-to-Intel Stratix 10 FPGA Communication over PCIe
Instantiation of Intel Stratix 10 PCIe Hard IP with Direct Memory Access
Dependencies
- Intel® Stratix® 10 PCIe* hard IP core
- Parameter Settings section of the Intel® Stratix® 10 Avalon® -MM DMA Interface for PCIe Solutions User Guide
Parameter(s) | Setting |
---|---|
System Settings | |
Application interface type | Avalon®-MM with DMA. This Avalon® Memory-Mapped (Avalon®-MM) interface instantiates the embedded DMA of the PCIe® hard IP core. Check the Enable Avalon-MM DMA option under Avalon-MM settings. |
Hard IP mode | Gen3x8, Interface: 256-bit, 250 MHz; Number of Lanes: x8; Lane Rate: Gen3 (8.0 Gbps) |
Avalon®-MM Settings | |
Export MSI/MSI-X conduit interfaces | Enabled. Export the MSI interface in order to connect the interrupt sent from the kernel interface to the MSI. |
Instantiate Internal Descriptor Controller | Enabled. Instantiates the descriptor controller in the Avalon®-MM DMA bridge. Use the 128-entry descriptor controller that the PCIe* hard IP core provides. |
Address width of accessible PCIe memory space | 64 bits. This value is machine dependent. To avoid truncation of the MSI memory address, 64-bit machines should allot 64 bits to access the PCIe* address space. |
Base Address Register (BAR) Settings | |
Base Address Registers (BARs) | This design uses two BARs. For BAR 0, set Type to 64-bit prefetchable memory. The Size parameter setting is disabled because the Instantiate Internal Descriptor Controller parameter is enabled in the Avalon®-MM system settings. BAR 0 is only used to access the DMA Descriptor Controller, as described in the Intel® Stratix® 10 Avalon®-MM DMA for PCI Express section of the Intel® Stratix® 10 Avalon®-MM DMA Interface for PCIe* Solutions User Guide. For BAR 4, set Type to 64-bit prefetchable memory, and set Size to 18 bits (256 KBytes). BAR 4 is used to connect PCIe to the OpenCL kernel systems and other board modules. |
Device Identification Registers for Intel Stratix 10 PCIe Hard IP
ID Register Name | ID Provider | Description | Parameter Name in PCIe IP Core |
---|---|---|---|
Vendor ID | PCI-SIG® | Identifies the FPGA manufacturer. Always set this register to 0x1172, which is the Intel® vendor ID. | Vendor ID |
Device ID | Intel® | Describes the PCIe configuration on the FPGA according to Intel®'s internal guideline. Set the device ID to the device code of the FPGA on your accelerator board. For the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform, set the Device ID register to 0x5170, which signifies Gen 3 speed, 8 lanes, Intel® Stratix® 10 device family, and Avalon®-MM interface, respectively. Refer to Table 6 for more information. | Device ID |
Revision ID | | When setting this ID, ensure that it matches the following revision IDs: | Revision ID |
Class Code | Intel® | The Intel® FPGA SDK for OpenCL™ utility checks the base class value to verify whether the board is an OpenCL™ device. Do not modify the class code settings. | Class Code |
Subsystem Vendor ID | Board vendor | Identifies the manufacturer of the accelerator board. Set this register to the vendor ID of the manufacturer of your accelerator board. For the s10_ref Reference Platform, the subsystem vendor ID is 0x1172. If you are a board vendor, set this register to your vendor ID. | Subsystem Vendor ID |
Subsystem Device ID | Board vendor | Identifies the accelerator board. The SDK uses this ID to identify the board because the software might perform differently on different boards. If you create a Custom Platform that supports multiple boards, use this ID to distinguish between the boards. Alternatively, if you have multiple Custom Platforms, each supporting a single board, you can use this ID to distinguish between the Custom Platforms. Important: Make this ID unique to your Custom Platform. For example, for the s10_ref Reference Platform, the ID is 0x5170. | Subsystem Device ID |
The kernel driver uses the Vendor ID, Subsystem Vendor ID and the Subsystem Device ID to identify the boards it supports. The SDK's programming flow checks the Device ID to ensure that it programs a device with a .aocx Intel® FPGA SDK for OpenCL™ Offline Compiler executable file targeting that specific device.
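As a minimal sketch of the board-identification check described above, the following C function compares the three IDs against the values this guide gives for the s10_ref Reference Platform (Vendor ID 0x1172, Subsystem Vendor ID 0x1172, Subsystem Device ID 0x5170). The struct, macro, and function names are hypothetical and are not taken from the actual driver source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in this guide for the s10_ref Reference Platform. */
#define S10_REF_VENDOR_ID           0x1172u
#define S10_REF_SUBSYSTEM_VENDOR_ID 0x1172u
#define S10_REF_SUBSYSTEM_DEVICE_ID 0x5170u

/* Hypothetical container for IDs read from PCIe configuration space. */
struct pcie_ids {
    uint16_t vendor_id;
    uint16_t subsystem_vendor_id;
    uint16_t subsystem_device_id;
};

/* Returns true only when all three IDs match the s10_ref values. */
static bool is_s10_ref_board(const struct pcie_ids *ids)
{
    return ids->vendor_id == S10_REF_VENDOR_ID &&
           ids->subsystem_vendor_id == S10_REF_SUBSYSTEM_VENDOR_ID &&
           ids->subsystem_device_id == S10_REF_SUBSYSTEM_DEVICE_ID;
}
```

A Custom Platform would substitute its own Subsystem Vendor ID and Subsystem Device ID so that the check distinguishes it from the reference platform.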
Location in ID | Definition |
---|---|
15:14 | RESERVED |
13:12 | Speed |
11 | RESERVED |
10:8 | Number of lanes |
7:4 | Device family |
3 | 1: Soft IP (SIP). This ID indicates that the PCIe protocol stack is implemented in soft logic. If unspecified, the IP is considered a hard IP. |
2:0 | Platform Designer PCIe interface type |
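The bit positions in the table above can be extracted with shifts and masks. The following C sketch splits a 16-bit Device ID into its raw fields. It does not interpret the field values, because the value encodings are not listed in this section. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Raw Device ID fields, using the bit positions from the table above. */
struct device_id_fields {
    unsigned speed;          /* bits 13:12 */
    unsigned lanes;          /* bits 10:8  */
    unsigned family;         /* bits 7:4   */
    unsigned soft_ip;        /* bit 3      */
    unsigned interface_type; /* bits 2:0   */
};

/* Extracts each field by shifting it down to bit 0 and masking its width. */
static struct device_id_fields decode_device_id(uint16_t id)
{
    struct device_id_fields f;
    f.speed          = (id >> 12) & 0x3u;
    f.lanes          = (id >> 8)  & 0x7u;
    f.family         = (id >> 4)  & 0xFu;
    f.soft_ip        = (id >> 3)  & 0x1u;
    f.interface_type =  id        & 0x7u;
    return f;
}
```

Calling decode_device_id(0x5170) yields the raw field values for the s10_ref Device ID, which you can then compare against the encodings used by your platform.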
Instantiation of the version_id Component
The version ID for the s10_ref Reference Platform is 0xA0C7C1E5 (interpreted as a signed 32-bit two's complement value, this is -1597521435 in decimal).
Before communicating with any part of the FPGA system, the host first reads from this version_id register to confirm the following:
- The PCIe can access the FPGA fabric successfully
- The address map matches the map in the MMD software
Update the VERSION_ID parameter in the version_id component to a new value with every slave addition or removal from the PCIe BAR 4 bus, or whenever the address map changes.
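The host-side check can be sketched in C as follows. The BAR number and offset are the ACL_VERSIONID_BAR and ACL_VERSIONID_OFFSET values from hw_pcie_constants.h as quoted in this guide, and the expected value is the s10_ref version ID stated above. The S10_REF_VERSION_ID macro name, the function name, and the assumption that BAR 4 is already memory-mapped into the caller's address space are illustrative only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From hw_pcie_constants.h as quoted in this guide. */
#define ACL_VERSIONID_BAR    4
#define ACL_VERSIONID_OFFSET 0xcfc0

/* Version ID stated in this guide for s10_ref (macro name is hypothetical). */
#define S10_REF_VERSION_ID   0xA0C7C1E5u

/* bar4 points at the start of the memory-mapped BAR 4 region.
 * How that mapping is obtained is outside the scope of this sketch. */
static bool version_id_matches(const volatile uint32_t *bar4)
{
    /* Convert the byte offset into a 32-bit word index before reading. */
    uint32_t value = bar4[ACL_VERSIONID_OFFSET / sizeof(uint32_t)];
    return value == S10_REF_VERSION_ID;
}
```

A mismatch suggests either that the PCIe link cannot reach the FPGA fabric or that the address map in software no longer matches the hardware.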
Board Support Package Software Layer
The following image illustrates the platform software stack:
The runtime and Hardware Abstraction Layer (HAL) are part of the Intel® FPGA SDK for OpenCL™ or Intel® FPGA RTE for OpenCL. The Host-to-Device Memory Mapped Device (MMD) and PCIe Kernel Driver are delivered as a part of the BSP. Hence, apart from rebranding the BSP, you might need to update some code in the MMD or driver based on changes in the hardware project.
- Common Hardware Constants in Software Header Files
To enable communication between the board and the host interface, define the hardware constants for the software in header files.
- PCIe Kernel Driver
A PCIe® kernel driver is necessary for the OpenCL™ runtime library to access your board design via a PCIe bus.
- Host-to-Device MMD Software Implementation
The Intel® Stratix® 10 GX FPGA Development Kit Reference Platform's MMD layer is a thin software layer that is essential for communication between the host and the board.
Common Hardware Constants in Software Header Files
The two header files that describe the hardware design to the software are in the following locations:
- For Windows systems, the header files are in the INTELFPGAOCLSDKROOT\board\s10_ref\source\include folder, where INTELFPGAOCLSDKROOT is the path to the SDK installation.
- For Linux systems, the header files are in the INTELFPGAOCLSDKROOT/board/s10_ref/linux64/driver directory.
Following is a snapshot from INTELFPGAOCLSDKROOT/board/s10_ref/linux64/driver/hw_pcie_constants.h
//Version ID and Uniphy Status
#define ACL_VERSIONID_BAR 4
#define ACL_VERSIONID_OFFSET 0xcfc0
Header File Name | Description |
---|---|
hw_pcie_constants.h | Header file that defines most of the hardware constants for the board design, especially those in board.qsys. This file includes constants such as the IDs described in PCIe Device Identification Registers, BAR number, and offset for different components in your design. In addition, this header file also defines the name strings of ACL_BOARD_PKG_NAME, ACL_VENDOR_NAME, and ACL_BOARD_NAME. Update the information in this file whenever you change the board design. |
hw_pcie_dma.h | Header file that defines DMA-related hardware constants. Update these addresses whenever you change the board design. Refer to the Direct Memory Access section for more information. |
PCIe Kernel Driver
This driver is installed using the Intel® FPGA SDK for OpenCL™ install utility.
The s10_ref Reference Platform provides the driver in the following locations:
- For Windows systems, the driver is in the <path_to_s10_ref>\windows64\driver folder.
- For Linux, an open-source MMD-compatible kernel driver is in the <path_to_s10_ref>/linux64/driver directory. The table below highlights some of the files that are available in this directory.
File | Description |
---|---|
pcie_linux_driver_exports.h | Header file that defines the special commands that the kernel driver supports. The installed kernel driver works as a character device. The basic operations to the driver are open(), close(), read(), and write(). To execute a complicated command, create a variable of the acl_cmd struct type, specify the command with the proper parameters, and then send the command through a read() or write() operation. This header file defines the interface of the kernel driver, which the MMD layer uses to communicate with the device. |
aclpci.c | File that implements the Linux kernel driver's basic structures and functions, such as the init, remove, and probe functions, as well as hardware design-specific functions that handle interrupts. For more information on the interrupt handler, refer to the Message Signaled Interrupts section. |
aclpci_fileio.c | File that implements the kernel driver's file I/O operations. The kernel driver that is available with the s10_ref Reference Platform supports four file I/O operations: open(), close(), read(), and write(). Implementing these file I/O operations allows the OpenCL* user program to access the kernel driver through the file I/O system calls (that is, open, read, write, or close). |
aclpci_cmd.c | File that implements the specific commands defined in the pcie_linux_driver_exports.h file. These special commands include SAVE_PCI_CONTROL_REGS, LOAD_PCI_CONTROL_REGS, and GET_PCI_SLOT_INFO. |
aclpci_dma.c | File that implements DMA-related routines in the kernel driver. Refer to the Direct Memory Access section for more information. |
aclpci_queue.c | File that implements a queue structure for use in the kernel driver to simplify programming. |
Host-to-Device MMD Software Implementation
The source code of an MMD library that demonstrates good performance is available in the INTELFPGAOCLSDKROOT/board/s10_ref/source/host/mmd directory. Refer to the Host-to-Device MMD Software Implementation section in the Stratix V Network Reference Platform Porting Guide for more information.
For more information on the MMD API functions, refer to the MMD API Descriptions section of the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide.
Direct Memory Access
Hardware Considerations
The instantiation process exports the DMA controller slave ports (that is, rd_dts_slave and wr_dts_slave) and master ports (that is, rd_dcm_master and wr_dcm_master) into the PCIe module. Two additional master ports, dma_rd_master and dma_wr_master, are exported for DMA read and write operations, respectively. For the DMA interface to function properly, all these ports must be connected correctly in the board.qsys Platform Designer system, where the PCIe hard IP is instantiated.
At the start of a DMA transfer, the DMA Descriptor Controller reads from the DMA descriptor table in user memory, and stores the status and the descriptor table into a FIFO address. There are two FIFO addresses: the Read Descriptor FIFO address and the Write Descriptor FIFO address. After storing the descriptor table into a FIFO address, DMA transfers into the FIFO address can occur. The dma_rd_master port, which moves data from user memory to the device, must connect to the rd_dts_slave and wr_dts_slave ports. Because the dma_rd_master port also connects to DDR4 memory, the locations of the rd_dts_slave and wr_dts_slave ports in the address space must be defined in the hw_pcie_dma.h file.
The rd_dcm_master and wr_dcm_master ports must connect to the txs port. At the end of the DMA transfer, the DMA controller writes the MSI data and the done status into the user memory via the txs slave. The txs slave is part of the PCIe* hard IP in board.qsys.
All modules that use DMA must connect to the dma_rd_master and dma_wr_master ports. For DDR4 memory connection, Intel® recommends implementing an additional pipeline to connect the two 256-bit PCIe DMA ports to the 512-bit memory slave. For more information, refer to the DDR4 Connection to PCIe* Host section.
Software Considerations
The MMD layer uses DMA to transfer data if it receives a data transfer request that satisfies both of the following conditions:
- A transfer size that is greater than 1024 bytes
- The starting addresses for both the host buffer and the device offset are aligned to 64 bytes
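The two conditions above can be expressed as a small predicate. This is a minimal sketch, not the MMD library's actual code: the function and macro names are hypothetical, and only the 1024-byte threshold and the 64-byte alignment come from this guide.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_MIN_BYTES 1024u /* a transfer must be larger than this */
#define DMA_ALIGNMENT 64u   /* required alignment, in bytes */

/* Returns true when a transfer request qualifies for DMA:
 * size greater than 1024 bytes, and both the host buffer address
 * and the device offset aligned to 64 bytes. */
static bool should_use_dma(const void *host_buffer, size_t device_offset,
                           size_t bytes)
{
    bool host_aligned   = ((uintptr_t)host_buffer % DMA_ALIGNMENT) == 0;
    bool device_aligned = (device_offset % DMA_ALIGNMENT) == 0;
    return bytes > DMA_MIN_BYTES && host_aligned && device_aligned;
}
```

Host applications that want DMA transfers can therefore allocate buffers with 64-byte alignment and keep device offsets on 64-byte boundaries.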
Implementing a DMA Transfer
On Windows, polling is the default method for maximizing PCIe DMA bandwidth at the expense of CPU run time. To use interrupts instead of polling, assign a non-NULL value to the ACL_PCIE_DMA_USE_MSI environment variable.
To implement a DMA transfer:
- Verify that the previous DMA transfer sent all the requested bytes of data.
- Map the virtual memories that are requested for DMA transfer to physical addresses.
Note: The amount of virtual memory that can be mapped at a time is system dependent. Large DMA transfers will require multiple mapping or unmapping operations. For a higher bandwidth, map the virtual memory ahead in a separate thread that is in parallel to the transfer.
- Set up the DMA descriptor table on local memory.
- Write the location of the DMA descriptor table, which is in user memory, to the DMA control registers (that is, RC Read Status and Descriptor Base and RC Write Status and Descriptor Base).
- Write the Platform Designer addresses of the descriptor FIFOs to the DMA control registers (that is, EP Read Descriptor FIFO Base and EP Write Status and Descriptor FIFO Base).
- Write the start signal to the RD_DMA_LAST_PTR and WR_DMA_LAST_PTR DMA control registers.
- After the current DMA transfer finishes, repeat the procedure to implement the next DMA transfer.
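To illustrate the descriptor table setup step, the following C sketch splits one transfer into a list of descriptors. The descriptor struct here is a simplified, hypothetical stand-in and does not reproduce the hardware's actual descriptor layout, which is defined in the Intel® Stratix® 10 Avalon®-MM DMA Interface for PCIe Solutions User Guide. The 128-entry limit reflects the 128-entry descriptor controller mentioned earlier in this guide. The chunk size is a caller-supplied assumption (for example, the size of one mapped page).

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DESCRIPTORS 128 /* 128-entry descriptor controller */

/* Simplified, hypothetical descriptor; not the real hardware layout. */
struct dma_descriptor {
    uint64_t src_addr;
    uint64_t dst_addr;
    uint32_t length; /* bytes */
};

/* Splits a transfer into descriptors of at most chunk_bytes each.
 * Returns the number of descriptors written, or -1 if chunk_bytes is
 * zero or the transfer does not fit in MAX_DESCRIPTORS entries. */
static int fill_descriptor_table(struct dma_descriptor *table,
                                 uint64_t src, uint64_t dst,
                                 uint64_t total_bytes, uint32_t chunk_bytes)
{
    int count = 0;

    if (chunk_bytes == 0)
        return -1;
    while (total_bytes > 0) {
        uint32_t len;

        if (count == MAX_DESCRIPTORS)
            return -1; /* caller must split the transfer further */
        len = total_bytes < chunk_bytes ? (uint32_t)total_bytes : chunk_bytes;
        table[count].src_addr = src;
        table[count].dst_addr = dst;
        table[count].length   = len;
        src += len;
        dst += len;
        total_bytes -= len;
        count++;
    }
    return count;
}
```

A transfer that does not fit in one table would be issued as several consecutive DMA transfers, which matches the last step of the procedure above.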
Message Signaled Interrupts
Two different modules generate the signal for the MSI line. The DMA controller in the PCIe hard IP core generates the DMA's MSI. The PCI Express interrupt request (IRQ) module (that is, the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx/ip/irq_controller directory) generates the kernel interface's MSI.
For more information on the PCI Express IRQ module, refer to the Handling PCIe Interrupts webpage.
Hardware Considerations
In INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx/board.qsys, the DMA MSI is connected internally; however, you must connect the kernel interface interrupt manually. For the kernel interface interrupt, the PCI Express IRQ module is instantiated as pcie_irq in board.qsys. The kernel interface interrupt connections are as follows:
- The kernel_irq_to_host port from the OpenCL Kernel Interface (kernel_interface) connects to the interrupt receiver, which allows the OpenCL kernels to signal the PCI Express IRQ module to send an MSI.
- The PCIe hard IP's msi_intfc port connects to the MSI_Interface port in the PCI Express IRQ module. The kernel interface interrupt receives the MSI address and the data necessary to generate the interrupt via msi_intfc.
- The IRQ_Gen_Master port on the PCI Express IRQ module, which is used to write the MSI, connects to the txs port on the PCIe hard IP.
- The IRQ_Read_Slave and IRQ_Mask_Slave ports connect to the pipe_stage_host_ctrl module on Bar 4. After receiving an MSI, the user driver can read the IRQ_Read_Slave port to check the status of the kernel interface interrupt, and write to the IRQ_Mask_Slave port to mask the interrupt.
Software Considerations
The interrupt service routine in the Linux driver checks which module generated the interrupt. For the DMA's MSI, the driver reads the DMA descriptor table's status bit in local memory, as specified in the Read DMA Example section of the Intel® Stratix® 10 Avalon-MM DMA Interface for PCIe Solutions User Guide. For the kernel interface's MSI, the driver reads the interrupt line sent by the kernel interface.
The interrupt service routine involves the following tasks:
- Check DMA status on the DMA descriptor table.
- Read the kernel status from the IRQ_READ_SLAVE port on the PCI Express IRQ module.
- If a kernel interrupt was triggered, mask the interrupt by writing to the IRQ_MASK_SLAVE port on the PCI Express IRQ module. Then, execute the kernel interrupt service routine.
- If a DMA interrupt was triggered, reset the DMA descriptor table and execute the DMA interrupt service routine.
- If applicable, unmask a masked kernel interrupt.
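The task list above can be modeled as follows. This is only a behavioral sketch under stated assumptions: the `FakeIrqModule` class, the mask polarity (1 = unmasked, 0 = masked), and the log strings are hypothetical and do not come from the driver source.

```python
class FakeIrqModule:
    """Stand-in for the two slave ports of the PCI Express IRQ module."""

    def __init__(self, kernel_pending: bool):
        self.irq_read_slave = 1 if kernel_pending else 0
        self.irq_mask_slave = 1  # assumption: 1 = unmasked, 0 = masked


def service_interrupt(dma_done: bool, irq: FakeIrqModule, log: list) -> None:
    """Walk through the interrupt service routine tasks in order."""
    # Check DMA status (the descriptor table status bit, modeled as a bool).
    dma_fired = dma_done
    # Read the kernel status from the IRQ_READ_SLAVE port.
    kernel_fired = bool(irq.irq_read_slave & 1)

    if kernel_fired:
        irq.irq_mask_slave = 0        # mask before servicing
        log.append("kernel_isr")
        irq.irq_read_slave = 0        # modeled: servicing clears the source

    if dma_fired:
        log.append("dma_table_reset")
        log.append("dma_isr")

    if kernel_fired:
        irq.irq_mask_slave = 1        # unmask the kernel interrupt again
        log.append("kernel_unmasked")
```

Masking the kernel interrupt before servicing it prevents the same source from raising a second MSI while the first is still being handled.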
Instantiation of board_cade_id_0 Component – JTAG Cable Autodetect Feature
You can set the ACL_PCIE_JTAG_CABLE or ACL_PCIE_JTAG_DEVICE_INDEX environment variables to disable the auto-detect feature and use values that you define.
Cable autodetect is useful when you have multiple devices connected to a single host.
The memory-mapped device (MMD) uses in-system sources and probes to identify the cable connected to the target board. You must instantiate the board_cade_id_0 register block and connect it to Bar 4 with the correct address map. You must also instantiate board_in_system_sources_probes_0, which is an in-system sources and probe component, and connect it to board_cade_id_0 register.
The MMD must be updated to take in the relevant changes. Add the scripts/find_jtag_cable.tcl script to your custom platform.
When the FPGA is being programmed via the Intel FPGA Download Cable, the MMD invokes quartus_stp to execute the find_jtag_cable.tcl script. The script identifies the cable and index number, which are then used to program the FPGA through the quartus_pgm command.
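The environment-variable override described in this section might be modeled as below. The variable names come from the text; how the MMD actually combines them is not documented here, so the function names and the "either variable disables autodetect" logic are assumptions for illustration.

```python
from typing import Mapping, Optional, Tuple


def jtag_override(env: Mapping[str, str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (cable, device_index) overrides; None means no override."""
    cable = env.get("ACL_PCIE_JTAG_CABLE") or None
    index = env.get("ACL_PCIE_JTAG_DEVICE_INDEX") or None
    return cable, index


def autodetect_enabled(env: Mapping[str, str]) -> bool:
    """Assumed rule: autodetect runs only when neither variable is set."""
    cable, index = jtag_override(env)
    return cable is None and index is None
```

In practice you would pass `os.environ` as `env`; a plain dictionary keeps the sketch easy to test.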
DDR4 as Global Memory for OpenCL Applications
In the current version of the s10_ref Reference Platform, all Platform Designer components related to the DDR4 global memory are now part of the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx/mem.qsys Platform Designer subsystem within board.qsys.
Dependencies
DDR4 external memory interfaces
For more information on the DDR4 external memory interface IP, refer to the DDR3 Board Design Guidelines and DDR4 Board Design Guidelines sections in Intel Stratix 10 External Memory Interfaces IP User Guide.
DDR4 IP Instantiation
Configuration Setting | Description |
---|---|
Timing Parameters | As per the computing card's data specifications. |
Avalon Width Power of 2 | |
EMIF S10 IP > Memory > DQ Width | Currently, OpenCL™ does not support non-power-of-2 bus widths. As a result, the s10_ref Reference Platform uses the option that forces the DDR4 controller to power of 2. Use the additional pins of this x72 core for error checking between the memory controller and the physical module. |
Byte Enable Support | |
EMIF S10 IP > Memory > Data Mask | Enabled. Check the Data Mask option in the Memory tab of the EMIF Intel® Stratix® 10 IP. Byte enable support is necessary in the core because the Intel® FPGA SDK for OpenCL™ requires byte-level granularity to all memories. |
Performance | |
EMIF S10 IP > Controller > Enable Reordering | Enabling the reordering of DDR4 memory accesses and a deeper command queue look-ahead depth might provide increased bandwidth for some OpenCL kernels. For a target application, adjust these and other parameters as necessary. Note: Increasing the command queue look-ahead depth allows the DDR4 memory controller to reorder more memory accesses to increase efficiency, which improves overall memory throughput. |
Debug | Disabled for production. |
DDR4 Connection to PCIe Host
The OpenCL™ Memory Bank Divider component sits in the datapath between the host and the FPGA memory. It accepts input from the DMA engine or from BAR4 of the PCIe interface and outputs to the FPGA memory. It is mainly useful in OpenCL™ BSPs with multiple memory banks, where it creates a larger memory space. Implementations of appropriate clock crossing and pipelining are based on the design floorplan and the clock domains specific to the computing card. The OpenCL Memory Bank Divider section in the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide specifies the connection details of the acl_bsp_snoop and acl_bsp_memorg_host ports.
The DDR4 IP core has one bank where its width and address configurations match those of the DDR4 SDRAM. Intel® tunes the other parameters such as burst size, pending reads, and pipelining. These parameters are customizable for an end application or board design.
- When designing a multi-bank OpenCL BSP (two DDR banks), you must change the number of banks to 2 at the instantiation of the OpenCL Memory Bank Divider. Furthermore, the acl_bsp_memorg_host wire must be connected to the kernel_interface block. The Memory Bank Divider can be used in two modes. The default mode uses the banks in an interleaved manner to obtain more aggregate bandwidth. The non-interleaved mode creates a contiguous address space instead. You can select the non-interleaved mode during offline compilation using the -no-interleave flag. The Address Span Extender (memwindow) component in the host-to-DDR path is used to transfer the unaligned portion of the data using BAR4 during a DMA transfer, or to transfer all data when DMA is disabled.
- Instruct the host to verify the successful calibration of the memory controller.
The INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx/board.qsys Platform Designer system uses a custom UniPHY Status to AVS IP component to aggregate different UniPHY status conduits into a single Avalon® slave port named s. This slave port connects to the pipe_stage_host_ctrl component so that the PCIe host can access it.
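The two Memory Bank Divider modes described above (interleaved and non-interleaved) differ only in how a global-memory address is mapped to a bank. The sketch below illustrates the idea; the 1024-byte interleave granularity and the function name `bank_of` are assumptions, not values taken from the IP.

```python
from typing import Tuple


def bank_of(address: int, num_banks: int, bank_size: int,
            interleaved: bool, interleave_bytes: int = 1024) -> Tuple[int, int]:
    """Return (bank, offset_in_bank) for a global-memory byte address."""
    if not 0 <= address < num_banks * bank_size:
        raise ValueError("address outside global memory")
    if interleaved:
        # Consecutive chunks rotate across the banks.
        chunk = address // interleave_bytes
        bank = chunk % num_banks
        offset = (chunk // num_banks) * interleave_bytes + address % interleave_bytes
    else:
        # Contiguous address space: bank 0 first, then bank 1, and so on.
        bank = address // bank_size
        offset = address % bank_size
    return bank, offset
```

With interleaving, a long sequential access touches every bank in turn, which is why this mode tends to give more aggregate bandwidth.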
DDR4 Connection to the OpenCL Kernel
A clock crosser is necessary because the kernel interface for the compiler must be clocked in the kernel clock domain. In addition, the width, address width, and burst size characteristics of the kernel interface must match those specified in the OpenCL Memory Bank Divider connecting to the host. Appropriate pipelining also exists between the clock crosser and the memory controller.
To get the maximum kernel clock speed for Intel® Stratix® 10 devices, a custom hyper-optimized CCB and AVMM bridge are used in this path. The CCB is in mem.qsys, and the bridge is stall-free, which helps fmax. The two end points, the clock crosser and the logic in the kernel, support AVMM stall latency.
Host Connection to OpenCL Kernels
The OpenCL Kernel Interface for S10 (kernel_interface) component exports an Avalon® master interface (kernel_cra) that connects to this slave port. The OpenCL Kernel Interface for S10 component also generates the kernel reset (kernel_reset) that resets all logic in the kernel clock domain. The kernel_interface component also bridges the interrupt signal generated by the kernel to the pcie_irq block.
The Intel® Stratix® 10 FPGA Development Kit Reference Platform has one DDR4 memory bank. As a result, the Reference Platform instantiates the OpenCL Kernel Interface component and sets the Number of global memory systems parameter to 1.
Partial Reconfiguration
- Partial Reconfiguration Controller S10 (alt_pr) IP
- Used to help support PR in this reference platform. During the PR process, the bitstream from host is transferred to this IP.
- Partial Reconfiguration Region Controller (pr_region_controller_0) IP
- Helps with the freeze and reset logic during the PR process. The interface of the pr_region.v is gated with the freeze signal from this IP to keep the static region in a known state during PR. For more information, refer to Intel® Quartus® Prime Pro Edition User Guide: Partial Reconfiguration.
- Register (pr_base_id)
- Stores the unique PR BASE ID that is generated on every base compile. During reprogramming of the device, this unique ID is used to identify the top compiles that are generated from the same base build. During the Intel® FPGA SDK for OpenCL™ Offline Compiler program flow, if the design being loaded and the existing design in the FPGA have the same PR BASE ID, then partial reconfiguration is used for reprogramming. Otherwise, full chip reprogramming is performed via JTAG interface.
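The reprogramming decision based on the PR BASE ID reduces to a comparison. A minimal sketch, with a hypothetical function name and return strings:

```python
from typing import Optional


def choose_programming_method(id_in_fpga: Optional[int], id_in_design: int) -> str:
    """Pick the reprogramming path from the two PR BASE IDs.

    id_in_fpga is None when no ID could be read from the device.
    """
    if id_in_fpga is not None and id_in_fpga == id_in_design:
        return "partial_reconfiguration"
    return "full_chip_jtag"
```

Treating an unreadable ID as a mismatch is an assumption of this sketch; it falls back to the safer full-chip path.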
Constant Address Bridge (constant_address_bridge)
This IP has an AVMM slave and an AVMM master connected by plain wires. It is a direct feed-through, except that the incoming address is ignored: the AVMM master always outputs address 0 and a constant burst-count of 0x1. This behavior ensures that the entire PR bitstream is written to the same target address of the PR IP.
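As a behavioral illustration only (the function name and tuple layout are hypothetical), the bridge can be modeled as a function that rewrites every incoming write:

```python
from typing import Iterable, List, Tuple


def constant_address_bridge(writes: Iterable[Tuple[int, int]]) -> List[Tuple[int, int, int]]:
    """Map (address, data) writes to (address, burst_count, data) outputs.

    The incoming address is dropped; every output uses address 0 and a
    burst count of 1, so all data words land on the same target address.
    """
    return [(0, 1, data) for (_address, data) in writes]
```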
Other Components in the Reference Design
The following are other components in the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform (s10_ref) that you should familiarize yourself with:
- On-Chip Memory ROM (acds_version_rom): This read-only memory is a placeholder to burn Intel® Quartus® Prime software versions during bitstream compile. The version is required for checks during Intel® FPGA SDK for OpenCL™ Offline Compiler programming flow.
- ACL temperature sensor for S10 (temperature_sensor): It is used to read and report the sensed temperature when you run the aocl diagnose utility.
Intel Stratix 10 FPGA System Design
Examples of design complexities:
- Designing a robust reset sequence
- Establishing a design floorplan
- Managing global routing
- Pipelining
Optimizations of these design complexities occur in tandem with one another to meet timing and board hardware optimization requirements.
Clocks
These clock domains include:
- 100 MHz PCIe* clock
- 116 MHz DDR4 clock
- 50 MHz general clock (config_clk)
- Kernel clock that can have any clock frequency
With the exception of the kernel clock, the s10_ref Reference Platform is responsible for the timing closure of these clocks. However, because the board design must clock cross all interfaces in the kernel clock domain, the board design also has logic in the kernel clock domain. It is crucial that this logic is minimal and achieves an Fmax higher than typical kernel performance.
Resets
These reset drivers include:
- The por_reset_counter in the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx/board.qsys Platform Designer system implements the power-on-reset. The power-on-reset resets all the hardware on the device by issuing a reset for a number of cycles after the FPGA completes configuration.
- The PCIe® bus issues a perst reset that resets all hardware on the device.
- The OpenCL™ Kernel Interface component issues the kernel_reset that resets all logic in the kernel clock domain.
The power-on-reset and the perst reset are combined into a single global_reset; therefore, there are only two reset sources in the system (that is, global_reset and kernel_reset). However, these resets are explicitly synchronized across the various clock domains, resulting in several reset interfaces.
Important Considerations Regarding Resets
- Synchronizing resets to different clock domains might cause several high fan-out resets.
Platform Designer automatically synchronizes resets to the clock domain of each connected component. In doing so, the Platform Designer instantiates new reset controllers with derived names that might change when the design changes. This name change makes it difficult to make and maintain global clock assignments to some of the resets. As a result, for each clock domain, there are explicit reset controllers. For example, global_reset drives reset_controller_pcie and reset_controller_ddr4. However, they are synchronized to the PCIe and DDR4 clock domains, respectively.
- Resets and clocks must work together to propagate reset to all logic.
Resetting a circuit in a given clock domain involves asserting the reset over a number of clock cycles. However, your design may apply resets to the PLLs that generate the clocks for a given clock domain. This means a clock domain can hold in reset without receiving the clock edge that is necessary for synchronous resets. In addition, a clock holding in reset might prevent the propagation of a reset signal because it is synchronized to and from that clock domain. Avoid such situations by ensuring that your design satisfies the following criteria:
- Generate the global_reset signal off the free-running config_clk.
- The ddr4_calibrate IP resets the External Memory Interface controller separately.
- Apply resets to both reset interfaces of a clock-crossing bridge or FIFO component.
FIFO content corruption might occur if only part of a clock-crossing bridge or a dual-clock FIFO component is reset. These components typically provide a reset input for each clock domain; therefore, reset both interfaces or none at all. For example, in the s10_ref Reference Platform, kernel_reset resets all the kernel clock-crossing bridges between DDR on both the m0_reset and s0_reset interfaces.
Floorplan
Dependencies
- Chip Planner
- Logic Lock regions
Intel® performed the following tasks iteratively to derive the floorplan of the s10_ref Reference Platform:
- Compile a design without any region or floorplanning constraints in the flat revision.
Intel® recommends that you compile the design with several seeds. For more information, refer to Integrating Your Intel Stratix 10 Custom Platform with the Intel FPGA SDK for OpenCL.
- Examine the placement of the IP cores (for example, PCIe* , DDR4, Avalon® interconnect pipeline stages and adapters) for candidate locations, as determined by the Intel® Quartus® Prime Pro Edition software's Fitter. In particular, Intel® recommends examining the seeds that meet or almost meet the timing constraints.
For the s10_ref Reference Platform, the PCIe* I/O is located in the lower left corner of the Intel® Stratix® 10 FPGA. The DDR4 I/O is located on the top part of the left I/O column of the device. Because the placements of the PCIe and DDR4 IP components tend to be close to the locations of their respective I/Os, you can apply Logic Lock regions to constrain the IP components to those candidate regions.
As shown in this Chip Planner view of the floorplan, the Logic Lock region (freeze_wrapper_inst|pr_region_inst) is spread out between the PCIe I/O and the top region of the left I/O column (that is, the DDR4 I/O area). In the case of the s10_ref Reference Platform, the Logic Lock region contains most of the kernel logic. The scatter area (shown in red) depicts the board interface (that is, the static region), which is placed outside the ten Logic Lock regions assigned to kernel logic in base.qsf. The following figure illustrates the assignment in base.qsf.

You must create a dedicated Logic Lock region for the OpenCL™ kernel system for your custom platform. Furthermore, if you are logic-locking the board interface logic, ensure that you do not place kernel logic in the board's Logic Lock regions.
Intel® recommends the following strategies to maximize the available FPGA resources for the OpenCL kernel system to improve kernel routability:
- The size of a Logic Lock region should be just large enough to contain the board logic and to meet timing constraints of the board clocks. Oversized Logic Lock regions consume FPGA resources unnecessarily.
- Avoid creating tightly-packed Logic Lock regions that cause very high logic utilization and high routing congestion.
High routing congestion within the Logic Lock regions might decrease the Fitter's ability to route OpenCL kernel signals through the regions.
In the case where the board clocks are not meeting timing and the critical path is between the Logic Lock regions (that is, across region-to-region gap), insert back-to-back pipeline stages on paths that cross the gap. For example, if the critical path is between Region 1 and Region 2, lock down the first pipeline stage (an Avalon-MM Pipeline Bridge component) to Region 1, lock down the second pipeline stage to Region 2, and connect the two pipeline stages directly. This technique ensures that pipeline registers are on both sides of the region-to-region gap, thereby minimizing the delay of paths crossing the gap.
Refer to the Pipelining section for more information.
Global Routing
There is no restriction on the placement location of the OpenCL™ kernel on the device. As a result, the kernel clocks and kernel reset must distribute high fan-out signals globally.
Pipelining
In Platform Designer, you can implement pipelines via an Avalon®-MM Pipeline Bridge component by setting the following pipelining parameters within the Avalon®-MM Pipeline Bridge dialog box:
- Select Pipeline command signals
- Select Pipeline response signals
- Select both Pipeline command signals and Pipeline response signals
Examples of Pipeline Implementation
- Signals that traverse long distances because of the floorplan's shape or the region-to-region gaps require additional pipelines.
The DMA at the bottom of the FPGA must connect to the DDR4 memory at the top of the FPGA. To achieve timing closure of the board interface logic at a DDR4 clock speed of 233 MHz, additional pipeline stages between the OpenCL™ Memory Bank Divider component and the DDR4 controller IP are necessary. In the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform's board.qsys Platform Designer system, the pipeline stages are named pipe_stage.
The middle pipeline stage, kernel_ddra_bridge, combines both the direct kernel DDR4 accesses and the accesses through the OpenCL Memory Bank Divider. The multistage pipeline approach ensures that the kernel entry point to the pipeline is neither geared towards the OpenCL Memory Bank Divider, which is close to the PCIe* IP core, nor the DDR4 IP core, which is at the very top of the FPGA.
DDR4 Calibration
The driver within the s10_ref Reference Platform can detect a failed calibration via the Uniphy Status to AVS IP, and retrigger calibration through the ddr4_calibrate IP block.
- ACL Uniphy Status to AVS for A10 (uniphy_status_20nm): Helps to read the DDR status from the DDR IP.
- ACL SW Reset (ddr4_calibrate): Issues reset to DDR IP.
Guaranteed Timing Closure of the Intel Stratix 10 GX FPGA Development Kit Reference Platform Design
Both the SDK and the BSP contribute to the implementation of the SDK's guaranteed timing closure feature.
The SDK provides the IP to generate the kernel clock, and a post-flow script that ensures this clock is configured with a safe operating frequency confirmed by timing analysis. The SDK imports a post-fit netlist during a top compile that has already achieved timing closure on all non-kernel clocks.
Supply the Kernel Clock
The REF_CLK_RATE parameter specifies the frequency of the reference clock that connects to the kernel PLL (kernel_pll_refclk). For the s10_ref Reference Platform, the REF_CLK_RATE frequency is 50 MHz.
The KERNEL_TARGET_CLOCK_RATE parameter specifies the frequency that the Intel® Quartus® Prime Pro Edition software attempts to achieve during compilation. The board hardware contains some logic that is clocked by the kernel clock. At a minimum, the board hardware includes the clock crossing hardware. To prevent this logic from limiting the Fmax achievable by a kernel, the KERNEL_TARGET_CLOCK_RATE must be higher than the frequency that a simple kernel can achieve on your device. For the Intel® Stratix® 10 GX FPGA Development Kit that the s10_ref Reference Platform targets, the KERNEL_TARGET_CLOCK_RATE is 500 MHz.
Guarantee Kernel Clock Timing
In the import (that is, top) revision compilation, the compilation script compile_script.tcl invokes the INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx/scripts/post_flow_pr.tcl Tcl script in the s10_ref Reference Platform after every Intel® Quartus® Prime Pro Edition software compilation using quartus_cdb.
The post_flow_pr.tcl script also determines the kernel clock and configures it to a functional frequency.
Provide a Timing-Closed Post-Fit Netlist
Dependencies
Intel® Quartus® Prime Pro Edition compiler
Intel® Quartus® Prime software provides several mechanisms for preserving the placement and routing of some previously compiled logic and importing this logic into a new compilation. For Intel® Stratix® 10 devices, the previously compiled logic is imported into the compilation flow.
The Intel® Quartus® Prime Pro Edition compilation flow can preserve the placement and routing of the board interface partition via the exported Intel® Quartus® Prime Archive File. base.qar and base.qdb files contain all the database files for the base compilation of root_partition. The s10_ref Reference Platform is configured with the project revisions and partitioning that are necessary to implement the compilation flow. By default, the SDK invokes the Intel® Quartus® Prime Pro Edition software on the top revision. This revision is configured to import and restore the base.qdb file, which has been precompiled and exported from a base revision compilation.
When developing your Custom Platform from the s10_ref Reference Platform, it is essential to maintain the flat.qsf, base.qsf, top.qsf, and opencl_bsp_ip.qsf Intel® Quartus® Prime Settings Files.
The s10_ref Reference Platform includes two additional partitions: the Top partition and the kernel partition. The Top partition contains all logic, and the kernel partition contains only the kernel logic.
Intel FPGA SDK for OpenCL Compilation Flows
The image provides an overview of the scripts called and the respective stages executed during each phase of the compile, depending on which project revision (flat, base, or top) is targeted. As soon as you execute the aoc command (for example, aoc -bsp-flow=<flat/base/top> kernel.cl -o kernel.aocx), the Intel® FPGA SDK for OpenCL™ Offline Compiler looks at the board_spec.xml associated with the provided BSP and calls the pre_flow_pr.tcl script for the particular project revision. It then goes through some decision stages and runs the compile_script.tcl and post_flow_pr.tcl scripts underneath to complete the compile. For more details about the information provided in the flowchart, view the scripts under the hardware/s10gx directory inside your board support package.
Compile Flow
Following are the types of compile flows:
- Flat Compile
- A flat revision uses the flat.qsf settings file and performs a flat compilation of the entire design (BSP along with kernel generated hardware). The flat.qsf has minimal location constraints, and generally has all of the pin assignments (sourced using device.tcl) and basic settings to compile a hardware design. To compile a flat revision of your BSP, use -bsp-flow=flat modifier option with the aoc command.
- Base Compile
- A base revision uses the base.qsf settings file to compile the board support package. The base.qsf uses all the flat.qsf settings and adds the required location constraints and Logic Lock regions on top of it. The kernel clock target is relaxed during the base compilation so that the BSP hardware has more freedom to meet timing. A base.qar database is created to preserve the BSP hardware, which is the static region. The revision can be compiled using -bsp-flow=base modifier option with the aoc command.
- Top Compile
- The top flow, also known as the import compile, is generally the default flow of kernel compiles. It uses the top.qsf settings file for compilation and the base.qar from a base revision compile to import the pre-compiled netlist of the static region. It preserves the timing-closed static region and compiles only the kernel-generated hardware. It also increases the kernel clock target to obtain the best kernel maximum operating frequency (fmax).
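The three flows map onto the -bsp-flow option of the aoc command shown earlier in this section. The helper below is a small sketch that builds the argument list for each flow; the function name and the use of a Python wrapper are assumptions for illustration, and only the option spelling from this guide is used.

```python
from typing import List

VALID_FLOWS = ("flat", "base", "top")


def aoc_command(kernel_source: str, output: str, flow: str = "top") -> List[str]:
    """Build an aoc argument list for the requested BSP compile flow."""
    if flow not in VALID_FLOWS:
        raise ValueError(f"unknown flow {flow!r}; expected one of {VALID_FLOWS}")
    return ["aoc", f"-bsp-flow={flow}", kernel_source, "-o", output]
```

The returned list can be passed to `subprocess.run` on a machine where the SDK is installed; the sketch itself does not invoke the compiler.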
Platform Designer System Generation
The INTELFPGAOCLSDKROOT environment variable points to the location of the Intel® FPGA SDK for OpenCL™ installation directory.
The board.qsys Platform Designer system represents the bulk of the static region. The pre_flow_pr.tcl script generates the Platform Designer systems on the fly before the beginning of the Intel® Quartus® Prime compilation flow in both the flat and base revision compilations.
QAR/QDB File Generation
The INTELFPGAOCLSDKROOT/board/s10_ref/hardware/s10gx/scripts/post_flow_pr.tcl script creates the base.qdb file. The qar_ip_files.tcl file invokes the qar_ip_files proc command to export the entire base revision compilation database to the base.qar file that also contains the base.qdb, pr_base.id and base.sdc files. For your Custom Platform, you do not need to add these files to the board directory (that is, INTELFPGAOCLSDKROOT/board/<custom_platform>/hardware/<board_name> ) separately. They are generated when you do a base revision compile with your custom platform.
Addition of Timing Constraints
The order in which timing constraints are applied is based on the order of appearance of the top.sdc and top_post.sdc files in the flat.qsf file. To ensure proper SDC ordering, the opencl_bsp_ip.qsf file is sourced between the top.sdc and top_post.sdc files. All IPs are added to opencl_bsp_ip.qsf during the aoc compile flow. This ensures that the SDC order is top.sdc, followed by the SDCs for the IP components, and then top_post.sdc in all aoc compiles.
    # Make the kernel reset multicycle.
    # Changes made to the multicycle path here must also be reflected in the
    # multicycle value in scripts/adjust_plls_s10.tcl.
    set_multicycle_path -to * -setup 15 -from {freeze_wrapper_inst|board_kernel_reset_reset_n_reg}
    set_multicycle_path -to * -hold 14 -from {freeze_wrapper_inst|board_kernel_reset_reset_n_reg}
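The SDC ordering requirement can be checked mechanically. The sketch below only looks at the order in which the three file names first appear in the settings text; the sample excerpt is illustrative and is not copied from the actual flat.qsf.

```python
def sdc_order_ok(qsf_text: str) -> bool:
    """True if top.sdc, opencl_bsp_ip.qsf, and top_post.sdc appear in that order."""
    names = ["top.sdc", "opencl_bsp_ip.qsf", "top_post.sdc"]
    positions = [qsf_text.find(name) for name in names]
    # Every name must be present, and first appearances must be ascending.
    return all(p >= 0 for p in positions) and positions == sorted(positions)


# Hypothetical excerpt used only to exercise the check.
SAMPLE_QSF = """\
set_global_assignment -name SDC_FILE top.sdc
source opencl_bsp_ip.qsf
set_global_assignment -name SDC_FILE top_post.sdc
"""
```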
Connection of the Intel Reference Platform to the Intel FPGA SDK for OpenCL
For each hardware design inside your Custom Platform, your Custom Platform requires a board_spec.xml that describes the hardware to the SDK.
The following sections describe the implementation of these files for the Intel® Stratix® 10 GX FPGA Development Kit Reference Platform.
Describe the Intel Stratix 10 GX FPGA Development Kit Reference Platform to the Intel FPGA SDK for OpenCL
Details of each field in the board_env.xml file are available in the Creating the board_env.xml File section of the Intel® FPGA SDK for OpenCL™ Custom Platform Toolkit User Guide.
Describe the Intel Stratix 10 GX FPGA Development Kit Reference Platform Hardware to the Intel FPGA SDK for OpenCL
Device
The device section contains the name of the device model file available in the INTELFPGAOCLSDKROOT/share/models/dm directory of the SDK and in the board_spec.xml file. The used_resources element accounts for all logic outside of the kernel partition. The value of used_resources for alms equals the difference between the total number of adaptive logic modules (ALMs) used in final placement and the total number of ALMs available to the kernel partition. You can derive this value from the Partition Statistic section of the Fitter report after a compilation. Consider the following ALM categories within an example Fitter report:

The value of used_resources equals the total number of ALMs in l minus the total number of ALMs in freeze_wrapper_inst|pr_region_inst. In the example above, used_resources = 80273 - 25596 = 54677 ALMs.
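The subtraction above is easy to script when you recompute the value after each base compile. A minimal sketch with a hypothetical helper name, using the two ALM counts from the example:

```python
def used_resources_alms(total_alms: int, kernel_partition_alms: int) -> int:
    """ALMs used outside the kernel partition (the used_resources value)."""
    if kernel_partition_alms > total_alms:
        raise ValueError("kernel partition cannot exceed the total ALM count")
    return total_alms - kernel_partition_alms


EXAMPLE = used_resources_alms(80273, 25596)
```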
Global Memory
In the board_spec.xml file, there is one global_mem section for DDR memory. Assign the string DDR to the name attribute of the global_mem element. The board instance in Platform Designer provides all of these interfaces. Therefore, the string board is specified in the name attribute of all the interface elements within global_mem.
- DDR
Because DDR memory serves as the default memory for the board that the s10_ref Reference Platform targets, its address attribute begins at zero. Its config_addr is 0x018 to match the acl_bsp_memorg_host0x018 conduit used to connect to the corresponding OpenCL Memory Bank Divider for DDR.
Attention: The width and burst sizes must match the parameters in the OpenCL Memory Bank Divider for DDR (memory_bank_divider_ddr4a).
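The global_mem requirements stated above can be checked with a short script. The XML fragment below is a stripped-down, hypothetical example that carries only the attributes named in this section; a real board_spec.xml contains more elements and attributes.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical fragment: only the attributes discussed in this section.
SAMPLE_SPEC = """
<board name="my_board">
  <global_mem name="DDR" config_addr="0x018">
    <interface name="board" address="0x00000000"/>
  </global_mem>
</board>
"""


def check_global_mem(xml_text: str) -> List[str]:
    """Return a list of problems found in the global_mem section."""
    problems = []
    root = ET.fromstring(xml_text)
    mems = root.findall("global_mem")
    if len(mems) != 1:
        problems.append("expected exactly one global_mem section")
    for mem in mems:
        if mem.get("name") != "DDR":
            problems.append("global_mem name should be DDR")
        if int(mem.get("config_addr", "-1"), 16) != 0x018:
            problems.append("config_addr should be 0x018")
        for interface in mem.findall("interface"):
            if interface.get("name") != "board":
                problems.append("interface name should be board")
    return problems
```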
Interfaces
The interfaces section describes kernel clocks, reset, CRA, and snoop interfaces. The OpenCL Memory Bank Divider for the default memory (in this case, memory_bank_divider_ddr4a) exports the snoop interface described in the interfaces section. The width of the snoop interface should match the width of the corresponding streaming interface.
Intel Stratix 10 FPGA Programming Flow
In the order from the longest to the shortest configuration time, the two FPGA programming methods are as follows:
- To maintain the previously programmed state after power cycling, use Flash programming.
- To replace both the FPGA periphery and the core, use the Intel® Quartus® Prime Programmer command-line executable (quartus_pgm) to program the device via cables such as the Intel® FPGA Download Cable (formerly USB-Blaster).
For more information about Intel® Stratix® 10 GX FPGA programming flow and board bring-up, refer to the Intel® Stratix® 10 Development Kit Initialization guide in the <S10GX_BSP_Directory>/bringup directory.
Implementation of Intel FPGA SDK for OpenCL Utilities
aocl install
Windows
The install.bat script is in the <your_custom_platform>\windows64\libexec directory, where <your_custom_platform> points to the top-level directory of your Custom Platform. This install.bat script triggers the install executable to install the SSG Driver on the host machine.
Linux
The install script is located in the <your_custom_platform>/linux64/libexec directory. This install script first compiles the kernel module in a temporary location and then performs the necessary setup to enable automatic driver loading after reboot.
aocl uninstall
Windows
The uninstall.bat script is located in the <your_custom_platform>\windows64\libexec directory, where <your_custom_platform> points to the top-level directory of your Custom Platform. This uninstall.bat script triggers the uninstall executable to uninstall the SSG Driver on the host machine.
Linux
The uninstall script is located in the <your_custom_platform>/linux64/libexec directory. This uninstall script removes the driver module from the Linux kernel.
aocl program
aocl flash
aocl diagnose
Without an argument, the utility returns the overall information of all the devices installed in a host machine. If a specific device name is provided as an argument (that is, aocl diagnose <device_name> ), the diagnose utility runs a memory transfer test and then reports the host-device transfer performance.
You can run the diagnose utility for multiple devices (that is, aocl diagnose <device_name1> <device_name2> <device_name3>). If you want to run the diagnose utility for all devices, use the all option (that is, aocl diagnose all).
aocl list-devices
The list-devices utility is similar to the diagnose utility. It first verifies the installation of the kernel driver and then lists all the devices.
Considerations in Intel Stratix 10 GX FPGA Development Kit Reference Platform Implementation
- The quartus_syn executable reads the SDC files. However, it does not support the Tcl command get_current_revision. Therefore, in the top_post.sdc file, a check is in place to determine whether quartus_syn has read the file before checking the current version.
In addition to these workarounds, take into account the following considerations:
- Intel® Quartus® Prime compilation is only ever performed after the Intel® FPGA SDK for OpenCL™ Offline Compiler embeds an OpenCL kernel inside the system.
- Perform Intel® Quartus® Prime compilation after you install the Intel® FPGA SDK for OpenCL™ and set the INTELFPGAOCLSDKROOT environment variable to point to the SDK installation.
- The name of the directory where the Intel® Quartus® Prime project resides must match the name field in the board_spec.xml file within the Custom Platform. The name is case sensitive.
- The PATH or LD_LIBRARY_PATH environment variable must point to the MMD library in the Custom Platform.
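Two of these considerations lend themselves to a quick check before you compile. The following is a minimal sketch under stated assumptions: it assumes that board_spec.xml has a root element of the form <board ... name="...">, and the function names are hypothetical helpers, not SDK utilities.

```shell
#!/bin/sh
# Sketch: check the directory-name and library-path considerations.
# Assumes board_spec.xml has a root element of the form <board ... name="...">.

# Print the name attribute of the <board> element in a board_spec.xml file.
board_spec_name() {
    sed -n 's/.*<board[[:space:]][^>]*name="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# Succeed only if the directory that holds the Quartus Prime project has the
# same (case-sensitive) name as the name field in its board_spec.xml file.
check_board_dir() {
    board_dir="${1%/}"
    spec_name=$(board_spec_name "$board_dir/board_spec.xml")
    dir_name=$(basename "$board_dir")
    if [ "$spec_name" = "$dir_name" ]; then
        echo "OK: $dir_name"
    else
        echo "MISMATCH: directory '$dir_name' vs board_spec.xml name '$spec_name'"
        return 1
    fi
}

# Succeed if a colon-separated search path (for example "$LD_LIBRARY_PATH")
# contains the given directory.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (paths are placeholders):
#   check_board_dir /path/to/platform/hardware/my_board
#   path_contains "$LD_LIBRARY_PATH" /path/to/platform/linux64/lib
```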
Developing Your Intel Stratix 10 Custom Platform
Developing your Custom Platform requires in-depth knowledge of the contents in the following documents and tools:
- Intel® FPGA SDK for OpenCL™ Custom Platform User Guide
- Contents of the SDK Custom Platform Toolkit
- Intel® FPGA SDK for OpenCL™ Intel® Arria® 10 GX FPGA Development Kit Reference Platform Porting Guide
- Documentation for all the Intel® FPGA IP in your Custom Platform
- Intel® FPGA SDK for OpenCL™ Getting Started Guide
- Intel® FPGA SDK for OpenCL™ Programming Guide
In addition, you must independently verify all IP on your computing card (for example, PCIe® controllers and DDR4 external memory).
Initializing Your Intel Stratix 10 Custom Platform
- Copy the INTELFPGAOCLSDKROOT/board/s10_ref directory, where INTELFPGAOCLSDKROOT is the location of the SDK installation.
- Paste the s10_ref directory into a directory that you own (that is, not a system directory) and then rename it ( <your_custom_platform> ).
- Choose the s10gx board variant in the <your_custom_platform>/hardware directory that matches the production silicon for the Intel® Stratix® 10 FPGA as the basis of your design.
- Rename the s10gx board variant to match the name of your FPGA board ( <your_custom_platform>/hardware/<board_name> ).
- Modify the <your_custom_platform>/board_env.xml file so that the name field matches the platform directory name you chose in step 2 and the default field matches the board name you chose in step 4.
- Modify the my_board name inside the <your_custom_platform>/hardware/<board_name>/board_spec.xml file to match the board name you chose in step 4.
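The initialization steps above can be sketched as a script. This is an illustration under stated assumptions rather than a supported tool: it assumes that board_env.xml carries name="..." on the <board_env> element and default="..." on the <hardware> element, that board_spec.xml carries name="..." on the <board> element, and that the names you choose contain no characters that are special to sed. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch of the initialization steps. The sed expressions assume the XML
# attribute layout described in the lead-in; verify them against your files.

# Apply a sed expression to a file without relying on "sed -i".
edit_in_place() {
    sed "$1" "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}

# init_custom_platform <s10_ref_dir> <dest_parent> <platform_name> <board_name>
init_custom_platform() {
    src="$1"; parent="$2"; platform="$3"; board="$4"
    dest="$parent/$platform"

    if [ -e "$dest" ]; then
        echo "refusing to overwrite existing $dest" >&2
        return 1
    fi

    # Copy the reference platform into a directory you own, under a new name.
    cp -R "$src" "$dest" || return 1

    # Rename the s10gx board variant after your board.
    mv "$dest/hardware/s10gx" "$dest/hardware/$board" || return 1

    # board_env.xml: name follows the platform, default follows the board.
    edit_in_place "s/\(<board_env[^>]*name=\"\)[^\"]*\"/\1$platform\"/" \
        "$dest/board_env.xml" || return 1
    edit_in_place "s/\(<hardware[^>]*default=\"\)[^\"]*\"/\1$board\"/" \
        "$dest/board_env.xml" || return 1

    # board_spec.xml: the board name follows the renamed variant directory.
    edit_in_place "s/\(<board[[:space:]][^>]*name=\"\)[^\"]*\"/\1$board\"/" \
        "$dest/hardware/$board/board_spec.xml"
}

# Usage (destination and names are placeholders):
#   init_custom_platform "$INTELFPGAOCLSDKROOT/board/s10_ref" "$HOME/bsp" my_platform my_board
```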
Modifying the Intel Stratix 10 GX FPGA Development Kit Reference Platform Design
You can add a component in Platform Designer and connect it to the existing system, or add a Verilog file to the available system. After adding the custom components, connect those components in Platform Designer.
- Instantiate your PCIe controller, as described in the Host-to-Intel® Stratix® 10 Communication over PCIe section.
- Instantiate any memory controllers and I/O channels. You can add the board interface hardware either as Platform Designer components in the board.qsys Platform Designer system or as HDL in the top.v file.
The board.qsys file and the top.v file are in the <your_custom_platform>/hardware/<board_name> directory.
- Modify the device.tcl file to match all the correct settings for the device on your board. The device.tcl file is sourced into the opencl_bsp_ip.qsf and flat.qsf files.
- Modify the <your_custom_platform>/hardware/<board_name>/flat.qsf file to change settings for your system. The base.qsf and top.qsf files include all settings from the flat.qsf file.
All .qsf files are in the <your_custom_platform>/hardware/<board_name> directory. Ensure that the flat.qsf file does not have any IP_FILE assignments after the assignment that adds top_post.sdc to the project, because such assignments change the order in which SDC files are read during compilation. Refer to Addition of Timing Constraints for more information about SDC ordering.
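The IP_FILE ordering rule can be checked mechanically. The following is a minimal sketch, assuming one assignment per line and that the top_post.sdc assignment mentions the file name literally; check_ip_file_order is a hypothetical helper, not a Quartus Prime command.

```shell
#!/bin/sh
# Sketch: report IP_FILE assignments that appear after the line that adds
# top_post.sdc in a .qsf file. Returns non-zero if any are found.
# Assumes one assignment per line.
check_ip_file_order() {
    awk '
        /^[ \t]*#/         { next }            # skip comment lines
        /top_post\.sdc/    { seen = 1; next }  # the SDC assignment itself
        seen && /-name[ \t]+IP_FILE/ {
            printf "%s:%d: IP_FILE after top_post.sdc: %s\n", FILENAME, NR, $0
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage (path is a placeholder):
#   check_ip_file_order /path/to/platform/hardware/my_board/flat.qsf
```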
Integrating Your Intel Stratix 10 Custom Platform with the Intel FPGA SDK for OpenCL
- Set AOCL_BOARD_PACKAGE_ROOT to point to your Custom Platform. Use the flat.qsf file in the INTELFPGAOCLSDKROOT/board/s10_ref Reference Platform to determine the type of information you must include in the flat.qsf file for your Custom Platform.
- Update the <your_custom_platform>/hardware/<board_name>/board_spec.xml file. Ensure that there is at least one global memory interface, and all global memory interfaces correspond to the exported interfaces from the board.qsys Platform Designer system file.
- After all your hardware design changes are finalized, compile the flat revision with several seeds of the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl kernel until you generate a design that closes timing cleanly.
To specify the seed number during compilation, include the -seed=<N> option in your aoc command. Use the -bsp-flow=flat option in your aoc command for a flat compile.
aoc -bsp-flow=flat boardtest.cl -o=bin/boardtest.aocx
- Based on the output of your flat revision compiles, establish the floorplan of your design in the base revision. Add Logic Lock regions in base.qsf only. The base.qsf and top.qsf files automatically inherit all the settings in the flat.qsf file.
Important: Consider all design criteria outlined in the Intel® Stratix® 10 FPGA System Design section in this guide.
Compile the base revision with several seeds of the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl kernel until you generate a design that closes timing cleanly. This flow creates the timing-closed base database for the static region, which is needed for guaranteed timing support. Use the -bsp-flow=base option in your aoc command for a base compile.
aoc -bsp-flow=base boardtest.cl -o=bin/boardtest.aocx
Attention: In a typical development of a custom platform, designers generally validate the functionality of boardtest using the flat flow before starting to add guaranteed timing functionality to their BSP.
- From the compiled output directory of the base revision compile, copy the base.qar file into your Custom Platform hardware directory, replacing the old base.qar file with the new one, which contains the post-fit netlist for your reference platform.
- Make sure that AOCL_BOARD_PACKAGE_ROOT is set to your reference platform with the new base.qar file, and then compile the top revision (that is, the default compilation flow). Confirm that you can use the .aocx file to reprogram the FPGA by invoking the aocl program acl0 boardtest.aocx command.
- Using the default compilation flow, test your base.qar file across several OpenCL design examples and confirm that the following criteria are satisfied:
- All compilations close timing.
- The OpenCL design examples achieve satisfactory fmax (check the acl_quartus_report.txt file for the achieved fmax).
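The flat and base seed sweeps described in the steps above can be sketched as a loop over the documented -bsp-flow and -seed options. This is a minimal sketch: the AOC variable is a hypothetical override for the location of the aoc command, and the per-seed output file names are an assumption made so that results do not overwrite each other. The sketch only launches the compiles; you still need to inspect each result to see which seed closes timing.

```shell
#!/bin/sh
# Sketch: compile a kernel once per seed for a given BSP flow.
# AOC is a hypothetical override for the compiler's location (default "aoc").
# The per-seed output names are an assumption made for this example.
AOC="${AOC:-aoc}"

# compile_seeds <flat|base> <kernel.cl> <seed>...
compile_seeds() {
    flow="$1"; kernel="$2"; shift 2
    failed=0
    for seed in "$@"; do
        "$AOC" -bsp-flow="$flow" -seed="$seed" "$kernel" \
            -o="bin/boardtest_${flow}_seed${seed}.aocx" \
            || { echo "seed $seed: compile failed" >&2; failed=1; }
    done
    return "$failed"
}

# Usage:
#   compile_seeds flat boardtest.cl 1 2 3
#   compile_seeds base boardtest.cl 1 2 3
```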
Setting up the Intel Stratix 10 Custom Platform Software Development Environment
- To compile the MMD layer for Windows, perform the following tasks:
- Install the GNU make utility on your development machine.
- Install a version of Microsoft Visual Studio that has the ability to compile 64-bit software (for example, Microsoft Visual Studio version 2010 Professional).
- Set the environment for using Microsoft Visual Studio.
- Set the development environment so that SDK users can invoke commands and utilities at the command prompt.
- Modify the <your_custom_platform>/source/Makefile.common file so that TOP_DEST_DIR points to the top-level directory of your Custom Platform.
- To check that you have set up the software development environment properly, invoke the make or make clean command.
- To compile the MMD layer for Linux, perform the following tasks:
- Ensure that you use a Linux distribution that Intel® supports, with a supported compiler (for example, GNU Compiler Collection (GCC) version 4.4.7).
- Modify the <your_custom_platform>/source/Makefile.common file so that TOP_DEST_DIR points to the top-level directory of your Custom Platform.
- To check that you have set up the software environment properly, invoke the make or make clean command.
Establishing Intel Stratix 10 Custom Platform Host Communication
- Program your FPGA device with the <your_custom_platform>/hardware/<board_name>/base.sof file and then reboot your system.
The base.sof file is generated during the base revision compile when you integrate your Custom Platform with the Intel® FPGA SDK for OpenCL™. Refer to the Integrating Your Intel® Stratix® 10 Custom Platform with the Intel® FPGA SDK for OpenCL™ section for more information.
- Confirm that your operating system recognizes a PCIe* device with your vendor and device IDs.
- For Windows, open the Device Manager and verify that the correct device and IDs appear in the listed information.
- For Linux, invoke the lspci command and verify that the correct device and IDs appear in the listed information.
- Set the environment variable AOCL_BOARD_PACKAGE_ROOT to point to your custom platform.
- Run the aocl install <path_to_customplatform> utility command to install the kernel driver on your machine.
- For Windows, set the PATH environment variable. For Linux, set the LD_LIBRARY_PATH environment variable.
For more information about the settings for PATH and LD_LIBRARY_PATH, refer to Setting the Intel® FPGA SDK for OpenCL™ User Environment Variables in the Intel® FPGA SDK for OpenCL™ Getting Started Guide.
- Modify the version_id_test function in your <your_custom_platform>/source/host/mmd/acl_pcie_device.cpp MMD source code file to exit after reading from the version ID register. Rebuild the MMD software.
- Run the aocl diagnose utility command and confirm that the version ID register reads back the ID successfully. You may set the environment variables ACL_HAL_DEBUG and ACL_PCIE_DEBUG to a value of 1 to visualize the result of the diagnostic test on your terminal.
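On Linux, the bring-up sequence above can be sketched as follows. This is an illustration under stated assumptions: AOCL and LSPCI are hypothetical overrides for the tool locations, the linux64/lib directory is assumed to hold the MMD library (confirm the correct setting in the Getting Started Guide), and the vendor and device IDs are whatever you configured in your PCIe controller.

```shell
#!/bin/sh
# Sketch of the Linux bring-up sequence.
# AOCL and LSPCI are hypothetical overrides for the tool locations.
AOCL="${AOCL:-aocl}"
LSPCI="${LSPCI:-lspci}"

# Succeed if a PCIe device with the given hexadecimal vendor and device IDs
# is visible to the operating system.
pcie_device_present() {
    [ -n "$("$LSPCI" -d "$1:$2")" ]
}

# Install the driver for a Custom Platform, then run the diagnostic with
# HAL and MMD debug output enabled.
bring_up() {
    platform="${1%/}"
    export AOCL_BOARD_PACKAGE_ROOT="$platform"
    # Assumption: the MMD library is in <platform>/linux64/lib.
    export LD_LIBRARY_PATH="$platform/linux64/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    "$AOCL" install "$platform" || return 1
    ACL_HAL_DEBUG=1 ACL_PCIE_DEBUG=1 "$AOCL" diagnose
}

# Usage (IDs and path are placeholders):
#   pcie_device_present <vendor_id> <device_id> && bring_up /path/to/platform
```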
Branding Your Intel Stratix 10 Custom Platform
- In the software development environment available with the s10_ref Reference Platform, replace all references to "s10_ref" with the name of your Custom Platform.
- Modify the PACKAGE_NAME and MMD_LIB_NAME fields in the <your_custom_platform>/source/Makefile.common file.
- Modify the name, linklibs, and mmdlib elements in the <your_custom_platform>/board_env.xml file to reflect your custom MMD library name.
- In your Custom Platform, modify the following lines of code in the hw_pcie_constants.h file to include information about your Custom Platform:
#define ACL_BOARD_PKG_NAME "s10_ref"
#define ACL_VENDOR_NAME "Intel Corporation"
#define ACL_BOARD_NAME "Stratix 10 Reference Platform"
For Windows, the hw_pcie_constants.h file is in the <your_custom_platform>\source\include folder. For Linux, the hw_pcie_constants.h file is in the <your_custom_platform>/linux64/driver directory.
Note: The ACL_BOARD_PKG_NAME variable setting must match the name attribute of the board_env element that you specified in the board_env.xml file.
- Define the Device ID, Subsystem Vendor ID, Subsystem Device ID, and Revision ID, as defined in the Device Identification Registers for Intel® Stratix® 10 PCIe Hard IP section.
Note: The PCIe* IDs in the hw_pcie_constants.h file must match the parameters in the PCIe® controller hardware.
- Update your Custom Platform's board.qsys Platform Designer system and the hw_pcie_constants.h file with the IDs defined in step 5.
- For Windows, update the <your_custom_platform>\windows64\driver\Shim.inf file.
- Run make in the <your_custom_platform>/source directory to generate the driver.
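Because ACL_BOARD_PKG_NAME must match the name attribute of the board_env element, a quick comparison before you run make can catch a mismatch early. The following is a minimal sketch, assuming that the #define holds a double-quoted string on a single line and that the name attribute sits on the <board_env> element; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch: confirm that ACL_BOARD_PKG_NAME in hw_pcie_constants.h matches the
# name attribute of <board_env> in board_env.xml.
# Assumes the #define holds a double-quoted string on a single line.

pkg_name_from_header() {
    sed -n 's/^[[:space:]]*#define[[:space:]][[:space:]]*ACL_BOARD_PKG_NAME[[:space:]][[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

env_name_from_xml() {
    sed -n 's/.*<board_env[^>]*name="\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# check_branding <hw_pcie_constants.h> <board_env.xml>
check_branding() {
    header_name=$(pkg_name_from_header "$1")
    xml_name=$(env_name_from_xml "$2")
    if [ -n "$header_name" ] && [ "$header_name" = "$xml_name" ]; then
        echo "OK: $header_name"
    else
        echo "MISMATCH: header '$header_name' vs board_env.xml '$xml_name'"
        return 1
    fi
}

# Usage (paths are placeholders):
#   check_branding /path/to/platform/linux64/driver/hw_pcie_constants.h /path/to/platform/board_env.xml
```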
Changing the Device Part Number
Update the device part number in the following files within the <your_custom_platform>/hardware/<board_name> directory:
- In the device.tcl file, change the device part number in the set_global_assignment -name DEVICE 1SG280LU2F50E2VG QSF assignment.
The updated device number will appear in the base.qsf, top.qsf, flat.qsf, and opencl_bsp_ip.qsf files.
- In the board.qsys and mem.qsys files, change all occurrences of 1SG280LU2F50E2VG.
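The part number substitution in the files listed above can be sketched as a small script. This is a minimal sketch, not a supported tool: change_part is a hypothetical helper, the new part number is supplied by you, and the sed expression assumes that the part number contains no characters that are special to sed.

```shell
#!/bin/sh
# Sketch: replace the reference part number in device.tcl, board.qsys,
# and mem.qsys. Assumes the new part number has no "/" or "&" characters.
OLD_PART="1SG280LU2F50E2VG"

# change_part <board_dir> <new_part>
change_part() {
    board_dir="${1%/}"; new_part="$2"
    for f in device.tcl board.qsys mem.qsys; do
        path="$board_dir/$f"
        [ -f "$path" ] || { echo "missing: $path" >&2; return 1; }
        sed "s/$OLD_PART/$new_part/g" "$path" > "$path.tmp" \
            && mv "$path.tmp" "$path" || return 1
    done
    # List any files that still mention the old part number (informational).
    grep -rl "$OLD_PART" "$board_dir" || true
}

# Usage (path and part number are placeholders):
#   change_part /path/to/platform/hardware/my_board <new_part_number>
```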
Connecting the Memory in the Intel Stratix 10 Custom Platform
- In your Custom Platform, instantiate your external memory IP based on the information in the DDR4 as Global Memory for OpenCL Applications section. Update the information pertaining to the global_mem element in the <your_custom_platform>/hardware/<board_name>/board_spec.xml file.
- Remove the boardtest hardware configuration file (that is, the .aocx file) that you created during the integration of your Custom Platform with the Intel® FPGA SDK for OpenCL™.
- Recompile the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl kernel source file.
The environment variable INTELFPGAOCLSDKROOT points to the location of the SDK installation.
- Reprogram the FPGA with the new boardtest hardware configuration file and then reboot your machine.
- Modify the wait_for_uniphy function in the acl_pcie_device.cpp MMD source code file to exit after checking the UniPHY status register. Rebuild the MMD software.
For Windows/Linux, the acl_pcie_device.cpp file is in the <your_custom_platform>\source\host\mmd folder.
- Run the aocl diagnose SDK utility and confirm that the host reads back both the version ID and the value 0 from the uniphy_status component.
The utility should return the message Uniphy are calibrated.
- Consider analyzing your design in the Signal Tap logic analyzer to confirm the successful calibration of all memory controllers.
Note: For more information on the Signal Tap logic analyzer, download the Signal Tap II Logic Analyzer tutorial from the University Program Tutorial page.
Modifying the Kernel PLL Reference Clock
- In the <your_custom_platform>/hardware/<board_name>/board.qsys file, update the REF_CLK_RATE parameter value on the kernel_clk_gen IP module.
- In the <your_custom_platform>/hardware/<board_name>/top.sdc file, update the create_clock assignment for config_clk.
- In the <your_custom_platform>/hardware/<board_name>/top.v file, update the comment for the config_clk input port, which is connected to kernel_pll_refclk in board.qsys.
Integrating an OpenCL Kernel in Your Intel Stratix 10 Custom Platform
- Perform the steps outlined in the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/README.txt file to build the hardware configuration file from the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/boardtest.cl kernel source file.
The environment variable INTELFPGAOCLSDKROOT points to the location of the Intel® FPGA SDK for OpenCL™ installation.
- Program your FPGA device with the hardware configuration file you created in step 1 and then reboot your machine.
- Remove the early-exit modification in the version_id_test function in the acl_pcie_device.cpp file that you implemented when you established communication between the board and the host interface.
For Windows/Linux, the acl_pcie_device.cpp file is in the <your_custom_platform>\source\host\mmd folder.
- Recompile the MMD.
- Invoke the aocl diagnose <device_name> command, where <device_name> is the string you define in your Custom Platform to identify each board.
If you have only one variant, invoke the aocl diagnose acl0 command.
- Build the boardtest host application using the .sln file (Windows) or Makefile.linux (Linux) in the SDK's Custom Platform Toolkit.
For Windows, the .sln file is in the INTELFPGAOCLSDKROOT\board\custom_platform_toolkit\tests\boardtest\host folder. For Linux, the Makefile.linux file is in the INTELFPGAOCLSDKROOT/board/custom_platform_toolkit/tests/boardtest/host directory.
- Set the environment variable CL_CONTEXT_COMPILER_MODE_INTELFPGA to a value of 3 and run the boardtest host application. The boardtest application evaluates the host-to-memory and kernel-to-memory connections. You may have to modify the boardtest host code based on your hardware design changes.
- Using the default compilation flow, test your Custom Platform across several OpenCL design examples and confirm that the OpenCL design examples function correctly on the accelerator board.
For more information about CL_CONTEXT_COMPILER_MODE_INTELFPGA, refer to Troubleshooting Intel® Stratix® 10 GX FPGA Development Kit Reference Platform Porting Issues.
Troubleshooting Intel Stratix 10 GX FPGA Development Kit Reference Platform Porting Issues
Environment Variable | Description |
---|---|
ACL_HAL_DEBUG | Set this variable to a value of 1 to 5 to enable increasing debug output from the Hardware Abstraction Layer (HAL), which interfaces directly with the MMD layer. |
ACL_PCIE_DEBUG | Set this variable to a value of 1 to 10000 to enable increasing debug output from the MMD. This variable setting is useful for confirming that the version ID register was read correctly and the UniPHY IP cores are calibrated. |
ACL_PCIE_JTAG_CABLE | Set this variable to override the default quartus_pgm argument that specifies the cable number. The default is cable 1. If there are multiple Intel® FPGA Download Cables, you can specify a particular one here. |
ACL_PCIE_JTAG_DEVICE_INDEX | Set this variable to override the default quartus_pgm argument that specifies the FPGA device index. By default, this variable has a value of 2. If the FPGA is not the first device in the JTAG chain, you can customize the value. |
ACL_PCIE_USE_JTAG_PROGRAMMING | Set this variable to force the MMD to reprogram the FPGA using the JTAG cable. |
ACL_PCIE_DMA_USE_MSI | Set this variable if you want to use MSI for DMA transfers on Windows. |
CL_CONTEXT_COMPILER_MODE_INTELFPGA | Unset this variable or set it to a value of 3. When the variable is unset, the OpenCL host runtime reprograms the FPGA as needed, which it does at least once during initialization. To prevent the host application from programming the FPGA, set this variable to a value of 3. |
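
The variables in this table can be scoped to a single command instead of being exported for the whole session. The following is a minimal sketch: AOCL is a hypothetical override for the location of the aocl utility, the host executable name is a placeholder, and the value 1 for ACL_PCIE_USE_JTAG_PROGRAMMING is an assumption, because the table only states that the variable must be set.

```shell
#!/bin/sh
# Sketch: apply troubleshooting variables to one command at a time.
# AOCL is a hypothetical override for the utility's location (default "aocl").
AOCL="${AOCL:-aocl}"

# Run a host executable without letting the OpenCL runtime reprogram the FPGA.
# run_without_reprogram <host_executable> [args...]
run_without_reprogram() {
    CL_CONTEXT_COMPILER_MODE_INTELFPGA=3 "$@"
}

# Program over JTAG with an explicit cable number and device index.
# Assumption: any value (here 1) enables ACL_PCIE_USE_JTAG_PROGRAMMING.
# program_over_jtag <cable> <device_index> <device_name> <file.aocx>
program_over_jtag() {
    cable="$1"; index="$2"; device="$3"; aocx="$4"
    ACL_PCIE_USE_JTAG_PROGRAMMING=1 \
    ACL_PCIE_JTAG_CABLE="$cable" \
    ACL_PCIE_JTAG_DEVICE_INDEX="$index" \
        "$AOCL" program "$device" "$aocx"
}

# Usage (executable name, cable, and index are placeholders):
#   run_without_reprogram ./boardtest_host
#   program_over_jtag 2 1 acl0 boardtest.aocx
```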
Document Revision History
Document Version | Intel® Quartus® Prime Version | Changes |
---|---|---|
2019.04.01 | 19.1 | |
2017.11.21 | 17.1 | Initial release. |