Multi Channel DMA for PCI Express IP Design Example User Guide
Version Information
Updated for: |
---|
Intel® Quartus® Prime Design Suite 20.2 |
IP Version 20.0.0 |
1. Terms and Acronyms
Term | Definition |
---|---|
PCIe* | Peripheral Component Interconnect Express (PCI Express*) |
DMA | Direct Memory Access |
MCDMA | Multi Channel Direct Memory Access |
PIO | Programmed Input/Output |
H2D | Host-to-Device |
D2H | Device-to-Host |
H2DDM | Host-to-Device Data Mover |
D2HDM | Device-to-Host Data Mover |
QCSR | Queue Control and Status register |
GCSR | General Control and Status Register |
IP | Intellectual Property |
HIP | Hard IP |
PD | Packet Descriptor |
QID | Queue Identification |
TIDX | Queue Tail Index (pointer) |
HIDX | Queue Head Index (pointer) |
TLP | Transaction Layer Packet |
IMMWR | Immediate Write Operation |
MRRS | Maximum Read Request Size |
CvP | Configuration via Protocol |
PBA | Pending Bit Array |
Avalon®-MM | Avalon® Memory-Mapped Interface |
Avalon®-ST | Avalon® Streaming Interface |
2. Design Example Detailed Description
2.1. Design Example Overview
The Multi Channel DMA for PCI Express IP Design Examples demonstrate a Multi Channel DMA solution for Intel® Stratix® 10 devices using the H-Tile Gen3 x16 hard IP and soft IP implemented in the FPGA fabric.
You can generate the design example from the Example Designs tab of the Multi Channel DMA for PCI Express IP Parameter Editor. For the user interface, you can choose either the Avalon-ST or the Avalon-MM interface. When the Avalon-MM interface type is selected, you can allocate up to eight DMA channels. For the Avalon-ST interface, each DMA channel maps 1:1 to an Avalon-ST port. You can also configure the size of PCIe BAR2, which is mapped to the Avalon-MM PIO Master port.
2.2. Hardware and Software Requirements
- Intel® Quartus® Prime Pro Edition Software version 20.2
- ModelSim, VCS, or NCSim
- Intel® Stratix® 10 MX or GX FPGA Development Kit
For details on the design example simulation steps and on running the hardware test, refer to the Design Example Quick Start Guide chapter.
For more information on development kits, refer to Intel® Stratix® 10 FPGA Development Kits on the Intel website.
2.3. Avalon-ST PIO using MCDMA Bypass mode
This design example enables the Avalon-MM PIO master, which bypasses the DMA path. The Avalon-MM PIO master allows the application to perform single, non-bursting register read/write operations on the on-chip memory.
- resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire Intel® Stratix® 10 FPGA fabric enters user mode
- MEM_PIO – On-chip memory for the PIO operation. Connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port that is mapped to PCIe BAR2
- PIO test (perfq_app option): -o
2.3.1. Simulation Results
The testbench writes 4 KB of an incrementing pattern to the on-chip memory and reads it back through the Avalon-MM PIO interface. The testbench for this design example does not simulate the H2D/D2H data movers.
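On hardware, a similar write-and-read-back check can be run from the Linux host through BAR2. The following C sketch is a minimal illustration under stated assumptions, not the perfq_app implementation: it assumes BAR2 is at least 4 KB and can be mapped through the standard Linux sysfs resource2 file (root required), and it assumes a 32-bit, word-wise incrementing pattern (word i holds value i). The function names pio_map_bar2 and pio_pattern_test are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map 'len' bytes of BAR2 for the device at PCI
 * address 'bdf' (for example "0000:01:00.0") through the Linux sysfs
 * resource file. Requires root. Returns NULL on failure. */
volatile uint32_t *pio_map_bar2(const char *bdf, size_t len)
{
    char path[128];
    snprintf(path, sizeof path, "/sys/bus/pci/devices/%s/resource2", bdf);

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the file is closed */
    return p == MAP_FAILED ? NULL : (volatile uint32_t *)p;
}

/* Hypothetical helper: write an assumed incrementing 32-bit pattern
 * (word i = i) with single volatile accesses, then read it back.
 * Returns the number of mismatching words. */
size_t pio_pattern_test(volatile uint32_t *base, size_t bytes)
{
    assert(base != NULL);
    size_t words = bytes / sizeof(uint32_t);
    size_t errors = 0;

    for (size_t i = 0; i < words; i++)
        base[i] = (uint32_t)i;

    for (size_t i = 0; i < words; i++)
        if (base[i] != (uint32_t)i)
            errors++;

    return errors;
}
```

A caller would map 4096 bytes with pio_map_bar2() using the device's PCI address, pass the pointer to pio_pattern_test(), treat a return value of 0 as a pass, and release the mapping with munmap() afterwards. The volatile qualifier prevents the compiler from merging or eliding the individual reads and writes.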


2.3.2. Hardware Test Results

2.4. Avalon-ST Packet Generate/Check
This design example performs H2D and D2H multi channel DMA via Avalon-ST streaming. The Multi Channel DMA for PCI Express IP core provides four independent Avalon-ST Source/Sink ports. Each DMA channel maps 1:1 to an Avalon-ST port.
For H2D streaming, the Multi Channel DMA sends data to the Avalon-ST packet checker through the four Avalon-ST Source ports, and the Packet Checker validates the received data. For D2H streaming, the Multi Channel DMA receives data from the Avalon-ST packet generator through the Avalon-ST Sink ports.
In addition, the design example enables the Avalon-MM PIO master, which bypasses the DMA path. It allows the application to perform single, non-bursting register read/write operations on the on-chip memory block. The test application software, perfq_app, also uses the Avalon-MM PIO Master port to configure the Packet Generator and Checker.
- resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire Intel® Stratix® 10 FPGA fabric enters user mode
- MEM_PIO – On-chip memory for the PIO operation. Connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port that is mapped to PCIe BAR2
- GEN_CHK – Packet Generator and Checker for MCDMA. Connected to the MCDMA Avalon-ST Source (h2d_st_x) and Avalon-ST Sink (d2h_st_x) ports
- PIO test (perfq_app option): -o
- DMA test (perfq_app options): -t (Tx), -r (Rx), -z (bidirectional)
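The checks that the Packet Checker applies to each H2D packet (data pattern, expected length, and receipt of the end-of-packet marker, as described in the testbench flow in Section 3.3.2) can be modeled in software as follows. This C sketch is illustrative only: the structure and function names are hypothetical, it does not reflect the GEN_CHK register map, and it assumes a byte-wise incrementing pattern (byte i holds i mod 256).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of the checker's Good/Bad Packet Count. */
struct chk_counters {
    uint32_t good;
    uint32_t bad;
};

/* A packet passes if the end-of-packet marker was seen, its length equals
 * the expected length programmed over PIO, and every byte matches the
 * assumed incrementing pattern. */
bool chk_packet_ok(const uint8_t *data, size_t len, bool eop_seen,
                   size_t expected_len)
{
    if (!eop_seen || len != expected_len)
        return false;
    for (size_t i = 0; i < len; i++)
        if (data[i] != (uint8_t)i)
            return false;
    return true;
}

/* Increment the Good Packet Count by 1 for a valid packet;
 * otherwise increment the Bad Packet Count. */
void chk_account(struct chk_counters *c, const uint8_t *data, size_t len,
                 bool eop_seen, size_t expected_len)
{
    assert(c != NULL);
    if (chk_packet_ok(data, len, eop_seen, expected_len))
        c->good++;
    else
        c->bad++;
}
```

In this model, a missing end-of-packet marker, a wrong length, or a single corrupted byte each count as one bad packet, which mirrors the pass/fail decision the testbench reads back over PIO.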
2.4.1. Simulation Results




2.4.2. Hardware Test Results







2.5. Avalon-ST Device-side Packet Loopback
This design example performs H2D and D2H multi channel DMA via Avalon-ST streaming. The Multi Channel DMA for PCI Express IP core provides four independent Avalon-ST Source/Sink ports. Each DMA channel maps 1:1 to an Avalon-ST port.
For H2D streaming, the Multi Channel DMA sends data to the Avalon-ST loopback FIFOs through the four Avalon-ST Source ports. For D2H streaming, the Multi Channel DMA receives data from the Avalon-ST loopback FIFOs through the Avalon-ST Sink ports.
In addition, the design example enables the Avalon-MM PIO master, which bypasses the DMA path. It allows the application to perform single, non-bursting register read/write operations on the on-chip memory block.
- resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire Intel® Stratix® 10 FPGA fabric enters user mode
- MEM_PIO – On-chip memory for the PIO operation. Connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port that is mapped to PCIe BAR2
- FIFO_ST0, FIFO_ST1, FIFO_ST2, and FIFO_ST3 – Avalon-ST FIFOs for streaming loopback. Connected to the MCDMA Avalon-ST Source (h2d_st_x) and Avalon-ST Sink (d2h_st_x) ports
- PIO test (perfq_app option): -o
- DMA test (perfq_app options): -i (performance loopback operation, where Tx and Rx run in two separate threads), -v (enable data validation)
- Running -i without -v displays the throughput per channel
2.5.1. Simulation Results



2.5.2. Hardware Test Results



2.6. Avalon-MM PIO using MCDMA Bypass mode
This design example enables the Avalon-MM PIO master, which bypasses the DMA path. The Avalon-MM PIO master allows the application to perform single, non-bursting register read/write operations on the on-chip memory.
- resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire Intel® Stratix® 10 FPGA fabric enters user mode
- MEM_PIO – On-chip memory for the PIO operation. Connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port that is mapped to PCIe BAR2
- PIO test (perfq_app option): -o
2.6.1. Simulation Results
The testbench writes 4 KB of an incrementing pattern to the on-chip memory and reads it back through the Avalon-MM PIO interface. The testbench for this design example does not simulate the H2D/D2H data movers.


2.6.2. Hardware Test Results

2.7. Avalon-MM DMA
This design example performs H2D and D2H multi channel DMA via the Avalon-MM interface. The Multi Channel DMA for PCI Express IP core provides one Avalon-MM Write Master port and one Avalon-MM Read Master port. You can allocate up to eight DMA channels when generating this design example.
For H2D DMA, the Multi Channel DMA H2D data mover writes data to the on-chip memory through the Avalon-MM Write Master port. For D2H DMA, the Multi Channel DMA D2H data mover reads data from the on-chip memory through the Avalon-MM Read Master port.
In addition, the design example enables the Avalon-MM PIO master, which bypasses the DMA path. It allows the application to perform single, non-bursting register read/write operations on the on-chip memory block.
- resetIP – Reset Release IP that holds the Multi Channel DMA in reset until the entire Intel® Stratix® 10 FPGA fabric enters user mode
- MEM_PIO – On-chip memory for the PIO operation. Connected to the MCDMA Avalon-MM PIO Master (rx_pio_master) port that is mapped to PCIe BAR2
- MEM – Dual-port on-chip memory. One port is connected to the Avalon-MM Write Master (h2ddm_master) and the other to the Avalon-MM Read Master (d2hdm_master)
- PIO test (perfq_app option): -o
- DMA test (perfq_app options): -t (Tx), -r (Rx)
2.7.1. Simulation Results
The testbench writes 4 KB of an incrementing pattern to the on-chip memory and reads it back through the Avalon-MM PIO interface. In the current release, the testbench for this design example does not simulate data movement through the Avalon-MM Write and Read Master ports.


2.7.2. Hardware Test Results




3. Design Example Quick Start Guide
Using Intel® Quartus® Prime software, you can generate a design example for the Multi Channel DMA for PCI Express* (PCIe*) IP core.
The generated design example reflects the parameters that you specify. The design example automatically creates the files necessary to simulate and compile in the Intel® Quartus® Prime software. You can download the compiled design to your FPGA Development Board. To download to custom hardware, update the Intel® Quartus® Prime Settings File (.qsf) with the correct pin assignments.
3.1. Design Example Directory Structure
Directory / File | Sub-directory / File | Sub-directory / File | Sub-directory / File | Sub-directory / File | Note |
---|---|---|---|---|---|
pcie_ed | sim | pcie_ed.v | Design example top-level HDL | ||
<simulators> | <simulation scripts> | pcie_ed simulation directory | |||
synth | pcie_ed.v | Design example top-level HDL | |||
<Components automatically generated by Platform Designer> | | | | | |
pcie_ed_tb | pcie_ed_tb | sim | pcie_ed_tb.v | Testbench including Intel FPGA BFM | |
<simulators> | <simulation script> | Testbench simulation directory | |||
ip | pcie_ed_tb | DUT_pcie_tb_ip | Intel FPGA BFM (RP) | ||
pcie_ed_tb.qsys | Testbench Platform Designer file | ||||
pcie_ed.ipx | |||||
software | kernel | common | |||
driver | kmod | <kernel driver files> | Kernel driver | ||
Licenses | |||||
user | cli | perfq_app | <test application software> | Test Application | |
README | Readme file | ||||
sample | ref.c | Reference API flow | |||
common | include | regs | MCDMA and Pkt Gen/Chk registers | ||
mk | |||||
src | |||||
libmqdma | <user space library files> | User space library | |||
Licenses | |||||
Readme | Readme file | ||||
readme | Readme file | ||||
ip | pcie_ed | <Design example IP components> | |||
pcie_ed.qpf | Quartus project file | ||||
pcie_ed.qsf | Quartus setting file | ||||
pcie_ed.qsys | Design example Platform Designer file |
3.2. Generating the Example Design using Intel Quartus Prime
3.2.1. Procedure
- In the Intel® Quartus® Prime Pro Edition software, create a new project (File → New Project Wizard).
- Specify the Directory, Name, and Top-Level Entity.
- For Project Type, accept the default value, Empty project. Click Next.
- For Add Files click Next.
- For Family, Device & Board Settings, select Intel Stratix 10 (GX/SX/MX/TX) and the Target Device for your design.
Note: The selected device is used only if you select None for Target Development Kit on the Example Designs tab.
- Click Finish.
- In the IP Catalog, locate and add the Multi Channel DMA for PCI Express* IP, which opens the IP Parameter Editor.
- In the New IP Variant dialog box, specify a name for your IP. Click Create.
- On the IP Settings tabs, specify the parameters for your IP variation.
- On the Example Designs tab, make the following selections:
- For Currently Selected Example Design, select a design example from the pulldown menu.
- The available design examples depend on the Interface type setting in MCDMA Settings on the IP Settings tab.
Available design examples for the Avalon-ST Interface type:
- PIO using DMA Bypass Mode
- Packet Generate/Check
- Device-side Packet Loopback
Available design examples for the Avalon-MM Interface type:
- PIO using DMA Bypass Mode
- Avalon-MM DMA
- For Example Design Files, turn on the Simulation and Synthesis options. If you do not need these simulation or synthesis files, leaving the corresponding option(s) turned off significantly reduces the example design generation time.
- For Select simulation Root Complex BFM, choose the appropriate BFM:
- Intel FPGA BFM: This bus functional model (BFM) supports x16 configurations by down-training to x8.
- Third-party BFM: If you want to simulate all 16 lanes, use a third-party BFM. If you have an Avery BFM installed and need information about simulating with the Avery BFM, contact your local FAE or sales representative.
- For Generated HDL Format, only Verilog is available in the current release.
- For Target Development Kit, select the appropriate option. Note: If you select None, the generated design example targets the device you specified. Otherwise, the design example uses the device on the selected development board. If you intend to test the design on custom hardware, make the appropriate pin assignments in the .qsf file.
- Select Generate Example Design to create a design example that you can simulate and download to hardware. If you select one of the Intel® Stratix® 10 development boards, the device on that board supersedes the device previously selected in the Intel® Quartus® Prime Pro Edition project if the devices are different. When the prompt asks you to specify the directory for your example design, you can choose to accept the default directory ./intel_pcie_mcdma_0_example_design or choose another directory.
- Click Close on Generate Example Design Completed message.
- Close the IP Parameter Editor. Click File → Exit. When prompted with Save changes?, you do not need to save the .ip. Click Don’t Save.
3.3. Simulating the Design Example
3.3.1. Testbench Overview

The design example, pcie_ed_inst, is generated with an x16 link. The Intel FPGA BFM, DUT_pcie_tb, supports links of up to x8, so the testbench simulation down-trains the link to x8. To simulate an x16 link, use a third-party BFM.
The testbench uses a Root Port driver module, altpcietb_bfm_rp_gen3_x8.sv (path: pcie_ed_tb/ip/pcie_ed_tb/DUT_pcie_tb_ip/altera_pcie_s10_tbed_191/sim), to exercise the target memory and DMA channel in the Endpoint. You can modify this module to vary the transactions sent to the example Endpoint design or to your own design.
For more information about Intel FPGA BFM, refer to Intel Stratix 10 Avalon streaming and SR-IOV Interface for PCI Express Solutions User Guide (Section 9.3 Root Port BFM Overview).
3.3.2. Example Testbench Flow for DMA Test with Avalon-ST Packet Generate/Check Design Example
- Host-to-Device: Transferring packets stored in the host memory to the Packet Checker in the design example user logic, where a checker module verifies the integrity of the packet
- Device-to-Host: Packets generated from a Generator module are transferred to the host memory where the host checks the packet integrity
- Set up 4096 bytes of incrementing data pattern for testing data movement from the host to the device and then back to the host.
- Write the expected packet length value (4096 bytes) to the Packet Generator and Checker in the design example user logic through the PIO. The Packet Checker module uses this value to test packet integrity.
- MSI-X is enabled and configured to launch a memory write that signals the end of each descriptor's DMA transaction. The Write-Back function is kept disabled for the simulation.
- Set up the H2D (Host-to-Device) queue in the Multi Channel DMA.
- Set up three H2D descriptors in the host memory, with the source address pointing to the incrementing data pattern locations in the host memory. The start of packet (SOF) and end of packet (EOF) markers along with packet length are indicated in the descriptors.
- At the last step of the Queue programming, the Multi Channel DMA tail pointer register is written, which triggers the Multi Channel DMA to start the H2D DMA transaction.
- The previous step instructs the H2D Data Mover to fetch the descriptors from the host memory.
- The Multi Channel DMA H2D Data Mover reads the data from the host memory and forwards the packet to the Packet Generator and Checker through the Avalon-ST interface.
- The checker module receives the packet and checks its integrity by testing the data pattern, verifying that the length is as expected, and confirming proper receipt of the "end of packet" marker. If the packet is valid, the Good Packet Count is incremented by 1; otherwise, the Bad Packet Count is incremented.
- The testbench does a PIO read access of the Good Packet Count and Bad Packet Count registers and displays the test success or failure status.
- MSI-X write commands are triggered for every descriptor completion, and the testbench checks them for proper receipt.
- Next, set up the D2H (Device-to-Host) Queue.
- Set up three D2H descriptors in the host memory, with the destination address pointing to a new address space in the host memory that is pre-filled with all zeroes.
- At the last step of the Queue programming, the Multi Channel DMA tail pointer register is written, which triggers the Multi Channel DMA to start the D2H DMA transaction.
- The previous step instructs the H2D Data Mover to fetch the descriptors from the host memory to start the D2H DMA transaction.
- The Multi Channel DMA D2H Data Mover reads the incoming packet from the Packet Generator and writes the data to the host memory according to the descriptors fetched in the previous step.
- MSI-X write commands are triggered for every descriptor completion, and the testbench checks them for proper receipt.
- The testbench compares the data written back to host memory by the D2H transfer with the expected incrementing pattern and declares test success or failure.
The simulation reports "Simulation stopped due to successful completion" if no errors occur.
3.3.3. Run the Simulation Script
- Change to the testbench simulation directory, pcie_ed_tb/pcie_ed_tb/sim/<simulators> .
- Run the simulation script for the simulator of your choice. Refer to the table below.
- Analyze the results.
Simulator | Simulation Directory | Instructions |
---|---|---|
ModelSim | <example_design>/pcie_ed_tb/pcie_ed_tb/sim/mentor/ | |
VCS | <example_design>/pcie_ed_tb/pcie_ed_tb/sim/synopsys/vcs | |
NCSim | <example_design>/pcie_ed_tb/pcie_ed_tb/sim/cadence | |
3.3.4. View the Results
To view the simulation logs, simulation waveforms, and hardware test results for each design example, refer to the Design Example Detailed Description chapter of this document.
3.4. Compiling the Example Design in Intel Quartus Prime
- Navigate to the design example directory, intel_pcie_mcdma_0_example_design, and open the Intel® Quartus® Prime project file, pcie_ed.qpf, in Intel® Quartus® Prime Pro Edition software.
- On the Processing menu, select Start Compilation.

3.5. Running the Design Example Application
3.5.1. Program the FPGA
- Connect an FPGA programming cable to the Intel® Stratix® 10 FPGA Development Board.
- On the Tools menu, select Programmer.
- In the Programmer, click Hardware Setup and verify that the Intel® Stratix® 10 FPGA Development Board is detected on the Hardware Settings tab and the JTAG Settings tab.
- Select Auto Detect to detect the JTAG device chain.
- Select the target FPGA device in the JTAG chain, select Change File, and select pcie_ed.sof.
- Select Start to start programming.
Figure 40. Programming Stratix 10 MX FPGA Development Board
3.5.2. Set Up the Linux Software
3.5.2.1. Set the default huge pages
- Modify the default huge pages setting in the GRUB files as follows:
Edit the /etc/default/grub file. Add the following to the GRUB_CMDLINE_LINUX parameter:
"default_hugepagesz=1G hugepagesz=1G hugepages=40"
After the edit, the file looks similar to the following:
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="default_hugepagesz=1G intel_iommu=on iommu=pt intel_pstate=disable hugepagesz=1G hugepages=40 panic=1 crashkernel=auto rd.lvm.lv=centos/root rd.lvm.lv=centos/swap rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
- Generate the GRUB configuration files.
Check whether /sys/firmware/efi exists. If it exists, the system is EFI-based; otherwise, it is a legacy system.
On a legacy system, run the following command:
$ grub2-mkconfig -o /boot/grub2/grub.cfg
On an EFI-based system, run the following command:
$ grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
- Reboot the system.
- Verify the above changes.
$ cat /proc/cmdline
The output should include the following parameters:
default_hugepagesz=1G hugepagesz=1G hugepages=40
- Set the number of huge pages.
$ echo 10 > /proc/sys/vm/nr_hugepages
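As an optional programmatic check that is not part of the original procedure, the huge page pool can be read back from /proc/meminfo, whose HugePages_Total and Hugepagesize fields report the number and size of huge pages in the default pool. The C sketch below is illustrative; the function names meminfo_value and read_hugepages_total are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Find "key:" at the start of a line in /proc/meminfo-formatted text and
 * parse the number that follows it. Returns false if the key is absent. */
bool meminfo_value(const char *text, const char *key, long *out)
{
    assert(key != NULL && out != NULL);
    size_t klen = strlen(key);

    for (const char *line = text; line && *line; ) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            *out = strtol(line + klen + 1, NULL, 10); /* skips the padding */
            return true;
        }
        line = strchr(line, '\n'); /* advance to the next line, if any */
        if (line)
            line++;
    }
    return false;
}

/* Read /proc/meminfo and report the number of huge pages in the pool. */
bool read_hugepages_total(long *total)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return false;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_value(buf, "HugePages_Total", total);
}
```

After the echo command above, read_hugepages_total() would be expected to report 10 if the kernel was able to reserve all ten pages; a smaller value indicates that not enough contiguous memory was available.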
3.5.2.2. Install the Linux Kernel Driver
- Install the UIO driver.
$ modprobe uio
- Build the Multi Channel DMA kernel driver (run the commands from the design example directory).
$ cd software/kernel
$ make -C driver/kmod/
- Install the kernel driver.
$ insmod driver/kmod/ifc_uio.ko
- Verify that the kernel driver is loaded.
$ lspci -d 1172:000 -v | grep ifc_uio
The output should include the following line:
Kernel driver in use: ifc_uio
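The same check can also be made without lspci by reading the device's driver link in sysfs: on Linux, /sys/bus/pci/devices/<BDF>/driver is a symbolic link whose last path component is the name of the bound driver. The C sketch below assumes that layout; the function names path_basename and pci_driver_is are hypothetical, as is any PCI address passed to them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Return the final component of a path such as
 * "../../../bus/pci/drivers/ifc_uio". */
const char *path_basename(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* True if the PCI device at 'bdf' is bound to the driver named 'want'. */
bool pci_driver_is(const char *bdf, const char *want)
{
    char link_path[256], target[256];
    snprintf(link_path, sizeof link_path,
             "/sys/bus/pci/devices/%s/driver", bdf);

    ssize_t n = readlink(link_path, target, sizeof target - 1);
    if (n < 0)
        return false; /* no such device, or no driver bound */
    target[n] = '\0'; /* readlink does not NUL-terminate */
    return strcmp(path_basename(target), want) == 0;
}
```

For example, pci_driver_is("0000:01:00.0", "ifc_uio") would return true once insmod has bound ifc_uio to a device at that (hypothetical) address.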
3.5.2.3. Build and Install User Space Library
- Build the library (run the commands from the design example directory).
$ cd software/user
$ make -C libmqdma/
- Install the library.
For a 64-bit system:
$ rm -f /usr/lib64/libmqdmasoc.so
$ cp libmqdma/libmqdmasoc.so /usr/lib64/
For a 32-bit system:
$ rm -f /usr/lib/libmqdmasoc.so
$ cp libmqdma/libmqdmasoc.so /usr/lib/
- Verify that the ldconfig output contains libmqdmasoc.so.
$ ldconfig -v | grep libmqdmasoc.so
3.5.3. Run the Test Application Software
- Build the perfq_app test application software and check the available command-line options using -h.
$ cd cli/perfq_app/
$ make clean
$ make
$ ./perfq_app -h
Note: For more information on perfq_app command options, refer to the README file located in the software/user/cli/perfq_app directory.
- Perform the PIO test (-o) to check whether the hardware setup is correct. If the test succeeds, the application reports a Pass status.
- Perform the DMA test with the design example. For the test options and expected results, refer to the Hardware Test Results section for each design example in the Design Example Detailed Description chapter.
4. Revision History
Date | Intel® Quartus® Prime Version | IP Version | Changes |
---|---|---|---|
2020.08.05 | 20.2 | 20.0.0 | Initial Release |