Intel L- and H-tile Avalon Streaming and Single Root I/O Virtualization (SR-IOV) IP for PCI Express User Guide
Version Information
Updated for: Intel® Quartus® Prime Design Suite 20.3
1. Introduction
This User Guide is applicable to the H-Tile and L-Tile variants of the Intel® Stratix® 10 devices.
1.1. Avalon-ST Interface with Optional SR-IOV for PCIe Introduction
Intel® Stratix® 10 FPGAs include a configurable, hardened protocol stack for PCI Express* that is compliant with PCI Express Base Specification 3.0. The Intel L-/H-Tile Avalon-ST for PCI Express IP Core supports Gen1, Gen2 and Gen3 data rates and x1, x2, x4, x8, or x16 configurations.
The following table shows the theoretical aggregate bandwidth, in Gbps, for each supported lane rate and link width.

| Lane Rate | ×1 | ×2 | ×4 | ×8 | ×16 |
|---|---|---|---|---|---|
| PCI Express Gen1 (2.5 Gbps) | 2 | 4 | 8 | 16 | 32 |
| PCI Express Gen2 (5.0 Gbps) | 4 | 8 | 16 | 32 | 64 |
| PCI Express Gen3 (8.0 Gbps) | 7.87 | 15.75 | 31.5 | 63 | 126 |
1.2. Features
The Intel L-/H-Tile Avalon-ST for PCI Express IP Core supports the following features:
- Complete protocol stack including the Transaction, Data Link, and Physical Layers implemented as hard IP.
- ×1, ×2, ×4, ×8, and ×16 configurations with Gen1, Gen2, or Gen3 lane rates for Native Endpoints and Root Ports.
Note: Root Port mode is not available when SR-IOV is enabled.
- Avalon®-ST 256-bit interface to the Application Layer for all variants except Gen3 x16.
- Avalon®-ST 512-bit interface at 250 MHz to the Application Layer for Gen3 x16 variants.
- Instantiation as a stand-alone IP core from the Intel® Quartus® Prime Pro Edition IP Catalog or as part of a system design in Platform Designer.
- Dynamic design example generation.
- Configuration via Protocol (CvP) providing separate images for configuration of the periphery and core logic.
- PHY interface for PCI Express (PIPE) or serial interface simulation using IEEE encrypted models.
- Testbench bus functional model (BFM) supporting x1, x2, x4, and x8 configurations. The x16 configuration downtrains to x8 in the Intel-provided (internally created) testbench.
- Support for a Gen3x16 simulation model that you can use in an Avery testbench. The Avery testbench is capable of simulating all 16 lanes. For more information, refer to AN-811: Using the Avery BFM for PCI Express Gen3x16 Simulation on Intel Stratix 10 Devices.
- Native PHY Debug Master Endpoint (NPDME). For more information, refer to Intel Stratix 10 L- and H-Tile Transceiver PHY User Guide.
- Autonomous Hard IP mode, allowing the PCIe IP core to begin operation before the FPGA fabric is programmed. This mode is enabled by default and cannot be disabled.
Note: Unless Readiness Notifications mechanisms are used (see Section 6.23 of the PCIe Base Specification), the Root Complex and/or system software must allow at least 1.0 s after a Conventional Reset of a device before it may determine that a device that fails to return a Successful Completion status for a valid Configuration Request is broken. This period is independent of how quickly Link training completes.
- Dedicated 69.5 kilobyte (KB) receive buffer.
- End-to-end cyclic redundancy check (ECRC).
- Advanced Error Reporting (AER) for PFs.
Note: In Intel® Stratix® 10 devices, Advanced Error Reporting is always enabled in the PCIe Hard IP for both the L- and H-Tile transceivers.
- Base address register (BAR) checking logic.
New features in the Intel® Quartus® Prime Pro Edition 18.0 Software Release
- SR-IOV support for H-Tile devices.
- Separate Configuration Spaces for up to four PCIe Physical Functions (PFs) and a maximum of 2048 Virtual Functions (VFs) for the PFs in H-Tile devices.
- Address Translation Services (ATS) and TLP Processing Hints (TPH) capabilities.
- Control Shadow Interface to read the current settings for some of the VF Control Register fields in the PCI and PCI Express Configuration Spaces.
- Function Level Reset (FLR) for PFs and VFs.
- Message Signaled Interrupts (MSI) for PFs.
- MSI-X for PFs and VFs.
- A PCIe* Link Inspector, including the following features:
- Read and write access to the Configuration Space registers.
- LTSSM monitoring.
- Read and write access to PCS and PMA registers.
- Hardware support for dynamically-generated design examples.
- A Linux software driver to test the dynamically-generated design examples.
1.3. Release Information
| Item | Description |
|---|---|
| Version | Intel® Quartus® Prime Pro Edition 18.0 Software |
| Release Date | May 2018 |
| Ordering Codes | No ordering code is required |
Intel verifies that the current version of the Intel® Quartus® Prime Pro Edition software compiles the previous version of each IP core, if this IP core was included in the previous release. Intel reports any exceptions to this verification in the Intel IP Release Notes or clarifies them in the Intel® Quartus® Prime Pro Edition IP Update tool. Intel does not verify compilation with IP core versions older than the previous release.
1.4. Device Family Support
The following terms define device support levels for Intel® FPGA IP cores:
- Advance support—the IP core is available for simulation and compilation for this device family. Timing models include initial engineering estimates of delays based on early post-layout information. The timing models are subject to change as silicon testing improves the correlation between the actual silicon and the timing models. You can use this IP core for system architecture and resource utilization studies, simulation, pinout, system latency assessments, basic timing assessments (pipeline budgeting), and I/O transfer strategy (data-path width, burst depth, I/O standards tradeoffs).
- Preliminary support—the IP core is verified with preliminary timing models for this device family. The IP core meets all functional requirements, but might still be undergoing timing analysis for the device family. It can be used in production designs with caution.
- Final support—the IP core is verified with final timing models for this device family. The IP core meets all functional and timing requirements for the device family and can be used in production designs.
| Device Family | Support Level |
|---|---|
| Intel® Stratix® 10 | Preliminary support |
| Other device families | No support. Refer to the Intel PCI Express Solutions web page on the Intel website for support information on other device families. |
1.5. Recommended Fabric Speed Grades
| Lane Rate | Link Width | Interface Width | Application Clock Frequency (MHz) | Recommended Fabric Speed Grades |
|---|---|---|---|---|
| Gen1 | x1, x2, x4, x8, x16 | 256 bits | 125 | –1, –2, –3 |
| Gen2 | x1, x2, x4, x8 | 256 bits | 125 | –1, –2, –3 |
| Gen2 | x16 | 256 bits | 250 | –1, –2 |
| Gen3 | x1, x2, x4 | 256 bits | 125 | –1, –2, –3 |
| Gen3 | x8 | 256 bits | 250 | –1, –2 |
| Gen3 | x16 | 512 bits | 250 | –1, –2 |
1.6. Performance and Resource Utilization
The SR-IOV Bridge and Gen3 x16 adapter are implemented in soft logic, requiring FPGA fabric resources. Resource utilization numbers are not available in the current release.
1.7. Transceiver Tiles
| Tile | Device Type | Channel Capability (Chip-to-Chip) | Channel Capability (Backplane) | Hard IP Access |
|---|---|---|---|---|
| L-Tile | GX | 26 Gbps (NRZ) | 12.5 Gbps (NRZ) | PCIe Gen3x16 |
| H-Tile | GX | 28.3 Gbps (NRZ) | 28.3 Gbps (NRZ) | PCIe Gen3x16 |
| E-Tile | GXE | 30 Gbps (NRZ), 56 Gbps (PAM-4) | 30 Gbps (NRZ), 56 Gbps (PAM-4) | 100G Ethernet |
L-Tile and H-Tile
Both L- and H-tile transceiver tiles contain four transceiver banks with a total of 24 duplex channels, eight ATX PLLs, eight fPLLs, eight CMU PLLs, a PCIe Hard IP block, and the associated input reference and transmitter clock networks. L- and H-tiles also include a 10GBASE-KR/40GBASE-KR4 FEC block in each channel.
L-Tiles have transceiver channels that support up to 26 Gbps chip-to-chip or 12.5 Gbps backplane applications. H-Tiles have transceiver channels that support up to 28 Gbps applications. H-Tile channels support fast lock time for Gigabit-capable passive optical network (GPON) applications.
Intel® Stratix® 10 GX/SX devices incorporate L-Tiles or H-Tiles. Package migration is available with Intel® Stratix® 10 GX/SX from L-Tile to H-Tile variants.
E-Tile
E-Tiles are designed to support 56 Gbps with PAM-4 signaling or up to 30 Gbps backplane with NRZ signaling. E-Tiles do not include any PCIe* Hard IP blocks.
1.8. PCI Express IP Core Package Layout
Intel® Stratix® 10 devices have high-speed transceivers implemented on separate transceiver tiles. The transceiver tiles are on the left and right sides of the device.
Each 24-channel transceiver L- or H- tile includes one x16 PCIe IP Core implemented in hardened logic. The following figures show the layout of PCIe IP cores in Intel® Stratix® 10 devices. Both L- and H-tiles are orange. E-tiles are green.
- The Intel® Stratix® 10 migration device contains 2 L-Tiles, which match the Intel® Arria® 10 migration device.
- Intel® Stratix® 10 TX devices use a combination of E-Tiles and H-Tiles. Depending on the device:
  - Five E-Tiles support 57.8G PAM-4 and 28.9G NRZ backplanes, and one H-Tile supports up to 28.3G backplanes and PCIe* up to Gen3 x16.
  - Three E-Tiles support 57.8G PAM-4 and 28.9G NRZ backplanes, and one H-Tile supports up to 28.3G backplanes and PCIe* up to Gen3 x16.
  - One E-Tile supports 57.8G PAM-4 and 28.9G NRZ backplanes, and two H-Tiles support up to 28.3G backplanes and PCIe* up to Gen3 x16.
1.9. Channel Availability
PCIe Hard IP Channel Restrictions
Each L- or H-Tile transceiver tile contains one PCIe Hard IP block. The following table and figure show the possible PCIe Hard IP channel configurations, the number of unusable channels, and the number of channels available for other protocols. For example, a PCIe x4 variant uses 4 channels and 4 additional channels are unusable.
| PCIe Hard IP Configuration | Number of Unusable Channels | Channels Usable by Other Protocols |
|---|---|---|
| PCIe x1 | 7 | 16 |
| PCIe x2 | 6 | 16 |
| PCIe x4 | 4 | 16 |
| PCIe x8 | 0 | 16 |
| PCIe x16 | 0 | 8 |
The table below maps all transceiver channels to PCIe Hard IP channels in available tiles.
Tile Channel Sequence | PCIe Hard IP Channel | Index within I/O Bank | Bottom Left Tile Bank Number | Top Left Tile Bank Number | Bottom Right Tile Bank Number | Top Right Tile Bank Number |
---|---|---|---|---|---|---|
23 | N/A | 5 | 1F | 1N | 4F | 4N |
22 | N/A | 4 | 1F | 1N | 4F | 4N |
21 | N/A | 3 | 1F | 1N | 4F | 4N |
20 | N/A | 2 | 1F | 1N | 4F | 4N |
19 | N/A | 1 | 1F | 1N | 4F | 4N |
18 | N/A | 0 | 1F | 1N | 4F | 4N |
17 | N/A | 5 | 1E | 1M | 4E | 4M |
16 | N/A | 4 | 1E | 1M | 4E | 4M |
15 | 15 | 3 | 1E | 1M | 4E | 4M |
14 | 14 | 2 | 1E | 1M | 4E | 4M |
13 | 13 | 1 | 1E | 1M | 4E | 4M |
12 | 12 | 0 | 1E | 1M | 4E | 4M |
11 | 11 | 5 | 1D | 1L | 4D | 4L |
10 | 10 | 4 | 1D | 1L | 4D | 4L |
9 | 9 | 3 | 1D | 1L | 4D | 4L |
8 | 8 | 2 | 1D | 1L | 4D | 4L |
7 | 7 | 1 | 1D | 1L | 4D | 4L |
6 | 6 | 0 | 1D | 1L | 4D | 4L |
5 | 5 | 5 | 1C | 1K | 4C | 4K |
4 | 4 | 4 | 1C | 1K | 4C | 4K |
3 | 3 | 3 | 1C | 1K | 4C | 4K |
2 | 2 | 2 | 1C | 1K | 4C | 4K |
1 | 1 | 1 | 1C | 1K | 4C | 4K |
0 | 0 | 0 | 1C | 1K | 4C | 4K |
PCIe Soft IP Channel Usage
PCI Express soft IP PIPE-PHY cores available from third-party vendors are not subject to the channel usage restrictions described above. Refer to Intel FPGA > Products > Intellectual Property for more information about soft IP cores for PCI Express.
2. Quick Start Guide
Using the Intel® Quartus® Prime software, you can generate a programmed I/O (PIO) design example for the Intel L-/H-Tile Avalon-ST for PCI Express IP core. The generated design example reflects the parameters that you specify. The PIO example transfers data from a host processor to a target device; it is appropriate for low-bandwidth applications. This design example automatically creates the files necessary to simulate and compile in the Intel® Quartus® Prime software. You can download the compiled design to the Intel® Stratix® 10 GX FPGA Development Board. To download to custom hardware, update the Intel® Quartus® Prime Settings File (.qsf) with the correct pin assignments.
2.1. Design Components
2.2. Directory Structure
2.3. Generating the Design Example
- In the Intel® Quartus® Prime Pro Edition software, create a new project (File > New Project Wizard).
- Specify the Directory, Name, and Top-Level Entity.
- For Project Type, accept the default value, Empty project. Click Next.
- For Add Files click Next.
- For Family, Device & Board Settings under Family, select Intel® Stratix® 10 (GX/SX/MX/TX) and the Target Device for your design.
- Click Finish.
- In the IP Catalog, locate and add the Intel L-/H-Tile Avalon-ST for PCI Express IP.
- In the New IP Variant dialog box, specify a name for your IP. Click Create.
- On the IP Settings tabs, specify the parameters for your IP variation.
- On the Example Designs tab, make the following selections:
- For Available Example Designs, select PIO. This example design is for Endpoints only. No Root Port example design is available for the Intel® Stratix® 10 Avalon® Streaming (Avalon-ST) IP for PCIe in the current Intel® Quartus® Prime release.
- For Example Design Files, turn on the Simulation and Synthesis options. If you do not need these simulation or synthesis files, leaving the corresponding option(s) turned off significantly reduces the example design generation time.
- If you have selected a x16 configuration, for Select simulation Root Complex BFM, choose the appropriate BFM:
- Intel FPGA BFM: for all configurations up to Gen3 x8. This bus functional model (BFM) supports x16 configurations by downtraining to x8.
- Third-party BFM: for x16 configurations if you want to simulate all 16 lanes using a third-party BFM. Refer to AN-811: Using the Avery BFM for PCI Express Gen3x16 Simulation on Intel Stratix 10 Devices for information about simulating with the Avery BFM.
- For Generated HDL Format, only Verilog is available in the current release.
- For Target Development Kit, select the appropriate option.
Note: If you select None, the generated design example targets the device you specified in Step 5 above. If you intend to test the design in hardware, make the appropriate pin assignments in the .qsf file. You can also use the Pin Planner tool to make pin assignments.
- Click Finish. You may save your .ip file when prompted, but it is not required to be able to use the example design.
- The prompt, Recent changes have not been generated. Generate now?, allows you to create files for simulation and synthesis of the IP core variation that you specified in Step 9 above. Click No if you only want to work with the design example you have generated.
- Close the dummy project.
- Open the example design project.
- Compile the example design project to generate the .sof file for the complete example design. This file is what you download to a board to perform hardware verification.
- Close your example design project.
2.4. Simulating the Design Example
- Change to the testbench simulation directory, pcie_example_design_tb.
- Run the simulation script for the simulator of your choice. Refer to the table below.
- Analyze the results.
| Simulator | Working Directory | Instructions |
|---|---|---|
| ModelSim* | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/mentor/ | Run the msim_setup.tcl script in this directory. |
| VCS* | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/synopsys/vcs | Run the vcs_setup.sh script in this directory. |
| NCSim* | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/cadence | Run the ncsim_setup.sh script in this directory. |
| Xcelium* Parallel Simulator | <example_design>/pcie_example_design_tb/pcie_example_design_tb/sim/xcelium | Run the xcelium_setup.sh script in this directory. |
This testbench simulates up to x8 variants. It supports x16 variants by down-training to x8. To simulate all lanes of a x16 variant, you can create a simulation model using the Platform Designer to use in an Avery testbench. For more information refer to AN-811: Using the Avery BFM for PCI Express* Gen3x16 Simulation on Intel Stratix 10 Devices.
The simulation reports, "Simulation stopped due to successful completion" if no errors occur.
2.5. Compiling the Design Example and Programming the Device
- Navigate to <project_dir>/pcie_s10_hip_avmm_bridge_0_example_design/ and open pcie_example_design.qpf.
- On the Processing menu, select Start Compilation.
- After successfully compiling your design, program the targeted device with the Programmer.
2.6. Installing the Linux Kernel Driver
The design example includes a Linux kernel driver that you can use to run the following tests:
- A PCIe* link test that performs 100 writes and reads
- Memory space DWORD reads and writes
- Configuration Space DWORD reads and writes
In addition, you can use the driver to change the value of the following parameters:
- The BAR being used
- The selected device, specified by its bus, device, and function (BDF) numbers
The driver also allows you to enable SR-IOV for H-Tile devices.
Complete the following steps to install the kernel driver:
- Navigate to ./software/kernel/linux under the example design generation directory.
- Change the permissions on the install, load, and unload files:
  $ chmod 777 install load unload
- Install the driver:
  $ sudo ./install
- Verify the driver installation:
  $ lsmod | grep intel_fpga_pcie_drv
  Expected result:
  intel_fpga_pcie_drv 17792 0
- Verify that Linux recognizes the PCIe* design example:
  $ lspci -d 1172:000 -v | grep intel_fpga_pcie_drv
  Note: If you have changed the Vendor ID, substitute the new Vendor ID for the Intel® Vendor ID in this command.
  Expected result:
  Kernel driver in use: intel_fpga_pcie_drv
2.7. Running the Design Example Application
- Navigate to ./software/user/example under the design example directory.
- Compile the design example application:
  $ make
- Run the test:
  $ sudo ./intel_fpga_pcie_link_test
You can run the Intel® FPGA IP PCIe* link test in manual or automatic mode.
- In automatic mode, the application automatically selects the device. The test selects the Intel® Stratix® 10 PCIe* device with the lowest BDF by matching the Vendor ID. The test also selects the lowest available BAR.
- In manual mode, the test queries you for the bus, device, and function number and BAR.
For the Intel® Stratix® 10 GX Development Kit, you can determine the BDF by typing the following command:
  $ lspci -d 1172
- Here are sample transcripts for automatic and manual modes:

    Intel FPGA PCIe Link Test - Automatic Mode
    Version 2.0
    0: Automatically select a device
    1: Manually select a device
    ***************************************************
    > 0
    Opened a handle to BAR 0 of a device with BDF 0x100
    ***************************************************
    0: Link test - 100 writes and reads
    1: Write memory space
    2: Read memory space
    3: Write configuration space
    4: Read configuration space
    5: Change BAR
    6: Change device
    7: Enable SR-IOV
    8: Do a link test for every enabled virtual function belonging to the current device
    9: Perform DMA
    10: Quit program
    ***************************************************
    > 0
    Doing 100 writes and 100 reads . .
    Number of write errors:      0
    Number of read errors:       0
    Number of DWORD mismatches:  0

    Intel FPGA PCIe Link Test - Manual Mode
    Version 1.0
    0: Automatically select a device
    1: Manually select a device
    ***************************************************
    > 1
    Enter bus number:
    > 1
    Enter device number:
    > 0
    Enter function number:
    > 0
    BDF is 0x100
    Enter BAR number (-1 for none):
    > 4
    Opened a handle to BAR 4 of a device with BDF 0x100
3. Interface Overview
The PCI Express Base Specification 3.0 defines a packet interface for communication between a Root Port and an Endpoint. When you select the Avalon®-ST interface, Transaction Layer Packets (TLPs) transfer data between the Root Port and an Endpoint using the Avalon-ST TX and RX interfaces. The interfaces are named from the point of view of the user logic.
The following figures show the PCIe hard IP Core top-level interfaces and the connections to the Application Layer and system.
The following sections introduce these interfaces. Refer to the Interfaces section in the Block Description chapter for detailed descriptions and timing diagrams.
3.1. Avalon-ST RX Interface
The Transaction Layer transfers TLPs to the Application on this interface. The Application must assert rx_st_ready before transfers can begin.
This interface is not strictly Avalon®-ST compliant and does not have a well-defined readyLatency. For all variants other than Gen3 x16, once rx_st_ready deasserts, rx_st_valid deasserts within 17 cycles, and once rx_st_ready reasserts, rx_st_valid resumes data transfer within 17 cycles. For the Gen3 x16 variant, the corresponding latency is 18 cycles in each direction. To achieve the best performance, the Application must include a receive buffer large enough to avoid the deassertion of rx_st_ready. Refer to Avalon-ST RX Interface for more information.
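Because rx_st_valid can remain asserted for up to 17 cycles (18 for Gen3 x16) after rx_st_ready deasserts, an Application-side receive buffer needs at least that much headroom. The following Verilog sketch illustrates the sizing rule; it is a minimal illustration under that assumption, not part of the IP deliverables, and every name other than the rx_st_* ports is hypothetical.

```verilog
module rx_headroom_fifo #(
    parameter DEPTH   = 64,   // total FIFO entries (illustrative choice)
    parameter LATENCY = 17    // worst-case ready-to-valid latency from the text
) (
    input  wire         clk,          // coreclkout_hip
    input  wire         rst,          // synchronous, active high (illustrative)
    input  wire [255:0] rx_st_data,
    input  wire         rx_st_valid,
    output wire         rx_st_ready,
    input  wire         app_rd_en,    // Application pops one entry per cycle
    output reg  [255:0] app_rd_data
);
    reg [255:0]             mem [0:DEPTH-1];
    reg [$clog2(DEPTH)-1:0] wr_ptr, rd_ptr;
    reg [$clog2(DEPTH):0]   fill;

    wire do_wr = rx_st_valid;               // beats must be accepted whenever valid
    wire do_rd = app_rd_en && (fill != 0);

    always @(posedge clk) begin
        if (rst) begin
            wr_ptr <= 0; rd_ptr <= 0; fill <= 0;
        end else begin
            if (do_wr) begin mem[wr_ptr] <= rx_st_data; wr_ptr <= wr_ptr + 1'b1; end
            if (do_rd) begin app_rd_data <= mem[rd_ptr]; rd_ptr <= rd_ptr + 1'b1; end
            fill <= fill + do_wr - do_rd;
        end
    end

    // Throttle early: after rx_st_ready deasserts, up to LATENCY more beats
    // may still arrive, so keep at least LATENCY entries free at all times.
    assign rx_st_ready = ((DEPTH - fill) > LATENCY);
endmodule
```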
3.2. Avalon-ST TX Interface
The Application transmits TLPs to the Transaction Layer of the IP core on this interface. The Transaction Layer must assert tx_st_ready before transmission begins. Transmission of a packet must be uninterrupted when tx_st_ready is asserted. The readyLatency of this interface is three coreclkout_hip cycles. For more detailed information about the Avalon-ST interface, refer to Avalon-ST TX Interface. The packet layout is shown in detail in the Block Description chapter.
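Because the readyLatency is three cycles, a beat driven on the tx_st_* signals in cycle N is accepted only if tx_st_ready was asserted in cycle N-3. A minimal sketch of the Application-side gating, assuming standard Avalon-ST readyLatency semantics (signal names other than the tx_st_* ports are illustrative):

```verilog
module tx_ready_gate (
    input  wire coreclkout_hip,
    input  wire tx_st_ready,
    input  wire tlp_pending,    // Application has a TLP beat to send (illustrative)
    output wire tx_st_valid
);
    // Track tx_st_ready through a 3-stage pipe; ready_pipe[2] in cycle N
    // holds the value tx_st_ready had in cycle N-3.
    reg [2:0] ready_pipe;
    always @(posedge coreclkout_hip)
        ready_pipe <= {ready_pipe[1:0], tx_st_ready};

    // Start a TLP only in a cycle whose slot is known to be accepted; once a
    // packet starts, it must then be driven without interruption.
    assign tx_st_valid = tlp_pending && ready_pipe[2];
endmodule
```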
3.3. TX Credit Interface
The Transaction Layer TX interface transmits TLPs in the same order as they were received from the Application. To optimize performance, the Application can perform credit-based checking before submitting requests for transmission, allowing the Application to reorder packets to improve throughput. Application reordering is optional. The Transaction Layer always performs a credit check before transmitting any TLP.
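As an illustration of optional Application-side credit checking, the fragment below prefers a posted TLP when the non-posted queue lacks credits, keeping the link busy. The pending and credit-availability flags are hypothetical Application-side bookkeeping, not ports of this IP:

```verilog
module tx_credit_arb (
    input  wire np_pending,    // non-posted TLP waiting at head of queue (illustrative)
    input  wire np_credit_ok,  // Application-tracked non-posted credit availability
    input  wire p_pending,     // posted TLP waiting (illustrative)
    output wire grant_np,
    output wire grant_p
);
    assign grant_np = np_pending && np_credit_ok;
    assign grant_p  = p_pending && !grant_np;   // bypass a credit-stalled NP queue
endmodule
```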
3.4. TX and RX Serial Data
This differential, serial interface is the physical link between a Root Port and an Endpoint. The PCIe IP Core supports 1, 2, 4, 8, or 16 lanes. Each lane includes a TX and RX differential pair. Data is striped across all available lanes.
3.5. Clocks
Data Rate | Interface Width | coreclkout_hip Frequency |
---|---|---|
Gen1 x1, x2, x4, x8, and x16 | 256 bits | 125 MHz |
Gen2 x1, x2, x4, and x8 | 256 bits | 125 MHz |
Gen2 x16 | 256 bits | 250 MHz |
Gen3 x1, x2, and x4 | 256 bits | 125 MHz |
Gen3 x8 | 256 bits | 250 MHz |
Gen3 x16 | 512 bits | 250 MHz |
3.6. Function-Level Reset (FLR) Interface
3.7. Control Shadow Interface for SR-IOV
Use this interface for the following purposes:
- To monitor specific VF registers, using the ctl_shdw_update output and the associated output signals.
- To monitor all VF registers, using the ctl_shdw_req_all input to request a full scan of the register fields for all active VFs.
3.8. Configuration Extension Bus Interface
3.9. Hard IP Reconfiguration Interface
Do not reset the PCI Express link after changing the values of the Hard IP read-only configuration registers, because a reset restores these registers to their original values.
The Hard IP Reconfiguration interface is not accessible while the IP is in reset (that is, while reset_status = 1).
If the PCIe Link Inspector is enabled, accesses via the Hard IP Reconfiguration interface are not supported. The Link Inspector exclusively uses the Hard IP Reconfiguration interface, and there is no arbitration between the Link Inspector and the Hard IP Reconfiguration interface that is exported to the top level of the IP.
3.10. Interrupt Interfaces
The PCIe IP core supports Message Signaled Interrupts (MSI), MSI-X interrupts, and legacy interrupts. MSI and legacy interrupts are mutually exclusive.
MSI uses single-DWORD memory write TLPs to implement interrupts. This interrupt mechanism conserves pins because it does not use separate wires for interrupts. In addition, the single DWORD provides flexibility for the data presented in the interrupt message. The MSI Capability structure is stored in the Configuration Space and is programmed using Configuration Space accesses.
The Application generates MSI-X messages, which are single-DWORD memory writes. The MSI-X Capability structure points to an MSI-X Table structure and an MSI-X Pending Bit Array (PBA) structure, both stored in memory. This scheme differs from the MSI Capability structure, which contains all of the control and status information for the interrupts.
Enable legacy interrupts by programming the Interrupt Disable bit (bit [10]) of the Command register in the Configuration Space to 1'b0. When legacy interrupts are enabled, the IP core emulates INTx interrupts using virtual wires. The app_int_sts port controls legacy interrupt generation.
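As a sketch of the last point, the module below drives the app_int_sts port from illustrative Application-side set and clear events. Only app_int_sts is an IP port name; the module, clock, and event names are hypothetical:

```verilog
module intx_ctrl (
    input  wire clk,            // Application clock domain (illustrative)
    input  wire rst,            // synchronous, active high (illustrative)
    input  wire irq_event_set,  // condition that raises the interrupt
    input  wire irq_event_clr,  // condition serviced / interrupt cleared
    output reg  app_int_sts     // drives virtual INTx generation in the IP
);
    always @(posedge clk) begin
        if (rst)                app_int_sts <= 1'b0;
        else if (irq_event_set) app_int_sts <= 1'b1;  // assert virtual INTx
        else if (irq_event_clr) app_int_sts <= 1'b0;  // deassert virtual INTx
    end
endmodule
```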
3.11. Power Management Interface
3.12. Reset
This interface indicates when the clocks are stable and FPGA configuration is complete.
The PCIe IP core receives the following inputs that can be used for reset purposes:
- pin_perst is the active-low reset driven from the PCIe motherboard. Logic on the motherboard autonomously generates this fundamental reset signal.
- npor is an active low reset signal. The Application drives this reset signal.
- ninit_done is an active low input signal. A "1" indicates that the FPGA device is not yet fully configured. A "0" indicates the device has been configured and is in normal operating mode. To use the ninit_done input, instantiate the Reset Release Intel FPGA IP in your design and use its ninit_done output to drive the input of the Avalon® streaming IP for PCIe. For more details on how to use this input, refer to https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an891.pdf.
The PCIe IP core reset logic requires a free-running clock input. This free-running clock becomes stable after the secure device manager (SDM) block asserts iocsrrdy_dly indicating that the I/O Control and Status registers programming is complete.
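A minimal top-level sketch of the ninit_done hookup described above, assuming you generated the Reset Release Intel FPGA IP under the component name reset_release and your PCIe IP variation under the name pcie_example (both names are whatever you chose at generation time; remaining connections are elided):

```verilog
module top (
    input wire pin_perst
    /* ... other device pins ... */
);
    wire ninit_done;

    // Reset Release Intel FPGA IP (see AN 891): the output is 1 until the
    // device is fully configured, then 0.
    reset_release reset_release_inst (
        .ninit_done (ninit_done)
    );

    pcie_example pcie_inst (
        .ninit_done (ninit_done),  // hold the PCIe IP in reset until configuration completes
        .npor       (1'b1),        // tie high when the Application does not drive it
        .pin_perst  (pin_perst)    // fundamental reset from the motherboard
        /* ... remaining port connections ... */
    );
endmodule
```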
3.13. Transaction Layer Configuration Interface
3.14. PLL Reconfiguration Interface
This interface is available when you turn on Enable Transceiver dynamic reconfiguration on the Configuration, Debug and Extension Options tab using the parameter editor.
To ensure proper system operation, reset or repeat device enumeration of the PCIe* link after changing the value of read-only PLL registers.
3.15. PIPE Interface (Simulation Only)
4. Parameters
| Parameter | Value | Description |
|---|---|---|
| Design Environment | Standalone, System | Identifies the environment in which the IP operates. |
| Parameter | Value | Description |
|---|---|---|
| Application Interface Type | Avalon-ST | Selects the interface to the Application Layer. |
| Hard IP Mode | Gen3x16 (512-bit interface, 250 MHz); Gen3x8 (256-bit, 250 MHz); Gen3x4/x2/x1 (256-bit, 125 MHz); Gen2x16 (256-bit, 250 MHz); Gen2x8/x4/x2/x1 (256-bit, 125 MHz); Gen1x16/x8/x4/x2/x1 (256-bit, 125 MHz) | Selects the lane rate, link width, and the width of the data interface between the Hard IP Transaction Layer and the Application Layer implemented in the FPGA fabric. Note: If the selected mode is not available for the chosen configuration, an error message displays in the Message pane. |
| Port type | Native Endpoint, Root Port | Specifies the port type. An Endpoint stores parameters in the Type 0 Configuration Space. A Root Port stores parameters in the Type 1 Configuration Space. |
4.1. Stratix 10 Avalon-ST Settings
| Parameter | Value | Description |
|---|---|---|
| Enable Avalon-ST reset output port | On/Off | When On, the generated reset output port clr_st has the same functionality as the hip_ready_n port included in the Hard IP Reset interface. This option is available for backwards compatibility with Intel® Arria® 10 devices. |
| Enable byte parity ports on Avalon-ST interface | On/Off | When On, the RX and TX datapaths are parity protected. Parity is even. The Application Layer must provide valid byte parity in the Avalon-ST TX direction. This parameter is only available for the Intel L-/H-Tile Avalon-ST for PCI Express IP. |
4.2. Multifunction and SR-IOV System Settings
| Parameter | Value | Description |
|---|---|---|
| Total Physical Functions (PFs) | 1–4 | Supports 1–4 PFs in H-Tile devices. |
| Enable SR-IOV Support | On/Off | When On, the variant supports multiple VFs. When Off, it supports PFs only. SR-IOV is only available in H-Tile devices. |
| Total Virtual Functions Assigned to Physical Functions | 1–2048 | Total number of VFs assigned to a PF. The sum of the VFs assigned to PF0, PF1, PF2, and PF3 cannot exceed 2048. |
| System Supported Page Size | 4 KB–4 MB | Specifies the supported page sizes. Sets the Supported Page Sizes register of the SR-IOV Capability structure. Intel recommends that you accept the default value. |
4.3. Base Address Registers
| Parameter | Value | Description |
|---|---|---|
| Type | Disabled; 64-bit prefetchable memory; 32-bit non-prefetchable memory | If you select 64-bit prefetchable memory, 2 contiguous BARs are combined to form a 64-bit prefetchable BAR; you must set the higher-numbered BAR to Disabled. A non-prefetchable 64-bit BAR is not supported because in a typical system, the maximum non-prefetchable memory window is 32 bits. Defining memory as prefetchable allows contiguous data to be fetched ahead. Prefetching memory is advantageous when the requestor may require more data from the same region than was originally requested. If you specify that a memory is prefetchable, it must have the following 2 attributes: reads do not have side effects, and write transactions can be merged. Note: BAR0 is not available if the internal descriptor controller is enabled. |
| Size | 256 Bytes–8 EBytes | Specifies the size of the address space accessible to the BAR. |
| Expansion ROM | Disabled; 4 KBytes–16 MBytes | Specifies the size of the option ROM. |
4.4. Device Identification Registers
The following table lists the default values of the read-only registers in the PCI* Configuration Header Space. You can use the parameter editor to set the values of these registers. At run time, you can change the values of these registers using the optional Hard IP Reconfiguration block signals.
To access these registers using the Hard IP Reconfiguration interface, make sure that you follow the format of the hip_reconfig_address[20:0] as specified in the table Hard IP Reconfiguration Signals of the section Hard IP Reconfiguration. Use the address offsets specified in the table below for hip_reconfig_address[11:0] and set hip_reconfig_address[20] to 1'b1 for a PCIe space access.
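For illustration, here is a sketch of forming such an address for a PCIe-space access to offset 0x000, the Vendor ID/Device ID offset listed in the table below. Only the facts stated above are used (bit [20] = 1'b1 selects the PCIe space, bits [11:0] carry the offset); bits [19:12] are not described here, so leaving them zero is an assumption, and the module and signal names are hypothetical:

```verilog
module hip_addr_example (
    output wire [20:0] hip_reconfig_address
);
    localparam [11:0] OFFSET = 12'h000;  // Vendor ID / Device ID offset (table below)
    // Bit [20] = 1'b1 selects the PCIe space; bits [19:12] left 0 (assumption).
    assign hip_reconfig_address = {1'b1, 8'h00, OFFSET};
endmodule
```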
You can specify Device ID registers for each Physical Function.
| Register Name | Default Value | Description |
|---|---|---|
| Vendor ID | 0x00001172 | Sets the read-only value of the Vendor ID register. This parameter cannot be set to 0xFFFF, per the PCI Express Base Specification. Address offset: 0x000. |
| Device ID | 0x00000000 | Sets the read-only value of the Device ID register. Address offset: 0x000. |
| VF Device ID | 0x00000000 | Sets the read-only value of the VF Device ID register. |
| Revision ID | 0x00000001 | Sets the read-only value of the Revision ID register. Address offset: 0x008. |
| Class code | 0x00000000 | Sets the read-only value of the Class Code register. You must set this register to a non-zero value to ensure correct operation. Address offset: 0x008. |
| Subsystem Vendor ID | 0x00000000 | Sets the read-only value of the Subsystem Vendor ID register in the PCI Type 0 Configuration Space. This parameter cannot be set to 0xFFFF, per the PCI Express Base Specification. This value is assigned by PCI-SIG to the device manufacturer. This value is only used in Root Port variants. Address offset: 0x02C. |
| Subsystem Device ID | 0x00000000 | Sets the read-only value of the Subsystem Device ID register in the PCI Type 0 Configuration Space. This value is only used in Root Port variants. Address offset: 0x02C. |
4.5. TPH/ATS Capabilities
TLP Processing Hints (TPH) Overview
TPH allows PFs to target a TLP towards a specific processing resource, such as a host processor or cache hierarchy. Steering Tags (STs) provide design-specific information about the host or cache structure.
Software programs the Steering Tag values, which are stored in an ST table. You can store the ST table in the MSI-X Table or in a custom location. For more information about Steering Tags, refer to Section 6.17.2, Steering Tags, of the PCI Express Base Specification, Rev. 3.0. After analyzing the traffic in your system, you may be able to use TPH hints to improve latency or reduce traffic congestion.
Address Translation Services (ATS) Overview
ATS extends the PCIe protocol to support an address translation agent (TA) that translates DMA addresses to cached addresses in the device. The translation agent can be located in or above the Root Port. Locating translated addresses in the device minimizes latency and provides a scalable, distributed caching system that improves I/O performance. The Address Translation Cache (ATC) located in the device reduces the processing load on the translation agent, enhancing system performance. For more information about ATS, refer to Address Translation Services Revision 1.1.
| Parameter | Value | Description |
|---|---|---|
| Enable Address Translation Services | On/Off | When On, the PF supports ATS. |
| Enable TLP Processing Hints (TPH) | On/Off | When On, the PF supports TPH. |
| Interrupt Mode | On/Off | When On, an MSI-X interrupt vector number selects the steering tag. |
| Device Specific Mode | On/Off | When On, the TPH Requestor Capability structure stores the steering tag table. |
| Steering Tag Table Location | ST table not present, MSI-X table | When set to MSI-X table, the MSI-X table stores the steering tag table. |
| Steering Tag Table size | 0–2047 | Specifies the number of 2-byte steering table entries. |
| Parameter | Value | Description |
|---|---|---|
| Enable Address Translation Services | On/Off | When On, the PF supports ATS. |
| Enable TLP Processing Hints (TPH) | On/Off | When On, the PF supports TPH. |
| Interrupt Mode | On/Off | When On, an MSI-X interrupt vector number selects the steering tag. |
| Device Specific Mode | On/Off | When On, the TPH Requestor Capability structure stores the steering tag table. |
| Steering Tag Table Location | ST table not present, MSI-X table | Selects the location of the steering tag table when Device Specific Mode is On. |
| Steering Tag Table size | 0–2047 | Specifies the number of 2-byte steering table entries. |
4.6. PCI Express and PCI Capabilities Parameters
4.6.1. Device Capabilities
| Parameter | Possible Values | Default Value | Address | Description |
|---|---|---|---|---|
| Maximum payload sizes supported | 128 bytes, 256 bytes, 512 bytes, 1024 bytes | 512 bytes | 0x074 | Specifies the maximum payload size supported. This parameter sets the read-only value of the max payload size supported field of the Device Capabilities register. |
| PF0 Support extended tag field | On, Off | Off | | When you turn this option On, the core supports 256 tags, improving the performance of high-latency systems. Turning this option On sets the Extended Tag bit in the Configuration Space Device Capabilities register. The IP core tracks tags for Non-Posted Requests. The tracking clears when the IP core receives the last Completion TLP for a MemRd. |
4.6.2. Link Capabilities
| Parameter | Value | Description |
|---|---|---|
| Link port number (Root Port only) | 0x01 | Sets the read-only value of the port number field in the Link Capabilities register. This parameter is for Root Ports only. It should not be changed. |
| Slot clock configuration | On/Off | When On, indicates that the Endpoint uses the same physical reference clock that the system provides on the connector. When Off, the IP core uses an independent clock regardless of the presence of a reference clock on the connector. This parameter sets the Slot Clock Configuration bit (bit 12) in the PCI Express Link Status register. |
4.6.3. MSI and MSI-X Capabilities
| Parameter | Value | Address | Description |
|---|---|---|---|
| MSI messages requested | 1, 2, 4, 8, 16, 32 | 0x050[31:16] | Specifies the number of messages the Application Layer can request. Sets the value of the Multiple Message Capable field of the Message Control register. Only PFs support MSI. When you enable SR-IOV, PFs must use MSI-X. |
| Implement MSI-X | On/Off | | When On, adds the MSI-X Capability structure, with the parameters shown below. When you enable SR-IOV, you must enable MSI-X. |

The following MSI-X Capability parameters specify bit ranges rather than values:

| Parameter | Bit Range | Address | Description |
|---|---|---|---|
| Table size | [10:0] | 0x068[26:16] | System software reads this field to determine the MSI-X Table size <n>, which is encoded as <n–1>. For example, a returned value of 2047 indicates a table size of 2048. This field is read-only in the MSI-X Capability Structure. The legal range is 0–2047 (2¹¹). VFs share a common Table size. The VF Table BIR/Offset and PBA BIR/Offset are fixed at compile time. BAR4 accesses these tables. The Table Offset field is 0x600. The PBA Offset field is 0x400 for SR-IOV. You must implement an MSI-X table. If you do not intend to use MSI-X, you may program the table size to 1. |
| Table offset | [31:0] | | Points to the base of the MSI-X Table. The lower 3 bits of the Table BAR indicator (BIR) are set to zero by software to form a 64-bit qword-aligned offset. This field is read-only. |
| Table BAR indicator | [2:0] | | Specifies which one of a function's BARs, located beginning at 0x10 in Configuration Space, is used to map the MSI-X Table into memory space. This field is read-only. The legal range is 0–5. |
| Pending bit array (PBA) offset | [31:0] | | Used as an offset from the address contained in one of the function's Base Address registers to point to the base of the MSI-X PBA. The lower 3 bits of the PBA BIR are set to zero by software to form a 32-bit qword-aligned offset. This field is read-only in the MSI-X Capability Structure. |
| Pending BAR indicator | [2:0] | | Specifies which one of a function's Base Address registers, located beginning at 0x10 in Configuration Space, maps the MSI-X PBA into memory space. This field is read-only in the MSI-X Capability Structure. The legal range is 0–5. |
4.6.4. Slot Capabilities
| Parameter | Value | Description |
|---|---|---|
| Use Slot register | On/Off | This parameter is only supported in Root Port mode. The slot capability is required for Root Ports if a slot is implemented on the port. Slot status is recorded in the PCI Express Capabilities register. Defines the characteristics of the slot. You turn on this option by selecting Enable slot capability. Refer to the figure below for bit definitions. |
| Slot power scale | 0–3 | Specifies the scale used for the Slot power limit. The following coefficients are defined: 0 = 1.0x, 1 = 0.1x, 2 = 0.01x, 3 = 0.001x. The default value prior to hardware and firmware initialization is b'00. Writes to this register also cause the port to send the Set_Slot_Power_Limit Message. Refer to Section 6.9 of the PCI Express Base Specification for more information. |
| Slot power limit | 0–255 | In combination with the Slot power scale value, specifies the upper limit in watts on power supplied by the slot. Refer to Section 7.8.9 of the PCI Express Base Specification for more information. |
| Slot number | 0–8191 | Specifies the slot number. |
4.6.5. Power Management
| Parameter | Value | Description |
|---|---|---|
| Endpoint L0s acceptable latency | Maximum of 64 ns; Maximum of 128 ns; Maximum of 256 ns; Maximum of 512 ns; Maximum of 1 µs; Maximum of 2 µs; Maximum of 4 µs; No limit | Specifies the maximum acceptable latency that the device can tolerate to exit the L0s state for any links between the device and the Root Complex. It sets the read-only value of the Endpoint L0s acceptable latency field of the Device Capabilities register (0x084). This Endpoint does not support the L0s or L1 states. However, in a switched system there may be links connected to switches that have L0s and L1 enabled. This parameter is set to allow system configuration software to read the acceptable latencies for all devices in the system and the exit latencies for each link to determine which links can enable Active State Power Management (ASPM). This setting is disabled for Root Ports. The default value of this parameter is 64 ns, which is a safe setting for most designs. |
| Endpoint L1 acceptable latency | Maximum of 1 µs; Maximum of 2 µs; Maximum of 4 µs; Maximum of 8 µs; Maximum of 16 µs; Maximum of 32 µs; Maximum of 64 µs; No limit | Specifies the acceptable latency that an Endpoint can withstand in the transition from the L1 to L0 state. It is an indirect measure of the Endpoint's internal buffering. It sets the read-only value of the Endpoint L1 acceptable latency field of the Device Capabilities register. This Endpoint does not support the L0s or L1 states. However, a switched system may include links connected to switches that have L0s and L1 enabled. This parameter is set to allow system configuration software to read the acceptable latencies for all devices in the system and the exit latencies for each link to determine which links can enable Active State Power Management (ASPM). This setting is disabled for Root Ports. The default value of this parameter is 1 µs, which is a safe setting for most designs. |
The Intel L-/H-Tile Avalon-ST for PCI Express and Intel L-/H-Tile Avalon-MM for PCI Express IP cores do not support the L1 or L2 low power states. If the link ever gets into these states, performing a reset (by asserting pin_perst, for example) allows the IP core to exit the low power state and the system to recover.
These IP cores also do not support the in-band beacon or sideband WAKE# signal, which are mechanisms to signal a wake-up event to the upstream device.
4.6.6. Vendor Specific Extended Capability (VSEC)
| Parameter | Value | Description |
|---|---|---|
| User ID register from the Vendor Specific Extended Capability | Custom value | Sets the read-only value of the 16-bit User ID register from the Vendor Specific Extended Capability. This parameter is only valid for Endpoints. |
4.7. Configuration, Debug and Extension Options
| Parameter | Value | Description |
|---|---|---|
| Enable Hard IP dynamic reconfiguration of PCIe read-only registers | On/Off | When On, you can use the Hard IP reconfiguration bus to dynamically reconfigure Hard IP read-only registers. For more information, refer to Hard IP Reconfiguration Interface. With this parameter set to On, the hip_reconfig_clk port is visible on the block symbol of the Avalon®-MM Hard IP component. In the System Contents window, connect a clock source to this hip_reconfig_clk port. For example, you can export hip_reconfig_clk and drive it with a free-running clock on the board whose frequency is in the range of 100 to 125 MHz. Alternatively, if your design includes a clock bridge driven by such a free-running clock, the out_clk of the clock bridge can be used to drive hip_reconfig_clk. |
| Enable transceiver dynamic reconfiguration | On/Off | When On, provides an Avalon®-MM interface that software can drive to change the values of transceiver registers. With this parameter set to On, the xcvr_reconfig_clk, reconfig_pll0_clk, and reconfig_pll1_clk ports are visible on the block symbol of the Avalon®-MM Hard IP component. In the System Contents window, connect a clock source to these ports. For example, you can export these ports and drive them with a free-running clock on the board whose frequency is in the range of 100 to 125 MHz. Alternatively, if your design includes a clock bridge driven by such a free-running clock, the out_clk of the clock bridge can be used to drive these ports. |
| Enable Native PHY, LCPLL, and fPLL ADME for Toolkit | On/Off | When On, the generated IP includes an embedded Native PHY Debug Master Endpoint (NPDME) that connects internally to an Avalon®-MM slave interface for dynamic reconfiguration. The NPDME can access the transceiver reconfiguration space. It can perform certain test and debug functions via JTAG using the System Console. |
| Enable PCIe* Link Inspector | On/Off | When On, the PCIe* Link Inspector is enabled. Use this interface to monitor the PCIe* link at the Physical, Data Link, and Transaction layers. You can also use the Link Inspector to reconfigure some transceiver registers. You must turn on Enable transceiver dynamic reconfiguration, Enable Hard IP dynamic reconfiguration of PCIe read-only registers, and Enable Native PHY, LCPLL, and fPLL ADME for Toolkit to use this feature. For more information about using the PCIe* Link Inspector, refer to Link Inspector Hardware in the Troubleshooting and Observing Link Status appendix. |
| Enable PCIe* Link Inspector AVMM Interface | On/Off | When On, the PCIe Link Inspector Avalon®-MM interface is exported. In addition, the JTAG to Avalon® Bridge IP instantiation is included in the Design Example generation for debug. |
4.8. PHY Characteristics
| Parameter | Value | Description |
|---|---|---|
| Gen2 TX de-emphasis | 3.5dB, 6dB | Specifies the transmit de-emphasis for Gen2. Intel recommends the following settings: |
| VCCR/VCCT supply voltage for the transceiver | 1_1V, 1_0V | Allows you to report the voltage supplied by the board for the transceivers. |
4.9. Example Designs
| Parameter | Value | Description |
|---|---|---|
| Available Example Designs | PIO | When you select the PIO option, the generated design includes a target application that supports only downstream transactions. The PIO design example is the only option for the Avalon®-ST interface. (The DMA example design, which uses the Write Data Mover, Read Data Mover, and a custom Descriptor Controller, is not available for this interface.) |
| Simulation | On/Off | When On, the generated output includes a simulation model. |
| Synthesis | On/Off | When On, the generated output includes a synthesis model. |
| Generated HDL format | Verilog/VHDL | Only Verilog HDL is available in the current release. |
| Target Development Kit | None; Intel® Stratix® 10 H-Tile ES1 Development Kit; Intel® Stratix® 10 L-Tile ES2 Development Kit | Select the appropriate development board. If you select one of the development boards, system generation overwrites the device you selected with the device on that development board. Note: If you select None, system generation does not make any pin assignments. You must make the assignments in the .qsf file. |
5. Designing with the IP Core
5.1. Generation
5.2. Simulation
The Intel® Quartus® Prime Pro Edition software optionally generates a functional simulation model, a testbench or design example, and vendor-specific simulator setup scripts when you generate your parameterized PCI Express* IP core. For Endpoints, the generation creates a Root Port BFM.
The Intel® Quartus® Prime Pro Edition supports the following simulators.
Vendor | Simulator | Version | Platform |
---|---|---|---|
Aldec | Active-HDL * | 10.3 | Windows |
Aldec | Riviera-PRO * | 2016.10 | Windows, Linux |
Cadence | Incisive Enterprise * (NCSim*) | 15.20 | Linux |
Cadence | Xcelium* Parallel Simulator | 17.04.014 | Linux |
Mentor Graphics | ModelSim PE* | 10.5c | Windows |
Mentor Graphics | ModelSim SE* | 10.5c | Windows, Linux |
Mentor Graphics | QuestaSim* | 10.5c | Windows, Linux |
Synopsys | VCS*/VCS MX* | 2016.06-SP-1 | Linux |
Refer to the Example Design for Intel L-/H-Tile Avalon-ST for PCI Express IP chapter to create a simple custom example design using the parameters that you specify.
5.2.1. Selecting Serial or PIPE Simulation
The parameter serial_sim_hwtcl in <testbench_dir>/pcie_<dev>_hip_avst_0_example_design/pcie_example_design_tb/ip/pcie_example_design_tb/DUT_pcie_tb_ip/altera_pcie_<dev>_tbed_<ver>/sim/altpcie_s10_tbed_hwtcl.v determines the simulation mode. When 1'b1, the simulation is serial. When 1'b0, the simulation runs in the 32-bit parallel PIPE mode.
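For example, the mode selection reduces to a single parameter value. This is an illustrative excerpt only; the surrounding module code in altpcie_s10_tbed_hwtcl.v is not reproduced here:

```verilog
// Illustrative excerpt: set this parameter in altpcie_s10_tbed_hwtcl.v to
// choose the simulation mode.
parameter serial_sim_hwtcl = 1'b1;    // 1'b1 = serial simulation
// parameter serial_sim_hwtcl = 1'b0; // 1'b0 = 32-bit parallel PIPE simulation
```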
5.3. IP Core Generation Output ( Intel Quartus Prime Pro Edition)
File Name | Description |
---|---|
<your_ip>.ip | Top-level IP variation file that contains the parameterization of an IP core in your project. If the IP variation is part of a Platform Designer system, the parameter editor also generates a .qsys file. |
<your_ip>.cmp | The VHDL Component Declaration (.cmp) file is a text file that contains local generic and port definitions that you use in VHDL design files. |
<your_ip>_generation.rpt | IP or Platform Designer generation log file. Displays a summary of the messages during IP generation. |
<your_ip>.qgsimc (Platform Designer systems only) | Simulation caching file that compares the .qsys and .ip files with the current parameterization of the Platform Designer system and IP core. This comparison determines if Platform Designer can skip regeneration of the HDL. |
<your_ip>.qgsynth (Platform Designer systems only) | Synthesis caching file that compares the .qsys and .ip files with the current parameterization of the Platform Designer system and IP core. This comparison determines if Platform Designer can skip regeneration of the HDL. |
<your_ip>.csv | Contains information about the upgrade status of the IP component. |
<your_ip>.bsf | A symbol representation of the IP variation for use in Block Diagram Files (.bdf). |
<your_ip>.spd | Input file that ip-make-simscript requires to generate simulation scripts. The .spd file contains a list of files you generate for simulation, along with information about memories that you initialize. |
<your_ip>.ppf | The Pin Planner File (.ppf) stores the port and node assignments for IP components you create for use with the Pin Planner. |
<your_ip>_bb.v | Use the Verilog blackbox (_bb.v) file as an empty module declaration for use as a blackbox. |
<your_ip>_inst.v or _inst.vhd | HDL example instantiation template. Copy and paste the contents of this file into your HDL file to instantiate the IP variation. |
<your_ip>.regmap | If the IP contains register information, the Intel® Quartus® Prime software generates the .regmap file. The .regmap file describes the register map information of master and slave interfaces. This file complements the .sopcinfo file by providing more detailed register information about the system. This file enables register display views and user customizable statistics in System Console. |
<your_ip>.svd | Allows HPS System Debug tools to view the register maps of peripherals that connect to HPS within a Platform Designer system. During synthesis, the Intel® Quartus® Prime software stores the .svd files for slave interfaces visible to the System Console masters in the .sof file in the debug session. System Console reads this section, which Platform Designer queries for register map information. For system slaves, Platform Designer accesses the registers by name. |
<your_ip>.v, <your_ip>.vhd | HDL files that instantiate each submodule or child IP core for synthesis or simulation. |
mentor/ | Contains a msim_setup.tcl script to set up and run a ModelSim* simulation. |
aldec/ | Contains a Riviera-PRO* script rivierapro_setup.tcl to setup and run a simulation. |
/synopsys/vcs /synopsys/vcsmx |
Contains a shell script vcs_setup.sh to set up and run a VCS* simulation. Contains a shell script vcsmx_setup.sh and synopsys_sim.setup file to set up and run a VCS* MX simulation. |
/cadence | Contains a shell script ncsim_setup.sh and other setup files to set up and run an NCSim simulation. |
/xcelium | Contains an Xcelium* Parallel simulator shell script xcelium_setup.sh and other setup files to set up and run a simulation. |
/submodules | Contains HDL files for the IP core submodule. |
<IP submodule>/ | Platform Designer generates /synth and /sim sub-directories for each IP submodule directory that Platform Designer generates. |
5.4. Integration and Implementation
5.4.1. Clock Requirements
The Intel L-/H-Tile Avalon-ST for PCI Express IP Core has a single 100 MHz input clock and a single output clock. An additional clock is available for PIPE simulations only.
refclk
Each instance of the PCIe IP core has a dedicated refclk input signal. This input reference clock can be sourced from any reference clock in the transceiver tile. Refer to the Stratix 10 GX, MX, TX and SX Device Family Pin Connection Guidelines for additional information on termination and valid locations.
coreclkout_hip
Maximum Link Rate | Maximum Link Width | Avalon-ST Interface Width | coreclkout_hip Frequency |
---|---|---|---|
Gen1 | x1, x2, x4, x8, x16 | 256 | 125 MHz |
Gen2 | x1, x2, x4, x8 | 256 | 125 MHz |
Gen2 | x16 | 256 | 250 MHz |
Gen3 | x1, x2, x4 | 256 | 125 MHz |
Gen3 | x8 | 256 | 250 MHz |
Gen3 | x16 | 512 | 250 MHz |
sim_pipe_pclk_in
This input clock is for PIPE simulation only. Derived from the refclk input, sim_pipe_pclk_in is the PIPE interface clock for PIPE mode simulation.
5.4.2. Reset Requirements
The Intel L-/H-Tile Avalon-ST for PCI Express IP Core has two asynchronous, active-low reset inputs, npor and pin_perst. Both reset the Transaction, Data Link, and Physical Layers.
npor
The Application Layer drives the npor reset input to the PCIe IP core. If your Application Layer does not drive npor, you must tie this input to 1'b1. The npor signal resets all registers and state machines to their initial values.
pin_perst
Use the dedicated nPERST pin that corresponds to the tile in which the PCIe IP core resides:
- NPERSTL0: Bottom Left PCIe IP core and Configuration via Protocol (CvP)
- NPERSTL1: Middle Left PCIe IP core (when available)
- NPERSTL2: Top Left PCIe IP core (when available)
- NPERSTR0: Bottom Right PCIe IP core (when available)
- NPERSTR1: Middle Right PCIe IP core (when available)
- NPERSTR2: Top Right PCIe IP core (when available)
reset_status
When asserted, this signal indicates that the PCIe IP core is in reset. The reset_status signal is synchronous to coreclkout_hip. It is active high.
clr_st
This signal has the same functionality as reset_status. It is provided for backwards compatibility with Arria® 10 devices. It is active high.
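A minimal sketch of using reset_status (active high, synchronous to coreclkout_hip, as described above) to hold Application-Layer state quiescent; the state signals and widths are illustrative:

```verilog
module app_reset_sync (
    input  wire       coreclkout_hip,
    input  wire       reset_status,  // active high, synchronous to coreclkout_hip
    input  wire [7:0] app_next,      // next Application state (illustrative)
    output reg  [7:0] app_state
);
    always @(posedge coreclkout_hip) begin
        if (reset_status)
            app_state <= 8'd0;       // hold user logic in reset while the IP is in reset
        else
            app_state <= app_next;
    end
endmodule
```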
5.5. Required Supporting IP Cores
5.5.1. Hard Reset Controller
The Hard Reset Controller generates the reset for the PCIe IP core logic, transceivers, and Application Layer. To meet the 100 ms PCIe configuration time requirement, the Hard Reset Controller interfaces with the SDM. This allows the PCIe Hard IP to be configured first, so that PCIe link training occurs while the FPGA fabric is still being configured.
5.5.2. TX PLL
5.6. Channel Layout and PLL Usage
The following figures show the channel layout and PLL usage for Gen1, Gen2 and Gen3, x1, x2, x4, x8 and x16 variants of the Intel L-/H-Tile Avalon-MM for PCI Express IP core. Note that the missing variant Gen3 x16 is supported by another IP core (the Intel L-/H-Tile Avalon-MM+ for PCI Express IP core). For more details on the Avalon® -MM+ IP core, refer to https://www.intel.com/content/www/us/en/programmable/documentation/sox1520633403002.html.
The channel layout is the same for the Avalon® -ST and Avalon® -MM interfaces to the Application Layer.
6. Block Descriptions
The Intel L-/H-Tile Avalon-ST for PCI Express implements the complete PCI Express protocol stack as defined in the PCI Express Base Specification. The protocol stack includes the following layers:
- Transaction Layer—The Transaction Layer contains the Configuration Space, which manages communication with the Application Layer, the RX and TX channels, the RX buffer, and flow control credits.
- Data Link Layer—The Data Link Layer, located between the Physical Layer and the Transaction Layer, manages packet transmission and maintains data integrity at the link level. Specifically, the Data Link Layer performs the following tasks:
- Manages transmission and reception of Data Link Layer Packets (DLLPs)
- Generates all transmission link cyclical redundancy code (LCRC) values and checks all LCRCs during reception
- Manages the retry buffer and retry mechanism according to received ACK/NAK Data Link Layer packets
- Initializes the flow control mechanism for DLLPs and routes flow control credits to and from the Transaction Layer
- Physical Layer—The Physical Layer initializes the speed, lane numbering, and lane width of the PCI Express link according to packets received from the link and directives received from higher layers. The following figure provides a high‑level block diagram.
Each channel of the Physical Layer is paired with an Embedded Multi-die Interconnect Bridge (EMIB) module. The FPGA fabric interfaces to the PCI Express IP core through the EMIB.
6.1. Interfaces
The interface figures and signal tables use the following conventions:
- Orange text: signal is only available for L-Tile devices
- Deep red/brown text: signal is only available for H-Tile devices
- Blue text: PIPE interface signals, only available for simulation
- <w>: the width of the Avalon®-ST data interface
- <n>: 2 for the 512-bit interface and 1 for the 256-bit interface
- <r>: 6 for the 512-bit interface and 3 for the 256-bit interface
6.1.1. TLP Header and Data Alignment for the Avalon-ST RX and TX Interfaces
The ordering of bytes in the header and data portions of packets is different. The first byte of the header dword is located in the most significant byte of the dword. The first byte of the data dword is located in the least significant byte of the dword on the data bus.
Packet | TLP |
---|---|
Header0 | pcie_hdr_byte0, pcie_hdr_byte1, pcie_hdr_byte2, pcie_hdr_byte3 |
Header1 | pcie_hdr_byte4, pcie_hdr_byte5, pcie_hdr_byte6, pcie_hdr_byte7 |
Header2 | pcie_hdr_byte8, pcie_hdr_byte9, pcie_hdr_byte10, pcie_hdr_byte11 |
Header3 | pcie_hdr_byte12, pcie_hdr_byte13, pcie_hdr_byte14, pcie_hdr_byte15 |
Data0 | pcie_data_byte3, pcie_data_byte2, pcie_data_byte1, pcie_data_byte0 |
Data1 | pcie_data_byte7, pcie_data_byte6, pcie_data_byte5, pcie_data_byte4 |
Data2 | pcie_data_byte11, pcie_data_byte10, pcie_data_byte9, pcie_data_byte8 |
Data<n> | pcie_data_byte<4n+3>, pcie_data_byte<4n+2>, pcie_data_byte<4n+1>, pcie_data_byte<4n> |
The following figure illustrates the mapping of Avalon-ST packets to PCI Express* TLPs for a three-dword header and a four-dword header. In the figure, H0 to H3 are header dwords, and D0 to D9 are data dwords.
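In addition to the figure, the alignment rules can be expressed in a short RTL sketch. The following example is illustrative only; the module name, helper functions, and byte values are placeholders, not part of the IP.

```systemverilog
// Illustration of the alignment rules above: a 3-dword header (H0-H2)
// followed by payload dwords (D0, D1) on a 256-bit Avalon-ST data bus.
module tlp_align_example;
  logic [255:0] st_data;

  // Header dwords: the first header byte occupies the MOST significant
  // byte of its dword.
  function automatic logic [31:0] pack_hdr_dw
      (input logic [7:0] b0, b1, b2, b3);
    return {b0, b1, b2, b3};          // b0 -> bits [31:24]
  endfunction

  // Data dwords: the first data byte occupies the LEAST significant byte.
  function automatic logic [31:0] pack_data_dw
      (input logic [7:0] b0, b1, b2, b3);
    return {b3, b2, b1, b0};          // b0 -> bits [7:0]
  endfunction

  initial begin
    st_data[31:0]    = pack_hdr_dw (8'h00, 8'h00, 8'h00, 8'h01); // H0
    st_data[63:32]   = pack_hdr_dw (8'h00, 8'h00, 8'h00, 8'hFF); // H1
    st_data[95:64]   = pack_hdr_dw (8'h12, 8'h34, 8'h56, 8'h70); // H2
    st_data[127:96]  = pack_data_dw(8'hA0, 8'hA1, 8'hA2, 8'hA3); // D0
    st_data[159:128] = pack_data_dw(8'hA4, 8'hA5, 8'hA6, 8'hA7); // D1
    st_data[255:160] = '0;            // unused dwords on this cycle
  end
endmodule
```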
6.1.2. Avalon-ST 256-Bit RX Interface
Signal |
Direction |
Description |
---|---|---|
rx_st_data[255:0] |
Output |
Receive data bus. The Application Layer receives data from the Transaction Layer on this bus. The data on this bus is valid when rx_st_valid is asserted. Refer to the TLP Header and Data Alignment for the Avalon-ST TX and Avalon-ST RX Interfaces for the layout of TLP headers and data. |
rx_st_sop |
Output |
Marks the first cycle of the TLP when both rx_st_sop and rx_st_valid are asserted. |
rx_st_eop |
Output |
Marks the last cycle of the TLP when both rx_st_eop and rx_st_valid are asserted. |
rx_st_ready |
Input |
Indicates that the Application Layer is ready to accept data. The Application Layer deasserts this signal to throttle the data stream. |
rx_st_valid |
Output |
Qualifies rx_st_data into the Application Layer. The rx_st_ready to rx_st_valid latency for Stratix® 10 devices is 17 cycles. When rx_st_ready deasserts, rx_st_valid will deassert within 17 cycles. When rx_st_ready reasserts, rx_st_valid will reassert within 17 cycles if there is more data to send. To achieve the best throughput, Intel recommends that you size the RX buffer to avoid the deassertion of rx_st_ready. Refer to Avalon-ST RX Interface rx_st_valid Deasserts for a timing diagram that illustrates this behavior. |
rx_st_bar_range[2:0] |
Output |
Specifies the BAR for the TLP being output. The following encodings are defined:
The data on this bus is valid when rx_st_sop and rx_st_valid are both asserted. |
rx_st_vf_active H-Tile | Output |
When asserted, the received TLP targets a VF BAR. Valid when rx_st_sop and rx_st_valid are asserted. When deasserted, the TLP targets a PF and the rx_st_func_num port drives the function number. Valid when multiple virtual functions are enabled. |
rx_st_func_num[1:0] H-Tile |
Output |
Specifies the target physical function number of the received TLP. The application uses this information to route packets for both request and completion TLPs. For completion TLPs, specifies the PF number of the requestor for this completion TLP. If the TLP targets a VF[<m>, <n>], this bus carries the PF<m> information. Valid when multiple physical functions are enabled. These outputs are qualified by rx_st_sop and rx_st_valid. |
rx_st_vf_num[log2(<x>)-1:0] H-Tile |
Output |
Specifies the target VF number of the received TLP. The application uses this information for both request and completion TLPs. For completion TLPs, specifies the VF number of the requester for this completion TLP. <x> is the number of VFs. Valid when rx_st_vf_active is asserted. If the TLP targets VF[<m>, <n>], this bus carries the VF<n> information. Valid when multiple virtual functions are enabled. These outputs are qualified by rx_st_sop and rx_st_valid. |
rx_st_empty[2:0] | Output | Specifies the number of dwords that are empty, valid during cycles when the rx_st_eop signal is asserted. Not interpreted when rx_st_eop is deasserted. |
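A minimal sink sketch for this interface follows. It assumes an application-side FIFO whose depth and headroom values are placeholders, and derives rx_st_ready from occupancy with enough headroom to absorb the 17-cycle ready-to-valid latency. Data storage itself is omitted; only occupancy is modeled.

```systemverilog
// Occupancy-based backpressure sketch for the 256-bit RX interface.
module rx_st_sink #(
  parameter int FIFO_DEPTH = 512,
  parameter int HEADROOM   = 32     // > 17 cycles of in-flight beats
)(
  input  logic coreclkout_hip,
  input  logic reset_status,        // active high, from the IP core
  input  logic rx_st_valid,
  input  logic app_pop,             // application drains one entry
  output logic rx_st_ready
);
  logic [$clog2(FIFO_DEPTH):0] fill;   // FIFO occupancy

  // Deassert early enough that data still in flight fits in the FIFO.
  assign rx_st_ready = (fill < FIFO_DEPTH - HEADROOM);

  always_ff @(posedge coreclkout_hip) begin
    if (reset_status)
      fill <= '0;
    else
      // Push on every qualified RX beat; the data/SOP/EOP storage is
      // omitted from this occupancy-only sketch.
      fill <= fill + (rx_st_valid ? 1'b1 : 1'b0)
                   - (app_pop     ? 1'b1 : 1'b0);
  end
endmodule
```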
6.1.2.1. Avalon-ST RX Interface Three- and Four-Dword TLPs
6.1.2.2. Avalon-ST RX Interface rx_st_ready Deasserts for the 256-Bit Interface
Avalon-ST RX Interface rx_st_valid Reasserts for the 256-Bit Interface
6.1.2.3. Avalon-ST RX Interface rx_st_valid Deasserts
6.1.2.4. Avalon-ST RX Back-to-Back Transmission
6.1.2.5. Avalon-ST RX Interface Single-Cycle TLPs
6.1.3. Avalon-ST 512-Bit RX Interface
The 512-bit interface supports two locations for the beginning of a TLP, bit[0] and bit[256]. The interface supports two TLPs per cycle only when an end-of-packet cycle occurs in the lower 256 bits. In other words, a TLP can start on bit [256] only if a rx_st_eop pulse occurs in the lower 256 bits.
Signal |
Direction |
Description |
---|---|---|
rx_st_data_o[511:0] |
Output |
Receive data bus. The Application Layer receives data from the Transaction Layer on this bus. For large TLPs, the IP core drives 512-bit data on rx_st_data_o[511:0] until the end-of-packet cycle. For TLPs with an end-of-packet cycle in the lower 256 bits, the 512-bit interface supports a start-of-packet cycle in the upper 256 bits. |
rx_st_sop[1:0] |
Output |
Signals the first cycle of the TLP when asserted in conjunction with the corresponding bit of rx_st_valid. The following encodings are defined:
|
rx_st_eop[1:0] |
Output |
Signals the last cycle of the TLP when asserted in conjunction with the corresponding bit of rx_st_valid[1:0]. The following encodings are defined:
|
rx_st_ready_i |
Input |
Indicates that the Application Layer is ready to accept data. The Application Layer deasserts this signal to apply backpressure to the data stream. |
rx_st_valid_o[1:0] |
Output |
Qualifies rx_st_data_o into the Application Layer. The rx_st_ready_i to rx_st_valid_o[1:0] latency for Stratix® 10 devices is 18 cycles. When rx_st_ready_i deasserts, rx_st_valid_o[1:0] will deassert within 18 cycles. When rx_st_ready_i reasserts, rx_st_valid_o will reassert within 18 cycles if there is more data to send. To achieve the best throughput, Intel recommends that you size the RX buffer in your application logic to avoid the deassertion of rx_st_ready_i. The Relationship Between rx_st_ready and rx_st_valid for the 512-bit Avalon-ST Interface timing diagram below illustrates the relationship between rx_st_ready_i and rx_st_valid_o. |
rx_st_bar_range_o[5:0] |
Output |
Specifies the BAR for the TLP being output.
The following encodings are defined:
These outputs are valid when both rx_st_sop and rx_st_valid are asserted. |
rx_st_empty_o[5:0] | Output |
Specifies the number of dwords that are empty during cycles when the rx_st_eop_o[1:0] signal is asserted. Not valid when rx_st_eop[1:0] is deasserted. The following encodings are defined:
|
rx_st_parity_o[63:0] | Output | Byte parity for rx_st_data_o. Bit 0 corresponds to rx_st_data_o[7:0], bit 1 corresponds to rx_st_data_o[15:8] and so on. |
rx_st_vf_active[1:0] H-Tile | Output |
When asserted, the received TLP targets a VF BAR. Valid when rx_st_sop is asserted. When deasserted, the TLP targets a PF and the rx_st_func_num port drives the physical function number. For the 512-bit interface, bit [0] corresponds to rx_st_data[255:0] and bit [1] corresponds to rx_st_data[511:256]. Valid when multiple virtual functions are enabled. |
rx_st_func_num[3:0] H-Tile |
Output |
|
rx_st_vf_num[log2(<x>)-1:0] H-Tile |
Output |
Specifies the target VF number for the received TLP. The application uses this information for both request and completion TLPs. For completion TLPs, specifies the VF number of the requester for this completion TLP. <x> is the number of VFs. The following encodings are defined:
Valid when rx_st_vf_active is asserted. If the TLP targets VF[<m>, <n>], this bus carries the VF<n> information. Valid when multiple virtual functions are enabled. |
6.1.3.1. PCIe TLP Layout Using the 512-Bit Interface
Example 1: Two Small TLPs
Example 1 illustrates the transmission of two single-cycle TLPs. The TLP transmitted in the low-order bits has two empty dwords. The TLP transmitted in the high-order bits has no empty dwords.
Example 2: One 6-DWord TLP
Example 2 shows the transmission of one 6-dword PCI Express* TLP. The low-order bits provide the header and the first four dwords of data. The high-order bits have the final two dwords of data. The rx_st_empty_o[5:3] vector indicates six empty dwords in the high-order bits. The rx_st_eop[1] bit indicates the end of the TLP in the high-order bits.
Example 3: One 4-DWord TLP
Example 3 shows a single-cycle packet in the low-order bits and no transmission in the high-order bits. The deasserted rx_st_valid_o[1] bit indicates that the high-order bits do not drive valid data.
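The start-of-packet rules for the 512-bit interface can be decoded as in the following sketch. It assumes 4-dword headers for brevity and uses the signal names from the preceding table; the header capture registers are placeholders.

```systemverilog
// Start-of-packet decode on the 512-bit RX interface. A TLP may start
// in the upper 256 bits only when a TLP ends in the lower 256 bits of
// the same cycle; the IP core enforces this, so the decode below only
// has to handle the two possible start positions.
module rx512_sop_decode (
  input  logic         coreclkout_hip,
  input  logic [511:0] rx_st_data_o,
  input  logic [1:0]   rx_st_sop,
  input  logic [1:0]   rx_st_eop,
  input  logic [1:0]   rx_st_valid_o,
  output logic [127:0] lo_hdr, hi_hdr
);
  always_ff @(posedge coreclkout_hip) begin
    if (rx_st_valid_o[0] && rx_st_sop[0])
      lo_hdr <= rx_st_data_o[127:0];    // TLP starting on bit [0]
    if (rx_st_valid_o[1] && rx_st_sop[1])
      hi_hdr <= rx_st_data_o[383:256];  // TLP starting on bit [256]
  end
endmodule
```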
6.1.4. Avalon-ST 256-Bit TX Interface
Signal |
Direction |
Description |
---|---|---|
tx_st_data[255:0] |
Input |
Data for transmission. The Application Layer must provide a properly formatted TLP on the TX interface. Valid when tx_st_valid is asserted (subject to the ready latency, see below). The mapping of message TLPs is the same as the mapping of Transaction Layer TLPs with 4 dword headers. The number of data cycles must be correct for the length and address fields in the header. Issuing a packet with an incorrect number of data cycles results in the TX interface hanging and becoming unable to accept further requests. In addition, the TLP must be no larger than the negotiated Max Payload size. For the TLP requester ID field, bits[31:16] in dword1 specify the following information:
Refer to the TLP Header and Data Alignment for the Avalon-ST TX and Avalon-ST RX Interfaces for the layout of TLP headers and data. |
tx_st_sop |
Input |
Indicates first cycle of a TLP when asserted together with tx_st_valid. |
tx_st_eop |
Input |
Indicates last cycle of a TLP when asserted together with tx_st_valid. |
tx_st_ready |
Output |
Indicates that the Transaction Layer is ready to accept data for transmission. The core deasserts this signal to throttle the data stream. tx_st_ready may be asserted during reset. The Application Layer should wait at least 2 clock cycles after the reset is released before issuing packets on the Avalon-ST TX interface. The reset_status signal can also be used to monitor when the IP core has come out of reset. If tx_st_ready is asserted by the Transaction Layer on cycle <n>, then <n> + readyLatency is a ready cycle, during which the Application Layer may assert tx_st_valid and transfer data. If tx_st_ready is deasserted by the Transaction Layer on cycle <n>, then the Application Layer must deassert tx_st_valid within the readyLatency number of cycles after cycle <n>. The readyLatency is 3 coreclkout_hip cycles. This interface is not strictly Avalon® -ST compliant. |
tx_st_valid |
Input |
Clocks tx_st_data to the core when tx_st_ready is also asserted. Between tx_st_sop and the corresponding tx_st_eop, tx_st_valid must not be deasserted, except in response to tx_st_ready deassertion. When tx_st_ready deasserts in the middle of a packet, this signal must deassert exactly 3 coreclkout_hip cycles later. When tx_st_ready reasserts, and tx_st_data is in mid-TLP, this signal must reassert within 3 cycles. The figure entitled Avalon-ST TX Interface tx_st_ready Deasserts illustrates the timing of this signal. To facilitate timing closure, Intel recommends that you register both the tx_st_ready and tx_st_valid signals. |
tx_st_err |
Input |
Forces an error on the transmitted TLP. This signal is used to nullify a packet. To nullify a packet, assert this signal for 1 cycle with tx_st_eop. When a packet is nullified, the following packet should not be transmitted until the next clock cycle. Note: You cannot nullify a packet with 8 DW or less of data. |
tx_st_parity[31:0] |
Input |
The IP core supports byte parity. Each bit represents even parity of the associated byte of the tx_st_data bus. For example, bit[0] corresponds to tx_st_data[7:0], bit[1] corresponds to tx_st_data[15:8], and so on. |
tx_st_vf_active H-Tile | Input |
When asserted, the transmitting TLP is for a VF. When deasserted, the transmitting TLP is for a PF. Valid when tx_st_sop is asserted. Valid when multiple virtual functions are enabled. |
6.1.4.1. Avalon-ST TX Three- and Four-Dword TLPs
6.1.4.2. Avalon-ST TX Interface tx_st_ready Deassertion
6.1.4.3. Assertion and Deassertion of Avalon-ST TX Interface tx_st_valid
You must not deassert tx_st_valid between the tx_st_sop and tx_st_eop on a ready cycle. For the definition of a ready cycle, refer to Avalon Interface Specifications.
The following timing diagram shows an example where tx_st_ready deasserts in the middle of a packet and then reasserts, causing tx_st_valid to also deassert and then reassert.
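Alongside the timing diagram, the readyLatency = 3 rule can be expressed in RTL. The sketch below delays tx_st_ready by exactly three coreclkout_hip cycles to identify ready cycles; the have_data input is a placeholder for the application's packet source, and a real design must also ensure tx_st_valid is never dropped mid-TLP except in response to tx_st_ready.

```systemverilog
// tx_st_valid generation for the 256-bit TX interface (readyLatency = 3).
module tx_valid_ctrl (
  input  logic coreclkout_hip,
  input  logic reset_status,
  input  logic tx_st_ready,
  input  logic have_data,       // application has a TLP word to send
  output logic tx_st_valid
);
  logic [2:0] ready_dly;

  always_ff @(posedge coreclkout_hip) begin
    if (reset_status) ready_dly <= '0;
    else              ready_dly <= {ready_dly[1:0], tx_st_ready};
  end

  // If tx_st_ready asserts on cycle <n>, then <n> + 3 is a ready cycle:
  // drive tx_st_valid only on ready cycles. This also deasserts valid
  // exactly 3 cycles after a mid-packet tx_st_ready deassertion.
  assign tx_st_valid = ready_dly[2] && have_data;
endmodule
```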
6.1.5. Avalon-ST 512-Bit TX Interface
The 512-bit interface supports two locations for the beginning of a TLP, bit[0] and bit[256]. The interface supports multiple TLPs per cycle only when an end-of-packet cycle occurs in the lower 256 bits.
Signal |
Direction |
Description |
---|---|---|
tx_st_data_i[511:0] |
Input |
Application Layer data for transmission. The Application Layer must provide a properly formatted TLP on the TX interface. Valid when the tx_st_valid_i signal is asserted. The mapping of message TLPs is the same as the mapping of Transaction Layer TLPs with 4 dword headers. The number of data cycles must be correct for the length and address fields in the header. Issuing a packet with an incorrect number of data cycles results in the TX interface hanging and becoming unable to accept further requests. For the TLP requester ID field, bits[31:16] in dword1 specify the following information:
For TLPs with an end-of-packet cycle in the lower 256 bits, the 512-bit interface supports a start-of-packet cycle in the upper 256 bits. If a TLP starts on bit 256, bits [319:304] specify the completer ID for Completion packets. |
tx_st_sop_i[1:0] |
Input |
Indicates the first cycle of a TLP when asserted in conjunction with the corresponding bit of tx_st_valid_o. The following encodings are defined:
|
tx_st_eop_i[1:0] |
Input |
Indicates the end of a TLP when asserted in conjunction with the corresponding bit of tx_st_valid_o[1:0]. The following encodings are defined:
|
tx_st_ready_o |
Output |
Indicates that the Transaction Layer is ready to accept data for transmission. The core deasserts this signal to apply backpressure to the data stream. The Application Layer should wait at least 2 clock cycles after the reset is released before issuing packets on the Avalon-ST TX interface. The Application Layer can monitor the reset_status signal to determine when the IP core has come out of reset. If tx_st_ready_o is asserted by the Transaction Layer on cycle <n> , then <n> + readyLatency is a ready cycle, during which the Application Layer may assert tx_st_valid_i and transfer data. If the Transaction Layer deasserts tx_st_ready_o on cycle <n>, then the Application Layer must deassert tx_st_valid_i within a readyLatency number of cycles after cycle <n>. The readyLatency is 3 coreclkout_hip cycles. |
tx_st_valid_i[1:0] |
Input |
Clocks tx_st_data_i into the core on ready cycles. Between tx_st_sop_i and tx_st_eop_i, the tx_st_valid_i signal must not be deasserted in the middle of a TLP except in response to tx_st_ready deassertion. When tx_st_ready_o deasserts in the middle of a packet, this signal must deassert exactly 3 coreclkout_hip cycles later because the readyLatency is 3 cycles for this interface. When tx_st_ready_o reasserts, and tx_st_data is in mid-TLP, this signal must reassert on the next ready cycle. The figure entitled Avalon-ST TX Interface tx_st_ready Deasserts illustrates the timing of this signal. The behavior of this signal is the same for the 256- and 512-bit interfaces. To facilitate timing closure, Intel recommends that you register both the tx_st_ready_o and tx_st_valid_i signals. |
tx_st_err_i[1:0] | Input | When asserted, indicates an error on the transmitted TLP. This signal is asserted with tx_st_eop_i and nullifies a packet. The following encodings are defined: Note: You cannot nullify a packet with 8 DW or less of data. |
tx_st_parity_i[63:0] | Input | Byte parity for tx_st_data_i. Bit 0 corresponds to tx_st_data_i[7:0], bit 1 corresponds to tx_st_data_i[15:8], and so on. |
tx_st_vf_active[1:0] H-Tile | Input |
When asserted, the transmitting TLP is for a VF. When deasserted, the transmitting TLP is for a PF. Valid when tx_st_sop is asserted. Valid when multiple functions are enabled. |
6.1.6. TX Credit Interface
Flow Control credits are defined for the following TLP categories:
- Posted transactions - TLPs that do not require a response
- Non-Posted transactions - TLPs that require a completion
- Completions - TLPs that respond to non-posted transactions
TLP Type | Category |
---|---|
Memory Write | Posted |
Memory Read Memory Read Lock |
Non-posted |
I/O Read I/O Write |
Non-posted |
Configuration Read Configuration Write |
Non-posted |
Message | Posted |
Completion Completion with Data Completion Locked Completion Locked with Data |
Completions |
Fetch and Add AtomicOp | Non-posted |
Credit Type | Number of Dwords |
---|---|
Header credit - completions | 4 dwords |
Header credit - requests | 5 dwords |
Data credits | 4 dwords |
Signal |
Direction |
Description |
---|---|---|
tx_ph_cdts[7:0] |
Output |
Header credit net value for the flow control (FC) posted requests. |
tx_pd_cdts[11:0] |
Output |
Data credit net value for the FC posted requests. |
tx_nph_cdts[7:0] |
Output |
Header credit net value for the FC non-posted requests. |
tx_npd_cdts[11:0] L-Tile |
Output |
Data credit net value for the FC non-posted requests. The tx_npd_cdts[11:0] is not available for H-Tile devices. You can monitor the non-posted header credits to determine if there is sufficient space to transmit the next non-posted data TLP. |
tx_cplh_cdts[7:0] |
Output |
Header credit net value for the FC Completion. A value of 0xFF indicates infinite Completion header credits. |
tx_cpld_cdts[11:0] L-Tile |
Output |
Data credit net value for the FC Completion. A value of 0xFFF indicates infinite Completion data credits. The tx_cpld_cdts[11:0] is not available for H-Tile devices. You can monitor the completion credits to determine if there is sufficient space to transmit the next Completion TLP. The completion TLP size is either the request data size or 64 bytes when the completion is split. For Root Ports or non peer-to-peer Endpoints as the link partner, assume that this credit is infinite. |
tx_hdr_cdts_consumed[1:0] |
Output |
Asserted for 1 coreclkout_hip cycle, for each header credit consumed by the application layer traffic. Note that credits the Hard IP consumes for internally generated Completions or Messages are not tracked in this signal. For the Gen3 x16 512-bit interface, tx_hdr_cdts_consumed[1] is for the higher bus and tx_hdr_cdts_consumed[0] is for the lower bus. There is only one tx_hdr_cdts_consumed signal for the 256-bit interface. |
tx_data_cdts_consumed[1:0] |
Output |
Asserted for 1 coreclkout_hip cycle, for each data credit consumed. Note that credits the Hard IP consumes for internally generated Completions or Messages are not tracked in this signal. For the Gen3 x16 512-bit interface, tx_data_cdts_consumed[1] is for the higher bus and tx_data_cdts_consumed[0] is for the lower bus. There is only one tx_data_cdts_consumed signal for the 256-bit interface. |
tx_cdts_type[<n>-1:0] | Output |
Specifies the credit type shown on the tx_cdts_data_value[1:0] bus. The following encodings are defined:
For the Gen3 x16 512-bit interface, tx_cdts_type[3:2] is for the higher bus and tx_cdts_type[1:0] is for the lower bus. |
tx_cdts_data_value[3:0] L-Tile, 16 lanes; tx_cdts_data_value[1:0] L-Tile, 8 lanes or fewer; tx_cdts_data_value[1:0] H-Tile, 16 lanes; tx_cdts_data_value H-Tile, 8 lanes or fewer |
Output |
For H-Tile: 1 = 2 data credits consumed; 0 = 1 data credit consumed. For L-Tile: The value of tx_cdts_data_value+1 specifies the data credit consumed. For both H- and L-Tiles: Only valid when tx_data_cdts_consumed asserts. For the Gen3 x16 512-bit interface, tx_cdts_data_value[1] is for the higher bus and tx_cdts_data_value[0] is for the lower bus. |
6.1.7. Interpreting the TX Credit Interface
The following equation defines the available buffer space of the link partner:
RX_buffer_space = (credit_limit - credits_consumed) + released_credits
where:
credits_consumed = credits_consumed_application + credits_consumed_PCIe_IP_core
The hard IP core consumes a small number of posted credits for the following reasons:
- To send completion TLPs for Configuration Requests targeting internal registers
- To transmit posted TLPs for error and interrupt Messages
The hard IP core does not report the internally consumed credits to the Application. Without knowing the hard IP core's internal usage, the Application cannot maintain a completely accurate count of Posted or Completion credits.
The hard IP core does not consume non-posted credits. Consequently, it is possible to maintain an accurate count of the available non-posted credits. Refer to the TX Credit Adjustment Sample Code for code that calculates the number of non-posted credits available to the Application. This RTL recovers the updated Flow Control credits from the remote link. It drives the value of the link partner's RX_buffer_space for non-posted header and data credits on tx_nph_cdts and tx_npd_cdts, respectively.
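The following sketch shows one way to gate non-posted transmission on the credit outputs. It assumes, per the description above, that tx_nph_cdts and tx_npd_cdts already reflect the link partner's available non-posted credits, and that the application lets the counters settle between requests. The req_data_credits input is a placeholder for the data credits required by the next TLP.

```systemverilog
// Non-posted credit gate (tx_npd_cdts exists on L-Tile only).
module np_gate (
  input  logic [7:0]  tx_nph_cdts,       // NP header credits available
  input  logic [11:0] tx_npd_cdts,       // NP data credits (L-Tile)
  input  logic [4:0]  req_data_credits,  // credits needed by next NP TLP
  output logic        np_tlp_ok
);
  // Issue the next non-posted TLP only when both the header credit and
  // enough data credits are available at the link partner.
  assign np_tlp_ok = (tx_nph_cdts != '0) &&
                     (tx_npd_cdts >= {7'b0, req_data_credits});
endmodule
```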
6.1.8. Clocks
Signal |
Direction |
Description |
---|---|---|
refclk |
Input |
This is the input reference clock for the IP core as defined by the PCI Express Card Electromechanical Specification Revision 2.0. The frequency is 100 MHz ±300 ppm. To meet the PCIe* 100 ms wake-up time requirement, this clock must be free-running. Note: This input reference clock must be stable and free-running at device power-up for a successful device configuration. |
coreclkout_hip |
Output |
This clock drives the Data Link, Transaction, and Application Layers. For the Application Layer, the frequency depends on the data rate and the number of lanes as specified in the table. |
6.1.9. Update Flow Control Timer and Credit Release
The IP core releases credits on a per-clock-cycle basis as it removes TLPs from the local RX Buffer.
6.1.10. Function-Level Reset (FLR) Interface
The function-level reset (FLR) interface can reset the individual SR-IOV functions.
Signal |
Direction |
Description |
---|---|---|
flr_pf_active[<n>-1:0] H-Tile |
Output |
The SR-IOV Bridge asserts flr_pf_active when bit 15 of the PCIe Device Control Register is set. Bit 15 is the FLR field. Once asserted, the flr_pf_active signal remains high until the Application Layer sets flr_pf_done high for the associated function. The Application Layer must perform actions necessary to clear any pending transactions associated with the function being reset. The Application Layer must assert flr_pf_done to indicate it has completed the FLR actions and is ready to re-enable the PF. |
flr_pf_done[<n>-1:0] H-Tile |
Input |
<n> is the number of PFs. When asserted for one or more cycles, indicates that the Application Layer has completed resetting all the logic associated with the PF. Bit 0 is for PF0. Bit 1 is for PF1, and so on. The Application Layer decodes the FLR write to the PF register through the configuration write broadcast bus to determine which PF must be reset. When flr_pf_active is asserted, the Application Layer must assert flr_pf_done within 100 microseconds to re-enable the function. |
flr_rcvd_vf H-Tile |
Output |
The SR-IOV Bridge asserts this output port for one cycle when a 1 is written into the PCIe Device Control Register FLR field, bit[15], of a VF. flr_rcvd_pf_num and flr_rcvd_vf_num drive the PF number and the VF offset associated with the Function being reset. The Application Layer responds to a pulse on this output by clearing any pending transactions associated with the VF being reset. It then asserts flr_completed_vf to indicate that it has completed the FLR actions and is ready to re-enable the VF. |
flr_rcvd_pf_num[<n>-1:0] H-Tile |
Output | When flr_rcvd_vf is asserted, this output specifies the PF number associated with the VF being reset. <n> is the number of PFs. |
flr_rcvd_vf_num[log2(<n>)-1:0] H-Tile |
Output |
When flr_rcvd_vf is asserted, this output specifies the VF number offset associated with the VF being reset. <n> is the number of VFs. |
flr_completed_vf H-Tile |
Input |
When asserted, indicates that the Application Layer has completed resetting all the logic associated with the VF specified by flr_completed_vf_num. When flr_rcvd_vf asserts, the Application Layer must assert flr_completed_vf within 100 microseconds to re-enable the VF. <n> is the total number of VFs. |
flr_completed_pf_num[<n>-1:0] H-Tile |
Input | When flr_completed_vf is asserted, this input specifies the PF number associated with the VF that has completed its FLR. <n> is the number of PFs. |
flr_completed_vf_num[log2(<n>)-1:0] H-Tile | Input | When flr_completed_vf is asserted, this input specifies the VF number associated with the VF that has completed its FLR. <n> is the total number of VFs. |
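A minimal sketch of the PF-level FLR handshake follows. The pf_quiesced input is a placeholder for application logic that clears pending transactions; flr_pf_done is held while the corresponding flr_pf_active bit is asserted and the function is quiesced, and drops when the bridge deasserts flr_pf_active.

```systemverilog
// PF-level FLR responder sketch (NUM_PF is a placeholder parameter).
module flr_pf_handler #(parameter int NUM_PF = 4) (
  input  logic              coreclkout_hip,
  input  logic              reset_status,
  input  logic [NUM_PF-1:0] flr_pf_active,
  input  logic [NUM_PF-1:0] pf_quiesced,   // app: no pending transactions
  output logic [NUM_PF-1:0] flr_pf_done
);
  always_ff @(posedge coreclkout_hip) begin
    if (reset_status)
      flr_pf_done <= '0;
    else
      // Assert done once the function's traffic is cleared; the done
      // assertion must occur within 100 us of flr_pf_active rising.
      flr_pf_done <= flr_pf_active & pf_quiesced;
  end
endmodule
```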
6.1.11. Resets
The reset logic requires a free running clock that is stable and available to the IP core at configuration time.
Signal |
Direction |
Description |
---|---|---|
currentspeed[1:0] | Output |
Indicates the current speed of the PCIe module. The following encodings are defined:
|
npor |
Input |
The Application Layer drives this active low reset signal. npor resets the entire IP core, PCS, PMA, and PLLs. npor should be held for a minimum of 20 ns. For Gen3 x16 variants, hold npor for at least 10 cycles. This signal is edge, not level sensitive; consequently, a low value on this signal does not hold custom logic in reset. This signal cannot be disabled. |
pin_perst |
Input |
Active low reset from the PCIe reset pin of the device. Resets the datapath and control registers. |
ninit_done | Input | This is an active-low asynchronous input. A "1" on this signal indicates that the FPGA device is not yet fully configured. A "0" indicates the device has been configured and is in normal operating mode. To use the ninit_done input, instantiate the Reset Release Intel FPGA IP in your design and use its ninit_done output to drive the input of the Avalon® streaming IP for PCIe. For more details on how to use this input, refer to https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/an/an891.pdf. |
pld_clk_inuse |
Output | This reset signal has the same effect as reset_status. This signal is provided for backwards compatibility with Arria® 10 devices. |
pld_core_ready |
Input | When asserted, indicates that the Application Layer is ready. The IP core can release reset after this signal is asserted. |
reset_status |
Output | Active high reset status. When high, indicates that the IP core is not ready for user mode. reset_status is deasserted only when npor is deasserted and the IP core is not in reset. Use reset_status to drive the reset of your application. Synchronous to coreclkout_hip. |
clr_st |
Output |
clr_st has the same functionality as reset_status. It is provided for backwards compatibility with previous device families. |
serdes_pll_locked |
Output |
When asserted, indicates that the PLL that generates coreclkout_hip is locked. In PIPE simulation mode, this signal is always asserted. |
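Because reset_status is synchronous to coreclkout_hip, application logic can use it directly. The sketch below adds one optional register stage for timing margin; this is a common pattern, not a requirement of the IP.

```systemverilog
// Application reset derived from reset_status (one pipeline stage
// added purely for timing margin).
module app_reset_gen (
  input  logic coreclkout_hip,
  input  logic reset_status,   // active high, from the IP core
  output logic app_rst
);
  always_ff @(posedge coreclkout_hip)
    app_rst <= reset_status;
endmodule
```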
6.1.12. Interrupts
6.1.12.1. MSI and Legacy Interrupts
Signal |
Direction |
Description |
---|---|---|
app_msi_req |
Input |
Application Layer MSI request. Assertion causes an MSI posted write TLP to be generated based on the MSI configuration register values and the app_msi_tc and app_msi_num input ports. The Application Layer can deassert this MSI request signal any time after app_msi_ack has been asserted to acknowledge the request. |
app_msi_ack |
Output |
The IP core acknowledges the app_msi_req request. Asserts for 1 cycle to acknowledge the Application Layer's request for an MSI interrupt. The Application Layer can deassert the app_msi_req request as soon as it receives this signal. |
app_msi_tc[2:0] |
Input |
Application Layer MSI traffic class. This signal indicates the traffic class used to send the MSI (unlike INTx interrupts, any traffic class can be used to send MSIs). |
app_msi_num[4:0] |
Input |
MSI number of the Application Layer. The application uses the app_msi_num bus to indicate the offset between the base message data and the MSI to send. When multiple message mode is enabled, it sets the lower five bits of the MSI Data register. Only bits that the MSI Message Control register enables apply. |
app_int_sts[3:0] H-Tile |
Input |
Controls legacy interrupts. Assertion of app_int_sts causes an Assert_INTx message TLP to be generated and sent upstream. Deassertion of app_int_sts causes a Deassert_INTx message TLP to be generated and sent upstream. When you enable multiple PFs, bit 0 is for PF0, bit 1 is for PF1, and so on. |
app_msi_func_num[1:0] H-Tile |
Input | Specifies the function number requesting an MSI transmission. |
app_err_func_num[1:0] H-Tile |
Input | Specifies the function number that is asserting the app_err_valid signal. |
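The app_msi_req/app_msi_ack handshake described above can be implemented as in the following sketch. The fire and vector inputs are placeholders for the application's interrupt source, and the traffic class is fixed to 0 purely for illustration.

```systemverilog
// MSI request/acknowledge handshake sketch.
module msi_requester (
  input  logic       coreclkout_hip,
  input  logic       reset_status,
  input  logic       fire,            // application interrupt event
  input  logic [4:0] vector,
  input  logic       app_msi_ack,
  output logic       app_msi_req,
  output logic [4:0] app_msi_num,
  output logic [2:0] app_msi_tc
);
  assign app_msi_tc = 3'd0;           // any traffic class is legal

  always_ff @(posedge coreclkout_hip) begin
    if (reset_status) begin
      app_msi_req <= 1'b0;
    end else if (!app_msi_req && fire) begin
      app_msi_req <= 1'b1;            // raise and hold the request
      app_msi_num <= vector;
    end else if (app_msi_req && app_msi_ack) begin
      app_msi_req <= 1'b0;            // drop after the 1-cycle ack
    end
  end
endmodule
```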
6.1.13. Control Shadow Interface for SR-IOV
Use this interface for the following purposes:
- To monitor specific VF registers using the ctl_shdw_update output and the associated output signals defined below.
- To monitor all VF registers using the ctl_shdw_req_all input to request a full scan of the register fields for all active VFs.
Signal |
Direction |
Description |
---|---|---|
ctl_shdw_update | Output | The SR-IOV Bridge asserts this output for 1 clock cycle when one or more of the register fields being monitored is updated. The ctl_shdw_cfg outputs drive the new values. ctl_shdw_pf_num, ctl_shdw_vf_num, and ctl_shdw_vf_active identify the VF and its PF. Note: When ctl_shdw_update is asserted, the ctl_shdw_* outputs are valid. |
ctl_shdw_pf_num[<n>-1:0] |
Output | Identifies the PF whose register settings are on the ctl_shdw_cfg outputs. When the function is a VF, this input specifies the PF number to which the VF is attached. |
ctl_shdw_vf_active | Output | When asserted, indicates that the function whose register settings are on the ctl_shdw_cfg outputs is a VF. ctl_shdw_vf_num drives the VF number offset. |
ctl_shdw_vf_num[10:0] | Output | Identifies the VF number offset of the VF whose register settings are on the ctl_shdw_cfg outputs when ctl_shdw_vf_active is asserted. Its value ranges from 0 to (<n>-1), where <n> is the number of VFs attached to the associated PF. |
ctl_shdw_cfg[6:0] | Output |
When ctl_shdw_update is asserted, this output provides the current settings of the register fields of the associated function. The bits specify the following register fields:
|
ctl_shdw_req_all | Input |
When asserted, requests a complete scan of the register fields being monitored for all active Functions. When the ctl_shdw_req_all input is asserted, the SR-IOV bridge cycles through each VF. It provides the current values of all register fields. If a Configuration Write occurs during a scan, the SR-IOV Bridge interrupts the scan to output the new setting. It then resumes the scan, continuing sequentially from the updated VF setting. The SR-IOV Bridge checks the state of ctl_shdw_req_all at the end of each scan cycle. It starts a new scan cycle if this input is asserted. Connect this input to logic 1 to scan the functions continuously. |
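A sketch of capturing the shadowed register fields follows. The storage arrays and the 2-bit ctl_shdw_pf_num width are assumptions for illustration; size them to match your configuration.

```systemverilog
// Latch ctl_shdw_cfg into per-PF or per-VF shadow storage on each
// ctl_shdw_update pulse.
module ctl_shdw_capture #(
  parameter int NUM_PF = 4,
  parameter int NUM_VF = 64
)(
  input  logic        coreclkout_hip,
  input  logic        ctl_shdw_update,
  input  logic        ctl_shdw_vf_active,
  input  logic [1:0]  ctl_shdw_pf_num,
  input  logic [10:0] ctl_shdw_vf_num,
  input  logic [6:0]  ctl_shdw_cfg
);
  logic [6:0] pf_cfg [NUM_PF];
  logic [6:0] vf_cfg [NUM_PF][NUM_VF];

  always_ff @(posedge coreclkout_hip) begin
    if (ctl_shdw_update) begin
      if (ctl_shdw_vf_active)
        vf_cfg[ctl_shdw_pf_num][ctl_shdw_vf_num] <= ctl_shdw_cfg;
      else
        pf_cfg[ctl_shdw_pf_num] <= ctl_shdw_cfg;
    end
  end
endmodule
```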
6.1.14. Transaction Layer Configuration Space Interface
Signal |
Direction |
Description |
---|---|---|
tl_cfg_add[3:0] H-Tile tl_cfg_add[4:0] L-Tile |
Output |
Address of the TLP register. This signal is an index indicating which Configuration Space register information is being driven onto tl_cfg_ctl. For H-Tile, refer to H-Tile Multiplexed Configuration Register Information Available on tl_cfg_ctl. For L-Tile, refer to L-Tile Multiplexed Configuration Register Information Available on tl_cfg_ctl. |
tl_cfg_ctl[31:0] |
Output |
The tl_cfg_ctl signal is multiplexed and contains a subset of contents of the Configuration Space registers. |
tl_cfg_func[1:0] | Output | Specifies the function whose Configuration Space register values are being driven onto tl_cfg_ctl[31:0]. The following encodings are defined: |
app_err_hdr[31:0] | Input | Header information for the error TLP. Four 4-byte transfers send this information to the IP core. |
app_err_info[10:0] | Input | The Application can optionally provide the following information: |
app_err_valid | Input | When asserted, indicates that the data on app_err_info[10:0] is valid. For multi-function variants, the app_err_func_num specifies the function. |
TDM | 31 | 24 | 23 | 16 | 15 | 8 | 7 | 0 |
---|---|---|---|---|---|---|---|---|
0 |
[28:24]: Device Number [29]: Relax order enable [30]: No snoop enable [31]: IDO request enable |
Bus Number |
[13:8]: Auto negotiation link width [14]: IDO completion enable [15]: Memory space enable |
Device Control [2:0]: Max payload size [5:3]: Max rd req size [6]: Extended tag enable [7]: Bus master enable |
1 |
[28:24]: AER IRQ Msg num [29]: cfg_send_corr_err [30]: cfg_send_nf_err [31]: cfg_send_f_err |
[16]: RCB cntl [17]: cfg_pm_no_soft_rst [23:18]: auto negotiation link width |
[12:8]: PCIe cap interrupt msg num [13]: interrupt disable [15:14]: Reserved. |
[1:0]: Sys power ind. cntl [3:2]: Sys attention ind cntl [4]: Sys power cntl [7:5]: Reserved |
2 | Index of start VF[6:0] | Num VFs |
[4:1]: STU [11:8]: ATS [15:12]: auto negotiation link speed |
[0]: VF enable [1]: TPH enable [3:2]: TPH ST mode[1:0] [4]: Atomic request enable [5]: ARI forward enable [6]: ATS cache enable [7]: ATS STU[0] |
3 |
MSI Address Lower |
4 |
MSI Address Upper |
5 | MSI Mask |
6 |
MSI Data |
Reserved |
[0]: MSI enable [1]: 64-bit MSI [4:2]: Multiple MSI enable [5]: MSI-X enable [6]: MSI-X func mask |
7 |
Reserved |
[5:0]: Auto negotiation link width [9:6]: Auto negotiation link speed |
TDM | 31 | 24 | 23 | 16 | 15 | 8 | 7 | 0 |
---|---|---|---|---|---|---|---|---|
0 |
[28:24]: Device number [29]: Relaxed Ordering en [30]: No Snoop en [31]: (IDO) req en |
Bus Number |
[ 8]: unsupported_req_rpt_en [ 9]: corr_err_rpt_en [10]: nonfatal_err_rpt_en [11]: fatal_err_rpt_en [12]: serr_err [13]: perr_en [14]: IDO completion en [15]: Memory space en |
Device Control [2:0]: Max payload size [5:3]: Max rd req size [6]: Extended tag en [7]: Bus master en |
1 |
Number of VFs[15:0] |
[12:8]: PCIe Capability IRQ Msg Num [13]: IRQ disable [14]: Rd Cmpl Boundary (RCB) cntl [15]: pm_no_soft_rst |
[1:0]: System ind power cntl [3:2]: Sys attention ind cntl [4]: System power cntl [7:5]: Reserved |
2 |
[16]: Reserved [27:17]: Index of Start VF[10:0] [31:28]: Auto negotiation link speed |
[8]: ATS cache en [13:9]: ATS STU[4:0] [15:14]: Reserved |
[0]: VF en [2:1]: TPH en [5:3]: TPH ST mode [6]: Atomic req en [7]: ARI forward enable |
3 |
MSI Address Lower |
4 |
MSI Address Upper |
5 | MSI Mask |
6 |
MSI Data |
[12:8]: AER IRQ Msg Num [13]: cfg_send_cor_err [14]: cfg_send_nf_err [15]: cfg_send_f_err |
[0]: MSI en [1]: 64-bit MSI [4:2]: Multiple MSI en [5]: MSI-X en [6]: MSI-X func mask [7]: Reserved |
7 |
AER Uncorrectable Error Mask |
8 |
AER Correctable Error Mask |
9 |
AER Uncorrectable Error Severity |
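As an illustration of de-multiplexing the tl_cfg_ctl bus, the following sketch samples the bus and device numbers from TDM index 0, using the bit positions shown in the first table above. The H-Tile 4-bit tl_cfg_add width is assumed; captured values are typically used to form completer IDs.

```systemverilog
// Sample bus/device number from the time-multiplexed tl_cfg_ctl bus.
module tl_cfg_sample (
  input  logic        coreclkout_hip,
  input  logic [3:0]  tl_cfg_add,    // H-Tile index width
  input  logic [31:0] tl_cfg_ctl,
  output logic [7:0]  bus_number,
  output logic [4:0]  device_number
);
  always_ff @(posedge coreclkout_hip) begin
    if (tl_cfg_add == 4'd0) begin    // TDM index 0 per the table above
      bus_number    <= tl_cfg_ctl[23:16];
      device_number <= tl_cfg_ctl[28:24];
    end
  end
endmodule
```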
6.1.15. Configuration Extension Bus Interface
Use the Configuration Extension Bus to add capability structures to the IP core’s internal Configuration Spaces. Configuration TLPs with a destination register byte address of 0xC00 and higher route to the Configuration Extension Bus interface. Report the Completion Status Successful Completion (SC) on the Configuration Extension Bus. The IP core then generates a Completion to transmit on the link.
Use the app_err_info[8] signal included in the Transaction Layer Configuration Space Interface to report uncorrectable internal errors.
Signal |
Direction |
Description |
---|---|---|
ceb_req | Output | When asserted, indicates a valid Configuration Extension Bus access cycle. Deasserted when ceb_ack is asserted. |
ceb_ack | Input | Asserted to acknowledge ceb_req. The Application must implement this logic. |
ceb_addr[11:0] |
Output | Address bus to the external register block. The width of the address bus is the value you select for the CX_LBC_EXT_AW parameter. |
ceb_din[31:0] | Input | Read data. |
ceb_cdm_convert_data[31:0] | Input | Acts as a mask. If the value of a bit is 1, overwrite the value of the VF register with the value of the corresponding PF register at that bit position. If the value is 0, do not overwrite the bit. This signal is available for H-Tile only. |
ceb_dout[31:0] | Output | Data to be written. |
ceb_wr[3:0] | Output |
Indicates the configuration register access type, read or write. For writes, ceb_wr also indicates the byte enables. The following encodings are defined:
Combinations of byte enables, for example, 4'b0101, are also valid. |
ceb_vf_num[10:0] | Output | The VF of the current CEB access. This signal is available for H-Tile only. |
ceb_vf_active | Output | When asserted, indicates a VF is active. This signal is available for H-Tile only. |
ceb_func_num[1:0] | Output | The PF number of the current CEB access. This signal is available for H-Tile only. |
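The following sketch implements a single external register on the Configuration Extension Bus. The one-cycle acknowledge and the assumption that a ceb_wr value of 0 denotes a read are illustrative choices; adapt the decode to the encodings in the table above.

```systemverilog
// One external capability register on the CEB interface.
module ceb_ext_reg (
  input  logic        coreclkout_hip,
  input  logic        ceb_req,
  input  logic [11:0] ceb_addr,
  input  logic [31:0] ceb_dout,       // write data from the IP core
  input  logic [3:0]  ceb_wr,         // assumed: 0 = read, else byte enables
  output logic        ceb_ack,
  output logic [31:0] ceb_din         // read data to the IP core
);
  logic [31:0] my_cap_reg;

  always_ff @(posedge coreclkout_hip) begin
    // Acknowledge each request for exactly one cycle; the core
    // deasserts ceb_req when it sees ceb_ack.
    ceb_ack <= ceb_req && !ceb_ack;
    // Apply byte-enabled writes once per access.
    if (ceb_req && !ceb_ack && ceb_wr != 4'b0)
      for (int i = 0; i < 4; i++)
        if (ceb_wr[i]) my_cap_reg[8*i +: 8] <= ceb_dout[8*i +: 8];
  end

  assign ceb_din = my_cap_reg;        // single register: address ignored
endmodule
```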
6.1.16. Hard IP Status Interface
Signal |
Direction |
Description |
---|---|---|
derr_cor_ext_rcv |
Output |
When asserted, indicates that the RX buffer detected a 1-bit (correctable) ECC error. This is a pulse stretched output. |
derr_cor_ext_rpl |
Output |
When asserted, indicates that the retry buffer detected a 1-bit (correctable) ECC error. This is a pulse stretched output. |
derr_rpl |
Output |
When asserted, indicates that the retry buffer detected a 2-bit (uncorrectable) ECC error. This is a pulse stretched output. |
derr_uncor_ext_rcv |
Output |
When asserted, indicates that the RX buffer detected a 2-bit (uncorrectable) ECC error. This is a pulse stretched output. |
int_status[10:0] (H-Tile) int_status[7:0] (L-Tile) int_status_pf1[7:0] (L-Tile) |
Output |
The int_status[3:0] signals drive legacy interrupts to the application (for H-Tile). The int_status[10:4] signals provide status for other interrupts (for H-Tile). The int_status[3:0] signals drive legacy interrupts to the application for PF0 (for L-Tile). The int_status[7:4] signals provide status for other interrupts for PF0 (for L-Tile). The int_status_pf1[3:0] signals drive legacy interrupts to the application for PF1 (for L-Tile). The int_status_pf1[7:4] signals provide status for other interrupts for PF1 (for L-Tile). The following signals are defined:
|
int_status_common[2:0] |
Output |
Specifies the interrupt status for the following registers. When asserted, indicates that an interrupt is pending:
|
lane_act[4:0] |
Output |
Lane Active Mode: This signal indicates the number of lanes that configured during link training. The following encodings are defined:
|
link_up |
Output |
When asserted, the link is up. |
ltssmstate[5:0] |
Output |
Link Training and Status State Machine (LTSSM) state: The LTSSM state machine encoding defines the following states:
|
rx_par_err |
Output |
Asserted for a single cycle to indicate that a parity error was detected in a TLP at the input of the RX buffer. This error is logged as an uncorrectable internal error in the VSEC registers. For more information, refer to Uncorrectable Internal Error Status Register. If this error occurs, you must reset the Hard IP because parity errors can leave the Hard IP in an unknown state. |
tx_par_err |
Output |
Asserted for a single cycle to indicate a parity error during TX TLP transmission. The IP core transmits TX TLP packets even when a parity error is detected. |
6.1.17. Hard IP Reconfiguration
If the PCIe Link Inspector is enabled, accesses via the Hard IP Reconfiguration interface are not supported. The Link Inspector exclusively uses the Hard IP Reconfiguration interface, and there is no arbitration between the Link Inspector and the Hard IP Reconfiguration interface that is exported to the top level of the IP.
Signal |
Direction |
Description |
---|---|---|
hip_reconfig_clk |
Input |
Reconfiguration clock. The frequency range for this clock is 100–125 MHz. |
hip_reconfig_rst_n |
Input |
Active-low Avalon-MM reset for this interface. |
hip_reconfig_address[20:0] |
Input |
The 21‑bit reconfiguration address. When the Hard IP reconfiguration feature is enabled, the hip_reconfig_address[20:0] bits are programmable. Some bits have the same functions in both H-Tile and L-Tile:
Some bits have different functions in H-Tile versus L-Tile: For H-Tile:
For L-Tile:
|
hip_reconfig_read |
Input |
Read signal. This interface is not pipelined. You must wait for the return of the hip_reconfig_readdata[7:0] from the current read before starting another read operation. |
hip_reconfig_readdata[7:0] |
Output |
8‑bit read data. hip_reconfig_readdata[7:0] is valid on the third cycle after the assertion of hip_reconfig_read. |
hip_reconfig_readdatavalid | Output | When asserted, the data on hip_reconfig_readdata[7:0] is valid. |
hip_reconfig_write |
Input |
Write signal. |
hip_reconfig_writedata[7:0] |
Input |
8‑bit write data. |
hip_reconfig_waitrequest | Output | When asserted, indicates that the IP core is not ready to respond to a request. |
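A testbench-style sketch of a read on this interface follows, waiting on hip_reconfig_waitrequest and then on hip_reconfig_readdatavalid. The module wrapper and task name are placeholders.

```systemverilog
// Non-pipelined read on the Hard IP reconfiguration Avalon-MM port.
module hip_reconfig_reader (
  input  logic        hip_reconfig_clk,
  output logic [20:0] hip_reconfig_address,
  output logic        hip_reconfig_read,
  input  logic [7:0]  hip_reconfig_readdata,
  input  logic        hip_reconfig_readdatavalid,
  input  logic        hip_reconfig_waitrequest
);
  initial begin
    hip_reconfig_read    = 1'b0;
    hip_reconfig_address = '0;
  end

  task automatic read8 (input logic [20:0] addr, output logic [7:0] data);
    @(posedge hip_reconfig_clk);
    hip_reconfig_address <= addr;
    hip_reconfig_read    <= 1'b1;
    // Hold the command while the core applies backpressure.
    do @(posedge hip_reconfig_clk); while (hip_reconfig_waitrequest);
    hip_reconfig_read <= 1'b0;
    // Wait for the returned data; issue no new read until it arrives.
    do @(posedge hip_reconfig_clk); while (!hip_reconfig_readdatavalid);
    data = hip_reconfig_readdata;
  endtask
endmodule
```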
6.1.18. Power Management Interface
Signal |
Direction |
Description |
---|---|---|
pm_linkst_in_l1 |
Output |
When asserted, indicates that the link is in the L1 state. |
pm_linkst_in_l0s |
Output |
When asserted, indicates that the link is in the L0s state. |
pm_state[2:0] |
Output |
Specifies the current power state. |
pm_dstate[2:0] |
Output |
Specifies the power management D-state for PF0. |
apps_pm_xmt_pme |
Input |
Wake Up. The Application Layer asserts this signal for 1 cycle to wake up the Power Management Capability (PMC) state machine from a D1, D2 or D3 power state. Upon wake-up, the core sends a PM_PME Message. This port is functionally identical to outband_pwrup_cmd. You can use this signal or outband_pwrup_cmd to request a return from a low-power state to D0. |
apps_ready_entr_l23 | Input | When asserted, indicates that the Application Layer is ready to enter the L23 state. The core then sends the PM_Enter_L23 DLLP. |
apps_pm_xmt_turnoff |
Input |
Application Layer request to generate a PM_Turn_Off message. The Application Layer must assert this signal for one clock cycle. The IP core does not return an acknowledgment or grant signal. The Application Layer must not pulse this signal again until the previous message has been transmitted. |
app_init_rst |
Input |
Application Layer request for a hot reset to downstream devices. |
app_xfer_pending | Input | When asserted, prevents the IP core from entering L1 state or initiates exit from the L1 state. |
6.1.19. Serial Data Interface
Signal |
Direction |
Description |
---|---|---|
tx_out[<n-1>:0] |
Output |
Transmit serial data output. |
rx_in[<n-1>:0] |
Input |
Receive serial data input. |
6.1.20. PIPE Interface
Signal |
Direction |
Description |
---|---|---|
txdata[31:0] | Output |
Transmit data. |
txdatak[3:0] | Output | Transmit data control character indication. |
txcompl | Output | Transmit compliance. This signal drives the TX compliance pattern. It forces the running disparity to negative in Compliance Mode (negative COM character). |
txelecidle | Output | Transmit electrical idle. This signal forces the tx_out<n> outputs to electrical idle. |
txdetectrx | Output | Transmit detect receive. This signal tells the PHY layer to start a receive detection operation or to begin loopback. |
powerdown[1:0] | Output | Power down. This signal requests the PHY to change the power state to the specified state (P0, P0s, P1, or P2). |
txmargin[2:0] | Output | Transmit VOD margin selection. The value for this signal is based on the value from the Link Control 2 Register. |
txdeemp | Output | Transmit de-emphasis selection. The Intel L-/H-Tile Avalon-ST for PCI Express IP sets the value for this signal based on the indication received from the other end of the link during the Training Sequences (TS). You do not need to change this value. |
txswing | Output | When asserted, indicates full swing for the transmitter voltage. When deasserted indicates half swing. |
txsynchd[1:0] | Output |
For Gen3 operation, specifies the receive block type. The following encodings are defined:
|
txblkst[3:0] | Output | For Gen3 operation, indicates the start of a block in the transmit direction. |
txdataskip | Output |
For Gen3 operation. Allows the MAC to instruct the TX interface to ignore the TX data interface for one clock cycle. The following encodings are defined:
|
rate[1:0] | Output |
The 2‑bit encodings have the following meanings:
|
rxpolarity | Output |
Receive polarity. This signal instructs the PHY layer to invert the polarity of the 8B/10B receiver decoding block. |
currentrxpreset[2:0] | Output | For Gen3 designs, specifies the current preset. |
currentcoeff[17:0] | Output |
For Gen3, specifies the coefficients to be used by the transmitter. The 18 bits specify the following coefficients:
|
rxeqeval |
Output | For Gen3, the PHY asserts this signal when it begins evaluation of the transmitter equalization settings. The PHY asserts Phystatus when it completes the evaluation. The PHY deasserts rxeqeval to abort evaluation. |
rxeqinprogress |
Output | For Gen3, the PHY asserts this signal when it begins link training. The PHY latches the initial coefficients from the link partner. |
invalidreq |
Output | For Gen3, indicates that the Link Evaluation feedback requested a TX equalization setting that is out-of-range. The PHY asserts this signal continually until the next time it asserts rxeqeval. |
rxdata[31:0] | Input | Receive data. This bus receives data on the lane. |
rxdatak[3:0] | Input | Receive data control character indication. Bit 0 corresponds to the lowest-order byte of rxdata, and so on. A value of 0 indicates a data byte. A value of 1 indicates a control byte. For Gen1 and Gen2 only. |
phystatus | Input | PHY status. This signal communicates completion of several PHY requests. |
rxvalid | Input | Receive valid. This signal indicates symbol lock and valid data on rxdata and rxdatak. |
rxstatus[2:0] | Input | Receive status. This signal encodes receive status, including error codes for the receive data stream and receiver detection. |
rxelecidle | Input | Receive electrical idle. When asserted, indicates detection of an electrical idle. |
rxsynchd[3:0] | Input |
For Gen3 operation, specifies the receive block type. The following encodings are defined:
|
rxblkst[3:0] | Input | For Gen3 operation, indicates the start of a block in the receive direction. |
rxdataskip | Input |
For Gen3 operation. Allows the PCS to instruct the RX interface to ignore the RX data interface for one clock cycle. The following encodings are defined:
|
dirfeedback[5:0] |
Input | For Gen3, provides a Figure of Merit for link evaluation for H-Tile transceivers. The feedback applies to the following coefficients: The following feedback encodings are defined: |
simu_mode_pipe | Input | When set to 1, the PIPE interface is in simulation mode. |
sim_pipe_pclk_in | Input |
This clock is used for PIPE simulation only. It is derived from refclk and serves as the PIPE interface clock for PIPE mode simulation. |
sim_pipe_rate[1:0] | Output |
The 2-bit encodings have the following meanings:
|
sim_ltssmstate[5:0] | Output |
LTSSM state: The following encodings are defined:
|
sim_pipe_mask_tx_pll_lock |
Input |
Should be active during rate change. This signal is used to mask the PLL lock signals. This interface is used only for PIPE simulations. In serial simulations, the Endpoint PHY drives this signal. For PIPE simulations in the Intel testbench, the PIPE BFM drives this signal.
|
6.1.21. Test Interface
Signal |
Direction |
Description |
---|---|---|
test_in[66:0] | Input |
Multiplexer select for the test_out[255:0] and aux_test_out[6:0] buses. Driven from channels 8-15. The following encodings are defined:
|
test_out[255:0] | Output |
test_out[255:0] routes to channels 8-15. Includes diagnostic signals from core, adaptor, clock, configuration block, equalization control, miscellaneous, reset, and pipe_adaptor modules. Available only for x16 variants. |
6.1.22. PLL IP Reconfiguration
To ensure proper system operation, reset or repeat device enumeration of the PCIe* link after changing the value of read-only PLL registers.
These signals are present when you turn on Enable Transceiver dynamic reconfiguration on the Configuration, Debug and Extension Options tab using the parameter editor.
Signal |
Direction |
Description |
---|---|---|
xcvr_reconfig_clk |
Input |
Reconfiguration clock. The frequency range for this clock is 100–125 MHz. |
xcvr_reconfig_rst_n |
Input |
Active-low Avalon® -MM reset for this interface. |
xcvr_reconfig_address[14:0] |
Input |
The 15‑bit reconfiguration address.
xcvr_reconfig_read |
Input |
Read signal. This interface is not pipelined. You must wait for the return of the xcvr_reconfig_readdata[31:0] from the current read before starting another read operation. |
xcvr_reconfig_readdata[31:0] |
Output |
32‑bit read data. xcvr_reconfig_readdata[31:0] is valid on the third cycle after the assertion of xcvr_reconfig_read. |
xcvr_reconfig_write |
Input |
Write signal. |
xcvr_reconfig_writedata[31:0] |
Input |
32‑bit write data. |
xcvr_reconfig_waitrequest | Output | When asserted, indicates that the IP core is not ready to respond to a request. |
6.1.23. Message Handling
6.1.23.1. Endpoint Received Messages
Message Type |
Message |
Message Processing |
---|---|---|
Power Management | PME_Turn_Off | Forwarded to the Application Layer on Avalon® -ST RX interface. Also processed by the IP core. |
Slot Power Limit | Set_Slot_Power_Limit | Forwarded to the Application Layer on Avalon® -ST RX interface. |
Vendor Defined with or without Data | Vendor_Type0 |
Forwarded to the Application Layer on the Avalon® -ST RX interface. Not processed by the IP core. You can program the IP core to drop these Messages using virtual_drop_vendor0_msg. When dropped, Vendor0 Messages are logged as Unsupported Requests (UR). |
ATS | ATS_Invalidate | Forwarded to the Application Layer on Avalon® -ST RX interface. Not processed by the IP core. |
Locked Transaction | Unlock Message | Forwarded to the Application Layer on Avalon® -ST RX interface. Not processed by the IP core. |
All Others | --- | Internally dropped by Endpoint and handled as an Unsupported Request. |
6.1.23.2. Endpoint Transmitted Messages
Message Type |
Message |
Message Processing |
---|---|---|
Power Management | PM_PME, PME_TO_Ack |
The Application Layer transmits the PM_PME request via the apps_pm_xmt_pme input. The Application Layer must generate the PME_TO_Ack and transmit it on the Avalon®-ST TX interface. The Application Layer asserts apps_ready_entr_l23 to indicate that it is ready to enter the L23 state. |
Vendor Defined with or without Data | Vendor_Type0 | The Application Layer must generate and transmit this on the Avalon®-ST TX interface. |
ATS | ATS_Request, ATS_Invalidate Completion | The Application Layer must generate and transmit these Messages on the Avalon®-ST TX interface. |
INT | INTx_Assert, INTx_Deassert | The Application Layer transmits the INTx_Assert and INTx_Deassert Messages using the app_int interface. |
ERR | ERR_COR, ERR_NONFATAL, ERR_FATAL | The IP core transmits these error Messages autonomously when it detects internal errors. It also receives and forwards these errors when received from the Application Layer via the app_err_* interface. |
6.2. Errors reported by the Application Layer
The Application Layer reports the following types of errors to the IP core:
- Unexpected Completion
- Completer Abort
- CPL Timeout
Note: The IP core does not contain the completion timeout checking logic. You need to implement this functionality in your application logic.
- Unsupported Request
- Poisoned TLP received
- Uncorrected Internal Error, including ECC and parity errors flagged by the core
- Corrected Internal Error, including Corrected ECC errors flagged by the core
- Advisory NonFatal Error
For Advanced Error Reporting (AER), the Application Layer provides the information to log the TLP header and the error log request via the app_err_* interface.
The Application Layer completes the following steps to report an error to the IP core:
- Sets the corresponding status bits in the PCI Status register, and the PCIe Device Status register
- Sets the appropriate status bits and header log in the AER registers if AER is enabled
- Indicates the Error event to the upstream component:
- Endpoints transmit an error Message upstream
- Root Ports assert app_serr_out to the Application Layer if an error is detected or if an error Message is received from a downstream component. The Root Port also forwards the error Message from the downstream component on the Avalon® -ST RX interface. The Application Layer may choose to ignore this information. (Root Ports are not supported in the Quartus® Prime Pro – Stratix 10 Edition 17.1 Interim Release.)
6.2.1. Error Handling
The IP Core completes the following actions when it detects an error in a received TLP:
- Discards the TLP.
- Generates a Completion (for non-posted requests) with the Completion status set to CA or UR.
- Sets the corresponding status bits in the PCI Status register and the PCIe Device Status register.
- Sets the corresponding status bits and header log in the AER registers if AER is enabled.
- Indicates the Error event to the upstream component.
- For Endpoints, the IP core sends an error Message upstream.
- For Root Ports, the IP core asserts app_serr_out to the Application Layer when it detects an error or receives an error Message from a downstream component. Note: The error Message from the downstream component is also forwarded on the Avalon® -ST RX interface. The Application Layer may choose to ignore this information.
6.3. Power Management
Software programs the Device into a D-state by writing to the Power Management Control and Status register in the PCI Power Management Capability Structure. The pm_* interface transmits the D-state to the Application Layer.
The Intel L-/H-Tile Avalon-ST for PCI Express IP and the Intel L-/H-Tile Avalon-MM for PCI Express IP do not support the L1 or L2 low power states. If the link ever gets into these states, performing a reset (by asserting pin_perst, for example) will allow the IP core to exit the low power state and the system to recover.
These IP cores also do not support the in-band beacon or sideband WAKE# signal, which are mechanisms to signal a wake-up event to the upstream device.
6.3.1. Endpoint D3 Entry
All transmission on the Avalon® -ST TX and RX interfaces must have completed before the IP core can begin the L1 request (Enter_L1 DLLP). In addition, the RX Buffer must be empty and the Application Layer app_xfer_pending output must be deasserted.
- Software writes the Power Management Control register to put the IP core into the D3hot state.
- The Endpoint stops transmitting requests when it has been taken out of D0.
- The link transitions to L1.
- Software sends the PME_Turn_Off Message to the Endpoint to initiate power down. The Root Port transitions the link back to L0, and Endpoint receives the Message on the Avalon® -ST RX interface.
- The Endpoint transmits a PME_TO_Ack Message to acknowledge the Turn Off request.
- When ready for power removal (D3cold), the Endpoint asserts apps_ready_entr_l23. The core sends the PM_Enter_L23 DLLP and initiates the link transition to L3.
6.3.2. Endpoint D3 Exit
6.3.3. Exit from D3hot
6.3.4. Exit from D3cold
- To issue a PM_EVENT Message from the D3cold state, the device must first issue a wakeup event (WAKE#) to request reapplication of power and the clock. The wakeup event triggers a fundamental reset which reinitializes the link to L0.
- The Application Layer requests a wake-up event by asserting apps_pm_xmt_pme.
Asserting apps_pm_xmt_pme causes the IP core to transmit a PM_EVENT Message. In addition, the IP core sets the PME_status bit in the Power Management Control and Status register to notify software that it has requested the wakeup.
The PCIe Link states are indicated on the pm_* interface. The LTSSM state is indicated on the ltssm_state output.
6.3.5. Active State Power Management
6.4. Transaction Ordering
6.4.1. TX TLP Ordering
The IP core transmits TLPs from the following sources:
- TLPs from the Avalon®-ST TX interface
- MSI and MSI-X interrupts
- Internal Configuration Space TLPs
6.4.2. RX TLP Ordering
The IP core implements relaxed ordering as described in the PCI Express Base Specification Revision 3.0. It does not perform ID-Based Ordering (IDO). The Application Layer can implement IDO reordering. It is possible for two different TLP types pending in the RX buffer to have equal priority. When this situation occurs, the IP core uses a fairness-based arbitration scheme to determine which TLP to forward to the Application Layer.
6.5. RX Buffer
The RX buffer performs the following functions:
- The IP core can rate match the PCIe link to the Application Layer.
- The IP core can store TLPs until error checking is complete.
| RX Buffer Segment | Number of Credits | Buffer Size |
|---|---|---|
| Posted | Posted headers: 127 credits; Posted data: 750 credits | ~14 KB |
| Non-posted | Non-posted headers: 115 credits; Non-posted data: 230 credits | ~5.5 KB |
| Completions | Completion headers: 770 credits; Completion data: 2500 credits | ~50 KB |
The RX buffer operates only in the Store and Forward Queue Mode. Bypass and Cut-through modes are not supported.
Flow control credit checking for the posted and non-posted buffer segments prevents RX buffer overflow. The PCI Express Base Specification Revision 3.0 requires the IP core to advertise infinite Completion credits. The Application Layer must manage the Read Requests so as not to overflow the Completion buffer segment.
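Because the IP core advertises infinite Completion credits, the Application Layer itself must bound its outstanding read requests. The following is a minimal sketch of one way to do that accounting, assuming a hypothetical read-request issue/retire interface: rd_req_issue, cpl_retire, and the byte-count signals are illustrative names, not IP core signals.

```verilog
// Track outstanding read-request bytes against the ~50 KB Completion segment.
module cpl_space_tracker #(
  parameter CPL_BUF_BYTES = 50*1024     // approximate Completion segment size
) (
  input  wire        clk,
  input  wire        rst_n,
  input  wire        rd_req_issue,      // a read request is sent this cycle (assumed)
  input  wire [12:0] rd_req_bytes,      // its requested length in bytes (assumed)
  input  wire        cpl_retire,        // completion data drained this cycle (assumed)
  input  wire [12:0] cpl_retire_bytes,  // bytes drained from the RX buffer (assumed)
  output wire        rd_req_allowed     // safe to issue another maximum-size read
);
  reg [31:0] outstanding_bytes;
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      outstanding_bytes <= 32'd0;
    else
      outstanding_bytes <= outstanding_bytes
                         + (rd_req_issue ? rd_req_bytes     : 13'd0)
                         - (cpl_retire   ? cpl_retire_bytes : 13'd0);
  end
  // Leave headroom for one more maximum-size read request (4 KB here).
  assign rd_req_allowed = (outstanding_bytes + 13'd4096) <= CPL_BUF_BYTES;
endmodule
```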
6.5.1. Retry Buffer
Retry buffer resources are only freed upon reception of an ACK DLLP.
6.5.2. Configuration Retry Status
7. Interrupts
7.1. Interrupts for Endpoints
The Intel L-/H-Tile Avalon-ST for PCI Express IP provides support for PCI Express MSI, MSI-X, and legacy interrupts when configured in Endpoint mode. The MSI and legacy interrupts are mutually exclusive. After power up, the Hard IP block starts in legacy interrupt mode. Then, software decides whether to switch to MSI or MSI-X mode. To switch to MSI mode, software programs the msi_enable bit of the MSI Message Control register (bit [16] at offset 0x050) to 1. To enable MSI-X mode, turn on Implement MSI-X under the PCI Express/PCI Capabilities tab in the parameter editor. If you turn on the Implement MSI-X option, you should implement the MSI-X table structures in the memory space pointed to by the BARs.
Refer to section 6.1 of PCI Express Base Specification for a general description of PCI Express interrupt support for Endpoints.
7.1.1. MSI and Legacy Interrupts
The IP core generates MSI interrupts using the following information:
- The MSI Capability registers
- The traffic class (app_msi_tc)
- The message data specified by app_msi_num
The following figure illustrates a possible implementation of the Interrupt Handler Module with a per-vector enable bit. Alternatively, the Application Layer could implement a single global interrupt enable instead of per-vector enables.
There are 32 possible MSI messages. The number of messages requested by a particular component does not necessarily correspond to the number of messages allocated. For example, in the following figure, the Endpoint requests eight MSIs but is allocated only two. In this case, you must design the Application Layer to use only the two allocated messages, as in the sketch below.
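The sketch below shows one possible per-vector Interrupt Handler Module for the two-message case. The msi_in and msi_enable signals are assumed application-side signals; app_msi_req, app_msi_num, and app_msi_ack are the IP core signals named in this chapter.

```verilog
// Per-vector MSI handler for two allocated messages.
module msi_handler (
  input  wire       clk,
  input  wire       rst_n,
  input  wire [1:0] msi_in,       // interrupt sources (assumed)
  input  wire [1:0] msi_enable,   // per-vector enable bits (assumed)
  input  wire       app_msi_ack,
  output reg        app_msi_req,
  output reg  [4:0] app_msi_num
);
  wire [1:0] pending = msi_in & msi_enable;
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      app_msi_req <= 1'b0;
      app_msi_num <= 5'd0;
    end else if (!app_msi_req && |pending) begin
      // Request an MSI for the lowest-numbered pending, enabled vector.
      app_msi_req <= 1'b1;
      app_msi_num <= pending[0] ? 5'd0 : 5'd1;
    end else if (app_msi_req && app_msi_ack) begin
      // Earliest legal deassertion is the cycle after app_msi_ack.
      app_msi_req <= 1'b0;
    end
  end
endmodule
```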
The following table describes three example implementations. The first example allocates all 32 MSI messages. The second and third examples only allocate 4 interrupts.
| MSI Message Usage | Allocated: 32 | Allocated: 4 | Allocated: 4 |
|---|---|---|---|
| System Error | 31 | 3 | 3 |
| Hot Plug and Power Management Event | 30 | 2 | 3 |
| Application Layer | 29:0 | 1:0 | 2:0 |
MSI interrupts generated for Hot Plug, Power Management Events, and System Errors always use Traffic Class 0. MSI interrupts generated by the Application Layer can use any Traffic Class. For example, a DMA that generates an MSI at the end of a transmission can use the same traffic class as was used to transfer data.
The following figure illustrates the interactions among MSI interrupt signals for the Root Port. The minimum latency possible between app_msi_req and app_msi_ack is one clock cycle. In this timing diagram app_msi_req can extend beyond app_msi_ack before deasserting. In other words, the earliest that app_msi_req can deassert is on the rising edge of clock cycle 5 (one cycle after app_msi_ack is asserted) as shown, but it can deassert in later clock cycles as well.
7.1.2. MSI-X
You can enable MSI-X interrupts by turning on Implement MSI-X under the PCI Express/PCI Capabilities heading using the parameter editor. If you turn on the Implement MSI-X option, you should implement the MSI-X table structures at the memory space pointed to by the BARs as part of your Application Layer.
The Application Layer transmits MSI-X interrupts on the Avalon®-ST TX interface. MSI-X interrupts are single-dword Memory Write TLPs. Consequently, the Last DW Byte Enable field in the TLP header must be set to 4'b0000. MSI-X TLPs should be sent only when enabled by the MSI-X Enable and Function Mask bits in the Message Control register of the MSI-X Capability Structure. These bits are available on the tl_cfg_ctl output bus.
7.1.3. Implementing MSI-X Interrupts
1. Host software sets up the MSI-X interrupts in the Application Layer by completing the following steps:
   a. Host software reads the Message Control register at offset 0x050 to determine the MSI-X Table size. The number of table entries is <value read> + 1. The maximum table size is 2048 entries. Each 16-byte entry is divided into 4 fields, as shown in the figure below. For multi-function variants, BAR4 accesses the MSI-X table. For all other variants, any BAR can access the MSI-X table. The base address of the MSI-X table must be aligned to a 4 KB boundary.
   b. The host sets up the MSI-X table. It programs the MSI-X address, data, and mask bits for each entry, as shown in the figure below.
      Figure 57. Format of MSI-X Table
   c. The host calculates the address of the <n>th entry using the following formula:
      nth_address = base address[BAR] + 16<n>
2. When the Application Layer has an interrupt, it drives an interrupt request to the IRQ Source module.
3. The IRQ Source sets the appropriate bit in the MSI-X PBA table. The PBA can use qword or dword accesses. For qword accesses, the IRQ Source calculates the address and position of the <m>th bit using the following formulas (see the sketch after these steps):
   qword address = <PBA base addr> + 8(floor(<m>/64))
   qword bit = <m> mod 64
   Figure 58. MSI-X PBA Table
4. The IRQ Processor reads the entry in the MSI-X table.
   - If the interrupt is masked by the Vector_Control field of the MSI-X table, the interrupt remains in the pending state.
   - If the interrupt is not masked, the IRQ Processor sends a Memory Write Request to the TX slave interface. It uses the address and data from the MSI-X table. If the Message Upper Address = 0, the IRQ Processor creates a three-dword header. If the Message Upper Address > 0, it creates a four-dword header.
5. The host interrupt service routine detects the TLP as an interrupt and services it.
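The address arithmetic from steps 1c and 3 can be written as SystemVerilog functions (declared inside a module or package) for illustration. The msix_table_base and pba_base inputs are assumed to already hold the BAR base plus the table or PBA offset.

```verilog
// nth_address = base + 16*n: byte address of MSI-X table entry n.
function automatic [63:0] msix_entry_addr (
  input [63:0] msix_table_base,   // BAR base + MSI-X Table Offset (assumed)
  input [10:0] n                  // vector number; up to 2048 entries
);
  msix_entry_addr = msix_table_base + (n * 16);
endfunction

// qword address = PBA base + 8*floor(m/64): qword holding pending bit m.
function automatic [63:0] pba_qword_addr (
  input [63:0] pba_base,          // BAR base + PBA Offset (assumed)
  input [10:0] m                  // vector number
);
  pba_qword_addr = pba_base + (8 * (m / 64));
endfunction

// qword bit = m mod 64: bit position of vector m within that qword.
function automatic [5:0] pba_qword_bit (input [10:0] m);
  pba_qword_bit = m % 64;
endfunction
```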
7.1.4. Legacy Interrupts
Legacy interrupts mimic the original PCI level-sensitive interrupts using virtual wire messages. The Intel® Stratix® 10 Hard IP signals legacy interrupts on the PCIe link using Message TLPs. The term INTx refers collectively to the four legacy interrupts: INTA#, INTB#, INTC#, and INTD#. Asserting app_int_sts causes an Assert_INTx Message TLP to be generated and sent upstream; deasserting app_int_sts causes a Deassert_INTx Message TLP to be generated and sent upstream. To use legacy interrupts, you must clear the Interrupt Disable bit, which is bit 10 of the Command register, and turn off the MSI Enable bit.
The following figure illustrates interrupt timing for the legacy interface. The legacy interrupt handler asserts app_int_sts to instruct the Intel L-/H-Tile Avalon-ST for PCI Express IP to send an Assert_INTx Message TLP.
The following figure illustrates the timing for deassertion of legacy interrupts. The legacy interrupt handler deasserts app_int_sts, causing the Intel L-/H-Tile Avalon-ST for PCI Express IP to send a Deassert_INTx message.
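As a simple illustration, the fragment below mirrors a level-sensitive interrupt condition onto app_int_sts. The int_source signal is an assumed application-side level, not part of the IP core interface.

```verilog
// Level-sensitive legacy interrupt source driving app_int_sts: asserting it
// produces an Assert_INTx Message TLP, deasserting it a Deassert_INTx.
module legacy_int_driver (
  input  wire clk,
  input  wire rst_n,
  input  wire int_source,    // level-sensitive interrupt condition (assumed)
  output reg  app_int_sts
);
  always @(posedge clk or negedge rst_n) begin
    if (!rst_n)
      app_int_sts <= 1'b0;
    else
      app_int_sts <= int_source;  // mirror the level onto app_int_sts
  end
endmodule
```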
8. Registers
8.1. Configuration Space Registers
The following table shows the correspondence between Configuration Space registers and the PCIe Specification.

| Byte Address | Configuration Space Register | Corresponding Section in PCIe Specification |
|---|---|---|
| 0x000-0x03C | PCI Header Type 0 Configuration Registers | Type 0 Configuration Space Header |
| 0x040-0x04C | Power Management | PCI Power Management Capability Structure |
| 0x050-0x05C | MSI Capability Structure | MSI Capability Structure; see also the PCI Local Bus Specification |
| 0x060-0x06C | Reserved | N/A |
| 0x070-0x0A8 | PCI Express Capability Structure | PCI Express Capability Structure |
| 0x0B0-0x0B8 | MSI-X Capability Structure | MSI-X Capability Structure; see also the PCI Local Bus Specification |
| 0x0BC-0x0FC | Reserved | N/A |
| 0x100-0x134 | Advanced Error Reporting (AER) (for PFs only) | Advanced Error Reporting Capability |
| 0x138-0x174 | Virtual Channel Capability Structure (Reserved) | Virtual Channel Capability |
| 0x178-0x17C | Alternative Routing-ID Implementation (ARI). Always on for SR-IOV | ARI Capability |
| 0x188-0x1B0 | Secondary PCI Express Extended Capability Header | PCI Express Extended Capability |
| 0x1B4 | Reserved | N/A |
| 0x1B8-0x1F4 | SR-IOV Capability Structure | SR-IOV Extended Capability Header in Single Root I/O Virtualization and Sharing Specification, Rev. 1.1 |
| 0x1F8-0x1D0 | Transaction Processing Hints (TPH) Requester Capability | TLP Processing Hints (TPH) |
| 0x1D4-0x280 | Reserved | N/A |
| 0x284-0x288 | Address Translation Services (ATS) Capability Structure | Address Translation Services Extended Capability (ATS) in Single Root I/O Virtualization and Sharing Specification, Rev. 1.1 |
| 0xB80-0xBFC | Intel-Specific | Vendor-Specific Header (Header only) |
| 0xC00 | Optional Custom Extensions | N/A |
The following table describes the registers in the Hard IP Configuration Space.

| Byte Address | Hard IP Configuration Space Register | Corresponding Section in PCIe Specification |
|---|---|---|
| 0x000 | Device ID, Vendor ID | Type 0 Configuration Space Header |
| 0x004 | Status, Command | Type 0 Configuration Space Header |
| 0x008 | Class Code, Revision ID | Type 0 Configuration Space Header |
| 0x00C | Header Type, Cache Line Size | Type 0 Configuration Space Header |
| 0x010 | Base Address 0 | Base Address Registers |
| 0x014 | Base Address 1 | Base Address Registers |
| 0x018 | Base Address 2 | Base Address Registers |
| 0x01C | Base Address 3 | Base Address Registers |
| 0x020 | Base Address 4 | Base Address Registers |
| 0x024 | Base Address 5 | Base Address Registers |
| 0x028 | Reserved | N/A |
| 0x02C | Subsystem ID, Subsystem Vendor ID | Type 0 Configuration Space Header |
| 0x030 | Reserved | N/A |
| 0x034 | Capabilities Pointer | Type 0 Configuration Space Header |
| 0x038 | Reserved | N/A |
| 0x03C | Interrupt Pin, Interrupt Line | Type 0 Configuration Space Header |
| 0x040 | PME_Support, D1, D2, etc. | PCI Power Management Capability Structure |
| 0x044 | PME_en, PME_Status, etc. | Power Management Status and Control Register |
| 0x050 | MSI Message Control, Next Cap Ptr, Capability ID | MSI and MSI-X Capability Structures |
| 0x054 | Message Address | MSI and MSI-X Capability Structures |
| 0x058 | Message Upper Address | MSI and MSI-X Capability Structures |
| 0x05C | Reserved, Message Data | MSI and MSI-X Capability Structures |
| 0x0B0 | MSI-X Message Control, Next Cap Ptr, Capability ID | MSI and MSI-X Capability Structures |
| 0x0B4 | MSI-X Table Offset, BIR | MSI and MSI-X Capability Structures |
| 0x0B8 | Pending Bit Array (PBA) Offset, BIR | MSI and MSI-X Capability Structures |
| 0x100 | PCI Express Enhanced Capability Header | Advanced Error Reporting Enhanced Capability Header |
| 0x104 | Uncorrectable Error Status Register | Uncorrectable Error Status Register |
| 0x108 | Uncorrectable Error Mask Register | Uncorrectable Error Mask Register |
| 0x10C | Uncorrectable Error Severity Register | Uncorrectable Error Severity Register |
| 0x110 | Correctable Error Status Register | Correctable Error Status Register |
| 0x114 | Correctable Error Mask Register | Correctable Error Mask Register |
| 0x118 | Advanced Error Capabilities and Control Register | Advanced Error Capabilities and Control Register |
| 0x11C | Header Log Register | Header Log Register |
| 0x12C | Root Error Command | Root Error Command Register |
| 0x130 | Root Error Status | Root Error Status Register |
| 0x134 | Error Source Identification Register, Correctable Error Source ID Register | Error Source Identification Register |
| 0x188 | Next Capability Offset, PCI Express Extended Capability ID | Secondary PCI Express Extended Capability |
| 0x18C | Enable SKP OS, Link Equalization Req, Perform Equalization | Link Control 3 Register |
| 0x190 | Lane Error Status Register | Lane Error Status Register |
| 0x194:0x1B0 | Lane Equalization Control Register | Lane Equalization Control Register |
| 0xB80 | VSEC Capability Header | Vendor-Specific Extended Capability Header |
| 0xB84 | VSEC Length, Revision, ID | Vendor-Specific Header |
| 0xB88 | Intel Marker | Intel-Specific Registers |
| 0xB8C | JTAG Silicon ID DW0 | Intel-Specific Registers |
| 0xB90 | JTAG Silicon ID DW1 | Intel-Specific Registers |
| 0xB94 | JTAG Silicon ID DW2 | Intel-Specific Registers |
| 0xB98 | JTAG Silicon ID DW3 | Intel-Specific Registers |
| 0xB9C | User Device and Board Type ID | Intel-Specific Registers |
| 0xBA0:0xBAC | Reserved | N/A |
| 0xBB0 | General Purpose Control and Status Register | Intel-Specific Registers |
| 0xBB4 | Uncorrectable Internal Error Status Register | Intel-Specific Registers |
| 0xBB8 | Uncorrectable Internal Error Mask Register | Intel-Specific Registers |
| 0xBBC | Correctable Error Status Register | Intel-Specific Registers |
| 0xBC0 | Correctable Error Mask Register | Intel-Specific Registers |
| 0xBC4:0xBD8 | Reserved | N/A |
| 0xC00 | Optional Custom Extensions | N/A |
8.1.1. Register Access Definitions
| Abbreviation | Meaning |
|---|---|
| RW | Read and write access |
| RO | Read only |
| WO | Write only |
| RW1C | Read write 1 to clear |
| RW1CS | Read write 1 to clear, sticky |
| RWS | Read write, sticky |
8.1.2. PCI Configuration Header Registers
The Correspondence between Configuration Space Registers and the PCIe Specification lists the appropriate section of the PCI Express Base Specification that describes these registers.
8.1.3. PCI Express Capability Structures
8.1.4. Intel Defined VSEC Capability Header
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:20] | Next Capability Pointer: Value is the starting address of the next Capability Structure implemented; otherwise, NULL. | Variable | RO |
| [19:16] | Version. PCIe specification-defined value for the VSEC version. | 1 | RO |
| [15:0] | PCI Express Extended Capability ID. PCIe specification-defined value for the VSEC Capability ID. | 0x000B | RO |
8.1.4.1. Intel Defined Vendor Specific Header
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:20] | VSEC Length. Total length of this structure in bytes. | 0x5C | RO |
| [19:16] | VSEC Revision. User-configurable VSEC revision. | Not available | RO |
| [15:0] | VSEC ID. User-configurable VSEC ID. You should change this ID to your Vendor ID. | 0x1172 | RO |
8.1.4.2. Intel Marker
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:0] | Intel Marker. An additional marker that allows standard Intel programming software to verify that this is the correct structure. | 0x41721172 | RO |
8.1.4.3. JTAG Silicon ID
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:0] | JTAG Silicon ID DW3 | Unique ID | RO |
| [31:0] | JTAG Silicon ID DW2 | Unique ID | RO |
| [31:0] | JTAG Silicon ID DW1 | Unique ID | RO |
| [31:0] | JTAG Silicon ID DW0 | Unique ID | RO |
8.1.4.4. User Configurable Device and Board ID
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [15:0] | Allows you to specify the ID of the .sof file to be loaded. | From configuration bits | RO |
8.1.5. General Purpose Control and Status Register
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:16] | Reserved. | N/A | RO |
| [15:8] | General Purpose Status. The Application Layer can read these status bits. | 0 | RO |
| [7:0] | General Purpose Control. The Application Layer can write these control bits. | 0 | RW |
8.1.6. Uncorrectable Internal Error Status Register
| Bits | Register Description | Reset Value | Access |
|---|---|---|---|
| [31:13] | Reserved. | 0 | RO |
| [12] | Debug bus interface (DBI) access error status. | 0 | RW1CS |
| [11] | ECC error from Config RAM block. | 0 | RW1CS |
| [10] | Uncorrectable ECC error status for Retry Buffer. | 0 | RO |
| [9] | Uncorrectable ECC error status for Retry Start of TLP RAM. | 0 | RW1CS |
| [8] | RX Transaction Layer parity error reported by the IP core. | 0 | RW1CS |
| [7] | TX Transaction Layer parity error reported by the IP core. | 0 | RW1CS |
| [6] | Internal error reported by the FPGA. | 0 | RW1CS |
| [5:4] | Reserved. | 0 | RW1CS |
| [3] | Uncorrectable ECC error status for RX Buffer Header #2 RAM. | 0 | RW1CS |
| [2] | Uncorrectable ECC error status for RX Buffer Header #1 RAM. | 0 | RW1CS |
| [1] | Uncorrectable ECC error status for RX Buffer Data RAM #2. | 0 | RW1CS |
| [0] | Uncorrectable ECC error status for RX Buffer Data RAM #1. | 0 | RW1CS |
8.1.7. Uncorrectable Internal Error Mask Register
| Bits | Register Description | Reset Value | Access |
|---|---|---|---|
| [31:13] | Reserved. | 1'b0 | RO |
| [12] | Mask for Debug Bus Interface. | 1'b1 | RO |
| [11] | Mask for ECC error from Config RAM block. | 1'b1 | RWS |
| [10] | Mask for Uncorrectable ECC error status for Retry Buffer. | 1'b1 | RO |
| [9] | Mask for Uncorrectable ECC error status for Retry Start of TLP RAM. | 1'b1 | RWS |
| [8] | Mask for RX Transaction Layer parity error reported by the IP core. | 1'b1 | RWS |
| [7] | Mask for TX Transaction Layer parity error reported by the IP core. | 1'b1 | RWS |
| [6] | Mask for Uncorrectable Internal error reported by the FPGA. | 1'b1 | RO |
| [5] | Reserved. | 1'b0 | RWS |
| [4] | Reserved. | 1'b1 | RWS |
| [3] | Mask for Uncorrectable ECC error status for RX Buffer Header #2 RAM. | 1'b1 | RWS |
| [2] | Mask for Uncorrectable ECC error status for RX Buffer Header #1 RAM. | 1'b1 | RWS |
| [1] | Mask for Uncorrectable ECC error status for RX Buffer Data RAM #2. | 1'b1 | RWS |
| [0] | Mask for Uncorrectable ECC error status for RX Buffer Data RAM #1. | 1'b1 | RWS |
8.1.8. Correctable Internal Error Status Register
| Bits | Register Description | Reset Value | Access |
|---|---|---|---|
| [31:12] | Reserved. | 0 | RO |
| [11] | Correctable ECC error status for Config RAM. | 0 | RW1CS |
| [10] | Correctable ECC error status for Retry Buffer. | 0 | RW1CS |
| [9] | Correctable ECC error status for Retry Start of TLP RAM. | 0 | RW1CS |
| [8] | Reserved. | 0 | RO |
| [7] | Reserved. | 0 | RO |
| [6] | Internal error reported by the FPGA. | 0 | RW1CS |
| [5] | Reserved. | 0 | RO |
| [4] | PHY Gen3 SKP error occurred. A Gen3 data pattern that contains the SKP pattern (8'b10101010) was misinterpreted as a SKP OS, causing erroneous block realignment in the PHY. | 0 | RW1CS |
| [3] | Correctable ECC error status for RX Buffer Header RAM #2. | 0 | RW1CS |
| [2] | Correctable ECC error status for RX Buffer Header RAM #1. | 0 | RW1CS |
| [1] | Correctable ECC error status for RX Buffer Data RAM #2. | 0 | RW1CS |
| [0] | Correctable ECC error status for RX Buffer Data RAM #1. | 0 | RW1CS |
8.1.9. Correctable Internal Error Mask Register
| Bits | Register Description | Reset Value | Access |
|---|---|---|---|
| [31:12] | Reserved. | 0 | RO |
| [11] | Mask for correctable ECC error status for Config RAM. | 0 | RWS |
| [10] | Mask for correctable ECC error status for Retry Buffer. | 1 | RWS |
| [9] | Mask for correctable ECC error status for Retry Start of TLP RAM. | 1 | RWS |
| [8] | Reserved. | 0 | RO |
| [7] | Reserved. | 0 | RO |
| [6] | Mask for internal error reported by the FPGA. | 0 | RWS |
| [5] | Reserved. | 0 | RO |
| [4] | Mask for PHY Gen3 SKP error. | 1 | RWS |
| [3] | Mask for correctable ECC error status for RX Buffer Header RAM #2. | 1 | RWS |
| [2] | Mask for correctable ECC error status for RX Buffer Header RAM #1. | 1 | RWS |
| [1] | Mask for correctable ECC error status for RX Buffer Data RAM #2. | 1 | RWS |
| [0] | Mask for correctable ECC error status for RX Buffer Data RAM #1. | 1 | RWS |
8.1.10. SR-IOV Virtualization Extended Capabilities Registers Address Map
| Byte Address Offset | Name | Description |
|---|---|---|
| Alternative RID (ARI) Capability Structure | | |
| 0x178 | ARI Enhanced Capability Header | PCI Express Extended Capability ID for ARI and next capability pointer. |
| 0x17C | ARI Capability Register, ARI Control Register | The lower 16 bits implement the ARI Capability Register and the upper 16 bits implement the ARI Control Register. |
| Single-Root I/O Virtualization (SR-IOV) Capability Structure | | |
| 0x1B8 | SR-IOV Extended Capability Header | PCI Express Extended Capability ID for SR-IOV and next capability pointer. |
| 0x1BC | SR-IOV Capabilities Register | Lists supported capabilities of the SR-IOV implementation. |
| 0x1C0 | SR-IOV Control and Status Registers | The lower 16 bits implement the SR-IOV Control Register. The upper 16 bits implement the SR-IOV Status Register. |
| 0x1C4 | InitialVFs/TotalVFs | The lower 16 bits specify the initial number of VFs attached to PF0. The upper 16 bits specify the total number of VFs available for attaching to PF0. |
| 0x1C8 | Function Dependency Link, NumVFs | The Function Dependency field describes dependencies between Physical Functions. The NumVFs field contains the number of VFs currently configured for use. |
| 0x1CC | VF Offset/Stride | Specifies the offset and stride values used to assign routing IDs to the VFs. |
| 0x1D0 | VF Device ID | Specifies the VF Device ID assigned to the device. |
| 0x1D4 | Supported Page Sizes | Specifies all page sizes supported by the device. |
| 0x1D8 | System Page Size | Stores the page size currently selected. |
| 0x1DC | VF BAR 0 | VF Base Address Register 0. Can be used independently as a 32-bit BAR, or combined with VF BAR 1 to form a 64-bit BAR. |
| 0x1E0 | VF BAR 1 | VF Base Address Register 1. Can be used independently as a 32-bit BAR, or combined with VF BAR 0 to form a 64-bit BAR. |
| 0x1E4 | VF BAR 2 | VF Base Address Register 2. Can be used independently as a 32-bit BAR, or combined with VF BAR 3 to form a 64-bit BAR. |
| 0x1E8 | VF BAR 3 | VF Base Address Register 3. Can be used independently as a 32-bit BAR, or combined with VF BAR 2 to form a 64-bit BAR. |
| 0x1EC | VF BAR 4 | VF Base Address Register 4. Can be used independently as a 32-bit BAR, or combined with VF BAR 5 to form a 64-bit BAR. |
| 0x1F0 | VF BAR 5 | VF Base Address Register 5. Can be used independently as a 32-bit BAR, or combined with VF BAR 4 to form a 64-bit BAR. |
| 0x1F4 | VF Migration State Array Offset | Not implemented. |
| Secondary PCI Express Extended Capability Structure (Gen3, PF0 only) | | |
| 0x280 | Secondary PCI Express Extended Capability Header | PCI Express Extended Capability ID for the Secondary PCI Express Capability, and next capability pointer. |
| 0x284 | Link Control 3 Register | Not implemented. |
| 0x288 | Lane Error Status Register | Per-lane error status bits. |
| 0x28C | Lane Equalization Control Register 0 | Transmitter Preset and Receiver Preset Hint values for Lanes 0 and 1 of the remote device. These values are captured during Link Equalization. |
| 0x290 | Lane Equalization Control Register 1 | Transmitter Preset and Receiver Preset Hint values for Lanes 2 and 3 of the remote device. These values are captured during Link Equalization. |
| 0x294 | Lane Equalization Control Register 2 | Transmitter Preset and Receiver Preset Hint values for Lanes 4 and 5 of the remote device. These values are captured during Link Equalization. |
| 0x298 | Lane Equalization Control Register 3 | Transmitter Preset and Receiver Preset Hint values for Lanes 6 and 7 of the remote device. These values are captured during Link Equalization. |
| Transaction Processing Hints (TPH) Requester Capability Structure | | |
| 0x1F8 | TPH Requester Extended Capability Header | PCI Express Extended Capability ID for the TPH Requester Capability, and next capability pointer. |
| 0x1FC | TPH Requester Capability Register | This register contains the advertised parameters for the TPH Requester Capability. |
| 0x1D0 | TPH Requester Control Register | This register contains enable and mode select bits for the TPH Requester Capability. |
| Address Translation Services (ATS) Capability Structure | | |
| 0x284 | ATS Extended Capability Header | PCI Express Extended Capability ID for the ATS Capability, and next capability pointer. |
| 0x288 | ATS Capability Register and ATS Control Register | This location contains the 16-bit ATS Capability Register and the 16-bit ATS Control Register. |
8.1.10.1. ARI Enhanced Capability Header
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [15:0] | PCI Express Extended Capability ID for ARI. | 0x000E | RO |
| [19:16] | Capability Version. | 0x1 | RO |
| [31:20] | Next Capability Pointer. | See description | RO |

The ARI Capability and Control Registers at offset 0x17C contain the following fields:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [0] | Specifies support for arbitration at the Function group level. Not implemented. | 0 | RO |
| [7:1] | Reserved. | 0 | RO |
| [15:8] | ARI Next Function Pointer. Pointer to the next PF. | 1 | RO |
| [31:16] | Reserved. | 0 | RO |
8.1.10.2. SR-IOV Enhanced Capability Registers
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [15:0] | PCI Express Extended Capability ID | 0x0010 | RO |
| [19:16] | Capability Version | 1 | RO |
| [31:20] | Next Capability Pointer: The value depends on the data rate. If the number of VFs attached to this PF is non-zero, this pointer points to the SR-IOV Extended Capability at 0x200. Otherwise, its value is set in Platform Designer. | Set in Platform Designer | RO |

The SR-IOV Capabilities Register contains the following fields:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [0] | VF Migration Capable | 0 | RO |
| [1] | ARI Capable Hierarchy Preserved | 1 for the lowest-numbered PF with SR-IOV Capability; 0 for other PFs | RO |
| [31:2] | Reserved | 0 | RO |

The SR-IOV Control and Status Registers contain the following fields:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [0] | VF Enable | 0 | RW |
| [1] | VF Migration Enable. Not implemented. | 0 | RO |
| [2] | VF Migration Interrupt Enable. Not implemented. | 0 | RO |
| [3] | VF Memory Space Enable | 0 | RW |
| [4] | ARI Capable Hierarchy | 0 | RW for the lowest-numbered PF with SR-IOV Capability; RO for other PFs |
| [15:5] | Reserved | 0 | RO |
| [31:16] | SR-IOV Status Register. Not implemented. | 0 | RO |
8.1.10.3. Initial VFs and Total VFs Registers
| Bits | Description | Default Value | Access |
|---|---|---|---|
| [15:0] | Initial VFs. Specifies the initial number of VFs configured for this PF. | Same value as TotalVFs | RO |
| [31:16] | Total VFs. Specifies the total number of VFs attached to this PF. | Set in Platform Designer | RO |

The Function Dependency Link and NumVFs register contains the following fields:

| Bit Location | Description | Default Value | Access |
|---|---|---|---|
| [15:0] | NumVFs. Specifies the number of VFs enabled for this PF. Writable only when the VF Enable bit in the SR-IOV Control Register is 0. | 0 | RW |
| [31:16] | Function Dependency Link | 0 | RO |

The VF Offset/Stride register contains the following field:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:16] | VF Stride | 1 | RO |
8.1.10.4. VF Device ID Register
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [15:0] | Reserved | 0 | RO |
| [31:16] | VF Device ID | Set in Platform Designer | RO |
8.1.10.5. Page Size Registers
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:0] | Supported Page Sizes. Specifies the page sizes supported by the device. | Set in Platform Designer | RO |

The System Page Size register contains the following field:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:0] | System Page Size. Specifies the page size currently in use. | Set in Platform Designer | RO |
8.1.10.6. VF Base Address Registers (BARs) 0-5
Each PF implements six VF BARs. You can specify the BAR settings in Platform Designer. You can configure the VF BARs as 32-bit memories, or combine VF BAR 0 and VF BAR 1 to form a 64-bit memory BAR. VF BAR 0 may also be designated as prefetchable or non-prefetchable in Platform Designer. Finally, the address range of VF BAR 0 can be configured as any power of 2 between 128 bytes and 2 GB.
The contents of VF BAR 0 are described below:
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [0] | Memory Space Indicator: Hardwired to 0 to indicate that the BAR defines a memory address range. | 0 | RO |
| [1] | Reserved. Hardwired to 0. | 0 | RO |
| [2] | Specifies the BAR type. The following encodings are defined: 0: 32-bit BAR; 1: 64-bit BAR, combined with the adjacent VF BAR. | 0 | RO |
| [3] | When 1, indicates that the data within the address range defined by this BAR is prefetchable. When 0, indicates that the data is not prefetchable. Data is prefetchable if reading is guaranteed not to have side effects. | Prefetchable: 1; Non-prefetchable: 0 | RO |
| [7:4] | Reserved. Hardwired to 0. | 0 | RO |
| [31:8] | Base address of the BAR. The number of writable bits depends on the BAR size. For example, if the BAR size is 64 KB, bits [15:8] are hardwired to 0 and bits [31:16] can be read and written. | 0 | See description |
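The writable-bits rule in the [31:8] row above implies the usual BAR sizing arithmetic: software writes all 1's to the BAR, reads it back, and inverts the resulting address mask. The following hypothetical Verilog helper (declared inside a module or package; not part of the IP core) illustrates the computation for a 32-bit memory BAR.

```verilog
// Compute a 32-bit memory BAR's size from the value read back after
// software writes 32'hFFFF_FFFF to it.
function automatic [31:0] bar_size_bytes (input [31:0] readback);
  reg [31:0] addr_mask;
  begin
    addr_mask      = readback & 32'hFFFF_FFF0; // drop the control bits [3:0]
    bar_size_bytes = ~addr_mask + 32'd1;       // e.g. 0xFFFF_0000 -> 64 KB
  end
endfunction
```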
8.1.10.7. Secondary PCI Express Extended Capability Header
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [15:0] | PCI Express Extended Capability ID. | 0x0019 | RO |
| [19:16] | Capability Version. | 0x1 | RO |
| [31:20] | Next Capability Pointer. The following values are possible: 0x1FC, 0x284, or 0. | 0x1FC, 0x284, or 0 | RO |
8.1.10.8. Lane Status Registers
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [7:0] | Lane Error Status: Each 1 indicates that an error was detected in the corresponding lane. Only bit 0 is implemented when the link width is 1. Bits [1:0] are implemented when the link width is 2, and so on. The other bits read as 0. This register is present only in PF0 when the maximum data rate is 8 Gbps. | 0 | RW1CS |
| [31:8] | Reserved | 0 | RO |

The Lane Equalization Control Register contains the following fields:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [6:0] | Reserved | 0x7F | RO |
| [7] | Reserved | 0 | RO |
| [11:8] | Upstream Port Lane 0 Transmitter Preset | 0xF | RO |
| [14:12] | Upstream Port Lane 0 Receiver Preset Hint | 0x7 | RO |
| [15] | Reserved | 0 | RO |
| [22:16] | Reserved | 0x7F | RO |
| [23] | Reserved | 0 | RO |
| [27:24] | Upstream Port Lane 1 Transmitter Preset | 0xF when link width > 1; 0 when link width = 1 | RO |
| [30:28] | Upstream Port Lane 1 Receiver Preset Hint | 0x7 when link width > 1; 0 when link width = 1 | RO |
| [31] | Reserved | 0 | RO |
8.1.10.9. Transaction Processing Hints (TPH) Requester Enhanced Capability Header
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:20] | Next Capability Pointer: Points to the ATS Capability when present, NULL otherwise. | See description | RO |
| [19:16] | Capability Version. | 1 | RO |
| [15:0] | PCI Express Extended Capability ID. | 0x0017 | RO |
8.1.10.10. TPH Requester Capability Register
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:27] | Reserved. | 0 | RO |
| [26:16] | ST Table Size: Specifies the number of entries in the Steering Tag Table. When set to 0, the table has 1 entry. When set to 1, the table has 2 entries. The maximum table size is 2048 entries when the table is located in the MSI-X table. Each entry is 8 bits. | Set in Platform Designer | RO |
| [15:11] | Reserved. | 0 | RO |
| [10:9] | ST Table Location: Indicates whether a Steering Tag Table is implemented for this Function, and where it is located. (The PCIe Base Specification defines 2'b00 as no ST Table present, 2'b01 as an ST Table in the TPH Requester Capability Structure, and 2'b10 as an ST Table stored in the MSI-X table.) | Set in Platform Designer | RO |
| [8] | Extended TPH Requester Supported: When set to 1, indicates that the Function is capable of generating requests with 16-bit Steering Tags, using a TLP Prefix. This bit is permanently set to 0. | 0 | RO |
| [7:3] | Reserved. | 0 | RO |
| [2] | Device-Specific Mode Supported: A setting of 1 indicates that the Function supports the Device-Specific Mode for TPH Steering Tag generation. The client typically chooses the Steering Tag values from the ST Table, but is not required to do so. | Set in Platform Designer | RO |
| [1] | Interrupt Vector Mode Supported: A setting of 1 indicates that the Function supports the Interrupt Vector Mode for TPH Steering Tag generation. In the Interrupt Vector Mode, Steering Tags are attached to MSI/MSI-X interrupt requests. The MSI/MSI-X interrupt vector number selects the Steering Tag for each interrupt. | Set in Platform Designer | RO |
| [0] | No ST Mode Supported: When set to 1, indicates that the Function supports the No ST Mode for the generation of TPH Steering Tags. In the No ST Mode, the device must use a Steering Tag value of 0 for all requests. This bit is hardwired to 1, because all TPH Requesters are required to support the No ST Mode of operation. | 1 | RO |
8.1.10.11. TPH Requester Control Register
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:9] | Reserved. | 0 | RO |
| [8] | TPH Requester Enable: When set to 1, the Function can generate requests with Transaction Processing Hints. | 0 | RW |
| [7:3] | Reserved. | 0 | RO |
| [2:0] | ST Mode Select. (The PCIe Base Specification defines 3'b000 as No ST Mode, 3'b001 as Interrupt Vector Mode, and 3'b010 as Device-Specific Mode.) | 0 | RW |
8.1.10.12. Address Translation Services ATS Enhanced Capability Header
| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [31:20] | Next Capability Pointer: Points to NULL. | 0 | RO |
| [19:16] | Capability Version. | 1 | RO |
| [15:0] | PCI Express Extended Capability ID. | 0x003C | RO |
8.1.10.13. ATS Capability Register and ATS Control Register
The ATS Control Register (the upper 16 bits of this location) contains the following fields:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [15] | Enable bit. When set, the Function can cache translations. | 0 | RW |
| [14:5] | Reserved. | 0 | RO |
| [4:0] | Smallest Translation Unit (STU): This value specifies the minimum number of 4096-byte blocks specified in a Translation Completion or Invalidate Request. This is a power-of-2 multiplier; the number of blocks is 2^STU. A value of 0 indicates one block and a value of 0x1F indicates 2^31 blocks, or 8 terabytes (TB) total. | 0 | RW |

The ATS Capability Register (the lower 16 bits of this location) contains the following fields:

| Bits | Register Description | Default Value | Access |
|---|---|---|---|
| [15:6] | Reserved. | 0 | RO |
| [5] | Page Aligned Request: If set, indicates that the untranslated address is always aligned to a 4096-byte boundary. This bit is hardwired to 1. | 1 | RO |
| [4:0] | Invalidate Queue Depth: The number of Invalidate Requests that the Function can accept before throttling the upstream connection. If 0, the Function can accept 32 Invalidate Requests. | Set in Platform Designer | RO |
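As a worked example of the STU encoding above, the following hypothetical helper (not part of the IP core) returns the minimum translation size in bytes for a given STU value: STU = 0 yields one 4096-byte block, and STU = 0x1F yields 2^31 blocks (8 TB).

```verilog
// Minimum translation size in bytes: 4096 * 2^STU.
function automatic [63:0] ats_min_translation_bytes (input [4:0] stu);
  ats_min_translation_bytes = 64'd4096 << stu;  // e.g. stu=0 -> 4096 bytes
endfunction
```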
9. Testbench and Design Example
This chapter introduces the Endpoint design example including a testbench, BFM, and a test driver module. You can create this design example using design flows described in Quick Start Guide.
This testbench simulates up to x8 variants. It supports x16 variants by down-training to x8. To simulate all lanes of a x16 variant, you can create a simulation model in Platform Designer to use in an Avery testbench. For more information refer to AN-811: Using the Avery BFM for PCI Express x16 Simulation on Intel Stratix 10 Devices.
When configured as an Endpoint variation, the testbench instantiates a design example and a Root Port BFM which provides the following functions:
- A configuration routine that sets up all the basic configuration registers in the Endpoint. This configuration allows the Endpoint application to be the target and initiator of PCI Express transactions.
- A Verilog HDL procedure interface to initiate PCI Express* transactions to the Endpoint.
This testbench simulates a single Endpoint DUT.
The testbench uses a test driver module, altpcietb_bfm_rp_<gen>_x8.sv, to exercise the target memory. At startup, the test driver module displays information from the Root Port Configuration Space registers, so that you can correlate to the parameters you specified using the parameter editor.
Your Application Layer design may need to handle at least the following scenarios that are not possible to create with the Intel testbench and the Root Port BFM:
- It is unable to generate or receive Vendor Defined Messages. Some systems generate Vendor Defined Messages; consequently, you must design the Application Layer to process them. The Hard IP block passes these messages on to the Application Layer which, in most cases, should ignore them.
- It can only handle received read requests that are less than or equal to the currently set Maximum payload size option, specified under the PCI Express/PCI Capabilities heading of the Device tab in the parameter editor. Many systems are capable of handling larger read requests that are then returned in multiple completions.
- It always returns a single completion for every read request. Some systems split completions on every 64-byte address boundary.
- It always returns completions in the same order the read requests were issued. Some systems generate the completions out-of-order.
- It is unable to generate the zero-length read requests that some systems generate as flush requests following some write transactions. The Application Layer must be capable of generating completions for zero-length read requests.
- It uses fixed credit allocation.
- It does not support parity.
- It does not support multi-function designs.
9.1. Endpoint Testbench
You can create an Endpoint design for inclusion in the testbench using design flows described in the Quick Start Guide. This testbench uses the parameters that you specify in the Quick Start Guide.
This testbench simulates up to an ×8 PCI Express link using either the PIPE interface of the Endpoint or the serial PCI Express interface. The testbench design does not allow more than one PCI Express link to be simulated at a time. The following figure presents a high level view of the design example.
The top-level of the testbench instantiates the following main modules:
- altpcietb_bfm_rp_<gen>_x8.sv: This is the Root Port PCIe* BFM. This is the module that you modify to vary the transactions sent to the example Endpoint design or your own design.
  //Directory path <project_dir>/pcie_<dev>_hip_ast_0_example_design/pcie_example_design_tb/ip/pcie_example_design_tb/DUT_pcie_tb_ip/altera_pcie_<dev>_tbed_<ver>/sim
  Note: If you modify the RP BFM, you must also make the appropriate corresponding changes to the APPs module.
- pcie_example_design_DUT.ip: This is the Endpoint design with the parameters that you specify.
  //Directory path <project_dir>/pcie_<dev>_hip_ast_0_example_design/ip/pcie_example_design
- pcie_example_design_APPS.ip: This module is a target and initiator of transactions.
  //Directory path <project_dir>/pcie_<dev>_hip_ast_0_example_design/ip/pcie_example_design/
- altpcietb_bfm_cfpb.v: This module supports Configuration Space Bypass mode. It drives TLPs to the custom Configuration Space.
  //Directory path <project_dir>/pcie_<dev>_hip_ast_0_example_design/pcie_example_design_tb/ip/pcie_example_design_tb/DUT_pcie_tb_ip/altera_pcie_<dev>_tbed_<ver>/sim
In addition, the testbench has routines that perform the following tasks:
- Generates the reference clock for the Endpoint at the required frequency.
- Provides a PCI Express reset at start up.
9.1.1. Endpoint Testbench for SR-IOV
The SR-IOV testbench performs the following tests:
- A single memory write followed by a single memory read to each PF and VF.
- For PF0 only, the testbench drives memory writes to each VF, followed by memory reads of all VFs.
9.2. Test Driver Module
The test driver module, altpcie_<dev>_tbed_hwtcl.v, instantiates the top-level BFM, altpcietb_bfm_top_rp.v.
The top-level BFM completes the following tasks:
- Instantiates the driver and monitor.
- Instantiates the Root Port BFM.
- Instantiates either the PIPE or serial interfaces.
The configuration module, altpcietb_bfm_configure.v, performs the following tasks:
- Configures and assigns the BARs.
- Configures the Root Port and Endpoint.
- Displays the comprehensive Configuration Space, BAR, MSI, MSI-X, and AER settings.
9.3. Root Port BFM Overview
The basic Root Port BFM provides a Verilog HDL task-based interface for requesting transactions on the PCI Express link. The Root Port BFM also handles requests received from the PCI Express link. The following figure shows the most important modules in the Root Port BFM.
These modules implement the following functionality:
- BFM Log Interface, altpcietb_bfm_log.v and altpcietb_bfm_rp_<gen>_x8.v: The BFM log functions provide routines for writing commonly formatted messages to the simulator standard output and, optionally, to a log file. They also provide controls that stop simulation on errors. For details on these procedures, refer to BFM Log and Message Procedures.
- BFM Read/Write Request Functions, altpcietb_bfm_rp_<gen>_x8.sv: These functions provide the basic BFM calls for PCI Express read and write requests. For details on these procedures, refer to BFM Read and Write Procedures.
- BFM Configuration Functions, altpcietb_g3bfm_configure.v : These functions provide the BFM calls to request configuration of the PCI Express link and the Endpoint Configuration Space registers. For details on these procedures and functions, refer to BFM Configuration Procedures.
- BFM shared memory, altpcietb_g3bfm_shmem_common.v: This module provides the Root Port BFM shared memory. It implements the following functionality:
- Provides data for TX write operations
- Provides data for RX read operations
- Receives data for RX write operations
- Receives data for received completions
- BFM Request Interface, altpcietb_g3bfm_req_intf.v: This interface provides the low-level interface between the altpcietb_g3bfm_rdwr and altpcietb_bfm_configure procedures or functions and the Root Port RTL Model. This interface stores a write-protected data structure containing the sizes and the values programmed in the BAR registers of the Endpoint. It also stores other critical data used for internal BFM management. You do not need to access these files directly to adapt the testbench to test your Endpoint application.
- Avalon‑ST Interfaces, altpcietb_g3bfm_vc_intf_ast_common.v: These interface modules handle the Root Port interface model. They take requests from the BFM request interface and generate the required PCI Express transactions. They handle completions received from the PCI Express link and notify the BFM request interface when requests are complete. Additionally, they handle any requests received from the PCI Express link, and store or fetch data from the shared memory before generating the required completions.
9.3.1. BFM Memory Map
The BFM shared memory is 2 MB. The BFM shared memory maps to the first 2 MB of I/O space and also the first 2 MB of memory space. When the Endpoint application generates an I/O or memory transaction in this range, the BFM reads or writes the shared memory.
9.3.2. Configuration Space Bus and Device Numbering
Enumeration assigns the Root Port interface device number 0 on internal bus number 0. Use the ebfm_cfg_rp_ep procedure to assign the Endpoint to any device number on any bus number (greater than 0). The specified bus number is the secondary bus in the Root Port Configuration Space.
9.3.3. Configuration of Root Port and Endpoint
Before you issue transactions to the Endpoint, you must configure the Root Port and Endpoint Configuration Space registers.
The ebfm_cfg_rp_ep procedure executes the following steps to initialize the Configuration Space:
- Sets the Root Port Configuration Space to enable the Root Port to send transactions on the PCI Express link.
- Sets the Root Port and
Endpoint PCI Express Capability Device Control registers as follows:
- Disables Error Reporting in both the Root Port and Endpoint. The BFM does not have error handling capability.
- Enables Relaxed Ordering in both Root Port and Endpoint.
- Enables Extended Tags for the Endpoint if the Endpoint has that capability.
- Disables Phantom Functions, Aux Power PM, and No Snoop in both the Root Port and Endpoint.
- Sets the Max Payload Size to the value that the Endpoint supports because the Root Port supports the maximum payload size.
- Sets the Root Port Max Read Request Size to 4 KB because the example Endpoint design supports breaking the read into as many completions as necessary.
- Sets the Endpoint Max Read Request Size equal to the Max Payload Size because the Root Port does not support breaking the read request into multiple completions.
- Assigns values to all the
Endpoint BAR registers. The BAR addresses are assigned by the algorithm outlined
below.
- I/O BARs are assigned smallest to largest starting just above the ending address of BFM shared memory in I/O space and continuing as needed throughout a full 32-bit I/O space.
- The 32-bit non-prefetchable memory BARs are assigned smallest to largest, starting just above the ending address of BFM shared memory in memory space and continuing as needed throughout a full 32-bit memory space.
- The value of the addr_map_4GB_limit input to the ebfm_cfg_rp_ep procedure controls the assignment of the 32-bit prefetchable and 64-bit prefetchable memory BARs. The default value of addr_map_4GB_limit is 0.
If the addr_map_4GB_limit input to the ebfm_cfg_rp_ep procedure is set to 0, then the ebfm_cfg_rp_ep procedure assigns the 32‑bit prefetchable memory BARs largest to smallest, starting at the top of 32-bit memory space and continuing as needed down to the ending address of the last 32-bit non-prefetchable BAR.
However, if the addr_map_4GB_limit input is set to 1, the address map is limited to 4 GB. The ebfm_cfg_rp_ep procedure assigns 32-bit and 64-bit prefetchable memory BARs largest to smallest, starting at the top of the 32-bit memory space and continuing as needed down to the ending address of the last 32-bit non-prefetchable BAR.
- If the addr_map_4GB_limit input to the ebfm_cfg_rp_ep procedure is set to 0, the procedure assigns the 64-bit prefetchable memory BARs smallest to largest, starting at the 4 GB address and assigning memory in ascending order above the 4 GB limit throughout the full 64-bit memory space.
  If the addr_map_4GB_limit input to the ebfm_cfg_rp_ep procedure is set to 1, the procedure assigns the 32-bit and the 64-bit prefetchable memory BARs largest to smallest, starting at the 4 GB address and assigning memory in descending order below the 4 GB address, down to the ending address of the last 32-bit non-prefetchable BAR.
The above algorithm cannot always assign values to all BARs when there are a few very large (1 GB or greater) 32-bit BARs. Although assigning addresses to all BARs may be possible, a more complex algorithm would be required to effectively assign these addresses. However, such a configuration is unlikely to be useful in real systems. If the procedure is unable to assign the BARs, it displays an error message and stops the simulation.
- Based on the above BAR assignments, the ebfm_cfg_rp_ep procedure assigns the Root Port Configuration Space address windows to encompass the valid BAR address ranges.
- The ebfm_cfg_rp_ep procedure enables master transactions, memory address decoding, and I/O address decoding in the Endpoint PCIe* control register.
The ebfm_cfg_rp_ep procedure also sets up a bar_table data structure in BFM shared memory that lists the sizes and assigned addresses of all Endpoint BARs. This area of BFM shared memory is write-protected. Consequently, any application logic write accesses to this area cause a fatal simulation error.
BFM procedure calls that generate full PCIe* addresses for read and write requests to particular offsets from a BAR use this data structure. This allows the testbench code that accesses the Endpoint application logic to use offsets from a BAR and avoid tracking the specific addresses assigned to the BAR. The following table shows how to use those offsets.
| Offset (Bytes) | Description |
|---|---|
| +0 | PCI Express address in BAR0 |
| +4 | PCI Express address in BAR1 |
| +8 | PCI Express address in BAR2 |
| +12 | PCI Express address in BAR3 |
| +16 | PCI Express address in BAR4 |
| +20 | PCI Express address in BAR5 |
| +24 | PCI Express address in Expansion ROM BAR |
| +28 | Reserved |
| +32 | BAR0 read back value after being written with all 1's (used to compute size) |
| +36 | BAR1 read back value after being written with all 1's |
| +40 | BAR2 read back value after being written with all 1's |
| +44 | BAR3 read back value after being written with all 1's |
| +48 | BAR4 read back value after being written with all 1's |
| +52 | BAR5 read back value after being written with all 1's |
| +56 | Expansion ROM BAR read back value after being written with all 1's |
| +60 | Reserved |
The configuration routine does not configure any advanced PCI Express capabilities such as the AER capability.
Besides the ebfm_cfg_rp_ep procedure in altpcietb_bfm_rp_gen3_x8.sv, routines to read and write Endpoint Configuration Space registers directly are available in the Verilog HDL include file. After the ebfm_cfg_rp_ep procedure runs, the PCI Express I/O and memory spaces have the layout shown in the following three figures. The memory space layout depends on the value of the addr_map_4GB_limit input parameter. The following figure shows the resulting memory space map when addr_map_4GB_limit is 1.
The following figure shows the resulting memory space map when the addr_map_4GB_limit is 0.
The following figure shows the I/O address space.
9.3.4. Issuing Read and Write Transactions to the Application Layer
You issue read and write transactions to the Endpoint Application Layer by calling one of the ebfm_bar procedures in altpcietb_g3bfm_rdwr.v. The following procedures and functions are available in this Verilog HDL include file:
- ebfm_barwr: writes data from BFM shared memory to an offset from a specific Endpoint BAR. This procedure returns as soon as the request has been passed to the VC interface module for transmission.
- ebfm_barwr_imm: writes a maximum of four bytes of immediate data (passed in a procedure call) to an offset from a specific Endpoint BAR. This procedure returns as soon as the request has been passed to the VC interface module for transmission.
- ebfm_barrd_wait: reads data from an offset of a specific Endpoint BAR and stores it in BFM shared memory. This procedure blocks waiting for the completion data to be returned before returning control to the caller.
- ebfm_barrd_nowt: reads data from an offset of a specific Endpoint BAR and stores it in the BFM shared memory. This procedure returns as soon as the request has been passed to the VC interface module for transmission, allowing subsequent reads to be issued in the interim.
These routines take as parameters a BAR number to access the memory space and the BFM shared memory address of the bar_table data structure that was set up by the ebfm_cfg_rp_ep procedure. (Refer to Configuration of Root Port and Endpoint.) Using these parameters simplifies the BFM test driver routines that access an offset from a specific BAR and eliminates calculating the addresses assigned to the specified BAR.
The Root Port BFM does not support accesses to Endpoint I/O space BARs.
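For illustration, the fragment below issues a write and a read using the argument lists documented in the procedure tables that follow, as they might appear in the test driver module. BAR_TABLE_POINTER is assumed to be the bar_table address set up by ebfm_cfg_rp_ep; the offset, data, and shared-memory address are arbitrary example values.

```verilog
// Test-driver fragment (inside a module such as the RP BFM driver).
initial begin : simple_bar_access
  // Write 4 bytes of immediate data to offset 0x100 of BAR0, traffic class 0.
  ebfm_barwr_imm(BAR_TABLE_POINTER, 0, 32'h0000_0100, 32'hCAFE_F00D, 4, 0);
  // Read the same 4 bytes back into BFM shared memory at address 0x1000,
  // blocking until the completion data arrives.
  ebfm_barrd_wait(BAR_TABLE_POINTER, 0, 32'h0000_0100, 32'h0000_1000, 4, 0);
end
```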
9.4. BFM Procedures and Functions
The BFM includes procedures, functions, and tasks to drive Endpoint application testing. It also includes procedures to run the chaining DMA design example.
The BFM read and write procedures read and write data to BFM shared memory, Endpoint BARs, and specified Configuration Space registers. The procedures and functions are implemented in Verilog HDL and support issuing memory and configuration transactions on the PCI Express link.
9.4.1. ebfm_barwr Procedure
The ebfm_barwr procedure writes a block of data from BFM shared memory to an offset from the specified Endpoint BAR. The length can be longer than the configured MAXIMUM_PAYLOAD_SIZE. The procedure breaks the request up into multiple transactions as needed. This routine returns as soon as the last transaction has been accepted by the VC interface module.
Location: altpcietb_g3bfm_rdwr.v

Syntax: ebfm_barwr(bar_table, bar_num, pcie_offset, lcladdr, byte_len, tclass)

| Argument | Description |
|---|---|
| bar_table | Address of the Endpoint bar_table structure in BFM shared memory. The bar_table structure stores the address assigned to each BAR so that the driver code does not need to be aware of the actual assigned addresses, only the application-specific offsets from the BAR. |
| bar_num | Number of the BAR used with pcie_offset to determine the PCI Express address. |
| pcie_offset | Address offset from the BAR base. |
| lcladdr | BFM shared memory address of the data to be written. |
| byte_len | Length, in bytes, of the data written. Can be 1 to the minimum of the bytes remaining in the BAR space or BFM shared memory. |
| tclass | Traffic class used for the PCI Express transaction. |
9.4.2. ebfm_barwr_imm Procedure
The ebfm_barwr_imm procedure writes up to four bytes of data to an offset from the specified Endpoint BAR.
Location: altpcietb_g3bfm_rdwr.v

Syntax: ebfm_barwr_imm(bar_table, bar_num, pcie_offset, imm_data, byte_len, tclass)

| Argument | Description |
|---|---|
| bar_table | Address of the Endpoint bar_table structure in BFM shared memory. The bar_table structure stores the address assigned to each BAR so that the driver code does not need to be aware of the actual assigned addresses, only the application-specific offsets from the BAR. |
| bar_num | Number of the BAR used with pcie_offset to determine the PCI Express address. |
| pcie_offset | Address offset from the BAR base. |
| imm_data | Data to be written. In Verilog HDL, this argument is reg [31:0]. The bits written depend on the length: a length of 1 writes bits [7:0], 2 writes bits [15:0], 3 writes bits [23:0], and 4 writes bits [31:0]. |
| byte_len | Length of the data to be written in bytes. Maximum length is 4 bytes. |
| tclass | Traffic class to be used for the PCI Express transaction. |
9.4.3. ebfm_barrd_wait Procedure
The ebfm_barrd_wait procedure reads a block of data from the offset of the specified Endpoint BAR and stores it in BFM shared memory. The length can be longer than the configured maximum read request size; the procedure breaks the request up into multiple transactions as needed. This procedure waits until all of the completion data is returned and places it in shared memory.
Location: altpcietb_g3bfm_rdwr.v

Syntax: ebfm_barrd_wait(bar_table, bar_num, pcie_offset, lcladdr, byte_len, tclass)

| Argument | Description |
|---|---|
| bar_table | Address of the Endpoint bar_table structure in BFM shared memory. The bar_table structure stores the address assigned to each BAR so that the driver code does not need to be aware of the actual assigned addresses, only the application-specific offsets from the BAR. |
| bar_num | Number of the BAR used with pcie_offset to determine the PCI Express address. |
| pcie_offset | Address offset from the BAR base. |
| lcladdr | BFM shared memory address where the read data is stored. |
| byte_len | Length, in bytes, of the data to be read. Can be 1 to the minimum of the bytes remaining in the BAR space or BFM shared memory. |
| tclass | Traffic class used for the PCI Express transaction. |
9.4.4. ebfm_barrd_nowt Procedure
The ebfm_barrd_nowt procedure reads a block of data from the offset of the specified Endpoint BAR and stores the data in BFM shared memory. The length can be longer than the configured maximum read request size; the procedure breaks the request up into multiple transactions as needed. This routine returns as soon as the last read transaction has been accepted by the VC interface module, allowing subsequent reads to be issued immediately.
Location: altpcietb_g3bfm_rdwr.v

Syntax: ebfm_barrd_nowt(bar_table, bar_num, pcie_offset, lcladdr, byte_len, tclass)

| Argument | Description |
|---|---|
| bar_table | Address of the Endpoint bar_table structure in BFM shared memory. |
| bar_num | Number of the BAR used with pcie_offset to determine the PCI Express address. |
| pcie_offset | Address offset from the BAR base. |
| lcladdr | BFM shared memory address where the read data is stored. |
| byte_len | Length, in bytes, of the data to be read. Can be 1 to the minimum of the bytes remaining in the BAR space or BFM shared memory. |
| tclass | Traffic class to be used for the PCI Express transaction. |
9.4.5. ebfm_cfgwr_imm_wait Procedure
The ebfm_cfgwr_imm_wait procedure writes up to four bytes of data to the specified configuration register. This procedure waits until the write completion has been returned.
Location: altpcietb_g3bfm_rdwr.v

Syntax: