FPGA Interface Manager Data Sheet: Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA
1. Overview
- FPGA Interface Unit (FIU): The platform interface layer that acts as a bridge between PCIe* and Core Cache Interface (CCI-P).
- Core Cache Interface (CCI-P): The standard interface that AFUs use to communicate with the host.
- External Memory Interface (EMIF)
- High-Speed Serial Interface (HSSI) for external transceivers
2. FIM and AFU Parameter Data
Use the following tables in conjunction with the Accelerator Functional Unit (AFU) Developer's Guide and the Intel® Acceleration Stack for Intel® Xeon® CPU with FPGAs Core Cache Interface (CCI-P) Reference Manual to complete your AFU design.
2.1. Memory Interface
The Intel® FPGA PAC with Intel® Arria® 10 GX FPGA features two DDR4 SDRAM memory banks of 4 GB each. The AFU can access each bank independently and use it as a local workspace for large datasets. Each memory bank interface is 64 bits wide and operates at 1066 MHz DDR.
Parameter | Value |
---|---|
Memory Protocol | DDR4-SDRAM |
AFU Interface Type | Avalon™ Memory Mapped Interface (Avalon®-MM) |
Number of Memory Interfaces | 2 |
Density per Memory Interface | 4 GB |
AFU-Accessible Memory Address Bus Width | 27-bit |
AFU-accessible Memory Data Width | 512-bit |
Interface Frequency | 266 MHz |
Maximum Burst Size | 64 beats |
Address Mapping | CS-CID-Row-Bank-Col-BG |
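As a rough cross-check of the values above, the following sketch computes the peak throughput of one bank on both sides of the memory controller. It is plain arithmetic on the table values, not a measured figure. The names are illustrative only, and "MB/s" here means 10^6 bytes per second.

```python
# Peak-bandwidth arithmetic for one memory bank, using the data sheet values.
DDR_CLOCK_MHZ = 1066      # DDR4 interface clock; data moves on both clock edges
DDR_WIDTH_BITS = 64       # width of each memory bank interface
AFU_CLOCK_MHZ = 266       # Avalon-MM interface frequency seen by the AFU
AFU_WIDTH_BITS = 512      # AFU-accessible data width
MAX_BURST_BEATS = 64      # maximum burst size


def peak_mb_per_s(clock_mhz, width_bits, transfers_per_clock=1):
    """Peak throughput in MB/s (10**6 bytes per second)."""
    return clock_mhz * transfers_per_clock * width_bits // 8


ddr_side = peak_mb_per_s(DDR_CLOCK_MHZ, DDR_WIDTH_BITS, transfers_per_clock=2)
afu_side = peak_mb_per_s(AFU_CLOCK_MHZ, AFU_WIDTH_BITS)
max_burst_bytes = MAX_BURST_BEATS * AFU_WIDTH_BITS // 8

print(ddr_side, afu_side, max_burst_bytes)
```

With the rounded frequencies in the table, the two sides come out at about 17 GB/s each, which suggests the 512-bit, 266 MHz AFU port is sized to match the DDR4 device rate. A maximum-length burst moves 4096 bytes.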
2.1.1. SDRAM Signals
Signal Name | Direction (AFU viewpoint) | Width | Description |
---|---|---|---|
clk | input | 1 | Provides synchronization for internal logic. |
waitrequest | input | 1 | Asserted by the memory interface when it is unable to respond to a read or write request. |
readdata | input | 512 | Read data returned from the memory interface to the AFU. |
readdatavalid | input | 1 | Used for variable-latency, pipelined read transfers. When asserted, indicates that the readdata signal contains valid data. |
burstcount | output | 7 | Used to indicate the number of transfers in each burst. |
writedata | output | 512 | Data for write transfers. |
address | output | 27 | By default, the interconnect translates the byte address into a word address in the slave’s address space. From the perspective of the slave, each slave access is for a word of data. |
write | output | 1 | Asserted to indicate a write transfer. |
read | output | 1 | Asserted to indicate a read transfer. |
byteenable | output | 64 | Enables one or more specific byte lanes during transfers on interfaces of width greater than 8 bits. Each bit in byteenable corresponds to a byte in writedata and readdata. |
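The byteenable and burstcount widths follow from the 512-bit data width and the 64-beat maximum burst. The sketch below models both in software. It assumes that bit i of byteenable selects byte i of the 64-byte word and that the AFU issues word addresses; the function names are hypothetical and not part of any Intel API.

```python
WORD_BYTES = 64        # 512-bit data width
MAX_BURST_BEATS = 64   # maximum burst size; fits in the 7-bit burstcount


def byteenable_mask(first_byte, num_bytes):
    """Return the 64-bit byteenable value for num_bytes lanes starting at first_byte."""
    if not 0 <= first_byte < WORD_BYTES:
        raise ValueError("first_byte must be in 0..63")
    if not 1 <= num_bytes <= WORD_BYTES - first_byte:
        raise ValueError("byte range must fit inside one 64-byte word")
    return ((1 << num_bytes) - 1) << first_byte


def split_into_bursts(word_address, num_words):
    """Split a transfer into (address, burstcount) pairs of at most 64 beats."""
    bursts = []
    while num_words > 0:
        beats = min(num_words, MAX_BURST_BEATS)
        bursts.append((word_address, beats))
        word_address += beats
        num_words -= beats
    return bursts


print(hex(byteenable_mask(4, 4)))
print(split_into_bursts(0, 130))
```

The splitter only enforces the burst-length limit. Any address-alignment rule that the memory controller may impose on bursts is outside this sketch.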
2.2. Core Cache Interface (CCI-P) Interface
Parameter | Value | Notes |
---|---|---|
Data Width | 512-bit | CCI-P interface width. |
Maximum CCI-P Frequency | pClk | - |
Host Memory Cache-Line Size | 64-byte | - |
MMIO access width | 32-bit and 64-bit | 64-bit accesses are mandatory for Device Feature Header (DFH) enumeration. |
MMIO Read Response Timeout | 65536 clock cycles | - |
Virtual Channels Supported | VH0, VA | Accesses to VH0 and VA are mapped to the PCIe* link. Accesses to VH1 or VL0 are mapped to VH0. |
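To put the MMIO read response timeout in wall-clock terms, the sketch below converts it to microseconds. It assumes the timeout is counted in cycles of the 200 MHz pClk listed in the Clocks section; the table above says only "clock cycles", so treat the result as an estimate.

```python
PCLK_MHZ = 200                 # pClk frequency from the Clocks section
MMIO_TIMEOUT_CYCLES = 65536    # MMIO read response timeout


def cycles_to_us(cycles, clock_mhz):
    """Convert a cycle count to microseconds at the given clock frequency."""
    return cycles / clock_mhz


timeout_us = cycles_to_us(MMIO_TIMEOUT_CYCLES, PCLK_MHZ)
print(f"{timeout_us:.2f} us")
```

Under that assumption, an AFU has roughly 328 microseconds to return data for an MMIO read.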
2.3. Clocks
Parameter | Value | Notes |
---|---|---|
pClk | 200 MHz | Primary interface clock. All CCI-P interface signals are synchronous to this clock. |
pClkDiv2 | 100 MHz | Synchronous and in phase with pClk. 0.5x the pClk clock frequency. |
pClkDiv4 | 50 MHz | Synchronous and in phase with pClk. 0.25x the pClk clock frequency. |
uClk | 312.5 MHz | This clock can be adjusted by OPAE. |
uClk/2 | 156.25 MHz | This clock can be adjusted by OPAE. |
uClk_usr Min | 10 MHz | Minimum user-defined clock. This clock is not synchronous with the pClk. You can adjust this clock using OPAE. |
uClk_usr Default | 312.5 MHz | Default user-defined clock. This clock is not synchronous with the pClk. You can adjust this clock using OPAE. |
uClk_usr Max | 600 MHz | Maximum user-defined clock. This clock is not synchronous with the pClk. You can adjust this clock using OPAE. |
uClk_usrDiv2 Min | 10 MHz | Minimum user-defined clock that is synchronous with uClk_usr and 0.5x the frequency. Note: You can use OPAE to set the frequency to a value that is not synchronous with the uClk_usr. |
uClk_usrDiv2 Default | 156.25 MHz | User-defined clock that is synchronous with uClk_usr and 0.5x the frequency. Note: You can use OPAE to set the frequency to a value that is not synchronous with the uClk_usr. |
uClk_usrDiv2 Max | 600 MHz | Maximum user-defined clock that is synchronous with uClk_usr and 0.5x the frequency. Note: You can use OPAE to set the frequency to a value that is not synchronous with the uClk_usr. |
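When planning a user clock, it can help to check a candidate frequency against the limits in the table before requesting it through OPAE. The following is a minimal sketch of such a check. It does not call OPAE, the function name is hypothetical, and applying both sets of limits to a synchronous clock pair is one reading of the table rather than a documented rule.

```python
# Limits copied from the Clocks table (MHz).
USR_MIN_MHZ, USR_MAX_MHZ = 10.0, 600.0
DIV2_MIN_MHZ, DIV2_MAX_MHZ = 10.0, 600.0


def user_clock_pair(uclk_usr_mhz):
    """Return (uClk_usr, uClk_usrDiv2) in MHz for a synchronous pair.

    Raises ValueError if either clock falls outside the table limits.
    """
    div2_mhz = uclk_usr_mhz / 2
    if not USR_MIN_MHZ <= uclk_usr_mhz <= USR_MAX_MHZ:
        raise ValueError(f"uClk_usr {uclk_usr_mhz} MHz is outside 10-600 MHz")
    if not DIV2_MIN_MHZ <= div2_mhz <= DIV2_MAX_MHZ:
        raise ValueError(f"uClk_usrDiv2 {div2_mhz} MHz is outside 10-600 MHz")
    return uclk_usr_mhz, div2_mhz


print(user_clock_pair(312.5))
```

The default 312.5 MHz setting yields the 156.25 MHz half-rate clock listed in the table. Under this reading, a synchronous pair needs uClk_usr of at least 20 MHz so that the divided clock stays at or above its 10 MHz minimum.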
2.4. Reset
Subsystem | Parameter | Value | Notes |
---|---|---|---|
Resets | Min Reset Width | 512 pClk cycles | Minimum number of pClk clock cycles the FIM holds the AFU in reset. |
2.5. Networking Interface
The networking interface provides one QSFP that can be configured to 4x10 GbE or 40 GbE.
Parameter | Value | Notes |
---|---|---|
Rate Supported | 4x10 GbE or 40 GbE | - |
Layers Supported | PHY | Physical Coding Sublayer (PCS) + Physical Medium Attachment (PMA) Sublayer |
2.5.1. Clock Signals
Port Name | Width | Direction | Description |
---|---|---|---|
f2a_tx_clk | 1 | Output | 4x10GBASE-R Mode: A 156.25 MHz clock derived from the HSSI PHY’s clock generation block (CGB) tx_pma_div_clkout clock output. All transmit data and control from the MAC to the HSSI PHY is synchronous to f2a_tx_clk. 40GBASE-SR4 Mode: A 312.5 MHz clock derived from the HSSI PHY’s CGB tx_pma_div_clkout clock output. All transmit data and control from the MAC/PHY to the HSSI PHY is synchronous to f2a_tx_clk. |
f2a_tx_clkx2 | 1 | Output | 4x10GBASE-R Mode: A 312.5 MHz clock derived from the HSSI PHY’s CGB tx_pma_div_clkout clock output and phase-aligned with f2a_tx_clk. 40GBASE-SR4 Mode: A 312.5 MHz clock derived from the PHY’s CGB tx_pma_div_clkout clock output and phase-aligned with f2a_tx_clk. |
f2a_tx_locked | 1 | Output | 4x10GBASE-R Mode: Locked status for f2a_tx_clk and f2a_tx_clkx2. 40GBASE-SR4 Mode: Locked status for f2a_tx_clk and f2a_tx_clkx2. |
f2a_rx_clk_ln0 | 1 | Output | 4x10GBASE-R Mode: A 156.25 MHz clock derived from the HSSI PHY’s transmitter and receive CDR PLL clock input reference. All receive data and control from the HSSI PHY to the MAC is synchronous to f2a_rx_clk_ln0. 40GBASE-SR4 Mode: A 312.5 MHz clock derived from the HSSI PHY’s receive CDR in lane 0. All receive data and control from the HSSI PHY to the MAC/PHY is synchronous to f2a_rx_clk_ln0. |
f2a_rx_clkx2_ln0 | 1 | Output | 4x10GBASE-R Mode: A 312.5 MHz clock derived from the HSSI PHY’s transmitter and receive CDR PLL clock input reference and phase-aligned with f2a_rx_clk_ln0. 40GBASE-SR4 Mode: A 312.5 MHz clock derived from the HSSI PHY’s receive CDR in lane 0 and phase-aligned with f2a_rx_clk_ln0. |
f2a_rx_locked_ln0 | 1 | Output | 4x10GBASE-R Mode: Locked status for f2a_rx_clk_ln0 and f2a_rx_clkx2_ln0. 40GBASE-SR4 Mode: Locked status for f2a_rx_clk_ln0 and f2a_rx_clkx2_ln0. |
f2a_rx_clk_ln4 | 1 | Output | 4x10GBASE-R Mode: Reserved. 40GBASE-SR4 Mode: Reserved. |
f2a_rx_locked_ln4 | 1 | Output | 4x10GBASE-R Mode: Reserved. 40GBASE-SR4 Mode: Reserved. |
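The 156.25 MHz and 312.5 MHz figures in 4x10GBASE-R mode are consistent with standard 10GBASE-R arithmetic: a 10.3125 Gb/s serial lane carrying 64b/66b-encoded blocks. The sketch below shows that relationship. It assumes one 66-bit block per f2a_tx_clk cycle, which this data sheet does not state explicitly, so read it as an explanation of where the numbers plausibly come from.

```python
LINE_RATE_MBPS = 10312.5    # 10GBASE-R serial line rate per lane, in Mb/s
ENCODED_BLOCK_BITS = 66     # 64b/66b block size on the line
PAYLOAD_BITS = 64           # payload bits carried by each block

# One encoded block per parallel clock cycle (assumption, see lead-in).
parallel_clk_mhz = LINE_RATE_MBPS / ENCODED_BLOCK_BITS
parallel_clkx2_mhz = 2 * parallel_clk_mhz
payload_gbps = parallel_clk_mhz * PAYLOAD_BITS / 1000

print(parallel_clk_mhz, parallel_clkx2_mhz, payload_gbps)
```

Four such lanes give the 4x10 GbE configuration of the QSFP interface.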
2.5.2. Data Interface and Signals
The HSSI unified data interface conforms to the Intel® Arria® 10 FPGA Transceiver Native PHY IP with enhanced PCS. It consists of generic parallel data and encoding control interfaces for transmit and receive that are mapped to specific signaling behavior based on the configured HSSI PHY mode. The unified data interface also includes flow control ports to manage passing data to and from the HSSI PHY.
The table below cross-references the hssi:raw_pr unified data interface signals to the Intel® Arria® 10 FPGA Transceiver Native PHY IP with enhanced PCS signal set. For detailed information on these signals, see the Intel Arria 10 Transceiver PHY User Guide.
Port Name | Width | Direction | Clock Domain | Native PHY IP Port Name |
---|---|---|---|---|
Transmit and Receive Data and Encoding Control Ports | ||||
a2f_tx_parallel_data | (4*128) | Input | f2a_tx_clk | tx_parallel_data |
a2f_tx_control | (4*18) | Input | f2a_tx_clk | tx_control |
f2a_rx_parallel_data | (4*128) | Output | f2a_rx_clk_ln0 | rx_parallel_data |
f2a_rx_control | (4*20) | Output | f2a_rx_clk_ln0 | rx_control |
Flow Control Ports | ||||
f2a_tx_enh_fifo_full | 4 | Output | f2a_tx_clk | tx_enh_fifo_full |
f2a_tx_enh_fifo_pfull | 4 | Output | f2a_tx_clk | tx_enh_fifo_pfull |
f2a_tx_enh_fifo_empty | 4 | Output | f2a_tx_clk | tx_enh_fifo_empty |
f2a_tx_enh_fifo_pempty | 4 | Output | f2a_tx_clk | tx_enh_fifo_pempty |
a2f_tx_enh_data_valid | 4 | Input | f2a_tx_clk | tx_enh_data_valid |
f2a_rx_enh_fifo_full | 4 | Output | f2a_rx_clk_ln0 | rx_enh_fifo_full |
f2a_rx_enh_fifo_pfull | 4 | Output | f2a_rx_clk_ln0 | rx_enh_fifo_pfull |
f2a_rx_enh_fifo_empty | 4 | Output | f2a_rx_clk_ln0 | rx_enh_fifo_empty |
f2a_rx_enh_fifo_pempty | 4 | Output | f2a_rx_clk_ln0 | rx_enh_fifo_pempty |
f2a_rx_enh_data_valid | 4 | Output | f2a_rx_clk_ln0 | rx_enh_data_valid |
a2f_rx_enh_fifo_rd_en | 4 | Input | f2a_rx_clk_ln0 | rx_enh_fifo_rd_en |
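The (4*128), (4*18), and (4*20) widths indicate that each bus concatenates one field per lane. The sketch below models packing and slicing such a bus as a Python integer. It assumes lane 0 occupies the least significant bits, which the table does not specify, and the helper names are hypothetical.

```python
LANES = 4


def pack_lanes(values, width):
    """Concatenate per-lane values into one bus value, lane 0 in the low bits."""
    if len(values) != LANES:
        raise ValueError("expected one value per lane")
    bus = 0
    for lane, value in enumerate(values):
        if value < 0 or value >> width:
            raise ValueError(f"lane {lane} value does not fit in {width} bits")
        bus |= value << (lane * width)
    return bus


def lane_slice(bus, lane, width):
    """Extract the field belonging to one lane from a concatenated bus value."""
    if not 0 <= lane < LANES:
        raise ValueError("lane must be in 0..3")
    return (bus >> (lane * width)) & ((1 << width) - 1)


# Model of a2f_tx_parallel_data (4*128 bits) and a2f_tx_control (4*18 bits).
tx_data = pack_lanes([0x11, 0x22, 0x33, 0x44], width=128)
tx_ctrl = pack_lanes([0x1, 0x2, 0x3, 0x3FFFF], width=18)
print(hex(lane_slice(tx_data, 2, 128)), hex(lane_slice(tx_ctrl, 3, 18)))
```

The same helpers apply to the 4-bit flow control ports with a width of 1.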
2.5.3. Control and Status Signals
This set of ports on the hssi interface provides HSSI PHY receive Physical Medium Attachment (PMA) clock data recovery (CDR) lock sequencing control, PCS status, and transceiver loopback control. The signaling behavior conforms to the Intel® Arria® 10 FPGA Transceiver Native PHY IP with enhanced PCS. The table below cross-references the hssi port names to the Native PHY IP port names.
Port Name | Width | Direction | Clock Domain | Native PHY IP Port Name |
---|---|---|---|---|
a2f_rx_seriallpbken | 4 | Input | Async | rx_seriallpbken |
a2f_rx_set_locktoref | 4 | Input | Async | rx_set_locktoref |
f2a_rx_is_lockedtoref | 4 | Output | Async | rx_is_lockedtoref |
a2f_rx_set_locktodata | 4 | Input | Async | rx_set_locktodata |
f2a_rx_enh_blk_lock | 4 | Output | f2a_rx_clk_ln0 | rx_enh_blk_lock |
f2a_rx_enh_highber | 4 | Output | f2a_rx_clk_ln0 | rx_enh_highber |
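One common use of the PCS status ports is to derive a per-lane "ready" indication from block lock and the high bit-error-rate flag. The following is a minimal sketch of that idea, assuming bit i of each 4-bit port corresponds to lane i. The policy (locked and not high-BER) and the function names are illustrative assumptions, not behavior defined by this data sheet.

```python
LANES = 4
LANE_MASK = (1 << LANES) - 1


def lanes_ready(blk_lock, highber):
    """Return a 4-bit mask of lanes that have block lock and no high-BER flag."""
    return blk_lock & ~highber & LANE_MASK


def link_up(blk_lock, highber):
    """True when every lane is ready."""
    return lanes_ready(blk_lock, highber) == LANE_MASK


print(bin(lanes_ready(0b1111, 0b0100)), link_up(0b1111, 0b0000))
```

In 4x10GBASE-R mode each lane is an independent link, so the per-lane mask is likely the more useful result; the all-lanes check suits a single 40 GbE link.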
3. FPGA Interface Manager (FIM) Resource Utilization
Parameter | FIM Utilization Total | Device Total | Percentage of Resources Used by FIM | Notes |
---|---|---|---|---|
ALMs | 33,686 | 427,200 | 8% | Adaptive Logic Module (ALM) blocks. |
M20Ks | 126 | 2,713 | 5% | 20 Kb memory blocks. |
DSPs | 0 | 1,518 | 0% | Digital Signal Processing blocks. |
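The percentages above are rounded to the nearest whole number. The short sketch below reproduces them from the raw counts and derives what remains after the FIM. Treating the remainder as an upper bound on AFU resources is an assumption, since placement and routing overhead are not covered here.

```python
# (FIM utilization, device total) from the resource utilization table.
RESOURCES = {
    "ALMs": (33686, 427200),
    "M20Ks": (126, 2713),
    "DSPs": (0, 1518),
}


def utilization_percent(used, total):
    """FIM utilization rounded to the nearest whole percent."""
    return round(100 * used / total)


percent = {name: utilization_percent(u, t) for name, (u, t) in RESOURCES.items()}
remaining = {name: t - u for name, (u, t) in RESOURCES.items()}

print(percent)
print(remaining)
```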
4. Document Revision History for FPGA Interface Manager Data Sheet: Intel Programmable Acceleration Card with Intel Arria 10 GX FPGA
Document Version | Acceleration Stack Version | Changes |
---|---|---|
2020.06.03 | 1.2.1 (compatible with Intel® Quartus® Prime Pro Edition 19.2) | Added the Address Mapping parameter in Table: Local Memory Interface Specifications. |
2020.03.06 | 1.2.1 (compatible with Intel® Quartus® Prime Pro Edition 19.2) | Initial Release |