AN 887: PHY Lite for Parallel Interfaces Reference Design with Dynamic Reconfiguration for Intel Arria 10 Devices
Two instances of the PHY Lite for Parallel Interfaces Intel® Arria® 10 FPGA IP core are placed in different I/O tiles on a single FPGA. Each PHY Lite instance is configured with two groups and is looped back through a custom HiLo loopback card on the Intel® Arria® 10 GX FPGA development kit. One PHY Lite instance is configured as a transmitter (DUT_OUTPUT) and the other is configured as a receiver (DUT_INPUT).
Features
- A Nios® II processor to perform dynamic calibration for the PHY Lite for Parallel Interfaces Intel® Arria® 10 FPGA IP core.
- A set of application programming interface (API) functions to configure delay chains for the PHY Lite for Parallel Interfaces Intel® Arria® 10 FPGA IP core.
Hardware and Software Requirements
Hardware
- Intel® Arria® 10 GX FPGA Development Kit (Device OPN: 10AX115S3F45E2SGE3)
- HiLo loopback card
- Intel® FPGA Download Cable
Software
- Intel® Quartus® Prime software version 19.1
- phylite_top.qar file
Design System Architecture Overview
You can use this reference design as a starting point and modify it as required to suit your application.
The reference design consists of:
- DUT_MODULE:
- DUT_INPUT
- DUT_OUTPUT
- Traffic generator/checker module
- DYN_CFG Controller
- Clocking Scheme
Functional Description
DUT_MODULE
The DUT_OUTPUT instance acts as the transmitter and transfers data from either the DYN_CFG controller or the traffic generator module. During configuration mode, the DYN_CFG controller sends the test data to the DUT_OUTPUT instance. In normal operating mode, the DUT_OUTPUT instance takes data from the traffic generator and sends it to the DUT_INPUT instance. The DUT_INPUT instance acts as the receiver: the data transmitted by the DUT_OUTPUT instance is looped back to the DUT_INPUT instance.
Traffic Generator/Checker Module
The transmitted data is random data generated by a linear feedback shift register (LFSR). The checker compares the data received from DUT_INPUT against the transmitted data, and the two must match for the test to pass.
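The generate-and-compare idea can be sketched in C as follows. This is only an illustration: the document does not state the LFSR width or polynomial used by the traffic generator, so the 16-bit register with taps 16, 14, 13, 11 and the names `lfsr16_next` and `lfsr16_count_mismatches` are assumptions, not the reference design's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* One step of a 16-bit Fibonacci LFSR with taps 16, 14, 13, 11.
 * ASSUMPTION: width and polynomial are chosen for illustration only;
 * the reference design's actual LFSR is not described in this document. */
static uint16_t lfsr16_next(uint16_t state)
{
    uint16_t bit = (uint16_t)(((state >> 0) ^ (state >> 2) ^
                               (state >> 3) ^ (state >> 5)) & 1u);
    return (uint16_t)((state >> 1) | (bit << 15));
}

/* Checker: regenerate the expected sequence from the same seed and count
 * how many received words differ from it. The expected sequence never
 * depends on the received data, so one bad word yields one mismatch. */
static size_t lfsr16_count_mismatches(uint16_t seed, const uint16_t *rx,
                                      size_t n_words)
{
    size_t errors = 0;
    uint16_t expected = seed;

    for (size_t i = 0; i < n_words; i++) {
        expected = lfsr16_next(expected);
        if (rx[i] != expected)
            errors++;
    }
    return errors;
}
```

Because the checker rebuilds the sequence from the seed alone, the transmitter and receiver only need to agree on the seed and the polynomial.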
DYN_CFG Controller
The DYN_CFG controller module performs address translation to retrieve the physical address of the strobe or data pin to be configured. This module has forward and reverse paths. In the forward path, this module transmits data to the DUT_OUTPUT module. In the reverse path, this module receives data from the DUT_INPUT module.
Dynamic reconfiguration code is written in the Nios® II Software Build Tools for Eclipse and loaded into the instruction memory of the soft Nios® II processor. The Nios® II processor executes this code to perform calibrations. During processing, the Nios® II processor writes to the register in the I/O subsystem manager (I/O SSM) to change the DQS/DQ delay. Once the calibration is done and the data valid window is found, the Nios® II processor sets cfg_done to 1 and the interface to the IP core switches to the traffic generator. The traffic generator then begins generating a random data pattern and checks it against the loopback data that comes back from the IP core input.
Clocking Scheme
This design uses a 133.25 MHz clock from the Si5338 programmable oscillator. The PHY Lite for Parallel Interfaces IP core clock transfers data between the FPGA core logic and the IP core. The interface frequency between the two PHY Lite for Parallel Interfaces IP core instances is 533 MHz.
Dynamic Reconfiguration Overview
This feature allows you to reconfigure the delay of DQS/strobe or DQ/data signals in real time. It helps to maximize the data valid window, allowing the design to achieve timing closure at high frequency. To enable it, turn on Use dynamic reconfiguration in the parameter editor of the PHY Lite for Parallel Interfaces Intel® Arria® 10 FPGA IP core in the Intel® Quartus® Prime software. The reconfiguration is performed through the Avalon®-MM interface.
Register Address Map
- Pin[4:0]—Physical location of the pin in a lane. Refer to Appendix C: Decoding Parameter Table for more information.
- lane_addr[7:0]—Address of a given lane in an interface. The fitter sets this address value. Refer to Appendix C: Decoding Parameter Table for more information.
- Once the lane and pin addresses of the target PHY Lite for Parallel Interfaces interface are captured, the target pin can be reconfigured by reading or writing through the calibration offset address, for example, cal_add = 3’b011.
- ID[3:0]—Interface ID parameter. This parameter distinguishes between different IP instances in an I/O column.
- For the physical addresses of lgc_sel and pin_off, refer to the Address Register for Pin Input Delay Feature table in the PHY Lite for Parallel Interfaces Intel® Arria® 10 FPGA IP and PHY Lite for Parallel Interfaces Intel® Cyclone® 10 GX IP Cores Address Registers section of the PHY Lite for Parallel Interfaces Intel® FPGA IP Core User Guide.
Dynamic Reconfiguration API Functions
API Function | Access Type (R/W) | Argument | Return Value | Description |
---|---|---|---|---|
Read_Param_table | R | N/A | Parameter contents | Retrieve parameter table contents from the I/O SSM memory. |
get_output_delay | R | | DELAY value | Read from the PIN_OUTPUT_DELAY register for the specified ID, group number, and pin number. Specified CSR to: |
get_data_input_delay | R | | DELAY value | Read from the PIN_INPUT_DELAY register for the specified ID, group number, and pin number. |
get_strobe_input_delay | R | | DELAY value | Read from the STROBE_INPUT_DELAY register for the specified ID and group number. |
get_strobe_enable_delay | R | | DELAY value | Read from the STROBE_EN_DELAY register for the specified ID and group number. Specified CSR to: |
get_strobe_enable_phase | R | | DELAY value | Read from the READ_EN_PHASE register for the specified ID and group number. Specified CSR to: |
get_read_valid_delay | R | | DELAY value | Read from the READ_VALID_DELAY register for the specified ID and group number. Specified CSR to: |
set_output_delay | W | | N/A | Write to PIN_OUTPUT_DELAY register for the specified ID, group number, and pin number. |
set_data_input_delay | W | | N/A | Write to PIN_INPUT_DELAY register for the specified ID, group number, and pin number. |
set_strobe_input_delay | W | | N/A | Write to STROBE_INPUT_DELAY register for the specified ID and group number. |
set_strobe_enable_delay | W | | N/A | Write to STROBE_ENABLE_DELAY register for the specified ID and group number. |
set_strobe_enable_phase | W | | N/A | Write to STROBE_ENABLE_PHASE register for the specified ID and group number. |
set_read_valid_delay | W | | N/A | Write to READ_VALID_DELAY register for the specified ID and group number. |
- ID—Interface ID set during PHY Lite for Parallel Interfaces instantiation.
- NUM_GROUP—The number of data/strobe groups in the interface.
- PIN—Logical pin of the interface.
- DELAY value—Refer to Control Registers Description section of the PHY Lite for Parallel Interfaces Intel® FPGA IP User Guide.
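A read-modify-write use of a get/set pair might look like the sketch below. The function signatures, the array bounds, and the in-memory array that stands in for the PIN_INPUT_DELAY registers are all hypothetical; the real prototypes are in phylite_dynamic_reconfiguration.h, and the real functions access the IP core through the Avalon-MM interface. The helper `adjust_data_input_delay` is not part of the API.

```c
#include <stdint.h>

#define MOCK_MAX_ID     16  /* ID[3:0] allows 16 interface IDs */
#define MOCK_MAX_GROUPS 4   /* hypothetical bound for this sketch */
#define MOCK_MAX_PINS   48  /* hypothetical bound for this sketch */

/* Stand-in for the PIN_INPUT_DELAY registers (not real hardware access). */
static uint32_t mock_pin_input_delay[MOCK_MAX_ID][MOCK_MAX_GROUPS][MOCK_MAX_PINS];

/* HYPOTHETICAL signatures, modeled on the API table above. */
static void set_data_input_delay(unsigned id, unsigned group, unsigned pin,
                                 uint32_t delay)
{
    mock_pin_input_delay[id][group][pin] = delay;
}

static uint32_t get_data_input_delay(unsigned id, unsigned group, unsigned pin)
{
    return mock_pin_input_delay[id][group][pin];
}

/* Illustrative helper: move one pin's input delay by a signed step,
 * clamped to the range [0, max_delay], and return the new setting. */
static uint32_t adjust_data_input_delay(unsigned id, unsigned group,
                                        unsigned pin, int32_t step,
                                        uint32_t max_delay)
{
    int64_t next = (int64_t)get_data_input_delay(id, group, pin) + step;

    if (next < 0)
        next = 0;
    if (next > (int64_t)max_delay)
        next = (int64_t)max_delay;

    set_data_input_delay(id, group, pin, (uint32_t)next);
    return (uint32_t)next;
}
```

Clamping in one place keeps a calibration sweep from writing a value outside the register's valid range; the valid range itself is given in the Control Registers Description section of the user guide.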
PHY Lite Per-Bit Overview
When a large number of DQ pins are used for high-speed transfers, it is very likely that most of the DQ pins have a narrower passing window. This limits the maximum performance of the system and introduces the possibility of data corruption.
Per-Bit Deskew Concept
To overcome this, the PHY Lite for Parallel Interfaces IP core has the capability to calibrate each DQ/DQS pin separately. Successful per-bit calibration may improve the total DQS opening window. An example of the per-bit calibration (happening on the RX side) is shown in the following figures:
Read Deskew Algorithm
The read deskew algorithm illustrates how you can write a calibration algorithm that obtains an optimal data capture margin by center-aligning DQS to all DQ. This algorithm calibrates the following knobs on the input side of the PHY Lite for Parallel Interfaces IP core.
Knob | Unit Per Step |
---|---|
DQSen delay | 1 external interface clock cycle. |
DQSen phase | 1/128th VCO clock cycle. |
Input DQS | 1/256th VCO clock cycle. |
Input per-bit DQ | 1/256th VCO clock cycle. |
- DQSen Calibration
  - Sweep through DQSen (delay + phase) settings from min to max.

        for (cur_dly = PIN_DQS_EN_DLY_DLY_VAL_MIN; cur_dly <= PIN_DQS_EN_DLY_DLY_VAL_MAX; cur_dly++) {
            for (cur_phase = PIN_DQS_EN_PHASE_DLY_VAL_MIN; cur_phase <= PIN_DQS_EN_PHASE_DLY_VAL_MAX; cur_phase++) {
                // More code goes here
            }
        }

  - For each iteration, send 5 separate patterns on each pin to compare.
  - Find the passing window width.
  - Set the DQSen delay and DQSen phase to the center of the passing window.
- Per-bit DQ Deskew
  - Sweep each individual dq_input_delay to find both the right and left edges.
  - Set each per-bit DQ to its center ((left edge + right edge)/2).
- DQS Deskew
  - Sweep dqs_input_delay from high to low.
  - Find the passing window width and set DQS to the center.
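All three stages above share one step: given the pass/fail result of each swept setting, find the passing window and pick its center. A minimal sketch of that step follows. The type `pass_window_t` and the function `find_pass_window` are illustrative names, not part of the reference design, and the sketch assumes the sweep results have already been collected into an array with one entry per setting.

```c
#include <stddef.h>

/* Result of a window search; width == 0 means no setting passed. */
typedef struct {
    int left;    /* first passing setting of the widest window */
    int right;   /* last passing setting of the widest window  */
    int center;  /* (left + right) / 2                         */
    int width;   /* number of passing settings in the window   */
} pass_window_t;

/* pass[i] is nonzero when every test pattern matched at setting i.
 * Returns the widest contiguous run of passing settings. */
static pass_window_t find_pass_window(const unsigned char *pass, int n_settings)
{
    pass_window_t best = { -1, -1, -1, 0 };
    int run_start = -1;

    /* Iterate one step past the end so a run touching the last
     * setting is closed like any other run. */
    for (int i = 0; i <= n_settings; i++) {
        int ok = (i < n_settings) && pass[i];

        if (ok && run_start < 0)
            run_start = i;              /* a passing run begins */

        if (!ok && run_start >= 0) {    /* a passing run just ended */
            int width = i - run_start;
            if (width > best.width) {
                best.left = run_start;
                best.right = i - 1;
                best.width = width;
                best.center = (best.left + best.right) / 2;
            }
            run_start = -1;
        }
    }
    return best;
}
```

Choosing the widest run rather than the first one is a design choice of this sketch: it avoids centering on a short, spurious passing region at the edge of the sweep.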
Compiling the Reference Design
Follow these steps to set up and run the simulation reference design.
- Download the reference design files from Design Store and restore the design using Intel® Quartus® Prime software. For guidelines on downloading and installing the reference design files, refer to Getting Started with the Design Store in the related information.
- Open the reference design file (phylite_top.qpf) after successfully installing the design templates.
- From the Intel® Quartus® Prime software, open the dut_INPUT.qsys and dut_OUTPUT.qsys files. Make sure that the PHY Lite for Parallel Interfaces Intel® Arria® 10 FPGA IP has the same configuration, as shown in the following figures:
Figure 9. General Tab Configuration for DUT_INPUT Module
Figure 10. Group 0 Tab Configuration for DUT_INPUT Module
Figure 11. Group 1 Tab Configuration for DUT_INPUT Module
Figure 12. General Tab Configuration for DUT_OUTPUT Module
Figure 13. Group 0 Tab Configuration for DUT_OUTPUT Module
Figure 14. Group 1 Tab Configuration for DUT_OUTPUT Module
- From the Intel® Quartus® Prime software, click Processing > Start Compilation to compile the reference design.
Hardware Testing
Setting Up the Development Kit
- Set the Intel® Arria® 10 GX FPGA development kit switches to default position.
- Connect the HiLo loopback card on the HiLo memory interface.
- Connect the Intel® FPGA Download Cable to the Intel® Arria® 10 GX FPGA development kit and your host machine.
Figure 15. Intel® Arria® 10 GX FPGA Development Kit Board
- Click Tools > Programmer to program the <project directory>/phylite_top.sof file into the Intel® Arria® 10 GX FPGA development kit.
Generating Executable and Linking Format (.elf) Programming File
Follow the steps below to generate an executable and linking format (.elf) programming file. These steps are necessary if you would like to modify the phylite_dynamic_reconfiguration.c, phylite_dynamic_reconfiguration.h and hello_world.c files.
- In the Intel® Quartus® Prime software version 19.1, select Tools > Nios® II Software Build Tools for Eclipse.
Figure 16. Nios® II Software Build Tools for Eclipse
- Create a new workspace when the Select a workspace window prompt appears.
Figure 17. Create New Workspace
- Select File > New > Nios® II Application and BSP from Template in the Nios II - Eclipse window.
Figure 18. Nios® II Application and BSP from Template
- In the SOPC Information File name parameter, browse to the location of phylite_nios.sopcinfo file in your host machine. Click OK to select the file and Eclipse automatically loads all CPU settings.
The phylite_nios.sopcinfo is created when generating phylite_nios.qsys.
- In the Project name parameter, specify your desired project name.
- Choose Hello World as the project template.
- Click Finish to generate the project. The Intel® Quartus® Prime software creates a new directory named software in the specified project location.
Figure 19. Nios® II Application and BSP from Template Settings
- Replace the following files in your new software directory with the files from the <project directory>/software directory of the reference design.
- hello_world.c
- phylite_dynamic_reconfiguration.c
- phylite_dynamic_reconfiguration.h
- In the Nios II - Eclipse window, press F5 to refresh the window and reload the new files into the project.
- Click Project > Build Project.
- Make sure the <project_name>.elf file is generated in the new <project directory>/software/<project_name>/ directory.
Running the Hardware Reference Design
Follow the steps below to run dynamic calibration and start the data transfer for the hardware reference design.
Remove all other connected devices in the programming device list during JTAG connection setup in your operating system.
- Open two Nios® II Command Shell prompts on your host machine:
  - For Windows operating system: Go to the Intel® Quartus® Prime software installation directory on your host machine and double-click Nios® II Command Shell.bat to launch the command prompt window (command prompt A).
  - For Linux operating system: Go to the <Quartus software installation directory>/linux64/nios2ed directory and run nios2_command_shell.sh to launch the command prompt window (command prompt A).
  - Repeat this step to launch the second command shell (command prompt B).
Command prompt A displays the dynamic calibration result. Command prompt B is used to run Nios® II commands.
- In command prompt A, use the following command to run the Nios® II terminal application for result printouts.
nios2-terminal
- In command prompt B, go to the project top directory.
cd <project directory>
- In command prompt B, download the executable (<project_name>.elf) file into the FPGA and start the dynamic calibration process with the following command:
nios2-download -r -g software/<project_name>/<project_name>.elf
or
nios2-download -r -g software/phylite_top/a10_devkit.elf
You may observe the passing dynamic calibration result displayed in command prompt A.
- When the Nios® II instruction memory is cleaned and calibration is done, run the following command in command prompt B to reset the system, start the random data transfer and capture internal signals.
quartus_stp -t issp.tcl top.qpf 1 1
Note: Sent and received data are displayed in command prompt B after running the command.
Results
The hardware reference design provides:
- Dynamic calibration result
- Random data transfer result
Dynamic Calibration Result
The figures below show the per-bit calibration result log on command prompt A.
Random Data Transfer Result
Start the random data transfer by using this command in command prompt B:
quartus_stp -t issp.tcl <project.qpf> 1 1
The output in command prompt B shows the following information:
- The number of words being transferred.
- The expected data value.
- The received data value.
- The passing/failing status of the test.
Document Revision History for AN 887: PHY Lite for Parallel Interfaces Reference Design with Dynamic Reconfiguration for Intel Arria 10 Devices
Document Version | Changes |
---|---|
2019.05.24 | Initial release. |
Appendix A: HiLo Loopback Card Pin Connections
Appendix B: Retrieving Lane and Pin Information
The code below runs on the Nios® II processor and reads out the parameter table, as shown in Figure 27.

    #include <stdio.h>

    /* IORD32() is assumed to be the 32-bit read macro supplied by the
     * reference design sources; it is not defined in this listing. */
    #define BASE_ADDR    0xE000
    #define PT_SIZE_PTR  0x0000014
    #define ADDR_OFFSET  0x0000018

    void Read_Param_table()
    {
        unsigned int addr_offset;
        unsigned int size;
        unsigned int value;
        unsigned int addr;

        addr_offset = IORD32(BASE_ADDR + ADDR_OFFSET);
        printf("Reading Addr Offset from Param Table: %08x\n\n", addr_offset);

        size = IORD32(BASE_ADDR + PT_SIZE_PTR);
        printf("Param Table size is %08x:\n", size);

        printf("\nParam Table:\n");
        /* Dump each 32-bit word from offset 0 up to and including offset size. */
        for (addr = 0x0; addr < size + 1; addr += 4) {
            value = IORD32(BASE_ADDR + addr);
            printf("%03x\t0x%08x\n", addr, value);
        }
    }
Appendix C: Decoding Parameter Table
- The parameter table starts at address 24’hE000.
- To determine the size of the parameter table, read the value at the size pointer address. For example:

      addr = 24’hE000 + 24’h14
      value at addr = 0xA4

  The size of the parameter table is 0xA4, which means that information about the PHY Lite for Parallel Interfaces IP cores is spread from address 24’hE000 to 24’hE0A4.
- To determine the address offset of the PHY Lite for Parallel Interfaces IP cores in the parameter table:
  - There are two PHY Lite for Parallel Interfaces IP cores in the parameter table at the address offset. For example:

        24’hE018 = 84000044
        24’hE01C = 85000074

    where the 0x44 address offset points to PHY Lite for Parallel Interfaces IP core 1 and the 0x74 address offset points to PHY Lite for Parallel Interfaces IP core 2.
  - 4 and 5 (the second hexadecimal digit of each value) are the PHY Lite for Parallel Interfaces IP core interface IDs.
- To determine the number of groups in the first PHY Lite for Parallel Interfaces IP core interface:

      24’hE048 = 00000002

  The value 2 indicates that there are two groups.
- To determine the group information (for example, the number of lanes and pins in a PHY Lite for Parallel Interfaces IP core interface per group):

      24’hE04C = 00000606

  where num_lanes[7:6],num_pins[5:0] means:
  - lanes = 1 and pins = 6 of Group 0.
  - lanes = 1 and pins = 6 of Group 1.
- To determine the lane and pin address offsets of each group:

      24’hE054 = 00590068

  where lane_off[31:16],pin_off[15:0] means:
  - lane off = 0x58 and pin off = 0x5C of Group 0.
  - lane off = 0x59 and pin off = 0x68 of Group 1.
- To determine the lane address of each group:

      24’hE058 = 00003930

  where:
  - Lane address of Group 0 = 0x30
  - Lane address of Group 1 = 0x39
- To determine the pin address at 24’hE05C to 24’hE064 for Group 0:

      24’hE05C = 30E530E4 (for Group 0)

  where:
  - DQS_P = Pin 4; DQS_N = Pin 5
  - DQ[0] = Pin 6; DQ[1] = Pin 7
  - DQ[2] = Pin 0; DQ[3] = Pin 1

      24’hE068 = 39E539E4 (for Group 1)

  where:
  - DQS_P = Pin 4; DQS_N = Pin 5
  - DQ[4] = Pin 9; DQ[5] = Pin 6
  - DQ[6] = Pin 3; DQ[7] = Pin 8

Note: {lane_addr[7:0],0xE,pin[3:0]} for strobe and {lane_addr[7:0],0xF,pin[3:0]} for data.
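The field extraction used in this appendix can be sketched in C as shown below. The bit positions are inferred from the example values and the note above, and they are assumptions where the document does not state them explicitly: the interface ID is taken from bits [27:24] and the offset from bits [15:0] of the interface word, each group is assumed to occupy one byte of the group information and lane address words, and pin entries are assumed to be packed two per 32-bit word with the low half first. All type and function names are illustrative.

```c
#include <stdint.h>

typedef struct { unsigned id; unsigned offset; } pt_interface_t;
typedef struct { unsigned lane_addr; unsigned kind; unsigned pin; } pt_pin_t;

/* Interface word, e.g. 0x84000044: ID assumed in [27:24], offset in [15:0]. */
static pt_interface_t pt_decode_interface(uint32_t word)
{
    pt_interface_t r;
    r.id = (word >> 24) & 0xFu;
    r.offset = word & 0xFFFFu;
    return r;
}

/* Group information word: one byte per group (assumed), num_pins in [5:0]. */
static unsigned pt_num_pins(uint32_t word, unsigned group)
{
    return (word >> (8u * group)) & 0x3Fu;
}

/* Offsets word: lane_off[31:16], pin_off[15:0]. */
static unsigned pt_lane_off(uint32_t word) { return (word >> 16) & 0xFFFFu; }
static unsigned pt_pin_off(uint32_t word)  { return word & 0xFFFFu; }

/* Lane address word: one byte per group (assumed), Group 0 in the low byte. */
static unsigned pt_lane_addr(uint32_t word, unsigned group)
{
    return (word >> (8u * group)) & 0xFFu;
}

/* Select one 16-bit pin entry from a 32-bit word; index 0 is the low half. */
static uint16_t pt_pin_entry(uint32_t word, unsigned index)
{
    return (uint16_t)((word >> (16u * index)) & 0xFFFFu);
}

/* Pin entry {lane_addr[7:0], kind[3:0], pin[3:0]}:
 * kind is 0xE for a strobe pin and 0xF for a data pin. */
static pt_pin_t pt_decode_pin(uint16_t entry)
{
    pt_pin_t r;
    r.lane_addr = (entry >> 8) & 0xFFu;
    r.kind = (entry >> 4) & 0xFu;
    r.pin = entry & 0xFu;
    return r;
}
```

For example, `pt_decode_pin(pt_pin_entry(0x30E530E4, 0))` yields lane address 0x30, kind 0xE, and pin 4, which matches DQS_P = Pin 4 for Group 0 in the list above.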