Angelo Jacobo

Getting Started with UberDDR3 (Part 1) - Post #2

In this blog post, we will explore UberDDR3, an open-source DDR3 controller. As mentioned in my previous post, UberDDR3: An Open-Source DDR3 Controller, the aim of this DDR3 controller is to provide a high-performance, easy-to-use, open-source IP for anyone wanting to use DDR3 RAM on their FPGA boards.


This is Part 1, where we will focus on importing UberDDR3 into a Vivado project, running the testbench simulation and passing it, then exploring the waveform and text report outputs to get an intuitive understanding of what UberDDR3 is doing underneath.


The only prerequisite to follow along is the Vivado tool. So, let's get started!


I. Setting Up a New Vivado Project with UberDDR3


1. Retrieve the UberDDR3 repository by git cloning:

Or, if you do not have git installed, go to my GitHub repo:

then click the <> Code button and then Download Zip. Unzip this file after downloading.


2. Create a new Vivado project (as shown below, I named it uberddr3_test). Then, on the Add Sources page, click Add Directories and choose the UberDDR3 folder we just cloned.


3. Skip the Add Constraints page, as we will add constraints later on. Then, on the Choose Board page, choose ArtyS7-50 for now, since that is the FPGA board we will use in Part 2 of this tutorial. If you have a different board, feel free to choose it instead; I will add disclaimers later on for instructions that only apply to the ArtyS7-50.


II. Running the UberDDR3 Testbench Simulation

The UberDDR3 repository already includes simulation files, such as the Micron DDR3 model file, to run a comprehensive testbench simulation. It is recommended to run the testbench simulation to ensure everything is working properly. Essentially, this testbench simulates interaction with a real DDR3 RAM via the Micron DDR3 model file. The test will run through the reset sequence, calibration, internal read-write test, and then an external read-write test. The test concludes with a report detailing the number of successful and failed transactions. Let us now go through running this testbench simulation:


1. Under the hierarchy of Simulation Sources, choose the file ddr3_dimm_micron_sim as the top-level simulation module:


2. Click Run Simulation > Run Behavioral Simulation. This will compile and elaborate the design, then simulate for 1000ns. Once this is done, the simulation window should be displayed.


3. We will explore the simulation output waveform to show the relevant signals and details of what is happening on the testbench. However, the default waveform configuration is not a good starting point to explore the design, as it does not contain the signals we actually want to see. Click on File > Simulation Waveform > Open Configuration. Choose the ddr3_dimm_micron_sim_behav.wcfg file located inside UberDDR3/testbench. The waveform window will then change to use the signals from the prepared waveform configuration file.


4. Click the Restart button and then Run All to run the simulation until the end. This will take a while. For reference, on my laptop (AMD Ryzen-7 with 16GB RAM), it took 8 minutes. In the meantime, feel free to read my blog introducing the UberDDR3 project, if you haven't already 😁.

III. Analyzing the Simulation Results

By default, the testbench simulation is set for x16 DDR3 (similar to ArtyS7). The controller clock is set at 100MHz, and the DDR3 clock at 400MHz, which is four times the controller clock since UberDDR3 is a 4:1 memory controller. Let's now inspect the waveform and see what UberDDR3 is doing underneath:


1. Clocks and Reset:

There are 4 clocks used in the design: i_controller_clk is 100MHz (10ns period), i_ddr3_clk is 400MHz (2.5ns period), i_ddr3_clk_90 is a 90° phase-shifted version of i_ddr3_clk, and i_ref_clk is 200MHz (5ns period). The testbench resets UberDDR3 by lowering the active-low i_rst_n.
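As a quick sanity check, the clock periods and the 4:1 ratio follow directly from the frequencies. The snippet below is only an illustrative Python sketch; the dictionary and helper names are my own and are not part of the UberDDR3 repository.

```python
# Clock frequencies (MHz) quoted for the default testbench setup.
CLOCKS_MHZ = {
    "i_controller_clk": 100.0,
    "i_ddr3_clk": 400.0,
    "i_ref_clk": 200.0,
}

def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz

periods = {name: period_ns(freq) for name, freq in CLOCKS_MHZ.items()}

# UberDDR3 is a 4:1 controller: four DDR3 clock cycles per controller clock cycle.
ratio = CLOCKS_MHZ["i_ddr3_clk"] / CLOCKS_MHZ["i_controller_clk"]

# A 90 degree phase shift is a quarter of the DDR3 clock period.
clk90_offset_ns = periods["i_ddr3_clk"] / 4

for name, period in periods.items():
    print(f"{name}: {period} ns")
print(f"DDR3-to-controller clock ratio: {ratio:.0f}:1")
print(f"i_ddr3_clk_90 is offset from i_ddr3_clk by {clk90_offset_ns} ns")
```

If you change the testbench clocks, re-running this gives you the periods to look for when placing markers on the waveform.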


2. Read Calibration:

The first part of calibration is read calibration, as hinted by the calibration_state signal. As shown on the lane signal, read calibration is first done on lane 0 and then on lane 1. There are two lanes since UberDDR3 treats the x16 DDR3 as two x8 lanes, and each lane has its own read calibration.


The read calibration uses the MPR (Multi Purpose Register) feature of DDR3 to read out a predefined data pattern {0,1,0,1,0,1,0,1} from DDR3 RAM which will be used by the controller to align the incoming DQS and DQ to the DDR3 clock. Notice below that at the end of read calibration, both lanes of idelay_dqs are delayed relative to the io_ddr3_dqs and are now aligned with the i_ddr3_clk.
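To build some intuition for what "delaying DQS until it is aligned with the clock" means, here is a deliberately simplified Python toy model. It is not how UberDDR3 implements read calibration: the tap count, the skew values, and the function name are all made up for illustration. It only shows the idea of sweeping a delay, per lane, until the strobe edge lines up with the clock edge.

```python
TAPS_PER_CLOCK = 32  # hypothetical number of delay taps spanning one DDR3 clock period

def calibrate_lane(dqs_skew_taps: int) -> int:
    """Return the smallest delay (in taps) that lines the DQS edge up with the clock edge."""
    for delay in range(TAPS_PER_CLOCK):
        # The delayed edge lands on a clock edge when the total offset wraps to zero.
        if (dqs_skew_taps + delay) % TAPS_PER_CLOCK == 0:
            return delay
    raise RuntimeError("no aligning delay found")

# Made-up arrival skews for the two x8 lanes of an x16 device.
lane_skews = {0: 5, 1: 11}

lane_delays = {}
for lane in sorted(lane_skews):  # lane 0 first, then lane 1, as in the waveform
    lane_delays[lane] = calibrate_lane(lane_skews[lane])
    print(f"lane {lane}: skew {lane_skews[lane]} taps -> delay {lane_delays[lane]} taps")
```

Because each lane can arrive with a different skew, each lane ends up with its own delay value, which is why the two lanes are calibrated separately.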


3. Write Calibration (if ODELAY is supported):

After read calibration comes write calibration, if the target FPGA supports ODELAY that is... However, since this simulation models an ArtyS7 as the target board, which does not have ODELAY, UberDDR3 will skip write calibration.


 

3.1 Why is write calibration skipped?

Just to clarify, UberDDR3 does support write calibration, but only for those FPGAs that have ODELAY primitives.


To provide some context, an ODELAY primitive applies a tap-based delay to an outgoing signal, essentially the reverse of the IDELAY primitive, which delays incoming signals. However, AMD 7-series FPGAs only provide ODELAY on HP (High Performance) banks, such as those on the Kintex-7. Unfortunately, boards like the ArtyS7, which uses a Spartan-7, only have HR (High Range) banks connected to the DDR3 RAM.


Does this mean that Vivado MIG also does not support write calibration for FPGAs without HP banks? Well, it can support write calibration, even if there are no HP banks, since it uses the PHASER primitives. However, these are undocumented primitives, so we cannot use them in this open-source DDR3 controller.


Would this be a problem for UberDDR3? Write calibration is relevant for those FPGA boards which have a long PCB trace between the DDR3 RAM and the FPGA chip. This long trace causes skew between the DDR3 clock and the DQ/DQS signal, so write calibration is used to delay and align the DQ/DQS signal with the DDR3 clock. Thus, if there is only minimal trace length, then write calibration would not be needed. A good example of this is the ArtyS7 FPGA board.


So no need to worry! Also, most low-end FPGA boards with DDR3 RAM have a short PCB trace between the DDR3 chip and the FPGA chip, so UberDDR3 skipping write calibration (for FPGAs without HP banks) is unlikely to be a problem.

 

4. Write Alignment:

To continue, since write calibration is skipped, it will proceed to write alignment. Essentially, the controller issues two back-to-back write requests and then reads them afterwards. Before write alignment, the incoming DQ/DQS signals are already aligned with the DDR3 clock, but that does not mean they are also already aligned with the proper DDR3 clock cycle.

As shown below, the controller issues write data {c1,51,ad,d0,8c,29,77...}. BUT when it reads out the same data it becomes {ad,d0,8c,29,77...}. Weird! Notice how the data written to the DDR3 RAM starts at {ad} and NOT on {c1} which we might expect.

In simple words, the controller issued the write data too soon such that the {c1,51} is skipped and the DDR3 RAM only starts receiving the data on {ad}. The write alignment handles this problem by simply delaying the outgoing data to match the timing of when the DDR3 RAM is expecting the data.
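The comparison behind write alignment can be sketched in a few lines of Python. This is only an illustration of the idea (compare what was written with what was read back and count how many leading words went missing), using just the words visible in the waveform above; the function name is hypothetical and the real controller does this in hardware.

```python
def find_write_offset(written, read_back):
    """Return how many leading words of `written` are missing from `read_back`, or None."""
    for skipped in range(len(written)):
        expected = written[skipped:]
        n = min(len(expected), len(read_back))
        if n and expected[:n] == read_back[:n]:
            return skipped
    return None

# Only the words visible in the waveform; the real sequence continues further.
written = [0xC1, 0x51, 0xAD, 0xD0, 0x8C, 0x29, 0x77]
read_back = [0xAD, 0xD0, 0x8C, 0x29, 0x77]

skipped = find_write_offset(written, read_back)
print(f"The DDR3 RAM missed the first {skipped} word(s) of the write data")
```

Here the result says the outgoing data went out two words too early, so the fix is to delay the write data until it lines up with when the DDR3 RAM starts sampling.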


5. Internal Test Sequence and Done Calibration:

After write alignment, the refresh sequence and calibration are already completed as required by the DDR3 specification. However, UberDDR3 supports an internal test sequence which runs after calibration. As shown below, this internal test includes a burst write-read, followed by random write-read, and then alternating write-read sequences.

Since this is just a simulation, the address covered by the internal test is limited to shorten the simulation time, but when UberDDR3 is implemented on actual hardware, this internal test will attempt to cover all addresses of your DDR3 RAM.


Notice also how DQ/DQS periodically goes idle (becomes high-impedance, which is colored blue on the waveform); these gaps are where the refresh sequence occurs.


Basically, DDR3 SDRAM requires refresh cycles at an average interval of 7.8us to restore the charge on the capacitors inside the DDR3 RAM (read this article on the basics of DRAM).


Looking at the markers on the waveform above:

34.66us - 26.83us = 7.83us

So it closely matches the DDR3 spec requirement of a 7.8us average refresh interval.
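For the curious, the 7.8us figure itself can be derived from the standard DDR3 refresh numbers: every row must be refreshed within a 64 ms window, spread over 8192 refresh commands (at normal operating temperature). These two constants come from the DDR3 standard, not from this post, so treat the snippet below as a hedged back-of-the-envelope check rather than anything taken from the UberDDR3 source.

```python
RETENTION_MS = 64.0      # refresh window at normal temperature (assumed, from the DDR3 standard)
REFRESH_COMMANDS = 8192  # refresh commands per window (assumed, from the DDR3 standard)

# Average interval between refresh commands, in microseconds.
t_refi_us = RETENTION_MS * 1000.0 / REFRESH_COMMANDS

# Marker positions read off the waveform, in microseconds.
marker_a_us, marker_b_us = 26.83, 34.66
measured_us = round(marker_b_us - marker_a_us, 2)

print(f"nominal refresh interval : {t_refi_us} us")
print(f"measured gap on waveform : {measured_us} us")
print(f"difference               : {abs(measured_us - t_refi_us) * 1000:.1f} ns")
```

The measured gap depends on where the markers were placed by hand, so a small difference from the nominal value is expected.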


This refreshing of capacitors is the main reason why DDR3 is considered "volatile," as the charges inside the capacitors can leak over time and change the value stored unless a refresh sequence is performed.


To explore more on the individual read/write transactions, let us try zooming in on the random read operation as shown below. The command_used signal hints at the DDR3 command issued. Notice how, after the first RD (read) command, the DDR3 RAM returns the read data (shown on io_ddr3_dq as 16'h8787) and toggles the io_ddr3_dqs.


Since this is a random read operation, each read will require a precharge and activate. As shown above, after the first read command, it then proceeds to PRE (precharge), then ACT (activate), before again performing RD (read), where the next data read is 16'h8888.


Let us now zoom in on the alternating write-read operation as shown below. Notice the WR (write) command where the controller issued 16'ha2a2 on the io_ddr3_dq. It then issued an RD (read) command, wherein the DDR3 RAM returns 16'ha2a2. The data it writes matches what it reads!

The internal test includes monitoring logic that checks whether the data read matches what is expected, as shown below by the signals correct_read_data and wrong_read_data. In a successful simulation like the one shown, there are 349 write operations, so there are 349 correct read data instances and 0 wrong read data instances.

If UberDDR3 is implemented on faulty hardware and wrong_read_data becomes non-zero, it will repeat the entire calibration sequence. The assumption is that there was a glitch in the calibration sequence causing incorrect data to be read from the DDR3 RAM, so repeating the entire calibration sequence might solve the problem. This implies that if the hardware is completely faulty, UberDDR3 will get stuck repeating the calibration sequence over and over again.


However, if the internal test is successful, then the calibration will officially end and proceed to DONE CALIBRATION as shown above. At this point, the user can interact with the Wishbone interface.
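The retry behaviour described above can be modelled with a short Python sketch. Everything here is hypothetical (the function names, the fake test results, and especially the attempt cap, which exists only so the script terminates); as described above, the real controller just keeps repeating calibration.

```python
def calibrate(run_internal_test, max_attempts=5):
    """Repeat calibration until the internal test reports no wrong reads.

    `run_internal_test(attempt)` must return (correct_reads, wrong_reads).
    Returns the attempt number that succeeded, or None if the cap is hit.
    """
    for attempt in range(1, max_attempts + 1):
        correct, wrong = run_internal_test(attempt)
        if wrong == 0:
            return attempt  # corresponds to reaching DONE CALIBRATION
    return None  # the real controller has no cap and would keep retrying

def glitchy_first_pass(attempt):
    # Hypothetical hardware: a glitch corrupts 3 reads on the first pass only.
    return (346, 3) if attempt == 1 else (349, 0)

def dead_hardware(attempt):
    # Hypothetical hardware that never returns correct data.
    return (0, 349)

print("glitchy board calibrated on attempt:", calibrate(glitchy_first_pass))
print("dead board calibrated on attempt:", calibrate(dead_hardware))
```

The first case recovers on the second pass, while the second case never leaves the loop, which is the "stuck repeating calibration" situation.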


6. Testbench Issues Requests via Wishbone

After calibration officially ends, the testbench will interact with UberDDR3 via the Wishbone Interface and issue write and read requests. The test sequence is very similar to the internal test, but this is now, obviously, an external test. As shown below, the testbench issues requests and holds the i_wb_stb (strobe) high, and UberDDR3 is also constantly returning high on the o_wb_ack (acknowledge).


Zooming in on the Wishbone requests, as shown below, the testbench issues a burst write (i_wb_we is high) from address 0 (i_wb_addr starts from zero then increments by one) with the data (i_wb_data) derived from a pseudo-random generator. After the first write request, the acknowledge (o_wb_ack) will go high after some time and remain high since the request is a burst write and there is no need to stall.

7. Text Report:

Let us now proceed to the text report on the Tcl console. You will see something like this:

[10000 ps] RD @ (0, 840) -> [10000 ps] RD @ (0, 848) -> [10000 ps] RD @ (0, 856) -> [10000 ps] RD @ (0, 864) -> [10000 ps] RD @ (0, 872) -> ....

The format is [time_delay] command @ (bank, address), so:

[10000 ps] RD @ (0, 840)

means a 10000 ps (or 10 ns) delay before a read command is issued at address 840 of bank 0. The next read commands are at addresses 848, 856, 864, and so on. This is a burst read, so the address increments by 8 (each burst in UberDDR3 consists of 8 words). Notice how consecutive read commands are 10000 ps (10 ns) apart. Since this simulation has a controller clock of 100 MHz (10 ns clock period), one read command is issued on every controller clock cycle, so there are no gaps between sequential read commands, resulting in very high throughput.
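If you want to check a long report without reading it by eye, a small script can do it. The Python sketch below is my own helper, not something shipped with UberDDR3; it assumes the console lines follow exactly the [time_delay] command @ (bank, address) format shown above and uses the sample line from this post as input.

```python
import re

# One report entry: [<delay> ps] <command> @ (<bank>, <address>)
ENTRY = re.compile(r"\[(\d+) ps\]\s+(\w+)\s+@\s+\((\d+),\s*(\d+)\)")

def parse_report(line):
    """Return a list of (delay_ps, command, bank, address) tuples found in `line`."""
    return [(int(delay), cmd, int(bank), int(addr))
            for delay, cmd, bank, addr in ENTRY.findall(line)]

sample = ("[10000 ps] RD @ (0, 840) -> [10000 ps] RD @ (0, 848) -> "
          "[10000 ps] RD @ (0, 856) -> [10000 ps] RD @ (0, 864) -> "
          "[10000 ps] RD @ (0, 872) ->")

entries = parse_report(sample)
CONTROLLER_CLK_PS = 10_000  # 100 MHz controller clock

# Back-to-back means every command is exactly one controller clock after the previous one.
back_to_back = all(delay == CONTROLLER_CLK_PS for delay, *_ in entries)
# Address difference between consecutive commands.
strides = {b[3] - a[3] for a, b in zip(entries, entries[1:])}

print(f"{len(entries)} commands, back-to-back: {back_to_back}, address strides: {strides}")
```

You could paste a longer chunk of the console output into `sample` to confirm that a whole burst stays gap-free.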


The text report ends with a summary as shown on the right. The first part displays the summary of the test sequence from the testbench (external test). There were 4608 writes and reads, 4 of which failed. There were 4 failures because we actually injected 4 errors, so it matches. Thus, the external test is successful!


The second part shows the summary of the internal test, which includes the same signals we checked on the waveform earlier. There were 349 correct read data instances and 0 wrong read data instances. So, the internal test is successful!


IV. Conclusion

In this blog post, we've taken a detailed look at how to import UberDDR3 into a Vivado project, run simulations, and gain a deeper understanding of how UberDDR3 operates beneath the surface to interface with DDR3 RAM. I hope this gives you some confidence to play with this open-source DDR3 controller IP and possibly use it in your next FPGA project.


That wraps up this post. Catch you in the next blog post for part 2 where we will implement UberDDR3 on hardware!

3 Comments


Justin C Das
Jul 13

Yes, now the simulation is moving. Thanks.

Justin C Das
Jul 07

I have run the simulation. It seems the waveform is not going beyond simulation time 0. What could be the issue?

Log

Vivado Simulator v2024.1

Copyright 1986-2022 Xilinx, Inc. All Rights Reserved.

Copyright 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.

Running: C:/Xilinx/Vivado/2024.1/bin/unwrapped/win64.o/xelab.exe --incr --debug typical --relax --mt 2 -L xil_defaultlib -L uvm -L unisims_ver -L unimacro_ver -L secureip --snapshot ddr3_dimm_micron_sim_behav xil_defaultlib.ddr3_dimm_micron_sim xil_defaultlib.glbl -log elaborate.log

Using 2 slave threads.

Starting static elaboration

Pass Through NonSizing Optimizer

WARNING: [VRFC 10-9024] illegal argument of type reg [packed dim count:1] in math function 'ceil()'; expected real type [C:/Users/JustinDas/Desktop/TestBed/Arty/DDR3/UberDDR3/rtl/ddr3_controller.v:377]

WARNING: [VRFC 10-3091] actual bit length 32 differs from formal bit length 1 for port 'SHIFTIN1' [C:/Users/JustinDas/Desktop/TestBed/Arty/DDR3/UberDDR3/rtl/ddr3_phy.v:202]

WARNING: [VRFC 10-3091] actual bit length 32 differs from…

Angelo Jacobo
Jul 08

Hi, the Micron DDR3 model file tries to create a temporary file to store the data written to it. I'm successfully running this simulation in Vivado 2022.1 on Ubuntu, so I'm not sure whether the cause is a file access issue, since yours is running on Windows.


But try the procedure below before running the simulation; it forces the simulation not to use a temporary file for storing data:

1. Open the file: Simulation Sources > ddr3_dimm_micron_sim > ddr_0

2. Comment out the "`define MAX_MEM" on line 1. It should look as shown below:


3. Open the file: Simulation Sources > Verilog Header > 8192Mb_ddr3_parameters.vh

4. Change the value of MEM_BITS to…

