Guess what? UberDDR3 just levelled up with a new feature: Error Correction Code (ECC)! Imagine your data with a safety net, catching and fixing errors before they cause trouble. This is a game-changer for your next projects that might need ultra-reliable DDR3 memory.
And what's more, you can enable UberDDR3's ECC even with non-ECC DRAM. Intrigued? Stick around as we dive into the details of this new error correction feature of UberDDR3.
I. What is ECC and SECDED?
ECC DRAM, short for Error Correction Code DRAM, is a specialized type of memory used in servers and workstations where data integrity must be prioritized. Unlike most DRAM modules, which have the usual eight memory chips, ECC DRAM modules add a 9th memory chip to store the ECC bits, as shown below:
According to comparative studies by Puget Systems, ECC RAM has a lower failure rate (approximately 0.24%) than standard non-ECC RAM (about 1%). Note that these numbers are much lower in the actual field.
SECDED (Single-bit Error Correction and Double-bit Error Detection) is a type of error correction that can correct single-bit errors and detect double-bit errors. This provides end-to-end protection against single-bit errors that can occur anywhere in the memory subsystem, from the controller to the memory.
As shown below, non-ECC DRAMs are susceptible to external noise (electrical noise, EMI, temperature variation, cosmic rays, etc.) that can cause bit flips, while ECC DRAMs are able to detect and correct such single-bit errors:
II. What if you want ECC in your non-ECC system?
ECC memory is designed for enterprise-grade workloads, and most consumer PC motherboards will either not support ECC RAM or will run it BUT without its ECC function. To truly benefit from ECC memory, you’ll need a workstation or server-level motherboard. (referenced here)
But WHAT IF you still want to have ECC functionality for your non-server level system? What if you could turn your standard non-ECC DDR3 RAM into ECC-capable memory?
UberDDR3 to the rescue!
UberDDR3's new ECC feature supports even non-ECC DDR3 RAMs. There are 3 options for ECC and in this blog update, we will elaborate on the first two:
ECC_ENABLE = 1 (Burst-granular Side-Band ECC)
ECC_ENABLE = 2 (Word-granular Side-Band ECC)
As you can see, simply modifying the value of the ECC_ENABLE parameter is all that’s needed to add ECC to UberDDR3.
Part 2 of this blog series will elaborate on the third option: Inline ECC (inspired by LPDDR4!).
III. How does ECC work?
ECC basically uses a Hamming code, which relies on redundancy to detect and correct errors. It works by adding extra parity bits to blocks of data before transmission.
The receiver side will then recalculate the values of parity bits, perform parity checks, and then combine them to form the checking number. The code is error-free if the checking number is zero. Otherwise, an error has occurred at the bit position corresponding to the checking number.
For example, in the illustration below, the sender wants to transmit the data 7'b1011001 (D7 to D1). Before transmitting, it encodes the data with parity bits (P4 to P1). But along the path to the receiver, bit 7 was flipped (from 1 to 0, as marked in red below)! Maybe due to electrical noise, EMI, temperature variation, or maybe a cosmic ray.
Now, the receiver performs the parity checks and forms the checking number as shown below and, behold, it shows 4'b0111 (decimal 7), which means bit 7 is the erroneous bit! Bit 7 is simply flipped back to 1, and the receiver now has the correct data.
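The same example can be reproduced in a few lines of Python. This is only a sketch under stated assumptions: it uses the textbook Hamming layout (parity bits at positions 1, 2, 4 and 8 of the 11-bit encoded word, even parity) and treats "bit 7" as position 7 of that encoded word. The function names are made up for illustration, and the layout in the original figure may differ.

```python
def hamming_encode(data_bits):
    """Encode data_bits (data_bits[0] is D1) into a codeword list.

    The returned list is indexed by bit position 1..n; index 0 is unused.
    """
    k = len(data_bits)
    m = 0
    while (1 << m) < k + m + 1:          # smallest m with 2**m >= k + m + 1
        m += 1
    n = k + m
    code = [0] * (n + 1)
    remaining = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):              # not a power of two: data position
            code[pos] = next(remaining)
    for i in range(m):
        p = 1 << i                       # parity bit lives at position 2**i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p and pos != p:
                parity ^= code[pos]
        code[p] = parity                 # even parity over the covered positions
    return code


def checking_number(code):
    """XOR of the positions of all set bits; zero means no error detected."""
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    return syndrome


data = [1, 0, 0, 1, 1, 0, 1]             # D1..D7 of 7'b1011001
sent = hamming_encode(data)
received = list(sent)
received[7] ^= 1                         # bit 7 flips from 1 to 0 in transit
syndrome = checking_number(received)
print(format(syndrome, "04b"))           # prints 0111
received[syndrome] ^= 1                  # flip the erroneous bit back
```

After the last line, received matches sent again, which is exactly the correction step described above.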
IV. ECC Implementation
Do I have to design my own RTL for SECDED ECC from scratch? No need, since openly available designs already exist. I have reused this open-source Hamming-ECC project. Among the open-source ECC projects, this design stands out as the most configurable.
Briefly, this design comes in two parts: an encoder and a decoder.
The encoder takes the input d_i of K bits (information bits) and outputs q_o, the encoded data of n+1 bits (n = K + m, where K is the number of information bits and m the number of parity bits). Notice how K is a parameter, which makes this design adaptable to any data width!
The decoder reverses the process: it takes the input d_i of n+1 bits (n = K + m, where K is the number of information bits and m the number of parity bits) and outputs q_o, the decoded data of K bits (information bits). It also outputs the flags sb_err_o for single-bit errors and db_err_o for double-bit errors.
There is also an option to add latency to the design, which requires a clock input. This approach registers the inputs and outputs, effectively isolating the combinational logic of the decoder from the user interface, thereby simplifying the timing requirements at higher frequencies.
Below is a simple illustration of the flow of data from the encoder to the DRAM and then to the decoder. Notice how the data received by the decoder from the DRAM has a single flipped bit but is still corrected (while sb_err_o is asserted):
IV.I Formal Verification of Hamming-ECC project
Before we use this open-source ECC design, we need to first verify that it works and has no bugs. The repository itself includes a self-checking testbench, which seems quite comprehensive in my opinion, as it tests 1-bit and 2-bit flips both sequentially and randomly, as shown in the testbench snippet on the right.
However, perhaps due to trust issues, I also want to perform a separate verification on my side. I have decided to use formal verification, as it will likely be quite straightforward.
The beauty of formal verification lies in its simplicity: instead of creating a testbench, you define formal assertions that outline what the design should and should not do—essentially modelling the "contract" of the design. The formal engine then rigorously checks all possible inputs against these assertions to ensure the design behaves correctly. For a deeper dive into formal verification, check out the ZipCPU blog (Formal Verification Courseware or maybe this blog).
The formal assertions are all in the ecc_formal.v file.
First, what is the contract or rule of the ECC?
If there is zero or one corrupted bit, then the information data received by the encoder should equal the decoder's output data. sb_err_o should stay deasserted if there is no corrupted bit and should assert if there is exactly one. db_err_o should be deasserted in both cases.
If there are two corrupted bits, then db_err_o should be asserted while sb_err_o is deasserted.
This design contract will be modeled through assertions as shown below:
The engine is free to choose the values of the information input data, the number of corrupted bits, and which specific bit will be corrupted. This is achieved by specifying the attributes of these variables as (*anyseq*) (more information here) :
That's basically it, we now have the assertions to formally verify this ECC design. Below is a simple illustration where the ones in violet represent the added formal properties:
The formal verification can be run via the open-source formal verification tool, SymbiYosys (more information here). If it is installed on your system, simply run ./run_compile.sh in the main directory of the UberDDR3 repository to perform formal verification for this ECC design.
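For intuition, the same contract can be brute-forced in software on a tiny code. The sketch below is a Python model of a generic extended Hamming (SECDED) code with K = 4. It is not the Hamming-ECC RTL and not the contents of ecc_formal.v, and its bit layout (overall parity at index 0, Hamming positions 1..n) is an assumption. A formal engine explores the input space symbolically, whereas this loop simply enumerates every data value and every 0-, 1- and 2-bit corruption.

```python
from itertools import combinations

K = 4  # information bits; kept small so every case can be enumerated


def encode(data):
    """Return the encoded bits: index 0 is the overall parity, 1..n the Hamming code."""
    m = 0
    while (1 << m) < K + m + 1:
        m += 1
    n = K + m
    code = [0] * (n + 1)
    bit = 0
    for pos in range(1, n + 1):
        if pos & (pos - 1):                  # not a power of two: data position
            code[pos] = (data >> bit) & 1
            bit += 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos
    for i in range(m):                       # parity bits cancel the data syndrome
        code[1 << i] = (syndrome >> i) & 1
    code[0] = sum(code) & 1                  # overall parity makes the total even
    return code


def decode(code):
    """Return (data, sb_err, db_err) for a possibly corrupted codeword."""
    code = list(code)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    overall_odd = sum(code) & 1
    sb_err = bool(overall_odd)               # odd overall parity: one bit flipped
    db_err = bool(syndrome) and not overall_odd
    if sb_err and syndrome:
        code[syndrome] ^= 1                  # correct the single flipped bit
    data, bit = 0, 0
    for pos in range(1, len(code)):
        if pos & (pos - 1):
            data |= code[pos] << bit
            bit += 1
    return data, sb_err, db_err


def check_contract():
    """Exhaustively check the SECDED contract for all data and all 0/1/2-bit flips."""
    width = len(encode(0))
    for data in range(1 << K):
        clean = encode(data)
        assert decode(clean) == (data, False, False)
        for i in range(width):
            bad = list(clean)
            bad[i] ^= 1
            assert decode(bad) == (data, True, False)
        for i, j in combinations(range(width), 2):
            bad = list(clean)
            bad[i] ^= 1
            bad[j] ^= 1
            _, sb_err, db_err = decode(bad)
            assert (sb_err, db_err) == (False, True)
    return True
```

Calling check_contract() returns True only if every case satisfies the two rules above. This kind of enumeration stops being practical at real data widths, which is where the formal engine earns its keep.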
V. Integrating ECC to UberDDR3
Now that we have the ECC encoder and decoder logic, we will integrate it with UberDDR3. As previously mentioned, commercial ECC DRAMs use the 9th memory chip to store the ECC bits. Thus, an x64 DDR3 becomes x72, with the added 8 bits holding the SECDED parity bits.
However, as noted earlier, we aim to implement ECC on a non-ECC DDR3. How? By leveraging the fact that UberDDR3 is open-source, we are not bound to adhere to the standards of commercial DDR3 implementations.
What if we reverse the approach: an x64 DDR3 already includes both the information data and ECC parity bits?😲😲😲
V.I ECC_ENABLE = 1: Burst-granular Side-Band ECC
Setting the parameter ECC_ENABLE to 1 will add a burst-granular side-band ECC to UberDDR3. This means that each burst of data will be encoded with ECC as shown below:
The burst width is equal to the device width; therefore, an x64 DRAM will have a single burst width of 64 bits. The ECC will be ‘burst granular,’ meaning that each burst will contain the encoded data (information bits plus parity bits). Working backwards, this implies that the information data comprises 57 bits, while the remaining 7 bits are reserved for parity. (For the mathematics behind this, refer to this AMD documentation).
Shown below is the simplified computation:
This is analogous to the operation of ECC DRAMs, where a device width of x72 stores 64 bits of information and 8 bits of parity. The notable difference in the UberDDR3 implementation with ECC_ENABLE=1 is that the information bits amount to only 57 bits, not 64 bits.
Since UberDDR3 has 8 bursts per clock cycle, the word width for x64 DRAM would be 512 bits (64 bits/burst × 8 bursts). With the integration of burst-granular ECC:
The total information data bits will be only 456 (57 information bits/burst × 8 bursts).
The remaining 56 bits are allocated for ECC parity (512 total bus width - 456 information bits).
This means that the original 512-bit Wishbone bus width will no longer be fully available to the user, as only 456 bits will be accessible.
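The split between information and parity bits can be recomputed with a short Python sketch. It assumes a standard SECDED construction: a Hamming code with m parity bits satisfying 2^m >= n + 1 for an n-bit codeword, plus one overall parity bit. The function name secded_split is made up for illustration.

```python
def secded_split(total_bits):
    """Split a total encoded width into (information bits, parity bits)."""
    n = total_bits - 1              # one bit is the overall parity bit
    m = 0
    while (1 << m) < n + 1:         # Hamming condition: 2**m >= n + 1
        m += 1
    return n - m, m + 1             # m Hamming parity bits + 1 overall parity bit


BURSTS_PER_WORD = 8
info_bits, parity_bits = secded_split(64)        # one burst of an x64 DRAM
print(info_bits, parity_bits)                    # prints 57 7
print(info_bits * BURSTS_PER_WORD,               # prints 456 56
      parity_bits * BURSTS_PER_WORD)
```

The same function also reproduces the familiar x72 case: secded_split(72) gives 64 information bits and 8 parity bits.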
To elaborate on the implementation of this feature, below is a snippet from the DDR3 controller:
Essentially, each of the 8 bursts will require an ECC encoder-decoder pair, thus 8 instantiations of these ECC modules are needed. To elaborate:
When the user issues a write request via the Wishbone interface, the write data is initially stored in stage 1 of the controller’s 2-stage pipeline. The data from stage 1 is then sent to the ECC encoder, which adds parity bits to the original information data. The encoded data (stage1_data_encoded) is then received by stage 2. This data is subsequently sent to the PHY and eventually to the DDR3 DRAM for storage. Note how the encoder serves as an intermediary between stage 1 and stage 2.
Conversely, when the user initiates a read request, the read data from the DDR3 DRAM is received by the PHY and then forwarded to the controller. The data is processed and aligned before being sent to the decoder, which corrects single-bit errors and asserts flags as needed. The decoded data is then transmitted to the user via the Wishbone output data port. In essence, the decoder acts as an intermediary between the controller and the Wishbone output data.
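The per-burst structure can be pictured as slicing the 456-bit user word into eight 57-bit pieces, one for each encoder-decoder pair. The sketch below is a Python model, not the controller's RTL. The function names and the choice of placing burst 0 in the least significant bits are assumptions made for illustration.

```python
INFO_BITS_PER_BURST = 57    # x64 DRAM with ECC_ENABLE = 1
BURSTS_PER_WORD = 8


def split_into_bursts(word):
    """Slice a 456-bit user word into eight 57-bit pieces (burst 0 = lowest bits)."""
    mask = (1 << INFO_BITS_PER_BURST) - 1
    return [(word >> (i * INFO_BITS_PER_BURST)) & mask
            for i in range(BURSTS_PER_WORD)]


def join_bursts(slices):
    """Reassemble the user word from the eight decoded 57-bit pieces."""
    word = 0
    for i, piece in enumerate(slices):
        word |= piece << (i * INFO_BITS_PER_BURST)
    return word
```

On a write, each slice would pass through its own encoder and grow to 64 bits. On a read, each 64-bit burst would pass through its own decoder before the pieces are joined back into the Wishbone read data.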
This encoder-decoder implementation is illustrated below:
Note for x16 DDR3:
Using the same computation as above for x16 DDR3:
Information bits per burst = 11 bits
Parity bits per burst = 5 bits
Since x16 DDR3 has a total width of 128 bits per word (8 bursts of x16), then:
Available information bits per word = 88 bits
Parity bits per word = 40 bits
This means that the original 128-bit Wishbone bus width for x16 DRAM will no longer be fully available to the user, as only 88 bits will be accessible.
V.II ECC_ENABLE = 2: Word-granular Side-Band ECC
Now, what if the bus width available for ECC_ENABLE = 1 is not enough? Then ECC_ENABLE = 2 or word-granular Side-Band ECC might be the solution you're looking for.
With ECC_ENABLE = 1, each burst is encoded with parity bits. For x64 DRAM, this results in a total of 56 parity bits out of a 512-bit bus width. However, with ECC_ENABLE = 2, the entire 8 bursts of data (or the whole WORD) are encoded, as illustrated below:
Working backwards, this implies that the information data comprises 502 bits, while the remaining 10 bits are allocated for parity (For the mathematics behind this, refer to this AMD documentation).
The computation is also shown below:
The available information is now wider: 502 bits for ECC_ENABLE = 2 compared to 456 bits in ECC_ENABLE = 1!
Although we have a wider information bus width with ECC_ENABLE = 2, it doesn't necessarily mean it's superior to ECC_ENABLE = 1 in every aspect:
The ECC is less granular with ECC_ENABLE = 2. Burst-granular ECC (ECC_ENABLE = 1) can correct one single-bit error in each burst, so up to 8 errors per word are correctable as long as each falls in a different burst. Word-granular ECC (ECC_ENABLE = 2) can correct only one single-bit error across the entire 8 bursts.
The Hamming code for ECC_ENABLE = 2 encodes and decodes the length of 8 bursts per cycle, resulting in much deeper combinational logic compared to ECC_ENABLE = 1, where the logic encodes and decodes the length of only a single burst per cycle. This implies that the maximum frequency at which ECC_ENABLE = 2 can run is lower than that of ECC_ENABLE = 1.
To elaborate on the implementation of this feature, below is a snippet from the DDR3 controller:
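The granularity trade-off can be made concrete with a small Python sketch that decides whether a given set of flipped bits in one 512-bit word is fully correctable under each mode. The function name and the assumption that burst i occupies bits 64*i to 64*i + 63 of the encoded word are illustrative only.

```python
from collections import Counter


def correctable(error_bits, ecc_enable, burst_bits=64):
    """Return True if every flipped bit in one encoded word can be corrected.

    error_bits: positions of flipped bits within the encoded word.
    ecc_enable: 1 for burst-granular ECC, 2 for word-granular ECC.
    """
    errors = set(error_bits)
    if ecc_enable == 1:
        # One SECDED code per burst: at most one flipped bit per burst.
        per_burst = Counter(bit // burst_bits for bit in errors)
        return all(count <= 1 for count in per_burst.values())
    if ecc_enable == 2:
        # One SECDED code per word: at most one flipped bit in total.
        return len(errors) <= 1
    raise ValueError("ecc_enable must be 1 or 2")
```

For example, flips at bits 0, 64 and 128 land in three different bursts, so correctable([0, 64, 128], 1) is True while correctable([0, 64, 128], 2) is False. Two flips inside the same burst defeat both modes.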
The implementation is simpler now compared to ECC_ENABLE = 1 since there is only 1 encoder-decoder pair that needs to be instantiated.
How it works is very much the same as with ECC_ENABLE = 1 (almost copy-pasted from the previous section):
When the user issues a write request via the Wishbone interface, the write data is initially stored in stage 1 of the controller’s 2-stage pipeline. The data from stage 1 is then sent to the ECC encoder, which adds parity bits to the original information data. The encoded data (stage1_data_encoded) is then received by stage 2. This data is subsequently sent to the PHY and eventually to the DDR3 DRAM for storage. Note how the encoder serves as an intermediary between stage 1 and stage 2.
Conversely, when the user initiates a read request, the read data from the DDR3 DRAM is received by the PHY and then forwarded to the controller. The data is processed and aligned before being sent to the decoder, which corrects single-bit errors and asserts flags as needed. The decoded data is then transmitted to the user via the Wishbone output data port. In essence, the decoder acts as an intermediary between the controller and the Wishbone output data.
This encoder-decoder implementation is illustrated below:
Note for x16 DDR3:
Since x16 DDR3 has a total width of 128 bits per word (8 bursts of x16), then:
Available information bits per word = 120 bits
Parity bits per word = 8 bits
The available information is now wider: 120 bits for ECC_ENABLE = 2 compared to 88 bits in ECC_ENABLE = 1!
VI. Simulation Testbench
As mentioned in the previous blog post "Getting Started with UberDDR3 (Part 1)", UberDDR3 already includes a self-checking testbench using the Micron DDR3 model file. Integrating tests for ECC is simple: a bit is intentionally flipped in the encoded data being sent to the DDR3 DRAM during write requests, with the expectation that the correct data will still be retrieved during read requests.
For ECC testing, I added a new parameter to the DDR3 controller called ECC_TEST. When set to 1, the encoded data from the ECC encoder will have its LSB tied to zero before being sent to the PHY. This was already demonstrated in earlier code snippets:
This implies that bit 0 will not always be erroneous: if it is originally zero, there is no bit error. To run the simulation for the ECC test, set the localparam for ECC_ENABLE to 1 or 2 inside the testbench, then just follow the instructions in the blog post "Getting Started with UberDDR3 (Part 1)".
As shown in the simulation below, sb_err_o gets asserted seemingly at random (whenever bit 0 should be 1 but is forced to zero), while the db_err_o signal remains deasserted (there are no double-bit errors, since every bit flip is single-bit and thus correctable).
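Why sb_err_o appears to toggle randomly follows directly from tying the LSB low. A minimal Python sketch of that effect is below; the function name is hypothetical and this only models the bit manipulation, not the controller itself.

```python
def apply_ecc_test(encoded_word):
    """Tie bit 0 of the encoded word low, as ECC_TEST = 1 is described to do.

    Returns (word sent to the PHY, whether a bit flip was actually injected).
    """
    corrupted = encoded_word & ~1
    return corrupted, corrupted != encoded_word
```

An encoded word whose LSB is 1 gets a real single-bit error, so the decoder should later flag it on sb_err_o. A word whose LSB is already 0 passes through unchanged, so no flag is expected for it.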
VII. Hardware Test
Testing this on hardware is straightforward. You can use the demo projects already available in the UberDDR3 repo. In my case, I used the Arty-S7 demo project. We can enable the ECC feature of UberDDR3 by setting the ECC_ENABLE parameter to 1 (or 2).
As mentioned in the blog post "Getting Started with UberDDR3 (Part 1)", an internal test sequence is performed after the initial calibration. This sequence functions like a built-in self-test, executing write and read requests with various patterns across all DDR3 addresses. The calibration is officially complete only if this test sequence ends with no errors.
This means you can simply run UberDDR3 with ECC on hardware, and once it finishes calibration (perhaps indicated by LEDs lighting up), you can confidently say that UberDDR3 with ECC has been properly tested on hardware!
For demonstration purposes, I attached an ILA to the demo project. As shown below, it successfully reaches the DONE_CALIBRATE state (state_calibrate == 23) indicating that the calibration and test sequence have completed successfully. Before that, notice how the sb_err_o signal keeps toggling randomly while the db_err_o is always deasserted, similar to the simulation shown above, since I set the ECC_TEST parameter to 1:
VIII. Conclusion
In this first part of our update on UberDDR3’s new ECC capability, we've explored how you can now add a layer of error protection to your designs, even with standard non-ECC DDR3 memory. This feature enhances the reliability of your projects, giving you peace of mind knowing that your data is protected against common memory errors. Whether you're working on complex systems or simple applications, this upgrade brings an extra level of security without the need to use ECC-capable DRAM memory. And all of that is achieved simply by setting the ECC_ENABLE parameter!
Stay tuned for Part 2, where we will delve deeper into the final ECC option, inspired by LPDDR4: Inline ECC!