
Error Correcting Code (ECC) in Solid State Drives

Memory is not perfect. In fact, errors are expected, and NAND flash accumulates more of them over time through wear, through various forms of disturb from programming and reading, and through the limits of data retention. The fraction of bits that read back incorrectly before any correction is applied is called the raw bit error rate, or RBER. Reading a cell means sensing its stored charge by comparing its threshold voltage against reference voltages, so even a small shift in charge can flip a bit. There must therefore be a way to correct these errors, and error correcting code (ECC) serves this purpose.
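To make RBER concrete, the sketch below computes it and then estimates how often a codeword ends up with more errors than the ECC can fix. It assumes a simple binomial model in which every bit fails independently at rate p. Real NAND errors are correlated, so treat this as an illustration only. The codeword size (8192 bits) and correction capability (t = 40 bits) used in the usage loop are hypothetical values, not figures for any particular controller.

```python
import math


def rber(bit_errors: int, bits_read: int) -> float:
    """Raw bit error rate: fraction of bits that read back wrong before ECC."""
    return bit_errors / bits_read


def p_uncorrectable(n_bits: int, t: int, p: float) -> float:
    """Probability that an n_bits codeword holds more than t bit errors.

    Assumes each bit fails independently with probability p (binomial model).
    A decoder that corrects at most t errors fails in exactly this case.
    The upper tail is summed directly in log space to avoid overflow in the
    binomial coefficients and cancellation in 1 - P(X <= t).
    """
    if p <= 0.0 or t >= n_bits:
        return 0.0
    if p >= 1.0:
        return 1.0
    log_p, log_q = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n_bits + 1)
    total = 0.0
    for i in range(t + 1, n_bits + 1):
        log_term = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n_bits - i + 1)
                    + i * log_p + (n_bits - i) * log_q)
        total += math.exp(log_term)
    return total


if __name__ == "__main__":
    # Hypothetical 8192-bit codeword with a decoder that fixes up to 40 bits.
    for p in (1e-4, 1e-3, 5e-3):
        print(f"RBER {p:.0e}: P(uncorrectable) = {p_uncorrectable(8192, 40, p):.3e}")
```

Under this model the failure probability stays negligible while the expected error count (n_bits × RBER) is well below t, then climbs steeply as RBER approaches t / n_bits. That is why wear and retention loss eventually push NAND past what a given code can handle.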

Modern SSDs use two basic methods of error correction: hard-decision and soft-decision decoding. Hard-decision decoding works only with the 0 or 1 read from each cell, so it can correct fewer errors. In exchange it is faster and more efficient, and the ECC engine needs less die area in the controller. Older SSDs used Bose-Chaudhuri-Hocquenghem (BCH) codes for this. With the arrival of TLC, errors became more troublesome, which led to the adoption of Low-Density Parity-Check (LDPC) codes. Although LDPC is less efficient, it can also perform soft-decision decoding, which uses reliability information gathered from additional reads at shifted reference voltages to repair more bits.
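The difference between hard and soft decisions can be shown with a toy example. This sketch is not BCH or LDPC. It uses a 3-bit repetition code as a stand-in, with a hypothetical normalized voltage model (bit 0 stored near +1.0, bit 1 near -1.0, read reference at 0.0, Gaussian read noise with standard deviation `SIGMA`). Real controllers do not see an exact voltage. They approximate this soft information with extra reads at shifted reference voltages, but the principle is the same: a bit that barely crossed the reference counts for less than one that is clearly on its side.

```python
# Hypothetical normalized cell model (illustration only):
# bit 0 -> +1.0, bit 1 -> -1.0, read reference at 0.0,
# Gaussian read noise with standard deviation SIGMA.
SIGMA = 0.5


def hard_decision(voltage: float) -> int:
    """A single read at the reference keeps only which side the cell is on."""
    return 0 if voltage >= 0.0 else 1


def llr(voltage: float, sigma: float = SIGMA) -> float:
    """Log-likelihood ratio log P(bit=0 | v) / P(bit=1 | v) for this model.

    Positive favors 0, negative favors 1; the magnitude is the confidence.
    """
    return 2.0 * voltage / sigma ** 2


def decode_repetition_hard(voltages: list) -> int:
    """Majority vote over the hard-decided copies."""
    bits = [hard_decision(v) for v in voltages]
    return 1 if sum(bits) * 2 > len(bits) else 0


def decode_repetition_soft(voltages: list) -> int:
    """Sum the per-copy LLRs and decide on the sign of the total."""
    total = sum(llr(v) for v in voltages)
    return 0 if total >= 0.0 else 1


if __name__ == "__main__":
    # A 0 was stored three times; two copies drifted just past the reference.
    reads = [-0.1, -0.2, 0.9]
    print("hard:", decode_repetition_hard(reads))  # majority of [1, 1, 0] -> 1 (wrong)
    print("soft:", decode_repetition_soft(reads))  # LLR sum 4.8 > 0 -> 0 (correct)
```

Hard decoding sees two 1s and one 0 and picks the wrong value. Soft decoding notices that the two 1s are weak and the 0 is strong, so it recovers the stored bit. LDPC soft decoding applies the same idea across thousands of bits and parity checks.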

Figure: BCH vs. LDPC ECC (hard- and soft-decision decoding). Source.

When the SSD controller has trouble reading data, it escalates through stages of ECC decoding and read retries. Data with more errors needs more stages and more retries. NAND that holds stale data or has seen greater wear will have more errors, and each extra step adds latency that reduces performance. The controller tries to refresh data before it becomes too error-prone, but if some bits are still unrecoverable, it may need to rely on extra parity data spread across multiple dies or planes to rebuild them. Blocks that fail at this point, or during erasure, are retired and replaced with spare blocks where possible.
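The escalation described above can be pictured as a ladder of recovery stages, each tried only if the previous one fails, with latency adding up along the way. The sketch below is a hypothetical model. The stage names, their order, the error counts each stage can handle, and the latencies are illustrative assumptions rather than the behavior of any specific controller. Read retry is simplified to "handles a few more errors than a plain hard decode".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    latency_us: float                # hypothetical cost of running this stage
    recovers: Callable[[int], bool]  # True if this stage can fix `raw_errors`


def read_with_recovery(raw_errors: int,
                       stages: List[Stage]) -> Tuple[Optional[str], float]:
    """Try each recovery stage in order, accumulating latency.

    Returns (name of the stage that recovered the data, total latency), or
    (None, total latency) if every stage failed; in that case the data is
    lost and the block would become a candidate for retirement.
    """
    elapsed = 0.0
    for stage in stages:
        elapsed += stage.latency_us
        if stage.recovers(raw_errors):
            return stage.name, elapsed
    return None, elapsed


# Hypothetical ladder: error limits and latencies are illustrative only.
STAGES = [
    Stage("hard-decision decode", 5.0, lambda e: e <= 40),
    Stage("read retry + hard-decision decode", 50.0, lambda e: e <= 60),
    Stage("soft-decision decode", 100.0, lambda e: e <= 120),
    # Assumes the other pages in the parity stripe are still readable.
    Stage("die-level parity rebuild", 1000.0, lambda e: True),
]

if __name__ == "__main__":
    for errors in (10, 100, 500):
        print(errors, read_with_recovery(errors, STAGES))
```

In this model a lightly worn page is served by the first stage. A page with many errors only succeeds after several slower stages, which is the latency cost the paragraph above describes and the reason controllers refresh data before it reaches that point.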

Our NVMe SSDs use the newest controllers available, which also means the newest ECC. This includes updated LDPC and RAID ECC, and we also have end-to-end data path protection as described in our endurance blog. Together these help keep your data uncorrupted for as long as possible. An unpowered SSD will eventually lose data, however, so we recommend powering the drive on and letting it initialize at least once a year if possible. Otherwise, your SSD should manage itself, particularly when the OS runs regular TRIM optimization. A full-drive scan or read can encourage the early refresh of stale data, as can rewriting the drive.
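A full-drive read simply means reading every sector once and discarding the data. The sketch below does this sequentially in fixed-size chunks. A few assumptions apply. The example device path `/dev/nvme0n1` is a typical Linux name and may differ on your system. Reading a raw block device usually requires root. Data already in the OS page cache is served from memory and never reaches the NAND. Whether a given read actually triggers a refresh is decided by the drive's firmware. The function only reads, so it does not modify the drive.

```python
def full_read(path: str, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Sequentially read every byte of `path` and discard it.

    Returns the total number of bytes read. A single reusable buffer keeps
    memory use constant regardless of drive size.
    """
    buf = bytearray(chunk_size)
    view = memoryview(buf)
    total = 0
    # buffering=0 gives a raw file object, so readinto() fills our buffer directly.
    with open(path, "rb", buffering=0) as f:
        while True:
            n = f.readinto(view)
            if not n:  # 0 at end of file (None only for non-blocking I/O)
                break
            total += n
    return total
```

For example, on Linux as root, `full_read("/dev/nvme0n1")` would read the whole namespace once. A file path works the same way, which is convenient for trying the function out before pointing it at a device.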