Why Do SSDs Fail?

“Why did my SSD fail?” This is a question we see a lot, regardless of brand. Some companies, like Backblaze, have been sharing their SSD failure data, but does that really apply to a single consumer’s drive? The fact is, storage devices tend to follow the Bathtub Curve model, where most failures happen early or late in life. Users will usually look at the Mean Time Between Failures (MTBF) and Total Bytes Written (TBW) ratings for their SSD and equate them with reliability. In fact, the former is a population-level figure that cannot escape the Bathtub Curve (it reflects random failures), and the latter falsely suggests that most SSDs fail from NAND/flash wear.

Bathtub Curve.
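
One common way to picture the Bathtub Curve is as the sum of three failure-rate components: an early-life rate that falls over time, a constant random-failure rate, and a wear-out rate that rises late in life. The sketch below models the first and last with Weibull hazard functions. Every parameter value and the function names are illustrative assumptions chosen to produce the familiar shape, not measurements from any real drive.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Illustrative failure rate (per year) at age t_years > 0.

    All parameters are made-up assumptions, not drive data.
    """
    early_life = weibull_hazard(t_years, shape=0.5, scale=200.0)  # shape < 1: rate falls with age
    random_failures = 0.005                                       # constant floor
    wear_out = weibull_hazard(t_years, shape=5.0, scale=12.0)     # shape > 1: rate rises with age
    return early_life + random_failures + wear_out


# Print the modeled rate at a few ages to show the high-low-high pattern.
for age in (0.1, 1, 3, 5, 8, 10):
    print(f"{age:>4} years: {bathtub_hazard(age):.4f} failures per drive-year")
```

With these assumed parameters the rate is highest for very young and very old drives and lowest in between, which is the pattern the curve describes.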

In fact, studies have found that fewer than 1 in 20 failed SSDs perish due to the flash. That’s not to say that drives from sketchy brands won’t use inferior flash more prone to failure, just that reliable brands use robust flash that can survive a lot of writes. Quality-control problems on the drive’s printed circuit board (PCB) account for about 1 in 20 failed drives, even though boards are inspected before a drive is shipped. Although we inspect our RMAs to find the source of the problem, about 1 in 8 drives fail for an unknown reason. Another 1 in 8 have broken connectors or ports, or have some component on the board fail completely - which can include the controller in some circumstances.
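
To put the flash-wear point in perspective, here is a rough calculation of how long it would take to reach a TBW rating at a steady daily write volume. The 600 TB rating and 30 GB per day workload are hypothetical numbers picked for illustration, not the specification of any particular drive, and the function name is ours.

```python
def years_to_reach_tbw(tbw_terabytes, host_writes_gb_per_day):
    """Years of steady host writes needed to reach a TBW rating.

    Assumes decimal units (1 TB = 1000 GB) and a constant daily write volume.
    """
    days = tbw_terabytes * 1000 / host_writes_gb_per_day
    return days / 365.25


# Hypothetical drive and workload: 600 TBW rating, 30 GB written per day.
print(f"{years_to_reach_tbw(600, 30):.1f} years")
```

Under those assumptions the rating would take decades to reach, which is consistent with flash wear being a minor share of failures for typical consumer use.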

The reality, then, is that about two-thirds of SSDs fail from physical or environmental damage, or from firmware issues. Causes include too many power failures from an unstable system, prolonged exposure to high temperatures, electrostatic discharge, rough handling in shipping, too much pressure on the drive (as with bending), or other mechanical impact or failure. Firmware failures can include issues with the system-reserved part of the flash that the drive needs in order to operate. It’s best not to jump to conclusions - make sure your drive is really broken before returning it; trying it in another system is a good start.

Source DOI: 10.1109/JPROC.2017.2725738
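
The "about two-thirds" figure follows from the shares quoted above. A quick tally, treating each approximate share as an exact fraction (which is an assumption, since the text gives them as "about" or "less than"):

```python
from fractions import Fraction

# Approximate shares of failures as quoted in the text.
shares = {
    "flash (NAND) failure": Fraction(1, 20),
    "PCB quality-control issue": Fraction(1, 20),
    "unknown cause": Fraction(1, 8),
    "connector, port, or component failure": Fraction(1, 8),
}

accounted = sum(shares.values())  # 7/20 of failures
remaining = 1 - accounted         # 13/20: damage, environment, firmware

print(f"accounted for: {accounted} ({float(accounted):.0%})")
print(f"remaining:     {remaining} ({float(remaining):.0%})")
```

The remainder comes to 13/20, or 65%, which rounds to the two-thirds attributed to physical damage, environmental damage, and firmware issues.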

If you do conclude that your drive is broken, contact customer or technical support and see if they can catch something you missed. In the worst case, submit an RMA. We work with various companies to make sure our firmware is solid and that we have good hardware for the flash and other components - but things do happen. Anecdotally, everybody knows somebody who has sworn off a brand after a poor experience, but we assure you that we take our RMAs and firmware seriously and study them in an effort to make sure our drives are one thing you don’t have to worry about.

For more information, see “How SSDs Fail” from NVM Express and “Why SSDs Die” from Elcomsoft.

See our storage products here.