Are there laptops with ECC memory

ECC - Error Correcting Code

ECC is an error correction method for memory in which redundant information is stored alongside the data so that data errors can be detected and corrected. The aim is to lower the statistical bit error rate. ECC corrects 1-bit errors immediately. 2-bit errors are detected but not corrected. Errors of 3 or more bits are not reliably detected. There are more elaborate memory protection schemes that can do considerably more.
ECC memory is one technical measure for improving reliability, availability and serviceability (RAS for short). However, the error correction works through redundancy, so it consumes storage space or transmission capacity, and generating and checking the redundant data costs some computing effort.

ECC memory alone does not turn a commercially available PC into a highly available machine. That usually takes several measures, and ECC is just one of them. ECC memory is typically found in servers and workstations, and only rarely in PCs.

ECC memory

One would hardly expect data errors to occur inside a closed system such as a computer. So why should ECC memory make sense here? In fact, some memory modules and motherboards are particularly prone to data errors. The probability of data errors in memory increases with the size of the memory and with its average utilization. Memory errors can corrupt data, which can lead to incorrect results and calculations as well as crashes.
Although ECC memory protects against data errors, its use in a privately used computer makes little sense. If only a fraction of the physical memory is used and the system runs only a few hours a day, data errors will rarely occur. The situation is quite different in servers and workstations, where main memory is considerably larger and is used more intensively.

ECC memory protection

Working ECC memory protection requires four components to cooperate. The memory controller must support the respective ECC algorithm. Special ECC memory modules are needed, which are more expensive than normal memory modules. The motherboard must provide 72 data lines per memory channel, 8 more than a normal memory channel. And the BIOS has to enable the ECC memory protection of the memory controller.
ECC memory protection works by having the memory controller generate an additional byte of redundant data for each 8-byte data word before writing it. This creates a 9-byte word. An ECC memory module therefore has to be wider in order to store the additional data. An ECC memory module (DIMM) has 12.5 percent more memory and 8 additional data lines: normal DIMMs have 64 data lines, ECC DIMMs have 72.
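
The 8 extra bits per 64-bit word can be checked with a little arithmetic. The following is a minimal sketch that assumes an extended Hamming code (single error correction, double error detection), which is a common textbook choice; the text does not say which code a given memory controller actually uses, and the function name secded_check_bits is made up for this illustration.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits needed by a Hamming code extended with one overall parity bit.

    Assumption: a textbook SEC-DED code, not the code of any specific controller.
    """
    r = 0
    # Single-error correction needs 2**r >= data_bits + r + 1, because the
    # r check bits must name every possible error position plus "no error".
    while (1 << r) < data_bits + r + 1:
        r += 1
    # One additional overall parity bit adds double-error detection.
    return r + 1


for width in (8, 16, 32, 64, 128):
    check = secded_check_bits(width)
    overhead = 100 * check / width
    print(f"{width:3d} data bits + {check} check bits = "
          f"{width + check:3d} lines ({overhead:.1f} % overhead)")
```

For a 64-bit data word the sketch yields 8 check bits, which matches the 72 data lines and the 12.5 percent of additional memory mentioned above. Narrower words would need proportionally more redundancy, which is one reason the protection is applied to the full 64-bit word.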

ECC algorithm

There is no single ECC algorithm. There are various algorithms, some of them patented, with different properties. The ECC algorithm is hardwired into the memory controller to avoid a loss of speed. On a read access, the algorithm uses the redundant data to check whether the user data has changed. The memory controller can reliably detect and correct single-bit errors, and the system continues to run normally. 2-bit errors are detected but not corrected. Errors of 3 or more bits are not reliably detected. There are more elaborate memory protection schemes that can do considerably more.
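
The behavior described above (1-bit errors corrected, 2-bit errors only detected, 3-bit errors possibly missed) can be reproduced with a small sketch. It assumes an extended Hamming code on an 8-bit word purely for illustration; real memory controllers work on 64-bit words in hardware and may use different codes. The function names encode and decode are hypothetical.

```python
def encode(data_bits):
    """Extended Hamming encoding of a list of 0/1 bits (illustrative sketch).

    Index 0 of the result holds the overall parity bit, indices 1..n are the
    usual Hamming positions (powers of two hold check bits).
    """
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:
        r += 1
    n = m + r
    code = [0] * (n + 1)
    bits = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):          # not a power of two: data position
            code[pos] = next(bits)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity             # makes parity group p even
    code[0] = sum(code) % 2          # makes the whole codeword even
    return code


def decode(code):
    """Return (status, data_bits); status is 'ok', 'corrected' or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    odd = sum(code) % 2
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= n:
        code[syndrome] ^= 1          # assume one flipped bit and repair it
        status = "corrected"
    else:
        status = "uncorrectable"     # e.g. two flipped bits
    # Data is returned as read; it is only trustworthy if status is not
    # 'uncorrectable' and no more than one bit was actually flipped.
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data


def flip(code, *positions):
    """Return a copy of the codeword with the given bit positions inverted."""
    bad = list(code)
    for pos in positions:
        bad[pos] ^= 1
    return bad


word = [1, 0, 1, 1, 0, 0, 1, 0]
stored = encode(word)
for label, damaged in (("no error", stored),
                       ("1-bit error", flip(stored, 5)),
                       ("2-bit error", flip(stored, 5, 9)),
                       ("3-bit error", flip(stored, 1, 2, 3))):
    status, data = decode(damaged)
    print(f"{label:12s} status={status:13s} data intact: {data == word}")
```

In the 3-bit case the decoder sees an odd number of flipped bits and treats it as a single error, so it reports a correction while handing back wrong data. This is the kind of undetected error the text refers to.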

Why is ECC actually necessary for memory?

The use of ECC RAM is advisable whenever data errors would have a major impact or the main memory is used intensively over long periods (many read and write accesses). This is typical of storage systems and of virtual machines running in parallel, and usually also of systems that run around the clock. Here ECC leads to higher reliability.
With typical workplace computers such as desktops or notebooks, ECC RAM can be dispensed with. Because the processing load is lower, the main memory is used less and fewer errors occur. Most of the time these errors have no effect. In the worst case, the system hangs or crashes.
However, without ECC it is also harder to find the cause of an error, for example a defective memory module. With ECC, memory errors are logged by the operating system and can be evaluated later for diagnostic purposes.
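
How the logged errors can be read depends on the operating system. As one example, Linux exposes per-memory-controller counters through its EDAC subsystem in sysfs. The sketch below assumes a Linux system with an EDAC driver loaded and the usual sysfs layout (mc0, mc1, ... each with ce_count and ue_count); the function name read_edac_counts is made up for this illustration, and on systems without EDAC support it simply returns an empty result.

```python
from pathlib import Path


def read_edac_counts(base="/sys/devices/system/edac/mc"):
    """Collect corrected/uncorrected error counters per memory controller.

    Assumes the Linux EDAC sysfs layout; returns {} if nothing is found.
    """
    counts = {}
    for mc in sorted(Path(base).glob("mc[0-9]*")):
        try:
            corrected = int((mc / "ce_count").read_text().strip())
            uncorrected = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # counter missing or unreadable: skip this controller
        counts[mc.name] = {"corrected": corrected, "uncorrected": uncorrected}
    return counts


for name, c in read_edac_counts().items():
    print(f"{name}: {c['corrected']} corrected, {c['uncorrected']} uncorrected")
```

A corrected-error count that keeps rising on one controller is a typical hint that a particular module is failing, even though the system continues to run normally.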

Here's a little anecdote: The National Laboratory in Los Alamos (New Mexico, USA) once complained to IBM that its colleagues in Livermore (California, USA) were experiencing far fewer errors on the same computers.
The explanation: Los Alamos lies at an altitude of about 2200 m, while Livermore, near San Francisco, is roughly at sea level. At the altitude of Los Alamos, cosmic neutron radiation is about five times stronger than at sea level. In an aircraft, the radiation is about a hundred times as strong.

In general, the probability of errors in processors and memory is low. As a rule, only ambient radiation has to be considered as a trigger for errors. In Los Alamos the situation is, of course, quite different. The sensitivity of memories, caches and registers in particular has increased as semiconductor structures have become smaller. ECC is therefore used to catch these errors. It is necessary in systems that process a lot of data and have a large working memory. In mainframes, hundreds of error corrections can occur every day.
Against this background, it is not surprising that Intel holds patents for radiation detection and builds corresponding radiation detectors into chips.

Applications of ECC

  • RAM in PCs and servers
  • Processor caches
  • Flash memory
  • Hard drives
