AM3 and AM4 Motherboards for Home Servers - MCE, ECC and RAS Features

ASRock AM4 Pro4 Motherboard

A while back I built ECC platforms based on consumer hardware and found that the information on the web is not really accurate.

Specifically, ASUS AM3 boards support ECC mode and I have been using it for some years now, yet I have never witnessed any error being reported during this time. I am located near sea level, though, which lowers the number of cosmic rays that reach the hardware.

There is also Machine Check Exception (MCE) support on these platforms, but motherboard support is hit and miss. MCEs are useful to track errors in CPU caches and other parts, which helps prevent data corruption and makes you aware of damaged hardware (mostly the PSU or board VRMs).

Some TLDR for CPUs:

  • AMD Phenom I/II and Athlon 64 X2 chips support error reporting through the "edac_mce_amd" module. This module works without ECC and reports cache and other CPU-related errors.
  • Athlon II also works.
  • For AM4 Ryzen, APUs only support ECC if they are from the "Pro" line.

TLDR for motherboard support:

  • ASUS AM3 M4A8xx motherboards officially support ECC but you should not rely on it.
  • ASUS AM4 and Ryzen support ECC through RASDaemon but only on an up-to-date BIOS.
  • Older ASUS AM4 BIOS versions report through kernel methods but only uncorrectable errors (UE) are logged in '/sys' nodes.
  • All ASRock AM4 boards seem to support ECC mode, with corrected error reporting.
  • Gigabyte AM4 B550 boards mention ECC mode support.
  • Only ECC Unbuffered RAM is supported. (PC3/4-xxxxxE JEDEC specs)

On the kernel side:

  • RAM ECC is supported through the "amd64_edac" module.
  • CPU error reporting is handled by "edac_mce_amd".

Tested hardware:

  • ASUS M4A87TD/USB3
  • ASUS M4A88TD-V EVO/USB3
  • ASUS A320M-K
  • ASUS EX320M Gaming
  • ASUS ROG Strix B450-F
  • ASRock A520M Pro4
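
On any of the boards above, a quick first check is whether the two kernel modules named earlier are actually loaded. The following is a minimal sketch, assuming a Linux system that exposes /proc/modules; the helper names are mine, and on some older kernels the memory module may show up under a slightly different name, so treat the names as assumptions.

```python
from pathlib import Path

# Module names taken from the "On the kernel side" list; assumed to match
# what the running kernel reports in /proc/modules.
EDAC_MODULES = ("amd64_edac", "edac_mce_amd")


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def edac_module_status(proc_modules_text):
    """Map each EDAC module name to True if it appears as loaded."""
    loaded = loaded_modules(proc_modules_text)
    return {name: name in loaded for name in EDAC_MODULES}


if __name__ == "__main__":
    path = Path("/proc/modules")
    if path.exists():
        for name, present in edac_module_status(path.read_text()).items():
            print(f"{name}: {'loaded' if present else 'not loaded'}")
    else:
        print("/proc/modules not found; is this a Linux system?")
```

A module that is compiled into the kernel rather than loaded at runtime will not appear in /proc/modules, so "not loaded" here is a hint to look further, not proof that the feature is missing.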

Testing Methods

At this point I've had the ASUS M4A board running with ECC RAM for 3+ years and have seen no errors reported. This is not really expected even for good hardware!

The only way I see to quickly test that ECC is working is to set unstable memory timings with ECC disabled and confirm the errors with Memtest86. Next, turn ECC on and verify that errors are being reported and/or fixed.

ASUS AM3 Motherboards

ASUS M4A88T

The kernel itself only shows messages with no detail, no matter what values are passed to the 'mce' boot parameter:

[Hardware error] Machine Check Exception

In my testing these were corrected errors, but I don't know how the board handles uncorrectable errors, as those are harder to reproduce. There is some level of functionality here but it seems the kernel will not be aware of memory corruption caused by uncorrectable errors.

The first problem is that there is no additional information on what exactly the error is, so the OS will not know if it needs to kill some process to prevent data corruption. There should be additional lines after the [Hardware error] entry but the motherboard is not handling the error further.

Also, the '/sys' nodes for the 'mc*' entries of the edac module are not populated with error counts. So you can't really track them over time without custom scripts that monitor the kernel log.
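
As a rough idea of what such a custom script could look like, here is a minimal sketch that counts machine check announcements in the kernel log. It assumes the `dmesg` command is available and readable by the current user, and the function names are hypothetical. The match is case-insensitive because the exact wording of the message differs between the boards shown in this post.

```python
import re
import subprocess

# Matches both "[Hardware error] Machine Check Exception" and
# "mce: [Hardware Error]: Machine check events logged".
MCE_LINE = re.compile(r"\[Hardware Error\]:?\s*Machine Check", re.IGNORECASE)


def count_mce_events(log_text):
    """Count kernel log lines that announce a machine check event."""
    return sum(1 for line in log_text.splitlines() if MCE_LINE.search(line))


def read_kernel_log():
    """Return the kernel ring buffer as text, or '' if dmesg cannot be run."""
    try:
        result = subprocess.run(["dmesg"], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    print(f"machine check events in kernel log: {count_mce_events(read_kernel_log())}")
```

The kernel ring buffer is limited in size, so a script like this would need to run periodically and store its results somewhere to give a count over time.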

I don't consider ECC to be fully functional on these boards because of this, though some posts seemed to imply ECC was correctly supported.

These boards also don't report any kind of CPU-related error. I first became aware of this functionality when a damaged ASRock board started locking up due to errors reported to the OS. On compatible boards, these show up in the kernel messages in the following format:

[Hardware error] Machine Check Exception logged
[Hardware error] ERROR DETAILS

These do not get recorded in the MCE log but are handled specifically by the kernel ('edac_mce_amd' module). This is useful because on uncorrected errors the kernel can then discard buffers or kill the process with corrupted data.

Because ASUS does not enable this functionality, you may get some data corruption if the PSU or motherboard VRMs are damaged. I would not rely on this hardware without regularly testing CPU stability with something like Prime95.

ASRock AM4

I have personally had good success with the A520M Pro4 motherboard. When using a Zen 3 based Ryzen 5600 CPU, everything seems to work as expected. For instance, corrected errors (CE) are reported correctly:

[  988.187253] mce: [Hardware Error]: Machine check events logged
[  988.187257] [Hardware Error]: Corrected error, no action required.
[  988.187266] [Hardware Error]: CPU:0 (19:21:2) MC17_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c2040000000011b
[  988.187282] [Hardware Error]: Error Addr: 0x000000008df651c0
[  988.187287] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0xd29d00080a800c02
[  988.187295] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
[  988.187307] EDAC MC0: 1 CE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x15beca offset:0x2c0 grain:64 syndrome:0x8)
[  988.187315] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
[ 1315.867282] mce: [Hardware Error]: Machine check events logged
[ 1315.867287] [Hardware Error]: Corrected error, no action required.
[ 1315.867296] [Hardware Error]: CPU:0 (19:21:2) MC17_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c2040000000011b
[ 1315.867311] [Hardware Error]: Error Addr: 0x00000000acee4840
[ 1315.867316] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0xd29d00080a800c02
[ 1315.867324] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
[ 1315.867335] EDAC MC0: 1 CE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x199dc9 offset:0x40 grain:64 syndrome:0x8)
[ 1315.867343] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD
[ 1643.548321] mce: [Hardware Error]: Machine check events logged
[ 1643.548326] [Hardware Error]: Corrected error, no action required.
[ 1643.548335] [Hardware Error]: CPU:0 (19:21:2) MC17_STATUS[-|CE|MiscV|AddrV|-|-|SyndV|CECC|-|-|-]: 0x9c2040000000011b
[ 1643.548351] [Hardware Error]: Error Addr: 0x00000000e9115f80
[ 1643.548356] [Hardware Error]: IPID: 0x0000009600050f00, Syndrome: 0xd29d00080a800c02
[ 1643.548364] [Hardware Error]: Unified Memory Controller Ext. Error Code: 0
[ 1643.548376] EDAC MC0: 1 CE on mc#0csrow#2channel#0 (csrow:2 channel:0 page:0x21222b offset:0xe80 grain:64 syndrome:0x8)
[ 1643.548384] [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

As you can see, both the EDAC interface and MCE report the memory errors that were detected during memory scrub.
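
To make the log above easier to read, here is a small sketch that decodes the MC17_STATUS value and pulls the location fields out of an "EDAC MC0" line. The bit positions are my understanding of how the Linux kernel's AMD decoder interprets the status register and should be treated as assumptions to verify against AMD's documentation; the function names are hypothetical.

```python
import re

# Assumed bit positions of the MCA status flags (verify against AMD docs).
STATUS_BITS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60, "MiscV": 59, "AddrV": 58,
    "PCC": 57, "SyndV": 53, "CECC": 46, "UECC": 45, "Deferred": 44, "Poison": 43,
}

EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*"
    r"\(csrow:(?P<csrow>\d+) channel:(?P<channel>\d+) "
    r"page:(?P<page>0x[0-9a-f]+) offset:(?P<offset>0x[0-9a-f]+)"
)


def decode_status(status):
    """Return the names of the flags set in an MCA status value."""
    return [name for name, bit in STATUS_BITS.items() if (status >> bit) & 1]


def parse_edac_line(line):
    """Extract controller, error kind and location from an 'EDAC MCn:' line."""
    match = EDAC_LINE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "mc": int(fields["mc"]),
        "count": int(fields["count"]),
        "kind": fields["kind"],
        "csrow": int(fields["csrow"]),
        "channel": int(fields["channel"]),
        "page": int(fields["page"], 16),
        "offset": int(fields["offset"], 16),
    }


if __name__ == "__main__":
    print(decode_status(0x9c2040000000011b))
    print(parse_edac_line(
        "EDAC MC0: 1 CE on mc#0csrow#2channel#0 "
        "(csrow:2 channel:0 page:0x15beca offset:0x2c0 grain:64 syndrome:0x8)"
    ))
```

For the status value in the log, the flags that come out as set are MiscV, AddrV, SyndV and CECC (plus Val and En), which agrees with the bracketed list the kernel prints. The UC flag is clear, which the kernel shows as "CE".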

This is a rather important feature that allows you to detect configuration or other RAM problems that may eventually cause uncorrectable errors (UEs).

Looking at the '/sys/devices/system/edac/mc/mc0/*e_count' nodes will give you a summary of the number of each type of error since system boot.
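
Reading those nodes is easy to script. The sketch below walks every 'mc*' directory under the EDAC sysfs path and collects the 'ce_count' and 'ue_count' values; the function name and the optional `root` parameter are my own additions so the code can be pointed at a test directory.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_error_counts(root=EDAC_ROOT):
    """Return {controller: {"ce_count": n, "ue_count": n}} for every mc* directory."""
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for node in ("ce_count", "ue_count"):
            path = mc_dir / node
            if path.is_file():
                # Each node holds a single decimal number followed by a newline.
                entry[node] = int(path.read_text().strip())
        counts[mc_dir.name] = entry
    return counts


if __name__ == "__main__":
    for name, entry in read_error_counts().items():
        print(name, entry)
```

On a board where the nodes are not populated, or where the EDAC module is not loaded, this simply prints nothing, which is itself a useful signal.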

The motherboard also works fine with both RASDaemon and mcelog software:

  • When a hardware reboot is forced by an MCE, the kernel logs the error when the system comes back up. It is visible with the dmesg command (dmesg | grep mce).
  • It is also accessible by running mcelog in the terminal, at which point it is consumed from the error stack.
  • Further mcelog executions without new runtime errors will return empty.

Overall, it is not surprising that ASRock has high-quality ECC support on AM4: they have been using this type of consumer hardware on the ASRock Rack boards. They have been providing this type of hardware for server boards for many years, which is not so much the case with ASUS.

ASUS AM4
ASUS EX320M Gaming

On ASUS AM4 boards, older BIOS versions behaved like the AM3 boards, except that memory error reporting worked correctly: you could read /sys and get error counts, or look at the kernel log.

On current BIOS versions, AMD has moved error reporting to the modern RAS functionality of the Linux kernel. You have to install RASDaemon and use the 'ras-mc-ctl' command to read error counts.
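
If you want to collect those counts from a script, a minimal sketch could wrap the command as follows. It assumes 'ras-mc-ctl' is on the PATH and uses its '--error-count' option; the wrapper function is hypothetical and returns the raw text without trying to parse it, since I have not checked how stable the output format is.

```python
import shutil
import subprocess


def ras_error_counts():
    """Return the raw output of 'ras-mc-ctl --error-count', or None if unavailable."""
    if shutil.which("ras-mc-ctl") is None:
        return None  # RASDaemon tools are not installed or not on PATH
    result = subprocess.run(
        ["ras-mc-ctl", "--error-count"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


if __name__ == "__main__":
    output = ras_error_counts()
    print(output if output is not None else "ras-mc-ctl not found")
```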

Corrected errors (CE) are reported through RAS but are not counted in the EDAC 'mc*' nodes; only uncorrected errors (UE) are. The kernel handles UEs and won't reboot the system if the affected process is non-critical or if the error hits the page cache, in which case the page is discarded.

CPU-related MCE/RAS reporting may require some register tweaking, according to AMD's documentation for these CPUs. I have managed to reproduce CPU crashes by undervolting, with no errors reported in the kernel or RASDaemon.

Other AM4 Brands

Gigabyte AM4 boards based on the B550 and X570 chipsets are explicitly listed as supporting ECC mode with unbuffered DIMMs. Boards with A520 chipsets mention ECC but not explicitly "ECC Mode", which may mean they will work normally with ECC DIMMs but not offer the same advanced functionality.

All series of these boards seem to work as long as you select one of those two chipsets.

I also managed to boot a Gigabyte A520I AC with ECC RAM, and there are a lot of configuration options related to ECC. Further testing is needed on this brand.