> From my understanding, the only risk to your data from non-ECC is a bit flip in RAM, pre-checksum calculation. In that unlikely scenario, you commit bad data to disk as good data (valid checksum).
Wouldn't an option to compute the checksum twice, in different memory regions, be nice? I'm pretty sure that in many use cases sacrificing performance for greater reliability wouldn't be an issue. Given how many cores we have available nowadays, it might not even have much impact on performance.
Also, are there any software solutions (like a kernel patch) that would do "software ECC"? I imagine the performance hit would be quite devastating, but it could still be an acceptable trade-off for NAS-like systems where you want lots of RAM for dedup and cache but the system isn't busy.
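There is still a race condition: if you read data from disk into a buffer, make a copy of the buffer, and then compute two checksums, the bit flip can still occur before the second copy is created. In that case both copies hold the same corrupted data, so the two checksums agree and the corruption goes unnoticed.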
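To make the window concrete, here is a minimal sketch in Python. It is only an illustration: CRC32 stands in for whatever checksum the filesystem actually uses, and `flip_bit` and `checksum_twice` are made-up helper names that simulate the memory error and the "checksum it twice" idea.

```python
import zlib

def flip_bit(buf, bit_index):
    """Simulate a single-bit memory error by flipping one bit in place."""
    buf[bit_index // 8] ^= 1 << (bit_index % 8)

def checksum_twice(buf):
    """Checksum the buffer and an independent copy held in separate memory."""
    copy = bytearray(buf)  # the "second memory region"
    return zlib.crc32(buf), zlib.crc32(copy)

original = b"data read from disk"
good_crc = zlib.crc32(original)

# Case 1: the flip hits one buffer AFTER the copy was made.
# The two checksums disagree, so the error is caught.
buf = bytearray(original)
copy = bytearray(buf)
flip_bit(copy, 3)
detected = zlib.crc32(buf) != zlib.crc32(copy)

# Case 2: the flip hits the buffer BEFORE the copy is made.
# Both copies carry the same bad data, so the checksums agree.
buf = bytearray(original)
flip_bit(buf, 3)
crc_a, crc_b = checksum_twice(buf)
undetected = (crc_a == crc_b) and (crc_a != good_crc)

print("flip after copy detected:", detected)
print("flip before copy slips through:", undetected)
```

So the double checksum only protects the interval after the second copy exists. Anything that happens to the data before that point is checksummed twice as if it were good.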