public inbox for linux-kernel@vger.kernel.org
* ECC and DMA to/from disk controllers
@ 2007-09-10 12:19 Bruce Allen
  2007-09-10 13:54 ` Alan Cox
  2007-09-10 18:05 ` linux-os (Dick Johnson)
  0 siblings, 2 replies; 6+ messages in thread
From: Bruce Allen @ 2007-09-10 12:19 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: Bruce Allen

Dear LKML,

Apologies in advance for potential misuse of LKML, but I don't know where 
else to ask.

An ongoing study on datasets of several petabytes has shown that 'silent 
data corruption' can occur at rates much higher than one would naively 
predict from the nominal error rates of RAID arrays and the expected 
probability of uncorrected single-bit errors in hard disks.

The origin of this data corruption is still unknown.  See for example 
http://cern.ch/Peter.Kelemen/talk/2007/kelemen-2007-C5-Silent_Corruptions.pdf

In thinking about this, I began to wonder about the following.  Suppose 
that a (possibly RAID) disk controller correctly reads data from disk and 
has correct data in the controller memory and buffers.  However, when that 
data is DMA'd into system memory, some errors occur (cosmic rays, 
electrical noise, etc.).  Am I correct that these errors would NOT be 
detected, even on a 'reliable' server with ECC memory?  In other words, 
the ECC bits would be calculated in server memory from data that was 
already corrupted in transit from the controller.
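
To make the scenario concrete, here is a minimal sketch of the concern, 
not a model of any real hardware.  It assumes a single even-parity bit per 
64-bit word as a stand-in for real ECC (actual ECC DIMMs use stronger 
SECDED codes), and it deliberately leaves out any protection the bus 
itself may provide.  All names (parity_bit, store_with_ecc, ecc_ok) are 
hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def parity_bit(word: int) -> int:
    """Even parity over a 64-bit word; a simplified stand-in for ECC."""
    return bin(word & MASK64).count("1") & 1


def store_with_ecc(word: int) -> tuple[int, int]:
    """Model the memory controller: the check bit is derived from
    whatever value arrives, right or wrong."""
    return (word & MASK64, parity_bit(word))


def ecc_ok(stored: tuple[int, int]) -> bool:
    """Re-derive the check bit on read and compare with the stored one."""
    word, check = stored
    return parity_bit(word) == check


# Correct data sitting in the disk controller's buffer.
controller_buffer = 0x0123456789ABCDEF

# Assumed fault: one bit flips during the DMA transfer, before the
# memory controller has computed any check bits.
corrupted_in_flight = controller_buffer ^ (1 << 17)
stored = store_with_ecc(corrupted_in_flight)

# The check bit matches the (wrong) data, so memory reports no error.
in_flight_error_undetected = ecc_ok(stored) and stored[0] != controller_buffer

# For contrast: a flip that happens after the word is stored is caught.
flipped_in_dram = (stored[0] ^ (1 << 3), stored[1])
dram_error_detected = not ecc_ok(flipped_in_dram)
```

In this toy model the in-flight flip passes the memory check because the 
check bit was computed after the damage, while the same kind of flip 
inside memory is flagged.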

The alternative is that disk controllers (or at least ones that are meant 
to be reliable) DMA both the data AND the ECC byte into system memory, 
so that if an error occurs in this transfer, it would most likely be 
picked up and corrected by the ECC mechanism.  But I don't think that 
'this is how it works'.  Could someone knowledgeable please confirm or 
contradict?
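
The alternative can be sketched the same way: the check value is computed 
on the controller side, before the transfer, and travels with the data.  
This is only an illustration under stated assumptions.  It swaps in a 
CRC-32 (Python's zlib.crc32) for the ECC byte, so it can detect a damaged 
transfer but cannot correct it, and the function names controller_send and 
host_receive are hypothetical, not any real driver or controller 
interface.

```python
import zlib


def controller_send(payload: bytes) -> tuple[bytes, int]:
    """Compute the check value at the source, before the transfer."""
    return payload, zlib.crc32(payload)


def host_receive(payload: bytes, crc: int) -> bool:
    """Recompute the check value at the destination and compare."""
    return zlib.crc32(payload) == crc


# One 512-byte sector held correctly in the controller's buffer.
sector = bytes(range(256)) * 2
payload, crc = controller_send(sector)

# Assumed fault: a single bit flips in the payload during the transfer.
damaged = bytearray(payload)
damaged[100] ^= 0x04

clean_ok = host_receive(payload, crc)
damaged_ok = host_receive(bytes(damaged), crc)
```

Here the intact transfer verifies and the damaged one does not, because 
the check value was derived from the data as it existed before the 
transfer rather than after it.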

Cheers,
 	Bruce
