From: Roger Heflin <rogerheflin@gmail.com>
To: Greg Freemyer <greg.freemyer@gmail.com>
Cc: "Michał Przyłuski" <mikylie@gmail.com>,
"Peter Rabbitson" <rabbit+list@rabbit.us>,
Redeeman <redeeman@metanurb.dk>,
linux-raid@vger.kernel.org
Subject: Re: detection/correction of corruption with raid6
Date: Fri, 05 Dec 2008 18:39:18 -0600
Message-ID: <4939C9B6.30600@gmail.com>
In-Reply-To: <87f94c370812051443nd154992kfb61f3b6f0f5625d@mail.gmail.com>
Greg Freemyer wrote:
> I'm also very concerned about silent corruption and we often "verify"
> our critical large files by performing MD5 verifies against a known
> good value. Especially when we make copies or move them from one
> media to another.
>
> But in all the cases of silent corruption I've seen, it was never the
> disk. Instead I've seen it be the cable, the controller, bad memory,
> bad power supply, but never the disk itself. Not to say the disk
> controller could not be the cause, just that I have not seen it.
>
> I did not read the relevant threads, but do they cover all of these
> sources of silent corruption, or just if a disk is the source?
>
> Thanks
> Greg
I will second what Greg says. I have debugged a number of corruptions
related to filesystems, and I have never seen the disk be the cause. I
have seen 3-4 different controllers corrupt data (a bad PCI/motherboard
interaction affecting controllers from 2 different manufacturers, and a
bad controller).

And then the #1 issue is actual bad memory or a bad power supply in the
machine. None of the cases I saw affected *ONLY* a single disk; they
affected all of the disks on the controller, so whatever has to be done
would almost have to be done at the filesystem level or the application
level. The typical corruption does not come from data read off the
disk: the platters themselves (and the internals of the disk) appear to
have very, very good error detection and correction, and it is really,
really unlikely for a bad sector read to go uncaught.

The PCI bus only has parity (and parity errors on the PCI bus are
likely not being monitored unless you have loaded the edac_mc module),
so roughly 50% of the errors that happen get missed: any error that
flips an even number of bits leaves the parity unchanged. This was one
of the bad PCI/motherboard interactions: one of the slots on a certain
motherboard (every board of that model, with cards from a couple of
different companies) *HAD* to be throttled to keep it from producing
corrupt data every 1GB of reads or so.

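A minimal Python sketch of why a single parity bit misses about half of
all corruptions. It assumes even parity over one bus word and that every
non-zero error pattern is equally likely, which real hardware need not
follow; the helper names (parity_detects, missed_fraction) are
hypothetical and do not come from any driver or tool.

```python
"""Sketch: a single parity bit only catches odd-weight bit errors.

Assumptions (illustrative, not from any real driver): even parity over
one word, and every non-zero error pattern is equally likely.
"""


def parity(word: int) -> int:
    """Return 1 if `word` has an odd number of set bits, else 0."""
    return bin(word).count("1") & 1


def parity_detects(data: int, error_mask: int) -> bool:
    """True if flipping the bits in `error_mask` changes the parity bit."""
    sent_parity = parity(data)
    received = data ^ error_mask
    return parity(received) != sent_parity


def missed_fraction(width: int, data: int = 0) -> float:
    """Fraction of all non-zero error patterns on a `width`-bit word
    that parity fails to detect (exhaustive count)."""
    patterns = range(1, 1 << width)
    missed = sum(1 for mask in patterns if not parity_detects(data, mask))
    return missed / len(patterns)


if __name__ == "__main__":
    # A single flipped bit is always caught.
    print(parity_detects(0xDEADBEEF, 0b1))    # True
    # Two flipped bits leave the parity unchanged and slip through.
    print(parity_detects(0xDEADBEEF, 0b11))   # False
    # Exhaustive count over a 16-bit word: (2**15 - 1) / (2**16 - 1).
    print(f"{missed_fraction(16):.4f}")       # 0.5000
```

Every even-weight pattern goes undetected, and those make up just under
half of all non-zero patterns, which is where the "50%" comes from under
the equal-likelihood assumption.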
Internally, the controllers often have poor checking and will miss
things if the controller goes bad. The disks themselves appear to have
very good internal controls; I have never seen disk electronics screw
up and corrupt data either.

Basically, don't waste time worrying about a single disk silently
corrupting data; worry first about everything after the disk, as that
is the weakest link and is far, far more likely to bite you.
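Since detection has to happen at the filesystem or application level,
here is a minimal Python sketch of the application-level check Greg
describes: record an MD5 digest while the data is known good and compare
against it later. The helper names (md5_of_file, verify) are
hypothetical; only hashlib from the standard library is assumed. Note
that a re-read served from the page cache does not exercise the disk,
cable or controller, so a real verify pass should read the data back
from the device (or on another machine).

```python
"""Sketch: end-to-end MD5 verification against a known-good digest.

Hypothetical helpers built on the standard library. MD5 is adequate for
catching accidental corruption, not for resisting deliberate tampering.
"""
import hashlib
import os
import tempfile


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large files need not fit in RAM."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, known_good: str) -> bool:
    """True if the file still matches the digest recorded earlier."""
    return md5_of_file(path) == known_good.lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.bin")
        with open(path, "wb") as f:
            f.write(b"critical payload" * 1000)
        recorded = md5_of_file(path)      # record while data is known good
        print(verify(path, recorded))     # True

        # Simulate one silently flipped bit somewhere along the I/O path.
        with open(path, "r+b") as f:
            f.seek(5000)
            byte = f.read(1)
            f.seek(5000)
            f.write(bytes([byte[0] ^ 0x01]))
        print(verify(path, recorded))     # False
```

The check catches corruption regardless of whether it came from memory,
the controller or the bus, which is exactly the class of fault that
per-disk redundancy cannot see.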