From: Jaromir Capik <jcapik@redhat.com>
To: stan@hardwarefreak.com
Cc: Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust
Date: Mon, 23 Jul 2012 05:34:17 -0400 (EDT)
Message-ID: <542194327.593466.1343036057599.JavaMail.root@redhat.com>
In-Reply-To: <500CD32E.4000800@hardwarefreak.com>
Hello Stan.
I received your reply without the Linux RAID list in Cc,
so I was unsure whether you wanted to discuss this privately or not.
I always choose "reply to all" unless I really want to remove
some of the recipients :]
Cheers,
Jaromir.
>
> Please keep discussion on list. This is probably an MUA issue. It
> happens to me on occasion when I hit "reply to list" instead of
> "reply to all". vger doesn't provide a List-Post: header, so "reply
> to list" doesn't work and you end up replying to the sender.
>
> On 7/22/2012 5:11 PM, Jaromir Capik wrote:
> >>> I admit that the problem could lie elsewhere ... but that
> >>> doesn't change the fact that the data became corrupted
> >>> without me noticing.
> >>
> >> The key here I think is "without me noticing that". Drives
> >> normally cry out in the night, spitting errors to logs, when
> >> they encounter problems. You may not receive an immediate error
> >> in your application, especially when the drive is a RAID member
> >> and the data can be shipped regardless of the drive error. If you
> >> never check your logs, or simply don't see these disk errors, how
> >> will you know there's a problem?
> >
> > Hello Stan.
> >
> > I used to periodically check logs as well as S.M.A.R.T. attributes.
> > And I believe I've already mentioned two of the cases and how
> > I finally discovered the issues. Moreover I switched from manual
> > checking to receiving emails from monitoring daemons. And even
> > if you receive such an email, it usually takes some time to
> > replace the failing drive. That time window might be fatal for
> > your data if junk is read from one of the drives and the read is
> > followed by a write. Such a write would destroy the second,
> > correct copy ...
> >
> >>
> >> Likewise, if the checksumming you request is implemented in
> >> md/RAID1, and your application never sees a problem when a drive
> >> heads South, and you never check your logs and thus don't see the
> >> checksum errors...
> >
> > You wouldn't have to ... because the corrupted chunks would be
> > immediately resynced with good data, and you would REALLY get
> > some errors in the logs even if the hard drive, the controller, or
> > its driver doesn't produce them for whatever reason.
> >
> >>
> >> How is this new checksumming any better than the current
> >> situation? The drive is still failing and you're still unaware
> >> of it.
> >
> > Do you believe that other causes of silent data corruption simply
> > do not exist? Try to imagine a case where the correct data aren't
> > written at all to one of the drives due to a bug in the drive's
> > firmware, in the controller design, in the controller driver, or
> > due to other reasons. Such a bug could be triggered by anything ...
> > it could be a delay in the read operation when the sector is not
> > well readable, or any race condition, etc. Especially new devices
> > and their very first versions are expected to be buggy.
> > Checksumming would protect against them all and would make the
> > whole I/O really bulletproof.
>
>
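The idea quoted above (verify each chunk on read and immediately
resync a corrupted copy from the good one, leaving a log entry) can
be sketched roughly as follows. This is only an illustration of the
proposal, not how md/RAID1 works and not code from this thread. The
class name ChecksummedMirror, the CHUNK_SIZE value, the use of CRC32,
and the in-memory lists standing in for the member drives and the
checksum store are all assumptions made for the example.

```python
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen for illustration


class ChecksummedMirror:
    """Toy model of a mirror with one checksum per chunk (assumed design)."""

    def __init__(self, n_chunks, n_copies=2):
        blank = bytes(CHUNK_SIZE)
        # Each "drive" is modelled as a list of chunk-sized byte strings.
        self.copies = [[blank] * n_chunks for _ in range(n_copies)]
        # One checksum per chunk, assumed to be stored reliably elsewhere.
        self.sums = [zlib.crc32(blank)] * n_chunks
        self.log = []

    def write(self, idx, data):
        if len(data) != CHUNK_SIZE:
            raise ValueError("data must be exactly one chunk long")
        for copy in self.copies:
            copy[idx] = data
        self.sums[idx] = zlib.crc32(data)

    def read(self, idx):
        good = None
        bad = []
        # Check every copy against the stored checksum.
        for n, copy in enumerate(self.copies):
            if zlib.crc32(copy[idx]) == self.sums[idx]:
                if good is None:
                    good = copy[idx]
            else:
                bad.append(n)
        if good is None:
            raise OSError(f"chunk {idx}: no copy matches its checksum")
        # Resync corrupted copies from the good one and record the event,
        # even though the drive itself reported no error.
        for n in bad:
            self.copies[n][idx] = good
            self.log.append(f"chunk {idx}: copy {n} failed checksum, resynced")
        return good
```

In this model, silently overwriting a chunk on one copy and then
calling read() returns the intact data, repairs the damaged copy and
appends one entry to log. If every copy of a chunk fails its checksum,
read() raises OSError instead of returning junk.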