From: Stan Hoeppner <stan@hardwarefreak.com>
To: Jaromir Capik <jcapik@redhat.com>,
Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust
Date: Sun, 22 Jul 2012 23:29:34 -0500
Message-ID: <500CD32E.4000800@hardwarefreak.com>
In-Reply-To: <1082734092.338339.1342995087426.JavaMail.root@redhat.com>
Please keep discussion on list. This is probably an MUA issue. Happens
to me on occasion when I hit "reply to list" instead of "reply to all".
vger doesn't provide a List-Post: header so "reply to list" doesn't
work and you end up replying to the sender.
On 7/22/2012 5:11 PM, Jaromir Capik wrote:
>>> I admit that the problem could lie elsewhere ... but that doesn't
>>> change the fact that the data became corrupted without
>>> me noticing that.
>>
>> The key here I think is "without me noticing that". Drives normally cry
>> out in the night, spitting errors to logs, when they encounter problems.
>> You may not receive an immediate error in your application, especially
>> when the drive is a RAID member and the data can be shipped regardless
>> of the drive error. If you never check your logs, or simply don't see
>> these disk errors, how will you know there's a problem?
>
> Hello Stan.
>
> I used to periodically check logs as well as S.M.A.R.T. attributes.
> And I believe I've already mentioned two of the cases and how
> I finally discovered the issues. Moreover I switched from manual
> checking to receiving emails from monitoring daemons. And even
> if you receive such an email, it usually takes some time to replace
> the failing drive. That time window might be fatal for your data
> if junk is read from one of the drives and then followed
> by a write. Such a write would destroy the second, correct copy ...
>
>>
>> Likewise, if the checksumming you request is implemented in md/RAID1,
>> and your application never sees a problem when a drive heads South, and
>> you never check your logs and thus don't see the checksum errors...
>
> You wouldn't have to ... because the corrupted chunks would be
> immediately resynced with good data, and you'd REALLY get some errors
> in the logs even if the hard drive, the controller, or its driver doesn't
> produce them for whatever reason.
>
>>
>> How is this new checksumming any better than the current situation? The
>> drive is still failing and you're still unaware of it.
>
> Do you believe that other causes of silent data corruption simply
> do not exist? Try to imagine a case where the correct data isn't
> written at all to one of the drives due to a bug in the drive's firmware,
> or due to a bug in the controller design, or due to a bug in the
> controller driver, or due to other reasons. Such a bug could be triggered
> by anything ... it could be a delay in the read operation when the
> sector is not well readable, or any race condition, etc. Especially
> new devices and their very first versions are expected to be buggy.
> Checksumming would prevent them all and would make the whole
> I/O really bulletproof.
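
For readers following the thread, here is a minimal sketch of the
behaviour being requested above: one checksum per chunk, verified on
read, with a bad copy resynced from the good one and the event logged.
It is a toy in-memory model, not md code, and md/RAID1 has no such
feature as of this message. The class name, the use of CRC32, and
keeping the checksums in memory rather than on disk are all assumptions
made for illustration.

```python
import zlib


class ChecksummedMirror:
    """Toy two-copy mirror with one CRC32 per chunk (hypothetical, not md)."""

    def __init__(self, num_chunks, chunk_size=4096):
        self.chunk_size = chunk_size
        blank = bytes(chunk_size)
        # Two in-memory "drives", each a list of chunks.
        self.drives = [[blank] * num_chunks, [blank] * num_chunks]
        # Assumption: checksums live in memory; a real design would have
        # to store them on disk and keep them consistent with the data.
        self.checksums = [zlib.crc32(blank)] * num_chunks
        self.log = []

    def write(self, index, data):
        if len(data) != self.chunk_size:
            raise ValueError("data must be exactly one chunk")
        for drive in self.drives:
            drive[index] = data
        self.checksums[index] = zlib.crc32(data)

    def read(self, index):
        expected = self.checksums[index]
        good = None
        bad = []
        # Verify every copy against the stored checksum.
        for n, drive in enumerate(self.drives):
            if zlib.crc32(drive[index]) == expected:
                if good is None:
                    good = drive[index]
            else:
                bad.append(n)
        if good is None:
            raise OSError(f"chunk {index}: no copy matches its checksum")
        # Resync each bad copy from the good one and record the event,
        # even though the drive itself reported no error.
        for n in bad:
            self.drives[n][index] = good
            self.log.append(f"chunk {index}: repaired copy on drive {n}")
        return good
```

Usage: write a chunk, overwrite one drive's copy behind the mirror's
back to simulate silent corruption, then read the chunk. The read
returns the good data, the bad copy is rewritten, and an entry lands in
the log. Note that this only detects and repairs; the underlying drive
problem still has to be noticed and acted on by the admin.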