From: Stan Hoeppner <stan@hardwarefreak.com>
To: Jaromir Capik <jcapik@redhat.com>,
	Linux RAID <linux-raid@vger.kernel.org>
Subject: Re: [RFE] Please, add optional RAID1 feature (= chunk checksums) to make it more robust
Date: Sun, 22 Jul 2012 23:29:34 -0500
Message-ID: <500CD32E.4000800@hardwarefreak.com>
In-Reply-To: <1082734092.338339.1342995087426.JavaMail.root@redhat.com>

Please keep discussion on list.  This is probably an MUA issue.  Happens
to me on occasion when I hit "reply to list" instead of "reply to all".
vger doesn't provide a List-Post: header, so "reply to list" doesn't
work and you end up replying to the sender.

On 7/22/2012 5:11 PM, Jaromir Capik wrote:
>>> I admit, that the problem could lie elsewhere ... but that doesn't
>>> change anything on the fact, that the data became corrupted without
>>> me noticing that.
>>
>> The key here I think is "without me noticing that".  Drives normally
>> cry out in the night, spitting errors to logs, when they encounter
>> problems.  You may not receive an immediate error in your application,
>> especially when the drive is a RAID member and the data can be shipped
>> regardless of the drive error.  If you never check your logs, or simply
>> don't see these disk errors, how will you know there's a problem?
> 
> Hello Stan.
> 
> I used to periodically check logs as well as S.M.A.R.T. attributes.
> And I believe I've already mentioned two of the cases and how
> I finally discovered the issues. Moreover I switched from manual
> checking to receiving emails from monitoring daemons. And even
> if you receive such an email, it usually takes some time to replace
> the failing drive. That time window might be fatal for your data
> if junk is read from one of the drives and the read is followed
> by a write. Such a write would destroy the second, correct copy ...
> 
>>
>> Likewise, if the checksumming you request is implemented in md/RAID1,
>> and your application never sees a problem when a drive heads South,
>> and you never check your logs and thus don't see the checksum errors...
> 
> You wouldn't have to ... because the corrupted chunks would be
> immediately resynced with good data and you'll REALLY get some errors
> in the logs if the hard drive or controller or its driver doesn't
> produce them for whatever reason.
> 
>>
>> How is this new checksumming any better than the current situation?
>> The drive is still failing and you're still unaware of it.
> 
> Do you believe that other causes of silent data corruption simply
> do not exist? Try to imagine a case where the correct data aren't
> written at all to one of the drives due to a bug in the drive's firmware
> or due to a bug in the controller design or due to a bug in the
> controller driver or due to other reasons. Such a bug could be triggered
> by anything ... it could be a delay in the read operation when the
> sector is not well readable or any race condition, etc. Especially
> new devices and their very first versions are expected to be buggy.
> Checksumming would prevent them all and would make the whole
> I/O really bulletproof.
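
The read path the quoted proposal describes (verify each chunk on read,
fall back to the other mirror, rewrite the bad copy, and log the event)
might look roughly like the sketch below.  It is an illustration only,
not md code: the Mirror class, write_chunk(), read_chunk(), and the
choice of CRC32 are all assumptions made for the example.

```python
import zlib


class Mirror:
    """One RAID1 member, modeled as in-memory storage (hypothetical)."""

    def __init__(self):
        self.chunks = {}  # chunk index -> data bytes
        self.sums = {}    # chunk index -> checksum stored at write time

    def write(self, idx, data):
        self.chunks[idx] = data
        self.sums[idx] = zlib.crc32(data)

    def read(self, idx):
        return self.chunks[idx], self.sums[idx]


def write_chunk(mirrors, idx, data):
    """Write the same chunk (and its checksum) to every mirror."""
    for m in mirrors:
        m.write(idx, data)


def read_chunk(mirrors, idx, log):
    """Return the first copy whose checksum verifies.

    Copies that failed verification before a good one was found are
    rewritten from the good copy, and every event is appended to log.
    """
    bad = []
    for n, m in enumerate(mirrors):
        data, stored = m.read(idx)
        if zlib.crc32(data) == stored:
            for b in bad:
                mirrors[b].write(idx, data)
                log.append(f"chunk {idx}: mirror {b} repaired from mirror {n}")
            return data
        bad.append(n)
        log.append(f"chunk {idx}: checksum mismatch on mirror {n}")
    raise IOError(f"chunk {idx}: no mirror holds a valid copy")


if __name__ == "__main__":
    mirrors = [Mirror(), Mirror()]
    write_chunk(mirrors, 0, b"good data")
    mirrors[0].chunks[0] = b"junk data"  # simulate silent corruption
    log = []
    print(read_chunk(mirrors, 0, log))
    print("\n".join(log))
```

In this sketch each checksum is stored next to its data on the same
member, so a write that is dropped entirely (data and checksum both left
stale) would still verify; catching that case would need the checksum or
a generation counter kept somewhere else.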

