Re: Bug report: mdadm -E oddity

linux-raid.vger.kernel.org archive mirror

From: Paul Clements <paul.clements@steeleye.com>
To: Doug Ledford <dledford@redhat.com>
Cc: linux-raid@vger.kernel.org
Subject: Re: Bug report: mdadm -E oddity
Date: Fri, 20 May 2005 12:04:34 -0400
Message-ID: <428E0A92.3060108@steeleye.com>
In-Reply-To: <1116592212.23785.75.camel@compaq-rhel4.xsintricity.com>

Hi Doug,

Doug Ledford wrote:
> On Fri, 2005-05-20 at 17:00 +1000, Neil Brown wrote:

>>There is a converse to this.  People should be made to take notice if
>>there is possible data corruption.
>>
>>i.e. if you have a system crash while running a degraded raid5, then
>>silent data corruption could ensue.  mdadm will currently not start
>>any array in this state without an explicit '--force'.  This is somewhat
>>akin to fsck sometimes requiring human interaction.  Of course, if there
>>is good reason to believe the data is still safe, mdadm should -- and
>>I believe does -- assemble the array even if degraded.
> 
> 
> Well, as I explained in my email some time back on the issue of silent
> data corruption, this is where journaling saves your ass.  Since the
> journal has to be written before the filesystem proper updates are
> written, if the array goes down it either is in the journal write, in
> which case you are throwing those blocks away anyway and so corruption
> is irrelevant, or it's in the filesystem proper writes and if they get
> corrupted you don't care because we are going to replay the journal and
> rewrite them.

I think you may be misunderstanding the nature of the data corruption 
that ensues when a system with a degraded raid4, raid5, or raid6 array 
crashes. Data that you aren't even actively writing can get corrupted. 
For example, say we have a 3-disk raid5 and disk 3 is missing. This 
means that for some stripes, we'll be writing parity and data:

disk1   disk2   {disk3}

  D1       P      {D2}

So, say we're in the middle of updating this stripe, and we're writing 
D1 and P to disk when the system crashes. We may have just corrupted D2, 
which isn't even being written right now. This is because we'll use D1 
and P to reconstruct D2 (as D1 xor P) when disk3 (or its replacement) 
comes back. If we wrote D1 and not P, then the new D1 and the stale P no 
longer match, and reconstructing D2 from them gives the wrong data. Same 
goes if we wrote P and not D1, or some partial piece of either or both.
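
To make the failure concrete, here is a minimal sketch in Python of the 
stripe above. It is a toy model, not md's code: blocks are four bytes, 
parity is the plain XOR of the data blocks, and all names (xor, 
reconstruct_d2, the sample byte values) are made up for illustration.

```python
# Toy model of one stripe on a degraded 3-disk raid5 with disk3 missing.
# Assumption: parity is the byte-wise XOR of the data blocks.

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

def reconstruct_d2(d1_on_disk: bytes, p_on_disk: bytes) -> bytes:
    """What we would compute for D2 once disk3 (or a replacement) returns."""
    return xor(d1_on_disk, p_on_disk)

# Consistent stripe before the update. D2 exists only implicitly as D1 xor P.
d1_old = b"\x11\x22\x33\x44"
d2 = b"\xaa\xbb\xcc\xdd"
p_old = xor(d1_old, d2)

# The update touches D1 only, so P must change too. D2 is never written.
d1_new = b"\x55\x66\x77\x88"
p_new = xor(d1_new, d2)

clean_before = reconstruct_d2(d1_old, p_old)   # no write happened
clean_after = reconstruct_d2(d1_new, p_new)    # both writes landed
torn_d1_only = reconstruct_d2(d1_new, p_old)   # crash after D1, before P
torn_p_only = reconstruct_d2(d1_old, p_new)    # crash after P, before D1

print(clean_before == d2, clean_after == d2)   # True True
print(torn_d1_only == d2, torn_p_only == d2)   # False False
```

In both torn cases the reconstructed D2 differs from the real one, even 
though nothing ever wrote to D2.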

There's no way for a filesystem journal to protect us from D2 getting 
corrupted, as far as I know.

Note that if we lose the parity disk in a raid4, this type of data 
corruption isn't possible. Also note that for some stripes in a raid5 or 
raid6, this type of corruption can't happen (as long as the parity for 
that stripe is on the missing disk). Also, if you have a non-volatile 
cache on the array, as most hardware RAIDs do, then this type of data 
corruption doesn't occur.
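
As a rough illustration of which stripes are exposed, the sketch below 
lists the stripes whose parity is not on the missing disk. The parity 
placement is an assumption made for the example (a dedicated last disk 
for raid4, a simple backwards rotation for raid5); it is not meant to 
match any particular md layout, and raid6 is not modelled. The function 
names are hypothetical.

```python
# Which stripes of a degraded single-parity array are exposed to this
# kind of corruption? A stripe is safe if its parity block lives on the
# missing disk, because then only data blocks remain and none of them is
# reconstructed from parity.

def parity_disk(stripe: int, n_disks: int, level: int) -> int:
    """Disk index holding parity for a stripe (hypothetical layout)."""
    if level == 4:
        return n_disks - 1                    # raid4: dedicated parity disk
    return (n_disks - 1 - stripe) % n_disks   # raid5: assumed simple rotation

def exposed_stripes(n_stripes: int, n_disks: int, missing: int, level: int) -> list[int]:
    """Stripes where a torn write could corrupt the missing disk's data."""
    return [s for s in range(n_stripes)
            if parity_disk(s, n_disks, level) != missing]

# 3-disk raid5, disk index 2 missing: stripes 0 and 3 keep parity there.
print(exposed_stripes(6, 3, missing=2, level=5))   # [1, 2, 4, 5]
# raid4 with the parity disk missing: nothing is exposed.
print(exposed_stripes(6, 3, missing=2, level=4))   # []
# raid4 with a data disk missing: every stripe is exposed.
print(exposed_stripes(6, 3, missing=0, level=4))   # [0, 1, 2, 3, 4, 5]
```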

--
Paul
