public inbox for linux-kernel@vger.kernel.org
From: Andi Kleen <ak@muc.de>
To: linux-kernel@vger.kernel.org
Subject: Re: critical bugs in md raid5
Date: Thu, 27 Jan 2005 10:51:02 +0100
Message-ID: <20050127095102.GA88779@muc.de>
In-Reply-To: <20050127063131.GA29574@schmorp.de>

> I disagree. When not working in degraded mode, it's absolutely reasonable
> to e.g. use only the non-parity data. A crash with raid5 is in no way

Yep. But when you go into degraded mode during the crash recovery 
(before the RAID is fully synced again) you lose.

> different to a crash without raid5 then: either the old data is on the
> disk, the new data is on the disk, or you had some catastrophic disk event
> and no data is on the disk.

No, that's not how RAID-5 works. For its redundancy it requires
coordinated writes of full stripes (= bigger than an fs block) across
multiple disks. When you crash in the middle of a write and then
lose a disk during crash recovery, there is no way to fully
reconstruct all the data, because XOR data recovery requires
valid data on all the remaining disks.

The nasty part is that it can affect completely unrelated
data too (on a traditional disk you normally only lose the data
that is currently being written), because of the relationship
between stripes on different disks.
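
To make the XOR argument concrete, here is a toy sketch (not the md
code; the 3-disk layout, block contents and the xor_blocks helper are
all made up for illustration). It shows a stripe whose data block was
rewritten while the parity update was lost in the crash, followed by
the loss of the disk holding an unrelated block:

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# One stripe on a toy 3-disk array: two data blocks plus parity.
d0_old = b"AAAA"                      # block that is about to be rewritten
d1 = b"BBBB"                          # unrelated block, never touched by the write
parity_old = xor_blocks(d0_old, d1)

# While the stripe is consistent, a lost d1 can be rebuilt from d0 and parity.
assert xor_blocks(d0_old, parity_old) == d1

# Crash mid-write: the new d0 reached its disk, the parity update did not.
d0_new = b"CCCC"
stale_parity = parity_old

# The disk holding d1 fails before a resync has recomputed the parity,
# so d1 has to be reconstructed from the remaining (inconsistent) blocks.
d1_reconstructed = xor_blocks(d0_new, stale_parity)

print(d1_reconstructed == d1)         # False: d1 is garbage although it was never written
```

Had the resync finished before the disk failed, the parity would have
been recomputed from d0_new and d1 and the reconstruction would be
correct again.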

> 
> The case I reported was not a catastrophic failure: either the old or new
> data was on the disk, and the filesystem journaling (which is ext3) will
> take care of it. Even if the parity information is not in sync, either old or
> new data is on the disk.

But you lost a disk in the middle of recovery (any IO error counts as
a lost disk).

> Indeed, but I think linux' behaviour is especially poor. For example, the
> renumbering of the devices or the strange rebuild-restart behaviour (which
> is definitely a bug) will make recovery unnecessarily complicated.

There were some suggestions in the past to be a bit nicer on read
IO errors: often, if a read fails and you rewrite the block from the
reconstructed data, the disk allocates a new block and is error
free again.

The problem is just that when there are user-visible IO errors
on a modern disk, something is very wrong: it will likely run out of
replacement blocks quickly and eventually fail. That is why
Linux "forces" early replacement of the disk on any error - it is the
safest thing to do.
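
A rough sketch of the rewrite-on-read-error idea and of its limit, as a
toy model only: ToyDisk, its spares counter and read_with_repair are
hypothetical names, and real drives and md behave in far more
complicated ways. The assumption modelled here is that a bad sector
fails reads until it is rewritten, and that each rewrite uses up one
replacement block.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

class ToyDisk:
    """Hypothetical disk: a bad sector fails reads until it is rewritten."""
    def __init__(self, blocks, spares=1):
        self.blocks = list(blocks)
        self.bad = set()          # block numbers that currently fail on read
        self.spares = spares      # remaining replacement blocks (assumed)

    def read(self, n):
        if n in self.bad:
            raise IOError(f"read error on block {n}")
        return self.blocks[n]

    def write(self, n, data):
        if n in self.bad:
            if self.spares == 0:
                raise IOError("out of replacement blocks")
            self.spares -= 1      # the drive remaps the sector on write
            self.bad.discard(n)
        self.blocks[n] = data

def read_with_repair(disks, idx, n):
    """Read block n from disks[idx]; on error rebuild it from the others and rewrite it."""
    try:
        return disks[idx].read(n)
    except IOError:
        others = [d.read(n) for i, d in enumerate(disks) if i != idx]
        data = xor_blocks(*others)
        disks[idx].write(n, data)     # may itself fail once spares are exhausted
        return data

d0, d1 = ToyDisk([b"AAAA"]), ToyDisk([b"BBBB"])
parity = ToyDisk([xor_blocks(b"AAAA", b"BBBB")])
disks = [d0, d1, parity]

d1.bad.add(0)                         # simulate a media error on d1
repaired = read_with_repair(disks, 1, 0)
print(repaired, d1.spares)
```

In this model the first repair succeeds and the disk reads cleanly
again, but a second media error on the same disk finds no spare left
and the rewrite fails, which is the case the early-replacement policy
is meant to avoid.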


> > problem though (e.g. when file system metadata is affected)
> 
> Of course, but that's supposed to be worked around by using a journaling
> file system, right?

Nope, journaling is no magical fix for metadata corruption.

-Andi

