linux-ext4.vger.kernel.org archive mirror
From: Andreas Dilger <adilger@sun.com>
To: George Spelvin <linux@horizon.com>
Cc: david@lang.hm, pavel@ucw.cz, linux-doc@vger.kernel.org,
	linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:
Date: Tue, 01 Sep 2009 10:18:28 -0600
Message-ID: <20090901161828.GN4197@webber.adilger.int>
In-Reply-To: <20090901005629.3932.qmail@science.horizon.com>

On Aug 31, 2009  20:56 -0400, George Spelvin wrote:
> >> The more I learn about storage, the more I like idea of zfs. Given the
> >> subtle issues between filesystem and raid layer, integrating them just
> >> makes sense.
> > 
> > Note that all that zfs does is tell you that you already lost data (and 
> > then only if the checksumming algorithm would be invalid on a blank block 
> > being returned), it doesn't protect your data.
> 
> Obviously, there are limits, but it does provide useful protection:
> - You know where the missing data is.
> - The error isn't amplified by believing corrupted metadata
> - I seem to recall that ZFS does replicate metadata.

ZFS definitely does replicate data.  At the lowest level it has RAID-1,
and RAID-Z/Z2, which are pretty close to RAID-5/6 respectively, but with
the important difference that every write is a full-stripe-width write,
so that it is not possible for RAID-Z/Z2 to cause corruption due to a
partially-written RAID parity stripe.
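
As a rough illustration of the partial-stripe problem, here is a toy
model of XOR parity over one stripe. It is only a sketch: the helper
names (xor_blocks, reconstruct) are made up for this example, the
blocks are four bytes long, and nothing here is ZFS or md code. It also
ignores copy-on-write and how the full-stripe write is committed.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID-5 style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def reconstruct(stripe, parity, lost):
    """Rebuild the block at index `lost` from the surviving blocks + parity."""
    survivors = [b for i, b in enumerate(stripe) if i != lost]
    return xor_blocks(survivors + [parity])

# A consistent stripe: three data blocks plus their parity.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)
assert reconstruct(stripe, parity, lost=1) == b"BBBB"

# Partial-stripe update: block 0 is rewritten, then a crash happens
# before the matching parity update reaches disk.
stripe[0] = b"ZZZZ"
stale_parity = parity

# Disk 1 fails later. Its block was never touched by the update, yet
# rebuilding it from the stale parity gives the wrong contents.
rebuilt = reconstruct(stripe, stale_parity, lost=1)
write_hole_corrupts = rebuilt != b"BBBB"

# Full-stripe write (the model for RAID-Z here): data and parity are
# always computed together, so they cannot disagree.
full_parity = xor_blocks(stripe)
full_stripe_ok = reconstruct(stripe, full_parity, lost=1) == b"BBBB"
```

The point of the sketch is that the damaged block is one the
interrupted write never touched, which is what makes a partially
written parity stripe dangerous.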

In addition, for internal metadata blocks there are 1 or 2 duplicate
copies written to different devices, so that in case of a fatal device
corruption (e.g. double failure of a RAID-Z device) the metadata tree
is still intact.

> - Corrupted replicas can be "scrubbed" and rewritten from uncorrupted ones.
> - If you have some storage redundancy, it can try different mirrors
>   to get the data back.
> 
> In particular, on a RAID-5 system, ZFS tries dropping out each data disk
> in turn to see if the correct data can be reconstructed from the others
> + parity.
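
The "drop out each disk in turn" idea quoted above can be sketched
roughly as follows. This is a toy model under stated assumptions, not
the ZFS implementation: single XOR parity, one checksum over the whole
stripe, SHA-256 standing in for whatever checksum the filesystem
actually uses, and hypothetical helper names (xor_blocks, checksum,
self_heal).

```python
import hashlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def checksum(blocks):
    """Stand-in stripe checksum; SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(b"".join(blocks)).digest()

def self_heal(data, parity, expected):
    """Return (repaired_blocks, bad_index), or (None, None) if unrecoverable."""
    if checksum(data) == expected:
        return list(data), None            # nothing to repair
    for i in range(len(data)):
        # Pretend disk i is missing and rebuild it from the others + parity.
        others = [b for j, b in enumerate(data) if j != i]
        candidate = list(data)
        candidate[i] = xor_blocks(others + [parity])
        if checksum(candidate) == expected:
            return candidate, i            # disk i held the bad block
    return None, None                      # more damage than parity can fix

good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(good)
expected = checksum(good)                  # stored apart from the data

damaged = [b"AAAA", b"B?BB", b"CCCC"]      # silent corruption on disk 1
repaired, bad_index = self_heal(damaged, parity, expected)
```

Without the checksum there would be no way to tell which candidate is
the right one; parity alone only says that the stripe is inconsistent.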

What else is interesting is that in the case of 1-4-bit errors the
default checksum function can also be used as ECC to recover the correct
data even if there is no replicated copy of the data.

> One of ZFS's big performance problems is that currently it only checksums
> the entire RAID stripe, so it always has to read every drive, and doesn't
> get RAID's IOPS advantage.

Or this is a drawback of Linux software RAID: it does not detect bad
parity before a second drive failure occurs, and the bad parity is then
used to reconstruct the data block incorrectly (which also goes
undetected because there is no checksum).
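
A small sketch of why the bad parity stays hidden, again as a toy model
and not md code: normal reads never consult parity, so damage to it
only shows up in a degraded read. The read_block helper and the use of
CRC-32 as the per-block checksum are assumptions made for the example.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def read_block(stripe, parity, index, failed=None):
    """Normal reads return the data block directly; parity is consulted
    only when the disk holding `index` has failed."""
    if index != failed:
        return stripe[index]
    survivors = [b for i, b in enumerate(stripe) if i != failed]
    return xor_blocks(survivors + [parity])

stripe = [b"AAAA", b"BBBB", b"CCCC"]
crcs = [zlib.crc32(b) for b in stripe]     # checksums kept apart from the data
bad_parity = b"\x00\x00\x00\x00"           # parity block silently corrupted

# All disks healthy: every read succeeds, so the bad parity goes unnoticed.
healthy_reads_ok = all(
    read_block(stripe, bad_parity, i) == stripe[i] for i in range(len(stripe))
)

# Disk 2 fails: its block is rebuilt from the bad parity and comes back wrong.
rebuilt = read_block(stripe, bad_parity, 2, failed=2)
silently_wrong = rebuilt != b"CCCC"

# Only an independent checksum reveals that the rebuilt block is bad.
detected_by_checksum = zlib.crc32(rebuilt) != crcs[2]
```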

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
