From: "George Spelvin" <linux@horizon.com>
To: david@lang.hm, pavel@ucw.cz
Cc: linux-doc@vger.kernel.org, linux-ext4@vger.kernel.org,
linux-kernel@vger.kernel.org, linux@horizon.com
Subject: Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:
Date: 31 Aug 2009 20:56:29 -0400
Message-ID: <20090901005629.3932.qmail@science.horizon.com>
In-Reply-To: <alpine.DEB.2.00.0908310844230.6822@asgard.lang.hm>
>From david@lang.hm Mon Aug 31 15:46:19 2009
Date: Mon, 31 Aug 2009 08:45:38 -0700 (PDT)
From: david@lang.hm
X-X-Sender: dlang@asgard.lang.hm
To: Pavel Machek <pavel@ucw.cz>
cc: George Spelvin <linux@horizon.com>, linux-doc@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: raid is dangerous but that's secret (was Re: [patch] ext2/3:
In-Reply-To: <20090831105645.GD1353@ucw.cz>
References: <20090831005426.13607.qmail@science.horizon.com> <20090831105645.GD1353@ucw.cz>
User-Agent: Alpine 2.00 (DEB 1167 2008-08-23)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed
>>> That's one thing I really like about ZFS: its policy of "don't trust
>>> the disks." If nothing else, simply telling you "your disks f*ed up,
>>> and I caught them doing it", instead of the usual mysterious corruption
>>> detected three months later, is tremendously useful information.
>>
>> The more I learn about storage, the more I like the idea of zfs. Given the
>> subtle issues between filesystem and raid layer, integrating them just
>> makes sense.
>
> Note that all that zfs does is tell you that you already lost data (and
> then only if the checksumming algorithm would be invalid on a blank block
> being returned), it doesn't protect your data.
Obviously, there are limits, but it does provide useful protection:
- You know where the missing data is.
- The error isn't amplified by believing corrupted metadata.
- I seem to recall that ZFS does replicate metadata.
- Corrupted replicas can be "scrubbed" and rewritten from uncorrupted ones.
- If you have some storage redundancy, it can try different mirrors
to get the data back.
In particular, on a RAID-5 system, ZFS tries dropping out each data disk
in turn to see if the correct data can be reconstructed from the others
plus parity.
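To make the "drop each disk in turn" idea concrete, here is a minimal
sketch. It is not ZFS code: the names xor_blocks and recover_stripe are
made up for illustration, CRC32 stands in for whatever checksum is really
used, and it assumes a single XOR parity block and one checksum covering
the whole stripe.

```python
import zlib
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_stripe(data, parity, expected_crc):
    """Return (data_blocks, bad_index), or None if nothing verifies.

    bad_index is None when the stripe was already good. Hypothetical
    helper: assumes at most one data block is wrong and parity is intact.
    """
    if zlib.crc32(b"".join(data)) == expected_crc:
        return list(data), None
    for bad in range(len(data)):
        # Pretend disk `bad` is missing and rebuild it from the rest.
        others = [blk for i, blk in enumerate(data) if i != bad]
        candidate = list(data)
        candidate[bad] = xor_blocks(others + [parity])
        if zlib.crc32(b"".join(candidate)) == expected_crc:
            return candidate, bad
    return None

# Example: disk 1 silently returns garbage.
good = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(good)
crc = zlib.crc32(b"".join(good))
damaged = [good[0], b"BxBB", good[2]]
repaired, bad = recover_stripe(damaged, parity, crc)
```

The point is that the checksum, not the drive, decides which copy is
believed: plain RAID-5 has no way to tell which block of an inconsistent
stripe is the wrong one.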
One of ZFS's big performance problems is that it currently checksums only
the entire RAID stripe, so it always has to read every drive, and doesn't
get RAID's IOPS advantage. But that's fairly straightforward to fix.
(It's something of a problem for RAID-5 in general, because reads want
larger chunk sizes to increase the chance that a single read can be
satisfied by one disk, while writes want small chunks so that you can
do whole-stripe writes.)
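The chunk-size tension is easy to see with a little arithmetic. A rough
sketch, assuming a simple layout where consecutive chunks go to
consecutive data disks (parity rotation is ignored, and the function
names and sizes are invented for the example):

```python
KiB = 1024

def chunks_spanned(offset, length, chunk_size):
    """Number of chunks a byte range [offset, offset+length) overlaps."""
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return last - first + 1

def disks_touched(offset, length, chunk_size, data_disks):
    """Data disks a read must visit; capped at the number of data disks."""
    return min(chunks_spanned(offset, length, chunk_size), data_disks)

def full_stripe_bytes(chunk_size, data_disks):
    """Smallest write that covers a whole stripe (no read-modify-write)."""
    return chunk_size * data_disks

# A 64 KiB read at offset 96 KiB on an array with 4 data disks:
small_chunk_read = disks_touched(96 * KiB, 64 * KiB, 16 * KiB, 4)
large_chunk_read = disks_touched(96 * KiB, 64 * KiB, 256 * KiB, 4)

# The write side of the trade-off:
small_chunk_stripe = full_stripe_bytes(16 * KiB, 4)
large_chunk_stripe = full_stripe_bytes(256 * KiB, 4)
```

With 16 KiB chunks the read hits all four data disks but a full-stripe
write is only 64 KiB; with 256 KiB chunks the read stays on one disk but
a full-stripe write needs 1 MiB.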
The fact that the ZFS developers observed drives writing the data to the
wrong location emphasizes the importance of keeping the checksum with
the pointer. An embedded checksum, no matter how good, can't tell you if
the data is stale; you need a way to distinguish versions in the pointer.
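A toy illustration of the difference, assuming a "lost write" where the
drive acknowledges a new version but the old block stays on the platter.
SHA-256 and all the helper names here are stand-ins chosen for the
example, not anything taken from ZFS:

```python
import hashlib

DIGEST_LEN = 32  # SHA-256 digest size in bytes

def digest(data):
    return hashlib.sha256(data).digest()

def embedded_block(data):
    """Block layout that carries its own checksum in front of the data."""
    return digest(data) + data

def embedded_ok(block):
    """Verify a block against the checksum stored inside it."""
    return digest(block[DIGEST_LEN:]) == block[:DIGEST_LEN]

def pointer_ok(block, expected):
    """Verify a block against the checksum held by the parent pointer."""
    return digest(block) == expected

old, new = b"version 1", b"version 2"

# Embedded checksum: the write of embedded_block(new) was lost, so the
# old block is still on disk. It is stale but perfectly self-consistent.
on_disk = embedded_block(old)
stale_passes_embedded = embedded_ok(on_disk)

# Checksum in the pointer: the parent was updated to describe the new
# version, so the stale block no longer matches what the parent expects.
on_disk_raw = old
parent_checksum = digest(new)
stale_passes_pointer = pointer_ok(on_disk_raw, parent_checksum)
```

The embedded check accepts the stale block; the pointer check rejects it,
because the expected value lives somewhere the lost write could not leave
untouched.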