From: Ric Wheeler <rwheeler@redhat.com>
To: NeilBrown <neilb@suse.de>
Cc: Nico Schottelius <nico-lkml-20110623@schottelius.org>,
LKML <linux-kernel@vger.kernel.org>,
Chris Mason <chris.mason@oracle.com>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
Alasdair G Kergon <agk@redhat.com>
Subject: Re: Mis-Design of Btrfs?
Date: Thu, 14 Jul 2011 07:02:22 +0100
Message-ID: <4E1E866E.2050405@redhat.com>
In-Reply-To: <20110714155620.6e9ac2cc@notabene.brown>
On 07/14/2011 06:56 AM, NeilBrown wrote:
> On Wed, 29 Jun 2011 10:29:53 +0100 Ric Wheeler <rwheeler@redhat.com> wrote:
>
>> On 06/27/2011 07:46 AM, NeilBrown wrote:
>>> On Thu, 23 Jun 2011 12:53:37 +0200 Nico Schottelius
>>> <nico-lkml-20110623@schottelius.org> wrote:
>>>
>>>> Good morning devs,
>>>>
>>>> I'm wondering whether the built-in RAID and volume management of btrfs is
>>>> actually a sane idea or not.
>>>> We already have md/device-mapper support for RAID, yet btrfs
>>>> lacks RAID5 support and re-implements functionality that
>>>> already exists.
>>>>
>>>> I'm aware that it is very useful for a filesystem to know which devices
>>>> it sits on. But I'm wondering whether it wouldn't be
>>>> smarter to generalise the exposure of that information through the VFS
>>>> layer instead of replicating functionality:
>>>>
>>>> Physical:  USB-HD  SSD  USB-Flash         | Exposes information to
>>>> Raid:      Raid1, Raid5, Raid10, etc.     | higher levels
>>>> Crypto:    Luks                           |
>>>> LVM:       Groups/Volumes                 |
>>>> FS:        xfs/jfs/reiser/ext3            v
>>>>
>>>> Thus a filesystem like ext3 could be aware that it is running
>>>> on a USB HD and enable -o sync by default, have the filesystem
>>>> rewrite blocks when running on crypto, or optimise for an SSD, ...
>>> I would certainly agree that exposing information to higher levels is a good
>>> idea. To some extent we do. But it isn't always as easy as it might sound.
>>> Choosing exactly what information to expose is the challenge. If you lack
>>> sufficient foresight you might expose something which turns out to be
>>> very specific to just one device, so all those upper levels which make use of
>>> the information find they are really special-casing one specific device,
>>> which isn't a good idea.
>>>
>>>
>>> However it doesn't follow that RAID5 should not be implemented in BTRFS.
>>> The levels that you have drawn are just one perspective. While that has
>>> value, it may not be universal.
>>> I could easily argue that the LVM layer is a mistake and that filesystems
>>> should provide that functionality directly.
>>> I could almost argue the same for crypto.
>>> RAID1 can make a lot of sense to be tightly integrated with the FS.
>>> RAID5 ... I'm less convinced, but then I have a vested interest there so that
>>> isn't an objective assessment.
>>>
>>> Part of "the way Linux works" is that s/he who writes the code gets to make
>>> the design decisions. The BTRFS developers might create something truly
>>> awesome, or might end up having to support a RAID feature that they
>>> subsequently think is a bad idea. But it really is their decision to make.
>>>
>>> NeilBrown
>>>
>> One more thing to add here is that I think that we still have a chance to
>> increase the sharing between btrfs and the MD stack if we can get those changes
>> made. No one likes to duplicate code, but we will need a richer interface
>> between the block and file system layer to help close that gap.
>>
>> Ric
>>
> I'm certainly open to suggestions and collaboration. Do you have in mind any
> particular way to make the interface richer?
>
> NeilBrown
Hi Neil,
I know that Chris has a very specific set of use cases for btrfs, and I think
that Alasdair and others have started to look at what is doable.
The obvious use case is the following:
If a file system uses checksumming or other data corruption detection bits, it
can detect that it got bad data on a read. If that data was protected by RAID,
it would like to ask the block layer to try to read from another mirror (for
RAID1) or to validate/rebuild the data from parity.
Today, I think a retry basically just gives us a random chance of getting
data from either a different mirror or the same one we read from on the
first attempt.
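To make the use case concrete, here is a minimal Python sketch, assuming a hypothetical interface in which the filesystem can ask for a specific mirror by index. `read_verified` and `MirrorRead` are illustrative names only, not existing kernel, md, or device-mapper interfaces, and CRC-32 from `zlib` stands in for whatever checksum the filesystem actually uses.

```python
import zlib
from typing import Callable, List, Optional, Tuple

# Hypothetical model of a mirrored (RAID1) device: each entry reads one
# copy of a block. None of these names are real kernel or md APIs.
MirrorRead = Callable[[int], bytes]


def read_verified(block: int, expected_crc: int,
                  mirrors: List[MirrorRead]) -> Optional[Tuple[int, bytes]]:
    """Read `block`, trying each mirror in turn until the data verifies.

    Because the caller names the mirror explicitly, every retry goes to a
    copy it has not tried yet, instead of possibly re-reading the same
    bad copy as a blind retry might.
    """
    for index, read_copy in enumerate(mirrors):
        data = read_copy(block)
        # zlib.crc32 is a stand-in for the filesystem's own checksum.
        if zlib.crc32(data) == expected_crc:
            return index, data
    # Every copy failed verification; a parity RAID could attempt a rebuild here.
    return None
```

If the first copy is corrupt, the second is returned together with its index, which a filesystem could then use to rewrite the bad copy.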
Chris, Alasdair, was that a good summary of one concern?
Thanks!
Ric
Thread overview: 39+ messages
2011-06-23 10:53 Mis-Design of Btrfs? Nico Schottelius
2011-06-27 6:46 ` NeilBrown
2011-06-29 9:29 ` Ric Wheeler
2011-06-29 10:47 ` A. James Lewis
2011-07-14 20:47 ` Erik Jensen
2011-07-14 5:56 ` NeilBrown
2011-07-14 6:02 ` Ric Wheeler [this message]
2011-07-14 6:38 ` NeilBrown
2011-07-14 6:57 ` Ric Wheeler
2011-07-15 2:32 ` Chris Mason
2011-07-15 4:58 ` david
2011-07-15 6:33 ` NeilBrown
2011-07-15 11:34 ` Chris Mason
2011-07-15 12:58 ` Ric Wheeler
2011-07-15 13:20 ` Chris Mason
2011-07-15 13:31 ` Ric Wheeler
2011-07-15 14:00 ` Chris Mason
2011-07-15 14:07 ` Hugo Mills
2011-07-15 14:24 ` Chris Mason
2011-07-15 14:47 ` Christian Aßfalg
2011-07-15 14:54 ` Hugo Mills
2011-07-15 15:12 ` Chris Mason
2011-07-15 16:23 ` david
2011-07-15 16:51 ` Ric Wheeler
2011-07-15 17:01 ` david
2011-07-15 17:23 ` Ric Wheeler
2011-07-15 13:55 ` Mike Snitzer
2011-07-15 16:03 ` david
2011-07-14 9:37 ` Jan Schmidt
2011-07-14 9:55 ` NeilBrown
2011-07-14 16:27 ` Goffredo Baroncelli
2011-07-14 16:55 ` Alasdair G Kergon
2011-07-14 19:50 ` John Stoffel
2011-07-14 20:48 ` david
2011-07-14 20:50 ` Erik Jensen
2011-07-14 6:59 ` Arne Jansen