From: Chris Mason <chris.mason@oracle.com>
To: Ric Wheeler <rwheeler@redhat.com>
Cc: NeilBrown <neilb@suse.de>, david <david@lang.hm>,
Nico Schottelius <nico-lkml-20110623@schottelius.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>,
Alasdair G Kergon <agk@redhat.com>
Subject: Re: Mis-Design of Btrfs?
Date: Fri, 15 Jul 2011 10:00:35 -0400
Message-ID: <1310738205-sup-715@shiny>
In-Reply-To: <4E204139.5060702@redhat.com>
Excerpts from Ric Wheeler's message of 2011-07-15 09:31:37 -0400:
> On 07/15/2011 02:20 PM, Chris Mason wrote:
> > Excerpts from Ric Wheeler's message of 2011-07-15 08:58:04 -0400:
> >> On 07/15/2011 12:34 PM, Chris Mason wrote:
> > [ triggering IO retries on failed crc or other checks ]
> >
> >>> But, maybe the whole btrfs model is backwards for a generic layer.
> >>> Instead of sending down ios and testing when they come back, we could
> >>> just set a verification function (or stack of them?).
> >>>
> >>> For metadata, btrfs compares the crc and a few other fields of the
> >>> metadata block, so we can easily add a compare function pointer and a
> >>> void * to pass in.
> >>>
> >>> The problem is the crc can take a lot of CPU, so btrfs kicks it off to
> >>> threading pools to saturate all the cpus on the box. But there's no
> >>> reason we can't make that available lower down.
> >>>
> >>> If we pushed the verification down, the retries could bubble up the
> >>> stack instead of the other way around.
> >>>
> >>> -chris
> >> I do like the idea of having the ability to do the verification and retries down
> >> the stack where you actually have the most context to figure out what is possible...
> >>
> >> Why would you need to bubble back up anything other than an error when all
> >> retries have failed?
> > By bubble up I mean that if you have multiple layers capable of doing
> > retries, the lowest levels would retry first. Basically by the time we
> > get an -EIO_ALREADY_RETRIED we know there's nothing that lower level can
> > do to help.
> >
> > -chris
>
> Absolutely sounds like the most sane way to go to me, thanks!
>
It really seemed like a good idea, but I just realized it doesn't work
well when parts of the stack transform the data.

Picture dm-crypt on top of raid1. If raid1 is responsible for the
crc retries, it has no way to crc the data, because the data needs to be
decrypted first.
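
A minimal sketch of that dm-crypt on raid1 case, under loud assumptions:
the XOR "cipher" is a toy stand-in for dm-crypt, zlib.crc32 stands in for
the btrfs crc, and every function name here is hypothetical rather than a
kernel interface. It only illustrates that a crc computed over plaintext
cannot be checked by a layer that sees ciphertext.

```python
import zlib

KEY = 0x5A  # toy key (assumption); real dm-crypt uses a proper block cipher


def toy_encrypt(data: bytes) -> bytes:
    """Stand-in for the transforming layer: XOR every byte with KEY."""
    return bytes(b ^ KEY for b in data)


toy_decrypt = toy_encrypt  # XOR is its own inverse


def crc_ok(data: bytes, expected_crc: int) -> bool:
    """Verification function the filesystem would like to push down."""
    return zlib.crc32(data) == expected_crc


def raid1_read_verified(mirrors, expected_crc):
    """Verification inside raid1: it only ever sees ciphertext."""
    for copy in mirrors:
        if crc_ok(copy, expected_crc):
            return copy
    return None  # every mirror "failed", including the intact one


def top_read_verified(mirrors, expected_crc):
    """Verification above the crypt layer: decrypt, check, try next mirror."""
    for copy in mirrors:
        data = toy_decrypt(copy)
        if crc_ok(data, expected_crc):
            return data
    return None


# The filesystem computes the crc over the plaintext block.
plaintext = b"btrfs metadata block"
expected_crc = zlib.crc32(plaintext)

# Two mirrors hold the encrypted block; the first one has a corrupted byte.
good = toy_encrypt(plaintext)
bad = bytes([good[0] ^ 0xFF]) + good[1:]
mirrors = [bad, good]

low = raid1_read_verified(mirrors, expected_crc)
high = top_read_verified(mirrors, expected_crc)
```

In this sketch the check above the crypt layer skips the corrupted mirror
and returns the intact block, while the check inside raid1 cannot accept
even the intact mirror, because the crc was never computed over what
raid1 stores.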

I think the raided dm-crypt config is much more common (and interesting)
than multiple layers that can retry for other reasons (raid1 on top of
raid10?).

In other words, do we really want to do a lot of design work for
multiple layers where each one maintains multiple copies of the data
blocks? Are there configs where this really makes sense?

-chris