Re: Mis-Design of Btrfs?

From: Ric Wheeler <rwheeler@redhat.com>
To: david@lang.hm
Cc: Chris Mason <chris.mason@oracle.com>, NeilBrown <neilb@suse.de>,
	Nico Schottelius <nico-lkml-20110623@schottelius.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	Alasdair G Kergon <agk@redhat.com>
Subject: Re: Mis-Design of Btrfs?
Date: Fri, 15 Jul 2011 17:51:04 +0100
Message-ID: <4E206FF8.9090803@redhat.com>
In-Reply-To: <alpine.DEB.2.02.1107150920180.3745@asgard.lang.hm>

On 07/15/2011 05:23 PM, david@lang.hm wrote:
> On Fri, 15 Jul 2011, Chris Mason wrote:
>
>> Excerpts from Ric Wheeler's message of 2011-07-15 08:58:04 -0400:
>>> On 07/15/2011 12:34 PM, Chris Mason wrote:
>>
>> By bubble up I mean that if you have multiple layers capable of doing
>> retries, the lowest levels would retry first.  Basically by the time we
>> get an -EIO_ALREADY_RETRIED we know there's nothing that lower level can
>> do to help.
>
> the problem with doing this is that it can end up stalling the box for 
> significant amounts of time while all the retries happen.
>
> we already see this happening today where a disk read failure is retried 
> multiple times by the disk, multiple times by the raid controller, and then 
> multiple times by Linux, resulting in multi-minute stalls when you hit a disk 
> error in some cases.
>
> having the lower layers do the retries automatically runs the risk of making 
> this even worse.
>
> This needs to be able to be throttled by some layer that can see the entire 
> picture (either by cutting off the retries after a number, after some time, or 
> by spacing out the retries to allow other queries to get in and let the box do 
> useful work in the meantime)
>
> David Lang
>

That should not be an issue - we have a "fast fail" path for IO that should 
avoid retrying just for those reasons (e.g., for multi-path or when recovering a 
flaky drive).

This is not a scheme for unbounded retries. If you have a 3-disk mirror in 
RAID1, you would read the data no more than 2 extra times and almost never more 
than one extra time.  That should be *much* faster than the multi-second timeout 
you see when waiting for a SCSI timeout to fire, etc.
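
A rough sketch of that bound, in Python rather than kernel code and purely 
for illustration. It assumes each mirror copy is read through a fast-fail 
path that raises an error right away instead of retrying internally. The 
names read_from_mirrors, bad_disk and good_disk are hypothetical and do not 
correspond to any btrfs, md or block-layer interface.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical reader type: returns the block's bytes, or raises OSError when
# the (assumed) fast-fail path gives up without retrying in the lower layer.
Reader = Callable[[int], bytes]


def read_from_mirrors(mirrors: Sequence[Reader], block: int) -> Tuple[bytes, int]:
    """Try each mirror copy at most once; return (data, extra_reads)."""
    last_error: Optional[OSError] = None
    for attempt, read in enumerate(mirrors):
        try:
            # attempt is 0 for the first copy, so it equals the extra reads.
            return read(block), attempt
        except OSError as err:
            last_error = err  # this copy failed fast; move on to the next one
    # Every copy failed: report upward, there is nothing left to retry.
    raise OSError(f"block {block}: all {len(mirrors)} copies failed") from last_error


def bad_disk(block: int) -> bytes:
    """Simulated copy that always fails fast."""
    raise OSError("simulated medium error")


def good_disk(block: int) -> bytes:
    """Simulated copy that always succeeds."""
    return b"data-%d" % block


# 3-way RAID1 where the first two copies are bad: the worst case, 2 extra reads.
data, extra = read_from_mirrors([bad_disk, bad_disk, good_disk], 7)
print(data, extra)
```

With N copies the loop issues at most N reads, so at most N - 1 extra ones, 
and no layer below it retries on its own.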

Ric
