Re: [PATCH 0/7] retry write on error

linux-btrfs.vger.kernel.org archive mirror

From: David Sterba <dsterba@suse.cz>
To: Anand Jain <anand.jain@oracle.com>
Cc: Peter Grandi <pg@btrfs.list.sabi.co.UK>,
	Linux fs Btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH 0/7] retry write on error
Date: Wed, 29 Nov 2017 17:47:08 +0100
Message-ID: <20171129164708.GH3553@twin.jikos.cz>
In-Reply-To: <aaa69c5c-164a-5cba-452c-5a53d801c792@oracle.com>

On Wed, Nov 29, 2017 at 12:09:29PM +0800, Anand Jain wrote:
> On 11/29/2017 07:41 AM, pg@btrfs.list.sabi.co.UK wrote:
> >>>> If the underlying protocol doesn't support retry and there
> >>>> are some transient errors happening somewhere in our IO
> >>>> stack, we'd like to give an extra chance for IO.
> > 
> >>> A limited number of retries may make sense, though I saw some
> >>> long stalls after retries on bad disks.
> > 
> > Indeed! One of the major issues in actual storage administration
> > is to find ways to reliably disable most retries, or to shorten
> > them, both at the block device level and the device level,
> > because in almost all cases where storage reliability matters
> > what is important is simply swapping out the failing device
> > immediately and then examining and possibly refreshing it
> > offline.
> > 
> > To the point that many device manufacturers deliberately cripple
> > retry shortening or disabling options in cheaper products to
> > force long stalls, so that people who care about reliability
> > more than price will buy the more expensive version that can
> > disable or shorten retries.
> > 
> >> Seems preferable to avoid issuing retries when the underlying
> >> transport layer(s) has already done so, but I am not sure
> >> there is a way to know that at the fs level.
> > 
> > Indeed, and to use a euphemism, a third layer of retries at the
> > filesystem level is currently a thoroughly imbecilic idea :-),
> > as whether retries are worth doing is not a filesystem dependent
> > issue (but then plugging is done at the block io level when it
> > is entirely device dependent whether it is worth doing, so there
> > is famous precedent).
> > 
> > There are excellent reasons why error recovery is in general not
> > done at the filesystem level since around 20 years ago, which do
> > not need repeating every time. However one of them is that where
> > it makes sense device firmware does retries, and the block
> > device layer does retries too, which is often a bad idea, and
> > where it is not, the block io level should do that, not the
> > filesystem.
> > 
> > A large part of the above discussion would not be needed if
> > Linux kernel "developers" exposed a clear notion of hardware
> > device and block device state machine and related semantics, or
> > even knew that it were desirable, but that's an idea that is
> > only 50 years old, so may not have yet reached popularity :-).
> 
>   I agree with Ed and Peter; a similar opinion was posted here [1].
>   [1] https://www.spinics.net/lists/linux-btrfs/msg70240.html

All the points in this thread speak against retries at the filesystem
level, and I agree. Without an interface to query the block layer about
whether retries make sense, the filesystem is just guessing and is likely
to guess wrong.

Thread overview: 21+ messages
2017-11-22  0:35 [PATCH 0/7] retry write on error Liu Bo
2017-11-22  0:35 ` [PATCH 1/7] Btrfs: keep a copy of bi_iter in btrfs_io_bio Liu Bo
2017-11-22  0:35 ` [PATCH 2/7] Btrfs: add helper __btrfs_end_bio Liu Bo
2017-11-22  0:35 ` [PATCH 3/7] Btrfs: retry write for non-raid56 Liu Bo
2017-11-22 14:41   ` Nikolay Borisov
2017-11-28 23:01     ` Liu Bo
2017-11-22  0:35 ` [PATCH 4/7] Btrfs: get rbio inside fail_bio_stripe Liu Bo
2017-11-22  0:35 ` [PATCH 5/7] Btrfs: add helper __raid_write_end_io Liu Bo
2017-11-22  0:35 ` [PATCH 6/7] Btrfs: retry write for raid56 Liu Bo
2017-11-22  0:35 ` [PATCH 7/7] Btrfs: retry the whole bio on write error Liu Bo
2017-11-28 19:22 ` [PATCH 0/7] retry write on error David Sterba
2017-11-28 22:07   ` Edmund Nadolski
2017-11-28 23:41     ` Peter Grandi
2017-11-29  4:09       ` Anand Jain
2017-11-29 16:47         ` David Sterba [this message]
2017-11-30 20:22           ` Liu Bo
2017-12-03 21:00             ` Peter Grandi
2017-12-04  9:14             ` Anand Jain
2017-12-04 20:49             ` Edmund Nadolski
2017-12-05 18:57             ` David Sterba
2017-11-29 18:09   ` Liu Bo