From: Ric Wheeler <rwheeler@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: "Ted Ts'o" <tytso@mit.edu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Jan Kara <jack@suse.cz>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	LKML <linux-kernel@vger.kernel.org>,
	Edward Shishkin <edward@redhat.com>
Subject: Re: [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error
Date: Thu, 26 Jan 2012 07:17:41 -0500
Message-ID: <4F214465.9010600@redhat.com>
In-Reply-To: <20120124003657.GJ15102@dastard>

On 01/23/2012 07:36 PM, Dave Chinner wrote:
> On Mon, Jan 23, 2012 at 04:47:09PM -0500, Ted Ts'o wrote:
>>> The thing is, transient write errors tend to be isolated and go away
>>> when a retry occurs (think of IO timeouts when multipath failover
>>> occurs). When non-isolated IO or unrecoverable problems occur (e.g.
>>> no paths left to fail over onto), critical other metadata reads and
>>> writes will fail and shut down the filesystem, thereby terminating
>>> the "try forever" background writeback loop those delayed write
>>> buffers may be in. So the truth is that "trying forever" on write
>>> errors can handle a whole class of write IO errors very
>>> effectively....
>> So how does XFS decide whether a write should fail and shutdown the
>> file system, or just "try forever"?
> The IO dispatcher decides that. If the dispatcher has handed the IO
> off to the delayed write queue, then failed writes will be tried
> again. If the caller is catching the IO completion (e.g. sync
> writes) or attaching a completion callback (journal IO), then the
> completion context will handle the error appropriately. Journal IO
> errors tend to shutdown the filesystem on the first error, other
> contexts may handle the error, retry or shutdown the filesystem
> depending on their current state when the error occurs.
>
> Reads are even more complex, because the dispatch context can be
> within a transaction and the correct error handling is then
> dependent on the current state of the transaction....
>
> Cheers,
>
> Dave.

I think that having retry logic at the file system layer is really putting the 
fix in the wrong place.

Specifically, if we have multipath configured under a file system, it is up to 
the multipath logic to handle the failure (and use another path, retry, etc).  
If we see a failed IO further up the stack, it is *really* dead at that point.

Transient errors on normal drives are also rarely worth retrying, since pretty
much all modern storage devices have firmware that will have done exhaustive
retries on a failed write. Definitely not worth retrying forever for a normal
device.

At one end of the spectrum, think of a box with dozens of storage devices
attached (either via SAN or local SATA devices). If we are doing large,
streaming writes, we could get a large amount of memory dirtied while writing.
If one of those devices dies and we keep that memory in use for the endless retry
loop, we have really crippled a box which still has multiple happy storage
devices and file systems....

Ric

Thread overview: 22+ messages
2012-01-05 14:40 [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error Jan Kara
2012-01-05 14:40 ` [PATCH 1/3] fs: Convert checks for write IO errors from !buffer_uptodate to buffer_write_io_error Jan Kara
2012-01-05 14:40 ` [PATCH 2/3] fs: Do not clear uptodate flag on write IO error Jan Kara
2012-01-05 14:40 ` [PATCH 3/3] ext2: Replace tests of write IO errors using buffer_uptodate Jan Kara
2012-01-05 22:16 ` [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error Andrew Morton
2012-01-15  2:19 ` Linus Torvalds
2012-01-16 16:01   ` Jan Kara
2012-01-16 18:55     ` Linus Torvalds
2012-01-16 19:06       ` Linus Torvalds
2012-01-17  0:36       ` Dave Chinner
2012-01-17  0:59         ` Linus Torvalds
2012-01-17 10:46           ` Boaz Harrosh
2012-01-23  3:04           ` Dave Chinner
2012-01-23 21:47             ` Ted Ts'o
2012-01-23 23:49               ` Linus Torvalds
2012-01-24  6:12                 ` Dave Chinner
2012-01-24  7:10                   ` Linus Torvalds
2012-01-24 12:13                     ` Jan Kara
2012-01-24  0:36               ` Dave Chinner
2012-01-26 12:17                 ` Ric Wheeler [this message]
2012-01-26 20:51                   ` Jan Kara
2012-01-26 20:58                     ` Ric Wheeler