Re: [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error

From: Ric Wheeler <rwheeler@redhat.com>
To: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>, "Ted Ts'o" <tytso@mit.edu>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Hellwig <hch@infradead.org>,
	Al Viro <viro@zeniv.linux.org.uk>,
	LKML <linux-kernel@vger.kernel.org>,
	Edward Shishkin <edward@redhat.com>
Subject: Re: [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error
Date: Thu, 26 Jan 2012 15:58:32 -0500
Message-ID: <4F21BE78.3050808@redhat.com>
In-Reply-To: <20120126205105.GC27283@quack.suse.cz>

On 01/26/2012 03:51 PM, Jan Kara wrote:
> On Thu 26-01-12 07:17:41, Ric Wheeler wrote:
>> On 01/23/2012 07:36 PM, Dave Chinner wrote:
>>> On Mon, Jan 23, 2012 at 04:47:09PM -0500, Ted Ts'o wrote:
>>>>> The thing is, transient write errors tend to be isolated and go away
>>>>> when a retry occurs (think of IO timeouts when multipath failover
>>>>> occurs). When non-isolated IO or unrecoverable problems occur (e.g.
>>>>> no paths left to fail over onto), critical other metadata reads and
>>>>> writes will fail and shut down the filesystem, thereby terminating
>>>>> the "try forever" background writeback loop those delayed write
>>>>> buffers may be in. So the truth is that "trying forever" on write
>>>>> errors can handle a whole class of write IO errors very
>>>>> effectively....
>>>> So how does XFS decide whether a write should fail and shut down the
>>>> file system, or just "try forever"?
>>> The IO dispatcher decides that. If the dispatcher has handed the IO
>>> off to the delayed write queue, then failed writes will be tried
>>> again. If the caller is catching the IO completion (e.g. sync
>>> writes) or attaching a completion callback (journal IO), then the
>>> completion context will handle the error appropriately. Journal IO
>>> errors tend to shut down the filesystem on the first error; other
>>> contexts may handle the error, retry, or shut down the filesystem
>>> depending on their current state when the error occurs.
>>>
>>> Reads are even more complex, because the dispatch context can be
>>> within a transaction and the correct error handling is then
>>> dependent on the current state of the transaction....
>> I think that having retry logic at the file system layer is really
>> putting the fix in the wrong place.
>>
>> Specifically, if we have multipath configured under a file system,
>> it is up to the multipath logic to handle the failure (and use
>> another path, retry, etc).  If we see a failed IO further up the
>> stack, it is *really* dead at that point.
>    Yes, that makes sense. Only, if my memory serves well, e.g. with iSCSI we
> do see transient errors so it's not like they don't happen.

iSCSI is "just" a transport for SCSI - you can of course have multipath
enabled for iSCSI as well :)
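
To make the layering concrete, here is a rough user-space sketch of what I
mean. It is not kernel or dm-multipath code: struct path, mpath_write() and
the two submit stubs are made up for illustration. The point is only that
failover happens below the file system, so an error that comes back from
mpath_write() means every usable path has already been tried.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of one path to a device; not a kernel structure. */
struct path {
	int (*submit)(const void *buf, size_t len); /* 0 or -errno */
	int failed;                                  /* set once the path is given up on */
};

/* Illustrative stubs standing in for real transports. */
static int submit_ok(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return 0;
}

static int submit_timeout(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return -ETIMEDOUT;
}

/*
 * Try each healthy path once, marking a path failed when it errors.
 * Returns 0 as soon as one path accepts the write. Otherwise returns
 * the last error seen, or -EIO if no healthy path was left to try.
 */
static int mpath_write(struct path *paths, size_t npaths,
		       const void *buf, size_t len)
{
	int err = -EIO;

	assert(paths != NULL || npaths == 0);

	for (size_t i = 0; i < npaths; i++) {
		if (paths[i].failed)
			continue;
		err = paths[i].submit(buf, len);
		if (err == 0)
			return 0;
		paths[i].failed = 1; /* fail over to the next path */
	}
	return err;
}
```

A caller sitting above this (the file system in this model) issues the
write once and treats a non-zero return as final rather than looping on it.
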
>
>> Transient errors on normal drives are also rarely worth re-trying
>> since pretty much all modern storage devices have firmware that will
>> have done exhaustive retries on a failed write. Definitely not worth
>> retrying forever for a normal device.
>    Agreed. But we could still be clever enough to write the data / metadata
> to a different place.

Most storage devices totally lie to you about the layout, but there is some
value in writing things twice (as btrfs does) to make sure that you can
survive a single bad sector.  Even in that case, you still want to avoid a
retry of a failed IO though.
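
Roughly, and only as a user-space sketch (this is not how btrfs implements
it; write_fn, struct mirror_result and the stub targets are invented for
the example): each copy gets exactly one attempt, and the write counts as
good if at least one copy landed.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-copy write hook: returns 0 or -errno. */
typedef int (*write_fn)(const void *buf, size_t len);

struct mirror_result {
	int copies_ok; /* number of copies written successfully */
	int first_err; /* first error seen, 0 if none */
};

/* Illustrative stubs: one good location, one with a bad sector. */
static int copy_ok(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return 0;
}

static int copy_bad_sector(const void *buf, size_t len)
{
	(void)buf; (void)len;
	return -EIO;
}

/* Write the same buffer to every target once; a failed copy is not retried. */
static struct mirror_result write_mirrored(const write_fn targets[], size_t n,
					   const void *buf, size_t len)
{
	struct mirror_result r = { 0, 0 };

	assert(targets != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		int err = targets[i](buf, len);

		if (err == 0)
			r.copies_ok++;
		else if (r.first_err == 0)
			r.first_err = err;
	}
	return r;
}

/* Overall status: success if any copy landed, else the first error. */
static int mirrored_status(struct mirror_result r)
{
	if (r.copies_ok > 0)
		return 0;
	return r.first_err ? r.first_err : -EIO;
}
```

A real implementation would also want to record that redundancy was lost
when copies_ok is less than the number of targets; that is left out here.
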

>
>> At one end of the spectrum, think of a box with dozens of storage
>> devices attached (either via SAN or local S-ATA devices). If we are
>> doing large, streaming writes, we could get a large amount of memory
>> dirtied while writing. If that one device dies and we keep that
>> memory in use for the endless retry loop, we have really crippled the
>> box which still has multiple happy storage devices and file
>> systems....
>    I agree that if we ever decide to keep unwriteable data in memory,
> the kernel has to have a way to get rid of this data if it needs to.

I seem to recall having this discussion (LinuxCon Japan?) a few years back.

Ric

Thread overview: 22+ messages
2012-01-05 14:40 [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error Jan Kara
2012-01-05 14:40 ` [PATCH 1/3] fs: Convert checks for write IO errors from !buffer_uptodate to buffer_write_io_error Jan Kara
2012-01-05 14:40 ` [PATCH 2/3] fs: Do not clear uptodate flag on write IO error Jan Kara
2012-01-05 14:40 ` [PATCH 3/3] ext2: Replace tests of write IO errors using buffer_uptodate Jan Kara
2012-01-05 22:16 ` [RFC PATCH 0/3] Stop clearing uptodate flag on write IO error Andrew Morton
2012-01-15  2:19 ` Linus Torvalds
2012-01-16 16:01   ` Jan Kara
2012-01-16 18:55     ` Linus Torvalds
2012-01-16 19:06       ` Linus Torvalds
2012-01-17  0:36       ` Dave Chinner
2012-01-17  0:59         ` Linus Torvalds
2012-01-17 10:46           ` Boaz Harrosh
2012-01-23  3:04           ` Dave Chinner
2012-01-23 21:47             ` Ted Ts'o
2012-01-23 23:49               ` Linus Torvalds
2012-01-24  6:12                 ` Dave Chinner
2012-01-24  7:10                   ` Linus Torvalds
2012-01-24 12:13                     ` Jan Kara
2012-01-24  0:36               ` Dave Chinner
2012-01-26 12:17                 ` Ric Wheeler
2012-01-26 20:51                   ` Jan Kara
2012-01-26 20:58                     ` Ric Wheeler [this message]
