From: NeilBrown <neilb@suse.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: Jan Kara <jack@suse.cz>,
	Trond Myklebust <trondmy@primarydata.com>,
	"kwolf@redhat.com" <kwolf@redhat.com>,
	"riel@redhat.com" <riel@redhat.com>,
	"hch@infradead.org" <hch@infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"jlayton@poochiereds.net" <jlayton@poochiereds.net>,
	"lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"rwheeler@redhat.com" <rwheeler@redhat.com>
Subject: Re: [Lsf-pc] [LSF/MM TOPIC] I/O error handling and fsync()
Date: Fri, 27 Jan 2017 17:03:24 +1100
Message-ID: <87a8adt4yb.fsf@notabene.neil.brown.name>
In-Reply-To: <20170127032318.rkdiwu6nog3nifdo@thunk.org>

On Thu, Jan 26 2017, Theodore Ts'o wrote:

> On Fri, Jan 27, 2017 at 09:19:10AM +1100, NeilBrown wrote:
>> I don't think it has.
>> The original topic was about graceful handling of recoverable IO errors.
>> The question was framed as about retrying fsync() if it reported an
>> error, but this was based on a misunderstanding.  fsync() doesn't report
>> an error for recoverable errors.  It hangs.
>> So the original topic is really about gracefully handling IO operations
>> which currently can hang indefinitely.
>
> Well, the problem is that it is up to the device driver to decide when
> an error is recoverable or not.  This might include waiting X minutes,
> and then deciding that the fibre channel connection isn't coming back,
> and then turning it into an unrecoverable error.  Or for other
> devices, the timeout might be much smaller.
>
> Which is fine --- I think that's where the decision ought to live, and
> if users want to tune a different timeout before the driver stops
> waiting, that should be between the system administrator and the
> device driver /sys tuning knob.

Completely agree.  Whether a particular condition should be treated as
recoverable or unrecoverable is a question that driver authors and
sysadmins could reasonably provide input to.
But once that decision has been made, the application must accept the
decision.  EIO means unrecoverable.  There is never any point retrying.
Recoverable manifests as a hang, awaiting recovery.
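
A minimal userspace sketch of what "accept the decision" could look like
for an application.  The helper name save_durably() is hypothetical and
not taken from any existing program.  It retries only a write() that was
interrupted by a signal, and treats any error from fsync() as final.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to path and force it to stable storage.
 * Returns 0 on success or a negative errno value on failure.
 */
static int save_durably(const char *path, const void *buf, size_t len)
{
    assert(path != NULL && buf != NULL);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, not an I/O failure */
            int err = errno;
            close(fd);
            return -err;
        }
        p += n;
        len -= (size_t)n;
    }

    if (fsync(fd) != 0) {
        /* An error here (EIO in particular) is final: report it, do not retry. */
        int err = errno;
        close(fd);
        return -err;
    }

    if (close(fd) != 0)
        return -errno;
    return 0;
}
```

A caller that gets a negative value back would report the failure or
fall back to another copy of the data, rather than calling fsync() again.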

I recently noticed that PG_error is effectively meaningless for write
errors.  filemap_fdatawait_range() can clear it, and the return value is
often ignored. AS_EIO is the really meaningful flag for write errors,
and it is per-file, not per-page.
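
To illustrate the per-file nature of the flag, here is a toy userspace
model.  It is not kernel code: struct toy_mapping and both functions are
hypothetical stand-ins, and the test-and-clear behaviour is an assumption
about how the flag is consumed in kernels of this era.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the kernel's per-file mapping; only the error flag is modelled. */
struct toy_mapping {
    bool as_eio;    /* models the per-file AS_EIO bit */
};

/* A writeback failure on any page of the file sets the single per-file flag. */
static void toy_record_write_error(struct toy_mapping *m)
{
    assert(m != NULL);
    m->as_eio = true;
}

/*
 * Models a test-and-clear check: the first caller after a failure
 * sees -EIO, and later callers see 0 until another failure occurs.
 */
static int toy_check_and_clear(struct toy_mapping *m)
{
    assert(m != NULL);
    if (m->as_eio) {
        m->as_eio = false;
        return -EIO;
    }
    return 0;
}
```

In this model the error is reported once per file, no matter how many
pages failed, and a caller that ignores the return value loses it.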

>
>> >> When combined with O_DIRECT, it effectively means "no retries".  For
>> >> block devices and files backed by block devices,
>> >> REQ_FAILFAST_DEV|REQ_FAILFAST_TRANSPORT is used and a failure will be
>> >> reported as EWOULDBLOCK, unless it is obvious that retrying wouldn't
>> >> help.
>
> Absolutely no retries?  Even TCP retries in the case of iSCSI?  I
> don't think turning every TCP packet drop into EWOULDBLOCK would make
> sense under any circumstances.  What might make sense is to have a
> "short timeout" where it's up to the block device to decide what
> "short timeout" means.

The implemented semantics of REQ_FAILFAST_* are to disable retries on
certain types of failure.  That is what I meant to refer to.
There are retries at many levels in the protocol stack, from the
collision-detection retries at the data-link layer, to packet-level,
connection-level and command-level retries.  Some have predefined
timeouts and should be left alone.  Others have no timeouts and need to
be disabled.  There are probably others in the middle.
I was looking for a semantic that could be implemented on top of current
interfaces, which means working with the REQ_FAILFAST_* semantic.
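
As a rough illustration of "disable retries on certain types of
failure", here is a toy retry decision in userspace C.  Everything named
TOY_* or toy_* is hypothetical: the bit values do not match the kernel's
REQ_FAILFAST_* flags, and the real retry logic lives in individual
drivers and is more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values; the kernel's real REQ_FAILFAST_* values differ. */
#define TOY_FAILFAST_DEV        (1u << 0)
#define TOY_FAILFAST_TRANSPORT  (1u << 1)
#define TOY_FAILFAST_DRIVER     (1u << 2)

/* The class of failure a lower layer reported for a request. */
enum toy_failure { TOY_FAIL_DEV, TOY_FAIL_TRANSPORT, TOY_FAIL_DRIVER };

/*
 * Decide whether a failed request should be retried.  A fail-fast bit
 * suppresses retries only for the matching failure class; other classes
 * are still retried until max_attempts is reached.
 */
static bool toy_should_retry(unsigned int flags, enum toy_failure kind,
                             int attempts, int max_attempts)
{
    assert(attempts >= 0 && max_attempts >= 0);

    unsigned int bit = 0;
    switch (kind) {
    case TOY_FAIL_DEV:       bit = TOY_FAILFAST_DEV;       break;
    case TOY_FAIL_TRANSPORT: bit = TOY_FAILFAST_TRANSPORT; break;
    case TOY_FAIL_DRIVER:    bit = TOY_FAILFAST_DRIVER;    break;
    }

    if (flags & bit)
        return false;               /* caller asked to fail fast on this class */
    return attempts < max_attempts; /* otherwise use the normal retry budget */
}
```

With TOY_FAILFAST_DEV|TOY_FAILFAST_TRANSPORT set, a device or transport
failure is reported at once, while a driver-level failure still gets its
usual retries.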

>
> EWOULDBLOCK is also a little misleading, because even if the I/O
> request is submitted immediately to the block device and immediately
> serviced and returned, the I/O request would still be "blocking".
> Maybe ETIMEDOUT instead?

Maybe - I won't argue.

>
>> And aio_write() isn't non-blocking for O_DIRECT already because .... oh,
>> it doesn't even try.  Is there something intrinsically hard about async
>> O_DIRECT writes, or is it just that no-one has written acceptable code
>> yet?
>
> AIO/DIO writes can indeed be non-blocking, if the file system doesn't
> need to do any metadata operations.  So if the file is preallocated,
> you should be able to issue an async DIO write without losing the CPU.

Yes, I see that now.  I misread some of the code.
Thanks.
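
For reference, a minimal sketch of the preallocation step only, assuming
a filesystem that supports posix_fallocate().  The helper name
open_preallocated() is hypothetical, and the async O_DIRECT submission
itself is deliberately not shown.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create path and reserve len bytes of storage up
 * front, so that later writes inside that range should not need to
 * allocate blocks.  Returns an open fd, or a negative errno value.
 */
static int open_preallocated(const char *path, off_t len)
{
    assert(path != NULL && len > 0);

    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -errno;

    /* posix_fallocate() returns an error number directly; it does not set errno. */
    int err = posix_fallocate(fd, 0, len);
    if (err != 0) {
        close(fd);
        return -err;
    }
    return fd;
}
```

Whether writes into the reserved range really avoid all metadata work
depends on the filesystem, so this is a starting point rather than a
guarantee.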

NeilBrown


>
>> A truly async O_DIRECT aio_write() combined with a working io_cancel()
>> would probably be sufficient.  The block layer doesn't provide any way
>> to cancel a bio though, so that would need to be wired up.
>
> Kent Overstreet worked up io_cancel for AIO/DIO writes when he was at
> Google.  As I recall the patchset did get posted a few times, but it
> never ended up getting accepted for upstream adoption.
>
> We even had some very rough code that would propagate the cancellation
> request to the hard drive, for those hard drives that had a facility
> for accepting a cancellation request for an I/O which was queued via
> NCQ but which hadn't executed yet.  It sort-of worked, but it never
> hit a state where it could be published before the project was
> abandoned.
>
> 						- Ted


