linux-mm.kvack.org archive mirror
From: Kevin Wolf <kwolf@redhat.com>
To: Theodore Ts'o <tytso@mit.edu>
Cc: NeilBrown <neilb@suse.com>,
	lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, Christoph Hellwig <hch@infradead.org>,
	Ric Wheeler <rwheeler@redhat.com>, Rik van Riel <riel@redhat.com>
Subject: Re: [LSF/MM TOPIC] I/O error handling and fsync()
Date: Fri, 13 Jan 2017 17:00:22 +0100
Message-ID: <20170113160022.GC4981@noname.redhat.com>
In-Reply-To: <20170113142154.iycjjhjujqt5u2ab@thunk.org>

On 13.01.2017 at 15:21, Theodore Ts'o wrote:
> On Fri, Jan 13, 2017 at 12:09:59PM +0100, Kevin Wolf wrote:
> > Now even if at the moment there were no storage backend where a write
> > failure can be temporary (which I find hard to believe, but who knows),
> > a single new driver is enough to expose the problem. Are you confident
> > enough that no single driver will ever behave this way to make data
> > integrity depend on the assumption?
> 
> This is really a philosophical question.  It very much simplifies
> things if we can make the assumption that a driver that *does* behave
> this way is **broken**.  If the I/O error is temporary, then the
> driver should simply not complete the write, and wait.

If we are sure (or at least make it so) that every error is
permanent, then yes, this simplifies things a bit because it saves us
the retries that we know wouldn't succeed anyway.

In that case, what's possibly left is modifying fsync() so that it
consistently returns an error once writeback has failed; or, if not, we
need to promise this behaviour to userspace so that it can give up on
the file on the first fsync() failure without doing less for the user
than it could do.

> If it fails, it should only be because it has timed out on waiting and
> has assumed that the problem is permanent.

If manual action is required to restore functionality, how can you
use a timeout to determine whether a problem is permanent or not?

This is exactly the kind of error from which we want to recover in
qemu instead of killing the VMs. Assuming that errors are permanent when
they aren't, but merely require some action before they can succeed, is
not a solution to the problem; it's pretty much the description of
the problem that we had before we implemented the retry logic.

So if you say that all errors are permanent, fine; but if some of them
are actually temporary, we're back to square one.

> Otherwise, every single application is going to have to learn how to
> deal with temporary errors, and everything that implies (throwing up
> dialog boxes to the user, who may not be able to do anything

Yes, that's obviously not a realistic option.

> --- this is why in the dm-thin case, if you think it should be
> temporary, dm-thin should be calling out to a userspace program that
> pages a system administrator; why do you think the process or the
> user who started the process can do anything about it?)

In the case of qemu, we can't do anything about it in terms of making
the request work, but we can do something useful with the information:
We limit the damage done by pausing the VM and preventing it from
seeing a broken hard disk from which it wouldn't recover without a
reboot. So in our case, both the system administrator and the process
want to be informed.
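As an illustration of the pause-and-retry behaviour described above,
here is a minimal C sketch. It is not qemu's actual code (qemu exposes
this policy through its werror=/rerror= drive options);
submit_with_pause(), may_be_temporary() and the demo stubs are
hypothetical, and which errors count as possibly temporary is an
assumption made for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical request callback: returns 0 on success or -errno. */
typedef int (*io_fn)(void *opaque);

/* Hypothetical hook: pause the VM, notify the management layer and block
 * until an administrator resumes it. Returns false if they give up. */
typedef bool (*pause_fn)(void *opaque, int err);

/* Assumption for illustration: errors that might go away after manual
 * action, such as extending a full dm-thin pool. */
static bool may_be_temporary(int err)
{
    return err == -ENOSPC || err == -EIO;
}

/* Run a request; on a possibly temporary error, pause instead of failing
 * the request towards the guest, and resubmit it after resume.
 * Returns 0 or the -errno that should be reported to the guest. */
int submit_with_pause(io_fn io, pause_fn pause_vm, void *opaque)
{
    for (;;) {
        int ret = io(opaque);
        if (ret == 0 || !may_be_temporary(ret))
            return ret;
        if (!pause_vm(opaque, ret))
            return ret;     /* administrator gave up: report the error */
    }
}

/* Demo stubs: a pool that stays full until the second pause. */
struct demo_state {
    int free_space;     /* 0 means the pool is full */
    int pauses;         /* how often the VM was paused */
};

static int demo_write(void *opaque)
{
    struct demo_state *s = opaque;
    return s->free_space > 0 ? 0 : -ENOSPC;
}

static bool demo_admin_extends(void *opaque, int err)
{
    struct demo_state *s = opaque;
    (void)err;
    if (++s->pauses == 2)
        s->free_space = 1;  /* the admin finally extends the pool */
    return true;
}

static bool demo_admin_gives_up(void *opaque, int err)
{
    struct demo_state *s = opaque;
    (void)err;
    s->pauses++;
    return false;
}
```

The retry resubmits the request from data the caller still owns, which
is why this only works if the error is really temporary and the caller
hasn't already been told "permanent" by something that dropped its copy.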

A timeout could serve as a trigger for qemu, but we could possibly do
better for things like the dm-thin case where we know immediately that
we'll have to wait for manual action.

> Now, perhaps there ought to be a way for the application to say, "you
> know, if you are going to have to wait more than <timeval>, don't
> bother".  This might be interesting from a general sense, even for
> working hardware, since there are HDDs with media extensions where
> you can tell the disk drive not to bother with the I/O operation if
> it's going to take more than XX milliseconds, and if there is a way to
> reflect that back to userspace, that can be useful for other
> applications, such as video or other soft realtime programs.
> 
> But forcing every single application to have to deal with retries in
> the case of temporary errors?  That way lies madness, and there's no
> way we can get to all of the applications to make them do the right
> thing.

Agree on both points.

> > Note that I didn't think of a "keep-data-after-write-error" flag,
> > neither per-fd nor per-file, because I assumed that everyone would want
> > it as long as there is some hope that the data could still be
> > successfully written out later.
> 
> But not everyone is going to know to do this.  This is why the retry
> really should be done by the device driver, and if it fails, everyone's
> lives will be much simpler if the failure is a permanent
> failure where there is no hope.
> 
> Are there use cases you are concerned about where this model wouldn't
> suit?

If, and only if, all errors that are reported as permanent are actually
permanent, I think this works.

Of course, this makes handling hanging requests even more important for
us. We have certain places where we want to get to a clean state with no
pending requests. We could probably use timeouts in userspace, but we
would also want to get the thread doing the syscall unstuck and ideally
be sure that the kernel doesn't still try changing the file behind our
back (maybe the latter part is only thinkable with direct I/O, though).

In other words, we're the only user of a file and we want to cancel
hanging I/O syscalls. I think we once came to the conclusion that this
isn't currently possible, but it's been a while...
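For the userspace timeout part, a minimal sketch with POSIX threads: the
blocking call runs in a worker thread and the caller waits with
pthread_cond_timedwait(). call_with_timeout() and the demo functions are
hypothetical. It also shows the limitation described above: on timeout
the caller only stops waiting, while the worker stays stuck in the
syscall and the kernel may still complete the I/O later.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical blocking operation (e.g. wrapping pwrite() or fsync()).
 * Returns 0 or -errno. */
typedef int (*blocking_fn)(void *arg);

struct timed_call {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    blocking_fn fn;
    void *arg;          /* must stay valid until fn returns */
    int result;
    bool done;
    int refs;           /* caller + worker; the last one frees */
};

static void timed_call_put(struct timed_call *c)
{
    pthread_mutex_lock(&c->lock);
    int refs = --c->refs;
    pthread_mutex_unlock(&c->lock);
    if (refs == 0) {
        pthread_cond_destroy(&c->cond);
        pthread_mutex_destroy(&c->lock);
        free(c);
    }
}

static void *timed_call_worker(void *p)
{
    struct timed_call *c = p;
    int r = c->fn(c->arg);      /* may block for an unbounded time */

    pthread_mutex_lock(&c->lock);
    c->result = r;
    c->done = true;
    pthread_cond_signal(&c->cond);
    pthread_mutex_unlock(&c->lock);
    timed_call_put(c);
    return NULL;
}

/* Returns fn's result, or -ETIMEDOUT if it didn't finish in time. On
 * timeout the worker thread keeps running: the call is not cancelled. */
int call_with_timeout(blocking_fn fn, void *arg, long timeout_ms)
{
    struct timed_call *c = calloc(1, sizeof(*c));
    if (!c)
        return -ENOMEM;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->cond, NULL);
    c->fn = fn;
    c->arg = arg;
    c->refs = 2;

    pthread_t tid;
    int err = pthread_create(&tid, NULL, timed_call_worker, c);
    if (err != 0) {
        c->refs = 1;            /* no worker: only the caller's reference */
        timed_call_put(c);
        return -err;
    }
    pthread_detach(tid);

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&c->lock);
    while (!c->done && rc == 0)
        rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
    int result = c->done ? c->result : -ETIMEDOUT;
    pthread_mutex_unlock(&c->lock);

    timed_call_put(c);
    return result;
}

/* Demo operations: one returns at once, one simulates a hanging request. */
static int demo_fast(void *arg) { (void)arg; return 0; }

static int demo_hang(void *arg)
{
    struct timespec ts = { 5, 0 };
    (void)arg;
    nanosleep(&ts, NULL);
    return 0;
}
```

This gets the calling thread unstuck, but nothing more: the file can
still change behind our back once the stuck request completes, which is
the part that would need kernel support.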

Kevin



Thread overview: 42+ messages
2017-01-10 16:02 [LSF/MM TOPIC] I/O error handling and fsync() Kevin Wolf
2017-01-11  0:41 ` NeilBrown
2017-01-13 11:09   ` Kevin Wolf
2017-01-13 14:21     ` Theodore Ts'o
2017-01-13 16:00       ` Kevin Wolf [this message]
2017-01-13 22:28         ` NeilBrown
2017-01-14  6:18           ` Darrick J. Wong
2017-01-16 12:14           ` [Lsf-pc] " Jeff Layton
2017-01-22 22:44             ` NeilBrown
2017-01-22 23:31               ` Jeff Layton
2017-01-23  0:21                 ` Theodore Ts'o
2017-01-23 10:09                   ` Kevin Wolf
2017-01-23 12:10                     ` Jeff Layton
2017-01-23 17:25                       ` Theodore Ts'o
2017-01-23 17:53                         ` Chuck Lever
2017-01-23 22:40                         ` Jeff Layton
2017-01-23 22:35                     ` Jeff Layton
2017-01-23 23:09                       ` Trond Myklebust
2017-01-24  0:16                         ` NeilBrown
2017-01-24  0:46                           ` Jeff Layton
2017-01-24 21:58                             ` NeilBrown
2017-01-25 13:00                               ` Jeff Layton
2017-01-30  5:30                                 ` NeilBrown
2017-01-24  3:34                           ` Trond Myklebust
2017-01-25 18:35                             ` Theodore Ts'o
2017-01-26  0:36                               ` NeilBrown
2017-01-26  9:25                                 ` Jan Kara
2017-01-26 22:19                                   ` NeilBrown
2017-01-27  3:23                                     ` Theodore Ts'o
2017-01-27  6:03                                       ` NeilBrown
2017-01-30 16:04                                       ` Jan Kara
2017-01-13 18:40     ` Al Viro
2017-01-13 19:06       ` Kevin Wolf
2017-01-11  5:03 ` Theodore Ts'o
2017-01-11  9:47   ` [Lsf-pc] " Jan Kara
2017-01-11 15:45     ` Theodore Ts'o
2017-01-11 10:55   ` Chris Vest
2017-01-11 11:40   ` Kevin Wolf
2017-01-13  4:51     ` NeilBrown
2017-01-13 11:51       ` Kevin Wolf
2017-01-13 21:55         ` NeilBrown
2017-01-11 12:14   ` Chris Vest
