From: Jeff Layton <jlayton@redhat.com>
To: Nikolay Borisov <nborisov@suse.com>, linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, linux-ext4@vger.kernel.org,
akpm@linux-foundation.org, tytso@mit.edu, jack@suse.cz,
willy@infradead.org, neilb@suse.com
Subject: Re: [RFC PATCH 1/4] fs: new infrastructure for writeback error handling and reporting
Date: Mon, 03 Apr 2017 06:28:59 -0400
Message-ID: <1491215339.2724.4.camel@redhat.com>
In-Reply-To: <3b4d45be-ef00-04a5-5858-126c5aeaa76f@suse.com>

On Mon, 2017-04-03 at 10:12 +0300, Nikolay Borisov wrote:
>
> On 31.03.2017 22:26, Jeff Layton wrote:
> > Most filesystems currently use mapping_set_error and
> > filemap_check_errors for setting and reporting/clearing writeback errors
> > at the mapping level. filemap_check_errors is indirectly called from
> > most of the filemap_fdatawait_* functions and from
> > filemap_write_and_wait*. These functions are called from all sorts of
> > contexts to wait on writeback to finish -- e.g. mostly in fsync, but
> > also in truncate calls, getattr, etc.
> >
> > It's those non-fsync callers that are problematic. We should be
> > reporting writeback errors during fsync, but many places in the code
> > clear out errors before they can be properly reported, or report errors
> > at nonsensical times. If I get -EIO on a stat() call, how do I know that
> > was because writeback failed?
> >
> > This patch adds a small bit of new infrastructure for setting and
> > reporting errors during pagecache writeback. While the above was my
> > original impetus for adding this, I think it's also the case that
> > current fsync semantics are just problematic for userland. Most
> > applications that call fsync do so to ensure that the data they wrote
> > has hit the backing store.
> >
> > In the case where there are multiple writers to the file at the same
> > time, this is really hard to determine. The first one to call fsync will
> > see any stored error, and the rest get back 0. The processes with open
> > fd may not be associated with one another in any way. They could even be
> > in different containers, so ensuring coordination between all fsync
> > callers is not really an option.
> >
> > One way to remedy this would be to track what file descriptor was used
> > to dirty the file, but that's rather cumbersome and would likely be
> > slow. However, there is a simpler way to improve the semantics here
> > without incurring too much overhead.
> >
> > This set adds a wb_error field and a sequence counter to the
> > address_space, and a corresponding sequence counter in the struct file.
> > When errors are reported during writeback, we set the error field in the
> > mapping and increment the sequence counter.
> >
> > When fsync or flush is called, we check the sequence in the file vs. the
> > one in the mapping. If the file's counter is behind the one in the
> > mapping, then we update the sequence counter in the file to the value of
> > the one in the mapping and report the error. If the file is "caught up"
> > then we just report 0.
> >
> > This changes the semantics of fsync such that applications can now use
> > it to determine whether there were any writeback errors since fsync(fd)
> > was last called (or since the file was opened in the case of fsync
> > having never been called).
> >
> > Note that those writeback errors may have occurred when writing data
> > that was dirtied via an entirely different fd, but that's the case now
> > with the current mapping_set_error/filemap_check_errors infrastructure.
> > This will at least prevent you from getting a false report of success.
> >
> > The basic idea here is for filesystems to use filemap_set_wb_error to
> > set the error in the mapping when there are writeback errors, and then
> > have the fsync and flush operations call filemap_report_wb_error just
> > before returning to ensure that those errors get reported properly.
> >
> > Eventually, it may make sense to move the reporting into the generic
> > vfs_fsync_range helper, but doing it this way for now makes it simpler
> > to convert filesystems to the new API individually.
>
> There is already a mapping_set_error API which sets flags in
> mapping->flags (AS_EIO/AS_ENOSPC). Aren't you essentially duplicating
> some of the semantics of that API?

Yes, more or less, for now. mapping_set_error and filemap_set_wb_error
take the same arguments, but they do different things with the error:
the former sets a flag in mapping->flags, while the latter records the
error in the new wb_error field and increments the sequence counter.

The plan is eventually to eliminate mapping_set_error and convert
everything over to the new infrastructure I'm adding. That's difficult
to do all at once, however, so for now some duplication is necessary.
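
To make the difference concrete, here is a rough userspace sketch of the
two schemes side by side. It is not the kernel code: the struct and
function names (mapping_sketch, file_sketch, sketch_*) are hypothetical
stand-ins for address_space, struct file, mapping_set_error /
filemap_check_errors, and filemap_set_wb_error / filemap_report_wb_error,
and it ignores locking and memory ordering entirely.

```c
#include <errno.h>

/* Hypothetical stand-in for the relevant parts of address_space. */
struct mapping_sketch {
	int old_eio_flag;     /* models the AS_EIO bit in mapping->flags */
	int wb_err;           /* most recent writeback error, 0 if none */
	unsigned int wb_seq;  /* incremented every time an error is recorded */
};

/* Hypothetical stand-in for the relevant parts of struct file. */
struct file_sketch {
	struct mapping_sketch *mapping;
	unsigned int wb_seq;  /* mapping's counter as last seen by this file */
};

/* At open time the file starts out "caught up" with the mapping. */
static void sketch_open(struct file_sketch *f, struct mapping_sketch *m)
{
	f->mapping = m;
	f->wb_seq = m->wb_seq;
}

/* Record a writeback error under both schemes (for comparison only). */
static void sketch_set_wb_error(struct mapping_sketch *m, int err)
{
	m->old_eio_flag = 1;  /* old scheme: set a flag in the mapping */
	m->wb_err = err;      /* new scheme: store the error ... */
	m->wb_seq++;          /* ... and bump the sequence counter */
}

/* Old scheme: test-and-clear, so only the first caller sees the error. */
static int sketch_check_errors_old(struct mapping_sketch *m)
{
	if (m->old_eio_flag) {
		m->old_eio_flag = 0;
		return -EIO;
	}
	return 0;
}

/*
 * New scheme: compare the file's counter with the mapping's. If the file
 * is behind, catch it up and report the stored error; otherwise report 0.
 * The mapping itself is never cleared, so other open files still see it.
 */
static int sketch_report_wb_error(struct file_sketch *f)
{
	struct mapping_sketch *m = f->mapping;

	if (f->wb_seq == m->wb_seq)
		return 0;
	f->wb_seq = m->wb_seq;
	return m->wb_err;
}
```

With two files opened on the same mapping before an error is recorded,
the old check returns -EIO once and 0 afterwards, regardless of who
asks. The new check returns the error once per open file, and 0 on
later calls from that file until another error is recorded.
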
--
Jeff Layton <jlayton@redhat.com>