From: Jeff Layton <jlayton@redhat.com>
To: Matthew Wilcox <willy@infradead.org>, NeilBrown <neilb@suse.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-ext4@vger.kernel.org, akpm@linux-foundation.org,
tytso@mit.edu, jack@suse.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it
Date: Tue, 04 Apr 2017 08:17:48 -0400 [thread overview]
Message-ID: <1491308268.20445.4.camel@redhat.com> (raw)
In-Reply-To: <20170404115358.GH30811@bombadil.infradead.org>
On Tue, 2017-04-04 at 04:53 -0700, Matthew Wilcox wrote:
> On Tue, Apr 04, 2017 at 01:03:22PM +1000, NeilBrown wrote:
> > On Mon, Apr 03 2017, Jeff Layton wrote:
> >
> > > On Mon, 2017-04-03 at 12:16 -0700, Matthew Wilcox wrote:
> > > > So, OK, that makes sense, we should keep allowing filesystems to report
> > > > ENOSPC as a writeback error. But I think much of the argument below
> > > > still holds, and we should continue to have a prior EIO to be reported
> > > > over a new ENOSPC (even if the program has already consumed the EIO).
> > >
> > > I'm fine with that (though I'd like Neil's thoughts before we decide
> > > anything) there.
> >
> > I'd like there be a well defined time when old errors were forgotten.
> > It does make sense for EIO to persist even if ENOSPC or EDQUOT is
> > received, but not forever.
> > Clearing the remembered errors when put_write_access() causes
> > i_writecount to reach zero is one option (as suggested), but I'm not
> > sure I'm happy with it.
> >
> > Local filesystems, or network filesystems which receive strong write
> > delegations, should only ever return EIO to fsync. We should
> > concentrate on them first, I think. As there is only one possible
> > error, the seq counter is sufficient to "clear" it once it has been
> > reported to fsync() (or write()?).
> >
> > Other network filesystems could return a whole host of errors: ENOSPC
> > EDQUOT ESTALE EPERM EFBIG ...
> > Do we want to limit exactly which errors are allowed in generic code, or
> > do we just support EIO generically and expect the filesystem to sort out
> > the details for anything else?
>
> I'd like us to focus on our POSIX compliance here and not return
> arbitrary errors. The relevant pages are here:
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html
>
> For close(), we have to map every error to EIO.
> For fsync(), we can return any error that write() could have. That limits
> us to:
>
> EFBIG ENOSPC EIO ENOBUFS ENXIO
>
> I think EFBIG really isn't a writeback error; are there any network
> filesystems that don't know the file size limit at the time they accept
> the original write? ENOBUFS seems like a transient error (*this* call to
> fsync() failed, but the next one may succeed ... it's the equivalent of
> ENOMEM). ENXIO seems to me like it's a submission error, not a writeback
> error. So that leaves us with ENOSPC and EIO, as we have support today.
>
Agreed that we should focus on POSIX compliance. I'll also note that
POSIX states:
"If more than one error occurs in processing a function call, any one
of the possible errors may be returned, as the order of
detection is undefined."
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_03
So, I'd like to push back on this idea that we need to prefer reporting
-EIO over other errors. POSIX certainly doesn't mandate that.
If we agree that that is the case, then I think the simplest thing to
do here would be to clear the other error flag(s) when we get a new
error, such that we only preserve the latest one. With that, we also
wouldn't need to clear anything out when i_writecount goes to zero
either. It would "just work" without that.
> > One possible approach a filesystem could take is just to allow a single
> > async writeback error. After that error, all subsequent write()
> > system calls become synchronous. As write() or fsync() is called on each
> > file descriptor (which could possibly have sent the write which caused
> > the error), an error is returned and that fact is counted. Once we have
> > returned as many errors as there are open file descriptors
> > (i_writecount?), and have seen a successful write, the filesystem
> > forgets all recorded errors and switches back to async writes (for that
> > inode). NFS does this switch-to-sync-on-error. See nfs_need_check_write().
> >
> > The "which could possibly have sent the write which caused the error" is
> > an explicit reference to NFS. NFS doesn't use the AS_EIO/AS_ENOSPC
> > flags to return async errors. It allocates an nfs_open_context for each
> > user who opens a given inode, and stores an error in there. Each dirty
> > pages is associated with one of these, so errors a sure to go to the
> > correct user, though not necessarily the correct fd at present.
>
> ... and you need the nfs_open_context in order to use the correct
> credentials when writing a page to the server, correct?
>
Yes, and it is expensive. I don't think we want to do that at the
generic VFS layer if we can at all help it.
> > When we specify the new behaviour we should be careful to be as vague as
> > possible while still saying what we need. This allows filesystems some
> > flexibility.
> >
> > If an error happens during writeback, the next write() or fsync() (or
> > ....) on the file descriptor to which data was written will return -1
> > with errno set to EIO or some other relevant error. Other file
> > descriptors open on the same file may receive EIO or some other error
> > on a subsequent appropriate system call.
> > It should not be assumed that close() will return an error. fsync()
> > must be called before close() if writeback errors are important to the
> > application.
>
...and I also agree that we leave as much grey area as possible here to
allow for a wide range of implementations.
--
Jeff Layton <jlayton@redhat.com>
next prev parent reply other threads:[~2017-04-04 12:17 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-03-31 19:25 [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 1/4] fs: new infrastructure for writeback error handling and reporting Jeff Layton
2017-04-03 7:12 ` Nikolay Borisov
2017-04-03 10:28 ` Jeff Layton
2017-04-03 14:47 ` Matthew Wilcox
2017-04-03 15:19 ` Jeff Layton
2017-04-03 16:15 ` Matthew Wilcox
2017-04-03 16:30 ` Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 2/4] dax: set errors in mapping when writeback fails Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 3/4] buffer: set wb errors using both new and old infrastructure for now Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 4/4] ext4: wire it up to the new writeback error reporting infrastructure Jeff Layton
2017-04-03 4:25 ` [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it NeilBrown
2017-04-03 10:28 ` Jeff Layton
2017-04-03 14:32 ` Matthew Wilcox
2017-04-03 17:47 ` Jeff Layton
2017-04-03 18:09 ` Jeremy Allison
2017-04-03 18:18 ` Jeff Layton
2017-04-03 18:36 ` Jeremy Allison
2017-04-03 18:40 ` Jeremy Allison
2017-04-03 18:49 ` Jeff Layton
2017-04-03 19:16 ` Matthew Wilcox
2017-04-03 20:16 ` Jeff Layton
2017-04-04 2:45 ` Matthew Wilcox
2017-04-04 3:03 ` NeilBrown
2017-04-04 11:41 ` Jeff Layton
2017-04-04 22:41 ` NeilBrown
2017-04-04 11:53 ` Matthew Wilcox
2017-04-04 12:17 ` Jeff Layton [this message]
2017-04-04 16:12 ` Matthew Wilcox
2017-04-04 16:25 ` Jeff Layton
2017-04-04 17:09 ` Matthew Wilcox
2017-04-04 18:08 ` Jeff Layton
2017-04-04 22:50 ` NeilBrown
2017-04-05 19:49 ` Jeff Layton
2017-04-05 21:03 ` Matthew Wilcox
2017-04-06 0:19 ` NeilBrown
2017-04-06 0:02 ` NeilBrown
2017-04-06 2:55 ` Matthew Wilcox
2017-04-06 5:12 ` NeilBrown
2017-04-06 13:31 ` Matthew Wilcox
2017-04-06 21:53 ` NeilBrown
2017-04-06 14:02 ` Jeff Layton
2017-04-06 19:14 ` Jeff Layton
2017-04-06 20:05 ` Matthew Wilcox
2017-04-07 13:12 ` Jeff Layton
2017-04-09 23:15 ` NeilBrown
2017-04-10 13:19 ` Jeff Layton
2017-04-06 22:15 ` NeilBrown
2017-04-04 23:13 ` NeilBrown
2017-04-05 11:14 ` Jeff Layton
2017-04-06 0:24 ` NeilBrown
2017-04-04 13:38 ` Theodore Ts'o
2017-04-04 22:28 ` NeilBrown
2017-04-03 14:51 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1491308268.20445.4.camel@redhat.com \
--to=jlayton@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=jack@suse.cz \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=neilb@suse.com \
--cc=tytso@mit.edu \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).