linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Matthew Wilcox <willy@infradead.org>
To: NeilBrown <neilb@suse.com>
Cc: Jeff Layton <jlayton@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-ext4@vger.kernel.org, akpm@linux-foundation.org,
	tytso@mit.edu, jack@suse.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it
Date: Tue, 4 Apr 2017 04:53:58 -0700	[thread overview]
Message-ID: <20170404115358.GH30811@bombadil.infradead.org> (raw)
In-Reply-To: <87h924kh6t.fsf@notabene.neil.brown.name>

On Tue, Apr 04, 2017 at 01:03:22PM +1000, NeilBrown wrote:
> On Mon, Apr 03 2017, Jeff Layton wrote:
> 
> > On Mon, 2017-04-03 at 12:16 -0700, Matthew Wilcox wrote:
> >> So, OK, that makes sense, we should keep allowing filesystems to report
> >> ENOSPC as a writeback error.  But I think much of the argument below
> >> still holds, and we should continue to have a prior EIO to be reported
> >> over a new ENOSPC (even if the program has already consumed the EIO).
> >
> > I'm fine with that (though I'd like Neil's thoughts before we decide
> > anything) there.
> 
> I'd like there be a well defined time when old errors were forgotten.
> It does make sense for EIO to persist even if ENOSPC or EDQUOT is
> received, but not forever.
> Clearing the remembered errors when put_write_access() causes
> i_writecount to reach zero is one option (as suggested), but I'm not
> sure I'm happy with it.
> 
> Local filesystems, or network filesystems which receive strong write
> delegations, should only ever return EIO to fsync.  We should
> concentrate on them first, I think.  As there is only one possible
> error, the seq counter is sufficient to "clear" it once it has been
> reported to fsync() (or write()?).
> 
> Other network filesystems could return a whole host of errors: ENOSPC
> EDQUOT ESTALE EPERM EFBIG ...
> Do we want to limit exactly which errors are allowed in generic code, or
> do we just support EIO generically and expect the filesystem to sort out
> the details for anything else?

I'd like us to focus on our POSIX compliance here and not return
arbitrary errors.  The relevant pages are here:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/fsync.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html

For close(), we have to map every error to EIO.
For fsync(), we can return any error that write() could have.  That limits
us to:

EFBIG ENOSPC EIO ENOBUFS ENXIO

I think EFBIG really isn't a writeback error; are there any network
filesystems that don't know the file size limit at the time they accept
the original write?  ENOBUFS seems like a transient error (*this* call to
fsync() failed, but the next one may succeed ... it's the equivalent of
ENOMEM).  ENXIO seems to me like it's a submission error, not a writeback
error.  So that leaves us with ENOSPC and EIO, as we have support today.

> One possible approach a filesystem could take is just to allow a single
> async writeback error.  After that error, all subsequent write()
> system calls become synchronous. As write() or fsync() is called on each
> file descriptor (which could possibly have sent the write which caused
> the error), an error is returned and that fact is counted.  Once we have
> returned as many errors as there are open file descriptors
> (i_writecount?), and have seen a successful write, the filesystem
> forgets all recorded errors and switches back to async writes (for that
> inode).   NFS does this switch-to-sync-on-error.  See nfs_need_check_write().
> 
> The "which could possibly have sent the write which caused the error" is
> an explicit reference to NFS.  NFS doesn't use the AS_EIO/AS_ENOSPC
> flags to return async errors.  It allocates an nfs_open_context for each
> user who opens a given inode, and stores an error in there.  Each dirty
> pages is associated with one of these, so errors a sure to go to the
> correct user, though not necessarily the correct fd at present.

... and you need the nfs_open_context in order to use the correct
credentials when writing a page to the server, correct?

> When we specify the new behaviour we should be careful to be as vague as
> possible while still saying what we need.  This allows filesystems some
> flexibility.
> 
>   If an error happens during writeback, the next write() or fsync() (or
>   ....) on the file descriptor to which data was written will return -1
>   with errno set to EIO or some other relevant error.  Other file
>   descriptors open on the same file may receive EIO or some other error
>   on a subsequent appropriate system call.
>   It should not be assumed that close() will return an error.  fsync()
>   must be called before close() if writeback errors are important to the
>   application.

Thanks for explaining what NFS does today.

  parent reply	other threads:[~2017-04-04 11:54 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-03-31 19:25 [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 1/4] fs: new infrastructure for writeback error handling and reporting Jeff Layton
2017-04-03  7:12   ` Nikolay Borisov
2017-04-03 10:28     ` Jeff Layton
2017-04-03 14:47   ` Matthew Wilcox
2017-04-03 15:19     ` Jeff Layton
2017-04-03 16:15       ` Matthew Wilcox
2017-04-03 16:30         ` Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 2/4] dax: set errors in mapping when writeback fails Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 3/4] buffer: set wb errors using both new and old infrastructure for now Jeff Layton
2017-03-31 19:26 ` [RFC PATCH 4/4] ext4: wire it up to the new writeback error reporting infrastructure Jeff Layton
2017-04-03  4:25 ` [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it NeilBrown
2017-04-03 10:28   ` Jeff Layton
2017-04-03 14:32     ` Matthew Wilcox
2017-04-03 17:47       ` Jeff Layton
2017-04-03 18:09         ` Jeremy Allison
2017-04-03 18:18           ` Jeff Layton
2017-04-03 18:36             ` Jeremy Allison
2017-04-03 18:40               ` Jeremy Allison
2017-04-03 18:49                 ` Jeff Layton
2017-04-03 19:16         ` Matthew Wilcox
2017-04-03 20:16           ` Jeff Layton
2017-04-04  2:45             ` Matthew Wilcox
2017-04-04  3:03             ` NeilBrown
2017-04-04 11:41               ` Jeff Layton
2017-04-04 22:41                 ` NeilBrown
2017-04-04 11:53               ` Matthew Wilcox [this message]
2017-04-04 12:17                 ` Jeff Layton
2017-04-04 16:12                   ` Matthew Wilcox
2017-04-04 16:25                     ` Jeff Layton
2017-04-04 17:09                       ` Matthew Wilcox
2017-04-04 18:08                         ` Jeff Layton
2017-04-04 22:50                         ` NeilBrown
2017-04-05 19:49                         ` Jeff Layton
2017-04-05 21:03                           ` Matthew Wilcox
2017-04-06  0:19                             ` NeilBrown
2017-04-06  0:02                           ` NeilBrown
2017-04-06  2:55                             ` Matthew Wilcox
2017-04-06  5:12                               ` NeilBrown
2017-04-06 13:31                                 ` Matthew Wilcox
2017-04-06 21:53                                   ` NeilBrown
2017-04-06 14:02                             ` Jeff Layton
2017-04-06 19:14                             ` Jeff Layton
2017-04-06 20:05                               ` Matthew Wilcox
2017-04-07 13:12                                 ` Jeff Layton
2017-04-09 23:15                                   ` NeilBrown
2017-04-10 13:19                                     ` Jeff Layton
2017-04-06 22:15                               ` NeilBrown
2017-04-04 23:13                       ` NeilBrown
2017-04-05 11:14                         ` Jeff Layton
2017-04-06  0:24                           ` NeilBrown
2017-04-04 13:38                 ` Theodore Ts'o
2017-04-04 22:28                 ` NeilBrown
2017-04-03 14:51   ` Matthew Wilcox

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170404115358.GH30811@bombadil.infradead.org \
    --to=willy@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=jack@suse.cz \
    --cc=jlayton@redhat.com \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).