linux-ext4.vger.kernel.org archive mirror
From: Matthew Wilcox <willy@infradead.org>
To: Jeff Layton <jlayton@redhat.com>
Cc: NeilBrown <neilb@suse.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-ext4@vger.kernel.org, akpm@linux-foundation.org,
	tytso@mit.edu, jack@suse.cz
Subject: Re: [RFC PATCH 0/4] fs: introduce new writeback error tracking infrastructure and convert ext4 to use it
Date: Mon, 3 Apr 2017 12:16:02 -0700
Message-ID: <20170403191602.GF30811@bombadil.infradead.org>
In-Reply-To: <1491241657.2673.10.camel@redhat.com>

On Mon, Apr 03, 2017 at 01:47:37PM -0400, Jeff Layton wrote:
> > I wonder whether it's even worth supporting both EIO and ENOSPC for a
> > writeback problem.  If I understand correctly, at the time of write(),
> > filesystems check to see if they have enough blocks to satisfy the
> > request, so ENOSPC only comes up in the writeback context for thinly
> > provisioned devices.
> 
> No, ENOSPC on writeback can certainly happen with network filesystems.
> NFS and CIFS have no way to reserve space. You wouldn't want to have to
> do an extra RPC on every buffered write. :)

Aaah, yes, network filesystems.  I would indeed not want to do an extra
RPC on every write to a hole (it's a hole vs non-hole question, rather
than a buffered/unbuffered question ... unless you're WAFLing and not
reclaiming quickly enough, I suppose).

So, OK, that makes sense; we should keep allowing filesystems to report
ENOSPC as a writeback error.  But I think much of the argument below
still holds, and a prior EIO should continue to be reported in preference
to a new ENOSPC (even if the program has already consumed the EIO).

If you find that unconvincing, we could do something like this ...

void filemap_set_wb_error(struct address_space *mapping, int err)
{
	struct inode *inode = mapping->host;

	if (!err)
		return;
	/*
	 * This should be called with the error code that we want to return
	 * on fsync. Thus, it should always be <= 0.
	 */
	WARN_ON(err > 0);

	spin_lock(&inode->i_lock);
	if (err == -EIO)
		mapping->wb_err |= 1;
	else if (err == -ENOSPC)
		mapping->wb_err |= 2;
	mapping->wb_err += 4;
	spin_unlock(&inode->i_lock);
}

int filemap_report_wb_error(struct file *file)
{
	struct inode *inode = file_inode(file);
	struct address_space *mapping = file->f_mapping;
	int err;

	spin_lock(&inode->i_lock);
	if (file->f_wb_err == mapping->wb_err) {
		err = 0;
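	} else if ((mapping->wb_err & 1) &&
		   file->f_wb_err != (mapping->wb_err & ~2)) {
		/* EIO not yet reported to this file: report it first */
		file->f_wb_err = mapping->wb_err & ~2;
		err = -EIO;
	} else {
		/* EIO already reported (or never set): report ENOSPC */
		file->f_wb_err = mapping->wb_err;
		err = -ENOSPC;
	}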
	spin_unlock(&inode->i_lock);
	return err;
}

If I got that right, calling fsync() on an inode which has experienced
both errors would first get an EIO.  Calling fsync() on it again would
get an ENOSPC.  Calling fsync() on it a third time would get 0.  When
either error occurs again, the thread will go back through the cycle
(EIO -> ENOSPC -> 0).
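
The cycle can be traced by hand with a small userspace model.  This is
only a sketch: the struct and function names are hypothetical stand-ins
for mapping->wb_err and file->f_wb_err, the locking is dropped, and
assert() stands in for WARN_ON().  The EIO branch fires only while the
file has not yet recorded the EIO for the current counter value, which
is one reading of the behaviour described above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical userspace stand-ins for the kernel fields. */
struct wb_mapping { unsigned int wb_err; };   /* models mapping->wb_err */
struct wb_file    { unsigned int f_wb_err; }; /* models file->f_wb_err  */

#define WB_EIO    1u  /* bit 0: an EIO has been recorded   */
#define WB_ENOSPC 2u  /* bit 1: an ENOSPC has been recorded */
#define WB_COUNT  4u  /* the counter lives in the bits above the flags */

void model_set_wb_error(struct wb_mapping *m, int err)
{
	if (!err)
		return;
	assert(err < 0);	/* stands in for WARN_ON(err > 0) */

	if (err == -EIO)
		m->wb_err |= WB_EIO;
	else if (err == -ENOSPC)
		m->wb_err |= WB_ENOSPC;
	m->wb_err += WB_COUNT;	/* every error bumps the counter */
}

int model_report_wb_error(struct wb_file *f, const struct wb_mapping *m)
{
	if (f->f_wb_err == m->wb_err)
		return 0;	/* nothing new since the last report */

	if ((m->wb_err & WB_EIO) &&
	    f->f_wb_err != (m->wb_err & ~WB_ENOSPC)) {
		/* Remember the EIO but leave the ENOSPC bit unseen. */
		f->f_wb_err = m->wb_err & ~WB_ENOSPC;
		return -EIO;
	}

	/* EIO already handed out (or never set): catch up fully. */
	f->f_wb_err = m->wb_err;
	return -ENOSPC;
}
```

Starting from zeroed structs, recording -EIO and then -ENOSPC leaves
wb_err at 11; three successive reports then move f_wb_err through 9 and
11, returning -EIO, -ENOSPC and 0.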

> > Programs have basically no use for the distinction.  In either case,
> > the situation is the same.  The written data is safely in RAM and cannot
> > be written to the storage.  If one were to make superhuman efforts,
> > one could mmap the file and write() it to a different device, but that
> > is incredibly rare.  For most programs, the response is to just die and
> > let the human deal with the corrupted file.
> > 
> > From a sysadmin point of view, of course the situation is different,
> > and the remedy is different, but they should be getting that information
> > through a different mechanism than monitoring the errno from every
> > system call.
> > 
> > If we do want to continue to support both EIO and ENOSPC from writeback,
> > then let's have EIO override ENOSPC as an error.  ie if an ENOSPC comes
> > in after an EIO is set, it only bumps the counter and applications will
> > see EIO, not ENOSPC on fresh calls to fsync().
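
For comparison, the simpler policy quoted above (EIO overrides ENOSPC,
and a later ENOSPC only bumps the counter) could be modelled roughly as
follows.  This is a userspace sketch under assumptions: the struct
layout and all names are hypothetical, locking is omitted, and assert()
stands in for WARN_ON().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: one sticky error code plus a counter. */
struct sticky_mapping {
	int err;		/* error to report, 0 if none yet   */
	unsigned int seq;	/* bumped on every writeback error  */
};

struct sticky_file {
	unsigned int seen;	/* value of seq at the last report  */
};

void sticky_set_error(struct sticky_mapping *m, int err)
{
	if (!err)
		return;
	assert(err < 0);	/* stands in for WARN_ON(err > 0) */

	/* Once an EIO is recorded, a later ENOSPC never replaces it. */
	if (m->err != -EIO)
		m->err = err;
	m->seq++;
}

int sticky_report_error(struct sticky_file *f, const struct sticky_mapping *m)
{
	if (f->seen == m->seq)
		return 0;	/* no new error since the last report */
	f->seen = m->seq;
	return m->err;
}
```

In this model a file that has already consumed an EIO still sees EIO,
not ENOSPC, on its next report after a subsequent ENOSPC is recorded.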

