From: Jeff Layton <jlayton@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@ZenIV.linux.org.uk>, Jan Kara <jack@suse.cz>,
tytso@mit.edu, axboe@kernel.dk, mawilcox@microsoft.com,
ross.zwisler@linux.intel.com, corbet@lwn.net,
Chris Mason <clm@fb.com>, Josef Bacik <jbacik@fb.com>,
David Sterba <dsterba@suse.com>,
"Darrick J . Wong" <darrick.wong@oracle.com>
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-block@vger.kernel.org
Subject: [PATCH v6 00/20] fs: enhanced writeback error reporting with errseq_t (pile #1)
Date: Mon, 12 Jun 2017 08:22:52 -0400
Message-ID: <20170612122316.13244-1-jlayton@redhat.com>
v6:
===
This is the sixth posting of the patchset to revamp the way writeback
errors are tracked and reported.
This is a smaller set than the last one. The main difference from the
last set is that this one just adds errseq_t based error reporting for
the purposes of fsync, while leaving the internal callers of filemap_*
functions and the like largely untouched.
Some of these patches have been posted separately, but I'm re-posting
them here to make it clear that they're prerequisites to the later
patches in the series.
Background:
===========
The basic problem is that we have (for a very long time) tracked and
reported writeback errors based on two flags in the address_space:
AS_EIO and AS_ENOSPC. Those flags are cleared when they are checked,
so only the first caller to check them is able to consume them.
That model is quite unreliable though, for several related reasons:
* only the first fsync caller on the inode will see the error. In a
world of containerized setups, that's no longer viable. Applications
need to know that their writes are safely stored, and they can
currently miss errors that they should have been told about.
* there are a lot of internal callers to filemap_fdatawait* and
filemap_write_and_wait* that clear the error flags but then never
report them to userland in any fashion.
* Some internal callers report writeback errors, but can do so at
nonsensical times. For instance, we might want to truncate a file,
which triggers a pagecache flush. If that writeback fails, we might
report that error to the truncate caller, but a subsequent fsync
will likely not see it.
* Some internal callers try to reset the error flags after clearing
them, but that's racy. Another task could check the flags between
those two events.
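
The check-and-clear behaviour described above can be illustrated with a
small userspace sketch. This is not kernel code: toy_mapping, the
TOY_AS_* bits, and both helper names are hypothetical stand-ins for the
address_space flags, and the only point is to show that whichever
caller checks first consumes the error.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the AS_EIO / AS_ENOSPC bits. */
enum { TOY_AS_EIO = 1u << 0, TOY_AS_ENOSPC = 1u << 1 };

/* Hypothetical stand-in for the flags word in an address_space. */
struct toy_mapping {
	atomic_uint flags;
};

/* Record a writeback failure; err is a negative errno. */
static void toy_mapping_set_error(struct toy_mapping *m, int err)
{
	assert(err < 0);
	atomic_fetch_or(&m->flags,
			err == -ENOSPC ? TOY_AS_ENOSPC : TOY_AS_EIO);
}

/* Test and clear both bits at once; returns 0 or a negative errno. */
static int toy_check_and_clear(struct toy_mapping *m)
{
	unsigned int old = atomic_fetch_and(&m->flags,
			~(unsigned int)(TOY_AS_EIO | TOY_AS_ENOSPC));

	if (old & TOY_AS_ENOSPC)
		return -ENOSPC;
	if (old & TOY_AS_EIO)
		return -EIO;
	return 0;
}
```

After one toy_mapping_set_error(&m, -EIO), the first call to
toy_check_and_clear() returns -EIO and every later call returns 0,
whether it comes from an fsync caller or from an internal flush that
never reports the result to userland.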
Solution:
=========
This patchset adds a new datatype called an errseq_t that represents a
sequence of errors. It's a u32, with a field for a POSIX-flavor error
and a counter, managed with atomics. We can sample that value at a
particular point in time, and can later tell whether there have been any
errors since that point.
That allows us to provide traditional check-and-clear fsync semantics
on every open file description in a lightweight fashion. fsync callers
no longer need to coordinate between one another in order to ensure
that errors at fsync time are handled correctly.
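
To make the sample-and-compare idea concrete, here is a simplified
userspace sketch. It is not the implementation in lib/errseq.c: the
toy_errseq_* names and the 12-bit error field are assumptions made for
illustration, and this version bumps the counter on every recorded
error. Each open file keeps its own sampled value (the "since" cursor
below), so one reader consuming an error does not hide it from another.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout for this sketch: low bits hold the errno, the rest count. */
#define TOY_ERR_BITS 12u
#define TOY_ERR_MASK ((1u << TOY_ERR_BITS) - 1)
#define TOY_CTR_INC  (1u << TOY_ERR_BITS)

typedef _Atomic uint32_t toy_errseq_t;

/* Record an error: store the errno and bump the counter in one update. */
static void toy_errseq_set(toy_errseq_t *eseq, int err)
{
	uint32_t old, new;

	assert(err < 0 && (uint32_t)-err <= TOY_ERR_MASK);
	old = atomic_load(eseq);
	do {
		new = ((old & ~TOY_ERR_MASK) + TOY_CTR_INC) | (uint32_t)-err;
	} while (!atomic_compare_exchange_weak(eseq, &old, new));
}

/* Take a snapshot of the current value, e.g. at open time. */
static uint32_t toy_errseq_sample(toy_errseq_t *eseq)
{
	return atomic_load(eseq);
}

/*
 * Report any error recorded since *since was sampled, then advance the
 * caller's cursor so the same error is not reported to it twice.
 */
static int toy_errseq_check_and_advance(toy_errseq_t *eseq, uint32_t *since)
{
	uint32_t cur = atomic_load(eseq);

	if (cur == *since)
		return 0;
	*since = cur;
	return -(int)(cur & TOY_ERR_MASK);
}
```

If two cursors are sampled before a toy_errseq_set(&wb_err, -EIO), each
of them gets -EIO from its first toy_errseq_check_and_advance() call and
0 from the next one, which is the per-open-file check-and-clear
behaviour described above.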
Strategy:
=========
The aim with this set is to do the minimum possible to support
reliable reporting of errors on fsync, without substantially changing
the internals of the filesystems themselves.
Most of the internal calls to filemap_fdatawait are left alone, so all
of the internal error checking is done using the traditional flag
based checks. The only real difference here is more reliable reporting
of errors at fsync.
I think that we probably will want to eventually convert all of the
internal callers to use errseq_t based reporting too, but that can be
done in an incremental fashion in follow-on patchsets.
Testing:
========
I've primarily been testing this with a couple of new xfstests that I
will post separately. These tests use dm-error fault injection to flip
the underlying block device to start throwing I/O errors, and then test
the behavior of the filesystem layer on top of that.
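
For readers who want a feel for the userspace side of such a check,
here is a rough sketch. It is not one of the xfstests mentioned above;
fsync_result() and fsync_twice() are hypothetical helpers, and the
fault injection itself (switching the device to return errors) is
assumed to happen outside this code. It opens the same file twice,
dirties it through one descriptor, and records what fsync returns on
each.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: fsync one descriptor, return 0 or a negative errno. */
static int fsync_result(int fd)
{
	return fsync(fd) == 0 ? 0 : -errno;
}

/*
 * Open path twice, write through the first descriptor, then fsync both.
 * The two fsync results land in res[0] and res[1]. Returns 0 if the
 * setup and write worked, or a negative errno otherwise.
 */
static int fsync_twice(const char *path, int res[2])
{
	static const char buf[] = "writeback test\n";
	ssize_t n;
	int fd1, fd2, ret = 0;

	assert(path && res);
	fd1 = open(path, O_WRONLY | O_CREAT, 0600);
	if (fd1 < 0)
		return -errno;
	fd2 = open(path, O_WRONLY);
	if (fd2 < 0) {
		ret = -errno;
		close(fd1);
		return ret;
	}

	n = write(fd1, buf, sizeof(buf) - 1);
	if (n < 0) {
		ret = -errno;
	} else if ((size_t)n != sizeof(buf) - 1) {
		ret = -EIO;	/* treat a short write as a failure here */
	} else {
		res[0] = fsync_result(fd1);
		res[1] = fsync_result(fd2);
	}

	close(fd1);
	close(fd2);
	return ret;
}
```

On a healthy device both results should be 0. The interesting case is a
device that fails writeback between the write and the fsync calls: the
goal stated in this series is that an error is then reported on every
open file description, not only on the first one to call fsync.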
Jeff Layton (20):
  mm: fix mapping_set_error call in me_pagecache_dirty
  buffer: use mapping_set_error instead of setting the flag
  fs: check for writeback errors after syncing out buffers in
    generic_file_fsync
  buffer: set errors in mapping at the time that the error occurs
  mm: don't TestClearPageError in __filemap_fdatawait_range
  mm: drop "wait" parameter from write_one_page
  mm: clean up error handling in write_one_page
  lib: add errseq_t type and infrastructure for handling it
  fs: new infrastructure for writeback error handling and reporting
  mm: tracepoints for writeback error events
  mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
  fs: add a new fstype flag to indicate how writeback errors are tracked
  Documentation: flesh out the section in vfs.txt on storing and
    reporting writeback errors
  dax: set errors in mapping when writeback fails
  fs: have call_fsync call filemap_report_wb_err if FS_WB_ERRSEQ is set
  block: convert to errseq_t based writeback error tracking
  fs: add f_md_wb_err field to struct file for tracking metadata errors
  ext4: use errseq_t based error handling for reporting data writeback
    errors
  xfs: minimal conversion to errseq_t writeback error reporting
  btrfs: minimal conversion to errseq_t writeback error reporting on
    fsync
Documentation/filesystems/vfs.txt | 48 ++++++++-
drivers/dax/device.c | 1 +
fs/block_dev.c | 2 +
fs/btrfs/super.c | 2 +-
fs/buffer.c | 20 ++--
fs/dax.c | 18 +++-
fs/exofs/dir.c | 2 +-
fs/ext2/dir.c | 2 +-
fs/ext2/file.c | 2 +-
fs/ext4/dir.c | 8 +-
fs/ext4/file.c | 5 +-
fs/ext4/fsync.c | 23 ++++-
fs/ext4/super.c | 6 +-
fs/file_table.c | 1 +
fs/gfs2/lops.c | 2 +-
fs/jfs/jfs_metapage.c | 4 +-
fs/libfs.c | 3 +-
fs/minix/dir.c | 2 +-
fs/open.c | 3 +
fs/sysv/dir.c | 2 +-
fs/ufs/dir.c | 2 +-
fs/xfs/xfs_super.c | 2 +-
include/linux/buffer_head.h | 1 +
include/linux/errseq.h | 19 ++++
include/linux/fs.h | 80 +++++++++++++--
include/linux/mm.h | 2 +-
include/linux/pagemap.h | 31 ++++--
include/trace/events/filemap.h | 52 ++++++++++
lib/Makefile | 2 +-
lib/errseq.c | 208 ++++++++++++++++++++++++++++++++++++++
mm/filemap.c | 91 ++++++++++++++---
mm/memory-failure.c | 2 +-
mm/page-writeback.c | 21 ++--
33 files changed, 595 insertions(+), 74 deletions(-)
create mode 100644 include/linux/errseq.h
create mode 100644 lib/errseq.c
--
2.13.0