From: Jeff Layton <jlayton@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@ZenIV.linux.org.uk>, Jan Kara <jack@suse.cz>,
tytso@mit.edu, axboe@kernel.dk, mawilcox@microsoft.com,
ross.zwisler@linux.intel.com, corbet@lwn.net,
Chris Mason <clm@fb.com>, Josef Bacik <jbacik@fb.com>,
David Sterba <dsterba@suse.com>,
"Darrick J . Wong" <darrick.wong@oracle.com>,
Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Carlos Maiolino <cmaiolino@redhat.com>,
Eryu Guan <eguan@redhat.com>, David Howells <dhowells@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH v7 00/22] fs: enhanced writeback error reporting with errseq_t (pile #1)
Date: Mon, 19 Jun 2017 12:23:46 -0400 [thread overview]
Message-ID: <1497889426.4654.7.camel@redhat.com> (raw)
In-Reply-To: <20170616193427.13955-1-jlayton@redhat.com>
On Fri, 2017-06-16 at 15:34 -0400, Jeff Layton wrote:
> v7:
> ===
> This is the seventh posting of the patchset to revamp the way writeback
> errors are tracked and reported.
>
> The main difference from the v6 posting is the removal of the
> FS_WB_ERRSEQ flag. That requires a few other incremental patches in the
> writeback code to ensure that both error tracking models are handled
> in a suitable way.
>
> Also, a bit more cleanup of the metadata writeback codepaths, and some
> documentation updates.
>
> Some of these patches have been posted separately, but I'm re-posting
> them here to make it clear that they're prerequisites of the later
> patches in the series.
>
> This series is based on top of linux-next from a day or so ago. I'd like
> to have this picked up by linux-next in the near future so we can get
> some more extensive testing with it. Should I just plan to maintain a
> topic branch on top of -next and ask Stephen to pick it up?
>
> Background:
> ===========
> The basic problem is that we have (for a very long time) tracked and
> reported writeback errors based on two flags in the address_space:
> AS_EIO and AS_ENOSPC. Those flags are cleared when they are checked,
> so only the first caller to check them is able to consume them.
>
> That model is quite unreliable, for several related reasons:
>
> * only the first fsync caller on the inode will see the error. In a
> world of containerized setups, that's no longer viable. Applications
> need to know that their writes are safely stored, and they can
> currently miss seeing errors that they should be aware of when
> they're not.
>
> * there are a lot of internal callers to filemap_fdatawait* and
> filemap_write_and_wait* that clear these errors but then never report
> them to userland in any fashion.
>
> * Some internal callers report writeback errors, but can do so at
> non-sensical times. For instance, we might want to truncate a file,
> which triggers a pagecache flush. If that writeback fails, we might
> report that error to the truncate caller, but a subsequent fsync
> will likely not see it.
>
> * Some internal callers try to reset the error flags after clearing
> them, but that's racy. Another task could check the flags between
> those two events.
>
> Solution:
> =========
> This patchset adds a new datatype called an errseq_t that represents a
> sequence of errors. It's a u32, with a field for a POSIX-flavor error
> and a counter, managed with atomics. We can sample that value at a
> particular point in time, and can later tell whether there have been any
> errors since that point.
>
> That allows us to provide traditional check-and-clear fsync semantics
> on every open file description in a lightweight fashion. fsync callers
> no longer need to coordinate between one another in order to ensure
> that errors at fsync time are handled correctly.
>
> Strategy:
> =========
> The aim with this pile is to do the minimum possible to support for
> reliable reporting of errors on fsync, without substantially changing
> the internals of the filesystems themselves.
>
> Most of the internal calls to filemap_fdatawait are left alone, so all
> of the internal error checkers are using the same error handling they
> always have. The only real difference here is that we're better
> reporting errors at fsync.
>
> I think that we probably will want to eventually convert all of those
> internal callers to use errseq_t based reporting, but that can be done
> in an incremental fashion in follow-on patchsets.
>
> Testing:
> ========
> I've primarily been testing this with some new xfstests that I will post
> in a separate series. These tests use dm-error fault injection to make
> the underlying block device start throwing I/O errors, and then test the
> how the filesystem layer reports errors after that.
>
> Jeff Layton (22):
> fs: remove call_fsync helper function
> buffer: use mapping_set_error instead of setting the flag
> fs: check for writeback errors after syncing out buffers in
> generic_file_fsync
> buffer: set errors in mapping at the time that the error occurs
> jbd2: don't clear and reset errors after waiting on writeback
> mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
> mm: don't TestClearPageError in __filemap_fdatawait_range
> mm: clean up error handling in write_one_page
> fs: always sync metadata in __generic_file_fsync
> lib: add errseq_t type and infrastructure for handling it
> fs: new infrastructure for writeback error handling and reporting
> mm: tracepoints for writeback error events
> mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
> Documentation: flesh out the section in vfs.txt on storing and
> reporting writeback errors
> dax: set errors in mapping when writeback fails
> block: convert to errseq_t based writeback error tracking
> ext4: use errseq_t based error handling for reporting data writeback
> errors
> fs: add f_md_wb_err field to struct file for tracking metadata errors
> ext4: add more robust reporting of metadata writeback errors
> ext2: convert to errseq_t based writeback error tracking
> xfs: minimal conversion to errseq_t writeback error reporting
> btrfs: minimal conversion to errseq_t writeback error reporting on
> fsync
>
> Documentation/filesystems/vfs.txt | 43 +++++++-
> drivers/dax/device.c | 1 +
> fs/block_dev.c | 9 +-
> fs/btrfs/file.c | 7 +-
> fs/buffer.c | 20 ++--
> fs/dax.c | 4 +-
> fs/ext2/dir.c | 8 ++
> fs/ext2/file.c | 26 ++++-
> fs/ext4/dir.c | 8 +-
> fs/ext4/file.c | 5 +-
> fs/ext4/fsync.c | 28 ++++-
> fs/file_table.c | 1 +
> fs/gfs2/lops.c | 2 +-
> fs/jbd2/commit.c | 15 +--
> fs/libfs.c | 12 +--
> fs/open.c | 3 +
> fs/sync.c | 2 +-
> fs/xfs/xfs_file.c | 15 ++-
> include/linux/buffer_head.h | 1 +
> include/linux/errseq.h | 19 ++++
> include/linux/fs.h | 67 ++++++++++--
> include/linux/pagemap.h | 31 ++++--
> include/trace/events/filemap.h | 52 ++++++++++
> ipc/shm.c | 2 +-
> lib/Makefile | 2 +-
> lib/errseq.c | 208 ++++++++++++++++++++++++++++++++++++++
> mm/filemap.c | 113 +++++++++++++++++----
> mm/page-writeback.c | 15 ++-
> 28 files changed, 628 insertions(+), 91 deletions(-)
> create mode 100644 include/linux/errseq.h
> create mode 100644 lib/errseq.c
>
If there are no major objections to this set, I'd like to have
linux-next start picking it up to get some wider testing. What's the
right vehicle for this, given that it touches stuff all over the tree?
I can see 3 potential options:
1) I could just pull these into the branch that Stephen is already
picking up for file-locks in my tree
2) I could put them into a new branch, and have Stephen pull that one in
addition to the file-locks branch
3) It could go in via someone else's tree entirely (Andrew or Al's
maybe?)
I'm fine with any of these. Anyone have thoughts?
Thanks,
--
Jeff Layton <jlayton@redhat.com>
WARNING: multiple messages have this Message-ID (diff)
From: Jeff Layton <jlayton@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>,
Al Viro <viro@ZenIV.linux.org.uk>, Jan Kara <jack@suse.cz>,
tytso@mit.edu, axboe@kernel.dk, mawilcox@microsoft.com,
ross.zwisler@linux.intel.com, corbet@lwn.net,
Chris Mason <clm@fb.com>, Josef Bacik <jbacik@fb.com>,
David Sterba <dsterba@suse.com>,
"Darrick J . Wong" <darrick.wong@oracle.com>,
Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Carlos Maiolino <cmaiolino@redhat.com>,
Eryu Guan <eguan@redhat.com>, David Howells <dhowells@redhat.com>,
Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
linux-btrfs@vger.kernel.org, linux-block@vger.kernel.org
Subject: Re: [PATCH v7 00/22] fs: enhanced writeback error reporting with errseq_t (pile #1)
Date: Mon, 19 Jun 2017 12:23:46 -0400 [thread overview]
Message-ID: <1497889426.4654.7.camel@redhat.com> (raw)
In-Reply-To: <20170616193427.13955-1-jlayton@redhat.com>
On Fri, 2017-06-16 at 15:34 -0400, Jeff Layton wrote:
> v7:
> ===
> This is the seventh posting of the patchset to revamp the way writeback
> errors are tracked and reported.
>
> The main difference from the v6 posting is the removal of the
> FS_WB_ERRSEQ flag. That requires a few other incremental patches in the
> writeback code to ensure that both error tracking models are handled
> in a suitable way.
>
> Also, a bit more cleanup of the metadata writeback codepaths, and some
> documentation updates.
>
> Some of these patches have been posted separately, but I'm re-posting
> them here to make it clear that they're prerequisites of the later
> patches in the series.
>
> This series is based on top of linux-next from a day or so ago. I'd like
> to have this picked up by linux-next in the near future so we can get
> some more extensive testing with it. Should I just plan to maintain a
> topic branch on top of -next and ask Stephen to pick it up?
>
> Background:
> ===========
> The basic problem is that we have (for a very long time) tracked and
> reported writeback errors based on two flags in the address_space:
> AS_EIO and AS_ENOSPC. Those flags are cleared when they are checked,
> so only the first caller to check them is able to consume them.
>
> That model is quite unreliable, for several related reasons:
>
> * only the first fsync caller on the inode will see the error. In a
> world of containerized setups, that's no longer viable. Applications
> need to know that their writes are safely stored, and they can
> currently miss seeing errors that they should be aware of when
> they're not.
>
> * there are a lot of internal callers to filemap_fdatawait* and
> filemap_write_and_wait* that clear these errors but then never report
> them to userland in any fashion.
>
> * Some internal callers report writeback errors, but can do so at
> non-sensical times. For instance, we might want to truncate a file,
> which triggers a pagecache flush. If that writeback fails, we might
> report that error to the truncate caller, but a subsequent fsync
> will likely not see it.
>
> * Some internal callers try to reset the error flags after clearing
> them, but that's racy. Another task could check the flags between
> those two events.
>
> Solution:
> =========
> This patchset adds a new datatype called an errseq_t that represents a
> sequence of errors. It's a u32, with a field for a POSIX-flavor error
> and a counter, managed with atomics. We can sample that value at a
> particular point in time, and can later tell whether there have been any
> errors since that point.
>
> That allows us to provide traditional check-and-clear fsync semantics
> on every open file description in a lightweight fashion. fsync callers
> no longer need to coordinate between one another in order to ensure
> that errors at fsync time are handled correctly.
>
> Strategy:
> =========
> The aim with this pile is to do the minimum possible to support for
> reliable reporting of errors on fsync, without substantially changing
> the internals of the filesystems themselves.
>
> Most of the internal calls to filemap_fdatawait are left alone, so all
> of the internal error checkers are using the same error handling they
> always have. The only real difference here is that we're better
> reporting errors at fsync.
>
> I think that we probably will want to eventually convert all of those
> internal callers to use errseq_t based reporting, but that can be done
> in an incremental fashion in follow-on patchsets.
>
> Testing:
> ========
> I've primarily been testing this with some new xfstests that I will post
> in a separate series. These tests use dm-error fault injection to make
> the underlying block device start throwing I/O errors, and then test the
> how the filesystem layer reports errors after that.
>
> Jeff Layton (22):
> fs: remove call_fsync helper function
> buffer: use mapping_set_error instead of setting the flag
> fs: check for writeback errors after syncing out buffers in
> generic_file_fsync
> buffer: set errors in mapping at the time that the error occurs
> jbd2: don't clear and reset errors after waiting on writeback
> mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
> mm: don't TestClearPageError in __filemap_fdatawait_range
> mm: clean up error handling in write_one_page
> fs: always sync metadata in __generic_file_fsync
> lib: add errseq_t type and infrastructure for handling it
> fs: new infrastructure for writeback error handling and reporting
> mm: tracepoints for writeback error events
> mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
> Documentation: flesh out the section in vfs.txt on storing and
> reporting writeback errors
> dax: set errors in mapping when writeback fails
> block: convert to errseq_t based writeback error tracking
> ext4: use errseq_t based error handling for reporting data writeback
> errors
> fs: add f_md_wb_err field to struct file for tracking metadata errors
> ext4: add more robust reporting of metadata writeback errors
> ext2: convert to errseq_t based writeback error tracking
> xfs: minimal conversion to errseq_t writeback error reporting
> btrfs: minimal conversion to errseq_t writeback error reporting on
> fsync
>
> Documentation/filesystems/vfs.txt | 43 +++++++-
> drivers/dax/device.c | 1 +
> fs/block_dev.c | 9 +-
> fs/btrfs/file.c | 7 +-
> fs/buffer.c | 20 ++--
> fs/dax.c | 4 +-
> fs/ext2/dir.c | 8 ++
> fs/ext2/file.c | 26 ++++-
> fs/ext4/dir.c | 8 +-
> fs/ext4/file.c | 5 +-
> fs/ext4/fsync.c | 28 ++++-
> fs/file_table.c | 1 +
> fs/gfs2/lops.c | 2 +-
> fs/jbd2/commit.c | 15 +--
> fs/libfs.c | 12 +--
> fs/open.c | 3 +
> fs/sync.c | 2 +-
> fs/xfs/xfs_file.c | 15 ++-
> include/linux/buffer_head.h | 1 +
> include/linux/errseq.h | 19 ++++
> include/linux/fs.h | 67 ++++++++++--
> include/linux/pagemap.h | 31 ++++--
> include/trace/events/filemap.h | 52 ++++++++++
> ipc/shm.c | 2 +-
> lib/Makefile | 2 +-
> lib/errseq.c | 208 ++++++++++++++++++++++++++++++++++++++
> mm/filemap.c | 113 +++++++++++++++++----
> mm/page-writeback.c | 15 ++-
> 28 files changed, 628 insertions(+), 91 deletions(-)
> create mode 100644 include/linux/errseq.h
> create mode 100644 lib/errseq.c
>
If there are no major objections to this set, I'd like to have
linux-next start picking it up to get some wider testing. What's the
right vehicle for this, given that it touches stuff all over the tree?
I can see 3 potential options:
1) I could just pull these into the branch that Stephen is already
picking up for file-locks in my tree
2) I could put them into a new branch, and have Stephen pull that one in
addition to the file-locks branch
3) It could go in via someone else's tree entirely (Andrew or Al's
maybe?)
I'm fine with any of these. Anyone have thoughts?
Thanks,
--
Jeff Layton <jlayton@redhat.com>
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-06-19 16:23 UTC|newest]
Thread overview: 92+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-06-16 19:34 [PATCH v7 00/22] fs: enhanced writeback error reporting with errseq_t (pile #1) Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 01/22] fs: remove call_fsync helper function Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-20 12:31 ` Christoph Hellwig
2017-06-20 12:31 ` Christoph Hellwig
2017-06-20 15:33 ` Jan Kara
2017-06-20 15:33 ` Jan Kara
2017-06-26 8:05 ` Carlos Maiolino
2017-06-26 8:05 ` Carlos Maiolino
2017-06-16 19:34 ` [PATCH v7 02/22] buffer: use mapping_set_error instead of setting the flag Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 03/22] fs: check for writeback errors after syncing out buffers in generic_file_fsync Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 04/22] buffer: set errors in mapping at the time that the error occurs Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-26 8:19 ` Carlos Maiolino
2017-06-26 8:19 ` Carlos Maiolino
2017-06-16 19:34 ` [PATCH v7 05/22] jbd2: don't clear and reset errors after waiting on writeback Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-20 15:32 ` Jan Kara
2017-06-20 15:32 ` Jan Kara
2017-06-26 8:23 ` Carlos Maiolino
2017-06-26 8:23 ` Carlos Maiolino
2017-06-16 19:34 ` [PATCH v7 06/22] mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 07/22] mm: don't TestClearPageError in __filemap_fdatawait_range Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 08/22] mm: clean up error handling in write_one_page Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 09/22] fs: always sync metadata in __generic_file_fsync Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 10/22] lib: add errseq_t type and infrastructure for handling it Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 11/22] fs: new infrastructure for writeback error handling and reporting Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-20 12:34 ` Christoph Hellwig
2017-06-20 12:34 ` Christoph Hellwig
2017-06-20 12:56 ` Jeff Layton
2017-06-20 12:56 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 12/22] mm: tracepoints for writeback error events Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 13/22] mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 14/22] Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 15/22] dax: set errors in mapping when writeback fails Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-17 12:39 ` Jeff Layton
2017-06-17 12:39 ` Jeff Layton
2017-06-19 17:48 ` Ross Zwisler
2017-06-19 17:48 ` Ross Zwisler
2017-06-16 19:34 ` [PATCH v7 16/22] block: convert to errseq_t based writeback error tracking Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-20 12:35 ` Christoph Hellwig
2017-06-20 12:35 ` Christoph Hellwig
2017-06-20 17:44 ` Jeff Layton
2017-06-20 17:44 ` Jeff Layton
2017-06-24 11:59 ` Christoph Hellwig
2017-06-24 11:59 ` Christoph Hellwig
2017-06-24 13:16 ` Jeff Layton
2017-06-24 13:16 ` Jeff Layton
2017-06-26 14:34 ` Jeff Layton
2017-06-26 14:34 ` Jeff Layton
2017-06-27 15:20 ` Christoph Hellwig
2017-06-27 15:20 ` Christoph Hellwig
2017-06-16 19:34 ` [PATCH v7 17/22] ext4: use errseq_t based error handling for reporting data writeback errors Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 18/22] fs: add f_md_wb_err field to struct file for tracking metadata errors Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 19/22] ext4: add more robust reporting of metadata writeback errors Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 20/22] ext2: convert to errseq_t based writeback error tracking Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-16 19:34 ` [PATCH v7 21/22] xfs: minimal conversion to errseq_t writeback error reporting Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-26 13:40 ` Carlos Maiolino
2017-06-26 13:40 ` Carlos Maiolino
2017-06-26 15:22 ` Darrick J. Wong
2017-06-26 15:22 ` Darrick J. Wong
2017-06-26 17:58 ` jlayton
2017-06-26 17:58 ` jlayton
2017-06-26 18:10 ` Darrick J. Wong
2017-06-26 18:10 ` Darrick J. Wong
2017-06-16 19:34 ` [PATCH v7 22/22] btrfs: minimal conversion to errseq_t writeback error reporting on fsync Jeff Layton
2017-06-16 19:34 ` Jeff Layton
2017-06-19 16:23 ` Jeff Layton [this message]
2017-06-19 16:23 ` [PATCH v7 00/22] fs: enhanced writeback error reporting with errseq_t (pile #1) Jeff Layton
2017-06-19 23:25 ` Stephen Rothwell
2017-06-19 23:25 ` Stephen Rothwell
2017-06-20 10:16 ` Jeff Layton
2017-06-20 10:16 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1497889426.4654.7.camel@redhat.com \
--to=jlayton@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=axboe@kernel.dk \
--cc=clm@fb.com \
--cc=cmaiolino@redhat.com \
--cc=corbet@lwn.net \
--cc=darrick.wong@oracle.com \
--cc=dhowells@redhat.com \
--cc=dsterba@suse.com \
--cc=eguan@redhat.com \
--cc=hch@infradead.org \
--cc=jack@suse.cz \
--cc=jbacik@fb.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-btrfs@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-xfs@vger.kernel.org \
--cc=mawilcox@microsoft.com \
--cc=ross.zwisler@linux.intel.com \
--cc=sfr@canb.auug.org.au \
--cc=tytso@mit.edu \
--cc=viro@ZenIV.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.