From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
Date: Mon, 18 Jun 2018 09:24:20 -0400 [thread overview]
Message-ID: <20180618132419.GC28320@bfoster> (raw)
In-Reply-To: <20180618055711.23391-3-david@fromorbit.com>
On Mon, Jun 18, 2018 at 03:57:11PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When a corrupt inode is detected during xfs_iflush_cluster, we can
> get a shutdown ASSERT failure like this:
>
> XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork
> XFS (pmem1): Unmount and run xfs_repair
> XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c. Return address = ffffffff814f4116
> XFS (pmem1): Corruption of in-memory data detected. Shutting down filesystem
> XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c. Return address = ffffffff814a8a88
> XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c. Return address = ffffffff814a8ef9
> XFS (pmem1): Please umount the filesystem and rectify the problem(s)
> XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258
> .....
> Call Trace:
> xfs_iflush_abort+0x10a/0x110
> xfs_iflush+0xf3/0x390
> xfs_inode_item_push+0x126/0x1e0
> xfsaild+0x2c5/0x890
> kthread+0x11c/0x140
> ret_from_fork+0x24/0x30
>
> Essentially, xfs_iflush_abort() has been called twice on the
> original inode that that was flushed. This happens because the
> inode has been flushed to teh buffer successfully via
> xfs_iflush_int(), and so when another inode is detected as corrupt
> in xfs_iflush_cluster, the buffer is marked stale and EIO, and
> iodone callbacks are run on it.
>
> Running the iodone callbacks walks across the original inode and
> calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns
> to xfs_iflush(), it runs the error path for that function, and that
> calls xfs_iflush_abort() on the inode a second time, leading to the
> above assert failure as the inode is not flush locked anymore.
>
> This bug has been there a long time.
>
> The simple fix would be to just avoid calling xfs_iflush_abort() in
> xfs_iflush() if we've got a failure from xfs_iflush_cluster().
> However, xfs_iflush_cluster() has magic delwri buffer handling that
> means it may or may not have run IO completion on the buffer, and
> hence sometimes we have to call xfs_iflush_abort() from
> xfs_iflush(), and sometimes we shouldn't.
>
> After reading through all the error paths and the delwri buffer
> code, it's clear that the error handling in xfs_iflush_cluster() is
> unnecessary. If the buffer is delwri, it leaves it on the delwri
> list so that when the delwri list is submitted it sees a shutdown
> fliesystem in xfs_buf_submit() and that marks the buffer stale, EIO
> and runs IO completion. i.e. exactly what xfs+iflush_cluster() does
> when it's not a delwri buffer. Further, marking a buffer stale
> clears the _XBF_DELWRI_Q flag on the buffer, which means when
> submission of the buffer occurs, it just skips over it and releases
> it.
>
> IOWs, the error handling in xfs_iflush_cluster doesn't need to care
> if the buffer is already on a the delwri queue or not - it just
> needs to mark the buffer stale, EIO and run completions. That means
> we can just use the easy fix for xfs_iflush() to avoid the double
> abort.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
Looks good to me:
Reviewed-by: Brian Foster <bfoster@redhat.com>
> fs/xfs/xfs_inode.c | 57 +++++++++++++++++-----------------------------
> 1 file changed, 21 insertions(+), 36 deletions(-)
>
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 4a2e5e13c569..4a9b90cfb8c3 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3236,7 +3236,6 @@ xfs_iflush_cluster(
> struct xfs_inode *cip;
> int nr_found;
> int clcount = 0;
> - int bufwasdelwri;
> int i;
>
> pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
> @@ -3360,37 +3359,22 @@ xfs_iflush_cluster(
> * inode buffer and shut down the filesystem.
> */
> rcu_read_unlock();
> - /*
> - * Clean up the buffer. If it was delwri, just release it --
> - * brelse can handle it with no problems. If not, shut down the
> - * filesystem before releasing the buffer.
> - */
> - bufwasdelwri = (bp->b_flags & _XBF_DELWRI_Q);
> - if (bufwasdelwri)
> - xfs_buf_relse(bp);
> -
> xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>
> - if (!bufwasdelwri) {
> - /*
> - * Just like incore_relse: if we have b_iodone functions,
> - * mark the buffer as an error and call them. Otherwise
> - * mark it as stale and brelse.
> - */
> - if (bp->b_iodone) {
> - bp->b_flags &= ~XBF_DONE;
> - xfs_buf_stale(bp);
> - xfs_buf_ioerror(bp, -EIO);
> - xfs_buf_ioend(bp);
> - } else {
> - xfs_buf_stale(bp);
> - xfs_buf_relse(bp);
> - }
> - }
> -
> /*
> - * Unlocks the flush lock
> + * We'll always have an inode attached to the buffer for completion
> + * process by the time we are called from xfs_iflush(). Hence we have
> + * always need to do IO completion processing to abort the inodes
> + * attached to the buffer. handle them just like the shutdown case in
> + * xfs_buf_submit().
> */
> + ASSERT(bp->b_iodone);
> + bp->b_flags &= ~XBF_DONE;
> + xfs_buf_stale(bp);
> + xfs_buf_ioerror(bp, -EIO);
> + xfs_buf_ioend(bp);
> +
> + /* abort the corrupt inode, as it was not attached to the buffer */
> xfs_iflush_abort(cip, false);
> kmem_free(cilist);
> xfs_perag_put(pag);
> @@ -3486,12 +3470,17 @@ xfs_iflush(
> xfs_log_force(mp, 0);
>
> /*
> - * inode clustering:
> - * see if other inodes can be gathered into this write
> + * inode clustering: try to gather other inodes into this write
> + *
> + * Note: Any error during clustering will result in the filesystem
> + * being shut down and completion callbacks run on the cluster buffer.
> + * As we have already flushed and attached this inode to the buffer,
> + * it has already been aborted and released by xfs_iflush_cluster() and
> + * so we have no further error handling to do here.
> */
> error = xfs_iflush_cluster(ip, bp);
> if (error)
> - goto cluster_corrupt_out;
> + return error;
>
> *bpp = bp;
> return 0;
> @@ -3500,12 +3489,8 @@ xfs_iflush(
> if (bp)
> xfs_buf_relse(bp);
> xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> -cluster_corrupt_out:
> - error = -EFSCORRUPTED;
> abort_out:
> - /*
> - * Unlocks the flush lock
> - */
> + /* abort the corrupt inode, as it was not attached to the buffer */
> xfs_iflush_abort(ip, false);
> return error;
> }
> --
> 2.17.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2018-06-18 13:24 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-06-18 5:57 [PATCH 0/2] xfs: symlink and inode writeback issues Dave Chinner
2018-06-18 5:57 ` [PATCH 1/2] xfs: zero length symlinks are not valid Dave Chinner
2018-06-18 13:24 ` Brian Foster
2018-06-18 22:42 ` Dave Chinner
2018-06-19 11:54 ` Brian Foster
2018-06-19 15:48 ` Darrick J. Wong
2018-06-19 16:28 ` Brian Foster
2018-06-19 23:22 ` Dave Chinner
2018-06-20 11:50 ` Brian Foster
2018-06-20 22:59 ` Dave Chinner
2018-06-21 11:46 ` Brian Foster
2018-06-21 22:31 ` Dave Chinner
2018-06-21 22:53 ` Darrick J. Wong
2018-06-22 10:44 ` Brian Foster
2018-06-23 17:38 ` Darrick J. Wong
2018-06-18 5:57 ` [PATCH 2/2] xfs: xfs_iflush_abort() can be called twice on cluster writeback failure Dave Chinner
2018-06-18 13:24 ` Brian Foster [this message]
2018-06-19 5:05 ` Darrick J. Wong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180618132419.GC28320@bfoster \
--to=bfoster@redhat.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).