Re: [PATCH 2/2] xfs: xfs_iflush_abort() can be called twice on cluster writeback failure

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Brian Foster <bfoster@redhat.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 2/2] xfs: xfs_iflush_abort() can be called twice on cluster writeback failure
Date: Mon, 18 Jun 2018 09:24:20 -0400	[thread overview]
Message-ID: <20180618132419.GC28320@bfoster> (raw)
In-Reply-To: <20180618055711.23391-3-david@fromorbit.com>

On Mon, Jun 18, 2018 at 03:57:11PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When a corrupt inode is detected during xfs_iflush_cluster, we can
> get a shutdown ASSERT failure like this:
> 
> XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork
> XFS (pmem1): Unmount and run xfs_repair
> XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c.  Return address = ffffffff814f4116
> XFS (pmem1): Corruption of in-memory data detected.  Shutting down filesystem
> XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8a88
> XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8ef9
> XFS (pmem1): Please umount the filesystem and rectify the problem(s)
> XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258
> .....
> Call Trace:
>  xfs_iflush_abort+0x10a/0x110
>  xfs_iflush+0xf3/0x390
>  xfs_inode_item_push+0x126/0x1e0
>  xfsaild+0x2c5/0x890
>  kthread+0x11c/0x140
>  ret_from_fork+0x24/0x30
> 
> Essentially, xfs_iflush_abort() has been called twice on the
> original inode that that was flushed. This happens because the
> inode has been flushed to teh buffer successfully via
> xfs_iflush_int(), and so when another inode is detected as corrupt
> in xfs_iflush_cluster, the buffer is marked stale and EIO, and
> iodone callbacks are run on it.
> 
> Running the iodone callbacks walks across the original inode and
> calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns
> to xfs_iflush(), it runs the error path for that function, and that
> calls xfs_iflush_abort() on the inode a second time, leading to the
> above assert failure as the inode is not flush locked anymore.
> 
> This bug has been there a long time.
> 
> The simple fix would be to just avoid calling xfs_iflush_abort() in
> xfs_iflush() if we've got a failure from xfs_iflush_cluster().
> However, xfs_iflush_cluster() has magic delwri buffer handling that
> means it may or may not have run IO completion on the buffer, and
> hence sometimes we have to call xfs_iflush_abort() from
> xfs_iflush(), and sometimes we shouldn't.
> 
> After reading through all the error paths and the delwri buffer
> code, it's clear that the error handling in xfs_iflush_cluster() is
> unnecessary. If the buffer is delwri, it leaves it on the delwri
> list so that when the delwri list is submitted it sees a shutdown
> fliesystem in xfs_buf_submit() and that marks the buffer stale, EIO
> and runs IO completion. i.e. exactly what xfs+iflush_cluster() does
> when it's not a delwri buffer. Further, marking a buffer stale
> clears the _XBF_DELWRI_Q flag on the buffer, which means when
> submission of the buffer occurs, it just skips over it and releases
> it.
> 
> IOWs, the error handling in xfs_iflush_cluster doesn't need to care
> if the buffer is already on a the delwri queue or not - it just
> needs to mark the buffer stale, EIO and run completions. That means
> we can just use the easy fix for xfs_iflush() to avoid the double
> abort.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---

Looks good to me:

Reviewed-by: Brian Foster <bfoster@redhat.com>

>  fs/xfs/xfs_inode.c | 57 +++++++++++++++++-----------------------------
>  1 file changed, 21 insertions(+), 36 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 4a2e5e13c569..4a9b90cfb8c3 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -3236,7 +3236,6 @@ xfs_iflush_cluster(
>  	struct xfs_inode	*cip;
>  	int			nr_found;
>  	int			clcount = 0;
> -	int			bufwasdelwri;
>  	int			i;
>  
>  	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
> @@ -3360,37 +3359,22 @@ xfs_iflush_cluster(
>  	 * inode buffer and shut down the filesystem.
>  	 */
>  	rcu_read_unlock();
> -	/*
> -	 * Clean up the buffer.  If it was delwri, just release it --
> -	 * brelse can handle it with no problems.  If not, shut down the
> -	 * filesystem before releasing the buffer.
> -	 */
> -	bufwasdelwri = (bp->b_flags & _XBF_DELWRI_Q);
> -	if (bufwasdelwri)
> -		xfs_buf_relse(bp);
> -
>  	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
>  
> -	if (!bufwasdelwri) {
> -		/*
> -		 * Just like incore_relse: if we have b_iodone functions,
> -		 * mark the buffer as an error and call them.  Otherwise
> -		 * mark it as stale and brelse.
> -		 */
> -		if (bp->b_iodone) {
> -			bp->b_flags &= ~XBF_DONE;
> -			xfs_buf_stale(bp);
> -			xfs_buf_ioerror(bp, -EIO);
> -			xfs_buf_ioend(bp);
> -		} else {
> -			xfs_buf_stale(bp);
> -			xfs_buf_relse(bp);
> -		}
> -	}
> -
>  	/*
> -	 * Unlocks the flush lock
> +	 * We'll always have an inode attached to the buffer for completion
> +	 * process by the time we are called from xfs_iflush(). Hence we have
> +	 * always need to do IO completion processing to abort the inodes
> +	 * attached to the buffer.  handle them just like the shutdown case in
> +	 * xfs_buf_submit().
>  	 */
> +	ASSERT(bp->b_iodone);
> +	bp->b_flags &= ~XBF_DONE;
> +	xfs_buf_stale(bp);
> +	xfs_buf_ioerror(bp, -EIO);
> +	xfs_buf_ioend(bp);
> +
> +	/* abort the corrupt inode, as it was not attached to the buffer */
>  	xfs_iflush_abort(cip, false);
>  	kmem_free(cilist);
>  	xfs_perag_put(pag);
> @@ -3486,12 +3470,17 @@ xfs_iflush(
>  		xfs_log_force(mp, 0);
>  
>  	/*
> -	 * inode clustering:
> -	 * see if other inodes can be gathered into this write
> +	 * inode clustering: try to gather other inodes into this write
> +	 *
> +	 * Note: Any error during clustering will result in the filesystem
> +	 * being shut down and completion callbacks run on the cluster buffer.
> +	 * As we have already flushed and attached this inode to the buffer,
> +	 * it has already been aborted and released by xfs_iflush_cluster() and
> +	 * so we have no further error handling to do here.
>  	 */
>  	error = xfs_iflush_cluster(ip, bp);
>  	if (error)
> -		goto cluster_corrupt_out;
> +		return error;
>  
>  	*bpp = bp;
>  	return 0;
> @@ -3500,12 +3489,8 @@ xfs_iflush(
>  	if (bp)
>  		xfs_buf_relse(bp);
>  	xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
> -cluster_corrupt_out:
> -	error = -EFSCORRUPTED;
>  abort_out:
> -	/*
> -	 * Unlocks the flush lock
> -	 */
> +	/* abort the corrupt inode, as it was not attached to the buffer */
>  	xfs_iflush_abort(ip, false);
>  	return error;
>  }
> -- 
> 2.17.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

next prev parent reply	other threads:[~2018-06-18 13:24 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-06-18  5:57 [PATCH 0/2] xfs: symlink and inode writeback issues Dave Chinner
2018-06-18  5:57 ` [PATCH 1/2] xfs: zero length symlinks are not valid Dave Chinner
2018-06-18 13:24   ` Brian Foster
2018-06-18 22:42     ` Dave Chinner
2018-06-19 11:54       ` Brian Foster
2018-06-19 15:48         ` Darrick J. Wong
2018-06-19 16:28           ` Brian Foster
2018-06-19 23:22             ` Dave Chinner
2018-06-20 11:50               ` Brian Foster
2018-06-20 22:59                 ` Dave Chinner
2018-06-21 11:46                   ` Brian Foster
2018-06-21 22:31                     ` Dave Chinner
2018-06-21 22:53                       ` Darrick J. Wong
2018-06-22 10:44                         ` Brian Foster
2018-06-23 17:38                           ` Darrick J. Wong
2018-06-18  5:57 ` [PATCH 2/2] xfs: xfs_iflush_abort() can be called twice on cluster writeback failure Dave Chinner
2018-06-18 13:24   ` Brian Foster [this message]
2018-06-19  5:05   ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180618132419.GC28320@bfoster \
    --to=bfoster@redhat.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.