All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 01/30] xfs: Don't allow logging of XFS_ISTALE inodes
Date: Mon, 1 Jun 2020 21:30:52 -0700	[thread overview]
Message-ID: <20200602043052.GZ8230@magnolia> (raw)
In-Reply-To: <20200601214251.4167140-2-david@fromorbit.com>

On Tue, Jun 02, 2020 at 07:42:22AM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In tracking down a problem in this patchset, I discovered we are
> reclaiming dirty stale inodes. This wasn't discovered until inodes
> were always attached to the cluster buffer and then the rcu callback
> that freed inodes was assert failing because the inode still had an
> active pointer to the cluster buffer after it had been reclaimed.
> 
> Debugging the issue indicated that this was a pre-existing issue
> resulting from the way the inodes are handled in xfs_inactive_ifree.
> When we free a cluster buffer from xfs_ifree_cluster, all the inodes
> in cache are marked XFS_ISTALE. Those that are clean have nothing
> else done to them and so eventually get cleaned up by background
> reclaim. i.e. it is assumed we'll never dirty/relog an inode marked
> XFS_ISTALE.
> 
> On journal commit dirty stale inodes as are handled by both
> buffer and inode log items to run though xfs_istale_done() and
> removed from the AIL (buffer log item commit) or the log item will
> simply unpin it because the buffer log item will clean it. What happens
> to any specific inode is entirely dependent on which log item wins
> the commit race, but the result is the same - stale inodes are
> clean, not attached to the cluster buffer, and not in the AIL. Hence
> inode reclaim can just free these inodes without further care.
> 
> However, if the stale inode is relogged, it gets dirtied again and
> relogged into the CIL. Most of the time this isn't an issue, because
> relogging simply changes the inode's location in the current
> checkpoint. Problems arise, however, when the CIL checkpoints
> between two transactions in the xfs_inactive_ifree() deferops
> processing. This results in the XFS_ISTALE inode being redirtied
> and inserted into the CIL without any of the other stale cluster
> buffer infrastructure being in place.
> 
> Hence on journal commit, it simply gets unpinned, so it remains
> dirty in memory. Everything in inode writeback avoids XFS_ISTALE
> inodes so it can't be written back, and it is not tracked in the AIL
> so there's not even a trigger to attempt to clean the inode. Hence
> the inode just sits dirty in memory until inode reclaim comes along,
> sees that it is XFS_ISTALE, and goes to reclaim it. This reclaiming
> of a dirty inode caused use after free, list corruptions and other
> nasty issues later in this patchset.
> 
> Hence this patch addresses a violation of the "never log XFS_ISTALE
> inodes" caused by the deferops processing rolling a transaction
> and relogging a stale inode in xfs_inactive_free. It also adds a
> bunch of asserts to catch this problem in debug kernels so that
> we don't reintroduce this problem in future.
> 
> Reproducer for this issue was generic/558 on a v4 filesystem.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/libxfs/xfs_trans_inode.c |  2 ++
>  fs/xfs/xfs_icache.c             |  3 ++-
>  fs/xfs/xfs_inode.c              | 25 ++++++++++++++++++++++---
>  3 files changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
> index b5dfb66548422..4504d215cd590 100644
> --- a/fs/xfs/libxfs/xfs_trans_inode.c
> +++ b/fs/xfs/libxfs/xfs_trans_inode.c
> @@ -36,6 +36,7 @@ xfs_trans_ijoin(
>  
>  	ASSERT(iip->ili_lock_flags == 0);
>  	iip->ili_lock_flags = lock_flags;
> +	ASSERT(!xfs_iflags_test(ip, XFS_ISTALE));
>  
>  	/*
>  	 * Get a log_item_desc to point at the new item.
> @@ -89,6 +90,7 @@ xfs_trans_log_inode(
>  
>  	ASSERT(ip->i_itemp != NULL);
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
> +	ASSERT(!xfs_iflags_test(ip, XFS_ISTALE));
>  
>  	/*
>  	 * Don't bother with i_lock for the I_DIRTY_TIME check here, as races
> diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> index 0a5ac6f9a5834..dbba4c1946386 100644
> --- a/fs/xfs/xfs_icache.c
> +++ b/fs/xfs/xfs_icache.c
> @@ -1141,7 +1141,7 @@ xfs_reclaim_inode(
>  			goto out_ifunlock;
>  		xfs_iunpin_wait(ip);
>  	}
> -	if (xfs_iflags_test(ip, XFS_ISTALE) || xfs_inode_clean(ip)) {
> +	if (xfs_inode_clean(ip)) {
>  		xfs_ifunlock(ip);
>  		goto reclaim;
>  	}
> @@ -1228,6 +1228,7 @@ xfs_reclaim_inode(
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_qm_dqdetach(ip);
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	ASSERT(xfs_inode_clean(ip));
>  
>  	__xfs_inode_free(ip);
>  	return error;
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 64f5f9a440aed..53a1d64782c35 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1740,10 +1740,31 @@ xfs_inactive_ifree(
>  		return error;
>  	}
>  
> +	/*
> +	 * We do not hold the inode locked across the entire rolling transaction
> +	 * here. We only need to hold it for the first transaction that
> +	 * xfs_ifree() builds, which may mark the inode XFS_ISTALE if the
> +	 * underlying cluster buffer is freed. Relogging an XFS_ISTALE inode
> +	 * here breaks the relationship between cluster buffer invalidation and
> +	 * stale inode invalidation on cluster buffer item journal commit
> +	 * completion, and can result in leaving dirty stale inodes hanging
> +	 * around in memory.
> +	 *
> +	 * We have no need for serialising this inode operation against other
> +	 * operations - we freed the inode and hence reallocation is required
> +	 * and that will serialise on reallocating the space the deferops need
> +	 * to free. Hence we can unlock the inode on the first commit of
> +	 * the transaction rather than roll it right through the deferops. This
> +	 * avoids relogging the XFS_ISTALE inode.

Hmm.  What defer ops causes a transaction roll?  Is it the EFI that
frees the inode cluster blocks?

> +	 *
> +	 * We check that xfs_ifree() hasn't grown an internal transaction roll
> +	 * by asserting that the inode is still locked when it returns.
> +	 */
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> -	xfs_trans_ijoin(tp, ip, 0);
> +	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);

This looks right to me since we should be marking the inode free in the
first transaction and therefore should not keep it attached to the
transaction...

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>

--D


>  	error = xfs_ifree(tp, ip);
> +	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
>  	if (error) {
>  		/*
>  		 * If we fail to free the inode, shut down.  The cancel
> @@ -1756,7 +1777,6 @@ xfs_inactive_ifree(
>  			xfs_force_shutdown(mp, SHUTDOWN_META_IO_ERROR);
>  		}
>  		xfs_trans_cancel(tp);
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  		return error;
>  	}
>  
> @@ -1774,7 +1794,6 @@ xfs_inactive_ifree(
>  		xfs_notice(mp, "%s: xfs_trans_commit returned error %d",
>  			__func__, error);
>  
> -	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	return 0;
>  }
>  
> -- 
> 2.26.2.761.g0e0b3e54be
> 

  reply	other threads:[~2020-06-02  4:30 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-01 21:42 [PATCH 00/30] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-06-01 21:42 ` [PATCH 01/30] xfs: Don't allow logging of XFS_ISTALE inodes Dave Chinner
2020-06-02  4:30   ` Darrick J. Wong [this message]
2020-06-02  7:06     ` Dave Chinner
2020-06-02 16:32   ` Brian Foster
2020-06-01 21:42 ` [PATCH 02/30] xfs: remove logged flag from inode log item Dave Chinner
2020-06-02 16:32   ` Brian Foster
2020-06-01 21:42 ` [PATCH 03/30] xfs: add an inode item lock Dave Chinner
2020-06-02 16:34   ` Brian Foster
2020-06-04  1:54     ` Dave Chinner
2020-06-04 14:03       ` Brian Foster
2020-06-01 21:42 ` [PATCH 04/30] xfs: mark inode buffers in cache Dave Chinner
2020-06-02 16:45   ` Brian Foster
2020-06-02 19:22     ` Darrick J. Wong
2020-06-02 21:29     ` Dave Chinner
2020-06-03 14:57       ` Brian Foster
2020-06-03 21:21         ` Dave Chinner
2020-06-01 21:42 ` [PATCH 05/30] xfs: mark dquot " Dave Chinner
2020-06-02 16:45   ` Brian Foster
2020-06-02 19:00   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 06/30] xfs: mark log recovery buffers for completion Dave Chinner
2020-06-02 16:45   ` Brian Foster
2020-06-02 19:24   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 07/30] xfs: call xfs_buf_iodone directly Dave Chinner
2020-06-02 16:47   ` Brian Foster
2020-06-02 21:38     ` Dave Chinner
2020-06-03 14:58       ` Brian Foster
2020-06-01 21:42 ` [PATCH 08/30] xfs: clean up whacky buffer log item list reinit Dave Chinner
2020-06-02 16:47   ` Brian Foster
2020-06-01 21:42 ` [PATCH 09/30] xfs: make inode IO completion buffer centric Dave Chinner
2020-06-03 14:58   ` Brian Foster
2020-06-01 21:42 ` [PATCH 10/30] xfs: use direct calls for dquot IO completion Dave Chinner
2020-06-02 19:25   ` Darrick J. Wong
2020-06-03 14:58   ` Brian Foster
2020-06-01 21:42 ` [PATCH 11/30] xfs: clean up the buffer iodone callback functions Dave Chinner
2020-06-03 14:58   ` Brian Foster
2020-06-01 21:42 ` [PATCH 12/30] xfs: get rid of log item callbacks Dave Chinner
2020-06-03 14:58   ` Brian Foster
2020-06-01 21:42 ` [PATCH 13/30] xfs: handle buffer log item IO errors directly Dave Chinner
2020-06-02 20:39   ` Darrick J. Wong
2020-06-02 22:17     ` Dave Chinner
2020-06-03 15:02   ` Brian Foster
2020-06-03 21:34     ` Dave Chinner
2020-06-01 21:42 ` [PATCH 14/30] xfs: unwind log item error flagging Dave Chinner
2020-06-02 20:45   ` Darrick J. Wong
2020-06-03 15:02   ` Brian Foster
2020-06-01 21:42 ` [PATCH 15/30] xfs: move xfs_clear_li_failed out of xfs_ail_delete_one() Dave Chinner
2020-06-02 20:47   ` Darrick J. Wong
2020-06-03 15:02   ` Brian Foster
2020-06-01 21:42 ` [PATCH 16/30] xfs: pin inode backing buffer to the inode log item Dave Chinner
2020-06-02 22:30   ` Darrick J. Wong
2020-06-02 22:53     ` Dave Chinner
2020-06-03 18:58   ` Brian Foster
2020-06-03 22:15     ` Dave Chinner
2020-06-04 14:03       ` Brian Foster
2020-06-01 21:42 ` [PATCH 17/30] xfs: make inode reclaim almost non-blocking Dave Chinner
2020-06-01 21:42 ` [PATCH 18/30] xfs: remove IO submission from xfs_reclaim_inode() Dave Chinner
2020-06-02 22:36   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 19/30] xfs: allow multiple reclaimers per AG Dave Chinner
2020-06-01 21:42 ` [PATCH 20/30] xfs: don't block inode reclaim on the ILOCK Dave Chinner
2020-06-01 21:42 ` [PATCH 21/30] xfs: remove SYNC_TRYLOCK from inode reclaim Dave Chinner
2020-06-01 21:42 ` [PATCH 22/30] xfs: remove SYNC_WAIT from xfs_reclaim_inodes() Dave Chinner
2020-06-02 22:43   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 23/30] xfs: clean up inode reclaim comments Dave Chinner
2020-06-02 22:45   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 24/30] xfs: rework stale inodes in xfs_ifree_cluster Dave Chinner
2020-06-02 23:01   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 25/30] xfs: attach inodes to the cluster buffer when dirtied Dave Chinner
2020-06-02 23:03   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 26/30] xfs: xfs_iflush() is no longer necessary Dave Chinner
2020-06-01 21:42 ` [PATCH 27/30] xfs: rename xfs_iflush_int() Dave Chinner
2020-06-01 21:42 ` [PATCH 28/30] xfs: rework xfs_iflush_cluster() dirty inode iteration Dave Chinner
2020-06-02 23:23   ` Darrick J. Wong
2020-06-01 21:42 ` [PATCH 29/30] xfs: factor xfs_iflush_done Dave Chinner
2020-06-01 21:42 ` [PATCH 30/30] xfs: remove xfs_inobp_check() Dave Chinner
  -- strict thread matches above, loose matches on Subject: below --
2020-06-04  7:45 [PATCH 00/30] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-06-04  7:45 ` [PATCH 01/30] xfs: Don't allow logging of XFS_ISTALE inodes Dave Chinner
2020-06-22  8:15 [PATCH 00/30] xfs: rework inode flushing to make inode reclaim fully asynchronous Dave Chinner
2020-06-22  8:15 ` [PATCH 01/30] xfs: Don't allow logging of XFS_ISTALE inodes Dave Chinner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200602043052.GZ8230@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.