From: <gregkh@linuxfoundation.org>
To: hch@lst.de, bfoster@redhat.com, david@fromorbit.com,
	dchinner@redhat.com, gregkh@linuxfoundation.org,
	zlang@redhat.com
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "xfs: fix unbalanced inode reclaim flush locking" has been added to the 4.9-stable tree
Date: Tue, 10 Jan 2017 11:33:05 +0100
Message-ID: <1484044385204111@kroah.com>
In-Reply-To: <1483976343-661-8-git-send-email-hch@lst.de>


This is a note to let you know that I've just added the patch titled

    xfs: fix unbalanced inode reclaim flush locking

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     xfs-fix-unbalanced-inode-reclaim-flush-locking.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.


From hch@lst.de  Tue Jan 10 11:23:57 2017
From: Christoph Hellwig <hch@lst.de>
Date: Mon,  9 Jan 2017 16:38:38 +0100
Subject: xfs: fix unbalanced inode reclaim flush locking
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, Brian Foster <bfoster@redhat.com>, Dave Chinner <david@fromorbit.com>
Message-ID: <1483976343-661-8-git-send-email-hch@lst.de>


From: Brian Foster <bfoster@redhat.com>

commit 98efe8af1c9ffac47e842b7a75ded903e2f028da upstream.

Filesystem shutdown testing on an older distro kernel has uncovered an
imbalanced locking pattern for the inode flush lock in
xfs_reclaim_inode(). Specifically, there is a double unlock sequence
between the call to xfs_iflush_abort() and xfs_reclaim_inode() at the
"reclaim:" label.

This actually does not cause obvious problems on current kernels due to
the current flush lock implementation. Older kernels use a counting
based flush lock mechanism, however, which effectively breaks the lock
indefinitely when an already unlocked flush lock is repeatedly unlocked.
Though this only currently occurs on filesystem shutdown, it has
reproduced the effect of elevating an fs shutdown to a system-wide crash
or hang.
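
To illustrate why a double unlock is harmless with a flag style lock but
breaks a counting style lock, here is a minimal userspace sketch. It is a
toy model only, not the kernel implementation: the struct and function
names (counting_lock, flag_lock, *_holders_after_double_unlock) are
hypothetical, and blocking is replaced by trylock so the effect can be
observed in a single thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counting lock: count > 0 means the lock is available. */
struct counting_lock {
	int count;
};

bool counting_trylock(struct counting_lock *l)
{
	if (l->count <= 0)
		return false;
	l->count--;
	return true;
}

void counting_unlock(struct counting_lock *l)
{
	l->count++;		/* every unlock adds one more "permit" */
}

/* Hypothetical flag lock: a single bit, so unlocking twice is idempotent. */
struct flag_lock {
	bool locked;
};

bool flag_trylock(struct flag_lock *l)
{
	if (l->locked)
		return false;
	l->locked = true;
	return true;
}

void flag_unlock(struct flag_lock *l)
{
	l->locked = false;
}

/* How many of two back-to-back lock attempts succeed after a double unlock? */
int counting_holders_after_double_unlock(void)
{
	struct counting_lock l = { .count = 1 };
	int holders = 0;

	counting_trylock(&l);
	counting_unlock(&l);
	counting_unlock(&l);	/* unbalanced second unlock */
	holders += counting_trylock(&l);
	holders += counting_trylock(&l);
	return holders;
}

int flag_holders_after_double_unlock(void)
{
	struct flag_lock l = { .locked = false };
	int holders = 0;

	flag_trylock(&l);
	flag_unlock(&l);
	flag_unlock(&l);	/* unbalanced second unlock, no lasting effect */
	holders += flag_trylock(&l);
	holders += flag_trylock(&l);
	return holders;
}
```

In this model the counting lock admits two holders at once after the
stray unlock, so it no longer provides exclusion, while the flag lock
still admits only one.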

As it turns out, the flush lock is not actually required for the reclaim
logic in xfs_reclaim_inode() because by that time we have already cycled
the flush lock once while holding ILOCK_EXCL. Therefore, remove the
additional flush lock/unlock cycle around the 'reclaim:' label and
update branches into this label to release the flush lock where
appropriate. Add an assert to xfs_ifunlock() to help prevent future
occurrences of the same problem.
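
The rule described above (every branch into the reclaim label drops the
flush lock exactly once, and the unlock helper checks that the lock is
actually held) can be sketched in userspace as follows. This is a toy
model under stated assumptions, not the XFS code: all toy_* names are
hypothetical, the kernel ASSERT() is replaced by a violation counter so
the check can be observed, and the flushed path simply assumes the lock
has been released by the time the flush completes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode carrying a flag style flush lock. */
struct toy_inode {
	bool flush_locked;
	int unlock_violations;	/* counts unlocks of an unlocked lock */
};

/* Checked unlock: records the violation an ASSERT would have caught. */
void toy_ifunlock(struct toy_inode *ip)
{
	if (!ip->flush_locked)
		ip->unlock_violations++;
	ip->flush_locked = false;
}

/* Stand-in for an abort helper that drops the flush lock itself. */
void toy_iflush_abort(struct toy_inode *ip)
{
	toy_ifunlock(ip);
}

enum toy_path { TOY_SHUTDOWN, TOY_CLEAN_OR_STALE, TOY_FLUSHED };

/*
 * Walk one branch into the "reclaim" step. extra_unlock_at_label is a
 * hypothetical knob that adds one more unlock at the label on top of the
 * per-branch unlock. Returns the number of unbalanced unlocks seen.
 */
int toy_reclaim(enum toy_path path, bool extra_unlock_at_label)
{
	struct toy_inode ip = { .flush_locked = true, .unlock_violations = 0 };

	switch (path) {
	case TOY_SHUTDOWN:
		toy_iflush_abort(&ip);	/* abort drops the flush lock */
		break;
	case TOY_CLEAN_OR_STALE:
		toy_ifunlock(&ip);	/* branch releases the lock itself */
		break;
	case TOY_FLUSHED:
		toy_ifunlock(&ip);	/* assumed: released on flush completion */
		break;
	}

	/* reclaim: the lock must not be held here */
	if (extra_unlock_at_label)
		toy_ifunlock(&ip);

	return ip.unlock_violations;
}
```

With the extra unlock enabled, the shutdown branch reports one violation,
which is the double unlock pattern; with it disabled, every branch
reaches the label with the lock released and no violation recorded.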

Reported-by: Zorro Lang <zlang@redhat.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 fs/xfs/xfs_icache.c |   27 ++++++++++++++-------------
 fs/xfs/xfs_inode.h  |   11 ++++++-----
 2 files changed, 20 insertions(+), 18 deletions(-)

--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -123,7 +123,6 @@ __xfs_inode_free(
 {
 	/* asserts to verify all state is correct here */
 	ASSERT(atomic_read(&ip->i_pincount) == 0);
-	ASSERT(!xfs_isiflocked(ip));
 	XFS_STATS_DEC(ip->i_mount, vn_active);
 
 	call_rcu(&VFS_I(ip)->i_rcu, xfs_inode_free_callback);
@@ -133,6 +132,8 @@ void
 xfs_inode_free(
 	struct xfs_inode	*ip)
 {
+	ASSERT(!xfs_isiflocked(ip));
+
 	/*
 	 * Because we use RCU freeing we need to ensure the inode always
 	 * appears to be reclaimed with an invalid inode number when in the
@@ -981,6 +982,7 @@ restart:
 
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
 		xfs_iunpin_wait(ip);
+		/* xfs_iflush_abort() drops the flush lock */
 		xfs_iflush_abort(ip, false);
 		goto reclaim;
 	}
@@ -989,10 +991,10 @@ restart:
 			goto out_ifunlock;
 		xfs_iunpin_wait(ip);
 	}
-	if (xfs_iflags_test(ip, XFS_ISTALE))
-		goto reclaim;
-	if (xfs_inode_clean(ip))
+	if (xfs_iflags_test(ip, XFS_ISTALE) || xfs_inode_clean(ip)) {
+		xfs_ifunlock(ip);
 		goto reclaim;
+	}
 
 	/*
 	 * Never flush out dirty data during non-blocking reclaim, as it would
@@ -1030,25 +1032,24 @@ restart:
 		xfs_buf_relse(bp);
 	}
 
-	xfs_iflock(ip);
 reclaim:
+	ASSERT(!xfs_isiflocked(ip));
+
 	/*
 	 * Because we use RCU freeing we need to ensure the inode always appears
 	 * to be reclaimed with an invalid inode number when in the free state.
-	 * We do this as early as possible under the ILOCK and flush lock so
-	 * that xfs_iflush_cluster() can be guaranteed to detect races with us
-	 * here. By doing this, we guarantee that once xfs_iflush_cluster has
-	 * locked both the XFS_ILOCK and the flush lock that it will see either
-	 * a valid, flushable inode that will serialise correctly against the
-	 * locks below, or it will see a clean (and invalid) inode that it can
-	 * skip.
+	 * We do this as early as possible under the ILOCK so that
+	 * xfs_iflush_cluster() can be guaranteed to detect races with us here.
+	 * By doing this, we guarantee that once xfs_iflush_cluster has locked
+	 * XFS_ILOCK that it will see either a valid, flushable inode that will
+	 * serialise correctly, or it will see a clean (and invalid) inode that
+	 * it can skip.
 	 */
 	spin_lock(&ip->i_flags_lock);
 	ip->i_flags = XFS_IRECLAIM;
 	ip->i_ino = 0;
 	spin_unlock(&ip->i_flags_lock);
 
-	xfs_ifunlock(ip);
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
 	XFS_STATS_INC(ip->i_mount, xs_ig_reclaims);
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -246,6 +246,11 @@ static inline bool xfs_is_reflink_inode(
  * Synchronize processes attempting to flush the in-core inode back to disk.
  */
 
+static inline int xfs_isiflocked(struct xfs_inode *ip)
+{
+	return xfs_iflags_test(ip, XFS_IFLOCK);
+}
+
 extern void __xfs_iflock(struct xfs_inode *ip);
 
 static inline int xfs_iflock_nowait(struct xfs_inode *ip)
@@ -261,16 +266,12 @@ static inline void xfs_iflock(struct xfs
 
 static inline void xfs_ifunlock(struct xfs_inode *ip)
 {
+	ASSERT(xfs_isiflocked(ip));
 	xfs_iflags_clear(ip, XFS_IFLOCK);
 	smp_mb();
 	wake_up_bit(&ip->i_flags, __XFS_IFLOCK_BIT);
 }
 
-static inline int xfs_isiflocked(struct xfs_inode *ip)
-{
-	return xfs_iflags_test(ip, XFS_IFLOCK);
-}
-
 /*
  * Flags for inode locking.
  * Bit ranges:	1<<1  - 1<<16-1 -- iolock/ilock modes (bitfield)


Patches currently in stable-queue which might be from hch@lst.de are

queue-4.9/xfs-always-succeed-when-deduping-zero-bytes.patch
queue-4.9/xfs-fix-crash-and-data-corruption-due-to-removal-of-busy-cow-extents.patch
queue-4.9/xfs-don-t-allow-di_size-with-high-bit-set.patch
queue-4.9/xfs-new-inode-extent-list-lookup-helpers.patch
queue-4.9/xfs-don-t-call-xfs_sb_quota_from_disk-twice.patch
queue-4.9/xfs-factor-rmap-btree-size-into-the-indlen-calculations.patch
queue-4.9/xfs-check-return-value-of-_trans_reserve_quota_nblks.patch
queue-4.9/xfs-complain-if-we-don-t-get-nextents-bmap-records.patch
queue-4.9/xfs-check-for-bogus-values-in-btree-block-headers.patch
queue-4.9/xfs-use-gpf_nofs-when-allocating-btree-cursors.patch
queue-4.9/xfs-fix-max_retries-_show-and-_store-functions.patch
queue-4.9/xfs-fix-double-cleanup-when-cui-recovery-fails.patch
queue-4.9/xfs-don-t-skip-cow-forks-w-delalloc-blocks-in-cowblocks-scan.patch
queue-4.9/xfs-track-preallocation-separately-in-xfs_bmapi_reserve_delalloc.patch
queue-4.9/xfs-use-the-actual-ag-length-when-reserving-blocks.patch
queue-4.9/xfs-ignore-leaf-attr-ichdr.count-in-verifier-during-log-replay.patch
queue-4.9/xfs-pass-post-eof-speculative-prealloc-blocks-to-bmapi.patch
queue-4.9/xfs-don-t-cap-maximum-dedupe-request-length.patch
queue-4.9/xfs-pass-state-not-whichfork-to-trace_xfs_extlist.patch
queue-4.9/xfs-move-agi-buffer-type-setting-to-xfs_read_agi.patch
queue-4.9/xfs-check-minimum-block-size-for-crc-filesystems.patch
queue-4.9/xfs-handle-cow-fork-in-xfs_bmap_trace_exlist.patch
queue-4.9/pci-msi-check-for-null-affinity-mask-in-pci_irq_get_affinity.patch
queue-4.9/xfs-error-out-if-trying-to-add-attrs-and-anextents-0.patch
queue-4.9/xfs-don-t-bug-on-mixed-direct-and-mapped-i-o.patch
queue-4.9/xfs-use-new-extent-lookup-helpers-xfs_file_iomap_begin_delay.patch
queue-4.9/xfs-fix-unbalanced-inode-reclaim-flush-locking.patch
queue-4.9/genirq-affinity-fix-node-generation-from-cpumask.patch
queue-4.9/xfs-use-new-extent-lookup-helpers-in-__xfs_reflink_reserve_cow.patch
queue-4.9/xfs-don-t-crash-if-reading-a-directory-results-in-an-unexpected-hole.patch
queue-4.9/xfs-remove-prev-argument-to-xfs_bmapi_reserve_delalloc.patch
queue-4.9/xfs-clean-up-cow-fork-reservation-and-tag-inodes-correctly.patch
queue-4.9/xfs-forbid-ag-btrees-with-level-0.patch
queue-4.9/xfs-provide-helper-for-counting-extents-from-if_bytes.patch
