stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
	Nikolay Borisov <nborisov@suse.com>,
	Libor Pechacek <lpechacek@suse.com>,
	"Darrick J. Wong" <darrick.wong@oracle.com>
Subject: [PATCH 4.9 70/94] xfs: use ->b_state to fix buffer I/O accounting release race
Date: Mon,  5 Jun 2017 18:17:46 +0200	[thread overview]
Message-ID: <20170605153106.104248073@linuxfoundation.org> (raw)
In-Reply-To: <20170605153103.156843111@linuxfoundation.org>

4.9-stable review patch.  If anyone has any objections, please let me know.

------------------

From: Brian Foster <bfoster@redhat.com>

commit 63db7c815bc0997c29e484d2409684fdd9fcd93b upstream.

We've had user reports of unmount hangs in xfs_wait_buftarg() that
analysis shows is due to btp->bt_io_count == -1. bt_io_count
represents the count of in-flight asynchronous buffers and thus
should always be >= 0. xfs_wait_buftarg() waits for this value to
stabilize to zero in order to ensure that all untracked (with
respect to the lru) buffers have completed I/O processing before
unmount proceeds to tear down in-core data structures.

The value of -1 implies an I/O accounting decrement race. Indeed,
the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
(where the buffer lock is no longer held) means that bp->b_flags can
be updated from an unsafe context. While a user-level reproducer is
currently not available, some intrusive hacks to run racing buffer
lookups/ioacct/releases from multiple threads was used to
successfully manufacture this problem.

Existing callers do not expect to acquire the buffer lock from
xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
this context. It turns out that we already have separate buffer
state bits and associated serialization for dealing with buffer LRU
state in the form of ->b_state and ->b_lock. Therefore, replace the
_XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
accounting wrappers appropriately and make sure they are used with
the correct locking. This ensures that buffer in-flight state can be
modified at buffer release time without racing with modifications
from a buffer lock holder.

Fixes: 9c7504aa72b6 ("xfs: track and serialize in-flight async buffers against unmount")
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Tested-by: Libor Pechacek <lpechacek@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

---
 fs/xfs/xfs_buf.c |   38 ++++++++++++++++++++++++++------------
 fs/xfs/xfs_buf.h |    5 ++---
 2 files changed, 28 insertions(+), 15 deletions(-)

--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -96,12 +96,16 @@ static inline void
 xfs_buf_ioacct_inc(
 	struct xfs_buf	*bp)
 {
-	if (bp->b_flags & (XBF_NO_IOACCT|_XBF_IN_FLIGHT))
+	if (bp->b_flags & XBF_NO_IOACCT)
 		return;
 
 	ASSERT(bp->b_flags & XBF_ASYNC);
-	bp->b_flags |= _XBF_IN_FLIGHT;
-	percpu_counter_inc(&bp->b_target->bt_io_count);
+	spin_lock(&bp->b_lock);
+	if (!(bp->b_state & XFS_BSTATE_IN_FLIGHT)) {
+		bp->b_state |= XFS_BSTATE_IN_FLIGHT;
+		percpu_counter_inc(&bp->b_target->bt_io_count);
+	}
+	spin_unlock(&bp->b_lock);
 }
 
 /*
@@ -109,14 +113,24 @@ xfs_buf_ioacct_inc(
  * freed and unaccount from the buftarg.
  */
 static inline void
-xfs_buf_ioacct_dec(
+__xfs_buf_ioacct_dec(
 	struct xfs_buf	*bp)
 {
-	if (!(bp->b_flags & _XBF_IN_FLIGHT))
-		return;
+	ASSERT(spin_is_locked(&bp->b_lock));
 
-	bp->b_flags &= ~_XBF_IN_FLIGHT;
-	percpu_counter_dec(&bp->b_target->bt_io_count);
+	if (bp->b_state & XFS_BSTATE_IN_FLIGHT) {
+		bp->b_state &= ~XFS_BSTATE_IN_FLIGHT;
+		percpu_counter_dec(&bp->b_target->bt_io_count);
+	}
+}
+
+static inline void
+xfs_buf_ioacct_dec(
+	struct xfs_buf	*bp)
+{
+	spin_lock(&bp->b_lock);
+	__xfs_buf_ioacct_dec(bp);
+	spin_unlock(&bp->b_lock);
 }
 
 /*
@@ -148,9 +162,9 @@ xfs_buf_stale(
 	 * unaccounted (released to LRU) before that occurs. Drop in-flight
 	 * status now to preserve accounting consistency.
 	 */
-	xfs_buf_ioacct_dec(bp);
-
 	spin_lock(&bp->b_lock);
+	__xfs_buf_ioacct_dec(bp);
+
 	atomic_set(&bp->b_lru_ref, 0);
 	if (!(bp->b_state & XFS_BSTATE_DISPOSE) &&
 	    (list_lru_del(&bp->b_target->bt_lru, &bp->b_lru)))
@@ -953,12 +967,12 @@ xfs_buf_rele(
 		 * ensures the decrement occurs only once per-buf.
 		 */
 		if ((atomic_read(&bp->b_hold) == 1) && !list_empty(&bp->b_lru))
-			xfs_buf_ioacct_dec(bp);
+			__xfs_buf_ioacct_dec(bp);
 		goto out_unlock;
 	}
 
 	/* the last reference has been dropped ... */
-	xfs_buf_ioacct_dec(bp);
+	__xfs_buf_ioacct_dec(bp);
 	if (!(bp->b_flags & XBF_STALE) && atomic_read(&bp->b_lru_ref)) {
 		/*
 		 * If the buffer is added to the LRU take a new reference to the
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -63,7 +63,6 @@ typedef enum {
 #define _XBF_KMEM	 (1 << 21)/* backed by heap memory */
 #define _XBF_DELWRI_Q	 (1 << 22)/* buffer on a delwri queue */
 #define _XBF_COMPOUND	 (1 << 23)/* compound buffer */
-#define _XBF_IN_FLIGHT	 (1 << 25) /* I/O in flight, for accounting purposes */
 
 typedef unsigned int xfs_buf_flags_t;
 
@@ -83,14 +82,14 @@ typedef unsigned int xfs_buf_flags_t;
 	{ _XBF_PAGES,		"PAGES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }, \
-	{ _XBF_COMPOUND,	"COMPOUND" }, \
-	{ _XBF_IN_FLIGHT,	"IN_FLIGHT" }
+	{ _XBF_COMPOUND,	"COMPOUND" }
 
 
 /*
  * Internal state flags.
  */
 #define XFS_BSTATE_DISPOSE	 (1 << 0)	/* buffer being discarded */
+#define XFS_BSTATE_IN_FLIGHT	 (1 << 1)	/* I/O in flight */
 
 /*
  * The xfs_buftarg contains 2 notions of "sector size" -

  parent reply	other threads:[~2017-06-05 16:26 UTC|newest]

Thread overview: 92+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-05 16:16 [PATCH 4.9 00/94] 4.9.31-stable review Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 01/94] dccp/tcp: do not inherit mc_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 02/94] driver: vrf: Fix one possible use-after-free issue Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 03/94] ipv6/dccp: do not inherit ipv6_mc_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 04/94] s390/qeth: handle sysfs error during initialization Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 05/94] s390/qeth: unbreak OSM and OSN support Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 06/94] s390/qeth: avoid null pointer dereference on OSN Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 07/94] s390/qeth: add missing hash table initializations Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 08/94] bpf, arm64: fix faulty emission of map access in tail calls Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 09/94] netem: fix skb_orphan_partial() Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 11/94] tcp: avoid fragmenting peculiar skbs in SACK Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 12/94] sctp: fix src address selection if using secondary addresses for ipv6 Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 13/94] sctp: do not inherit ipv6_{mc|ac|fl}_list from parent Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 14/94] net/packet: fix missing net_device reference release Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 15/94] net/mlx5e: Use the correct pause values for ethtool advertising Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 16/94] net/mlx5e: Fix ethtool pause support and advertise reporting Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 17/94] tcp: eliminate negative reordering in tcp_clean_rtx_queue Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 18/94] net: Improve handling of failures on link and route dumps Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 19/94] ipv6: Prevent overrun when parsing v6 header options Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 20/94] ipv6: Check ip6_find_1stfragopt() return value properly Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 21/94] bridge: netlink: check vlan_default_pvid range Greg Kroah-Hartman
2017-06-05 16:16 ` [PATCH 4.9 23/94] bridge: start hello_timer when enabling KERNEL_STP in br_stp_start Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 24/94] ipv6: fix out of bound writes in __ip6_append_data() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 25/94] bonding: fix accounting of active ports in 3ad Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 26/94] net/mlx5: Avoid using pending command interface slots Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 27/94] net: phy: marvell: Limit errata to 88m1101 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 28/94] vlan: Fix tcp checksum offloads in Q-in-Q vlans Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 29/94] be2net: Fix offload features for Q-in-Q packets Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 30/94] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 31/94] tcp: avoid fastopen API to be used on AF_UNSPEC Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 32/94] sctp: fix ICMP processing if skb is non-linear Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 33/94] ipv4: add reference counting to metrics Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 34/94] bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 35/94] sparc: Fix -Wstringop-overflow warning Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 36/94] sparc/ftrace: Fix ftrace graph time measurement Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 37/94] fs/ufs: Set UFS default maximum bytes per file Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 38/94] powerpc/spufs: Fix hash faults for kernel regions Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 39/94] drivers/tty: 8250: only call fintek_8250_probe when doing port I/O Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 40/94] i2c: i2c-tiny-usb: fix buffer not being DMA capable Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 41/94] crypto: skcipher - Add missing API setkey checks Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 42/94] x86/MCE: Export memory_error() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 43/94] acpi, nfit: Fix the memory error check in nfit_handle_mce() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 44/94] Revert "ACPI / button: Change default behavior to lid_init_state=open" Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 45/94] mmc: sdhci-iproc: suppress spurious interrupt with Multiblock read Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 46/94] iscsi-target: Always wait for kthread_should_stop() before kthread exit Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 47/94] ibmvscsis: Clear left-over abort_cmd pointers Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 48/94] ibmvscsis: Fix the incorrect req_lim_delta Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 49/94] HID: wacom: Have wacom_tpc_irq guard against possible NULL dereference Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 50/94] nvme-rdma: support devices with queue size < 32 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 51/94] nvme: use blk_mq_start_hw_queues() in nvme_kill_queues() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 52/94] nvme: avoid to use blk_mq_abort_requeue_list() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 53/94] scsi: mpt3sas: Force request partial completion alignment Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 57/94] pcmcia: remove left-over %Z format Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 58/94] ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430 Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 59/94] mm/migrate: fix refcount handling when !hugepage_migration_supported() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 60/94] mlock: fix mlock count can not decrease in race condition Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 61/94] mm: consider memblock reservations for deferred memory initialization sizing Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 62/94] RDMA/qib,hfi1: Fix MR reference count leak on write with immediate Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 63/94] PCI/PM: Add needs_resume flag to avoid suspend complete optimization Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 64/94] x86/boot: Use CROSS_COMPILE prefix for readelf Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 65/94] ksm: prevent crash after write_protect_page fails Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 66/94] slub/memcg: cure the brainless abuse of sysfs attributes Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 67/94] mm/slub.c: trace free objects at KERN_INFO Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 68/94] drm/gma500/psb: Actually use VBT mode when it is found Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 69/94] xfs: Fix missed holes in SEEK_HOLE implementation Greg Kroah-Hartman
2017-06-05 16:17 ` Greg Kroah-Hartman [this message]
2017-06-05 16:17 ` [PATCH 4.9 71/94] xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 72/94] xfs: verify inline directory data forks Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 73/94] xfs: rework the inline directory verifiers Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 74/94] xfs: fix kernel memory exposure problems Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 75/94] xfs: use dedicated log worker wq to avoid deadlock with cil wq Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 76/94] xfs: fix over-copying of getbmap parameters from userspace Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 77/94] xfs: actually report xattr extents via iomap Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 78/94] xfs: drop iolock from reclaim context to appease lockdep Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 79/94] xfs: fix integer truncation in xfs_bmap_remap_alloc Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 80/94] xfs: handle array index overrun in xfs_dir2_leaf_readbuf() Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 81/94] xfs: prevent multi-fsb dir readahead from reading random blocks Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 82/94] xfs: fix up quotacheck buffer list error handling Greg Kroah-Hartman
2017-06-05 16:17 ` [PATCH 4.9 83/94] xfs: support ability to wait on new inodes Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 84/94] xfs: update ag iterator to support " Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 85/94] xfs: wait on new inodes during quotaoff dquot release Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 86/94] xfs: reserve enough blocks to handle btree splits when remapping Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 87/94] xfs: fix use-after-free in xfs_finish_page_writeback Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 88/94] xfs: fix indlen accounting error on partial delalloc conversion Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 89/94] xfs: BMAPX shouldnt barf on inline-format directories Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 90/94] xfs: bad assertion for delalloc an extent that start at i_size Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 91/94] xfs: xfs_trans_alloc_empty Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 92/94] xfs: avoid mount-time deadlock in CoW extent recovery Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 93/94] xfs: fix unaligned access in xfs_btree_visit_blocks Greg Kroah-Hartman
2017-06-05 16:18 ` [PATCH 4.9 94/94] xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff() Greg Kroah-Hartman
2017-06-05 20:34 ` [PATCH 4.9 00/94] 4.9.31-stable review Shuah Khan
2017-06-05 22:26 ` Guenter Roeck

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170605153106.104248073@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=bfoster@redhat.com \
    --cc=darrick.wong@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lpechacek@suse.com \
    --cc=nborisov@suse.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).