From: Christoph Hellwig <hch@lst.de>
To: stable@vger.kernel.org
Cc: linux-xfs@vger.kernel.org, Brian Foster <bfoster@redhat.com>,
"Darrick J . Wong" <darrick.wong@oracle.com>
Subject: [PATCH 03/47] xfs: push buffer of flush locked dquot to avoid quotacheck deadlock
Date: Sun, 17 Sep 2017 14:06:28 -0700
Message-ID: <20170917210712.10804-4-hch@lst.de>
In-Reply-To: <20170917210712.10804-1-hch@lst.de>
From: Brian Foster <bfoster@redhat.com>
commit 7912e7fef2aebe577f0b46d3cba261f2783c5695 upstream.
Reclaim during quotacheck can lead to deadlocks on the dquot flush
lock:
- Quotacheck populates a local delwri queue with the physical dquot
buffers.
- Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
dirties all of the dquots.
- Reclaim kicks in and attempts to flush a dquot whose buffer is
already queued on the quotacheck queue. The flush succeeds, but
queueing to the reclaim delwri queue fails as the backing buffer is
already queued. The flush unlock is now deferred to I/O completion
of the buffer from the quotacheck queue.
- The dqadjust bulkstat continues and dirties the recently flushed
dquot once again.
- Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
the flush lock to update the backing buffers with the in-core
recalculated values. It deadlocks on the redirtied dquot as the
flush lock was already acquired by reclaim, but the buffer resides
on the local delwri queue, which isn't submitted until the end of
quotacheck.
This is reproduced by running quotacheck on a filesystem with a
couple million inodes in low-memory (512MB-1GB) situations. This is
a regression as of commit 43ff2122e6 ("xfs: on-stack delayed write
buffer lists"), which removed a trylock and buffer I/O submission
from the quotacheck dquot flush sequence.
Quotacheck first resets and collects the physical dquot buffers in a
delwri queue. Then, it traverses the filesystem inodes via bulkstat,
updates the in-core dquots, flushes the corrected dquots to the
backing buffers and finally submits the delwri queue for I/O. Since
the backing buffers are queued across the entire quotacheck
operation, dquot reclaim cannot possibly complete a dquot flush
before quotacheck completes.
Therefore, quotacheck must submit the buffer for I/O in order to
cycle the flush lock and flush the dirty in-core dquot to the
buffer. Add a delwri queue buffer push mechanism to submit an
individual buffer for I/O without losing the delwri queue status and
use it from quotacheck to avoid the deadlock. This restores the
quotacheck behavior from before the regression was introduced.
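
For reference, a rough sketch of what the flush walk now does when it
finds a flush-locked dquot (this is not the diff itself;
lookup_dquot_buffer() is only a placeholder for the buffer lookup done
in the real code, and the -EAGAIN return relies on the existing dquot
walk revisiting entries it could not process, which this patch does not
change):

	if (!xfs_dqflock_nowait(dqp)) {
		/*
		 * Reclaim already flushed this dquot and holds the flush
		 * lock until the backing buffer's I/O completes.
		 */
		struct xfs_buf	*bp = lookup_dquot_buffer(dqp);	/* placeholder */

		if (bp) {
			/* submit its I/O now, keeping it on the delwri queue */
			xfs_buf_delwri_pushbuf(bp, buffer_list);
			xfs_buf_rele(bp);
		}
		return -EAGAIN;		/* real code drops dqp's lock first */
	}
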
Reported-by: Martin Svec <martin.svec@zoner.cz>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_buf.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
fs/xfs/xfs_buf.h | 1 +
fs/xfs/xfs_qm.c | 28 ++++++++++++++++++++++++-
fs/xfs/xfs_trace.h | 1 +
4 files changed, 89 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 24940dd3baa8..eca7baecc9f0 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -2022,6 +2022,66 @@ xfs_buf_delwri_submit(
return error;
}
+/*
+ * Push a single buffer on a delwri queue.
+ *
+ * The purpose of this function is to submit a single buffer of a delwri queue
+ * and return with the buffer still on the original queue. The waiting delwri
+ * buffer submission infrastructure guarantees transfer of the delwri queue
+ * buffer reference to a temporary wait list. We reuse this infrastructure to
+ * transfer the buffer back to the original queue.
+ *
+ * Note the buffer transitions from the queued state, to the submitted and wait
+ * listed state and back to the queued state during this call. The buffer
+ * locking and queue management logic between _delwri_pushbuf() and
+ * _delwri_queue() guarantee that the buffer cannot be queued to another list
+ * before returning.
+ */
+int
+xfs_buf_delwri_pushbuf(
+ struct xfs_buf *bp,
+ struct list_head *buffer_list)
+{
+ LIST_HEAD (submit_list);
+ int error;
+
+ ASSERT(bp->b_flags & _XBF_DELWRI_Q);
+
+ trace_xfs_buf_delwri_pushbuf(bp, _RET_IP_);
+
+ /*
+ * Isolate the buffer to a new local list so we can submit it for I/O
+ * independently from the rest of the original list.
+ */
+ xfs_buf_lock(bp);
+ list_move(&bp->b_list, &submit_list);
+ xfs_buf_unlock(bp);
+
+ /*
+ * Delwri submission clears the DELWRI_Q buffer flag and returns with
+ * the buffer on the wait list with an associated reference. Rather than
+ * bounce the buffer from a local wait list back to the original list
+ * after I/O completion, reuse the original list as the wait list.
+ */
+ xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
+
+ /*
+ * The buffer is now under I/O and wait listed as during typical delwri
+ * submission. Lock the buffer to wait for I/O completion. Rather than
+ * remove the buffer from the wait list and release the reference, we
+ * want to return with the buffer queued to the original list. The
+ * buffer already sits on the original list with a wait list reference,
+ * however. If we let the queue inherit that wait list reference, all we
+ * need to do is reset the DELWRI_Q flag.
+ */
+ xfs_buf_lock(bp);
+ error = bp->b_error;
+ bp->b_flags |= _XBF_DELWRI_Q;
+ xfs_buf_unlock(bp);
+
+ return error;
+}
+
int __init
xfs_buf_init(void)
{
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index ad514a8025dd..f961b19b9cc2 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -333,6 +333,7 @@ extern void xfs_buf_delwri_cancel(struct list_head *);
extern bool xfs_buf_delwri_queue(struct xfs_buf *, struct list_head *);
extern int xfs_buf_delwri_submit(struct list_head *);
extern int xfs_buf_delwri_submit_nowait(struct list_head *);
+extern int xfs_buf_delwri_pushbuf(struct xfs_buf *, struct list_head *);
/* Buffer Daemon Setup Routines */
extern int xfs_buf_init(void);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 8b9a9f15f022..8068867a8183 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1247,6 +1247,7 @@ xfs_qm_flush_one(
struct xfs_dquot *dqp,
void *data)
{
+ struct xfs_mount *mp = dqp->q_mount;
struct list_head *buffer_list = data;
struct xfs_buf *bp = NULL;
int error = 0;
@@ -1257,7 +1258,32 @@ xfs_qm_flush_one(
if (!XFS_DQ_IS_DIRTY(dqp))
goto out_unlock;
- xfs_dqflock(dqp);
+ /*
+ * The only way the dquot is already flush locked by the time quotacheck
+ * gets here is if reclaim flushed it before the dqadjust walk dirtied
+ * it for the final time. Quotacheck collects all dquot bufs in the
+ * local delwri queue before dquots are dirtied, so reclaim can't have
+ * possibly queued it for I/O. The only way out is to push the buffer to
+ * cycle the flush lock.
+ */
+ if (!xfs_dqflock_nowait(dqp)) {
+ /* buf is pinned in-core by delwri list */
+ DEFINE_SINGLE_BUF_MAP(map, dqp->q_blkno,
+ mp->m_quotainfo->qi_dqchunklen);
+ bp = _xfs_buf_find(mp->m_ddev_targp, &map, 1, 0, NULL);
+ if (!bp) {
+ error = -EINVAL;
+ goto out_unlock;
+ }
+ xfs_buf_unlock(bp);
+
+ xfs_buf_delwri_pushbuf(bp, buffer_list);
+ xfs_buf_rele(bp);
+
+ error = -EAGAIN;
+ goto out_unlock;
+ }
+
error = xfs_qm_dqflush(dqp, &bp);
if (error)
goto out_unlock;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 828f383df121..2df73f3a73c1 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -366,6 +366,7 @@ DEFINE_BUF_EVENT(xfs_buf_iowait_done);
DEFINE_BUF_EVENT(xfs_buf_delwri_queue);
DEFINE_BUF_EVENT(xfs_buf_delwri_queued);
DEFINE_BUF_EVENT(xfs_buf_delwri_split);
+DEFINE_BUF_EVENT(xfs_buf_delwri_pushbuf);
DEFINE_BUF_EVENT(xfs_buf_get_uncached);
DEFINE_BUF_EVENT(xfs_bdstrat_shut);
DEFINE_BUF_EVENT(xfs_buf_item_relse);
--
2.14.1