public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: xfs@oss.sgi.com
Subject: [PATCH 05/37] xfs: don't flush inodes from background inode reclaim
Date: Mon, 23 Apr 2012 15:58:35 +1000	[thread overview]
Message-ID: <1335160747-17254-6-git-send-email-david@fromorbit.com> (raw)
In-Reply-To: <1335160747-17254-1-git-send-email-david@fromorbit.com>

From: Christoph Hellwig <hch@infradead.org>

We already flush dirty inodes throug the AIL regularly, there is no reason
to have second thread compete with it and disturb the I/O pattern.  We still
do write inodes when doing a synchronous reclaim from the shrinker or during
unmount for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_sync.c |  102 ++++++++++++++++++++++-------------------------------
 1 file changed, 42 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/xfs_sync.c b/fs/xfs/xfs_sync.c
index 7163ca8..89f14e5 100644
--- a/fs/xfs/xfs_sync.c
+++ b/fs/xfs/xfs_sync.c
@@ -631,11 +631,8 @@ xfs_reclaim_inode_grab(
 }
 
 /*
- * Inodes in different states need to be treated differently, and the return
- * value of xfs_iflush is not sufficient to get this right. The following table
- * lists the inode states and the reclaim actions necessary for non-blocking
- * reclaim:
- *
+ * Inodes in different states need to be treated differently. The following
+ * table lists the inode states and the reclaim actions necessary:
  *
  *	inode state	     iflush ret		required action
  *      ---------------      ----------         ---------------
@@ -645,9 +642,8 @@ xfs_reclaim_inode_grab(
  *	stale, unpinned		0		reclaim
  *	clean, pinned(*)	0		requeue
  *	stale, pinned		EAGAIN		requeue
- *	dirty, delwri ok	0		requeue
- *	dirty, delwri blocked	EAGAIN		requeue
- *	dirty, sync flush	0		reclaim
+ *	dirty, async		-		requeue
+ *	dirty, sync		0		reclaim
  *
  * (*) dgc: I don't think the clean, pinned state is possible but it gets
  * handled anyway given the order of checks implemented.
@@ -658,26 +654,23 @@ xfs_reclaim_inode_grab(
  *
  * Also, because we get the flush lock first, we know that any inode that has
  * been flushed delwri has had the flush completed by the time we check that
- * the inode is clean. The clean inode check needs to be done before flushing
- * the inode delwri otherwise we would loop forever requeuing clean inodes as
- * we cannot tell apart a successful delwri flush and a clean inode from the
- * return value of xfs_iflush().
+ * the inode is clean.
  *
- * Note that because the inode is flushed delayed write by background
- * writeback, the flush lock may already be held here and waiting on it can
- * result in very long latencies. Hence for sync reclaims, where we wait on the
- * flush lock, the caller should push out delayed write inodes first before
- * trying to reclaim them to minimise the amount of time spent waiting. For
- * background relaim, we just requeue the inode for the next pass.
+ * Note that because the inode is flushed delayed write by AIL pushing, the
+ * flush lock may already be held here and waiting on it can result in very
+ * long latencies.  Hence for sync reclaims, where we wait on the flush lock,
+ * the caller should push the AIL first before trying to reclaim inodes to
+ * minimise the amount of time spent waiting.  For background relaim, we only
+ * bother to reclaim clean inodes anyway.
  *
  * Hence the order of actions after gaining the locks should be:
  *	bad		=> reclaim
  *	shutdown	=> unpin and reclaim
- *	pinned, delwri	=> requeue
+ *	pinned, async	=> requeue
  *	pinned, sync	=> unpin
  *	stale		=> reclaim
  *	clean		=> reclaim
- *	dirty, delwri	=> flush and requeue
+ *	dirty, async	=> requeue
  *	dirty, sync	=> flush, wait and reclaim
  */
 STATIC int
@@ -716,10 +709,8 @@ restart:
 		goto reclaim;
 	}
 	if (xfs_ipincount(ip)) {
-		if (!(sync_mode & SYNC_WAIT)) {
-			xfs_ifunlock(ip);
-			goto out;
-		}
+		if (!(sync_mode & SYNC_WAIT))
+			goto out_ifunlock;
 		xfs_iunpin_wait(ip);
 	}
 	if (xfs_iflags_test(ip, XFS_ISTALE))
@@ -728,6 +719,13 @@ restart:
 		goto reclaim;
 
 	/*
+	 * Never flush out dirty data during non-blocking reclaim, as it would
+	 * just contend with AIL pushing trying to do the same job.
+	 */
+	if (!(sync_mode & SYNC_WAIT))
+		goto out_ifunlock;
+
+	/*
 	 * Now we have an inode that needs flushing.
 	 *
 	 * We do a nonblocking flush here even if we are doing a SYNC_WAIT
@@ -745,42 +743,13 @@ restart:
 	 * pass through will see the stale flag set on the inode.
 	 */
 	error = xfs_iflush(ip, SYNC_TRYLOCK | sync_mode);
-	if (sync_mode & SYNC_WAIT) {
-		if (error == EAGAIN) {
-			xfs_iunlock(ip, XFS_ILOCK_EXCL);
-			/* backoff longer than in xfs_ifree_cluster */
-			delay(2);
-			goto restart;
-		}
-		xfs_iflock(ip);
-		goto reclaim;
-	}
-
-	/*
-	 * When we have to flush an inode but don't have SYNC_WAIT set, we
-	 * flush the inode out using a delwri buffer and wait for the next
-	 * call into reclaim to find it in a clean state instead of waiting for
-	 * it now. We also don't return errors here - if the error is transient
-	 * then the next reclaim pass will flush the inode, and if the error
-	 * is permanent then the next sync reclaim will reclaim the inode and
-	 * pass on the error.
-	 */
-	if (error && error != EAGAIN && !XFS_FORCED_SHUTDOWN(ip->i_mount)) {
-		xfs_warn(ip->i_mount,
-			"inode 0x%llx background reclaim flush failed with %d",
-			(long long)ip->i_ino, error);
+	if (error == EAGAIN) {
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+		/* backoff longer than in xfs_ifree_cluster */
+		delay(2);
+		goto restart;
 	}
-out:
-	xfs_iflags_clear(ip, XFS_IRECLAIM);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	/*
-	 * We could return EAGAIN here to make reclaim rescan the inode tree in
-	 * a short while. However, this just burns CPU time scanning the tree
-	 * waiting for IO to complete and xfssyncd never goes back to the idle
-	 * state. Instead, return 0 to let the next scheduled background reclaim
-	 * attempt to reclaim the inode again.
-	 */
-	return 0;
+	xfs_iflock(ip);
 
 reclaim:
 	xfs_ifunlock(ip);
@@ -814,8 +783,21 @@ reclaim:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 
 	xfs_inode_free(ip);
-
 	return error;
+
+out_ifunlock:
+	xfs_ifunlock(ip);
+out:
+	xfs_iflags_clear(ip, XFS_IRECLAIM);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	/*
+	 * We could return EAGAIN here to make reclaim rescan the inode tree in
+	 * a short while. However, this just burns CPU time scanning the tree
+	 * waiting for IO to complete and xfssyncd never goes back to the idle
+	 * state. Instead, return 0 to let the next scheduled background reclaim
+	 * attempt to reclaim the inode again.
+	 */
+	return 0;
 }
 
 /*
-- 
1.7.9.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2012-04-23  5:59 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-23  5:58 [PATCH 00/37] xfs: current 3.4 patch queue Dave Chinner
2012-04-23  5:58 ` [PATCH 01/37] xfs: remove log item from AIL in xfs_qm_dqflush after a shutdown Dave Chinner
2012-04-23  5:58 ` [PATCH 02/37] xfs: remove log item from AIL in xfs_iflush " Dave Chinner
2012-04-23 15:39   ` Mark Tinguely
2012-04-23  5:58 ` [PATCH 03/37] xfs: allow assigning the tail lsn with the AIL lock held Dave Chinner
2012-04-23  5:58 ` [PATCH 04/37] xfs: implement freezing by emptying the AIL Dave Chinner
2012-04-23 15:40   ` Mark Tinguely
2012-04-29 21:43   ` Christoph Hellwig
2012-04-23  5:58 ` Dave Chinner [this message]
2012-04-23  5:58 ` [PATCH 06/37] xfs: do not write the buffer from xfs_iflush Dave Chinner
2012-04-23  5:58 ` [PATCH 07/37] xfs: do not write the buffer from xfs_qm_dqflush Dave Chinner
2012-04-23  5:58 ` [PATCH 08/37] xfs: do not add buffers to the delwri queue until pushed Dave Chinner
2012-04-23  5:58 ` [PATCH 09/37] xfs: on-stack delayed write buffer lists Dave Chinner
2012-04-25 18:34   ` Mark Tinguely
2012-04-29 21:44   ` Christoph Hellwig
2012-04-23  5:58 ` [PATCH 10/37] xfs: remove some obsolete comments in xfs_trans_ail.c Dave Chinner
2012-04-23 15:41   ` Mark Tinguely
2012-04-23  5:58 ` [PATCH 11/37] xfs: pass shutdown method into xfs_trans_ail_delete_bulk Dave Chinner
2012-04-23  5:58 ` [PATCH 12/37] xfs: Do background CIL flushes via a workqueue Dave Chinner
2012-04-23  7:54   ` [PATCH 12/37 V2] " Dave Chinner
2012-04-29 21:46     ` Christoph Hellwig
2012-04-23  5:58 ` [PATCH 13/37] xfs: page type check in writeback only checks last buffer Dave Chinner
2012-04-23  5:58 ` [PATCH 14/37] xfs: Use preallocation for inodes with extsz hints Dave Chinner
2012-04-29 21:47   ` Christoph Hellwig
2012-04-23  5:58 ` [PATCH 15/37] xfs: fix buffer lookup race on allocation failure Dave Chinner
2012-04-23  5:58 ` [PATCH 16/37] xfs: check for buffer errors before waiting Dave Chinner
2012-04-23  5:58 ` [PATCH 17/37] xfs: fix incorrect b_offset initialisation Dave Chinner
2012-04-23  5:58 ` [PATCH 18/37] xfs: use kmem_zone_zalloc for buffers Dave Chinner
2012-04-23  5:58 ` [PATCH 19/37] xfs: clean up buffer get/read call API Dave Chinner
2012-04-23  5:58 ` [PATCH 20/37] xfs: kill b_file_offset Dave Chinner
2012-04-23  5:58 ` [PATCH 21/37] xfs: use blocks for counting length of buffers Dave Chinner
2012-04-23  5:58 ` [PATCH 22/37] xfs: use blocks for storing the desired IO size Dave Chinner
2012-04-23  5:58 ` [PATCH 23/37] xfs: kill xfs_buf_btoc Dave Chinner
2012-04-23  5:58 ` [PATCH 24/37] xfs: kill XBF_LOCK Dave Chinner
2012-04-23  5:58 ` [PATCH 25/37] xfs: kill xfs_read_buf() Dave Chinner
2012-04-23  5:58 ` [PATCH 26/37] xfs: kill XBF_DONTBLOCK Dave Chinner
2012-04-23  5:58 ` [PATCH 27/37] xfs: use iolock on XFS_IOC_ALLOCSP calls Dave Chinner
2012-04-23  5:58 ` [PATCH 28/37] xfs: move xfsagino_t to xfs_types.h Dave Chinner
2012-04-23 15:43   ` Mark Tinguely
2012-04-24 15:10   ` Mark Tinguely
2012-04-29 21:49   ` Christoph Hellwig
2012-04-30  0:32     ` Dave Chinner
2012-04-23  5:58 ` [PATCH 29/37] xfs: move busy extent handling to it's own file Dave Chinner
2012-04-23 17:57   ` Ben Myers
2012-04-24  0:25     ` [PATCH 29/37 V2] " Dave Chinner
2012-04-24 15:56       ` Mark Tinguely
2012-04-24 18:10         ` Mark Tinguely
2012-04-29 10:39           ` [PATCH 29/37 V3] " Dave Chinner
2012-04-29 21:50             ` Christoph Hellwig
2012-04-30  0:36               ` Dave Chinner
2012-04-30  2:17                 ` Dave Chinner
2012-04-23  5:59 ` [PATCH 30/37] xfs: clean up busy extent naming Dave Chinner
2012-04-24 18:11   ` Mark Tinguely
2012-04-29 10:41     ` [PATCH 30/37 V2] " Dave Chinner
2012-04-29 21:50       ` Christoph Hellwig
2012-04-23  5:59 ` [PATCH 31/37] xfs: move xfs_fsb_to_db to xfs_bmap.h Dave Chinner
2012-04-24 19:24   ` Mark Tinguely
2012-04-29 21:53   ` Christoph Hellwig
2012-04-30  2:31     ` Dave Chinner
2012-04-23  5:59 ` [PATCH 32/37] xfs: move xfs_get_extsz_hint() and kill xfs_rw.h Dave Chinner
2012-04-24 19:30   ` Mark Tinguely
2012-04-29 21:53   ` Christoph Hellwig
2012-04-23  5:59 ` [PATCH 33/37] xfs: move xfs_do_force_shutdown() and kill xfs_rw.c Dave Chinner
2012-04-24 19:37   ` Mark Tinguely
2012-04-29 21:54   ` Christoph Hellwig
2012-04-30  2:38     ` Dave Chinner
2012-04-23  5:59 ` [PATCH 34/37] xfs: clean up xfs_bit.h includes Dave Chinner
2012-04-24 19:44   ` Mark Tinguely
2012-04-29 21:55   ` Christoph Hellwig
2012-04-30  2:40     ` Dave Chinner
2012-04-23  5:59 ` [PATCH 35/37] xfs: Properly exclude IO type flags from buffer flags Dave Chinner
2012-04-24 20:02   ` Mark Tinguely
2012-04-29 21:55   ` Christoph Hellwig
2012-04-23  5:59 ` [PATCH 36/37] xfs: flush outstanding buffers on log mount failure Dave Chinner
2012-04-23 15:47   ` Mark Tinguely
2012-04-29 21:55   ` Christoph Hellwig
2012-04-23  5:59 ` [PATCH 37/37] xfs: make XBF_MAPPED the default behaviour Dave Chinner
2012-04-25 18:35   ` Mark Tinguely
2012-04-25 20:09   ` Mark Tinguely
2012-04-25 22:33     ` Dave Chinner
2012-04-29 21:57   ` Christoph Hellwig
2012-04-30  2:45     ` Dave Chinner
2012-04-23 18:01 ` [PATCH 00/37] xfs: current 3.4 patch queue Ben Myers
2012-04-23 23:29   ` Dave Chinner
2012-04-30 14:24     ` Ben Myers
2012-04-28  2:15 ` Ben Myers
2012-04-28 21:28   ` Ben Myers
2012-04-29  0:21     ` Dave Chinner
2012-04-29  0:14   ` Dave Chinner
2012-04-30 14:44     ` Ben Myers
2012-04-30 23:04       ` Dave Chinner
2012-04-30 14:32 ` Assertion failed: RB_EMPTY_NODE(&bp->b_rbnode) Ben Myers
2012-04-30 23:12   ` Dave Chinner
2012-04-30 14:34 ` [PATCH 00/37] xfs: current 3.4 patch queue Ben Myers
2012-04-30 23:20   ` Dave Chinner
2012-04-30 19:25 ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1335160747-17254-6-git-send-email-david@fromorbit.com \
    --to=david@fromorbit.com \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox