public inbox for linux-xfs@vger.kernel.org
* [PATCH 1/3] XFS: Use delayed write for inodes rather than async
  2010-01-02  3:03 [RFC, PATCH 0/3] Kill async inode writeback Dave Chinner
@ 2010-01-02  3:03 ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-01-02  3:03 UTC (permalink / raw)
  To: xfs

We currently do background inode flush asynchronously, resulting in
inodes being written in whatever order the background writeback
issues them.

Make the inode flush delayed write instead of asynchronous which
will push the inode into the backing buffer but not issue any IO.
The buffer will then sit in cache to be flushed by either an AIL
push or the xfsbufd timing out the buffer. This will allow
accumulation of dirty inode buffers in memory and allow optimisation
of inode cluster writeback at the xfsbufd level where we have much
greater queue depths than the block layer elevators.

This effectively means that any inode that is written back by
background writeback will be seen as flush locked during AIL
pushing, and will result in the buffers being pushed from there.
A future patch will address this non-optimal form of writeback, too.

A side effect of this delayed write mechanism is that background
inode reclaim will no longer directly flush inodes, nor can it wait
on the flush lock. The result is that inode reclaim must leave the
inode in the reclaimable state until it is clean. Hence attempts to
reclaim a dirty inode in the background will simply skip the inode
until it is clean and this allows other mechanisms (i.e. xfsbufd) to
do more optimal writeback of the dirty buffers.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_super.c |    2 +-
 fs/xfs/linux-2.6/xfs_sync.c  |   52 +++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_inode.c           |   31 ++++---------------------
 fs/xfs/xfs_inode.h           |    8 ++----
 fs/xfs/xfs_inode_item.c      |   10 +++++--
 fs/xfs/xfs_mount.c           |    3 +-
 6 files changed, 54 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 84ce77a..752fa10 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1074,7 +1074,7 @@ xfs_fs_write_inode(
 		if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
 			goto out_unlock;
 
-		error = xfs_iflush(ip, XFS_IFLUSH_ASYNC_NOBLOCK);
+		error = xfs_iflush(ip, XFS_IFLUSH_DELWRI_NOBLOCK);
 	}
 
  out_unlock:
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index c980d68..f974d1a 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -460,8 +460,8 @@ xfs_quiesce_fs(
 {
 	int	count = 0, pincount;
 
+	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
 	xfs_flush_buftarg(mp->m_ddev_targp, 0);
-	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 
 	/*
 	 * This loop must run at least twice.  The first instance of the loop
@@ -585,7 +585,7 @@ xfs_sync_worker(
 
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
 		xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
-		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
 		/* dgc: errors ignored here */
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
 		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
@@ -687,7 +687,7 @@ xfs_reclaim_inode(
 		spin_unlock(&ip->i_flags_lock);
 		write_unlock(&pag->pag_ici_lock);
 		xfs_perag_put(pag);
-		return -EAGAIN;
+		return EAGAIN;
 	}
 	__xfs_iflags_set(ip, XFS_IRECLAIM);
 	spin_unlock(&ip->i_flags_lock);
@@ -695,32 +695,52 @@ xfs_reclaim_inode(
 	xfs_perag_put(pag);
 
 	/*
-	 * If the inode is still dirty, then flush it out.  If the inode
-	 * is not in the AIL, then it will be OK to flush it delwri as
-	 * long as xfs_iflush() does not keep any references to the inode.
-	 * We leave that decision up to xfs_iflush() since it has the
-	 * knowledge of whether it's OK to simply do a delwri flush of
-	 * the inode or whether we need to wait until the inode is
-	 * pulled from the AIL.
-	 * We get the flush lock regardless, though, just to make sure
-	 * we don't free it while it is being flushed.
+	 * The inode is flushed delayed write. That means the flush lock
+	 * may be held here and we will block for some time on it. Further,
+	 * if we hold the inode lock, we prevent the AIL from locking and
+	 * therefore being able to push the buffer. This means that we'll end
+	 * up waiting here for the xfsbufd to age the buffer and write it out,
+	 * which could be a long time. If we fail to get the flush lock, just
+	 * clear the reclaim in progress state (we haven't cleared the reclaim
+	 * needed state) so that the reclaim is delayed until the flush lock
+	 * can be gained without blocking.
 	 */
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_iflock(ip);
+	if (!xfs_iflock_nowait(ip))
+		goto unlock_and_requeue;
 
 	/*
 	 * In the case of a forced shutdown we rely on xfs_iflush() to
 	 * wait for the inode to be unpinned before returning an error.
 	 */
-	if (!is_bad_inode(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
-		/* synchronize with xfs_iflush_done */
-		xfs_iflock(ip);
+	if (!is_bad_inode(VFS_I(ip)) && !xfs_inode_clean(ip)) {
+		/*
+		 * If we are flushing a dirty inode DELWRI, then don't
+		 * immediately wait on the flush lock - requeue the inode for
+		 * reclaim. Every time we re-enter and the flush lock is still
+		 * held we will requeue at the initial flush lock check above.
+		 * Otherwise, for synchronous writeback we synchronize with
+		 * xfs_iflush_done by locking and unlocking the flush lock.
+		 */
+		if (xfs_iflush(ip, sync_mode) == 0) {
+			if (sync_mode == XFS_IFLUSH_DELWRI)
+				goto unlock_and_requeue;
+			xfs_iflock(ip);
+			xfs_ifunlock(ip);
+		}
+	} else {
+		/* need to unlock the clean inodes */
 		xfs_ifunlock(ip);
 	}
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_ireclaim(ip);
 	return 0;
+
+unlock_and_requeue:
+	xfs_iflags_clear(ip, XFS_IRECLAIM);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return EAGAIN;
 }
 
 void
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e5c9953..d175dca 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2832,7 +2832,7 @@ xfs_iflush(
 	xfs_dinode_t		*dip;
 	xfs_mount_t		*mp;
 	int			error;
-	int			noblock = (flags == XFS_IFLUSH_ASYNC_NOBLOCK);
+	int			noblock = (flags == XFS_IFLUSH_DELWRI_NOBLOCK);
 	enum { INT_DELWRI = (1 << 0), INT_ASYNC = (1 << 1) };
 
 	XFS_STATS_INC(xs_iflush_count);
@@ -2905,12 +2905,8 @@ xfs_iflush(
 		case XFS_IFLUSH_DELWRI_ELSE_SYNC:
 			flags = 0;
 			break;
-		case XFS_IFLUSH_ASYNC_NOBLOCK:
-		case XFS_IFLUSH_ASYNC:
-		case XFS_IFLUSH_DELWRI_ELSE_ASYNC:
-			flags = INT_ASYNC;
-			break;
 		case XFS_IFLUSH_DELWRI:
+		case XFS_IFLUSH_DELWRI_NOBLOCK:
 			flags = INT_DELWRI;
 			break;
 		default:
@@ -2920,15 +2916,11 @@ xfs_iflush(
 		}
 	} else {
 		switch (flags) {
+		case XFS_IFLUSH_DELWRI_NOBLOCK:
 		case XFS_IFLUSH_DELWRI_ELSE_SYNC:
-		case XFS_IFLUSH_DELWRI_ELSE_ASYNC:
 		case XFS_IFLUSH_DELWRI:
 			flags = INT_DELWRI;
 			break;
-		case XFS_IFLUSH_ASYNC_NOBLOCK:
-		case XFS_IFLUSH_ASYNC:
-			flags = INT_ASYNC;
-			break;
 		case XFS_IFLUSH_SYNC:
 			flags = 0;
 			break;
@@ -2971,13 +2963,10 @@ xfs_iflush(
 	if (error)
 		goto cluster_corrupt_out;
 
-	if (flags & INT_DELWRI) {
+	if (flags & INT_DELWRI)
 		xfs_bdwrite(mp, bp);
-	} else if (flags & INT_ASYNC) {
-		error = xfs_bawrite(mp, bp);
-	} else {
+	else
 		error = xfs_bwrite(mp, bp);
-	}
 	return error;
 
 corrupt_out:
@@ -3012,16 +3001,6 @@ xfs_iflush_int(
 	iip = ip->i_itemp;
 	mp = ip->i_mount;
 
-
-	/*
-	 * If the inode isn't dirty, then just release the inode
-	 * flush lock and do nothing.
-	 */
-	if (xfs_inode_clean(ip)) {
-		xfs_ifunlock(ip);
-		return 0;
-	}
-
 	/* set *dip = inode's place in the buffer */
 	dip = (xfs_dinode_t *)xfs_buf_offset(bp, ip->i_imap.im_boffset);
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ec1f28c..559feeb 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -423,11 +423,9 @@ static inline void xfs_ifunlock(xfs_inode_t *ip)
  * Flags for xfs_iflush()
  */
 #define	XFS_IFLUSH_DELWRI_ELSE_SYNC	1
-#define	XFS_IFLUSH_DELWRI_ELSE_ASYNC	2
-#define	XFS_IFLUSH_SYNC			3
-#define	XFS_IFLUSH_ASYNC		4
-#define	XFS_IFLUSH_DELWRI		5
-#define	XFS_IFLUSH_ASYNC_NOBLOCK	6
+#define	XFS_IFLUSH_SYNC			2
+#define	XFS_IFLUSH_DELWRI		3
+#define	XFS_IFLUSH_DELWRI_NOBLOCK	4
 
 /*
  * Flags for xfs_itruncate_start().
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index f38855d..beb7d9f 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -867,10 +867,14 @@ xfs_inode_item_push(
 	       iip->ili_format.ilf_fields != 0);
 
 	/*
-	 * Write out the inode.  The completion routine ('iflush_done') will
-	 * pull it from the AIL, mark it clean, unlock the flush lock.
+	 * Push the inode to its backing buffer. This will not remove
+	 * the inode from the AIL - a further push will be required to
+	 * trigger a buffer push. However, this allows all the dirty
+	 * inodes to be pushed to the buffer before it is pushed to disk.
+	 * The buffer IO completion will pull the inode from the AIL,
+	 * mark it clean and unlock the flush lock.
 	 */
-	(void) xfs_iflush(ip, XFS_IFLUSH_ASYNC);
+	(void) xfs_iflush(ip, XFS_IFLUSH_DELWRI);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	return;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 223d9c3..f3ce47a 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1444,7 +1444,8 @@ xfs_unmountfs(
 	 * need to force the log first.
 	 */
 	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
-	xfs_reclaim_inodes(mp, XFS_IFLUSH_ASYNC);
+	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
+	XFS_bflush(mp->m_ddev_targp);
 
 	xfs_qm_unmount(mp);
 
-- 
1.6.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* [PATCH 0/3] Kill async inode writeback V2
@ 2010-01-05  0:04 Dave Chinner
  2010-01-05  0:04 ` [PATCH 1/3] xfs: Use delayed write for inodes rather than async Dave Chinner
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Dave Chinner @ 2010-01-05  0:04 UTC (permalink / raw)
  To: xfs

Currently we do background inode writeback on demand from many
different places - xfssyncd, xfsbufd, xfsaild and the bdi writeback
threads.  The result is that inodes can be pushed at any time and
there is little to no locality to the IO patterns resulting from
such writeback. Indeed, we can have competing writebacks occurring
which only serve to slow down writeback overall.

The idea behind this series is to make metadata buffers get
written from xfsbufd via the delayed write queue rather than from
all these other places. All the other places then simply mark the
buffers delayed write so that the xfsbufd can issue them.

This means that inode flushes can no longer happen asynchronously,
but we still need a method for ensuring timely dispatch of buffers
that we may be waiting for IO completion on. To do this, we allow
delayed write buffers to be "promoted" in the delayed write queue.
This effectively short-cuts the aging of the buffers, and combined
with a demand flush of the xfsbufd we push all aged and promoted
buffers out at the same time.

Combine this with sorting the delayed write buffers to be written
into disk offset order before dispatch, and we vastly improve the
IO patterns for metadata writeback. IO is issued from one place and
in a disk/elevator friendly order.

Version 2:
- use generic list sort function
- when unmounting, push the delwri buffers first, then do sync inode
  reclaim so that reclaim doesn't block for 15 seconds waiting for
  delwri inode buffers to be aged and written before the inodes can
  be reclaimed.

Perf results (average of 3 runs) on a debug XFS build (means allocation
patterns are randomly varied, so runtimes are also a bit variable):

Untar 2.6.32 kernel tarball, sync, then remove:

                  Untar+sync     rm -rf
xfs-dev:           25.2s          13.0s
xfs-dev-delwri-1:  22.5s           9.1s
xfs-dev-delwri-2:  21.9s           8.4s

4 processes each creating 100,000 five-byte files in separate
directories concurrently, then 4 processes removing a directory each
concurrently.

                  create          rm -rf
xfs-dev:           8m32s           4m10s
xfs-dev-delwri-1:  4m55s           3m42s
xfs-dev-delwri-2:  4m56s           3m33s



* [PATCH 1/3] xfs: Use delayed write for inodes rather than async
  2010-01-05  0:04 [PATCH 0/3] Kill async inode writeback V2 Dave Chinner
@ 2010-01-05  0:04 ` Dave Chinner
  2010-01-08 10:36   ` Christoph Hellwig
  2010-01-05  0:04 ` [PATCH 2/3] xfs: Don't issue buffer IO direct from AIL push Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2010-01-05  0:04 UTC (permalink / raw)
  To: xfs

We currently do background inode flush asynchronously, resulting in
inodes being written in whatever order the background writeback
issues them.

Make the inode flush delayed write instead of asynchronous which
will push the inode into the backing buffer but not issue any IO.
The buffer will then sit in cache to be flushed by either an AIL
push or the xfsbufd timing out the buffer. This will allow
accumulation of dirty inode buffers in memory and allow optimisation
of inode cluster writeback at the xfsbufd level where we have much
greater queue depths than the block layer elevators.

This effectively means that any inode that is written back by
background writeback will be seen as flush locked during AIL
pushing, and will result in the buffers being pushed from there.
A future patch will address this non-optimal form of writeback, too.

A side effect of this delayed write mechanism is that background
inode reclaim will no longer directly flush inodes, nor can it wait
on the flush lock. The result is that inode reclaim must leave the
inode in the reclaimable state until it is clean. Hence attempts to
reclaim a dirty inode in the background will simply skip the inode
until it is clean and this allows other mechanisms (i.e. xfsbufd) to
do more optimal writeback of the dirty buffers.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_super.c |    2 +-
 fs/xfs/linux-2.6/xfs_sync.c  |   52 +++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_inode.c           |   31 ++++---------------------
 fs/xfs/xfs_inode.h           |    8 ++----
 fs/xfs/xfs_inode_item.c      |   10 +++++--
 fs/xfs/xfs_mount.c           |    9 ++++++-
 6 files changed, 60 insertions(+), 52 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 0a4fd0e..f3dd67d 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1074,7 +1074,7 @@ xfs_fs_write_inode(
 		if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
 			goto out_unlock;
 
-		error = xfs_iflush(ip, XFS_IFLUSH_ASYNC_NOBLOCK);
+		error = xfs_iflush(ip, XFS_IFLUSH_DELWRI_NOBLOCK);
 	}
 
  out_unlock:
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index c980d68..f974d1a 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -460,8 +460,8 @@ xfs_quiesce_fs(
 {
 	int	count = 0, pincount;
 
+	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
 	xfs_flush_buftarg(mp->m_ddev_targp, 0);
-	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 
 	/*
 	 * This loop must run at least twice.  The first instance of the loop
@@ -585,7 +585,7 @@ xfs_sync_worker(
 
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
 		xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
-		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
+		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
 		/* dgc: errors ignored here */
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
 		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
@@ -687,7 +687,7 @@ xfs_reclaim_inode(
 		spin_unlock(&ip->i_flags_lock);
 		write_unlock(&pag->pag_ici_lock);
 		xfs_perag_put(pag);
-		return -EAGAIN;
+		return EAGAIN;
 	}
 	__xfs_iflags_set(ip, XFS_IRECLAIM);
 	spin_unlock(&ip->i_flags_lock);
@@ -695,32 +695,52 @@ xfs_reclaim_inode(
 	xfs_perag_put(pag);
 
 	/*
-	 * If the inode is still dirty, then flush it out.  If the inode
-	 * is not in the AIL, then it will be OK to flush it delwri as
-	 * long as xfs_iflush() does not keep any references to the inode.
-	 * We leave that decision up to xfs_iflush() since it has the
-	 * knowledge of whether it's OK to simply do a delwri flush of
-	 * the inode or whether we need to wait until the inode is
-	 * pulled from the AIL.
-	 * We get the flush lock regardless, though, just to make sure
-	 * we don't free it while it is being flushed.
+	 * The inode is flushed delayed write. That means the flush lock
+	 * may be held here and we will block for some time on it. Further,
+	 * if we hold the inode lock, we prevent the AIL from locking and
+	 * therefore being able to push the buffer. This means that we'll end
+	 * up waiting here for the xfsbufd to age the buffer and write it out,
+	 * which could be a long time. If we fail to get the flush lock, just
+	 * clear the reclaim in progress state (we haven't cleared the reclaim
+	 * needed state) so that the reclaim is delayed until the flush lock
+	 * can be gained without blocking.
 	 */
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
-	xfs_iflock(ip);
+	if (!xfs_iflock_nowait(ip))
+		goto unlock_and_requeue;
 
 	/*
 	 * In the case of a forced shutdown we rely on xfs_iflush() to
 	 * wait for the inode to be unpinned before returning an error.
 	 */
-	if (!is_bad_inode(VFS_I(ip)) && xfs_iflush(ip, sync_mode) == 0) {
-		/* synchronize with xfs_iflush_done */
-		xfs_iflock(ip);
+	if (!is_bad_inode(VFS_I(ip)) && !xfs_inode_clean(ip)) {
+		/*
+		 * If we are flushing a dirty inode DELWRI, then don't
+		 * immediately wait on the flush lock - requeue the inode for
+		 * reclaim. Every time we re-enter and the flush lock is still
+		 * held we will requeue at the initial flush lock check above.
+		 * Otherwise, for synchronous writeback we synchronize with
+		 * xfs_iflush_done by locking and unlocking the flush lock.
+		 */
+		if (xfs_iflush(ip, sync_mode) == 0) {
+			if (sync_mode == XFS_IFLUSH_DELWRI)
+				goto unlock_and_requeue;
+			xfs_iflock(ip);
+			xfs_ifunlock(ip);
+		}
+	} else {
+		/* need to unlock the clean inodes */
 		xfs_ifunlock(ip);
 	}
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_ireclaim(ip);
 	return 0;
+
+unlock_and_requeue:
+	xfs_iflags_clear(ip, XFS_IRECLAIM);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return EAGAIN;
 }
 
 void
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e5c9953..d175dca 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2832,7 +2832,7 @@ xfs_iflush(
 	xfs_dinode_t		*dip;
 	xfs_mount_t		*mp;
 	int			error;
-	int			noblock = (flags == XFS_IFLUSH_ASYNC_NOBLOCK);
+	int			noblock = (flags == XFS_IFLUSH_DELWRI_NOBLOCK);
 	enum { INT_DELWRI = (1 << 0), INT_ASYNC = (1 << 1) };
 
 	XFS_STATS_INC(xs_iflush_count);
@@ -2905,12 +2905,8 @@ xfs_iflush(
 		case XFS_IFLUSH_DELWRI_ELSE_SYNC:
 			flags = 0;
 			break;
-		case XFS_IFLUSH_ASYNC_NOBLOCK:
-		case XFS_IFLUSH_ASYNC:
-		case XFS_IFLUSH_DELWRI_ELSE_ASYNC:
-			flags = INT_ASYNC;
-			break;
 		case XFS_IFLUSH_DELWRI:
+		case XFS_IFLUSH_DELWRI_NOBLOCK:
 			flags = INT_DELWRI;
 			break;
 		default:
@@ -2920,15 +2916,11 @@ xfs_iflush(
 		}
 	} else {
 		switch (flags) {
+		case XFS_IFLUSH_DELWRI_NOBLOCK:
 		case XFS_IFLUSH_DELWRI_ELSE_SYNC:
-		case XFS_IFLUSH_DELWRI_ELSE_ASYNC:
 		case XFS_IFLUSH_DELWRI:
 			flags = INT_DELWRI;
 			break;
-		case XFS_IFLUSH_ASYNC_NOBLOCK:
-		case XFS_IFLUSH_ASYNC:
-			flags = INT_ASYNC;
-			break;
 		case XFS_IFLUSH_SYNC:
 			flags = 0;
 			break;
@@ -2971,13 +2963,10 @@ xfs_iflush(
 	if (error)
 		goto cluster_corrupt_out;
 
-	if (flags & INT_DELWRI) {
+	if (flags & INT_DELWRI)
 		xfs_bdwrite(mp, bp);
-	} else if (flags & INT_ASYNC) {
-		error = xfs_bawrite(mp, bp);
-	} else {
+	else
 		error = xfs_bwrite(mp, bp);
-	}
 	return error;
 
 corrupt_out:
@@ -3012,16 +3001,6 @@ xfs_iflush_int(
 	iip = ip->i_itemp;
 	mp = ip->i_mount;
 
-
-	/*
-	 * If the inode isn't dirty, then just release the inode
-	 * flush lock and do nothing.
-	 */
-	if (xfs_inode_clean(ip)) {
-		xfs_ifunlock(ip);
-		return 0;
-	}
-
 	/* set *dip = inode's place in the buffer */
 	dip = (xfs_dinode_t *)xfs_buf_offset(bp, ip->i_imap.im_boffset);
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index ec1f28c..559feeb 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -423,11 +423,9 @@ static inline void xfs_ifunlock(xfs_inode_t *ip)
  * Flags for xfs_iflush()
  */
 #define	XFS_IFLUSH_DELWRI_ELSE_SYNC	1
-#define	XFS_IFLUSH_DELWRI_ELSE_ASYNC	2
-#define	XFS_IFLUSH_SYNC			3
-#define	XFS_IFLUSH_ASYNC		4
-#define	XFS_IFLUSH_DELWRI		5
-#define	XFS_IFLUSH_ASYNC_NOBLOCK	6
+#define	XFS_IFLUSH_SYNC			2
+#define	XFS_IFLUSH_DELWRI		3
+#define	XFS_IFLUSH_DELWRI_NOBLOCK	4
 
 /*
  * Flags for xfs_itruncate_start().
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index f38855d..beb7d9f 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -867,10 +867,14 @@ xfs_inode_item_push(
 	       iip->ili_format.ilf_fields != 0);
 
 	/*
-	 * Write out the inode.  The completion routine ('iflush_done') will
-	 * pull it from the AIL, mark it clean, unlock the flush lock.
+	 * Push the inode to its backing buffer. This will not remove
+	 * the inode from the AIL - a further push will be required to
+	 * trigger a buffer push. However, this allows all the dirty
+	 * inodes to be pushed to the buffer before it is pushed to disk.
+	 * The buffer IO completion will pull the inode from the AIL,
+	 * mark it clean and unlock the flush lock.
 	 */
-	(void) xfs_iflush(ip, XFS_IFLUSH_ASYNC);
+	(void) xfs_iflush(ip, XFS_IFLUSH_DELWRI);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	return;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 223d9c3..16c4654 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1444,7 +1444,14 @@ xfs_unmountfs(
 	 * need to force the log first.
 	 */
 	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
-	xfs_reclaim_inodes(mp, XFS_IFLUSH_ASYNC);
+
+	/*
+	 * flush the delwri buffers before the reclaim so that it doesn't
+	 * block for a long time waiting to reclaim inodes that are already
+	 * in the delwri state.
+	 */
+	XFS_bflush(mp->m_ddev_targp);
+	xfs_reclaim_inodes(mp, XFS_IFLUSH_SYNC);
 
 	xfs_qm_unmount(mp);
 
-- 
1.6.5



* [PATCH 2/3] xfs: Don't issue buffer IO direct from AIL push
  2010-01-05  0:04 [PATCH 0/3] Kill async inode writeback V2 Dave Chinner
  2010-01-05  0:04 ` [PATCH 1/3] xfs: Use delayed write for inodes rather than async Dave Chinner
@ 2010-01-05  0:04 ` Dave Chinner
  2010-01-08 11:07   ` Christoph Hellwig
  2010-01-05  0:04 ` [PATCH 3/3] xfs: Sort delayed write buffers before dispatch Dave Chinner
  2010-01-06 18:08 ` [PATCH 0/3] Kill async inode writeback V2 Christoph Hellwig
  3 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2010-01-05  0:04 UTC (permalink / raw)
  To: xfs

All buffers logged into the AIL are marked as delayed write.
When the AIL needs to push the buffer out, it issues an async write of the
buffer. This means that IO patterns are dependent on the order of
buffers in the AIL.

Instead of flushing the buffer, promote the buffer in the delayed
write queue so that it will be flushed the next time the xfsbufd
runs. Return a new state to the xfsaild indicating that the buffer
was promoted so that the xfsaild knows it needs to wake the xfsbufd
to flush the promoted buffers.

Using the xfsbufd for issuing the IO allows us to dispatch all
buffer IO from the one queue. This means that we can make much more
enlightened decisions on what order to flush buffers to disk as
we don't have multiple places issuing IO. Optimisations to xfsbufd
will be in a future patch.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_buf.c    |   22 +++++++++
 fs/xfs/linux-2.6/xfs_buf.h    |   11 +++++
 fs/xfs/linux-2.6/xfs_trace.h  |    1 +
 fs/xfs/quota/xfs_dquot_item.c |   87 +++++------------------------------
 fs/xfs/quota/xfs_dquot_item.h |    4 --
 fs/xfs/xfs_buf_item.c         |   64 ++++++++++++++------------
 fs/xfs/xfs_inode_item.c       |   99 ++++++----------------------------------
 fs/xfs/xfs_inode_item.h       |    6 ---
 fs/xfs/xfs_trans_ail.c        |    7 +++
 9 files changed, 104 insertions(+), 197 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index 759cbaf..aaefc33 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -1631,6 +1631,28 @@ xfs_buf_delwri_dequeue(
 	trace_xfs_buf_delwri_dequeue(bp, _RET_IP_);
 }
 
+/*
+ * If a delwri buffer needs to be pushed before it has aged out, then
+ * promote it to the head of the delwri queue so that it will be flushed
+ * on the next xfsbufd run.
+ */
+void
+xfs_buf_delwri_promote(
+	xfs_buf_t	*bp)
+{
+	struct list_head *dwq = &bp->b_target->bt_delwrite_queue;
+	spinlock_t	*dwlk = &bp->b_target->bt_delwrite_lock;
+	long		age = xfs_buf_age_centisecs * msecs_to_jiffies(10) + 1;
+
+	spin_lock(dwlk);
+	ASSERT(bp->b_flags & XBF_DELWRI);
+	ASSERT(bp->b_flags & _XBF_DELWRI_Q);
+	list_del(&bp->b_list);
+	list_add(&bp->b_list, dwq);
+	bp->b_queuetime = jiffies - age;
+	spin_unlock(dwlk);
+}
+
 STATIC void
 xfs_buf_runall_queues(
 	struct workqueue_struct	*queue)
diff --git a/fs/xfs/linux-2.6/xfs_buf.h b/fs/xfs/linux-2.6/xfs_buf.h
index a34c7b5..a7c6895 100644
--- a/fs/xfs/linux-2.6/xfs_buf.h
+++ b/fs/xfs/linux-2.6/xfs_buf.h
@@ -261,6 +261,7 @@ extern int xfs_buf_ispin(xfs_buf_t *);
 
 /* Delayed Write Buffer Routines */
 extern void xfs_buf_delwri_dequeue(xfs_buf_t *);
+extern void xfs_buf_delwri_promote(xfs_buf_t *);
 
 /* Buffer Daemon Setup Routines */
 extern int xfs_buf_init(void);
@@ -424,6 +425,16 @@ extern void xfs_free_buftarg(struct xfs_mount *, struct xfs_buftarg *);
 extern void xfs_wait_buftarg(xfs_buftarg_t *);
 extern int xfs_setsize_buftarg(xfs_buftarg_t *, unsigned int, unsigned int);
 extern int xfs_flush_buftarg(xfs_buftarg_t *, int);
+
+/*
+ * run the xfsbufd on demand to age buffers. Use in combination with
+ * xfs_buf_delwri_promote() to flush delayed write buffers efficiently.
+ */
+static inline void xfs_flush_buftarg_delwri(xfs_buftarg_t *btp)
+{
+	wake_up_process(btp->bt_task);
+}
+
 #ifdef CONFIG_KDB_MODULES
 extern struct list_head *xfs_get_buftarg_list(void);
 #endif
diff --git a/fs/xfs/linux-2.6/xfs_trace.h b/fs/xfs/linux-2.6/xfs_trace.h
index 2b0819a..bba87b7 100644
--- a/fs/xfs/linux-2.6/xfs_trace.h
+++ b/fs/xfs/linux-2.6/xfs_trace.h
@@ -464,6 +464,7 @@ DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unlock);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_unlock_stale);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_committed);
 DEFINE_BUF_ITEM_EVENT(xfs_buf_item_push);
+DEFINE_BUF_ITEM_EVENT(xfs_buf_item_pushbuf);
 DEFINE_BUF_ITEM_EVENT(xfs_trans_get_buf);
 DEFINE_BUF_ITEM_EVENT(xfs_trans_get_buf_recur);
 DEFINE_BUF_ITEM_EVENT(xfs_trans_getsb);
diff --git a/fs/xfs/quota/xfs_dquot_item.c b/fs/xfs/quota/xfs_dquot_item.c
index d0d4a9a..bc7e00e 100644
--- a/fs/xfs/quota/xfs_dquot_item.c
+++ b/fs/xfs/quota/xfs_dquot_item.c
@@ -212,68 +212,33 @@ xfs_qm_dquot_logitem_pushbuf(
 	xfs_dquot_t	*dqp;
 	xfs_mount_t	*mp;
 	xfs_buf_t	*bp;
-	uint		dopush;
 
 	dqp = qip->qli_dquot;
 	ASSERT(XFS_DQ_IS_LOCKED(dqp));
 
 	/*
-	 * The qli_pushbuf_flag keeps others from
-	 * trying to duplicate our effort.
-	 */
-	ASSERT(qip->qli_pushbuf_flag != 0);
-	ASSERT(qip->qli_push_owner == current_pid());
-
-	/*
 	 * If flushlock isn't locked anymore, chances are that the
 	 * inode flush completed and the inode was taken off the AIL.
 	 * So, just get out.
 	 */
 	if (completion_done(&dqp->q_flush)  ||
 	    ((qip->qli_item.li_flags & XFS_LI_IN_AIL) == 0)) {
-		qip->qli_pushbuf_flag = 0;
 		xfs_dqunlock(dqp);
 		return;
 	}
 	mp = dqp->q_mount;
 	bp = xfs_incore(mp->m_ddev_targp, qip->qli_format.qlf_blkno,
-		    XFS_QI_DQCHUNKLEN(mp),
-		    XFS_INCORE_TRYLOCK);
-	if (bp != NULL) {
-		if (XFS_BUF_ISDELAYWRITE(bp)) {
-			dopush = ((qip->qli_item.li_flags & XFS_LI_IN_AIL) &&
-				  !completion_done(&dqp->q_flush));
-			qip->qli_pushbuf_flag = 0;
-			xfs_dqunlock(dqp);
-
-			if (XFS_BUF_ISPINNED(bp)) {
-				xfs_log_force(mp, (xfs_lsn_t)0,
-					      XFS_LOG_FORCE);
-			}
-			if (dopush) {
-				int	error;
-#ifdef XFSRACEDEBUG
-				delay_for_intr();
-				delay(300);
-#endif
-				error = xfs_bawrite(mp, bp);
-				if (error)
-					xfs_fs_cmn_err(CE_WARN, mp,
-	"xfs_qm_dquot_logitem_pushbuf: pushbuf error %d on qip %p, bp %p",
-							error, qip, bp);
-			} else {
-				xfs_buf_relse(bp);
-			}
-		} else {
-			qip->qli_pushbuf_flag = 0;
-			xfs_dqunlock(dqp);
-			xfs_buf_relse(bp);
-		}
+		    XFS_QI_DQCHUNKLEN(mp), XFS_INCORE_TRYLOCK);
+	if (!bp) {
+		xfs_dqunlock(dqp);
 		return;
 	}
 
-	qip->qli_pushbuf_flag = 0;
 	xfs_dqunlock(dqp);
+	if (XFS_BUF_ISDELAYWRITE(bp))
+		xfs_buf_delwri_promote(bp);
+	xfs_buf_relse(bp);
+	return;
 }
 
 /*
@@ -291,50 +256,24 @@ xfs_qm_dquot_logitem_trylock(
 	xfs_dq_logitem_t	*qip)
 {
 	xfs_dquot_t		*dqp;
-	uint			retval;
 
 	dqp = qip->qli_dquot;
 	if (atomic_read(&dqp->q_pincount) > 0)
-		return (XFS_ITEM_PINNED);
+		return XFS_ITEM_PINNED;
 
 	if (! xfs_qm_dqlock_nowait(dqp))
-		return (XFS_ITEM_LOCKED);
+		return XFS_ITEM_LOCKED;
 
-	retval = XFS_ITEM_SUCCESS;
 	if (!xfs_dqflock_nowait(dqp)) {
 		/*
-		 * The dquot is already being flushed.	It may have been
-		 * flushed delayed write, however, and we don't want to
-		 * get stuck waiting for that to complete.  So, we want to check
-		 * to see if we can lock the dquot's buffer without sleeping.
-		 * If we can and it is marked for delayed write, then we
-		 * hold it and send it out from the push routine.  We don't
-		 * want to do that now since we might sleep in the device
-		 * strategy routine.  We also don't want to grab the buffer lock
-		 * here because we'd like not to call into the buffer cache
-		 * while holding the AIL lock.
-		 * Make sure to only return PUSHBUF if we set pushbuf_flag
-		 * ourselves.  If someone else is doing it then we don't
-		 * want to go to the push routine and duplicate their efforts.
+		 * dquot has already been flushed to the backing buffer,
+		 * leave it locked, pushbuf routine will unlock it.
 		 */
-		if (qip->qli_pushbuf_flag == 0) {
-			qip->qli_pushbuf_flag = 1;
-			ASSERT(qip->qli_format.qlf_blkno == dqp->q_blkno);
-#ifdef DEBUG
-			qip->qli_push_owner = current_pid();
-#endif
-			/*
-			 * The dquot is left locked.
-			 */
-			retval = XFS_ITEM_PUSHBUF;
-		} else {
-			retval = XFS_ITEM_FLUSHING;
-			xfs_dqunlock_nonotify(dqp);
-		}
+		return XFS_ITEM_PUSHBUF;
 	}
 
 	ASSERT(qip->qli_item.li_flags & XFS_LI_IN_AIL);
-	return (retval);
+	return XFS_ITEM_SUCCESS;
 }
 
 
diff --git a/fs/xfs/quota/xfs_dquot_item.h b/fs/xfs/quota/xfs_dquot_item.h
index 5a63253..5acae2a 100644
--- a/fs/xfs/quota/xfs_dquot_item.h
+++ b/fs/xfs/quota/xfs_dquot_item.h
@@ -27,10 +27,6 @@ typedef struct xfs_dq_logitem {
 	xfs_log_item_t		 qli_item;	   /* common portion */
 	struct xfs_dquot	*qli_dquot;	   /* dquot ptr */
 	xfs_lsn_t		 qli_flush_lsn;	   /* lsn at last flush */
-	unsigned short		 qli_pushbuf_flag; /* 1 bit used in push_ail */
-#ifdef DEBUG
-	uint64_t		 qli_push_owner;
-#endif
 	xfs_dq_logformat_t	 qli_format;	   /* logged structure */
 } xfs_dq_logitem_t;
 
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index a30f7e9..0f30250 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -467,8 +467,10 @@ xfs_buf_item_unpin_remove(
 /*
  * This is called to attempt to lock the buffer associated with this
  * buf log item.  Don't sleep on the buffer lock.  If we can't get
- * the lock right away, return 0.  If we can get the lock, pull the
- * buffer from the free list, mark it busy, and return 1.
+ * the lock right away, return 0.  If we can get the lock, take a
+ * reference to the buffer. If this is a delayed write buffer that
+ * needs AIL help to be written back, invoke the pushbuf routine
+ * rather than the normal success path.
  */
 STATIC uint
 xfs_buf_item_trylock(
@@ -477,24 +479,18 @@ xfs_buf_item_trylock(
 	xfs_buf_t	*bp;
 
 	bp = bip->bli_buf;
-
-	if (XFS_BUF_ISPINNED(bp)) {
+	if (XFS_BUF_ISPINNED(bp))
 		return XFS_ITEM_PINNED;
-	}
-
-	if (!XFS_BUF_CPSEMA(bp)) {
+	if (!XFS_BUF_CPSEMA(bp))
 		return XFS_ITEM_LOCKED;
-	}
 
-	/*
-	 * Remove the buffer from the free list.  Only do this
-	 * if it's on the free list.  Private buffers like the
-	 * superblock buffer are not.
-	 */
+	/* take a reference to the buffer.  */
 	XFS_BUF_HOLD(bp);
 
 	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
 	trace_xfs_buf_item_trylock(bip);
+	if (XFS_BUF_ISDELAYWRITE(bp))
+		return XFS_ITEM_PUSHBUF;
 	return XFS_ITEM_SUCCESS;
 }
 
@@ -626,11 +622,9 @@ xfs_buf_item_committed(
 }
 
 /*
- * This is called to asynchronously write the buffer associated with this
- * buf log item out to disk. The buffer will already have been locked by
- * a successful call to xfs_buf_item_trylock().  If the buffer still has
- * B_DELWRI set, then get it going out to disk with a call to bawrite().
- * If not, then just release the buffer.
+ * The buffer is locked, but is not a delayed write buffer. This happens
+ * if we race with IO completion and hence we don't want to try to write it
+ * again. Just release the buffer.
  */
 STATIC void
 xfs_buf_item_push(
@@ -642,17 +636,29 @@ xfs_buf_item_push(
 	trace_xfs_buf_item_push(bip);
 
 	bp = bip->bli_buf;
+	ASSERT(!XFS_BUF_ISDELAYWRITE(bp));
+	xfs_buf_relse(bp);
+}
 
-	if (XFS_BUF_ISDELAYWRITE(bp)) {
-		int	error;
-		error = xfs_bawrite(bip->bli_item.li_mountp, bp);
-		if (error)
-			xfs_fs_cmn_err(CE_WARN, bip->bli_item.li_mountp,
-			"xfs_buf_item_push: pushbuf error %d on bip %p, bp %p",
-					error, bip, bp);
-	} else {
-		xfs_buf_relse(bp);
-	}
+/*
+ * The buffer is locked and is a delayed write buffer. Promote the buffer
+ * in the delayed write queue as the caller knows that they must invoke
+ * the xfsbufd to get this buffer written. We have to unlock the buffer
+ * to allow the xfsbufd to write it, too.
+ */
+STATIC void
+xfs_buf_item_pushbuf(
+	xfs_buf_log_item_t	*bip)
+{
+	xfs_buf_t	*bp;
+
+	ASSERT(!(bip->bli_flags & XFS_BLI_STALE));
+	trace_xfs_buf_item_pushbuf(bip);
+
+	bp = bip->bli_buf;
+	ASSERT(XFS_BUF_ISDELAYWRITE(bp));
+	xfs_buf_delwri_promote(bp);
+	xfs_buf_relse(bp);
 }
 
 /* ARGSUSED */
@@ -677,7 +683,7 @@ static struct xfs_item_ops xfs_buf_item_ops = {
 	.iop_committed	= (xfs_lsn_t(*)(xfs_log_item_t*, xfs_lsn_t))
 					xfs_buf_item_committed,
 	.iop_push	= (void(*)(xfs_log_item_t*))xfs_buf_item_push,
-	.iop_pushbuf	= NULL,
+	.iop_pushbuf	= (void(*)(xfs_log_item_t*))xfs_buf_item_pushbuf,
 	.iop_committing = (void(*)(xfs_log_item_t*, xfs_lsn_t))
 					xfs_buf_item_committing
 };
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index beb7d9f..0c4e719 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -602,33 +602,20 @@ xfs_inode_item_trylock(
 
 	if (!xfs_iflock_nowait(ip)) {
 		/*
-		 * If someone else isn't already trying to push the inode
-		 * buffer, we get to do it.
+		 * inode has already been flushed to the backing buffer,
+		 * leave it locked in shared mode, pushbuf routine will
+		 * unlock it.
 		 */
-		if (iip->ili_pushbuf_flag == 0) {
-			iip->ili_pushbuf_flag = 1;
-#ifdef DEBUG
-			iip->ili_push_owner = current_pid();
-#endif
-			/*
-			 * Inode is left locked in shared mode.
-			 * Pushbuf routine gets to unlock it.
-			 */
-			return XFS_ITEM_PUSHBUF;
-		} else {
-			/*
-			 * We hold the AIL lock, so we must specify the
-			 * NONOTIFY flag so that we won't double trip.
-			 */
-			xfs_iunlock(ip, XFS_ILOCK_SHARED|XFS_IUNLOCK_NONOTIFY);
-			return XFS_ITEM_FLUSHING;
-		}
-		/* NOTREACHED */
+		return XFS_ITEM_PUSHBUF;
 	}
 
 	/* Stale items should force out the iclog */
 	if (ip->i_flags & XFS_ISTALE) {
 		xfs_ifunlock(ip);
+		/*
+		 * we hold the AIL lock - notify the unlock routine of this
+		 * so it doesn't try to get the lock again.
+		 */
 		xfs_iunlock(ip, XFS_ILOCK_SHARED|XFS_IUNLOCK_NONOTIFY);
 		return XFS_ITEM_PINNED;
 	}
@@ -746,11 +733,8 @@ xfs_inode_item_committed(
  * This gets called by xfs_trans_push_ail(), when IOP_TRYLOCK
  * failed to get the inode flush lock but did get the inode locked SHARED.
  * Here we're trying to see if the inode buffer is incore, and if so whether it's
- * marked delayed write. If that's the case, we'll initiate a bawrite on that
- * buffer to expedite the process.
- *
- * We aren't holding the AIL lock (or the flush lock) when this gets called,
- * so it is inherently race-y.
+ * marked delayed write. If that's the case, we'll promote it and that will
+ * allow the caller to write the buffer by triggering the xfsbufd to run.
  */
 STATIC void
 xfs_inode_item_pushbuf(
@@ -759,26 +743,16 @@ xfs_inode_item_pushbuf(
 	xfs_inode_t	*ip;
 	xfs_mount_t	*mp;
 	xfs_buf_t	*bp;
-	uint		dopush;
 
 	ip = iip->ili_inode;
-
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED));
 
 	/*
-	 * The ili_pushbuf_flag keeps others from
-	 * trying to duplicate our effort.
-	 */
-	ASSERT(iip->ili_pushbuf_flag != 0);
-	ASSERT(iip->ili_push_owner == current_pid());
-
-	/*
 	 * If a flush is not in progress anymore, chances are that the
 	 * inode was taken off the AIL. So, just get out.
 	 */
 	if (completion_done(&ip->i_flush) ||
 	    ((iip->ili_item.li_flags & XFS_LI_IN_AIL) == 0)) {
-		iip->ili_pushbuf_flag = 0;
 		xfs_iunlock(ip, XFS_ILOCK_SHARED);
 		return;
 	}
@@ -787,54 +761,12 @@ xfs_inode_item_pushbuf(
 	bp = xfs_incore(mp->m_ddev_targp, iip->ili_format.ilf_blkno,
 		    iip->ili_format.ilf_len, XFS_INCORE_TRYLOCK);
 
-	if (bp != NULL) {
-		if (XFS_BUF_ISDELAYWRITE(bp)) {
-			/*
-			 * We were racing with iflush because we don't hold
-			 * the AIL lock or the flush lock. However, at this point,
-			 * we have the buffer, and we know that it's dirty.
-			 * So, it's possible that iflush raced with us, and
-			 * this item is already taken off the AIL.
-			 * If not, we can flush it async.
-			 */
-			dopush = ((iip->ili_item.li_flags & XFS_LI_IN_AIL) &&
-				  !completion_done(&ip->i_flush));
-			iip->ili_pushbuf_flag = 0;
-			xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
-			trace_xfs_inode_item_push(bp, _RET_IP_);
-
-			if (XFS_BUF_ISPINNED(bp)) {
-				xfs_log_force(mp, (xfs_lsn_t)0,
-					      XFS_LOG_FORCE);
-			}
-			if (dopush) {
-				int	error;
-				error = xfs_bawrite(mp, bp);
-				if (error)
-					xfs_fs_cmn_err(CE_WARN, mp,
-		"xfs_inode_item_pushbuf: pushbuf error %d on iip %p, bp %p",
-							error, iip, bp);
-			} else {
-				xfs_buf_relse(bp);
-			}
-		} else {
-			iip->ili_pushbuf_flag = 0;
-			xfs_iunlock(ip, XFS_ILOCK_SHARED);
-			xfs_buf_relse(bp);
-		}
-		return;
-	}
-	/*
-	 * We have to be careful about resetting pushbuf flag too early (above).
-	 * Even though in theory we can do it as soon as we have the buflock,
-	 * we don't want others to be doing work needlessly. They'll come to
-	 * this function thinking that pushing the buffer is their
-	 * responsibility only to find that the buffer is still locked by
-	 * another doing the same thing
-	 */
-	iip->ili_pushbuf_flag = 0;
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+	if (!bp)
+		return;
+	if (XFS_BUF_ISDELAYWRITE(bp))
+		xfs_buf_delwri_promote(bp);
+	xfs_buf_relse(bp);
 	return;
 }
 
@@ -938,7 +870,6 @@ xfs_inode_item_init(
 	/*
 	   We have zeroed memory. No need ...
 	   iip->ili_extents_buf = NULL;
-	   iip->ili_pushbuf_flag = 0;
 	 */
 
 	iip->ili_format.ilf_type = XFS_LI_INODE;
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index cc8df1a..9a46795 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -144,12 +144,6 @@ typedef struct xfs_inode_log_item {
 						      data exts */
 	struct xfs_bmbt_rec	*ili_aextents_buf; /* array of logged
 						      attr exts */
-	unsigned int            ili_pushbuf_flag;  /* one bit used in push_ail */
-
-#ifdef DEBUG
-	uint64_t                ili_push_owner;    /* one who sets pushbuf_flag
-						      above gets to push the buf */
-#endif
 #ifdef XFS_TRANS_DEBUG
 	int			ili_root_size;
 	char			*ili_orig_root;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 063dfbd..feb779a 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -253,6 +253,7 @@ xfsaild_push(
 	int		flush_log, count, stuck;
 	xfs_mount_t	*mp = ailp->xa_mount;
 	struct xfs_ail_cursor	*cur = &ailp->xa_cursors;
+	int		push_xfsbufd = 0;
 
 	spin_lock(&ailp->xa_lock);
 	xfs_trans_ail_cursor_init(ailp, cur);
@@ -308,6 +309,7 @@ xfsaild_push(
 			XFS_STATS_INC(xs_push_ail_pushbuf);
 			IOP_PUSHBUF(lip);
 			last_pushed_lsn = lsn;
+			push_xfsbufd = 1;
 			break;
 
 		case XFS_ITEM_PINNED:
@@ -374,6 +376,11 @@ xfsaild_push(
 		xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
 	}
 
+	if (push_xfsbufd) {
+		/* we've got delayed write buffers to flush */
+		xfs_flush_buftarg_delwri(mp->m_ddev_targp);
+	}
+
 	if (!count) {
 		/* We're past our target or empty, so idle */
 		last_pushed_lsn = 0;
-- 
1.6.5

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/3] xfs: Sort delayed write buffers before dispatch
  2010-01-05  0:04 [PATCH 0/3] Kill async inode writeback V2 Dave Chinner
  2010-01-05  0:04 ` [PATCH 1/3] xfs: Use delayed write for inodes rather than async Dave Chinner
  2010-01-05  0:04 ` [PATCH 2/3] xfs: Don't issue buffer IO direct from AIL push Dave Chinner
@ 2010-01-05  0:04 ` Dave Chinner
  2010-01-08 11:11   ` Christoph Hellwig
  2010-01-06 18:08 ` [PATCH 0/3] Kill async inode writeback V2 Christoph Hellwig
  3 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2010-01-05  0:04 UTC (permalink / raw)
  To: xfs

Currently when the xfsbufd writes delayed write buffers, it pushes
them to disk in the order they come off the delayed write list. If
there are lots of buffers spread widely over the disk, this results
in overwhelming the elevator sort queues in the block layer and we
end up losing the possibility of merging adjacent buffers to minimise
the number of IOs.

Use the new generic list_sort function to sort the delwri dispatch
queue before issue to ensure that the buffers are pushed in the most
friendly order possible to the lower layers.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_buf.c |   90 +++++++++++++++++++++++++++++++------------
 1 files changed, 65 insertions(+), 25 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_buf.c b/fs/xfs/linux-2.6/xfs_buf.c
index aaefc33..15c0dea 100644
--- a/fs/xfs/linux-2.6/xfs_buf.c
+++ b/fs/xfs/linux-2.6/xfs_buf.c
@@ -33,6 +33,7 @@
 #include <linux/migrate.h>
 #include <linux/backing-dev.h>
 #include <linux/freezer.h>
+#include <linux/sort.h>
 
 #include "xfs_sb.h"
 #include "xfs_inum.h"
@@ -1644,12 +1645,18 @@ xfs_buf_delwri_promote(
 	spinlock_t	*dwlk = &bp->b_target->bt_delwrite_lock;
 	long		age = xfs_buf_age_centisecs * msecs_to_jiffies(10) + 1;
 
-	spin_lock(dwlk);
 	ASSERT(bp->b_flags & XBF_DELWRI);
 	ASSERT(bp->b_flags & _XBF_DELWRI_Q);
-	list_del(&bp->b_list);
-	list_add(&bp->b_list, dwq);
+
+	/*
+	 * Check the buffer age before locking the delayed write queue as we
+	 * don't need to promote buffers that are already past the flush age.
+	 */
+	if (bp->b_queuetime < jiffies - age)
+		return;
 	bp->b_queuetime = jiffies - age;
+	spin_lock(dwlk);
+	list_move(&bp->b_list, dwq);
 	spin_unlock(dwlk);
 }
 
@@ -1723,14 +1730,42 @@ xfs_buf_delwri_split(
 
 }
 
+/*
+ * Compare function is more complex than it needs to be because
+ * the return value is only 32 bits and we are doing comparisons
+ * on 64 bit values
+ */
+int
+xfs_buf_cmp(
+	void		*priv,
+	struct list_head *a,
+	struct list_head *b)
+{
+	struct xfs_buf	*ap = container_of(a, struct xfs_buf, b_list);
+	struct xfs_buf	*bp = container_of(b, struct xfs_buf, b_list);
+	xfs_daddr_t		diff;
+
+	diff = ap->b_bn - bp->b_bn;
+	if (diff < 0)
+		return -1;
+	if (diff > 0)
+		return 1;
+	return 0;
+}
+
+void
+xfs_buf_delwri_sort(
+	xfs_buftarg_t	*target,
+	struct list_head *list)
+{
+	list_sort(NULL, list, xfs_buf_cmp);
+}
+
 STATIC int
 xfsbufd(
 	void		*data)
 {
-	struct list_head tmp;
-	xfs_buftarg_t	*target = (xfs_buftarg_t *)data;
-	int		count;
-	xfs_buf_t	*bp;
+	xfs_buftarg_t   *target = (xfs_buftarg_t *)data;
 
 	current->flags |= PF_MEMALLOC;
 
@@ -1739,6 +1774,8 @@ xfsbufd(
 	do {
 		long	age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
 		long	tout = age;
+		int	count = 0;
+		struct list_head tmp;
 
 		if (unlikely(freezing(current))) {
 			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
@@ -1753,11 +1790,10 @@ xfsbufd(
 		schedule_timeout_interruptible(tout);
 
 		xfs_buf_delwri_split(target, &tmp, age);
-		count = 0;
+		xfs_buf_delwri_sort(target, &tmp);
 		while (!list_empty(&tmp)) {
-			bp = list_entry(tmp.next, xfs_buf_t, b_list);
-			ASSERT(target == bp->b_target);
-
+			struct xfs_buf *bp;
+			bp = list_first_entry(&tmp, struct xfs_buf, b_list);
 			list_del_init(&bp->b_list);
 			xfs_buf_iostrategy(bp);
 			count++;
@@ -1783,38 +1819,42 @@ xfs_flush_buftarg(
 	xfs_buftarg_t	*target,
 	int		wait)
 {
-	struct list_head tmp;
-	xfs_buf_t	*bp, *n;
+	xfs_buf_t	*bp;
 	int		pincount = 0;
+	LIST_HEAD(tmp_list);
+	LIST_HEAD(wait_list);
 
 	xfs_buf_runall_queues(xfsconvertd_workqueue);
 	xfs_buf_runall_queues(xfsdatad_workqueue);
 	xfs_buf_runall_queues(xfslogd_workqueue);
 
 	set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
-	pincount = xfs_buf_delwri_split(target, &tmp, 0);
+	pincount = xfs_buf_delwri_split(target, &tmp_list, 0);
 
 	/*
-	 * Dropped the delayed write list lock, now walk the temporary list
+	 * Dropped the delayed write list lock, now walk the temporary list.
+	 * All I/O is issued async and then if we need to wait for completion
+	 * we do that after issuing all the IO.
 	 */
-	list_for_each_entry_safe(bp, n, &tmp, b_list) {
+	xfs_buf_delwri_sort(target, &tmp_list);
+	while (!list_empty(&tmp_list)) {
+		struct xfs_buf *bp;
+		bp = list_first_entry(&tmp_list, struct xfs_buf, b_list);
 		ASSERT(target == bp->b_target);
-		if (wait)
+		list_del_init(&bp->b_list);
+		if (wait) {
 			bp->b_flags &= ~XBF_ASYNC;
-		else
-			list_del_init(&bp->b_list);
-
+			list_add(&bp->b_list, &wait_list);
+		}
 		xfs_buf_iostrategy(bp);
 	}
 
 	if (wait)
 		blk_run_address_space(target->bt_mapping);
 
-	/*
-	 * Remaining list items must be flushed before returning
-	 */
-	while (!list_empty(&tmp)) {
-		bp = list_entry(tmp.next, xfs_buf_t, b_list);
+	/* Now wait for IO to complete if required. */
+	while (!list_empty(&wait_list)) {
+		bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
 
 		list_del_init(&bp->b_list);
 		xfs_iowait(bp);
-- 
1.6.5


* Re: [PATCH 0/3] Kill async inode writeback V2
  2010-01-05  0:04 [PATCH 0/3] Kill async inode writeback V2 Dave Chinner
                   ` (2 preceding siblings ...)
  2010-01-05  0:04 ` [PATCH 3/3] xfs: Sort delayed write buffers before dispatch Dave Chinner
@ 2010-01-06 18:08 ` Christoph Hellwig
  2010-01-06 22:49   ` Dave Chinner
  3 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2010-01-06 18:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Btw, after this series XFS_IFLUSH_DELWRI_ELSE_SYNC is also unused,
might be worth to throw something like the patch below in to clean
up xfs_iflush:

I'm also not sure we do enough of the noblock calls either with or
without your series.  There seem to be a lot more non-blocking sync
calls than iflush calls.

Index: linux-2.6/fs/xfs/xfs_inode.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_inode.c	2010-01-04 16:51:27.885262385 +0100
+++ linux-2.6/fs/xfs/xfs_inode.c	2010-01-04 17:03:28.096003992 +0100
@@ -2827,8 +2827,6 @@ xfs_iflush(
 	xfs_dinode_t		*dip;
 	xfs_mount_t		*mp;
 	int			error;
-	int			noblock = (flags == XFS_IFLUSH_DELWRI_NOBLOCK);
-	enum { INT_DELWRI = (1 << 0), INT_ASYNC = (1 << 1) };
 
 	XFS_STATS_INC(xs_iflush_count);
 
@@ -2860,7 +2858,7 @@ xfs_iflush(
 	 * in the same cluster are dirty, they will probably write the inode
 	 * out for us if they occur after the log force completes.
 	 */
-	if (noblock && xfs_ipincount(ip)) {
+	if ((flags & XFS_IFLUSH_NOBLOCK) && xfs_ipincount(ip)) {
 		xfs_iunpin_nowait(ip);
 		xfs_ifunlock(ip);
 		return EAGAIN;
@@ -2881,52 +2879,11 @@ xfs_iflush(
 	}
 
 	/*
-	 * Decide how buffer will be flushed out.  This is done before
-	 * the call to xfs_iflush_int because this field is zeroed by it.
-	 */
-	if (iip != NULL && iip->ili_format.ilf_fields != 0) {
-		/*
-		 * Flush out the inode buffer according to the directions
-		 * of the caller.  In the cases where the caller has given
-		 * us a choice choose the non-delwri case.  This is because
-		 * the inode is in the AIL and we need to get it out soon.
-		 */
-		switch (flags) {
-		case XFS_IFLUSH_SYNC:
-		case XFS_IFLUSH_DELWRI_ELSE_SYNC:
-			flags = 0;
-			break;
-		case XFS_IFLUSH_DELWRI:
-		case XFS_IFLUSH_DELWRI_NOBLOCK:
-			flags = INT_DELWRI;
-			break;
-		default:
-			ASSERT(0);
-			flags = 0;
-			break;
-		}
-	} else {
-		switch (flags) {
-		case XFS_IFLUSH_DELWRI_NOBLOCK:
-		case XFS_IFLUSH_DELWRI_ELSE_SYNC:
-		case XFS_IFLUSH_DELWRI:
-			flags = INT_DELWRI;
-			break;
-		case XFS_IFLUSH_SYNC:
-			flags = 0;
-			break;
-		default:
-			ASSERT(0);
-			flags = 0;
-			break;
-		}
-	}
-
-	/*
 	 * Get the buffer containing the on-disk inode.
 	 */
 	error = xfs_itobp(mp, NULL, ip, &dip, &bp,
-				noblock ? XFS_BUF_TRYLOCK : XFS_BUF_LOCK);
+				(flags & XFS_IFLUSH_NOBLOCK) ?
+				 XFS_BUF_TRYLOCK : XFS_BUF_LOCK);
 	if (error || !bp) {
 		xfs_ifunlock(ip);
 		return error;
@@ -2954,10 +2911,10 @@ xfs_iflush(
 	if (error)
 		goto cluster_corrupt_out;
 
-	if (flags & INT_DELWRI)
-		xfs_bdwrite(mp, bp);
-	else
+	if (flags & XFS_IFLUSH_SYNC)
 		error = xfs_bwrite(mp, bp);
+	else
+		xfs_bdwrite(mp, bp);
 	return error;
 
 corrupt_out:
Index: linux-2.6/fs/xfs/xfs_inode.h
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_inode.h	2010-01-04 16:59:53.765261226 +0100
+++ linux-2.6/fs/xfs/xfs_inode.h	2010-01-04 17:03:14.421006071 +0100
@@ -422,10 +422,8 @@ static inline void xfs_ifunlock(xfs_inod
 /*
  * Flags for xfs_iflush()
  */
-#define	XFS_IFLUSH_DELWRI_ELSE_SYNC	1
-#define	XFS_IFLUSH_SYNC			2
-#define	XFS_IFLUSH_DELWRI		3
-#define	XFS_IFLUSH_DELWRI_NOBLOCK	4
+#define	XFS_IFLUSH_SYNC		1
+#define	XFS_IFLUSH_NOBLOCK	2
 
 /*
  * Flags for xfs_itruncate_start().
Index: linux-2.6/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_super.c	2010-01-04 17:01:27.788003888 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_super.c	2010-01-04 17:01:33.362004189 +0100
@@ -1075,7 +1075,7 @@ xfs_fs_write_inode(
 		if (xfs_ipincount(ip) || !xfs_iflock_nowait(ip))
 			goto out_unlock;
 
-		error = xfs_iflush(ip, XFS_IFLUSH_DELWRI_NOBLOCK);
+		error = xfs_iflush(ip, XFS_IFLUSH_NOBLOCK);
 	}
 
  out_unlock:
Index: linux-2.6/fs/xfs/linux-2.6/xfs_sync.c
===================================================================
--- linux-2.6.orig/fs/xfs/linux-2.6/xfs_sync.c	2010-01-04 17:01:27.801255295 +0100
+++ linux-2.6/fs/xfs/linux-2.6/xfs_sync.c	2010-01-04 17:02:00.155005972 +0100
@@ -260,8 +260,7 @@ xfs_sync_inode_attr(
 		goto out_unlock;
 	}
 
-	error = xfs_iflush(ip, (flags & SYNC_WAIT) ?
-			   XFS_IFLUSH_SYNC : XFS_IFLUSH_DELWRI);
+	error = xfs_iflush(ip, (flags & SYNC_WAIT) ? XFS_IFLUSH_SYNC : 0);
 
  out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
@@ -460,7 +459,7 @@ xfs_quiesce_fs(
 {
 	int	count = 0, pincount;
 
-	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
+	xfs_reclaim_inodes(mp, 0);
 	xfs_flush_buftarg(mp->m_ddev_targp, 0);
 
 	/*
@@ -585,7 +584,7 @@ xfs_sync_worker(
 
 	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
 		xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE);
-		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
+		xfs_reclaim_inodes(mp, 0);
 		/* dgc: errors ignored here */
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
 		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
@@ -718,7 +717,7 @@ xfs_reclaim_inode(
 		 * xfs_iflush_done by locking and unlocking the flush lock.
 		 */
 		if (xfs_iflush(ip, sync_mode) == 0) {
-			if (sync_mode == XFS_IFLUSH_DELWRI)
+			if (sync_mode == 0)
 				goto unlock_and_requeue;
 			xfs_iflock(ip);
 			xfs_ifunlock(ip);
Index: linux-2.6/fs/xfs/xfs_inode_item.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_inode_item.c	2010-01-04 17:03:37.557003904 +0100
+++ linux-2.6/fs/xfs/xfs_inode_item.c	2010-01-04 17:03:42.356006322 +0100
@@ -806,7 +806,7 @@ xfs_inode_item_push(
 	 * will pull th einode from the AIL, mark it clean and unlock the flush
 	 * lock.
 	 */
-	(void) xfs_iflush(ip, XFS_IFLUSH_DELWRI);
+	(void) xfs_iflush(ip, 0);
 	xfs_iunlock(ip, XFS_ILOCK_SHARED);
 
 	return;
Index: linux-2.6/fs/xfs/xfs_mount.c
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_mount.c	2010-01-04 17:01:27.815003765 +0100
+++ linux-2.6/fs/xfs/xfs_mount.c	2010-01-04 17:02:16.218005804 +0100
@@ -1373,7 +1373,7 @@ xfs_unmountfs(
 	 * need to force the log first.
 	 */
 	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
-	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
+	xfs_reclaim_inodes(mp, 0);
 	XFS_bflush(mp->m_ddev_targp);
 
 	xfs_qm_unmount(mp);


* Re: [PATCH 0/3] Kill async inode writeback V2
  2010-01-06 18:08 ` [PATCH 0/3] Kill async inode writeback V2 Christoph Hellwig
@ 2010-01-06 22:49   ` Dave Chinner
  2010-01-08 10:14     ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2010-01-06 22:49 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Jan 06, 2010 at 01:08:00PM -0500, Christoph Hellwig wrote:
> Btw, after this series XFS_IFLUSH_DELWRI_ELSE_SYNC is also unused,
> might be worth to throw something like the patch below in to clean
> up xfs_iflush:

Yes, makes sense. I'll add the patch to my QA series after updating
it for the slight changes to the unmount reclaim I had in the
second posting of the patch. Can I get a signoff from you for this?

> I'm also not sure we do enough of the noblock calls either with or
> without your series.  There seem to be a lot more non-blocking sync
> calls than iflush calls.

I don't quite follow - inode flushes from the bdi threads should be
the majority of flushes (i.e. from xfs_fs_write_inode()) and they
are non-blocking. The xfssyncd does delwri writeback (maybe that
should be non-blocking and then we can get rid of that flag, too),
so the only sync inode writeback path is from xfs_fs_write_inode()
for sync flushing, as well as the unmount reclaim path....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 0/3] Kill async inode writeback V2
  2010-01-06 22:49   ` Dave Chinner
@ 2010-01-08 10:14     ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2010-01-08 10:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Jan 07, 2010 at 09:49:44AM +1100, Dave Chinner wrote:
> On Wed, Jan 06, 2010 at 01:08:00PM -0500, Christoph Hellwig wrote:
> > Btw, after this series XFS_IFLUSH_DELWRI_ELSE_SYNC is also unused,
> > might be worth to throw something like the patch below in to clean
> > up xfs_iflush:
> 
> Yes, makes sense. I'll add the patch to my QA series after updating
> it for the slight changes to the unmount reclaim I ahd in the
> second posting of the patch. Can I get a signoff from you for this?


Signed-off-by: Christoph Hellwig <hch@lst.de>

> 
> > I'm also not sure we do enough of the noblock calls either with or
> > without your series.  There seem to be a lot more non-blocking sync
> > calls than iflush calls.
> 
> I don't quite follow - inode flushes from the bdi threads should be
> the majority of flushes (i.e. from xfs_fs_write_inode()) and they
> are non-blocking. the xfssyncd does delwri writeback (maybe that
> should be non-blocking and then we can get rid of that flag, too),
> so the only sync inode writeback path is from xfs_fs_write_inode()
> for sync flushing, as well as the unmount reclaim path....

Sorry, I mean non-blocking delwri calls above.  xfs_sync_worker should
certainly be non-blocking as the whole daemon is operating that way. And
possibly xfs_sync_attr as well.


* Re: [PATCH 1/3] xfs: Use delayed write for inodes rather than async
  2010-01-05  0:04 ` [PATCH 1/3] xfs: Use delayed write for inodes rather than async Dave Chinner
@ 2010-01-08 10:36   ` Christoph Hellwig
  2010-01-08 11:05     ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2010-01-08 10:36 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +++ b/fs/xfs/linux-2.6/xfs_sync.c
> @@ -460,8 +460,8 @@ xfs_quiesce_fs(
>  {
>  	int	count = 0, pincount;
>  
> +	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
>  	xfs_flush_buftarg(mp->m_ddev_targp, 0);
> -	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);

Hmm.  I think the current code here is simply wrong.  We do need to
flush all delwri buffers after the inode reclaim.  Maybe we should
get this hunk in for .33?

> -		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
> +		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
>  		/* dgc: errors ignored here */
>  		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
>  		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
> @@ -687,7 +687,7 @@ xfs_reclaim_inode(
>  		spin_unlock(&ip->i_flags_lock);
>  		write_unlock(&pag->pag_ici_lock);
>  		xfs_perag_put(pag);
> -		return -EAGAIN;
> +		return EAGAIN;

Unrelated bug in the upsteam code.  But your inode direct reclaim
changes should sort this out already.

> @@ -3012,16 +3001,6 @@ xfs_iflush_int(
>  	iip = ip->i_itemp;
>  	mp = ip->i_mount;
>  
> -
> -	/*
> -	 * If the inode isn't dirty, then just release the inode
> -	 * flush lock and do nothing.
> -	 */
> -	if (xfs_inode_clean(ip)) {
> -		xfs_ifunlock(ip);
> -		return 0;
> -	}
> -

while we now have this check in xfs_reclaim_inode there still are
various other callers.  Did you audit them all to make sure we don't
need the check here anymore?

> index 223d9c3..16c4654 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -1444,7 +1444,14 @@ xfs_unmountfs(
>  	 * need to force the log first.
>  	 */
>  	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
> -	xfs_reclaim_inodes(mp, XFS_IFLUSH_ASYNC);
> +
> +	/*
> +	 * flush the delwri buffers before the reclaim so that it doesn't
> +	 * block for a long time waiting to reclaim inodes that are already
> +	 * in the delwri state.
> +	 */
> +	XFS_bflush(mp->m_ddev_targp);
> +	xfs_reclaim_inodes(mp, XFS_IFLUSH_SYNC);

Wouldn't it be more efficient to also write them out delwri and then
flush out the delwri queue again?

Either way the current code seems fishy to me with an async writeout
here.


* Re: [PATCH 1/3] xfs: Use delayed write for inodes rather than async
  2010-01-08 10:36   ` Christoph Hellwig
@ 2010-01-08 11:05     ` Dave Chinner
  2010-01-08 11:14       ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Dave Chinner @ 2010-01-08 11:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jan 08, 2010 at 05:36:21AM -0500, Christoph Hellwig wrote:
> > +++ b/fs/xfs/linux-2.6/xfs_sync.c
> > @@ -460,8 +460,8 @@ xfs_quiesce_fs(
> >  {
> >  	int	count = 0, pincount;
> >  
> > +	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
> >  	xfs_flush_buftarg(mp->m_ddev_targp, 0);
> > -	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
> 
> Hmm.  I think the current code here is simply wrong.  We do need to
> flush all delwri buffers after the inode reclaim.  Maybe we should
> get this hunk in for .33?

I don't think it really matters for the existing code as we do the
xfs_flush_buftarg(SYNC_WAIT) in the loop below which will push out
inodes flushed during reclaim.

Hmmm - given that xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI) can skip
inodes, there probably should be a sync reclaim done in the flush
loop to ensure we've caught them.

> > -		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
> > +		xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
> >  		/* dgc: errors ignored here */
> >  		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
> >  		error = xfs_sync_fsdata(mp, SYNC_TRYLOCK);
> > @@ -687,7 +687,7 @@ xfs_reclaim_inode(
> >  		spin_unlock(&ip->i_flags_lock);
> >  		write_unlock(&pag->pag_ici_lock);
> >  		xfs_perag_put(pag);
> > -		return -EAGAIN;
> > +		return EAGAIN;
> 
> Unrelated bug in the upsteam code.  But your inode direct reclaim
> changes should sort this out already.

*nod*

> > @@ -3012,16 +3001,6 @@ xfs_iflush_int(
> >  	iip = ip->i_itemp;
> >  	mp = ip->i_mount;
> >  
> > -
> > -	/*
> > -	 * If the inode isn't dirty, then just release the inode
> > -	 * flush lock and do nothing.
> > -	 */
> > -	if (xfs_inode_clean(ip)) {
> > -		xfs_ifunlock(ip);
> > -		return 0;
> > -	}
> > -
> 
> while we now have this check in xfs_reclaim_inode there still are
> various other callers.  Did you audit them all to make sure we don't
> need the check here anymore?

Yes - xfs_iflush_int() gets called only from xfs_iflush() and
xfs_iflush_cluster() and both check first.

> > index 223d9c3..16c4654 100644
> > --- a/fs/xfs/xfs_mount.c
> > +++ b/fs/xfs/xfs_mount.c
> > @@ -1444,7 +1444,14 @@ xfs_unmountfs(
> >  	 * need to force the log first.
> >  	 */
> >  	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
> > -	xfs_reclaim_inodes(mp, XFS_IFLUSH_ASYNC);
> > +
> > +	/*
> > +	 * flush the delwri buffers before the reclaim so that it doesn't
> > +	 * block for a long time waiting to reclaim inodes that are already
> > +	 * in the delwri state.
> > +	 */
> > +	XFS_bflush(mp->m_ddev_targp);
> > +	xfs_reclaim_inodes(mp, XFS_IFLUSH_SYNC);
> 
> Wouldn't it be more efficient to also write them out delwri and then
> flush out the delwri queue again?

The delayed write flush can skip inodes, so we need to do a sync
flush to guarantee that we reclaim all dirty inodes. The flush is done
first so the sync flush doesn't block on the flush locks for too
long for inodes that are already locked for delwri flushing.
Perhaps a:

	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
	XFS_bflush(mp->m_ddev_targp);
	xfs_reclaim_inodes(mp, XFS_IFLUSH_SYNC);

sequence would be better here?

> Either way the current code seems fishy to me with an async writeout
> here.

Agreed.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs


* Re: [PATCH 2/3] xfs: Don't issue buffer IO direct from AIL push
  2010-01-05  0:04 ` [PATCH 2/3] xfs: Don't issue buffer IO direct from AIL push Dave Chinner
@ 2010-01-08 11:07   ` Christoph Hellwig
  2010-01-08 11:15     ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2010-01-08 11:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +/*
> + * If a delwri buffer needs to be pushed before it has aged out, then
> + * promote it to the head of the delwri queue so that it will be flushed
> + * on the next xfsbufd run.
> + */
> +void
> +xfs_buf_delwri_promote(
> +	xfs_buf_t	*bp)
> +{
> +	struct list_head *dwq = &bp->b_target->bt_delwrite_queue;
> +	spinlock_t	*dwlk = &bp->b_target->bt_delwrite_lock;
> +	long		age = xfs_buf_age_centisecs * msecs_to_jiffies(10) + 1;
> +
> +	spin_lock(dwlk);
> +	ASSERT(bp->b_flags & XBF_DELWRI);
> +	ASSERT(bp->b_flags & _XBF_DELWRI_Q);
> +	list_del(&bp->b_list);
> +	list_add(&bp->b_list, dwq);
> +	bp->b_queuetime = jiffies - age;
> +	spin_unlock(dwlk);

Sorry for the nitpicking, but:

 a) can you use the struct types instead of the typedefs where possible?
 b) second the pointer to spinlock style used here like in some other
    buf code is rather odd.  What about this instead:

void
xfs_buf_delwri_promote(
	struct xfs_buf		*bp)
{
	struct xfs_buftarg	*target = bp->b_target;

	spin_lock(&target->bt_delwrite_lock);
	ASSERT(bp->b_flags & XBF_DELWRI);
	ASSERT(bp->b_flags & _XBF_DELWRI_Q);

	list_move(&bp->b_list, &target->bt_delwrite_queue);
	bp->b_queuetime = jiffies -
			  xfs_buf_age_centisecs * msecs_to_jiffies(10) - 1;
	spin_unlock(&target->bt_delwrite_lock);
}

Also the queuetime calculation could use some comments.

>  extern void xfs_wait_buftarg(xfs_buftarg_t *);
>  extern int xfs_setsize_buftarg(xfs_buftarg_t *, unsigned int, unsigned int);
>  extern int xfs_flush_buftarg(xfs_buftarg_t *, int);
> +
> +/*
> + * run the xfsbufd on demand to age buffers. Use in combination with
> + * xfs_buf_delwri_promote() to flush delayed write buffers efficiently.
> + */
> +static inline void xfs_flush_buftarg_delwri(xfs_buftarg_t *btp)
> +{
> +	wake_up_process(btp->bt_task);
> +}

The function name is extremely misleading.  It's an xfsbufd wakeup, so it
should be named like that.  If in doubt, I'd just open-code the
wake_up_process call instead.


The changes to the various log items look good, especially as we bring
some more commonality into the various items.


You removed the only call to trace_xfs_inode_item_push, so you might
as well remove the trace point declaration, too.


* Re: [PATCH 3/3] xfs: Sort delayed write buffers before dispatch
  2010-01-05  0:04 ` [PATCH 3/3] xfs: Sort delayed write buffers before dispatch Dave Chinner
@ 2010-01-08 11:11   ` Christoph Hellwig
  2010-01-08 11:17     ` Dave Chinner
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2010-01-08 11:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +/*
> + * Compare function is more complex than it needs to be because
> + * the return value is only 32 bits and we are doing comparisons
> + * on 64 bit values
> + */
> +int
> +xfs_buf_cmp(

Should be marked static.

> +void
> +xfs_buf_delwri_sort(
> +	xfs_buftarg_t	*target,
> +	struct list_head *list)
> +{
> +	list_sort(NULL, list, xfs_buf_cmp);
> +}

Same here.  Not sure I would even bother with the wrapper.  Also the
first argument is entirely unused.

>  STATIC int
>  xfsbufd(
>  	void		*data)
>  {
> +	xfs_buftarg_t   *target = (xfs_buftarg_t *)data;
>  
>  	current->flags |= PF_MEMALLOC;
>  
> @@ -1739,6 +1774,8 @@ xfsbufd(
>  	do {
>  		long	age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
>  		long	tout = age;
> +		int	count = 0;
> +		struct list_head tmp;
>  
>  		if (unlikely(freezing(current))) {
>  			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> @@ -1753,11 +1790,10 @@ xfsbufd(
>  		schedule_timeout_interruptible(tout);
>  
>  		xfs_buf_delwri_split(target, &tmp, age);
> +		xfs_buf_delwri_sort(target, &tmp);
>  		while (!list_empty(&tmp)) {
> +			struct xfs_buf *bp;
> +			bp = list_first_entry(&tmp, struct xfs_buf, b_list);
>  			list_del_init(&bp->b_list);
>  			xfs_buf_iostrategy(bp);
>  			count++;
> 
>  
>  	if (wait)
>  		blk_run_address_space(target->bt_mapping);
>  
> +	/* Now wait for IO to complete if required. */
> +	while (!list_empty(&wait_list)) {
> +		bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
>  
>  		list_del_init(&bp->b_list);
>  		xfs_iowait(bp);

As a tiny optimization you might want to move this into the if (wait)
block


* Re: [PATCH 1/3] xfs: Use delayed write for inodes rather than async
  2010-01-08 11:05     ` Dave Chinner
@ 2010-01-08 11:14       ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2010-01-08 11:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Fri, Jan 08, 2010 at 10:05:24PM +1100, Dave Chinner wrote:
> I don't think it really matters for the existing code as we do the
> xfs_flush_buftarg(SYNC_WAIT) in the loop below which will push out
> inodes flushed during reclaim.

True.

> Hmmm - given that xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI) can skip
> inodes, there probably should be a sync reclaim done in the flush
> loop to ensure we've caught them.

Indeed, the skipping behaviour is rather confusing and needs to be taken
care of.

> Yes - xfs_iflush_int() gets called only from xfs_iflush() and
> xfs_iflush_cluster() and both check first.

Ok.

> The delayed write flush can skip inodes, so we need to do a sync
> flush to guarantee that we reclaim all dirty inodes. The flush is done
> first so the sync flush doesn't block on the flush locks for too
> long for inodes that are already locked for delwri flushing.
> Perhaps a:
> 
> 	xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
> 	XFS_bflush(mp->m_ddev_targp);
> 	xfs_reclaim_inodes(mp, XFS_IFLUSH_SYNC);
> 
> sequence would be better here?

I guess that would be optimal.  Maybe with a little comment explaining
why we do it.
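Something along these lines, perhaps - just a sketch of the comment,
wrapped around the sequence quoted above:

```c
/*
 * Reclaim all inodes.  Do a delwri reclaim pass first so that dirty
 * inodes are queued to their backing buffers without blocking, then
 * flush the delwri queue to get those buffers to disk, and finally
 * do a synchronous reclaim pass to catch any inodes the delwri pass
 * had to skip (e.g. because they were already flush locked).
 */
xfs_reclaim_inodes(mp, XFS_IFLUSH_DELWRI);
XFS_bflush(mp->m_ddev_targp);
xfs_reclaim_inodes(mp, XFS_IFLUSH_SYNC);
```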


* Re: [PATCH 2/3] xfs: Don't issue buffer IO direct from AIL push
  2010-01-08 11:07   ` Christoph Hellwig
@ 2010-01-08 11:15     ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-01-08 11:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jan 08, 2010 at 06:07:19AM -0500, Christoph Hellwig wrote:
> > +/*
> > + * If a delwri buffer needs to be pushed before it has aged out, then
> > + * promote it to the head of the delwri queue so that it will be flushed
> > + * on the next xfsbufd run.
> > + */
> > +void
> > +xfs_buf_delwri_promote(
> > +	xfs_buf_t	*bp)
> > +{
> > +	struct list_head *dwq = &bp->b_target->bt_delwrite_queue;
> > +	spinlock_t	*dwlk = &bp->b_target->bt_delwrite_lock;
> > +	long		age = xfs_buf_age_centisecs * msecs_to_jiffies(10) + 1;
> > +
> > +	spin_lock(dwlk);
> > +	ASSERT(bp->b_flags & XBF_DELWRI);
> > +	ASSERT(bp->b_flags & _XBF_DELWRI_Q);
> > +	list_del(&bp->b_list);
> > +	list_add(&bp->b_list, dwq);
> > +	bp->b_queuetime = jiffies - age;
> > +	spin_unlock(dwlk);
> 
> Sorry for the nitpicking, but:
> 
>  a) can you use the struct types instead of the typedefs where possible?

Sure - I missed that one when I went back over this patch after a
c'n'p to create the function.

>  b) second the pointer to spinlock style used here like in some other
>     buf code is rather odd.  What about this instead:
> 
> void
> xfs_buf_delwri_promote(
> 	struct xfs_buf		*bp)
> {
> 	struct xfs_buftarg	*target = bp->b_target;
> 
> 	spin_lock(&target->bt_delwrite_lock);
> 	ASSERT(bp->b_flags & XBF_DELWRI);
> 	ASSERT(bp->b_flags & _XBF_DELWRI_Q);
> 
> 	list_move(&bp->b_list, &target->bt_delwrite_queue);
> 	bp->b_queuetime = jiffies -
> 			  xfs_buf_age_centisecs * msecs_to_jiffies(10) - 1;
> 	spin_unlock(&target->bt_delwrite_lock);
> }

Yup, will change.

> Also the queuetime calculation could use some comments.

OK, will do.

> >  extern void xfs_wait_buftarg(xfs_buftarg_t *);
> >  extern int xfs_setsize_buftarg(xfs_buftarg_t *, unsigned int, unsigned int);
> >  extern int xfs_flush_buftarg(xfs_buftarg_t *, int);
> > +
> > +/*
> > + * run the xfsbufd on demand to age buffers. Use in combination with
> > + * xfs_buf_delwri_promote() to flush delayed write buffers efficiently.
> > + */
> > +static inline void xfs_flush_buftarg_delwri(xfs_buftarg_t *btp)
> > +{
> > +	wake_up_process(btp->bt_task);
> > +}
> 
> The function name is extremely misleading.  It's an xfsbufd wakeup, so it
> should be named like that.  If in doubt, I'd just open-code the
> wake_up_process call instead.

OK, I just didn't want to be digging deep into the buftarg structure
in places that don't really know about it.

> The changes to the various log items look good, especially as we bring
> some more commonality into the various items.
> 
> 
> You removed the only call to trace_xfs_inode_item_push, so you might
> as well remove the trace point declaration, too.

Will do.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 3/3] xfs: Sort delayed write buffers before dispatch
  2010-01-08 11:11   ` Christoph Hellwig
@ 2010-01-08 11:17     ` Dave Chinner
  0 siblings, 0 replies; 15+ messages in thread
From: Dave Chinner @ 2010-01-08 11:17 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Jan 08, 2010 at 06:11:52AM -0500, Christoph Hellwig wrote:
> > +/*
> > + * Compare function is more complex than it needs to be because
> > + * the return value is only 32 bits and we are doing comparisons
> > + * on 64 bit values
> > + */
> > +int
> > +xfs_buf_cmp(
> 
> Should be marked static.
> 
> > +void
> > +xfs_buf_delwri_sort(
> > +	xfs_buftarg_t	*target,
> > +	struct list_head *list)
> > +{
> > +	list_sort(NULL, list, xfs_buf_cmp);
> > +}
> 
> Same here.  Not sure I would even bother with the wrapper.  Also the
> first argument is entirely unused.
> 
> >  STATIC int
> >  xfsbufd(
> >  	void		*data)
> >  {
> > +	xfs_buftarg_t   *target = (xfs_buftarg_t *)data;
> >  
> >  	current->flags |= PF_MEMALLOC;
> >  
> > @@ -1739,6 +1774,8 @@ xfsbufd(
> >  	do {
> >  		long	age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
> >  		long	tout = age;
> > +		int	count = 0;
> > +		struct list_head tmp;
> >  
> >  		if (unlikely(freezing(current))) {
> >  			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
> > @@ -1753,11 +1790,10 @@ xfsbufd(
> >  		schedule_timeout_interruptible(tout);
> >  
> >  		xfs_buf_delwri_split(target, &tmp, age);
> > +		xfs_buf_delwri_sort(target, &tmp);
> >  		while (!list_empty(&tmp)) {
> > +			struct xfs_buf *bp;
> > +			bp = list_first_entry(&tmp, struct xfs_buf, b_list);
> >  			list_del_init(&bp->b_list);
> >  			xfs_buf_iostrategy(bp);
> >  			count++;
> > 
> >  
> >  	if (wait)
> >  		blk_run_address_space(target->bt_mapping);
> >  
> > +	/* Now wait for IO to complete if required. */
> > +	while (!list_empty(&wait_list)) {
> > +		bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
> >  
> >  		list_del_init(&bp->b_list);
> >  		xfs_iowait(bp);
> 
> As a tiny optimization you might want to move this into the if (wait)
> block

All makes sense, will update.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2010-01-08 11:16 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-01-05  0:04 [PATCH 0/3] Kill async inode writeback V2 Dave Chinner
2010-01-05  0:04 ` [PATCH 1/3] xfs: Use delayed write for inodes rather than async Dave Chinner
2010-01-08 10:36   ` Christoph Hellwig
2010-01-08 11:05     ` Dave Chinner
2010-01-08 11:14       ` Christoph Hellwig
2010-01-05  0:04 ` [PATCH 2/3] xfs: Don't issue buffer IO direct from AIL push Dave Chinner
2010-01-08 11:07   ` Christoph Hellwig
2010-01-08 11:15     ` Dave Chinner
2010-01-05  0:04 ` [PATCH 3/3] xfs: Sort delayed write buffers before dispatch Dave Chinner
2010-01-08 11:11   ` Christoph Hellwig
2010-01-08 11:17     ` Dave Chinner
2010-01-06 18:08 ` [PATCH 0/3] Kill async inode writeback V2 Christoph Hellwig
2010-01-06 22:49   ` Dave Chinner
2010-01-08 10:14     ` Christoph Hellwig
  -- strict thread matches above, loose matches on Subject: below --
2010-01-02  3:03 [RFC, PATCH 0/3] Kill async inode writeback Dave Chinner
2010-01-02  3:03 ` [PATCH 1/3] XFS: Use delayed write for inodes rather than async Dave Chinner
