[PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2

public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2
@ 2008-07-23  0:41 Dave Chinner
  2008-07-23  0:41 ` [PATCH 1/4] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V2 Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Dave Chinner @ 2008-07-23  0:41 UTC (permalink / raw)
  To: xfs

The list of all inodes on a mount is superfluous. We can traverse
all inodes now by walking the per-AG inode radix trees without
needing a separate list. This enables us to remove a bunch of
complex list traversal code and remove another two pointers from
the xfs_inode.

Also, by replacing the sync traversal with an ascending inode
number traversal, we will issue better inode I/O patterns for
writeback triggered by xfssyncd or unmount.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/4] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V2
  2008-07-23  0:41 [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Dave Chinner
@ 2008-07-23  0:41 ` Dave Chinner
  2008-07-23  0:41 ` [PATCH 2/4] XFS: Use the inode tree for finding dirty inodes V2 Dave Chinner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2008-07-23  0:41 UTC (permalink / raw)
  To: xfs; +Cc: Dave Chinner

xfs_iflush_all() walks the m_inodes list to find inodes that
need reclaiming. We already have such a list - the m_del_inodes
list. Replace xfs_iflush_all() with a call to xfs_finish_reclaim_all()
and clean up xfs_finish_reclaim_all() to handle the different flush
modes now needed.

Originally based on a patch from Christoph Hellwig.

Version 2
o revert xfs_syncsub() inode reclaim behaviour back to original
  code
o xfs_quiesce_fs() should use XFS_IFLUSH_DELWRI_ELSE_ASYNC, not
  XFS_IFLUSH_ASYNC, to prevent change of behaviour.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_inode.c    |   39 ---------------------------------------
 fs/xfs/xfs_inode.h    |    3 +--
 fs/xfs/xfs_mount.c    |    2 +-
 fs/xfs/xfs_vfsops.c   |    4 ++--
 fs/xfs/xfs_vnodeops.c |   42 ++++++++++++++++++------------------------
 5 files changed, 22 insertions(+), 68 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 20b6f87..ae19b05 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3456,45 +3456,6 @@ corrupt_out:
 	return XFS_ERROR(EFSCORRUPTED);
 }
 
-
-/*
- * Flush all inactive inodes in mp.
- */
-void
-xfs_iflush_all(
-	xfs_mount_t	*mp)
-{
-	xfs_inode_t	*ip;
-	bhv_vnode_t	*vp;
-
- again:
-	XFS_MOUNT_ILOCK(mp);
-	ip = mp->m_inodes;
-	if (ip == NULL)
-		goto out;
-
-	do {
-		/* Make sure we skip markers inserted by sync */
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
-			continue;
-		}
-
-		vp = XFS_ITOV_NULL(ip);
-		if (!vp) {
-			XFS_MOUNT_IUNLOCK(mp);
-			xfs_finish_reclaim(ip, 0, XFS_IFLUSH_ASYNC);
-			goto again;
-		}
-
-		ASSERT(vn_count(vp) == 0);
-
-		ip = ip->i_mnext;
-	} while (ip != mp->m_inodes);
- out:
-	XFS_MOUNT_IUNLOCK(mp);
-}
-
 #ifdef XFS_ILOCK_TRACE
 ktrace_t	*xfs_ilock_trace_buf;
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 17a04b6..7ce41d3 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -480,7 +480,7 @@ void		xfs_iunlock_map_shared(xfs_inode_t *, uint);
 void		xfs_ifunlock(xfs_inode_t *);
 void		xfs_ireclaim(xfs_inode_t *);
 int		xfs_finish_reclaim(xfs_inode_t *, int, int);
-int		xfs_finish_reclaim_all(struct xfs_mount *, int);
+int		xfs_finish_reclaim_all(struct xfs_mount *, int, int);
 
 /*
  * xfs_inode.c prototypes.
@@ -518,7 +518,6 @@ void		xfs_ipin(xfs_inode_t *);
 void		xfs_iunpin(xfs_inode_t *);
 int		xfs_iextents_copy(xfs_inode_t *, xfs_bmbt_rec_t *, int);
 int		xfs_iflush(xfs_inode_t *, uint);
-void		xfs_iflush_all(struct xfs_mount *);
 void		xfs_ichgtime(xfs_inode_t *, int);
 xfs_fsize_t	xfs_file_last_byte(xfs_inode_t *);
 void		xfs_lock_inodes(xfs_inode_t **, int, uint);
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 6c5d132..0c23f6a 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1268,7 +1268,7 @@ xfs_unmountfs(xfs_mount_t *mp)
 	 * need to force the log first.
 	 */
 	xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
-	xfs_iflush_all(mp);
+	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_ASYNC);
 
 	XFS_QM_DQPURGEALL(mp, XFS_QMOPT_QUOTALL | XFS_QMOPT_UMOUNTING);
 
diff --git a/fs/xfs/xfs_vfsops.c b/fs/xfs/xfs_vfsops.c
index 4a9a433..ab3f004 100644
--- a/fs/xfs/xfs_vfsops.c
+++ b/fs/xfs/xfs_vfsops.c
@@ -65,7 +65,7 @@ xfs_quiesce_fs(
 	int			count = 0, pincount;
 
 	xfs_flush_buftarg(mp->m_ddev_targp, 0);
-	xfs_finish_reclaim_all(mp, 0);
+	xfs_finish_reclaim_all(mp, 0, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 
 	/* This loop must run at least twice.
 	 * The first instance of the loop will flush
@@ -654,7 +654,7 @@ xfs_syncsub(
 
 	if (flags & (SYNC_ATTR|SYNC_DELWRI)) {
 		if (flags & SYNC_BDFLUSH)
-			xfs_finish_reclaim_all(mp, 1);
+			xfs_finish_reclaim_all(mp, 1, XFS_IFLUSH_DELWRI_ELSE_ASYNC);
 		else
 			error = xfs_sync_inodes(mp, flags, bypassed);
 	}
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index b792a12..2af1be3 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2985,36 +2985,30 @@ xfs_finish_reclaim(
 }
 
 int
-xfs_finish_reclaim_all(xfs_mount_t *mp, int noblock)
+xfs_finish_reclaim_all(
+	xfs_mount_t	*mp,
+	int		 noblock,
+	int		mode)
 {
-	int		purged;
 	xfs_inode_t	*ip, *n;
-	int		done = 0;
 
-	while (!done) {
-		purged = 0;
-		XFS_MOUNT_ILOCK(mp);
-		list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
-			if (noblock) {
-				if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
-					continue;
-				if (xfs_ipincount(ip) ||
-				    !xfs_iflock_nowait(ip)) {
-					xfs_iunlock(ip, XFS_ILOCK_EXCL);
-					continue;
-				}
+restart:
+	XFS_MOUNT_ILOCK(mp);
+	list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
+		if (noblock) {
+			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
+				continue;
+			if (xfs_ipincount(ip) ||
+			    !xfs_iflock_nowait(ip)) {
+				xfs_iunlock(ip, XFS_ILOCK_EXCL);
+				continue;
 			}
-			XFS_MOUNT_IUNLOCK(mp);
-			if (xfs_finish_reclaim(ip, noblock,
-					XFS_IFLUSH_DELWRI_ELSE_ASYNC))
-				delay(1);
-			purged = 1;
-			break;
 		}
-
-		done = !purged;
+		XFS_MOUNT_IUNLOCK(mp);
+		if (xfs_finish_reclaim(ip, noblock, mode))
+			delay(1);
+		goto restart;
 	}
-
 	XFS_MOUNT_IUNLOCK(mp);
 	return 0;
 }
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/4] XFS: Use the inode tree for finding dirty inodes V2
  2008-07-23  0:41 [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Dave Chinner
  2008-07-23  0:41 ` [PATCH 1/4] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V2 Dave Chinner
@ 2008-07-23  0:41 ` Dave Chinner
  2008-07-23  0:41 ` [PATCH 3/4] XFS: Traverse inode trees when releasing dquots V2 Dave Chinner
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2008-07-23  0:41 UTC (permalink / raw)
  To: xfs; +Cc: Dave Chinner

Update xfs_sync_inodes to walk the inode radix tree cache to find
dirty inodes. This removes a huge bunch of nasty, messy code for
traversing the mount inode list safely and removes another user of
the mount inode list.

Version 2
o add comment explaining use of gang lookups for a single inode
o use IRELE, not VN_RELE
o move check for ag initialisation to caller.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_vfsops.c |  361 ++++++++++++++------------------------------------
 1 files changed, 101 insertions(+), 260 deletions(-)

diff --git a/fs/xfs/xfs_vfsops.c b/fs/xfs/xfs_vfsops.c
index ab3f004..197a308 100644
--- a/fs/xfs/xfs_vfsops.c
+++ b/fs/xfs/xfs_vfsops.c
@@ -271,356 +271,197 @@ xfs_sync(
 }
 
 /*
- * xfs sync routine for internal use
- *
- * This routine supports all of the flags defined for the generic vfs_sync
- * interface as explained above under xfs_sync.
- *
+ * Sync all the inodes in the given AG according to the
+ * direction given by the flags.
  */
-int
-xfs_sync_inodes(
+STATIC int
+xfs_sync_inodes_ag(
 	xfs_mount_t	*mp,
+	int		ag,
 	int		flags,
-	int             *bypassed)
+	int		*bypassed)
 {
 	xfs_inode_t	*ip = NULL;
 	bhv_vnode_t	*vp = NULL;
-	int		error;
-	int		last_error;
-	uint64_t	fflag;
-	uint		lock_flags;
-	uint		base_lock_flags;
-	boolean_t	mount_locked;
-	boolean_t	vnode_refed;
-	int		preempt;
-	xfs_iptr_t	*ipointer;
-#ifdef DEBUG
-	boolean_t	ipointer_in = B_FALSE;
-
-#define IPOINTER_SET	ipointer_in = B_TRUE
-#define IPOINTER_CLR	ipointer_in = B_FALSE
-#else
-#define IPOINTER_SET
-#define IPOINTER_CLR
-#endif
-
-
-/* Insert a marker record into the inode list after inode ip. The list
- * must be locked when this is called. After the call the list will no
- * longer be locked.
- */
-#define IPOINTER_INSERT(ip, mp)	{ \
-		ASSERT(ipointer_in == B_FALSE); \
-		ipointer->ip_mnext = ip->i_mnext; \
-		ipointer->ip_mprev = ip; \
-		ip->i_mnext = (xfs_inode_t *)ipointer; \
-		ipointer->ip_mnext->i_mprev = (xfs_inode_t *)ipointer; \
-		preempt = 0; \
-		XFS_MOUNT_IUNLOCK(mp); \
-		mount_locked = B_FALSE; \
-		IPOINTER_SET; \
-	}
-
-/* Remove the marker from the inode list. If the marker was the only item
- * in the list then there are no remaining inodes and we should zero out
- * the whole list. If we are the current head of the list then move the head
- * past us.
- */
-#define IPOINTER_REMOVE(ip, mp)	{ \
-		ASSERT(ipointer_in == B_TRUE); \
-		if (ipointer->ip_mnext != (xfs_inode_t *)ipointer) { \
-			ip = ipointer->ip_mnext; \
-			ip->i_mprev = ipointer->ip_mprev; \
-			ipointer->ip_mprev->i_mnext = ip; \
-			if (mp->m_inodes == (xfs_inode_t *)ipointer) { \
-				mp->m_inodes = ip; \
-			} \
-		} else { \
-			ASSERT(mp->m_inodes == (xfs_inode_t *)ipointer); \
-			mp->m_inodes = NULL; \
-			ip = NULL; \
-		} \
-		IPOINTER_CLR; \
-	}
-
-#define XFS_PREEMPT_MASK	0x7f
-
-	ASSERT(!(flags & SYNC_BDFLUSH));
-
-	if (bypassed)
-		*bypassed = 0;
-	if (mp->m_flags & XFS_MOUNT_RDONLY)
-		return 0;
-	error = 0;
-	last_error = 0;
-	preempt = 0;
-
-	/* Allocate a reference marker */
-	ipointer = (xfs_iptr_t *)kmem_zalloc(sizeof(xfs_iptr_t), KM_SLEEP);
+	xfs_perag_t	*pag = &mp->m_perag[ag];
+	boolean_t	vnode_refed = B_FALSE;
+	int		nr_found;
+	int		first_index = 0;
+	int		error = 0;
+	int		last_error = 0;
+	int		fflag = XFS_B_ASYNC;
+	int		lock_flags = XFS_ILOCK_SHARED;
 
-	fflag = XFS_B_ASYNC;		/* default is don't wait */
 	if (flags & SYNC_DELWRI)
 		fflag = XFS_B_DELWRI;
 	if (flags & SYNC_WAIT)
 		fflag = 0;		/* synchronous overrides all */
 
-	base_lock_flags = XFS_ILOCK_SHARED;
 	if (flags & (SYNC_DELWRI | SYNC_CLOSE)) {
 		/*
 		 * We need the I/O lock if we're going to call any of
 		 * the flush/inval routines.
 		 */
-		base_lock_flags |= XFS_IOLOCK_SHARED;
+		lock_flags |= XFS_IOLOCK_SHARED;
 	}
 
-	XFS_MOUNT_ILOCK(mp);
-
-	ip = mp->m_inodes;
-
-	mount_locked = B_TRUE;
-	vnode_refed  = B_FALSE;
-
-	IPOINTER_CLR;
-
 	do {
-		ASSERT(ipointer_in == B_FALSE);
-		ASSERT(vnode_refed == B_FALSE);
-
-		lock_flags = base_lock_flags;
-
 		/*
-		 * There were no inodes in the list, just break out
-		 * of the loop.
+		 * use a gang lookup to find the next inode in the tree
+		 * as the tree is sparse and a gang lookup walks to find
+		 * the number of objects requested.
 		 */
-		if (ip == NULL) {
-			break;
-		}
+		read_lock(&pag->pag_ici_lock);
+		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+				(void**)&ip, first_index, 1);
 
-		/*
-		 * We found another sync thread marker - skip it
-		 */
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
-			continue;
+		if (!nr_found) {
+			read_unlock(&pag->pag_ici_lock);
+			break;
 		}
 
-		vp = XFS_ITOV_NULL(ip);
+		/* update the index for the next lookup */
+		first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
 
 		/*
-		 * If the vnode is gone then this is being torn down,
-		 * call reclaim if it is flushed, else let regular flush
-		 * code deal with it later in the loop.
+		 * skip inodes in reclaim. Let xfs_syncsub do that for
+		 * us so we don't need to worry.
 		 */
-
-		if (vp == NULL) {
-			/* Skip ones already in reclaim */
-			if (ip->i_flags & XFS_IRECLAIM) {
-				ip = ip->i_mnext;
-				continue;
-			}
-			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0) {
-				ip = ip->i_mnext;
-			} else if ((xfs_ipincount(ip) == 0) &&
-				    xfs_iflock_nowait(ip)) {
-				IPOINTER_INSERT(ip, mp);
-
-				xfs_finish_reclaim(ip, 1,
-						XFS_IFLUSH_DELWRI_ELSE_ASYNC);
-
-				XFS_MOUNT_ILOCK(mp);
-				mount_locked = B_TRUE;
-				IPOINTER_REMOVE(ip, mp);
-			} else {
-				xfs_iunlock(ip, XFS_ILOCK_EXCL);
-				ip = ip->i_mnext;
-			}
+		vp = XFS_ITOV_NULL(ip);
+		if (!vp) {
+			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
 
+		/* bad inodes are dealt with elsewhere */
 		if (VN_BAD(vp)) {
-			ip = ip->i_mnext;
+			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
 
+		/* nothing to sync during shutdown */
 		if (XFS_FORCED_SHUTDOWN(mp) && !(flags & SYNC_CLOSE)) {
-			XFS_MOUNT_IUNLOCK(mp);
-			kmem_free(ipointer);
+			read_unlock(&pag->pag_ici_lock);
 			return 0;
 		}
 
 		/*
-		 * Try to lock without sleeping.  We're out of order with
-		 * the inode list lock here, so if we fail we need to drop
-		 * the mount lock and try again.  If we're called from
-		 * bdflush() here, then don't bother.
-		 *
-		 * The inode lock here actually coordinates with the
-		 * almost spurious inode lock in xfs_ireclaim() to prevent
-		 * the vnode we handle here without a reference from
-		 * being freed while we reference it.  If we lock the inode
-		 * while it's on the mount list here, then the spurious inode
-		 * lock in xfs_ireclaim() after the inode is pulled from
-		 * the mount list will sleep until we release it here.
-		 * This keeps the vnode from being freed while we reference
-		 * it.
+		 * The inode lock here actually coordinates with the almost
+		 * spurious inode lock in xfs_ireclaim() to prevent the vnode
+		 * we handle here without a reference from being freed while we
+		 * reference it.  If we lock the inode while it's on the mount
+		 * list here, then the spurious inode lock in xfs_ireclaim()
+		 * after the inode is pulled from the mount list will sleep
+		 * until we release it here.  This keeps the vnode from being
+		 * freed while we reference it.
 		 */
 		if (xfs_ilock_nowait(ip, lock_flags) == 0) {
-			if (vp == NULL) {
-				ip = ip->i_mnext;
-				continue;
-			}
-
 			vp = vn_grab(vp);
-			if (vp == NULL) {
-				ip = ip->i_mnext;
+			read_unlock(&pag->pag_ici_lock);
+			if (!vp)
 				continue;
-			}
-
-			IPOINTER_INSERT(ip, mp);
 			xfs_ilock(ip, lock_flags);
 
 			ASSERT(vp == XFS_ITOV(ip));
 			ASSERT(ip->i_mount == mp);
 
 			vnode_refed = B_TRUE;
+		} else {
+			/* safe to unlock here as we have a reference */
+			read_unlock(&pag->pag_ici_lock);
 		}
-
-		/* From here on in the loop we may have a marker record
-		 * in the inode list.
-		 */
-
 		/*
 		 * If we have to flush data or wait for I/O completion
 		 * we need to drop the ilock that we currently hold.
 		 * If we need to drop the lock, insert a marker if we
 		 * have not already done so.
 		 */
-		if ((flags & (SYNC_CLOSE|SYNC_IOWAIT)) ||
-		    ((flags & SYNC_DELWRI) && VN_DIRTY(vp))) {
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
+		if (flags & SYNC_CLOSE) {
 			xfs_iunlock(ip, XFS_ILOCK_SHARED);
-
-			if (flags & SYNC_CLOSE) {
-				/* Shutdown case. Flush and invalidate. */
-				if (XFS_FORCED_SHUTDOWN(mp))
-					xfs_tosspages(ip, 0, -1,
-							     FI_REMAPF);
-				else
-					error = xfs_flushinval_pages(ip,
-							0, -1, FI_REMAPF);
-			} else if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
-				error = xfs_flush_pages(ip, 0,
-							-1, fflag, FI_NONE);
-			}
-
-			/*
-			 * When freezing, we need to wait ensure all I/O (including direct
-			 * I/O) is complete to ensure no further data modification can take
-			 * place after this point
-			 */
+			if (XFS_FORCED_SHUTDOWN(mp))
+				xfs_tosspages(ip, 0, -1, FI_REMAPF);
+			else
+				error = xfs_flushinval_pages(ip, 0, -1,
+							FI_REMAPF);
+			/* wait for I/O on freeze */
 			if (flags & SYNC_IOWAIT)
 				vn_iowait(ip);
 
 			xfs_ilock(ip, XFS_ILOCK_SHARED);
 		}
 
-		if ((flags & SYNC_ATTR) &&
-		    (ip->i_update_core ||
-		     (ip->i_itemp && ip->i_itemp->ili_format.ilf_fields))) {
-			if (mount_locked)
-				IPOINTER_INSERT(ip, mp);
+		if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
+			xfs_iunlock(ip, XFS_ILOCK_SHARED);
+			error = xfs_flush_pages(ip, 0, -1, fflag, FI_NONE);
+			if (flags & SYNC_IOWAIT)
+				vn_iowait(ip);
+			xfs_ilock(ip, XFS_ILOCK_SHARED);
+		}
 
+		if ((flags & SYNC_ATTR) && !xfs_inode_clean(ip)) {
 			if (flags & SYNC_WAIT) {
 				xfs_iflock(ip);
-				error = xfs_iflush(ip, XFS_IFLUSH_SYNC);
-
-			/*
-			 * If we can't acquire the flush lock, then the inode
-			 * is already being flushed so don't bother waiting.
-			 *
-			 * If we can lock it then do a delwri flush so we can
-			 * combine multiple inode flushes in each disk write.
-			 */
+				if (!xfs_inode_clean(ip))
+					error = xfs_iflush(ip, XFS_IFLUSH_SYNC);
+				else
+					xfs_ifunlock(ip);
 			} else if (xfs_iflock_nowait(ip)) {
-				error = xfs_iflush(ip, XFS_IFLUSH_DELWRI);
+				if (!xfs_inode_clean(ip))
+					error = xfs_iflush(ip, XFS_IFLUSH_DELWRI);
+				else
+					xfs_ifunlock(ip);
 			} else if (bypassed) {
 				(*bypassed)++;
 			}
 		}
 
-		if (lock_flags != 0) {
+		if (lock_flags)
 			xfs_iunlock(ip, lock_flags);
-		}
 
 		if (vnode_refed) {
-			/*
-			 * If we had to take a reference on the vnode
-			 * above, then wait until after we've unlocked
-			 * the inode to release the reference.  This is
-			 * because we can be already holding the inode
-			 * lock when IRELE() calls xfs_inactive().
-			 *
-			 * Make sure to drop the mount lock before calling
-			 * IRELE() so that we don't trip over ourselves if
-			 * we have to go for the mount lock again in the
-			 * inactive code.
-			 */
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
-
 			IRELE(ip);
-
 			vnode_refed = B_FALSE;
 		}
 
-		if (error) {
+		if (error)
 			last_error = error;
-		}
-
 		/*
 		 * bail out if the filesystem is corrupted.
 		 */
-		if (error == EFSCORRUPTED)  {
-			if (!mount_locked) {
-				XFS_MOUNT_ILOCK(mp);
-				IPOINTER_REMOVE(ip, mp);
-			}
-			XFS_MOUNT_IUNLOCK(mp);
-			ASSERT(ipointer_in == B_FALSE);
-			kmem_free(ipointer);
+		if (error == EFSCORRUPTED)
 			return XFS_ERROR(error);
-		}
 
-		/* Let other threads have a chance at the mount lock
-		 * if we have looped many times without dropping the
-		 * lock.
-		 */
-		if ((++preempt & XFS_PREEMPT_MASK) == 0) {
-			if (mount_locked) {
-				IPOINTER_INSERT(ip, mp);
-			}
-		}
-
-		if (mount_locked == B_FALSE) {
-			XFS_MOUNT_ILOCK(mp);
-			mount_locked = B_TRUE;
-			IPOINTER_REMOVE(ip, mp);
-			continue;
-		}
-
-		ASSERT(ipointer_in == B_FALSE);
-		ip = ip->i_mnext;
+	} while (nr_found);
 
-	} while (ip != mp->m_inodes);
+	return last_error;
+}
 
-	XFS_MOUNT_IUNLOCK(mp);
+int
+xfs_sync_inodes(
+	xfs_mount_t	*mp,
+	int		flags,
+	int             *bypassed)
+{
+	int		error;
+	int		last_error;
+	int		i;
 
-	ASSERT(ipointer_in == B_FALSE);
+	if (bypassed)
+		*bypassed = 0;
+	if (mp->m_flags & XFS_MOUNT_RDONLY)
+		return 0;
+	error = 0;
+	last_error = 0;
 
-	kmem_free(ipointer);
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		if (!mp->m_perag[i].pag_ici_init)
+			continue;
+		error = xfs_sync_inodes_ag(mp, i, flags, bypassed);
+		if (error)
+			last_error = error;
+		if (error == EFSCORRUPTED)
+			break;
+	}
 	return XFS_ERROR(last_error);
 }
 
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/4] XFS: Traverse inode trees when releasing dquots V2
  2008-07-23  0:41 [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Dave Chinner
  2008-07-23  0:41 ` [PATCH 1/4] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V2 Dave Chinner
  2008-07-23  0:41 ` [PATCH 2/4] XFS: Use the inode tree for finding dirty inodes V2 Dave Chinner
@ 2008-07-23  0:41 ` Dave Chinner
  2008-07-23  0:41 ` [PATCH 4/4] XFS: remove the mount inode list Dave Chinner
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2008-07-23  0:41 UTC (permalink / raw)
  To: xfs; +Cc: Dave Chinner

Make releasing all inode dquots traverse the per-ag
inode radix trees rather than the mount inode list.
This removes another user of the mount inode list.

Version 2
o add comment explaining use of gang lookups for a single inode
o use IRELE, not VN_RELE
o move check for ag initialisation to caller.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/quota/xfs_qm_syscalls.c |  125 ++++++++++++++++++---------------------
 1 files changed, 58 insertions(+), 67 deletions(-)

diff --git a/fs/xfs/quota/xfs_qm_syscalls.c b/fs/xfs/quota/xfs_qm_syscalls.c
index adfb872..6924bbd 100644
--- a/fs/xfs/quota/xfs_qm_syscalls.c
+++ b/fs/xfs/quota/xfs_qm_syscalls.c
@@ -1022,101 +1022,92 @@ xfs_qm_export_flags(
 
 
 /*
- * Go thru all the inodes in the file system, releasing their dquots.
- * Note that the mount structure gets modified to indicate that quotas are off
- * AFTER this, in the case of quotaoff. This also gets called from
- * xfs_rootumount.
+ * Release all the dquots on the inodes in an AG.
  */
-void
-xfs_qm_dqrele_all_inodes(
-	struct xfs_mount *mp,
-	uint		 flags)
+STATIC void
+xfs_qm_dqrele_inodes_ag(
+	xfs_mount_t	*mp,
+	int		ag,
+	uint		flags)
 {
-	xfs_inode_t	*ip, *topino;
-	uint		ireclaims;
-	bhv_vnode_t	*vp;
-	boolean_t	vnode_refd;
+	xfs_inode_t	*ip = NULL;
+	bhv_vnode_t	*vp = NULL;
+	xfs_perag_t	*pag = &mp->m_perag[ag];
+	int		first_index = 0;
+	int		nr_found;
 
-	ASSERT(mp->m_quotainfo);
-
-	XFS_MOUNT_ILOCK(mp);
-again:
-	ip = mp->m_inodes;
-	if (ip == NULL) {
-		XFS_MOUNT_IUNLOCK(mp);
-		return;
-	}
 	do {
-		/* Skip markers inserted by xfs_sync */
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
-			continue;
+		boolean_t	vnode_refd = B_FALSE;
+
+		/*
+		 * use a gang lookup to find the next inode in the tree
+		 * as the tree is sparse and a gang lookup walks to find
+		 * the number of objects requested.
+		 */
+		read_lock(&pag->pag_ici_lock);
+		nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+				(void**)&ip, first_index, 1);
+
+		if (!nr_found) {
+			read_unlock(&pag->pag_ici_lock);
+			break;
 		}
+
+		/* update the index for the next lookup */
+		first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+
 		/* Root inode, rbmip and rsumip have associated blocks */
-		if (ip == XFS_QI_UQIP(mp) || ip == XFS_QI_GQIP(mp)) {
-			ASSERT(ip->i_udquot == NULL);
-			ASSERT(ip->i_gdquot == NULL);
-			ip = ip->i_mnext;
-			continue;
-		}
 		vp = XFS_ITOV_NULL(ip);
-		if (!vp) {
+		if (!vp || ip == XFS_QI_UQIP(mp) || ip == XFS_QI_GQIP(mp)) {
 			ASSERT(ip->i_udquot == NULL);
 			ASSERT(ip->i_gdquot == NULL);
-			ip = ip->i_mnext;
+			read_unlock(&pag->pag_ici_lock);
 			continue;
 		}
-		vnode_refd = B_FALSE;
 		if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0) {
-			ireclaims = mp->m_ireclaims;
-			topino = mp->m_inodes;
 			vp = vn_grab(vp);
+			read_unlock(&pag->pag_ici_lock);
 			if (!vp)
-				goto again;
-
-			XFS_MOUNT_IUNLOCK(mp);
-			/* XXX restart limit ? */
-			xfs_ilock(ip, XFS_ILOCK_EXCL);
+				continue;
 			vnode_refd = B_TRUE;
+			xfs_ilock(ip, XFS_ILOCK_EXCL);
 		} else {
-			ireclaims = mp->m_ireclaims;
-			topino = mp->m_inodes;
-			XFS_MOUNT_IUNLOCK(mp);
+			read_unlock(&pag->pag_ici_lock);
 		}
-
-		/*
-		 * We don't keep the mountlock across the dqrele() call,
-		 * since it can take a while..
-		 */
 		if ((flags & XFS_UQUOTA_ACCT) && ip->i_udquot) {
 			xfs_qm_dqrele(ip->i_udquot);
 			ip->i_udquot = NULL;
 		}
-		if (flags & (XFS_PQUOTA_ACCT|XFS_GQUOTA_ACCT) && ip->i_gdquot) {
+		if (flags & (XFS_PQUOTA_ACCT|XFS_GQUOTA_ACCT) &&
+		    ip->i_gdquot) {
 			xfs_qm_dqrele(ip->i_gdquot);
 			ip->i_gdquot = NULL;
 		}
 		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		/*
-		 * Wait until we've dropped the ilock and mountlock to
-		 * do the vn_rele. Or be condemned to an eternity in the
-		 * inactive code in hell.
-		 */
 		if (vnode_refd)
 			IRELE(ip);
-		XFS_MOUNT_ILOCK(mp);
-		/*
-		 * If an inode was inserted or removed, we gotta
-		 * start over again.
-		 */
-		if (topino != mp->m_inodes || mp->m_ireclaims != ireclaims) {
-			/* XXX use a sentinel */
-			goto again;
-		}
-		ip = ip->i_mnext;
-	} while (ip != mp->m_inodes);
+	} while (nr_found);
+}
 
-	XFS_MOUNT_IUNLOCK(mp);
+/*
+ * Go thru all the inodes in the file system, releasing their dquots.
+ * Note that the mount structure gets modified to indicate that quotas are off
+ * AFTER this, in the case of quotaoff. This also gets called from
+ * xfs_rootumount.
+ */
+void
+xfs_qm_dqrele_all_inodes(
+	struct xfs_mount *mp,
+	uint		 flags)
+{
+	int		i;
+
+	ASSERT(mp->m_quotainfo);
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		if (!mp->m_perag[i].pag_ici_init)
+			continue;
+		xfs_qm_dqrele_inodes_ag(mp, i, flags);
+	}
 }
 
 /*------------------------------------------------------------------------*/
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 4/4] XFS: remove the mount inode list
  2008-07-23  0:41 [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Dave Chinner
                   ` (2 preceding siblings ...)
  2008-07-23  0:41 ` [PATCH 3/4] XFS: Traverse inode trees when releasing dquots V2 Dave Chinner
@ 2008-07-23  0:41 ` Dave Chinner
  2008-07-23 20:46   ` Christoph Hellwig
  2008-07-23  2:23 ` [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Mark Goodwin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2008-07-23  0:41 UTC (permalink / raw)
  To: xfs; +Cc: Dave Chinner

Now we've removed all users of the mount inode list,
we can kill it. This reduces the size of the xfs_inode
by 2 pointers.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_iget.c  |   42 +-----------------------------------------
 fs/xfs/xfs_inode.h |    8 --------
 fs/xfs/xfs_mount.c |    5 -----
 fs/xfs/xfs_mount.h |    1 -
 4 files changed, 1 insertions(+), 55 deletions(-)

diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index b07604b..f16ca6c 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -76,7 +76,6 @@ xfs_iget_core(
 {
 	struct inode	*old_inode;
 	xfs_inode_t	*ip;
-	xfs_inode_t	*iq;
 	int		error;
 	unsigned long	first_index, mask;
 	xfs_perag_t	*pag;
@@ -260,24 +259,6 @@ finish_inode:
 
 	write_unlock(&pag->pag_ici_lock);
 	radix_tree_preload_end();
-
-	/*
-	 * Link ip to its mount and thread it on the mount's inode list.
-	 */
-	XFS_MOUNT_ILOCK(mp);
-	if ((iq = mp->m_inodes)) {
-		ASSERT(iq->i_mprev->i_mnext == iq);
-		ip->i_mprev = iq->i_mprev;
-		iq->i_mprev->i_mnext = ip;
-		iq->i_mprev = ip;
-		ip->i_mnext = iq;
-	} else {
-		ip->i_mnext = ip;
-		ip->i_mprev = ip;
-	}
-	mp->m_inodes = ip;
-
-	XFS_MOUNT_IUNLOCK(mp);
 	xfs_put_perag(mp, pag);
 
  return_ip:
@@ -490,36 +471,15 @@ xfs_iextract(
 {
 	xfs_mount_t	*mp = ip->i_mount;
 	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
-	xfs_inode_t	*iq;
 
 	write_lock(&pag->pag_ici_lock);
 	radix_tree_delete(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino));
 	write_unlock(&pag->pag_ici_lock);
 	xfs_put_perag(mp, pag);
 
-	/*
-	 * Remove from mount's inode list.
-	 */
-	XFS_MOUNT_ILOCK(mp);
-	ASSERT((ip->i_mnext != NULL) && (ip->i_mprev != NULL));
-	iq = ip->i_mnext;
-	iq->i_mprev = ip->i_mprev;
-	ip->i_mprev->i_mnext = iq;
-
-	/*
-	 * Fix up the head pointer if it points to the inode being deleted.
-	 */
-	if (mp->m_inodes == ip) {
-		if (ip == iq) {
-			mp->m_inodes = NULL;
-		} else {
-			mp->m_inodes = iq;
-		}
-	}
-
 	/* Deal with the deleted inodes list */
+	XFS_MOUNT_ILOCK(mp);
 	list_del_init(&ip->i_reclaim);
-
 	mp->m_ireclaims++;
 	XFS_MOUNT_IUNLOCK(mp);
 }
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 7ce41d3..c440b1d 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -192,16 +192,8 @@ typedef struct xfs_icdinode {
 	__uint32_t	di_gen;		/* generation number */
 } xfs_icdinode_t;
 
-typedef struct {
-	struct xfs_inode	*ip_mnext;	/* next inode in mount list */
-	struct xfs_inode	*ip_mprev;	/* ptr to prev inode */
-	struct xfs_mount	*ip_mount;	/* fs mount struct ptr */
-} xfs_iptr_t;
-
 typedef struct xfs_inode {
 	/* Inode linking and identification information. */
-	struct xfs_inode	*i_mnext;	/* next inode in mount list */
-	struct xfs_inode	*i_mprev;	/* ptr to prev inode */
 	struct xfs_mount	*i_mount;	/* fs mount struct ptr */
 	struct list_head	i_reclaim;	/* reclaim list */
 	bhv_vnode_t		*i_vnode;	/* vnode backpointer */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 0c23f6a..364c075 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1314,11 +1314,6 @@ xfs_unmountfs(xfs_mount_t *mp)
 
 	xfs_freesb(mp);
 
-	/*
-	 * All inodes from this mount point should be freed.
-	 */
-	ASSERT(mp->m_inodes == NULL);
-
 	if ((mp->m_flags & XFS_MOUNT_NOUUID) == 0)
 		uuid_table_remove(&mp->m_sb.sb_uuid);
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 6482005..4f257d7 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -247,7 +247,6 @@ typedef struct xfs_mount {
 	xfs_agnumber_t		m_agirotor;	/* last ag dir inode alloced */
 	spinlock_t		m_agirotor_lock;/* .. and lock protecting it */
 	xfs_agnumber_t		m_maxagi;	/* highest inode alloc group */
-	struct xfs_inode	*m_inodes;	/* active inode list */
 	struct list_head	m_del_inodes;	/* inodes to reclaim */
 	mutex_t			m_ilock;	/* inode list mutex */
 	uint			m_ireclaims;	/* count of calls to reclaim*/
-- 
1.5.6

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2
  2008-07-23  0:41 [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Dave Chinner
                   ` (3 preceding siblings ...)
  2008-07-23  0:41 ` [PATCH 4/4] XFS: remove the mount inode list Dave Chinner
@ 2008-07-23  2:23 ` Mark Goodwin
  2008-07-23  4:33   ` Dave Chinner
  2008-07-23  7:17 ` Christoph Hellwig
       [not found] ` <20080811140850.GA12521@infradead.org>
  6 siblings, 1 reply; 11+ messages in thread
From: Mark Goodwin @ 2008-07-23  2:23 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs



Dave Chinner wrote:
> The list of all inodes on a mount is superfluous. We can traverse
> all inodes now by walking the per-AG inode radix trees without
> needing a separate list. This enables us to remove a bunch of
> complex list traversal code and remove another two pointers from
> the xfs_inode.

sounds like a good move.

> Also, by replacing the sync traversal with an ascending inode
> number traversal, we will issue better inode I/O patterns for
> writeback triggered by xfssyncd or unmount.

Dave, have you made any performance measurements showing this to
be the case? If so, what is the improvement? Or should we just assume
such traversals will be more naturally sequential and therefore more
efficient?

Cheers
-- Mark

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2
  2008-07-23  2:23 ` [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Mark Goodwin
@ 2008-07-23  4:33   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2008-07-23  4:33 UTC (permalink / raw)
  To: Mark Goodwin; +Cc: xfs

On Wed, Jul 23, 2008 at 12:23:36PM +1000, Mark Goodwin wrote:
>
>
> Dave Chinner wrote:
>> The list of all inodes on a mount is superfluous. We can traverse
>> all inodes now by walking the per-AG inode radix trees without
>> needing a separate list. This enables us to remove a bunch of
>> complex list traversal code and remove another two pointers from
>> the xfs_inode.
>
> sounds like a good move.
>
>> Also, by replacing the sync traversal with an ascending inode
>> number traversal, we will issue better inode I/O patterns for
>> writeback triggered by xfssyncd or unmount.
>
> Dave, have you made any performance measurements showing this to
> be the case?  If so, what is the improvement?

I don't have a test rig I can use to measure it sufficiently
accurately.

> Or should we just assume
> such traversals will be more naturally sequential and therefore more
> efficient?

Not an assumption - I've verified this occurs with blktrace. I'm
seeing ascending order dispatch and a reduction in the number and
distance of seeks as well as I/Os being issued to "disk" during such
events (like unmount, for example)....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2
  2008-07-23  0:41 [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Dave Chinner
                   ` (4 preceding siblings ...)
  2008-07-23  2:23 ` [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Mark Goodwin
@ 2008-07-23  7:17 ` Christoph Hellwig
  2008-07-23 14:30   ` Dave Chinner
       [not found] ` <20080811140850.GA12521@infradead.org>
  6 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2008-07-23  7:17 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jul 23, 2008 at 10:41:09AM +1000, Dave Chinner wrote:
> The list of all inodes on a mount is superfluous. We can traverse
> all inodes now by walking the per-AG inode radix trees without
> needing a separate list. This enables us to remove a bunch of
> complex list traversal code and remove another two pointers from
> the xfs_inode.
> 
> Also, by replacing the sync traversal with an ascending inode
> number traversal, we will issue better inode I/O patterns for
> writeback triggered by xfssyncd or unmount.

All patches looks good to me.  Well, minus the xfsidbg issue in 4
which would be a merge blocker for that patch.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2
  2008-07-23  7:17 ` Christoph Hellwig
@ 2008-07-23 14:30   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2008-07-23 14:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Jul 23, 2008 at 03:17:48AM -0400, Christoph Hellwig wrote:
> On Wed, Jul 23, 2008 at 10:41:09AM +1000, Dave Chinner wrote:
> > The list of all inodes on a mount is superfluous. We can traverse
> > all inodes now by walking the per-AG inode radix trees without
> > needing a separate list. This enables us to remove a bunch of
> > complex list traversal code and remove another two pointers from
> > the xfs_inode.
> > 
> > Also, by replacing the sync traversal with an ascending inode
> > number traversal, we will issue better inode I/O patterns for
> > writeback triggered by xfssyncd or unmount.
> 
> All patches looks good to me.  Well, minus the xfsidbg issue in 4
> which would be a merge blocker for that patch.

Compile tested version of the xfsidbg code. I have no kdb enabled
machines to test it on, so this is as good as I can do right
now.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


Kill the idbg users of the mount inode list and
replace with radix tree walks.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfsidbg.c |  124 +++++++++++++++++++++++++++++++++----------------------
 1 file changed, 76 insertions(+), 48 deletions(-)

Index: linux-2.6-xfs/fs/xfs/xfsidbg.c
===================================================================
--- linux-2.6-xfs.orig/fs/xfs/xfsidbg.c	2008-07-24 00:20:26.886167169 +1000
+++ linux-2.6-xfs/fs/xfs/xfsidbg.c	2008-07-24 00:20:39.950162171 +1000
@@ -5734,20 +5734,27 @@ xfsidbg_xiclogtrace(xlog_in_core_t *iclo
 static void
 xfsidbg_xinodes(xfs_mount_t *mp)
 {
-	xfs_inode_t	*ip;
+	int		i;
 
 	kdb_printf("xfs_mount at 0x%p\n", mp);
-	ip = mp->m_inodes;
-	if (ip != NULL) {
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		xfs_perag_t	*pag = &mp->m_perag[i];
+		xfs_inode_t	*ip = NULL;
+		int		first_index = 0;
+		int		nr_found;
+
+		if (!pag->pag_ici_init)
+			continue;
 		do {
-			if (ip->i_mount == NULL) {
-				ip = ip->i_mnext;
-				continue;
-			}
+			nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+						(void**)&ip, first_index, 1);
+			if (!nr_found)
+				break;
+			/* update the index for the next lookup */
+			first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
 			kdb_printf("\n");
 			xfsidbg_xnode(ip);
-			ip = ip->i_mnext;
-		} while (ip != mp->m_inodes);
+		} while (nr_found);
 	}
 	kdb_printf("\nEnd of Inodes\n");
 }
@@ -5755,23 +5762,30 @@ xfsidbg_xinodes(xfs_mount_t *mp)
 static void
 xfsidbg_delayed_blocks(xfs_mount_t *mp)
 {
-	xfs_inode_t	*ip;
 	unsigned int	total = 0;
 	unsigned int	icount = 0;
+	int		i;
 
-	ip = mp->m_inodes;
-	if (ip != NULL) {
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		xfs_perag_t	*pag = &mp->m_perag[i];
+		xfs_inode_t	*ip = NULL;
+		int		first_index = 0;
+		int		nr_found;
+
+		if (!pag->pag_ici_init)
+			continue;
 		do {
-			if (ip->i_mount == NULL) {
-				ip = ip->i_mnext;
-				continue;
-			}
+			nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+						(void**)&ip, first_index, 1);
+			if (!nr_found)
+				break;
+			/* update the index for the next lookup */
+			first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
 			if (ip->i_delayed_blks) {
 				total += ip->i_delayed_blks;
 				icount++;
 			}
-			ip = ip->i_mnext;
-		} while (ip != mp->m_inodes);
+		} while (nr_found);
 	}
 	kdb_printf("delayed blocks total: %d in %d inodes\n", total, icount);
 }
@@ -5779,21 +5793,28 @@ xfsidbg_delayed_blocks(xfs_mount_t *mp)
 static void
 xfsidbg_xinodes_quiesce(xfs_mount_t *mp)
 {
-	xfs_inode_t	*ip;
+	int		i;
 
 	kdb_printf("xfs_mount at 0x%p\n", mp);
-	ip = mp->m_inodes;
-	if (ip != NULL) {
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		xfs_perag_t	*pag = &mp->m_perag[i];
+		xfs_inode_t	*ip = NULL;
+		int		first_index = 0;
+		int		nr_found;
+
+		if (!pag->pag_ici_init)
+			continue;
 		do {
-			if (ip->i_mount == NULL) {
-				ip = ip->i_mnext;
-				continue;
-			}
+			nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+						(void**)&ip, first_index, 1);
+			if (!nr_found)
+				break;
+			/* update the index for the next lookup */
+			first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
 			if (!(ip->i_flags & XFS_IQUIESCE)) {
 				kdb_printf("ip 0x%p not quiesced\n", ip);
 			}
-			ip = ip->i_mnext;
-		} while (ip != mp->m_inodes);
+		} while (nr_found);
 	}
 	kdb_printf("\nEnd of Inodes\n");
 }
@@ -6319,8 +6340,8 @@ xfsidbg_xmount(xfs_mount_t *mp)
 		mp->m_rtdev_targp ? mp->m_rtdev_targp->bt_dev : 0);
 	kdb_printf("bsize %d agfrotor %d xfs_rotorstep %d agirotor %d\n",
 		mp->m_bsize, mp->m_agfrotor, xfs_rotorstep, mp->m_agirotor);
-	kdb_printf("inodes 0x%p ilock 0x%p ireclaims 0x%x\n",
-		mp->m_inodes, &mp->m_ilock, mp->m_ireclaims);
+	kdb_printf("ilock 0x%p ireclaims 0x%x\n",
+		&mp->m_ilock, mp->m_ireclaims);
 	kdb_printf("readio_log 0x%x readio_blocks 0x%x ",
 		mp->m_readio_log, mp->m_readio_blocks);
 	kdb_printf("writeio_log 0x%x writeio_blocks 0x%x\n",
@@ -6409,11 +6430,7 @@ xfsidbg_xnode(xfs_inode_t *ip)
 		NULL
 	};
 
-	kdb_printf("mount 0x%p mnext 0x%p mprev 0x%p vnode 0x%p \n",
-		ip->i_mount,
-		ip->i_mnext,
-		ip->i_mprev,
-		XFS_ITOV_NULL(ip));
+	kdb_printf("mount 0x%p vnode 0x%p \n", ip->i_mount, XFS_ITOV_NULL(ip));
 	kdb_printf("dev %x ino %s\n",
 		ip->i_mount->m_ddev_targp->bt_dev,
 		xfs_fmtino(ip->i_ino, ip->i_mount));
@@ -6614,22 +6631,33 @@ xfsidbg_xqm_dquot(xfs_dquot_t *dqp)
 static void
 xfsidbg_xqm_dqattached_inos(xfs_mount_t	*mp)
 {
-	xfs_inode_t	*ip;
-	int		n = 0;
+	int		i, n = 0;
 
-	ip = mp->m_inodes;
-	do {
-		if (ip->i_mount == NULL) {
-			ip = ip->i_mnext;
+	kdb_printf("xfs_mount at 0x%p\n", mp);
+	for (i = 0; i < mp->m_sb.sb_agcount; i++) {
+		xfs_perag_t	*pag = &mp->m_perag[i];
+		xfs_inode_t	*ip = NULL;
+		int		first_index = 0;
+		int		nr_found;
+
+		if (!pag->pag_ici_init)
 			continue;
-		}
-		if (ip->i_udquot || ip->i_gdquot) {
-			n++;
-			kdb_printf("inode = 0x%p, ino %d: udq 0x%p, gdq 0x%p\n",
-				ip, (int)ip->i_ino, ip->i_udquot, ip->i_gdquot);
-		}
-		ip = ip->i_mnext;
-	} while (ip != mp->m_inodes);
+		do {
+			nr_found = radix_tree_gang_lookup(&pag->pag_ici_root,
+						(void**)&ip, first_index, 1);
+			if (!nr_found)
+				break;
+			/* update the index for the next lookup */
+			first_index = XFS_INO_TO_AGINO(mp, ip->i_ino + 1);
+			if (ip->i_udquot || ip->i_gdquot) {
+				n++;
+				kdb_printf("inode = 0x%p, ino %d: udq 0x%p, gdq 0x%p\n",
+					ip, (int)ip->i_ino, ip->i_udquot, ip->i_gdquot);
+			}
+			kdb_printf("\n");
+			xfsidbg_xnode(ip);
+		} while (nr_found);
+	}
 	kdb_printf("\nNumber of inodes with dquots attached: %d\n", n);
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 4/4] XFS: remove the mount inode list
  2008-07-23  0:41 ` [PATCH 4/4] XFS: remove the mount inode list Dave Chinner
@ 2008-07-23 20:46   ` Christoph Hellwig
  0 siblings, 0 replies; 11+ messages in thread
From: Christoph Hellwig @ 2008-07-23 20:46 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Jul 23, 2008 at 10:41:13AM +1000, Dave Chinner wrote:
> Now we've removed all users of the mount inode list,
> we can kill it. This reduces the size of the xfs_inode
> by 2 pointers.

With this m_ilock can also become a spinlock.  Might not be worth
depending on how soon you want to kill m_del_inodes.


Signed-off-by: Christoph Hellwig <hch@lst.de>

Index: linux-2.6-xfs/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- linux-2.6-xfs.orig/fs/xfs/linux-2.6/xfs_super.c	2008-07-23 15:18:52.000000000 +0200
+++ linux-2.6-xfs/fs/xfs/linux-2.6/xfs_super.c	2008-07-23 15:19:28.000000000 +0200
@@ -1554,7 +1554,7 @@ xfs_fs_fill_super(
 		goto out;
 
 	spin_lock_init(&mp->m_sb_lock);
-	mutex_init(&mp->m_ilock);
+	spin_lock_init(&mp->m_ilock);
 	mutex_init(&mp->m_growlock);
 	atomic_set(&mp->m_active_trans, 0);
 	INIT_LIST_HEAD(&mp->m_sync_list);
Index: linux-2.6-xfs/fs/xfs/xfs_iget.c
===================================================================
--- linux-2.6-xfs.orig/fs/xfs/xfs_iget.c	2008-07-23 15:18:52.000000000 +0200
+++ linux-2.6-xfs/fs/xfs/xfs_iget.c	2008-07-23 15:19:28.000000000 +0200
@@ -144,9 +144,9 @@ again:
 			xfs_iflags_clear(ip, XFS_IRECLAIMABLE);
 			read_unlock(&pag->pag_ici_lock);
 
-			XFS_MOUNT_ILOCK(mp);
+			spin_lock(&mp->m_ilock);
 			list_del_init(&ip->i_reclaim);
-			XFS_MOUNT_IUNLOCK(mp);
+			spin_unlock(&mp->m_ilock);
 
 			goto finish_inode;
 
@@ -484,10 +484,10 @@ xfs_iextract(
 	xfs_put_perag(mp, pag);
 
 	/* Deal with the deleted inodes list */
-	XFS_MOUNT_ILOCK(mp);
+	spin_lock(&mp->m_ilock);
 	list_del_init(&ip->i_reclaim);
 	mp->m_ireclaims++;
-	XFS_MOUNT_IUNLOCK(mp);
+	spin_unlock(&mp->m_ilock);
 }
 
 /*
Index: linux-2.6-xfs/fs/xfs/xfs_mount.c
===================================================================
--- linux-2.6-xfs.orig/fs/xfs/xfs_mount.c	2008-07-23 15:18:52.000000000 +0200
+++ linux-2.6-xfs/fs/xfs/xfs_mount.c	2008-07-23 15:19:28.000000000 +0200
@@ -142,7 +142,7 @@ xfs_mount_free(
 
 	spinlock_destroy(&mp->m_ail_lock);
 	spinlock_destroy(&mp->m_sb_lock);
-	mutex_destroy(&mp->m_ilock);
+	spinlock_destroy(&mp->m_ilock);
 	mutex_destroy(&mp->m_growlock);
 	if (mp->m_quotainfo)
 		XFS_QM_DONE(mp);
Index: linux-2.6-xfs/fs/xfs/xfs_mount.h
===================================================================
--- linux-2.6-xfs.orig/fs/xfs/xfs_mount.h	2008-07-23 15:18:52.000000000 +0200
+++ linux-2.6-xfs/fs/xfs/xfs_mount.h	2008-07-23 15:19:28.000000000 +0200
@@ -248,7 +248,7 @@ typedef struct xfs_mount {
 	spinlock_t		m_agirotor_lock;/* .. and lock protecting it */
 	xfs_agnumber_t		m_maxagi;	/* highest inode alloc group */
 	struct list_head	m_del_inodes;	/* inodes to reclaim */
-	mutex_t			m_ilock;	/* inode list mutex */
+	spinlock_t		m_ilock;	/* synchronize m_del_inodes */
 	uint			m_ireclaims;	/* count of calls to reclaim*/
 	uint			m_readio_log;	/* min read size log bytes */
 	uint			m_readio_blocks; /* min read size blocks */
@@ -509,9 +509,6 @@ typedef struct xfs_mod_sb {
 	int64_t		msb_delta;	/* Change to make to specified field */
 } xfs_mod_sb_t;
 
-#define	XFS_MOUNT_ILOCK(mp)	mutex_lock(&((mp)->m_ilock))
-#define	XFS_MOUNT_IUNLOCK(mp)	mutex_unlock(&((mp)->m_ilock))
-
 extern void	xfs_mod_sb(xfs_trans_t *, __int64_t);
 extern int	xfs_log_sbcount(xfs_mount_t *, uint);
 extern int	xfs_mountfs(xfs_mount_t *mp, int);
Index: linux-2.6-xfs/fs/xfs/xfs_vnodeops.c
===================================================================
--- linux-2.6-xfs.orig/fs/xfs/xfs_vnodeops.c	2008-07-23 15:18:52.000000000 +0200
+++ linux-2.6-xfs/fs/xfs/xfs_vnodeops.c	2008-07-23 15:19:28.000000000 +0200
@@ -2834,14 +2834,14 @@ xfs_reclaim(
 		xfs_mount_t	*mp = ip->i_mount;
 
 		/* Protect sync and unpin from us */
-		XFS_MOUNT_ILOCK(mp);
+		spin_lock(&mp->m_ilock);
 		spin_lock(&ip->i_flags_lock);
 		__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 		vn_to_inode(vp)->i_private = NULL;
 		ip->i_vnode = NULL;
 		spin_unlock(&ip->i_flags_lock);
 		list_add_tail(&ip->i_reclaim, &mp->m_del_inodes);
-		XFS_MOUNT_IUNLOCK(mp);
+		spin_unlock(&mp->m_ilock);
 	}
 	return 0;
 }
@@ -2922,7 +2922,7 @@ xfs_finish_reclaim_all(
 	xfs_inode_t	*ip, *n;
 
 restart:
-	XFS_MOUNT_ILOCK(mp);
+	spin_lock(&mp->m_ilock);
 	list_for_each_entry_safe(ip, n, &mp->m_del_inodes, i_reclaim) {
 		if (noblock) {
 			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL) == 0)
@@ -2933,12 +2933,12 @@ restart:
 				continue;
 			}
 		}
-		XFS_MOUNT_IUNLOCK(mp);
+		spin_unlock(&mp->m_ilock);
 		if (xfs_finish_reclaim(ip, noblock, mode))
 			delay(1);
 		goto restart;
 	}
-	XFS_MOUNT_IUNLOCK(mp);
+	spin_unlock(&mp->m_ilock);
 	return 0;
 }
 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2
       [not found] ` <20080811140850.GA12521@infradead.org>
@ 2008-08-12  0:19   ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2008-08-12  0:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Mon, Aug 11, 2008 at 10:08:50AM -0400, Christoph Hellwig wrote:
> On Wed, Jul 23, 2008 at 10:41:09AM +1000, Dave Chinner wrote:
> > The list of all inodes on a mount is superfluous. We can traverse
> > all inodes now by walking the per-AG inode radix trees without
> > needing a separate list. This enables us to remove a bunch of
> > complex list traversal code and remove another two pointers from
> > the xfs_inode.
> > 
> > Also, by replacing the sync traversal with an ascending inode
> > number traversal, we will issue better inode I/O patterns for
> > writeback triggered by xfssyncd or unmount.
> 
> Any plans to send an updated version for the current tree and including
> the xfsidbg fixups?

Sure, I've been waiting for the git tree to get updated with all the
patch I had previously sent. I've been away for the past few days,
so it hasn't happened yet.

As to the xfsidbg changes, as I've said before the XFS git trees
do not have that file in it nor do I use kdb at all anymore. It
means I have to convert complex patch series to a CVS tree to create
a idbg patches that I can't test or then integrate back into the the
git tree. I dislike sending untested patches, especially when they
are non-trivial. It also means i can't use the git patch-bomb tools
to send them as part of the patch series.

Quite frankly, xfsidbg is a PITA to update because it's not in a
git tree, and hence I don't update it because I don't need it to
develop or debug code....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2008-08-12  0:18 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-07-23  0:41 [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Dave Chinner
2008-07-23  0:41 ` [PATCH 1/4] XFS: Remove xfs_iflush_all and clean up xfs_finish_reclaim_all() V2 Dave Chinner
2008-07-23  0:41 ` [PATCH 2/4] XFS: Use the inode tree for finding dirty inodes V2 Dave Chinner
2008-07-23  0:41 ` [PATCH 3/4] XFS: Traverse inode trees when releasing dquots V2 Dave Chinner
2008-07-23  0:41 ` [PATCH 4/4] XFS: remove the mount inode list Dave Chinner
2008-07-23 20:46   ` Christoph Hellwig
2008-07-23  2:23 ` [PATCH 0/4] XFS: replace the mount inode list with radix tree traversals V2 Mark Goodwin
2008-07-23  4:33   ` Dave Chinner
2008-07-23  7:17 ` Christoph Hellwig
2008-07-23 14:30   ` Dave Chinner
     [not found] ` <20080811140850.GA12521@infradead.org>
2008-08-12  0:19   ` Dave Chinner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox