public inbox for linux-xfs@vger.kernel.org
 help / color / mirror / Atom feed
* [RFC, PATCH 0/7] XFS: dynamic busy extent tracking
@ 2008-10-07 22:09 Dave Chinner
  2008-10-07 22:09 ` [PATCH 1/7] XFS: rename xfs_get_perag Dave Chinner
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

The busy extent tracking in XFS is currently very static and has
some performance issues. We can only track 128 busy extents per AG,
and when we overflow this we fall back to synchronous transactions.
Also, every time we re-use a busy extent we cause a synchronous log
force, which stops all allocation and freeing in that AG while the
log force is in progress.

This limits the amount of transaction aggregation we can potentially
do, because the log force issued when a busy extent is found
effectively closes off an aggregating transaction group. It is
relatively easy to trigger overflows or reuse of busy extents and
cause this to occur.

Further, when it comes to issuing block discard messages to tell the
lower layer that we're finished with a block, we need to track all
freed extents, not just those we can fit in a static array.  Hence
adding block discard functionality to XFS also requires us to rework
the way we track busy extents.

This patch series is a work in progress. It's probably better to
look at the finished product rather than the individual patches
because the later patches undo a lot of the earlier factoring and
shuffling. I'll need to completely redo the series before final
submission. However, I'm posting this now because it passes XFSQA
and it seems to be a good time to post it for comments before I
start cleaning it up....

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/7] XFS: rename xfs_get_perag
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
@ 2008-10-07 22:09 ` Dave Chinner
  2008-10-08 18:41   ` Christoph Hellwig
  2008-10-07 22:09 ` [PATCH 2/7] XFS: replace fixed size busy extent array with an rbtree Dave Chinner
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

xfs_get_perag is really getting the perag that an inode belongs to
based on its inode number. Rename it appropriately so we can use
xfs_perag_get() to get the perag from a provided ag number. Convert
a number of sites over to using this interface.
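
The split between the two lookups can be sketched in standalone C
(illustrative types and names only, not the kernel's; assume a fixed
number of inodes per AG for the example):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-ins for the kernel structures. */
#define INODES_PER_AG 64ULL

struct perag { uint32_t agno; };
struct mount { struct perag perag[8]; };

/* xfs_perag_get() shape: look up the perag by AG number directly. */
static struct perag *perag_get(struct mount *mp, uint32_t agno)
{
	return &mp->perag[agno];
}

/* xfs_perag_get_from_ino() shape: derive the AG number from an inode
 * number (mirroring XFS_INO_TO_AGNO), then reuse the AG-number lookup. */
static struct perag *perag_get_from_ino(struct mount *mp, uint64_t ino)
{
	return perag_get(mp, (uint32_t)(ino / INODES_PER_AG));
}
```

The point of the rename is that the by-AG-number lookup becomes the
primitive and the by-inode-number lookup is a thin wrapper over it.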

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |   21 ++++++++++++---------
 fs/xfs/xfs_iget.c           |   14 +++++++-------
 fs/xfs/xfs_inode.c          |    6 +++---
 fs/xfs/xfs_mount.h          |   15 ++++++++++++---
 4 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index ee1648b..08b2acf 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -57,7 +57,7 @@ xfs_sync_inodes_ag(
 	int		ag,
 	int		flags)
 {
-	xfs_perag_t	*pag = &mp->m_perag[ag];
+	xfs_perag_t	*pag = xfs_perag_get(mp, ag);
 	int		nr_found;
 	uint32_t	first_index = 0;
 	int		error = 0;
@@ -197,10 +197,11 @@ xfs_sync_inodes_ag(
 		 * bail out if the filesystem is corrupted.
 		 */
 		if (error == EFSCORRUPTED)
-			return XFS_ERROR(error);
+			break;
 
 	} while (nr_found);
 
+	xfs_perag_put(pag);
 	return last_error;
 }
 
@@ -598,7 +599,7 @@ xfs_reclaim_inode(
 	int		locked,
 	int		sync_mode)
 {
-	xfs_perag_t	*pag = xfs_get_perag(ip->i_mount, ip->i_ino);
+	xfs_perag_t	*pag = xfs_perag_get_from_ino(ip->i_mount, ip->i_ino);
 
 	/* The hash lock here protects a thread in xfs_iget_core from
 	 * racing with us on linking the inode back with a vnode.
@@ -615,12 +616,13 @@ xfs_reclaim_inode(
 			xfs_ifunlock(ip);
 			xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		}
+		xfs_perag_put(pag);
 		return 1;
 	}
 	__xfs_iflags_set(ip, XFS_IRECLAIM);
 	spin_unlock(&ip->i_flags_lock);
 	write_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(ip->i_mount, pag);
+	xfs_perag_put(pag);
 
 	/*
 	 * If the inode is still dirty, then flush it out.  If the inode
@@ -663,7 +665,7 @@ xfs_inode_set_reclaim_tag(
 	xfs_inode_t	*ip)
 {
 	xfs_mount_t	*mp = ip->i_mount;
-	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
+	xfs_perag_t	*pag = xfs_perag_get_from_ino(mp, ip->i_ino);
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
@@ -672,7 +674,7 @@ xfs_inode_set_reclaim_tag(
 	__xfs_iflags_set(ip, XFS_IRECLAIMABLE);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(mp, pag);
+	xfs_perag_put(pag);
 }
 
 void
@@ -690,14 +692,14 @@ xfs_inode_clear_reclaim_tag(
 	xfs_inode_t	*ip)
 {
 	xfs_mount_t	*mp = ip->i_mount;
-	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
+	xfs_perag_t	*pag = xfs_perag_get_from_ino(mp, ip->i_ino);
 
 	read_lock(&pag->pag_ici_lock);
 	spin_lock(&ip->i_flags_lock);
 	__xfs_inode_clear_reclaim_tag(mp, pag, ip);
 	spin_unlock(&ip->i_flags_lock);
 	read_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(mp, pag);
+	xfs_perag_put(pag);
 }
 
 
@@ -709,7 +711,7 @@ xfs_reclaim_inodes_ag(
 	int		mode)
 {
 	xfs_inode_t	*ip = NULL;
-	xfs_perag_t	*pag = &mp->m_perag[ag];
+	xfs_perag_t	*pag = xfs_perag_get(mp, ag);
 	int		nr_found;
 	uint32_t	first_index;
 	int		skipped;
@@ -779,6 +781,7 @@ restart:
 		delay(1);
 		goto restart;
 	}
+	xfs_perag_put(pag);
 	return;
 
 }
diff --git a/fs/xfs/xfs_iget.c b/fs/xfs/xfs_iget.c
index a1f209b..b34b732 100644
--- a/fs/xfs/xfs_iget.c
+++ b/fs/xfs/xfs_iget.c
@@ -245,7 +245,7 @@ xfs_iget(
 		return EINVAL;
 
 	/* get the perag structure and ensure that it's inode capable */
-	pag = xfs_get_perag(mp, ino);
+	pag = xfs_perag_get_from_ino(mp, ino);
 	if (!pag->pagi_inodeok)
 		return EINVAL;
 	ASSERT(pag->pag_ici_init);
@@ -269,7 +269,7 @@ again:
 		if (error)
 			goto out_error_or_again;
 	}
-	xfs_put_perag(mp, pag);
+	xfs_perag_put(pag);
 
 	xfs_iflags_set(ip, XFS_IMODIFIED);
 	*ipp = ip;
@@ -289,7 +289,7 @@ out_error_or_again:
 		delay(1);
 		goto again;
 	}
-	xfs_put_perag(mp, pag);
+	xfs_perag_put(pag);
 	return error;
 }
 
@@ -307,11 +307,11 @@ xfs_inode_incore(xfs_mount_t	*mp,
 	xfs_inode_t	*ip;
 	xfs_perag_t	*pag;
 
-	pag = xfs_get_perag(mp, ino);
+	pag = xfs_perag_get_from_ino(mp, ino);
 	read_lock(&pag->pag_ici_lock);
 	ip = radix_tree_lookup(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ino));
 	read_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(mp, pag);
+	xfs_perag_put(pag);
 
 	/* the returned inode must match the transaction */
 	if (ip && (ip->i_transp != tp))
@@ -410,12 +410,12 @@ xfs_iextract(
 	xfs_inode_t	*ip)
 {
 	xfs_mount_t	*mp = ip->i_mount;
-	xfs_perag_t	*pag = xfs_get_perag(mp, ip->i_ino);
+	xfs_perag_t	*pag = xfs_perag_get_from_ino(mp, ip->i_ino);
 
 	write_lock(&pag->pag_ici_lock);
 	radix_tree_delete(&pag->pag_ici_root, XFS_INO_TO_AGINO(mp, ip->i_ino));
 	write_unlock(&pag->pag_ici_lock);
-	xfs_put_perag(mp, pag);
+	xfs_perag_put(pag);
 
 	mp->m_ireclaims++;
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index eba2ff0..17dbf24 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2111,7 +2111,7 @@ xfs_ifree_cluster(
 	xfs_inode_t		*ip, **ip_found;
 	xfs_inode_log_item_t	*iip;
 	xfs_log_item_t		*lip;
-	xfs_perag_t		*pag = xfs_get_perag(mp, inum);
+	xfs_perag_t		*pag = xfs_perag_get_from_ino(mp, inum);
 
 	if (mp->m_sb.sb_blocksize >= XFS_INODE_CLUSTER_SIZE(mp)) {
 		blks_per_cluster = 1;
@@ -2253,7 +2253,7 @@ xfs_ifree_cluster(
 	}
 
 	kmem_free(ip_found);
-	xfs_put_perag(mp, pag);
+	xfs_perag_put(pag);
 }
 
 /*
@@ -2973,7 +2973,7 @@ xfs_iflush_cluster(
 	xfs_buf_t	*bp)
 {
 	xfs_mount_t		*mp = ip->i_mount;
-	xfs_perag_t		*pag = xfs_get_perag(mp, ip->i_ino);
+	xfs_perag_t		*pag = xfs_perag_get_from_ino(mp, ip->i_ino);
 	unsigned long		first_index, mask;
 	unsigned long		inodes_per_cluster;
 	int			ilist_size;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index f927ada..55cc89f 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -455,18 +455,27 @@ xfs_daddr_to_agbno(struct xfs_mount *mp, xfs_daddr_t d)
  * perag get/put wrappers for eventual ref counting
  */
 static inline xfs_perag_t *
-xfs_get_perag(struct xfs_mount *mp, xfs_ino_t ino)
+xfs_perag_get(struct xfs_mount *mp, xfs_agnumber_t agno)
 {
-	return &mp->m_perag[XFS_INO_TO_AGNO(mp, ino)];
+	return &mp->m_perag[agno];
 }
 
 static inline void
-xfs_put_perag(struct xfs_mount *mp, xfs_perag_t *pag)
+xfs_perag_put(xfs_perag_t *pag)
 {
 	/* nothing to see here, move along */
 }
 
 /*
+ * Get the perag associated with the given inode number
+ */
+static inline xfs_perag_t *
+xfs_perag_get_from_ino(struct xfs_mount *mp, xfs_ino_t ino)
+{
+	return xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ino));
+}
+
+/*
  * Per-cpu superblock locking functions
  */
 #ifdef HAVE_PERCPU_SB
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/7] XFS: replace fixed size busy extent array with an rbtree
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
  2008-10-07 22:09 ` [PATCH 1/7] XFS: rename xfs_get_perag Dave Chinner
@ 2008-10-07 22:09 ` Dave Chinner
  2008-10-08 18:49   ` Christoph Hellwig
  2008-10-07 22:09 ` [PATCH 3/7] XFS: Don't immediately reallocate busy extents Dave Chinner
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

When we free a metadata extent, we record it in the per-AG busy
extent array so that it is not re-used before the freeing
transaction hits the disk. This array is fixed size, so when it
overflows we make further allocation transactions synchronous
because we cannot track more freed extents until those transactions
hit the disk and are completed. Under heavy mixed allocation and
freeing workloads with large log buffers, we can overflow this array
quite easily.

This is important as we want to be able to issue block discard
commands when we free extents, and that has to happen after the free
transaction has been committed to disk. This is made complex by the
way the overflows are handled - we don't record the extent to be
freed in the busy list, so we've got no callback to issue the
discard request from. Hence we need a different tracking mechanism
that makes it easy to issue discard requests.

Further, the array is sparsely populated, which means that inserts
need to search for a free slot, and array searches often have to
search many more slots that are actually used to check all the
busy extents. Quite inefficient, really.

To fix all these problems, make the busy extent tracking dynamic and
faster to search by converting it to a per-AG rbtree and indexing it
by block number. We always search by block number, so this means
that the search and insert scaling becomes O(log n) instead of
O(array size).  This will also allow us to efficiently track far
more busy extents than the current method allows.
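
The overlap decision the rbtree search makes at each node can be
sketched in standalone C. This is a hedged illustration, not the
kernel code: it uses a binary search over a sorted, non-overlapping
array in place of the rbtree walk, and the names are made up, but the
per-node comparison is the same one the patch performs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct busy { uint32_t bno; uint32_t len; };

/* Return the first busy extent overlapping [bno, bno+len-1], or NULL.
 * 'tree' must be sorted by bno with non-overlapping entries, so each
 * comparison discards half the remaining candidates, as in an rbtree. */
static const struct busy *busy_search(const struct busy *tree, size_t n,
				      uint32_t bno, uint32_t len)
{
	size_t lo = 0, hi = n;
	uint32_t uend = bno + len - 1;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		const struct busy *b = &tree[mid];
		uint32_t bend = b->bno + b->len - 1;

		if (uend < b->bno)	/* query wholly left of this node */
			hi = mid;
		else if (bno > bend)	/* query wholly right of this node */
			lo = mid + 1;
		else
			return b;	/* ranges overlap */
	}
	return NULL;
}
```

Each step either descends left, descends right, or reports an overlap,
which is where the O(log n) search cost comes from.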

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_ag.h         |   20 ++--
 fs/xfs/xfs_alloc.c      |  235 +++++++++++++++++++++++++++--------------------
 fs/xfs/xfs_alloc.h      |    5 +-
 fs/xfs/xfs_mount.c      |    8 +--
 fs/xfs/xfs_trans.c      |    6 +-
 fs/xfs/xfs_trans.h      |   10 +-
 fs/xfs/xfs_trans_item.c |   19 +---
 fs/xfs/xfs_trans_priv.h |    3 -
 8 files changed, 161 insertions(+), 145 deletions(-)

diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 2bfd863..e7aaa1c 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -156,14 +156,16 @@ typedef struct xfs_agfl {
 } xfs_agfl_t;
 
 /*
- * Busy block/extent entry.  Used in perag to mark blocks that have been freed
- * but whose transactions aren't committed to disk yet.
+ * Busy block/extent entry.  Indexed by a rbtree in perag to mark blocks that
+ * have been freed but whose transactions aren't committed to disk yet.
  */
-typedef struct xfs_perag_busy {
-	xfs_agblock_t	busy_start;
-	xfs_extlen_t	busy_length;
-	struct xfs_trans *busy_tp;	/* transaction that did the free */
-} xfs_perag_busy_t;
+struct xfs_busy_extent {
+	struct rb_node	rb_node;
+	xfs_agnumber_t	agno;
+	xfs_agblock_t	bno;
+	xfs_extlen_t	length;
+	struct xfs_trans *tp;	/* transaction that did the free */
+};
 
 /*
  * Per-ag incore structure, copies of information in agf and agi,
@@ -192,9 +194,9 @@ typedef struct xfs_perag
 	xfs_agino_t	pagi_freecount;	/* number of free inodes */
 	xfs_agino_t	pagi_count;	/* number of allocated inodes */
 	int		pagb_count;	/* pagb slots in use */
-	xfs_perag_busy_t *pagb_list;	/* unstable blocks */
 #ifdef __KERNEL__
-	spinlock_t	pagb_lock;	/* lock for pagb_list */
+	spinlock_t	pagb_lock;	/* lock for pagb_tree */
+	struct rb_root	pagb_tree;	/* ordered tree of busy extents */
 
 	atomic_t        pagf_fstrms;    /* # of filestreams active in this AG */
 
diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index 0a2a872..98908ba 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -60,18 +60,18 @@ ktrace_t *xfs_alloc_trace_buf;
 	xfs_alloc_trace_free(__func__, s, mp, a, b, x, f, __LINE__)
 #define	TRACE_MODAGF(s,a,f)	\
 	xfs_alloc_trace_modagf(__func__, s, mp, a, f, __LINE__)
-#define	TRACE_BUSY(__func__,s,ag,agb,l,sl,tp)	\
-	xfs_alloc_trace_busy(__func__, s, mp, ag, agb, l, sl, tp, XFS_ALLOC_KTRACE_BUSY, __LINE__)
-#define	TRACE_UNBUSY(__func__,s,ag,sl,tp)	\
-	xfs_alloc_trace_busy(__func__, s, mp, ag, -1, -1, sl, tp, XFS_ALLOC_KTRACE_UNBUSY, __LINE__)
+#define	TRACE_BUSY(__func__,s,ag,agb,l,tp)	\
+	xfs_alloc_trace_busy(__func__, s, mp, ag, agb, l, 0, tp, XFS_ALLOC_KTRACE_BUSY, __LINE__)
+#define	TRACE_UNBUSY(__func__,s,ag,agb,l,tp)	\
+	xfs_alloc_trace_busy(__func__, s, mp, ag, agb, l, 0, tp, XFS_ALLOC_KTRACE_UNBUSY, __LINE__)
 #define	TRACE_BUSYSEARCH(__func__,s,ag,agb,l,tp)	\
 	xfs_alloc_trace_busy(__func__, s, mp, ag, agb, l, 0, tp, XFS_ALLOC_KTRACE_BUSYSEARCH, __LINE__)
 #else
 #define	TRACE_ALLOC(s,a)
 #define	TRACE_FREE(s,a,b,x,f)
 #define	TRACE_MODAGF(s,a,f)
-#define	TRACE_BUSY(s,a,ag,agb,l,sl,tp)
-#define	TRACE_UNBUSY(fname,s,ag,sl,tp)
+#define	TRACE_BUSY(s,a,ag,agb,l,tp)
+#define	TRACE_UNBUSY(fname,s,ag,agb,l,tp)
 #define	TRACE_BUSYSEARCH(fname,s,ag,agb,l,tp)
 #endif	/* XFS_ALLOC_TRACE */
 
@@ -2293,8 +2293,7 @@ xfs_alloc_read_agf(
 		pag->pagf_levels[XFS_BTNUM_CNTi] =
 			be32_to_cpu(agf->agf_levels[XFS_BTNUM_CNTi]);
 		spin_lock_init(&pag->pagb_lock);
-		pag->pagb_list = kmem_zalloc(XFS_PAGB_NUM_SLOTS *
-					sizeof(xfs_perag_busy_t), KM_SLEEP);
+		pag->pagb_tree = RB_ROOT;
 		pag->pagf_init = 1;
 	}
 #ifdef DEBUG
@@ -2578,128 +2577,162 @@ error0:
  * xfs_alloc_mark_busy - add to the per-ag busy list
  * xfs_alloc_clear_busy - remove an item from the per-ag busy list
  */
-void
-xfs_alloc_mark_busy(xfs_trans_t *tp,
-		    xfs_agnumber_t agno,
-		    xfs_agblock_t bno,
-		    xfs_extlen_t len)
+static void
+xfs_alloc_busy_insert(
+	struct xfs_perag	*pag,
+	xfs_agblock_t		bno,
+	struct rb_node		*node)
 {
-	xfs_mount_t		*mp;
-	xfs_perag_busy_t	*bsy;
-	int			n;
+	struct rb_node		**rbp = &pag->pagb_tree.rb_node;
+	struct rb_node		*parent = NULL;
+	struct xfs_busy_extent	*busyp;
+
+	while (*rbp) {
+		parent = *rbp;
+		busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
+
+		if (bno < busyp->bno)
+			rbp = &(*rbp)->rb_left;
+		else if (bno > busyp->bno)
+			rbp = &(*rbp)->rb_right;
+		else
+			BUG();
+	}
 
-	mp = tp->t_mountp;
-	spin_lock(&mp->m_perag[agno].pagb_lock);
+	rb_link_node(node, parent, rbp);
+	rb_insert_color(node, &pag->pagb_tree);
+}
 
-	/* search pagb_list for an open slot */
-	for (bsy = mp->m_perag[agno].pagb_list, n = 0;
-	     n < XFS_PAGB_NUM_SLOTS;
-	     bsy++, n++) {
-		if (bsy->busy_tp == NULL) {
-			break;
-		}
-	}
+void
+xfs_alloc_mark_busy(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len)
+{
+	struct xfs_busy_extent	*busyp;
+	struct xfs_perag	*pag;
 
-	if (n < XFS_PAGB_NUM_SLOTS) {
-		bsy = &mp->m_perag[agno].pagb_list[n];
-		mp->m_perag[agno].pagb_count++;
-		TRACE_BUSY("xfs_alloc_mark_busy", "got", agno, bno, len, n, tp);
-		bsy->busy_start = bno;
-		bsy->busy_length = len;
-		bsy->busy_tp = tp;
-		xfs_trans_add_busy(tp, agno, n);
-	} else {
-		TRACE_BUSY("xfs_alloc_mark_busy", "FULL", agno, bno, len, -1, tp);
+	busyp = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
+	if (!busyp) {
 		/*
-		 * The busy list is full!  Since it is now not possible to
-		 * track the free block, make this a synchronous transaction
-		 * to insure that the block is not reused before this
-		 * transaction commits.
+		 * No Memory!  Since it is now not possible to track the free
+		 * block, make this a synchronous transaction to insure that
+		 * the block is not reused before this transaction commits.
 		 */
+		TRACE_BUSY("xfs_alloc_mark_busy", "ENOMEM", agno, bno, len, tp);
 		xfs_trans_set_sync(tp);
+		return;
 	}
 
-	spin_unlock(&mp->m_perag[agno].pagb_lock);
+	busyp->agno = agno;
+	busyp->bno = bno;
+	busyp->length = len;
+	busyp->tp = tp;
+
+	pag = xfs_perag_get(tp->t_mountp, agno);
+	spin_lock(&pag->pagb_lock);
+	xfs_alloc_busy_insert(pag, bno, &busyp->rb_node);
+	xfs_trans_add_busy(tp, busyp);
+	TRACE_BUSY("xfs_alloc_mark_busy", "got", agno, bno, len, tp);
+	spin_unlock(&pag->pagb_lock);
+	xfs_perag_put(pag);
+}
+
+/*
+ * Search for a busy extent within the range of the extent we are about to
+ * allocate.  You need to be holding the busy extent tree lock when calling
+ * __xfs_alloc_busy_search().
+ */
+static struct xfs_busy_extent *
+__xfs_alloc_busy_search(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len)
+{
+	struct xfs_perag	*pag;
+	struct rb_node		*rbp;
+	xfs_agblock_t		uend, bend;
+
+	uend = bno + len - 1;
+	pag = xfs_perag_get(tp->t_mountp, agno);
+	rbp = pag->pagb_tree.rb_node;
+	while (rbp) {
+		struct xfs_busy_extent	*busyp;
+
+		busyp = rb_entry(rbp, struct xfs_busy_extent, rb_node);
+		bend = busyp->bno + busyp->length - 1;
+		if (uend < busyp->bno)
+			rbp = rbp->rb_left;
+		else if (bno > bend)
+			rbp = rbp->rb_right;
+		else {
+			/* (start1,length1) within (start2, length2) */
+			TRACE_BUSYSEARCH("xfs_alloc_search_busy",
+					 "found1", agno, bno, len, tp);
+			xfs_perag_put(pag);
+			return busyp;
+		}
+	}
+	xfs_perag_put(pag);
+	return NULL;
 }
 
 void
-xfs_alloc_clear_busy(xfs_trans_t *tp,
-		     xfs_agnumber_t agno,
-		     int idx)
+xfs_alloc_clear_busy(
+	struct xfs_trans	*tp,
+	struct xfs_busy_extent	*busyp)
 {
-	xfs_mount_t		*mp;
-	xfs_perag_busy_t	*list;
+	struct xfs_perag	*pag;
 
-	mp = tp->t_mountp;
+	pag = xfs_perag_get(tp->t_mountp, busyp->agno);
+	spin_lock(&pag->pagb_lock);
 
-	spin_lock(&mp->m_perag[agno].pagb_lock);
-	list = mp->m_perag[agno].pagb_list;
+	/* check that the busyp is still in the rbtree */
+	ASSERT(__xfs_alloc_busy_search(tp, busyp->agno, busyp->bno,
+						busyp->length) == busyp);
 
-	ASSERT(idx < XFS_PAGB_NUM_SLOTS);
-	if (list[idx].busy_tp == tp) {
-		TRACE_UNBUSY("xfs_alloc_clear_busy", "found", agno, idx, tp);
-		list[idx].busy_tp = NULL;
-		mp->m_perag[agno].pagb_count--;
-	} else {
-		TRACE_UNBUSY("xfs_alloc_clear_busy", "missing", agno, idx, tp);
-	}
+	TRACE_UNBUSY("xfs_alloc_clear_busy", "found", busyp->agno, busyp->bno,
+							busyp->length, tp);
+	rb_erase(&busyp->rb_node, &pag->pagb_tree);
+	spin_unlock(&pag->pagb_lock);
+	xfs_perag_put(pag);
 
-	spin_unlock(&mp->m_perag[agno].pagb_lock);
+	kmem_free(busyp);
 }
 
-
 /*
  * If we find the extent in the busy list, force the log out to get the
  * extent out of the busy list so the caller can use it straight away.
  */
 STATIC void
-xfs_alloc_search_busy(xfs_trans_t *tp,
-		    xfs_agnumber_t agno,
-		    xfs_agblock_t bno,
-		    xfs_extlen_t len)
+xfs_alloc_search_busy(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len)
 {
-	xfs_mount_t		*mp;
-	xfs_perag_busy_t	*bsy;
-	xfs_agblock_t		uend, bend;
+	struct xfs_perag	*pag;
+	struct xfs_busy_extent	*busyp;
 	xfs_lsn_t		lsn;
-	int			cnt;
-
-	mp = tp->t_mountp;
-
-	spin_lock(&mp->m_perag[agno].pagb_lock);
-	cnt = mp->m_perag[agno].pagb_count;
-
-	uend = bno + len - 1;
 
-	/* search pagb_list for this slot, skipping open slots */
-	for (bsy = mp->m_perag[agno].pagb_list; cnt; bsy++) {
-
-		/*
-		 * (start1,length1) within (start2, length2)
-		 */
-		if (bsy->busy_tp != NULL) {
-			bend = bsy->busy_start + bsy->busy_length - 1;
-			if ((bno > bend) || (uend < bsy->busy_start)) {
-				cnt--;
-			} else {
-				TRACE_BUSYSEARCH("xfs_alloc_search_busy",
-					 "found1", agno, bno, len, tp);
-				break;
-			}
-		}
+	pag = xfs_perag_get(tp->t_mountp, agno);
+	spin_lock(&pag->pagb_lock);
+	busyp = __xfs_alloc_busy_search(tp, agno, bno, len);
+	if (!busyp) {
+		TRACE_BUSYSEARCH("xfs_alloc_search_busy", "not-found", agno, bno, len, tp);
+		spin_unlock(&pag->pagb_lock);
+		xfs_perag_put(pag);
+		return;
 	}
-
 	/*
-	 * If a block was found, force the log through the LSN of the
+	 * A block was found, force the log through the LSN of the
 	 * transaction that freed the block
 	 */
-	if (cnt) {
-		TRACE_BUSYSEARCH("xfs_alloc_search_busy", "found", agno, bno, len, tp);
-		lsn = bsy->busy_tp->t_commit_lsn;
-		spin_unlock(&mp->m_perag[agno].pagb_lock);
-		xfs_log_force(mp, lsn, XFS_LOG_FORCE|XFS_LOG_SYNC);
-	} else {
-		TRACE_BUSYSEARCH("xfs_alloc_search_busy", "not-found", agno, bno, len, tp);
-		spin_unlock(&mp->m_perag[agno].pagb_lock);
-	}
+	TRACE_BUSYSEARCH("xfs_alloc_search_busy", "found", agno, bno, len, tp);
+	lsn = busyp->tp->t_commit_lsn;
+	spin_unlock(&pag->pagb_lock);
+	xfs_log_force(tp->t_mountp, lsn, XFS_LOG_FORCE|XFS_LOG_SYNC);
+	xfs_perag_put(pag);
 }
diff --git a/fs/xfs/xfs_alloc.h b/fs/xfs/xfs_alloc.h
index 5881727..21040af 100644
--- a/fs/xfs/xfs_alloc.h
+++ b/fs/xfs/xfs_alloc.h
@@ -22,6 +22,7 @@ struct xfs_buf;
 struct xfs_mount;
 struct xfs_perag;
 struct xfs_trans;
+struct xfs_busy_extent;
 
 /*
  * Freespace allocation types.  Argument to xfs_alloc_[v]extent.
@@ -128,9 +129,7 @@ xfs_alloc_mark_busy(xfs_trans_t *tp,
 		xfs_extlen_t len);
 
 void
-xfs_alloc_clear_busy(xfs_trans_t *tp,
-		xfs_agnumber_t ag,
-		int idx);
+xfs_alloc_clear_busy(xfs_trans_t *tp, struct xfs_busy_extent *busyp);
 
 #endif	/* __KERNEL__ */
 
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 794a4e2..2239685 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -131,14 +131,8 @@ STATIC void
 xfs_free_perag(
 	xfs_mount_t	*mp)
 {
-	if (mp->m_perag) {
-		int	agno;
-
-		for (agno = 0; agno < mp->m_maxagi; agno++)
-			if (mp->m_perag[agno].pagb_list)
-				kmem_free(mp->m_perag[agno].pagb_list);
+	if (mp->m_perag)
 		kmem_free(mp->m_perag);
-	}
 }
 
 /*
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index ad137ef..e88b9bf 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1338,10 +1338,8 @@ xfs_trans_committed(
 	lbcp = &tp->t_busy;
 	while (lbcp != NULL) {
 		for (i = 0, lbsp = lbcp->lbc_busy; i < lbcp->lbc_unused; i++, lbsp++) {
-			if (!XFS_LBC_ISFREE(lbcp, i)) {
-				xfs_alloc_clear_busy(tp, lbsp->lbc_ag,
-						     lbsp->lbc_idx);
-			}
+			if (!XFS_LBC_ISFREE(lbcp, i))
+				xfs_alloc_clear_busy(tp, lbsp->lbc_busyp);
 		}
 		lbcp = lbcp->lbc_next;
 	}
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index d6fe4a8..6a5e8d5 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -762,6 +762,7 @@ struct xfs_log_item_desc;
 struct xfs_mount;
 struct xfs_trans;
 struct xfs_dquot_acct;
+struct xfs_busy_extent;
 
 typedef struct xfs_log_item {
 	struct list_head		li_ail;		/* AIL pointers */
@@ -824,8 +825,7 @@ typedef struct xfs_item_ops {
  */
 
 typedef struct xfs_log_busy_slot {
-	xfs_agnumber_t		lbc_ag;
-	ushort			lbc_idx;	/* index in perag.busy[] */
+	struct xfs_busy_extent	*lbc_busyp;
 } xfs_log_busy_slot_t;
 
 #define XFS_LBC_NUM_SLOTS	31
@@ -971,9 +971,9 @@ int		_xfs_trans_commit(xfs_trans_t *,
 void		xfs_trans_cancel(xfs_trans_t *, int);
 int		xfs_trans_ail_init(struct xfs_mount *);
 void		xfs_trans_ail_destroy(struct xfs_mount *);
-xfs_log_busy_slot_t *xfs_trans_add_busy(xfs_trans_t *tp,
-					xfs_agnumber_t ag,
-					xfs_extlen_t idx);
+
+xfs_log_busy_slot_t *xfs_trans_add_busy(struct xfs_trans *tp,
+					struct xfs_busy_extent *busyp);
 
 extern kmem_zone_t	*xfs_trans_zone;
 
diff --git a/fs/xfs/xfs_trans_item.c b/fs/xfs/xfs_trans_item.c
index e110bf5..3baa0af 100644
--- a/fs/xfs/xfs_trans_item.c
+++ b/fs/xfs/xfs_trans_item.c
@@ -449,7 +449,9 @@ xfs_trans_unlock_chunk(
  * descriptor with its ???? field.
  */
 xfs_log_busy_slot_t *
-xfs_trans_add_busy(xfs_trans_t *tp, xfs_agnumber_t ag, xfs_extlen_t idx)
+xfs_trans_add_busy(
+	xfs_trans_t		*tp,
+	struct xfs_busy_extent	*busyp)
 {
 	xfs_log_busy_chunk_t	*lbcp;
 	xfs_log_busy_slot_t	*lbsp;
@@ -479,16 +481,8 @@ xfs_trans_add_busy(xfs_trans_t *tp, xfs_agnumber_t ag, xfs_extlen_t idx)
 		tp->t_busy.lbc_next = lbcp;
 		tp->t_busy_free = XFS_LIC_NUM_SLOTS - 1;
 
-		/*
-		 * Initialize the descriptor and the generic portion
-		 * of the log item.
-		 *
-		 * Point the new slot at this item and return it.
-		 * Also point the log item at its currently active
-		 * descriptor and set the item's mount pointer.
-		 */
-		lbsp->lbc_ag = ag;
-		lbsp->lbc_idx = idx;
+		/* Initialize the descriptor and return it */
+		lbsp->lbc_busyp = busyp;
 		return lbsp;
 	}
 
@@ -521,8 +515,7 @@ xfs_trans_add_busy(xfs_trans_t *tp, xfs_agnumber_t ag, xfs_extlen_t idx)
 	}
 	lbsp = XFS_LBC_SLOT(lbcp, i);
 	tp->t_busy_free--;
-	lbsp->lbc_ag = ag;
-	lbsp->lbc_idx = idx;
+	lbsp->lbc_busyp = busyp;
 	return lbsp;
 }
 
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 73e2ad3..fd50556 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -39,9 +39,6 @@ void				xfs_trans_free_items(struct xfs_trans *, int);
 void				xfs_trans_unlock_items(struct xfs_trans *,
 							xfs_lsn_t);
 void				xfs_trans_free_busy(xfs_trans_t *tp);
-xfs_log_busy_slot_t		*xfs_trans_add_busy(xfs_trans_t *tp,
-						    xfs_agnumber_t ag,
-						    xfs_extlen_t idx);
 
 /*
  * AIL traversal cursor.
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/7] XFS: Don't immediately reallocate busy extents
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
  2008-10-07 22:09 ` [PATCH 1/7] XFS: rename xfs_get_perag Dave Chinner
  2008-10-07 22:09 ` [PATCH 2/7] XFS: replace fixed size busy extent array with an rbtree Dave Chinner
@ 2008-10-07 22:09 ` Dave Chinner
  2008-10-07 22:09 ` [PATCH 4/7] XFS: Don't use log forces when busy extents are allocated Dave Chinner
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

Every time we reallocate a busy extent, we cause a synchronous log
force to occur to ensure the freeing transaction is on disk before
we continue and use the newly allocated extent. This is extremely
sub-optimal as it means we are holding the AGF locked over a
blocking log force that could take several hundred milliseconds
to complete. This prevents us from making progress, and stops other
allocation and freeing activity in that AG for that period of
time.

Instead of searching the busy extent list after deciding on the
extent to allocate, check each candidate extent during the
allocation decisions as to whether they are in the busy list.
If they are in the busy list, we don't consider that extent a
good choice to allocate. This will have relatively low overhead
because the rbtree is indexed by block number so searches should
find a busy extent very quickly.
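
The new policy can be sketched as a simple predicate. This is an
illustration only (linear scan and invented names, where the kernel
searches the per-AG rbtree): a candidate extent is rejected if it
overlaps a busy range, unless we are in low-space allocation where
speed is secondary and we accept the log-force hit:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct busy { uint32_t bno; uint32_t len; };

/* Does [bno, bno+len-1] overlap any busy range? Linear scan here for
 * clarity; the kernel does this via the rbtree search. */
static bool overlaps_busy(const struct busy *list, size_t n,
			  uint32_t bno, uint32_t len)
{
	for (size_t i = 0; i < n; i++) {
		uint32_t bend = list[i].bno + list[i].len - 1;
		if (bno + len - 1 >= list[i].bno && bno <= bend)
			return true;
	}
	return false;
}

/* Accept the candidate only if it avoids busy ranges, or if low-space
 * allocation means we tolerate the synchronous log force. */
static bool candidate_ok(const struct busy *list, size_t n,
			 uint32_t bno, uint32_t len, bool low_space)
{
	return low_space || !overlaps_busy(list, n, bno, len);
}
```

Filtering candidates up front means the allocator never commits to an
extent that would force the log while the AGF is held.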

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_alloc.c |  779 +++++++++++++++++++++++++---------------------------
 1 files changed, 380 insertions(+), 399 deletions(-)

diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index 98908ba..77dc18e 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -45,12 +45,6 @@
 #define	XFSA_FIXUP_BNO_OK	1
 #define	XFSA_FIXUP_CNT_OK	2
 
-STATIC void
-xfs_alloc_search_busy(xfs_trans_t *tp,
-		    xfs_agnumber_t agno,
-		    xfs_agblock_t bno,
-		    xfs_extlen_t len);
-
 #if defined(XFS_ALLOC_TRACE)
 ktrace_t *xfs_alloc_trace_buf;
 
@@ -85,6 +79,178 @@ STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
 STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
 	xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
 
+
+
+/*
+ * AG Busy list management
+ * The busy list contains block ranges that have been freed but whose
+ * transactions have not yet hit disk.  If any block listed in a busy
+ * list is reused, the transaction that freed it must be forced to disk
+ * before continuing to use the block.
+ *
+ * xfs_alloc_mark_busy - add to the per-ag busy list
+ * xfs_alloc_clear_busy - remove an item from the per-ag busy list
+ */
+static void
+xfs_alloc_busy_insert(
+	struct xfs_perag	*pag,
+	xfs_agblock_t		bno,
+	struct rb_node		*node)
+{
+	struct rb_node		**rbp = &pag->pagb_tree.rb_node;
+	struct rb_node		*parent = NULL;
+	struct xfs_busy_extent	*busyp;
+
+	while (*rbp) {
+		parent = *rbp;
+		busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
+
+		if (bno < busyp->bno)
+			rbp = &(*rbp)->rb_left;
+		else if (bno > busyp->bno)
+			rbp = &(*rbp)->rb_right;
+		else
+			BUG();
+	}
+
+	rb_link_node(node, parent, rbp);
+	rb_insert_color(node, &pag->pagb_tree);
+}
+
+void
+xfs_alloc_mark_busy(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len)
+{
+	struct xfs_busy_extent	*busyp;
+	struct xfs_perag	*pag;
+
+	busyp = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
+	if (!busyp) {
+		/*
+		 * No Memory!  Since it is now not possible to track the free
+		 * block, make this a synchronous transaction to insure that
+		 * the block is not reused before this transaction commits.
+		 */
+		TRACE_BUSY("xfs_alloc_mark_busy", "ENOMEM", agno, bno, len, tp);
+		xfs_trans_set_sync(tp);
+		return;
+	}
+
+	busyp->agno = agno;
+	busyp->bno = bno;
+	busyp->length = len;
+	busyp->tp = tp;
+
+	pag = xfs_perag_get(tp->t_mountp, agno);
+	spin_lock(&pag->pagb_lock);
+	xfs_alloc_busy_insert(pag, bno, &busyp->rb_node);
+	xfs_trans_add_busy(tp, busyp);
+	TRACE_BUSY("xfs_alloc_mark_busy", "got", agno, bno, len, tp);
+	spin_unlock(&pag->pagb_lock);
+	xfs_perag_put(pag);
+}
+
+/*
+ * Search for a busy extent within the range of the extent we are about to
+ * allocate.  You need to be holding the busy extent tree lock when calling
+ * __xfs_alloc_busy_search().
+ */
+static struct xfs_busy_extent *
+__xfs_alloc_busy_search(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len)
+{
+	struct xfs_perag	*pag;
+	struct rb_node		*rbp;
+	xfs_agblock_t		uend, bend;
+
+	uend = bno + len - 1;
+	pag = xfs_perag_get(tp->t_mountp, agno);
+	rbp = pag->pagb_tree.rb_node;
+	while (rbp) {
+		struct xfs_busy_extent	*busyp;
+
+		busyp = rb_entry(rbp, struct xfs_busy_extent, rb_node);
+		bend = busyp->bno + busyp->length - 1;
+		if (uend < busyp->bno)
+			rbp = rbp->rb_left;
+		else if (bno > bend)
+			rbp = rbp->rb_right;
+		else {
+			/* (start1,length1) within (start2, length2) */
+			TRACE_BUSYSEARCH("xfs_alloc_search_busy",
+					 "found1", agno, bno, len, tp);
+			xfs_perag_put(pag);
+			return busyp;
+		}
+	}
+	xfs_perag_put(pag);
+	return NULL;
+}
+
+void
+xfs_alloc_clear_busy(
+	struct xfs_trans	*tp,
+	struct xfs_busy_extent	*busyp)
+{
+	struct xfs_perag	*pag;
+
+	pag = xfs_perag_get(tp->t_mountp, busyp->agno);
+	spin_lock(&pag->pagb_lock);
+
+	/* check that the busyp is still in the rbtree */
+	ASSERT(__xfs_alloc_busy_search(tp, busyp->agno, busyp->bno,
+						busyp->length) == busyp);
+
+	TRACE_UNBUSY("xfs_alloc_clear_busy", "found", busyp->agno, busyp->bno,
+							busyp->length, tp);
+	rb_erase(&busyp->rb_node, &pag->pagb_tree);
+	spin_unlock(&pag->pagb_lock);
+	xfs_perag_put(pag);
+
+	kmem_free(busyp);
+}
+
+/*
+ * If we find the extent in the busy list, force the log out to get the
+ * extent out of the busy list so the caller can use it straight away.
+ */
+STATIC void
+xfs_alloc_search_busy(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len)
+{
+	struct xfs_perag	*pag;
+	struct xfs_busy_extent	*busyp;
+	xfs_lsn_t		lsn;
+
+	pag = xfs_perag_get(tp->t_mountp, agno);
+	spin_lock(&pag->pagb_lock);
+	busyp = __xfs_alloc_busy_search(tp, agno, bno, len);
+	if (!busyp) {
+		TRACE_BUSYSEARCH("xfs_alloc_search_busy", "not-found", agno, bno, len, tp);
+		spin_unlock(&pag->pagb_lock);
+		xfs_perag_put(pag);
+		return;
+	}
+	/*
+	 * A block was found, force the log through the LSN of the
+	 * transaction that freed the block
+	 */
+	TRACE_BUSYSEARCH("xfs_alloc_search_busy", "found", agno, bno, len, tp);
+	lsn = busyp->tp->t_commit_lsn;
+	spin_unlock(&pag->pagb_lock);
+	xfs_log_force(tp->t_mountp, lsn, XFS_LOG_FORCE|XFS_LOG_SYNC);
+	xfs_perag_put(pag);
+}
+
 /*
  * Internal functions.
  */
@@ -705,6 +871,12 @@ xfs_alloc_ag_vextent(
  * Extent's length (returned in *len) will be between minlen and maxlen,
  * and of the form k * prod + mod unless there's nothing that large.
  * Return the starting a.g. block (bno), or NULLAGBLOCK if we can't do it.
+ *
+ * Avoid allocating extents that span a busy extent range to avoid causing a
+ * synchronous log force to be required during the transaction and hence avoid
+ * holding the AGF locked for an excessive period of time. If we are doing low
+ * space allocation, speed is a secondary factor so in that case we will use
+ * busy extents and take the hit....
  */
 STATIC int			/* error */
 xfs_alloc_ag_vextent_exact(
@@ -735,46 +907,36 @@ xfs_alloc_ag_vextent_exact(
 	 */
 	if ((error = xfs_alloc_lookup_le(bno_cur, args->agbno, args->minlen, &i)))
 		goto error0;
-	if (!i) {
-		/*
-		 * Didn't find it, return null.
-		 */
-		xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
-		args->agbno = NULLAGBLOCK;
-		return 0;
-	}
-	/*
-	 * Grab the freespace record.
-	 */
-	if ((error = xfs_alloc_get_rec(bno_cur, &fbno, &flen, &i)))
+	if (!i)
+		goto not_found;
+
+	/* Grab a non-busy freespace record. */
+	error = xfs_alloc_get_rec(bno_cur, &fbno, &flen, &i);
+	if (error)
 		goto error0;
 	XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+	if (__xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
+		goto not_found;
 	ASSERT(fbno <= args->agbno);
+
+	/* Give up if the freespace isn't long enough for the minimum request. */
 	minend = args->agbno + args->minlen;
 	maxend = args->agbno + args->maxlen;
 	fend = fbno + flen;
-	/*
-	 * Give up if the freespace isn't long enough for the minimum request.
-	 */
-	if (fend < minend) {
-		xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
-		args->agbno = NULLAGBLOCK;
-		return 0;
-	}
+	if (fend < minend)
+		goto not_found;
 	/*
 	 * End of extent will be smaller of the freespace end and the
 	 * maximal requested end.
 	 */
 	end = XFS_AGBLOCK_MIN(fend, maxend);
-	/*
-	 * Fix the length according to mod and prod if given.
-	 */
+
+	/* Fix the length according to mod and prod if given. */
 	args->len = end - args->agbno;
 	xfs_alloc_fix_len(args);
-	if (!xfs_alloc_fix_minleft(args)) {
-		xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
-		return 0;
-	}
+	if (!xfs_alloc_fix_minleft(args))
+		goto not_found;
+
 	rlen = args->len;
 	ASSERT(args->agbno + rlen <= fend);
 	end = args->agbno + rlen;
@@ -797,6 +959,13 @@ xfs_alloc_ag_vextent_exact(
 	args->wasfromfl = 0;
 	return 0;
 
+not_found:
+	/* Didn't find it, return null. */
+	xfs_btree_del_cursor(bno_cur, XFS_BTREE_NOERROR);
+	args->agbno = NULLAGBLOCK;
+	TRACE_ALLOC("not found", args);
+	return 0;
+
 error0:
 	xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
 	TRACE_ALLOC("error", args);
@@ -804,10 +973,104 @@ error0:
 }
 
 /*
+ * Search the btree in a given direction via the search cursor
+ * and compare the records found against the good extent we've
+ * already found.
+ */
+STATIC int
+xfs_alloc_find_best_extent(
+	xfs_alloc_arg_t	*args,
+	xfs_btree_cur_t	**gcur,		/* good cursor */
+	xfs_btree_cur_t	**scur,		/* searching cursor */
+	xfs_agblock_t	gdiff,		/* difference for search comparison */
+	xfs_agblock_t	*sbno,		/* extent found by search */
+	xfs_extlen_t	*slen,
+	xfs_extlen_t	*slena,		/* aligned length */
+	int		dir)	/* 0 = search right, 1 = search left */
+{
+	xfs_agblock_t	bno;
+	xfs_agblock_t	new;
+	xfs_agblock_t	sdiff;
+	int		i;
+	int		error;
+
+	/* The good extent is perfect, no need to search. */
+	if (!gdiff)
+		goto out_use_good;
+
+	/*
+	 * Look until we find a better one, run out of
+	 * space, or run off the end.
+	 */
+	do {
+		error = xfs_alloc_get_rec(*scur, sbno, slen, &i);
+		if (error)
+			goto error0;
+		XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+		xfs_alloc_compute_aligned(*sbno, *slen, args->alignment,
+					args->minlen, &bno, slena);
+		if (__xfs_alloc_busy_search(args->tp, args->agno, bno, *slena)) {
+			/* just freed - skip this one */
+			goto next_record;
+		}
+
+		/* The good extent is closer than this one */
+		if (!dir) {
+			if (bno >= args->agbno + gdiff)
+				goto out_use_good;
+		} else {
+			if (bno <= args->agbno - gdiff)
+				goto out_use_good;
+		}
+
+		/* Same distance, compare length and pick the best. */
+		if (*slena >= args->minlen) {
+			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
+			xfs_alloc_fix_len(args);
+			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
+					args->alignment, *sbno, *slen, &new);
+
+			/* choose closer size and invalidate other cursor */
+			if (sdiff < gdiff)
+				goto out_use_search;
+			goto out_use_good;
+		}
+next_record:
+		if (!dir)
+			error = xfs_btree_increment(*scur, 0, &i);
+		else
+			error = xfs_btree_decrement(*scur, 0, &i);
+		if (error)
+			goto error0;
+	} while (i);
+
+out_use_good:
+	xfs_btree_del_cursor(*scur, XFS_BTREE_NOERROR);
+	*scur = NULL;
+	return 0;
+
+out_use_search:
+	xfs_btree_del_cursor(*gcur, XFS_BTREE_NOERROR);
+	*gcur = NULL;
+	return 0;
+
+error0:
+	/* caller invalidates cursors */
+	return error;
+
+}
+
+/*
  * Allocate a variable extent near bno in the allocation group agno.
  * Extent's length (returned in len) will be between minlen and maxlen,
  * and of the form k * prod + mod unless there's nothing that large.
  * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
+ *
+ * Avoid allocating extents that span a busy extent range to avoid causing a
+ * synchronous log force to be required during the transaction and hence avoid
+ * holding the AGF locked for an excessive period of time. If we are doing low
+ * space allocation, speed is a secondary factor so in that case we will use
+ * busy extents and take the hit....
  */
 STATIC int				/* error */
 xfs_alloc_ag_vextent_near(
@@ -926,6 +1189,8 @@ xfs_alloc_ag_vextent_near(
 					args->minlen, &ltbnoa, &ltlena);
 			if (ltlena < args->minlen)
 				continue;
+			if (__xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
+				continue;
 			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			ASSERT(args->len >= args->minlen);
@@ -1045,8 +1310,15 @@ xfs_alloc_ag_vextent_near(
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 			xfs_alloc_compute_aligned(ltbno, ltlen, args->alignment,
 					args->minlen, &ltbnoa, &ltlena);
-			if (ltlena >= args->minlen)
+			if (ltlena >= args->minlen &&
+			    !__xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
 				break;
+			/*
+			 * clear the length so we don't accidentally use this
+			 * extent after we decrement the cursor then find a
+			 * match from the other cursor.
+			 */
+			ltlena = 0;
 			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
 				goto error0;
 			if (!i) {
@@ -1061,8 +1333,10 @@ xfs_alloc_ag_vextent_near(
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 			xfs_alloc_compute_aligned(gtbno, gtlen, args->alignment,
 					args->minlen, &gtbnoa, &gtlena);
-			if (gtlena >= args->minlen)
+			if (gtlena >= args->minlen &&
+			    !__xfs_alloc_busy_search(args->tp, args->agno, gtbnoa, gtlena))
 				break;
+			gtlena = 0;
 			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
 				goto error0;
 			if (!i) {
@@ -1072,201 +1346,39 @@ xfs_alloc_ag_vextent_near(
 			}
 		}
 	} while (bno_cur_lt || bno_cur_gt);
-	/*
-	 * Got both cursors still active, need to find better entry.
-	 */
+
+	/* Got both cursors still active, need to find better entry. */
 	if (bno_cur_lt && bno_cur_gt) {
-		/*
-		 * Left side is long enough, look for a right side entry.
-		 */
 		if (ltlena >= args->minlen) {
-			/*
-			 * Fix up the length.
-			 */
+			/* Left side is good, look for a right side entry. */
 			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			rlen = args->len;
 			ltdiff = xfs_alloc_compute_diff(args->agbno, rlen,
 				args->alignment, ltbno, ltlen, &ltnew);
-			/*
-			 * Not perfect.
-			 */
-			if (ltdiff) {
-				/*
-				 * Look until we find a better one, run out of
-				 * space, or run off the end.
-				 */
-				while (bno_cur_lt && bno_cur_gt) {
-					if ((error = xfs_alloc_get_rec(
-							bno_cur_gt, &gtbno,
-							&gtlen, &i)))
-						goto error0;
-					XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-					xfs_alloc_compute_aligned(gtbno, gtlen,
-						args->alignment, args->minlen,
-						&gtbnoa, &gtlena);
-					/*
-					 * The left one is clearly better.
-					 */
-					if (gtbnoa >= args->agbno + ltdiff) {
-						xfs_btree_del_cursor(
-							bno_cur_gt,
-							XFS_BTREE_NOERROR);
-						bno_cur_gt = NULL;
-						break;
-					}
-					/*
-					 * If we reach a big enough entry,
-					 * compare the two and pick the best.
-					 */
-					if (gtlena >= args->minlen) {
-						args->len =
-							XFS_EXTLEN_MIN(gtlena,
-								args->maxlen);
-						xfs_alloc_fix_len(args);
-						rlen = args->len;
-						gtdiff = xfs_alloc_compute_diff(
-							args->agbno, rlen,
-							args->alignment,
-							gtbno, gtlen, &gtnew);
-						/*
-						 * Right side is better.
-						 */
-						if (gtdiff < ltdiff) {
-							xfs_btree_del_cursor(
-								bno_cur_lt,
-								XFS_BTREE_NOERROR);
-							bno_cur_lt = NULL;
-						}
-						/*
-						 * Left side is better.
-						 */
-						else {
-							xfs_btree_del_cursor(
-								bno_cur_gt,
-								XFS_BTREE_NOERROR);
-							bno_cur_gt = NULL;
-						}
-						break;
-					}
-					/*
-					 * Fell off the right end.
-					 */
-					if ((error = xfs_btree_increment(
-							bno_cur_gt, 0, &i)))
-						goto error0;
-					if (!i) {
-						xfs_btree_del_cursor(
-							bno_cur_gt,
-							XFS_BTREE_NOERROR);
-						bno_cur_gt = NULL;
-						break;
-					}
-				}
-			}
-			/*
-			 * The left side is perfect, trash the right side.
-			 */
-			else {
-				xfs_btree_del_cursor(bno_cur_gt,
-						     XFS_BTREE_NOERROR);
-				bno_cur_gt = NULL;
-			}
-		}
-		/*
-		 * It's the right side that was found first, look left.
-		 */
-		else {
-			/*
-			 * Fix up the length.
-			 */
+
+			error = xfs_alloc_find_best_extent(args,
+						&bno_cur_lt, &bno_cur_gt,
+						ltdiff, &gtbno, &gtlen, &gtlena,
+						0 /* search right */);
+			if (error)
+				goto error0;
+		} else {
+			ASSERT(gtlena >= args->minlen);
+
+			/* Right side is good, look for a left side entry. */
 			args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			rlen = args->len;
 			gtdiff = xfs_alloc_compute_diff(args->agbno, rlen,
 				args->alignment, gtbno, gtlen, &gtnew);
-			/*
-			 * Right side entry isn't perfect.
-			 */
-			if (gtdiff) {
-				/*
-				 * Look until we find a better one, run out of
-				 * space, or run off the end.
-				 */
-				while (bno_cur_lt && bno_cur_gt) {
-					if ((error = xfs_alloc_get_rec(
-							bno_cur_lt, &ltbno,
-							&ltlen, &i)))
-						goto error0;
-					XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-					xfs_alloc_compute_aligned(ltbno, ltlen,
-						args->alignment, args->minlen,
-						&ltbnoa, &ltlena);
-					/*
-					 * The right one is clearly better.
-					 */
-					if (ltbnoa <= args->agbno - gtdiff) {
-						xfs_btree_del_cursor(
-							bno_cur_lt,
-							XFS_BTREE_NOERROR);
-						bno_cur_lt = NULL;
-						break;
-					}
-					/*
-					 * If we reach a big enough entry,
-					 * compare the two and pick the best.
-					 */
-					if (ltlena >= args->minlen) {
-						args->len = XFS_EXTLEN_MIN(
-							ltlena, args->maxlen);
-						xfs_alloc_fix_len(args);
-						rlen = args->len;
-						ltdiff = xfs_alloc_compute_diff(
-							args->agbno, rlen,
-							args->alignment,
-							ltbno, ltlen, &ltnew);
-						/*
-						 * Left side is better.
-						 */
-						if (ltdiff < gtdiff) {
-							xfs_btree_del_cursor(
-								bno_cur_gt,
-								XFS_BTREE_NOERROR);
-							bno_cur_gt = NULL;
-						}
-						/*
-						 * Right side is better.
-						 */
-						else {
-							xfs_btree_del_cursor(
-								bno_cur_lt,
-								XFS_BTREE_NOERROR);
-							bno_cur_lt = NULL;
-						}
-						break;
-					}
-					/*
-					 * Fell off the left end.
-					 */
-					if ((error = xfs_btree_decrement(
-							bno_cur_lt, 0, &i)))
-						goto error0;
-					if (!i) {
-						xfs_btree_del_cursor(bno_cur_lt,
-							XFS_BTREE_NOERROR);
-						bno_cur_lt = NULL;
-						break;
-					}
-				}
-			}
-			/*
-			 * The right side is perfect, trash the left side.
-			 */
-			else {
-				xfs_btree_del_cursor(bno_cur_lt,
-					XFS_BTREE_NOERROR);
-				bno_cur_lt = NULL;
-			}
+
+			error = xfs_alloc_find_best_extent(args,
+						&bno_cur_gt, &bno_cur_lt,
+						gtdiff, &ltbno, &ltlen, &ltlena,
+						1 /* search left */);
+			if (error)
+				goto error0;
 		}
 	}
 	/*
@@ -1287,7 +1399,7 @@ xfs_alloc_ag_vextent_near(
 		bno_cur_lt = bno_cur_gt;
 		bno_cur_gt = NULL;
 		ltbno = gtbno;
-		ltbnoa = gtbnoa;
+		//ltbnoa = gtbnoa;
 		ltlen = gtlen;
 		ltlena = gtlena;
 		j = 1;
@@ -1332,10 +1444,16 @@ xfs_alloc_ag_vextent_near(
 }
 
 /*
- * Allocate a variable extent anywhere in the allocation group agno.
- * Extent's length (returned in len) will be between minlen and maxlen,
- * and of the form k * prod + mod unless there's nothing that large.
- * Return the starting a.g. block, or NULLAGBLOCK if we can't do it.
+ * Allocate a variable extent anywhere in the allocation group agno.  Extent's
+ * length (returned in len) will be between minlen and maxlen, and of the form
+ * k * prod + mod unless there's nothing that large.  Return the starting a.g.
+ * block, or NULLAGBLOCK if we can't do it.
+ *
+ * Avoid allocating extents that span a busy extent range to avoid causing a
+ * synchronous log force to be required during the transaction and hence avoid
+ * holding the AGF locked for an excessive period of time. If we are doing low
+ * space allocation, speed is a secondary factor so in that case we will use
+ * busy extents and take the hit....
  */
 STATIC int				/* error */
 xfs_alloc_ag_vextent_size(
@@ -1349,10 +1467,12 @@ xfs_alloc_ag_vextent_size(
 	int		i;		/* temp status variable */
 	xfs_agblock_t	rbno;		/* returned block number */
 	xfs_extlen_t	rlen;		/* length of returned extent */
+	int		check_busy = 1;
 
 	/*
 	 * Allocate and initialize a cursor for the by-size btree.
 	 */
+restart:
 	cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
 		args->agno, XFS_BTNUM_CNT);
 	bno_cur = NULL;
@@ -1381,9 +1501,31 @@ xfs_alloc_ag_vextent_size(
 	 * There's a freespace as big as maxlen+alignment-1, get it.
 	 */
 	else {
-		if ((error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i)))
-			goto error0;
-		XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+		/*
+		 * Search for a non-busy extent that is large enough.
+		 * If we are at low space, don't bother checking; and if we
+		 * fall off the end of the btree without a match, turn off
+		 * the busy check and restart.
+		 */
+		for (;;) {
+			error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
+			if (error)
+				goto error0;
+			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+			if (!check_busy ||
+			    !__xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
+				break;
+			error = xfs_btree_increment(cnt_cur, 0, &i);
+			if (error)
+				goto error0;
+			if (i == 0) {
+				ASSERT(check_busy);
+				check_busy = 0;
+				xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+				TRACE_ALLOC("busy restart", args);
+				goto restart;
+			}
+		}
 	}
 	/*
 	 * In the first case above, we got the last entry in the
@@ -1407,16 +1549,25 @@ xfs_alloc_ag_vextent_size(
 		bestflen = flen;
 		bestfbno = fbno;
 		for (;;) {
-			if ((error = xfs_btree_decrement(cnt_cur, 0, &i)))
+			error = xfs_btree_decrement(cnt_cur, 0, &i);
+			if (error)
 				goto error0;
 			if (i == 0)
 				break;
-			if ((error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen,
-					&i)))
+			error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
+			if (error)
 				goto error0;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 			if (flen < bestrlen)
 				break;
+
+			/*
+			 * avoid busy candidate extents as they are clearly not
+			 * a better choice than other extents due to the log
+			 * force penalty of using them.
+			 */
+			if (__xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
+				continue;
 			xfs_alloc_compute_aligned(fbno, flen, args->alignment,
 				args->minlen, &rbno, &rlen);
 			rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
@@ -2566,173 +2717,3 @@ error0:
 	return error;
 }
 
-
-/*
- * AG Busy list management
- * The busy list contains block ranges that have been freed but whose
- * transactions have not yet hit disk.  If any block listed in a busy
- * list is reused, the transaction that freed it must be forced to disk
- * before continuing to use the block.
- *
- * xfs_alloc_mark_busy - add to the per-ag busy list
- * xfs_alloc_clear_busy - remove an item from the per-ag busy list
- */
-static void
-xfs_alloc_busy_insert(
-	struct xfs_perag	*pag,
-	xfs_agblock_t		bno,
-	struct rb_node		*node)
-{
-	struct rb_node		**rbp = &pag->pagb_tree.rb_node;
-	struct rb_node		*parent = NULL;
-	struct xfs_busy_extent	*busyp;
-
-	while (*rbp) {
-		parent = *rbp;
-		busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
-
-		if (bno < busyp->bno)
-			rbp = &(*rbp)->rb_left;
-		else if (bno > busyp->bno)
-			rbp = &(*rbp)->rb_right;
-		else
-			BUG();
-	}
-
-	rb_link_node(node, parent, rbp);
-	rb_insert_color(node, &pag->pagb_tree);
-}
-
-void
-xfs_alloc_mark_busy(
-	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
-	xfs_extlen_t		len)
-{
-	struct xfs_busy_extent	*busyp;
-	struct xfs_perag	*pag;
-
-	busyp = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
-	if (!busyp) {
-		/*
-		 * No Memory!  Since it is now not possible to track the free
-		 * block, make this a synchronous transaction to insure that
-		 * the block is not reused before this transaction commits.
-		 */
-		TRACE_BUSY("xfs_alloc_mark_busy", "ENOMEM", agno, bno, len, tp);
-		xfs_trans_set_sync(tp);
-		return;
-	}
-
-	busyp->agno = agno;
-	busyp->bno = bno;
-	busyp->length = len;
-	busyp->tp = tp;
-
-	pag = xfs_perag_get(tp->t_mountp, agno);
-	spin_lock(&pag->pagb_lock);
-	xfs_alloc_busy_insert(pag, bno, &busyp->rb_node);
-	xfs_trans_add_busy(tp, busyp);
-	TRACE_BUSY("xfs_alloc_mark_busy", "got", agno, bno, len, tp);
-	spin_unlock(&pag->pagb_lock);
-	xfs_perag_put(pag);
-}
-
-/*
- * Search for a busy extent within the range of the extent we are about to
- * allocate.  You need to be holding the busy extent tree lock when calling
- * __xfs_alloc_busy_search().
- */
-static struct xfs_busy_extent *
-__xfs_alloc_busy_search(
-	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
-	xfs_extlen_t		len)
-{
-	struct xfs_perag	*pag;
-	struct rb_node		*rbp;
-	xfs_agblock_t		uend, bend;
-
-	uend = bno + len - 1;
-	pag = xfs_perag_get(tp->t_mountp, agno);
-	rbp = pag->pagb_tree.rb_node;
-	while (rbp) {
-		struct xfs_busy_extent	*busyp;
-
-		busyp = rb_entry(rbp, struct xfs_busy_extent, rb_node);
-		bend = busyp->bno + busyp->length - 1;
-		if (uend < busyp->bno)
-			rbp = rbp->rb_left;
-		else if (bno > bend)
-			rbp = rbp->rb_right;
-		else {
-			/* (start1,length1) within (start2, length2) */
-			TRACE_BUSYSEARCH("xfs_alloc_search_busy",
-					 "found1", agno, bno, len, tp);
-			xfs_perag_put(pag);
-			return busyp;
-		}
-	}
-	xfs_perag_put(pag);
-	return NULL;
-}
-
-void
-xfs_alloc_clear_busy(
-	struct xfs_trans	*tp,
-	struct xfs_busy_extent	*busyp)
-{
-	struct xfs_perag	*pag;
-
-	pag = xfs_perag_get(tp->t_mountp, busyp->agno);
-	spin_lock(&pag->pagb_lock);
-
-	/* check that the busyp is still in the rbtree */
-	ASSERT(__xfs_alloc_busy_search(tp, busyp->agno, busyp->bno,
-						busyp->length) == busyp);
-
-	TRACE_UNBUSY("xfs_alloc_clear_busy", "found", busyp->agno, busyp->bno,
-							busyp->length, tp);
-	rb_erase(&busyp->rb_node, &pag->pagb_tree);
-	spin_unlock(&pag->pagb_lock);
-	xfs_perag_put(pag);
-
-	kmem_free(busyp);
-}
-
-/*
- * If we find the extent in the busy list, force the log out to get the
- * extent out of the busy list so the caller can use it straight away.
- */
-STATIC void
-xfs_alloc_search_busy(
-	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
-	xfs_extlen_t		len)
-{
-	struct xfs_perag	*pag;
-	struct xfs_busy_extent	*busyp;
-	xfs_lsn_t		lsn;
-
-	pag = xfs_perag_get(tp->t_mountp, agno);
-	spin_lock(&pag->pagb_lock);
-	busyp = __xfs_alloc_busy_search(tp, agno, bno, len);
-	if (!busyp) {
-		TRACE_BUSYSEARCH("xfs_alloc_search_busy", "not-found", agno, bno, len, tp);
-		spin_unlock(&pag->pagb_lock);
-		xfs_perag_put(pag);
-		return;
-	}
-	/*
-	 * A block was found, force the log through the LSN of the
-	 * transaction that freed the block
-	 */
-	TRACE_BUSYSEARCH("xfs_alloc_search_busy", "found", agno, bno, len, tp);
-	lsn = busyp->tp->t_commit_lsn;
-	spin_unlock(&pag->pagb_lock);
-	xfs_log_force(tp->t_mountp, lsn, XFS_LOG_FORCE|XFS_LOG_SYNC);
-	xfs_perag_put(pag);
-}
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/7] XFS: Don't use log forces when busy extents are allocated
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
                   ` (2 preceding siblings ...)
  2008-10-07 22:09 ` [PATCH 3/7] XFS: Don't immediately reallocate busy extents Dave Chinner
@ 2008-10-07 22:09 ` Dave Chinner
  2008-10-07 22:09 ` [PATCH 5/7] XFS: Do not classify freed allocation btree blocks as busy Dave Chinner
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

Even though we try to avoid busy extents, there are still situations
where we can't avoid them (such as allocation from the AGFL). To
minimise the overhead of such allocation behaviour, we mark the
transaction doing the allocation as synchronous rather than
triggering a log force to get the previous freeing transaction on
disk.

The synchronous transaction provides the same guarantees as a
synchronous log force because it ensures that all prior transactions
are on disk, i.e. it preserves the free->allocate ordering of the
extent correctly across recovery.

The big advantage of this method is that we don't hold the AGF
locked while the log buffers are written back at transaction
commit. Hence further allocations can proceed while the
synchronous transaction is committing, unlike the current
synchronous log force case.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_ag.h    |    1 -
 fs/xfs/xfs_alloc.c |   93 ++++++++++++++++++++--------------------------------
 2 files changed, 36 insertions(+), 58 deletions(-)

diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index e7aaa1c..7048d3d 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -164,7 +164,6 @@ struct xfs_busy_extent {
 	xfs_agnumber_t	agno;
 	xfs_agblock_t	bno;
 	xfs_extlen_t	length;
-	struct xfs_trans *tp;	/* transaction that did the free */
 };
 
 /*
diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index 77dc18e..f9d092e 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -142,7 +142,6 @@ xfs_alloc_mark_busy(
 	busyp->agno = agno;
 	busyp->bno = bno;
 	busyp->length = len;
-	busyp->tp = tp;
 
 	pag = xfs_perag_get(tp->t_mountp, agno);
 	spin_lock(&pag->pagb_lock);
@@ -155,11 +154,10 @@ xfs_alloc_mark_busy(
 
 /*
  * Search for a busy extent within the range of the extent we are about to
- * allocate.  You need to be holding the busy extent tree lock when calling
- * __xfs_alloc_busy_search().
+ * allocate.
  */
 static struct xfs_busy_extent *
-__xfs_alloc_busy_search(
+xfs_alloc_busy_search(
 	struct xfs_trans	*tp,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		bno,
@@ -171,6 +169,7 @@ __xfs_alloc_busy_search(
 
 	uend = bno + len - 1;
 	pag = xfs_perag_get(tp->t_mountp, agno);
+	spin_lock(&pag->pagb_lock);
 	rbp = pag->pagb_tree.rb_node;
 	while (rbp) {
 		struct xfs_busy_extent	*busyp;
@@ -183,12 +182,14 @@ __xfs_alloc_busy_search(
 			rbp = rbp->rb_right;
 		else {
 			/* (start1,length1) within (start2, length2) */
-			TRACE_BUSYSEARCH("xfs_alloc_search_busy",
+			TRACE_BUSYSEARCH("xfs_alloc_busy_search",
 					 "found1", agno, bno, len, tp);
+			spin_unlock(&pag->pagb_lock);
 			xfs_perag_put(pag);
 			return busyp;
 		}
 	}
+	spin_unlock(&pag->pagb_lock);
 	xfs_perag_put(pag);
 	return NULL;
 }
@@ -200,13 +201,12 @@ xfs_alloc_clear_busy(
 {
 	struct xfs_perag	*pag;
 
-	pag = xfs_perag_get(tp->t_mountp, busyp->agno);
-	spin_lock(&pag->pagb_lock);
-
 	/* check that the busyp is still in the rbtree */
-	ASSERT(__xfs_alloc_busy_search(tp, busyp->agno, busyp->bno,
+	ASSERT(xfs_alloc_busy_search(tp, busyp->agno, busyp->bno,
 						busyp->length) == busyp);
 
+	pag = xfs_perag_get(tp->t_mountp, busyp->agno);
+	spin_lock(&pag->pagb_lock);
 	TRACE_UNBUSY("xfs_alloc_clear_busy", "found", busyp->agno, busyp->bno,
 							busyp->length, tp);
 	rb_erase(&busyp->rb_node, &pag->pagb_tree);
@@ -217,41 +217,6 @@ xfs_alloc_clear_busy(
 }
 
 /*
- * If we find the extent in the busy list, force the log out to get the
- * extent out of the busy list so the caller can use it straight away.
- */
-STATIC void
-xfs_alloc_search_busy(
-	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
-	xfs_extlen_t		len)
-{
-	struct xfs_perag	*pag;
-	struct xfs_busy_extent	*busyp;
-	xfs_lsn_t		lsn;
-
-	pag = xfs_perag_get(tp->t_mountp, agno);
-	spin_lock(&pag->pagb_lock);
-	busyp = __xfs_alloc_busy_search(tp, agno, bno, len);
-	if (!busyp) {
-		TRACE_BUSYSEARCH("xfs_alloc_search_busy", "not-found", agno, bno, len, tp);
-		spin_unlock(&pag->pagb_lock);
-		xfs_perag_put(pag);
-		return;
-	}
-	/*
-	 * A block was found, force the log through the LSN of the
-	 * transaction that freed the block
-	 */
-	TRACE_BUSYSEARCH("xfs_alloc_search_busy", "found", agno, bno, len, tp);
-	lsn = busyp->tp->t_commit_lsn;
-	spin_unlock(&pag->pagb_lock);
-	xfs_log_force(tp->t_mountp, lsn, XFS_LOG_FORCE|XFS_LOG_SYNC);
-	xfs_perag_put(pag);
-}
-
-/*
  * Internal functions.
  */
 
@@ -852,9 +817,16 @@ xfs_alloc_ag_vextent(
 			TRACE_MODAGF(NULL, agf, XFS_AGF_FREEBLKS);
 			xfs_alloc_log_agf(args->tp, args->agbp,
 						XFS_AGF_FREEBLKS);
-			/* search the busylist for these blocks */
-			xfs_alloc_search_busy(args->tp, args->agno,
-					args->agbno, args->len);
+			/*
+			 * Search the busylist for these blocks and mark the
+			 * transaction as synchronous if blocks are found. This
+			 * avoids the need to block due to a synchronous log
+			 * force to ensure correct ordering as the synchronous
+			 * transaction will guarantee that for us.
+			 */
+			if (xfs_alloc_busy_search(args->tp, args->agno,
+						args->agbno, args->len))
+				xfs_trans_set_sync(args->tp);
 		}
 		if (!args->isfl)
 			xfs_trans_mod_sb(args->tp,
@@ -914,7 +886,7 @@ xfs_alloc_ag_vextent_exact(
 	error = xfs_alloc_get_rec(bno_cur, &fbno, &flen, &i);
 	if (error)
 		goto error0;
 	XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-	if (__xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
+	if (xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
 		goto not_found;
 	ASSERT(fbno <= args->agbno);
@@ -1009,7 +981,7 @@ xfs_alloc_find_best_extent(
 		XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 		xfs_alloc_compute_aligned(*sbno, *slen, args->alignment,
 					args->minlen, &bno, slena);
-		if (__xfs_alloc_busy_search(args->tp, args->agno, bno, *slena)) {
+		if (xfs_alloc_busy_search(args->tp, args->agno, bno, *slena)) {
 			/* just freed - skip this one */
 			goto next_record;
 		}
@@ -1189,7 +1161,7 @@ xfs_alloc_ag_vextent_near(
 					args->minlen, &ltbnoa, &ltlena);
 			if (ltlena < args->minlen)
 				continue;
-			if (__xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
+			if (xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
 				continue;
 			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 			xfs_alloc_fix_len(args);
@@ -1311,7 +1283,7 @@ xfs_alloc_ag_vextent_near(
 			xfs_alloc_compute_aligned(ltbno, ltlen, args->alignment,
 					args->minlen, &ltbnoa, &ltlena);
 			if (ltlena >= args->minlen &&
-			    !__xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
+			    !xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
 				break;
 			/*
 			 * clear the length so we don't accidentally use this
@@ -1334,7 +1306,7 @@ xfs_alloc_ag_vextent_near(
 			xfs_alloc_compute_aligned(gtbno, gtlen, args->alignment,
 					args->minlen, &gtbnoa, &gtlena);
 			if (gtlena >= args->minlen &&
-			    !__xfs_alloc_busy_search(args->tp, args->agno, gtbnoa, gtlena))
+			    !xfs_alloc_busy_search(args->tp, args->agno, gtbnoa, gtlena))
 				break;
 			gtlena = 0;
 			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
@@ -1513,7 +1485,7 @@ restart:
 				goto error0;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 			if (!check_busy ||
-			    !__xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
+			    !xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
 				break;
 			error = xfs_btree_increment(cnt_cur, 0, &i);
 			if (error)
@@ -1566,7 +1538,7 @@ restart:
 			 * a better choice than other extents due to the log
 			 * force penalty of using them.
 			 */
-			if (__xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
+			if (xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
 				continue;
 			xfs_alloc_compute_aligned(fbno, flen, args->alignment,
 				args->minlen, &rbno, &rlen);
@@ -2267,10 +2239,17 @@ xfs_alloc_get_freelist(
 	 * and remain there until the freeing transaction is committed to
 	 * disk.  Now that we have allocated blocks, this list must be
 	 * searched to see if a block is being reused.  If one is, then
-	 * the freeing transaction must be pushed to disk NOW by forcing
-	 * to disk all iclogs up that transaction's LSN.
+	 * the freeing transaction must be pushed to disk before this
+	 * transaction.
+	 *
+	 * We do this by making the current transaction synchronous, which
+	 * guarantees that the freeing transaction is on disk before this
+	 * transaction. This is done instead of a synchronous log force
+	 * here so that we don't sit and wait with the AGF locked in the
+	 * transaction during the log force.
 	 */
-	xfs_alloc_search_busy(tp, be32_to_cpu(agf->agf_seqno), bno, 1);
+	if (xfs_alloc_busy_search(tp, be32_to_cpu(agf->agf_seqno), bno, 1))
+		xfs_trans_set_sync(tp);
 	return 0;
 }
 
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/7] XFS: Do not classify freed allocation btree blocks as busy
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
                   ` (3 preceding siblings ...)
  2008-10-07 22:09 ` [PATCH 4/7] XFS: Don't use log forces when busy extents are allocated Dave Chinner
@ 2008-10-07 22:09 ` Dave Chinner
  2008-10-07 22:09 ` [PATCH 6/7] XFS: Avoid busy extent ranges rather than the entire extent Dave Chinner
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

During an allocation or freeing of an extent, we occasionally have
an interesting thing happen during a btree update. We may merge a
block as part of a record delete, then allocate it again immediately
afterwards when inserting a modified extent.
xfs_alloc_fixup_trees() does this sort of manipulation to the
btrees, and so can merge then immediately split the tree, resulting
in the same busy block being used from the free list.

Previously, this would trigger a synchronous log force of the entire
log (as the current transaction had an LSN of 0 until committed), but
continue to allow the extent to be used because if it is not on disk
now then it must have been freed and marked busy in this
transaction.

In the case of allocbt blocks, we log the fact that they get moved
to the freelist and we also log them when they move off the free list
back into the free space trees. When we move the blocks off the
freelist back to the trees, we mark the extent busy at that point so
that it doesn't get reused until the transaction is committed.

This means that as long as the block is on the AGFL and is only used
for allocbt blocks, we don't have to force the log to ensure prior
manipulating transactions are on disk as the process of log recovery
will replay the transactions in the correct order.

However, we still need to protect against the fact that as we
approach ENOSPC in an AG we can allocate data and metadata blocks
direct from the AGFL. In this case, we need to treat btree blocks
just like any other busy extent. Hence we still need to track btree
blocks in the busy extent tree, but we need to distinguish them from
normal extents so we can apply the necessary exceptions for btree
block allocation.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_ag.h          |    5 ++
 fs/xfs/xfs_alloc.c       |  181 +++++++++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_alloc.h       |    5 --
 fs/xfs/xfs_alloc_btree.c |   13 ----
 4 files changed, 145 insertions(+), 59 deletions(-)

diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 7048d3d..ec87185 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -164,6 +164,11 @@ struct xfs_busy_extent {
 	xfs_agnumber_t	agno;
 	xfs_agblock_t	bno;
 	xfs_extlen_t	length;
+	int		flags;
+};
+
+enum {
+	XFS_BUSY_EXT_FREELIST = 0x0001,	/* busy extent on freelist from abt */
 };
 
 /*
diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index f9d092e..e5fb397 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -90,8 +90,22 @@ STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
  *
  * xfs_alloc_mark_busy - add to the per-ag busy list
  * xfs_alloc_clear_busy - remove an item from the per-ag busy list
+ *
+ * A single transaction can move an allocbt block to the AGFL and
+ * then reallocate it immediately within the same transaction. It
+ * may even get reused in a different transaction. In both these cases,
+ * we don't need to do anything special; all the changes to the buffer
+ * including the fact it has been freed and reallocated will be logged.
+ *
+ * Hence we can provide a special exception for blocks that become
+ * busy via freelist manipulations - if they are going to be re-used
+ * as allocbt blocks then we don't need to wait for transaction
+ * completion to remove the extent from the busy list - we can
+ * do it the moment we pull the extent off the free list. To do this,
+ * though, we need to know that the busy extent was an allocbt block
+ * and record that in the busy extent structure.
  */
-static void
+STATIC void
 xfs_alloc_busy_insert(
 	struct xfs_perag	*pag,
 	xfs_agblock_t		bno,
@@ -105,24 +119,29 @@ xfs_alloc_busy_insert(
 		parent = *rbp;
 		busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
 
+		ASSERT(bno != busyp->bno);
 		if (bno < busyp->bno)
 			rbp = &(*rbp)->rb_left;
 		else if (bno > busyp->bno)
 			rbp = &(*rbp)->rb_right;
-		else
-			BUG();
+		else {
+			cmn_err(CE_WARN,
+				"XFS: ignoring duplicate busy extent.\n");
+			ASSERT(0);
+		}
 	}
 
 	rb_link_node(node, parent, rbp);
 	rb_insert_color(node, &pag->pagb_tree);
 }
 
-void
+STATIC void
 xfs_alloc_mark_busy(
 	struct xfs_trans	*tp,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		bno,
-	xfs_extlen_t		len)
+	xfs_extlen_t		len,
+	int			freelist)	/* metadata block to freelist */
 {
 	struct xfs_busy_extent	*busyp;
 	struct xfs_perag	*pag;
@@ -142,6 +161,8 @@ xfs_alloc_mark_busy(
 	busyp->agno = agno;
 	busyp->bno = bno;
 	busyp->length = len;
+	if (freelist)
+		busyp->flags |= XFS_BUSY_EXT_FREELIST; 
 
 	pag = xfs_perag_get(tp->t_mountp, agno);
 	spin_lock(&pag->pagb_lock);
@@ -153,23 +174,21 @@ xfs_alloc_mark_busy(
 }
 
 /*
- * Search for a busy extent within the range of the extent we are about to
- * allocate.
+ * Search for a busy extent that overlaps the range of the extent we
+ * are about to allocate.
  */
 static struct xfs_busy_extent *
-xfs_alloc_busy_search(
+__xfs_alloc_busy_search(
 	struct xfs_trans	*tp,
+	struct xfs_perag	*pag,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len)
 {
-	struct xfs_perag	*pag;
 	struct rb_node		*rbp;
 	xfs_agblock_t		uend, bend;
 
 	uend = bno + len - 1;
-	pag = xfs_perag_get(tp->t_mountp, agno);
-	spin_lock(&pag->pagb_lock);
 	rbp = pag->pagb_tree.rb_node;
 	while (rbp) {
 		struct xfs_busy_extent	*busyp;
@@ -181,17 +200,31 @@ xfs_alloc_busy_search(
 		else if (bno > bend)
 			rbp = rbp->rb_right;
 		else {
-			/* (start1,length1) within (start2, length2) */
+			/* (start1,length1) overlaps (start2, length2) */
 			TRACE_BUSYSEARCH("xfs_alloc_busy_search",
 					 "found1", agno, bno, len, tp);
-			spin_unlock(&pag->pagb_lock);
-			xfs_perag_put(pag);
 			return busyp;
 		}
 	}
+	return NULL;
+}
+
+STATIC struct xfs_busy_extent *
+xfs_alloc_busy_search(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len)
+{
+	struct xfs_busy_extent	*busyp;
+	struct xfs_perag	*pag = xfs_perag_get(tp->t_mountp, agno);
+
+	spin_lock(&pag->pagb_lock);
+	busyp = __xfs_alloc_busy_search(tp, pag, agno, bno, len);
 	spin_unlock(&pag->pagb_lock);
+
 	xfs_perag_put(pag);
-	return NULL;
+	return busyp;
 }
 
 void
@@ -307,6 +340,44 @@ xfs_alloc_get_rec(
 }
 
 /*
+ * If the extent is busy via xfs_alloc_put_freelist(), and we are
+ * currently reusing it via xfs_alloc_get_freelist(), remove the
+ * extent from the busy list. Do this search and remove atomically
+ * so we don't race with transaction completion calling
+ * xfs_alloc_clear_busy().
+ *
+ * We should never get the situation where an extent on the freelist
+ * is busy due to other methods of freeing extents. If it happens,
+ * let the caller handle it (e.g. by using a synchronous log
+ * force to wait for the extent to become unbusy).
+ */
+static struct xfs_busy_extent *
+xfs_alloc_search_clear_busy_freelist(
+	struct xfs_trans	*tp,
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		bno,
+	xfs_extlen_t		len,
+	int			btreeblk)
+{
+	struct xfs_busy_extent	*busyp;
+	struct xfs_perag	*pag = xfs_perag_get(tp->t_mountp, agno);
+
+	spin_lock(&pag->pagb_lock);
+	busyp = __xfs_alloc_busy_search(tp, pag, agno, bno, len);
+	if (busyp && btreeblk && (busyp->flags & XFS_BUSY_EXT_FREELIST)) {
+		TRACE_UNBUSY("xfs_alloc_search_clear_busy_freelist",
+			"found", busyp->agno, busyp->bno, busyp->length, tp);
+		rb_erase(&busyp->rb_node, &pag->pagb_tree);
+		kmem_free(busyp);
+		busyp = NULL;
+	}
+	spin_unlock(&pag->pagb_lock);
+
+	xfs_perag_put(pag);
+	return busyp;
+}
+
+/*
  * Compute aligned version of the found extent.
  * Takes alignment and min length into account.
  */
@@ -1946,17 +2017,16 @@ xfs_free_ag_extent(
 		agno, bno, len, isfl);
 
 	/*
-	 * Since blocks move to the free list without the coordination
-	 * used in xfs_bmap_finish, we can't allow block to be available
-	 * for reallocation and non-transaction writing (user data)
-	 * until we know that the transaction that moved it to the free
-	 * list is permanently on disk.  We track the blocks by declaring
-	 * these blocks as "busy"; the busy list is maintained on a per-ag
-	 * basis and each transaction records which entries should be removed
-	 * when the iclog commits to disk.  If a busy block is allocated,
-	 * the iclog is pushed up to the LSN that freed the block.
+	 * Since blocks move to the free list without the coordination used in
+	 * xfs_bmap_finish, we can't allow blocks to be available for
+	 * reallocation and non-transaction writing (user data) until we know
+	 * that the transaction that moved it to the free list is permanently
+	 * on disk.  We track the blocks by declaring these blocks as "busy";
+	 * the busy list is maintained on a per-ag basis and the extent cannot
+	 * be reallocated until the transaction that frees it is safely on
+	 * disk.
 	 */
-	xfs_alloc_mark_busy(tp, agno, bno, len);
+	xfs_alloc_mark_busy(tp, agno, bno, len, 0);
 	return 0;
 
  error0:
@@ -2177,6 +2247,29 @@ xfs_alloc_fix_freelist(
 /*
  * Get a block from the freelist.
  * Returns with the buffer for the block gotten.
+ *
+ * Notes on busy extent list manipulations:
+ *
+ * As blocks are freed, they are added to the per-ag busy list and
+ * remain there until the freeing transaction is committed to disk.
+ * Now that we have allocated blocks, this list must be searched to see
+ * if a block is being reused.
+ *
+ * The only way we can get a busy extent on the freelist is when an
+ * alloc btree block is freed. Because this is a metadata block and all
+ * modifications are logged, we don't need to push the freeing
+ * transaction to disk if this allocation is for another btree block -
+ * the natural ordering of the log transactions will ensure that
+ * recovery will do the right thing. However, we need to remove it from
+ * the busy extent list as we have reused it.
+ *
+ * If this is to be used as any other type of block, we really
+ * need to ensure that the freeing transaction is on disk before we
+ * re-use it. The above case is the common case, so only when we are
+ * near ENOSPC will we be allocating other metadata or data blocks
+ * from the free list. In that case, waiting here for a log force
+ * to push the transaction to disk is not a big deal because it should
+ * be very infrequent.
  */
 int				/* error */
 xfs_alloc_get_freelist(
@@ -2234,22 +2327,11 @@ xfs_alloc_get_freelist(
 	xfs_alloc_log_agf(tp, agbp, logflags);
 	*bnop = bno;
 
-	/*
-	 * As blocks are freed, they are added to the per-ag busy list
-	 * and remain there until the freeing transaction is committed to
-	 * disk.  Now that we have allocated blocks, this list must be
-	 * searched to see if a block is being reused.  If one is, then
-	 * the freeing transaction must be pushed to disk before this
-	 * transaction.
-	 *
-	 * We do this by setting the current transaction
-	 * to a sync transaction which guarantees that the freeing transaction
-	 * is on disk before this transaction. This is done instead of a
-	 * synchronous log force here so that we don't sit and wait with
-	 * the AGF locked in the transaction during the log force.
-	 */
-	if (xfs_alloc_busy_search(tp, be32_to_cpu(agf->agf_seqno), bno, 1))
-		xfs_trans_set_sync(tp);
+	/* handle busy extents */
+	if (xfs_alloc_search_clear_busy_freelist(tp,
+			be32_to_cpu(agf->agf_seqno), bno, 1, btreeblk))
+		xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+
 	return 0;
 }
 
@@ -2306,6 +2388,15 @@ xfs_alloc_pagf_init(
 
 /*
  * Put the block on the freelist for the allocation group.
+ *
+ * Notes on handling busy extents:
+ *
+ * If this is a btree block, we need to let the busy extent list handling
+ * know about it so that we can optimise xfs_alloc_get_freelist() to allow
+ * us to re-use extents that were btree blocks without problems. We do this
+ * so that the entire freelist is available for btree blocks and we can do
+ * this because all modifications, allocations and freeing of allocbt blocks
+ * are logged and hence strongly ordered.
  */
 int					/* error */
 xfs_alloc_put_freelist(
@@ -2357,6 +2448,14 @@ xfs_alloc_put_freelist(
 		(int)((xfs_caddr_t)blockp - (xfs_caddr_t)agfl),
 		(int)((xfs_caddr_t)blockp - (xfs_caddr_t)agfl +
 			sizeof(xfs_agblock_t) - 1));
+
+	/*
+	 * If it's a btree block, mark it busy as it has just been freed.
+	 * Otherwise we are just replenishing the free list with extents that
+	 * are already free so we don't need to track them.
+	 */
+	if (btreeblk)
+		xfs_alloc_mark_busy(tp, be32_to_cpu(agf->agf_seqno), bno, 1, 1);
 	return 0;
 }
 
diff --git a/fs/xfs/xfs_alloc.h b/fs/xfs/xfs_alloc.h
index 21040af..69111dd 100644
--- a/fs/xfs/xfs_alloc.h
+++ b/fs/xfs/xfs_alloc.h
@@ -122,11 +122,6 @@ extern ktrace_t *xfs_alloc_trace_buf;
 #define	XFS_ALLOC_KTRACE_BUSYSEARCH	6
 #endif
 
-void
-xfs_alloc_mark_busy(xfs_trans_t *tp,
-		xfs_agnumber_t agno,
-		xfs_agblock_t bno,
-		xfs_extlen_t len);
 
 void
 xfs_alloc_clear_busy(xfs_trans_t *tp, struct xfs_busy_extent *busyp);
diff --git a/fs/xfs/xfs_alloc_btree.c b/fs/xfs/xfs_alloc_btree.c
index 9e63f8c..f41bfae 100644
--- a/fs/xfs/xfs_alloc_btree.c
+++ b/fs/xfs/xfs_alloc_btree.c
@@ -119,19 +119,6 @@ xfs_allocbt_free_block(
 	error = xfs_alloc_put_freelist(cur->bc_tp, agbp, NULL, bno, 1);
 	if (error)
 		return error;
-
-	/*
-	 * Since blocks move to the free list without the coordination used in
-	 * xfs_bmap_finish, we can't allow block to be available for
-	 * reallocation and non-transaction writing (user data) until we know
-	 * that the transaction that moved it to the free list is permanently
-	 * on disk. We track the blocks by declaring these blocks as "busy";
-	 * the busy list is maintained on a per-ag basis and each transaction
-	 * records which entries should be removed when the iclog commits to
-	 * disk. If a busy block is allocated, the iclog is pushed up to the
-	 * LSN that freed the block.
-	 */
-	xfs_alloc_mark_busy(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1);
 	xfs_trans_agbtree_delta(cur->bc_tp, -1);
 	return 0;
 }
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/7] XFS: Avoid busy extent ranges rather than the entire extent
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
                   ` (4 preceding siblings ...)
  2008-10-07 22:09 ` [PATCH 5/7] XFS: Do not classify freed allocation btree blocks as busy Dave Chinner
@ 2008-10-07 22:09 ` Dave Chinner
  2008-10-07 22:09 ` [PATCH 7/7] XFS: Simplify transaction busy extent tracking Dave Chinner
  2008-10-09 18:17 ` [RFC, PATCH 0/7] XFS: dynamic " Martin Steigerwald
  7 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

When we check an extent for being busy, we check for an overlap
and if we find one we declare that extent off-limits and move
on to the next free extent. This means that we can skip entire
AGs that have a single large free extent with a very small
overlapping busy extent.

This can actually cause significant problems - often we have to
allocate within a given AG, and we have a reservation that guarantees
us that the free space is there. If we only have a single extent,
and part of it is busy, we will fail the allocation despite the
reservation.

To avoid this problem, we should trim the busy range out of the
extent we have found and determine if that trimmed range is still OK
for allocation. In many cases, this check can be incorporated into
the allocation extent alignment code which already does trimming of
the found extent before determining if it is a valid candidate for
allocation.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_ag.h    |    1 +
 fs/xfs/xfs_alloc.c |  439 +++++++++++++++++++++++++++++-----------------------
 2 files changed, 248 insertions(+), 192 deletions(-)

diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index ec87185..62bb280 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -161,6 +161,7 @@ typedef struct xfs_agfl {
  */
 struct xfs_busy_extent {
 	struct rb_node	rb_node;
+	atomic_t	ref;
 	xfs_agnumber_t	agno;
 	xfs_agblock_t	bno;
 	xfs_extlen_t	length;
diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index e5fb397..73536ef 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -105,34 +105,75 @@ STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
  * though, we need to know that the busy extent was an allocbt block
  * and record that in the busy extent structure.
  */
-STATIC void
+STATIC struct xfs_busy_extent *
+xfs_alloc_busy_alloc(void)
+{
+	struct xfs_busy_extent	*busyp;
+
+	/* XXX: probably should use a zone */
+	busyp = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
+	if (!busyp)
+		return NULL;
+	atomic_set(&busyp->ref, 1);
+	return busyp;
+}
+
+static inline void
+xfs_alloc_busy_get(
+	struct xfs_busy_extent	*busyp)
+{
+	ASSERT(atomic_read(&busyp->ref) > 0);
+	atomic_inc(&busyp->ref);
+}
+
+static inline void
+xfs_alloc_busy_put(
+	struct xfs_busy_extent	*busyp)
+{
+	ASSERT(atomic_read(&busyp->ref) > 0);
+	if (atomic_dec_and_test(&busyp->ref))
+		kmem_free(busyp);
+}
+
+/*
+ * Insert a new busy extent into the tree.
+ *
+ * We can get duplicates occurring here in the case where an extent
+ * goes from an allocbt block to the freelist (i.e. btree merge occurs)
+ * and then the free list can be shortened which moves the extent
+ * from the free list to a freespace record. In this case, we simply
+ * clear the freelist flag from the busy extent.
+ */
+STATIC int
 xfs_alloc_busy_insert(
 	struct xfs_perag	*pag,
 	xfs_agblock_t		bno,
-	struct rb_node		*node)
+	struct xfs_busy_extent	*busyp)
 {
 	struct rb_node		**rbp = &pag->pagb_tree.rb_node;
 	struct rb_node		*parent = NULL;
-	struct xfs_busy_extent	*busyp;
 
 	while (*rbp) {
-		parent = *rbp;
-		busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
+		struct xfs_busy_extent	*bp;
 
-		ASSERT(bno != busyp->bno);
-		if (bno < busyp->bno)
+		parent = *rbp;
+		bp = rb_entry(parent, struct xfs_busy_extent, rb_node);
+		if (bno < bp->bno)
 			rbp = &(*rbp)->rb_left;
-		else if (bno > busyp->bno)
+		else if (bno > bp->bno)
 			rbp = &(*rbp)->rb_right;
 		else {
-			cmn_err(CE_WARN,
-				"XFS: ignoring duplicate busy extent.\n");
-			ASSERT(0);
+			/* clear the freelist flag and free the new node */
+			ASSERT(bp->flags & XFS_BUSY_EXT_FREELIST);
+			bp->flags &= ~XFS_BUSY_EXT_FREELIST;
+			xfs_alloc_busy_put(busyp);
+			return 0;
 		}
 	}
 
-	rb_link_node(node, parent, rbp);
-	rb_insert_color(node, &pag->pagb_tree);
+	rb_link_node(&busyp->rb_node, parent, rbp);
+	rb_insert_color(&busyp->rb_node, &pag->pagb_tree);
+	return 1;
 }
 
 STATIC void
@@ -146,7 +187,7 @@ xfs_alloc_mark_busy(
 	struct xfs_busy_extent	*busyp;
 	struct xfs_perag	*pag;
 
-	busyp = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
+	busyp = xfs_alloc_busy_alloc();
 	if (!busyp) {
 		/*
 		 * No Memory!  Since it is now not possible to track the free
@@ -166,33 +207,53 @@ xfs_alloc_mark_busy(
 
 	pag = xfs_perag_get(tp->t_mountp, agno);
 	spin_lock(&pag->pagb_lock);
-	xfs_alloc_busy_insert(pag, bno, &busyp->rb_node);
-	xfs_trans_add_busy(tp, busyp);
+	if (xfs_alloc_busy_insert(pag, bno, busyp))
+		xfs_trans_add_busy(tp, busyp);
 	TRACE_BUSY("xfs_alloc_mark_busy", "got", agno, bno, len, tp);
 	spin_unlock(&pag->pagb_lock);
 	xfs_perag_put(pag);
 }
 
+void
+xfs_alloc_clear_busy(
+	struct xfs_trans	*tp,
+	struct xfs_busy_extent	*busyp)
+{
+	struct xfs_perag	*pag = xfs_perag_get(tp->t_mountp, busyp->agno);
+
+	TRACE_UNBUSY("xfs_alloc_clear_busy", "found", busyp->agno, busyp->bno,
+							busyp->length, tp);
+	spin_lock(&pag->pagb_lock);
+	ASSERT(!RB_EMPTY_NODE(&busyp->rb_node));
+	rb_erase(&busyp->rb_node, &pag->pagb_tree);
+	spin_unlock(&pag->pagb_lock);
+
+	xfs_perag_put(pag);
+	xfs_alloc_busy_put(busyp);
+}
+
 /*
  * Search for a busy extent that overlaps the range of the extent we
- * are about to allocate.
+ * have been given. If we find a match, take a reference to the busyp
+ * and return it to the caller so they can decide what action to take.
+ * Caller must drop the reference.
  */
-static struct xfs_busy_extent *
-__xfs_alloc_busy_search(
+STATIC struct xfs_busy_extent *
+xfs_alloc_busy_search(
 	struct xfs_trans	*tp,
-	struct xfs_perag	*pag,
 	xfs_agnumber_t		agno,
 	xfs_agblock_t		bno,
 	xfs_extlen_t		len)
 {
+	struct xfs_perag	*pag = xfs_perag_get(tp->t_mountp, agno);
+	struct xfs_busy_extent	*busyp = NULL;
 	struct rb_node		*rbp;
 	xfs_agblock_t		uend, bend;
 
 	uend = bno + len - 1;
+	spin_lock(&pag->pagb_lock);
 	rbp = pag->pagb_tree.rb_node;
 	while (rbp) {
-		struct xfs_busy_extent	*busyp;
-
 		busyp = rb_entry(rbp, struct xfs_busy_extent, rb_node);
 		bend = busyp->bno + busyp->length - 1;
 		if (uend < busyp->bno)
@@ -203,50 +264,69 @@ __xfs_alloc_busy_search(
 			/* (start1,length1) overlaps (start2, length2) */
 			TRACE_BUSYSEARCH("xfs_alloc_busy_search",
 					 "found1", agno, bno, len, tp);
-			return busyp;
+			xfs_alloc_busy_get(busyp);
+			break;
 		}
+		busyp = NULL;
 	}
-	return NULL;
-}
-
-STATIC struct xfs_busy_extent *
-xfs_alloc_busy_search(
-	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
-	xfs_extlen_t		len)
-{
-	struct xfs_busy_extent	*busyp;
-	struct xfs_perag	*pag = xfs_perag_get(tp->t_mountp, agno);
-
-	spin_lock(&pag->pagb_lock);
-	busyp = __xfs_alloc_busy_search(tp, pag, agno, bno, len);
 	spin_unlock(&pag->pagb_lock);
 
 	xfs_perag_put(pag);
 	return busyp;
 }
 
-void
-xfs_alloc_clear_busy(
+/*
+ * For a given extent [fbno, flen], search the busy extent list
+ * to find a subset of the extent that is not busy.
+ */
+STATIC void
+xfs_alloc_busy_search_trim(
 	struct xfs_trans	*tp,
-	struct xfs_busy_extent	*busyp)
-{
-	struct xfs_perag	*pag;
-
-	/* check that the busyp is still in the rbtree */
-	ASSERT(xfs_alloc_busy_search(tp, busyp->agno, busyp->bno,
-						busyp->length) == busyp);
+	xfs_agnumber_t		agno,
+	xfs_agblock_t		fbno,
+	xfs_extlen_t		flen,
+	xfs_agblock_t		*rbno,
+	xfs_extlen_t		*rlen)
 
-	pag = xfs_perag_get(tp->t_mountp, busyp->agno);
-	spin_lock(&pag->pagb_lock);
-	TRACE_UNBUSY("xfs_alloc_clear_busy", "found", busyp->agno, busyp->bno,
-							busyp->length, tp);
-	rb_erase(&busyp->rb_node, &pag->pagb_tree);
-	spin_unlock(&pag->pagb_lock);
-	xfs_perag_put(pag);
+{
+	xfs_agblock_t		bno = fbno;
+	xfs_extlen_t		len = flen;
+	struct xfs_busy_extent	*busyp;
 
-	kmem_free(busyp);
+	busyp = xfs_alloc_busy_search(tp, agno, bno, len);
+	while (busyp) {
+		xfs_agblock_t	end = bno + len;
+		xfs_agblock_t	bend = busyp->bno + busyp->length;
+		if (busyp->bno < bno) {
+			/* start overlap */
+			ASSERT(bend >= bno);
+			ASSERT(bend <= end);
+			len -= bend - bno;
+			bno = bend;
+		} else if (bend > end) {
+			/* end overlap */
+			ASSERT(busyp->bno >= bno);
+			ASSERT(busyp->bno < end);
+			len -= bend - end;
+		} else {
+			/* middle overlap - return larger segment */
+			ASSERT(busyp->bno >= bno);
+			ASSERT(bend <= end);
+			len = busyp->bno - bno;
+			if (len < end - bend) {
+				/*
+				 * the last segment is larger,
+				 * so use it instead
+				 */
+				bno = bend;
+				len = end - bend;
+			}
+		}
+		xfs_alloc_busy_put(busyp);
+		busyp = xfs_alloc_busy_search(tp, agno, bno, len);
+	}
+	*rbno = bno;
+	*rlen = len;
 }
 
 /*
@@ -340,53 +420,14 @@ xfs_alloc_get_rec(
 }
 
 /*
- * If the extent is busy via xfs_alloc_put_freelist(), and we are
- * currently reusing it via xfs_alloc_get_freelist(), remove the
- * extent from the busy list. Do this search and remove atomically
- * so we don't race with transaction completion calling
- * xfs_alloc_clear_busy().
- *
- * We should never get the situation where an extent on the freelist
- * is busy due to other methods of freeing extents. If it happens,
- * let the caller handle it (e.g. by using a synchronous log
- * force to wait for the extent to become unbusy).
- */
-static struct xfs_busy_extent *
-xfs_alloc_search_clear_busy_freelist(
-	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
-	xfs_extlen_t		len,
-	int			btreeblk)
-{
-	struct xfs_busy_extent	*busyp;
-	struct xfs_perag	*pag = xfs_perag_get(tp->t_mountp, agno);
-
-	spin_lock(&pag->pagb_lock);
-	busyp = __xfs_alloc_busy_search(tp, pag, agno, bno, len);
-	if (busyp && btreeblk && (busyp->flags & XFS_BUSY_EXT_FREELIST)) {
-		TRACE_UNBUSY("xfs_alloc_search_clear_busy_freelist",
-			"found", busyp->agno, busyp->bno, busyp->length, tp);
-		rb_erase(&busyp->rb_node, &pag->pagb_tree);
-		kmem_free(busyp);
-		busyp = NULL;
-	}
-	spin_unlock(&pag->pagb_lock);
-
-	xfs_perag_put(pag);
-	return busyp;
-}
-
-/*
  * Compute aligned version of the found extent.
- * Takes alignment and min length into account.
+ * Takes alignment, min length and busy extent ranges into account.
  */
 STATIC void
 xfs_alloc_compute_aligned(
+	xfs_alloc_arg_t	*args,
 	xfs_agblock_t	foundbno,	/* starting block in found extent */
 	xfs_extlen_t	foundlen,	/* length in found extent */
-	xfs_extlen_t	alignment,	/* alignment for allocation */
-	xfs_extlen_t	minlen,		/* minimum length for allocation */
 	xfs_agblock_t	*resbno,	/* result block number */
 	xfs_extlen_t	*reslen)	/* result length */
 {
@@ -394,13 +435,13 @@ xfs_alloc_compute_aligned(
 	xfs_extlen_t	diff;
 	xfs_extlen_t	len;
 
-	if (alignment > 1 && foundlen >= minlen) {
-		bno = roundup(foundbno, alignment);
+	/* Trim busy sections out of found extent */
+	xfs_alloc_busy_search_trim(args->tp, args->agno, foundbno, foundlen,
+							&bno, &len);
+	if (args->alignment > 1 && len >= args->minlen) {
+		bno = roundup(bno, args->alignment);
 		diff = bno - foundbno;
 		len = diff >= foundlen ? 0 : foundlen - diff;
-	} else {
-		bno = foundbno;
-		len = foundlen;
 	}
 	*resbno = bno;
 	*reslen = len;
@@ -876,6 +917,10 @@ xfs_alloc_ag_vextent(
 		ASSERT(args->len >= args->minlen && args->len <= args->maxlen);
 		ASSERT(!(args->wasfromfl) || !args->isfl);
 		ASSERT(args->agbno % args->alignment == 0);
+
+		/* we should never be handed a busy extent. */
+		ASSERT(!xfs_alloc_busy_search(args->tp, args->agno,
+						args->agbno, args->len));
 		if (!(args->wasfromfl)) {
 
 			agf = XFS_BUF_TO_AGF(args->agbp);
@@ -888,16 +933,6 @@ xfs_alloc_ag_vextent(
 			TRACE_MODAGF(NULL, agf, XFS_AGF_FREEBLKS);
 			xfs_alloc_log_agf(args->tp, args->agbp,
 						XFS_AGF_FREEBLKS);
-			/*
-			 * Search the busylist for these blocks and mark the
-			 * transaction as synchronous if blocks are found. This
-			 * avoids the need to block in due to a synchronous log
-			 * force to ensure correct ordering as the synchronous
-			 * transaction will guarantee that for us.
-			 */
-			if (xfs_alloc_busy_search(args->tp, args->agno,
-						args->agbno, args->len))
-				xfs_trans_set_sync(args->tp);
 		}
 		if (!args->isfl)
 			xfs_trans_mod_sb(args->tp,
@@ -927,14 +962,14 @@ xfs_alloc_ag_vextent_exact(
 {
 	xfs_btree_cur_t	*bno_cur;/* by block-number btree cursor */
 	xfs_btree_cur_t	*cnt_cur;/* by count btree cursor */
-	xfs_agblock_t	end;	/* end of allocated extent */
 	int		error;
 	xfs_agblock_t	fbno;	/* start block of found extent */
-	xfs_agblock_t	fend;	/* end block of found extent */
 	xfs_extlen_t	flen;	/* length of found extent */
+	xfs_agblock_t	tbno;	/* start block of trimmed extent */
+	xfs_extlen_t	tlen;	/* length of trimmed extent */
+	xfs_agblock_t	tend;	/* end block of trimmed extent */
+	xfs_agblock_t	end;	/* end of allocated extent */
 	int		i;	/* success/failure of operation */
-	xfs_agblock_t	maxend;	/* end of maximal extent */
-	xfs_agblock_t	minend;	/* end of minimal extent */
 	xfs_extlen_t	rlen;	/* length of returned extent */
 
 	ASSERT(args->alignment == 1);
@@ -957,22 +992,22 @@ xfs_alloc_ag_vextent_exact(
 	error = xfs_alloc_get_rec(bno_cur, &fbno, &flen, &i);
 	if (error)
 		goto error0;
-	if (xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
-		goto not_found;
 	XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
 	ASSERT(fbno <= args->agbno);
 
+	/* Trim busy sections out of found extent */
+	xfs_alloc_busy_search_trim(args->tp, args->agno, fbno, flen,
+							&tbno, &tlen);
+
 	/* Give up if the freespace isn't long enough for the minimum request. */
-	minend = args->agbno + args->minlen;
-	maxend = args->agbno + args->maxlen;
-	fend = fbno + flen;
-	if (fend < minend)
+	tend = tbno + tlen;
+	if (tend < args->agbno + args->minlen)
 		goto not_found;
 	/*
 	 * End of extent will be smaller of the freespace end and the
 	 * maximal requested end.
 	 */
-	end = XFS_AGBLOCK_MIN(fend, maxend);
+	end = XFS_AGBLOCK_MIN(tend, (args->agbno + args->maxlen));
 
 	/* Fix the length according to mod and prod if given. */
 	args->len = end - args->agbno;
@@ -981,7 +1016,7 @@ xfs_alloc_ag_vextent_exact(
 		goto not_found;
 
 	rlen = args->len;
-	ASSERT(args->agbno + rlen <= fend);
+	ASSERT(args->agbno + rlen <= tend);
 	end = args->agbno + rlen;
 	/*
 	 * We are allocating agbno for rlen [agbno .. end]
@@ -1028,10 +1063,10 @@ xfs_alloc_find_best_extent(
 	xfs_agblock_t	gdiff,		/* difference for search comparison */
 	xfs_agblock_t	*sbno,		/* extent found by search */
 	xfs_extlen_t	*slen,
+	xfs_agblock_t	*sbnoa,		/* aligned extent found by search */
 	xfs_extlen_t	*slena,		/* aligned length */
 	int		dir)	/* 0 = search right, 1 = search left */
 {
-	xfs_agblock_t	bno;
 	xfs_agblock_t	new;
 	xfs_agblock_t	sdiff;
 	int		i;
@@ -1050,35 +1085,29 @@ xfs_alloc_find_best_extent(
 		if (error)
 			goto error0;
 		XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-		xfs_alloc_compute_aligned(*sbno, *slen, args->alignment,
-					args->minlen, &bno, slena);
-		if (xfs_alloc_busy_search(args->tp, args->agno, bno, *slena)) {
-			/* just freed - skip this one */
-			goto next_record;
-		}
+		xfs_alloc_compute_aligned(args, *sbno, *slen, sbnoa, slena);
 
 		/* The good extent is closer than this one */
 		if (!dir) {
-			if (bno >= args->agbno + gdiff)
+			if (*sbnoa >= args->agbno + gdiff)
 				goto out_use_good;
 		} else {
-			if (bno <= args->agbno - gdiff)
+			if (*sbnoa <= args->agbno - gdiff)
 				goto out_use_good;
 		}
 
-		/* Same distance, compare length and pick the best. */
+		/* Same region, compare length and pick the best. */
 		if (*slena >= args->minlen) {
 			args->len = XFS_EXTLEN_MIN(*slena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-					args->alignment, *sbno, *slen, &new);
+					args->alignment, *sbnoa, *slena, &new);
 
 			/* choose closer size and invalidate other cursor */
 			if (sdiff < gdiff)
 				goto out_use_search;
 			goto out_use_good;
 		}
-next_record:
 		if (!dir)
 			error = xfs_btree_increment(*scur, 0, &i);
 		else
@@ -1228,19 +1257,17 @@ xfs_alloc_ag_vextent_near(
 			if ((error = xfs_alloc_get_rec(cnt_cur, &ltbno, &ltlen, &i)))
 				goto error0;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-			xfs_alloc_compute_aligned(ltbno, ltlen, args->alignment,
-					args->minlen, &ltbnoa, &ltlena);
+			xfs_alloc_compute_aligned(args, ltbno, ltlen,
+							&ltbnoa, &ltlena);
 			if (ltlena < args->minlen)
 				continue;
-			if (xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
-				continue;
 			args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 			xfs_alloc_fix_len(args);
 			ASSERT(args->len >= args->minlen);
 			if (args->len < blen)
 				continue;
 			ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
-				args->alignment, ltbno, ltlen, &ltnew);
+				args->alignment, ltbnoa, ltlena, &ltnew);
 			if (ltnew != NULLAGBLOCK &&
 			    (args->len > blen || ltdiff < bdiff)) {
 				bdiff = ltdiff;
@@ -1351,17 +1378,10 @@ xfs_alloc_ag_vextent_near(
 			if ((error = xfs_alloc_get_rec(bno_cur_lt, &ltbno, &ltlen, &i)))
 				goto error0;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-			xfs_alloc_compute_aligned(ltbno, ltlen, args->alignment,
-					args->minlen, &ltbnoa, &ltlena);
-			if (ltlena >= args->minlen &&
-			    !xfs_alloc_busy_search(args->tp, args->agno, ltbnoa, ltlena))
+			xfs_alloc_compute_aligned(args, ltbno, ltlen,
+							&ltbnoa, &ltlena);
+			if (ltlena >= args->minlen)
 				break;
-			/*
-			 * clear the length so we don't accidentally use this
-			 * extent after we decrement the cursor then find a
-			 * match from the other cursor.
-			 */
-			ltlena = 0;
 			if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
 				goto error0;
 			if (!i) {
@@ -1374,12 +1394,10 @@ xfs_alloc_ag_vextent_near(
 			if ((error = xfs_alloc_get_rec(bno_cur_gt, &gtbno, &gtlen, &i)))
 				goto error0;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-			xfs_alloc_compute_aligned(gtbno, gtlen, args->alignment,
-					args->minlen, &gtbnoa, &gtlena);
-			if (gtlena >= args->minlen &&
-			    !xfs_alloc_busy_search(args->tp, args->agno, gtbnoa, gtlena))
+			xfs_alloc_compute_aligned(args, gtbno, gtlen,
+							&gtbnoa, &gtlena);
+			if (gtlena >= args->minlen)
 				break;
-			gtlena = 0;
 			if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
 				goto error0;
 			if (!i) {
@@ -1398,11 +1416,12 @@ xfs_alloc_ag_vextent_near(
 			xfs_alloc_fix_len(args);
 			rlen = args->len;
 			ltdiff = xfs_alloc_compute_diff(args->agbno, rlen,
-				args->alignment, ltbno, ltlen, &ltnew);
+				args->alignment, ltbnoa, ltlena, &ltnew);
 
 			error = xfs_alloc_find_best_extent(args,
 						&bno_cur_lt, &bno_cur_gt,
-						ltdiff, &gtbno, &gtlen, &gtlena,
+						ltdiff, &gtbno, &gtlen, 
+						&gtbnoa, &gtlena,
 						0 /* search right */);
 			if (error)
 				goto error0;
@@ -1414,11 +1433,12 @@ xfs_alloc_ag_vextent_near(
 			xfs_alloc_fix_len(args);
 			rlen = args->len;
 			gtdiff = xfs_alloc_compute_diff(args->agbno, rlen,
-				args->alignment, gtbno, gtlen, &gtnew);
+				args->alignment, gtbnoa, gtlena, &gtnew);
 
 			error = xfs_alloc_find_best_extent(args,
 						&bno_cur_gt, &bno_cur_lt,
-						gtdiff, &ltbno, &ltlen, &ltlena,
+						gtdiff, &ltbno, &ltlen, 
+						&ltbnoa, &ltlena,
 						1 /* search left */);
 			if (error)
 				goto error0;
@@ -1442,7 +1462,7 @@ xfs_alloc_ag_vextent_near(
 		bno_cur_lt = bno_cur_gt;
 		bno_cur_gt = NULL;
 		ltbno = gtbno;
-		//ltbnoa = gtbnoa;
+		ltbnoa = gtbnoa;
 		ltlen = gtlen;
 		ltlena = gtlena;
 		j = 1;
@@ -1451,7 +1471,7 @@ xfs_alloc_ag_vextent_near(
 	/*
 	 * Fix up the length and compute the useful address.
 	 */
-	ltend = ltbno + ltlen;
+	ltend = ltbnoa + ltlena;
 	args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
 	xfs_alloc_fix_len(args);
 	if (!xfs_alloc_fix_minleft(args)) {
@@ -1461,8 +1481,8 @@ xfs_alloc_ag_vextent_near(
 		return 0;
 	}
 	rlen = args->len;
-	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment, ltbno,
-		ltlen, &ltnew);
+	(void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment, ltbnoa,
+		ltlena, &ltnew);
 	ASSERT(ltnew >= ltbno);
 	ASSERT(ltnew + rlen <= ltend);
 	ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
@@ -1510,7 +1530,7 @@ xfs_alloc_ag_vextent_size(
 	int		i;		/* temp status variable */
 	xfs_agblock_t	rbno;		/* returned block number */
 	xfs_extlen_t	rlen;		/* length of returned extent */
-	int		check_busy = 1;
+	int		forced = 0;
 
 	/*
 	 * Allocate and initialize a cursor for the by-size btree.
@@ -1526,10 +1546,14 @@ restart:
 			args->maxlen + args->alignment - 1, &i)))
 		goto error0;
 	/*
-	 * If none, then pick up the last entry in the tree unless the
-	 * tree is empty.
+	 * If none or we have busy extents that we cannot allocate from, then
+	 * we have to settle for a smaller extent. In the case that there are
+	 * no large extents, this will return the last entry in the tree unless
+	 * the tree is empty. In the case that there are only busy large
+	 * extents, this will return the largest small extent unless there
+	 * are no smaller extents available.
 	 */
-	if (!i) {
+	if (!i || forced > 1) {
 		if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &fbno,
 				&flen, &i)))
 			goto error0;
@@ -1539,6 +1563,7 @@ restart:
 			return 0;
 		}
 		ASSERT(i == 1);
+		xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
 	}
 	/*
 	 * There's a freespace as big as maxlen+alignment-1, get it.
@@ -1555,17 +1580,30 @@ restart:
 			if (error)
 				goto error0;
 			XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
-			if (!check_busy ||
-			    !xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
+			xfs_alloc_compute_aligned(args, fbno, flen,
+							&rbno, &rlen);
+			if (rlen >= args->maxlen)
 				break;
 			error = xfs_btree_increment(cnt_cur, 0, &i);
 			if (error)
 				goto error0;
 			if (i == 0) {
-				ASSERT(check_busy);
-				check_busy = 0;
+				/*
+				 * Our only valid extents must have been busy.
+				 * Make it unbusy by forcing the log out and
+				 * retrying. If we've been here before, forcing
+				 * the log isn't making the extents available,
+				 * which means they have probably been freed in
+				 * this transaction. In that case, we have to
+				 * give up on them and we'll attempt a minlen
+				 * allocation the next time around.
+				 */
 				xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
 				TRACE_ALLOC("busy restart", args);
+				if (!forced)
+					xfs_log_force(args->mp, 0,
+						XFS_LOG_FORCE|XFS_LOG_SYNC);
+				forced++;
 				goto restart;
 			}
 		}
@@ -1576,8 +1614,6 @@ restart:
 	 * once aligned; if not, we search left for something better.
 	 * This can't happen in the second case above.
 	 */
-	xfs_alloc_compute_aligned(fbno, flen, args->alignment, args->minlen,
-		&rbno, &rlen);
 	rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
 	XFS_WANT_CORRUPTED_GOTO(rlen == 0 ||
 			(rlen <= flen && rbno + rlen <= fbno + flen), error0);
@@ -1604,15 +1640,8 @@ restart:
 			if (flen < bestrlen)
 				break;
 
-			/*
-			 * avoid busy candidate extents as they are clearly not
-			 * a better choice than other extents due to the log
-			 * force penalty of using them.
-			 */
-			if (xfs_alloc_busy_search(args->tp, args->agno, fbno, flen))
-				continue;
-			xfs_alloc_compute_aligned(fbno, flen, args->alignment,
-				args->minlen, &rbno, &rlen);
+			xfs_alloc_compute_aligned(args, fbno, flen,
+							&rbno, &rlen);
 			rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
 			XFS_WANT_CORRUPTED_GOTO(rlen == 0 ||
 				(rlen <= flen && rbno + rlen <= fbno + flen),
@@ -1640,13 +1669,19 @@ restart:
 	 * Fix up the length.
 	 */
 	args->len = rlen;
-	xfs_alloc_fix_len(args);
-	if (rlen < args->minlen || !xfs_alloc_fix_minleft(args)) {
-		xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
-		TRACE_ALLOC("nominleft", args);
-		args->agbno = NULLAGBLOCK;
-		return 0;
+	if (rlen < args->minlen) {
+		if (!forced) {
+			xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+			TRACE_ALLOC("busy restart", args);
+			xfs_log_force(args->mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+			forced++;
+			goto restart;
+		}
+		goto out_nominleft;
 	}
+	xfs_alloc_fix_len(args);
+	if (!xfs_alloc_fix_minleft(args))
+		goto out_nominleft;
 	rlen = args->len;
 	XFS_WANT_CORRUPTED_GOTO(rlen <= flen, error0);
 	/*
@@ -1676,6 +1711,12 @@ error0:
 	if (bno_cur)
 		xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
 	return error;
+
+out_nominleft:
+	xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+	TRACE_ALLOC("nominleft", args);
+	args->agbno = NULLAGBLOCK;
+	return 0;
 }
 
 /*
@@ -2327,10 +2368,24 @@ xfs_alloc_get_freelist(
 	xfs_alloc_log_agf(tp, agbp, logflags);
 	*bnop = bno;
 
-	/* handle busy extents */
-	if (xfs_alloc_search_clear_busy_freelist(tp,
-			be32_to_cpu(agf->agf_seqno), bno, 1, btreeblk))
-		xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+	/*
+	 * for busy btree blocks, we want to remove them from the
+	 * busy tree as we are now reusing it. Other busy extents
+	 * require a log force to be able to use safely.
+	 */
+	if (btreeblk) {
+		struct xfs_busy_extent	*busyp;
+
+		busyp = xfs_alloc_busy_search(tp, be32_to_cpu(agf->agf_seqno),
+							bno, 1);
+		if (busyp) {
+			if (busyp->flags & XFS_BUSY_EXT_FREELIST)
+				xfs_alloc_clear_busy(tp, busyp);
+			else
+				xfs_log_force(mp, 0, XFS_LOG_FORCE|XFS_LOG_SYNC);
+			xfs_alloc_busy_put(busyp);
+		}
+	}
 
 	return 0;
 }
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/7] XFS: Simplify transaction busy extent tracking
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
                   ` (5 preceding siblings ...)
  2008-10-07 22:09 ` [PATCH 6/7] XFS: Avoid busy extent ranges rather than the entire extent Dave Chinner
@ 2008-10-07 22:09 ` Dave Chinner
  2008-10-09 18:17 ` [RFC, PATCH 0/7] XFS: dynamic " Martin Steigerwald
  7 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-07 22:09 UTC (permalink / raw)
  To: xfs

Now that we have a dynamic busy extent structure, we no
longer need the complex log busy chunk infrastructure to
track busy extents in a transaction. We can simply use a linked
list hanging off the transaction structure instead.

Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 fs/xfs/xfs_ag.h         |    3 +-
 fs/xfs/xfs_alloc.c      |    4 +-
 fs/xfs/xfs_trans.c      |   30 +++++++-------
 fs/xfs/xfs_trans.h      |   33 +---------------
 fs/xfs/xfs_trans_item.c |  102 -----------------------------------------------
 fs/xfs/xfs_trans_priv.h |    1 -
 6 files changed, 21 insertions(+), 152 deletions(-)

diff --git a/fs/xfs/xfs_ag.h b/fs/xfs/xfs_ag.h
index 62bb280..de105a4 100644
--- a/fs/xfs/xfs_ag.h
+++ b/fs/xfs/xfs_ag.h
@@ -160,7 +160,8 @@ typedef struct xfs_agfl {
  * have been freed but whose transactions aren't committed to disk yet.
  */
 struct xfs_busy_extent {
-	struct rb_node	rb_node;
+	struct rb_node	rb_node;	/* ag by-bno indexed search tree */
+	struct list_head list;		/* transaction busy extent list */
 	atomic_t	ref;
 	xfs_agnumber_t	agno;
 	xfs_agblock_t	bno;
diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index 73536ef..ff3111f 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -202,13 +202,14 @@ xfs_alloc_mark_busy(
 	busyp->agno = agno;
 	busyp->bno = bno;
 	busyp->length = len;
+	INIT_LIST_HEAD(&busyp->list);
 	if (freelist)
 		busyp->flags |= XFS_BUSY_EXT_FREELIST; 
 
 	pag = xfs_perag_get(tp->t_mountp, agno);
 	spin_lock(&pag->pagb_lock);
 	if (xfs_alloc_busy_insert(pag, bno, busyp))
-		xfs_trans_add_busy(tp, busyp);
+		list_add(&busyp->list, &tp->t_busy);
 	TRACE_BUSY("xfs_alloc_mark_busy", "got", agno, bno, len, tp);
 	spin_unlock(&pag->pagb_lock);
 	xfs_perag_put(pag);
@@ -226,6 +227,7 @@ xfs_alloc_clear_busy(
 	spin_lock(&pag->pagb_lock);
 	ASSERT(!RB_EMPTY_NODE(&busyp->rb_node));
 	rb_erase(&busyp->rb_node, &pag->pagb_tree);
+	list_del_init(&busyp->list);
 	spin_unlock(&pag->pagb_lock);
 
 	xfs_perag_put(pag);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index e88b9bf..d0d0f6f 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -253,9 +253,8 @@ _xfs_trans_alloc(
 	tp->t_type = type;
 	tp->t_mountp = mp;
 	tp->t_items_free = XFS_LIC_NUM_SLOTS;
-	tp->t_busy_free = XFS_LBC_NUM_SLOTS;
 	xfs_lic_init(&(tp->t_items));
-	XFS_LBC_INIT(&(tp->t_busy));
+	INIT_LIST_HEAD(&tp->t_busy);
 	return tp;
 }
 
@@ -282,9 +281,8 @@ xfs_trans_dup(
 	ntp->t_type = tp->t_type;
 	ntp->t_mountp = tp->t_mountp;
 	ntp->t_items_free = XFS_LIC_NUM_SLOTS;
-	ntp->t_busy_free = XFS_LBC_NUM_SLOTS;
 	xfs_lic_init(&(ntp->t_items));
-	XFS_LBC_INIT(&(ntp->t_busy));
+	INIT_LIST_HEAD(&ntp->t_busy);
 
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(tp->t_ticket != NULL);
@@ -765,6 +763,19 @@ xfs_trans_unreserve_and_mod_sb(
 	}
 }
 
+/*
+ * xfs_trans_free_busy
+ * Free all of the busy extents from a transaction
+ */
+static void
+xfs_trans_free_busy(xfs_trans_t *tp)
+{
+	struct xfs_busy_extent	*busyp, *n;
+
+	list_for_each_entry_safe(busyp, n, &tp->t_busy, list) {
+		xfs_alloc_clear_busy(tp, busyp);
+	}
+}
 
 /*
  * xfs_trans_commit
@@ -1300,9 +1311,6 @@ xfs_trans_committed(
 {
 	xfs_log_item_chunk_t	*licp;
 	xfs_log_item_chunk_t	*next_licp;
-	xfs_log_busy_chunk_t	*lbcp;
-	xfs_log_busy_slot_t	*lbsp;
-	int			i;
 
 	/*
 	 * Call the transaction's completion callback if there
@@ -1335,14 +1343,6 @@ xfs_trans_committed(
 	/*
 	 * Clear all the per-AG busy list items listed in this transaction
 	 */
-	lbcp = &tp->t_busy;
-	while (lbcp != NULL) {
-		for (i = 0, lbsp = lbcp->lbc_busy; i < lbcp->lbc_unused; i++, lbsp++) {
-			if (!XFS_LBC_ISFREE(lbcp, i))
-				xfs_alloc_clear_busy(tp, lbsp->lbc_busyp);
-		}
-		lbcp = lbcp->lbc_next;
-	}
 	xfs_trans_free_busy(tp);
 
 	/*
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 6a5e8d5..b7e8cf0 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -819,33 +819,6 @@ typedef struct xfs_item_ops {
 #define XFS_ITEM_PUSHBUF	4
 
 /*
- * This structure is used to maintain a list of block ranges that have been
- * freed in the transaction.  The ranges are listed in the perag[] busy list
- * between when they're freed and the transaction is committed to disk.
- */
-
-typedef struct xfs_log_busy_slot {
-	struct xfs_busy_extent	*lbc_busyp;
-} xfs_log_busy_slot_t;
-
-#define XFS_LBC_NUM_SLOTS	31
-typedef struct xfs_log_busy_chunk {
-	struct xfs_log_busy_chunk	*lbc_next;
-	uint				lbc_free;	/* free slots bitmask */
-	ushort				lbc_unused;	/* first unused */
-	xfs_log_busy_slot_t		lbc_busy[XFS_LBC_NUM_SLOTS];
-} xfs_log_busy_chunk_t;
-
-#define	XFS_LBC_MAX_SLOT	(XFS_LBC_NUM_SLOTS - 1)
-#define	XFS_LBC_FREEMASK	((1U << XFS_LBC_NUM_SLOTS) - 1)
-
-#define	XFS_LBC_INIT(cp)	((cp)->lbc_free = XFS_LBC_FREEMASK)
-#define	XFS_LBC_CLAIM(cp, slot)	((cp)->lbc_free &= ~(1 << (slot)))
-#define	XFS_LBC_SLOT(cp, slot)	(&((cp)->lbc_busy[(slot)]))
-#define	XFS_LBC_VACANCY(cp)	(((cp)->lbc_free) & XFS_LBC_FREEMASK)
-#define	XFS_LBC_ISFREE(cp, slot) ((cp)->lbc_free & (1 << (slot)))
-
-/*
  * This is the type of function which can be given to xfs_trans_callback()
  * to be called upon the transaction's commit to disk.
  */
@@ -896,8 +869,7 @@ typedef struct xfs_trans {
 	unsigned int		t_items_free;	/* log item descs free */
 	xfs_log_item_chunk_t	t_items;	/* first log item desc chunk */
 	xfs_trans_header_t	t_header;	/* header for in-log trans */
-	unsigned int		t_busy_free;	/* busy descs free */
-	xfs_log_busy_chunk_t	t_busy;		/* busy/async free blocks */
+	struct list_head	t_busy;		/* list of busy extents */
 	unsigned long		t_pflags;	/* saved process flags state */
 } xfs_trans_t;
 
@@ -972,9 +944,6 @@ void		xfs_trans_cancel(xfs_trans_t *, int);
 int		xfs_trans_ail_init(struct xfs_mount *);
 void		xfs_trans_ail_destroy(struct xfs_mount *);
 
-xfs_log_busy_slot_t *xfs_trans_add_busy(struct xfs_trans *tp,
-					struct xfs_busy_extent *busyp);
-
 extern kmem_zone_t	*xfs_trans_zone;
 
 #endif	/* __KERNEL__ */
diff --git a/fs/xfs/xfs_trans_item.c b/fs/xfs/xfs_trans_item.c
index 3baa0af..687625f 100644
--- a/fs/xfs/xfs_trans_item.c
+++ b/fs/xfs/xfs_trans_item.c
@@ -438,105 +438,3 @@ xfs_trans_unlock_chunk(
 
 	return freed;
 }
-
-
-/*
- * This is called to add the given busy item to the transaction's
- * list of busy items.  It must find a free busy item descriptor
- * or allocate a new one and add the item to that descriptor.
- * The function returns a pointer to busy descriptor used to point
- * to the new busy entry.  The log busy entry will now point to its new
- * descriptor with its ???? field.
- */
-xfs_log_busy_slot_t *
-xfs_trans_add_busy(
-	xfs_trans_t		*tp,
-	struct xfs_busy_extent	*busyp)
-{
-	xfs_log_busy_chunk_t	*lbcp;
-	xfs_log_busy_slot_t	*lbsp;
-	int			i=0;
-
-	/*
-	 * If there are no free descriptors, allocate a new chunk
-	 * of them and put it at the front of the chunk list.
-	 */
-	if (tp->t_busy_free == 0) {
-		lbcp = (xfs_log_busy_chunk_t*)
-		       kmem_alloc(sizeof(xfs_log_busy_chunk_t), KM_SLEEP);
-		ASSERT(lbcp != NULL);
-		/*
-		 * Initialize the chunk, and then
-		 * claim the first slot in the newly allocated chunk.
-		 */
-		XFS_LBC_INIT(lbcp);
-		XFS_LBC_CLAIM(lbcp, 0);
-		lbcp->lbc_unused = 1;
-		lbsp = XFS_LBC_SLOT(lbcp, 0);
-
-		/*
-		 * Link in the new chunk and update the free count.
-		 */
-		lbcp->lbc_next = tp->t_busy.lbc_next;
-		tp->t_busy.lbc_next = lbcp;
-		tp->t_busy_free = XFS_LIC_NUM_SLOTS - 1;
-
-		/* Initialize the descriptor and return it */
-		lbsp->lbc_busyp = busyp;
-		return lbsp;
-	}
-
-	/*
-	 * Find the free descriptor. It is somewhere in the chunklist
-	 * of descriptors.
-	 */
-	lbcp = &tp->t_busy;
-	while (lbcp != NULL) {
-		if (XFS_LBC_VACANCY(lbcp)) {
-			if (lbcp->lbc_unused <= XFS_LBC_MAX_SLOT) {
-				i = lbcp->lbc_unused;
-				break;
-			} else {
-				/* out-of-order vacancy */
-				cmn_err(CE_DEBUG, "OOO vacancy lbcp 0x%p\n", lbcp);
-				ASSERT(0);
-			}
-		}
-		lbcp = lbcp->lbc_next;
-	}
-	ASSERT(lbcp != NULL);
-	/*
-	 * If we find a free descriptor, claim it,
-	 * initialize it, and return it.
-	 */
-	XFS_LBC_CLAIM(lbcp, i);
-	if (lbcp->lbc_unused <= i) {
-		lbcp->lbc_unused = i + 1;
-	}
-	lbsp = XFS_LBC_SLOT(lbcp, i);
-	tp->t_busy_free--;
-	lbsp->lbc_busyp = busyp;
-	return lbsp;
-}
-
-
-/*
- * xfs_trans_free_busy
- * Free all of the busy lists from a transaction
- */
-void
-xfs_trans_free_busy(xfs_trans_t *tp)
-{
-	xfs_log_busy_chunk_t	*lbcp;
-	xfs_log_busy_chunk_t	*lbcq;
-
-	lbcp = tp->t_busy.lbc_next;
-	while (lbcp != NULL) {
-		lbcq = lbcp->lbc_next;
-		kmem_free(lbcp);
-		lbcp = lbcq;
-	}
-
-	XFS_LBC_INIT(&tp->t_busy);
-	tp->t_busy.lbc_unused = 0;
-}
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index fd50556..901dc0f 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -38,7 +38,6 @@ struct xfs_log_item_desc	*xfs_trans_next_item(struct xfs_trans *,
 void				xfs_trans_free_items(struct xfs_trans *, int);
 void				xfs_trans_unlock_items(struct xfs_trans *,
 							xfs_lsn_t);
-void				xfs_trans_free_busy(xfs_trans_t *tp);
 
 /*
  * AIL traversal cursor.
-- 
1.5.6.5


* Re: [PATCH 1/7] XFS: rename xfs_get_perag
  2008-10-07 22:09 ` [PATCH 1/7] XFS: rename xfs_get_perag Dave Chinner
@ 2008-10-08 18:41   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2008-10-08 18:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Oct 08, 2008 at 09:09:31AM +1100, Dave Chinner wrote:
> xfs_get_perag is really getting the perag that an inode belongs to
> based on its inode number. Rename it appropriately so we can use
> xfs_perag_get() to get the perag from a provided ag number. Convert
> a number of sites over to using this interface.

Looks good.


* Re: [PATCH 2/7] XFS: replace fixed size busy extent array with an rbtree
  2008-10-07 22:09 ` [PATCH 2/7] XFS: replace fixed size busy extent array with an rbtree Dave Chinner
@ 2008-10-08 18:49   ` Christoph Hellwig
  2008-10-09  0:06     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2008-10-08 18:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> @@ -131,14 +131,8 @@ STATIC void
>  xfs_free_perag(
>  	xfs_mount_t	*mp)
>  {
> -	if (mp->m_perag) {
> -		int	agno;
> -
> -		for (agno = 0; agno < mp->m_maxagi; agno++)
> -			if (mp->m_perag[agno].pagb_list)
> -				kmem_free(mp->m_perag[agno].pagb_list);
> +	if (mp->m_perag)
>  		kmem_free(mp->m_perag);
> -	}

kmem_free(NULL) is fine, so no need for the if.  And with that there's
no need for this one-line wrapper and we can just do the free in the
caller.

>  typedef struct xfs_log_busy_slot {
> -	xfs_agnumber_t		lbc_ag;
> -	ushort			lbc_idx;	/* index in perag.busy[] */
> +	struct xfs_busy_extent	*lbc_busyp;
>  } xfs_log_busy_slot_t;

Just use xfs_busy_extent directly - there's only about a handful places
using xfs_log_busy_slot anyway.

>  xfs_log_busy_slot_t *
> -xfs_trans_add_busy(xfs_trans_t *tp, xfs_agnumber_t ag, xfs_extlen_t idx)
> +xfs_trans_add_busy(
> +	xfs_trans_t		*tp,
> +	struct xfs_busy_extent	*busyp)

And this one can lose its return value.  It's always the second
argument and ignored by all callers anyway.


* Re: [PATCH 2/7] XFS: replace fixed size busy extent array with an rbtree
  2008-10-08 18:49   ` Christoph Hellwig
@ 2008-10-09  0:06     ` Dave Chinner
  0 siblings, 0 replies; 14+ messages in thread
From: Dave Chinner @ 2008-10-09  0:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Oct 08, 2008 at 02:49:28PM -0400, Christoph Hellwig wrote:
> > @@ -131,14 +131,8 @@ STATIC void
> >  xfs_free_perag(
> >  	xfs_mount_t	*mp)
> >  {
> > -	if (mp->m_perag) {
> > -		int	agno;
> > -
> > -		for (agno = 0; agno < mp->m_maxagi; agno++)
> > -			if (mp->m_perag[agno].pagb_list)
> > -				kmem_free(mp->m_perag[agno].pagb_list);
> > +	if (mp->m_perag)
> >  		kmem_free(mp->m_perag);
> > -	}
> 
> kmem_free(NULL) is fine, so no need for the if.  And with that there's
> no need for this one-line wrapper and we can just do the free in the
> caller.

Ok, I'll do that.

> >  typedef struct xfs_log_busy_slot {
> > -	xfs_agnumber_t		lbc_ag;
> > -	ushort			lbc_idx;	/* index in perag.busy[] */
> > +	struct xfs_busy_extent	*lbc_busyp;
> >  } xfs_log_busy_slot_t;
> 
> Just use xfs_busy_extent directly - there's only about a handful places
> using xfs_log_busy_slot anyway.
> 
> >  xfs_log_busy_slot_t *
> > -xfs_trans_add_busy(xfs_trans_t *tp, xfs_agnumber_t ag, xfs_extlen_t idx)
> > +xfs_trans_add_busy(
> > +	xfs_trans_t		*tp,
> > +	struct xfs_busy_extent	*busyp)
> 
> And this one can lose its return value.  It's always the second
> argument and ignored by all callers anyway.

I'm not concerned about this as the busy slot stuff in the struct
xfs_trans gets removed in the last patch of the series. I'm
half-tempted to integrate this one with the initial rbtree patch
as all that intermediate stuffing around just goes away.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC, PATCH 0/7] XFS: dynamic busy extent tracking
  2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
                   ` (6 preceding siblings ...)
  2008-10-07 22:09 ` [PATCH 7/7] XFS: Simplify transaction busy extent tracking Dave Chinner
@ 2008-10-09 18:17 ` Martin Steigerwald
  2008-10-09 22:33   ` Dave Chinner
  7 siblings, 1 reply; 14+ messages in thread
From: Martin Steigerwald @ 2008-10-09 18:17 UTC (permalink / raw)
  To: linux-xfs


Hi Dave,

On Wednesday, 08 October 2008, Dave Chinner wrote:
> The busy extent tracking in XFS is currently very static and has
> some performance issues. We can only track 128 busy extents per AG,
> and when we overflow this we fall back to synchronous transactions.
> Also, every time we re-use a busy extent we cause a synchronous log
> force, which stops all allocation and freeing in that AG while the
> log force is in progress.

Could this accelerate

tar -xf linux-2.6.26.tar.gz
rm -r linux-2.6.26

?

A student in the Linux Performance Tuning course I am holding this week
compared this with ext3, even with the improved mkfs.xfs options (but
without lazy-count=1, because mkfs.xfs from Debian Etch is too old) and
even with noop as the IO scheduler. AFAIR XFS took roughly 3-4 times as
long as ext3; I did not note the exact numbers. This was with 2.6.25. I
can repeat the test locally with 2.6.26.5 if wanted.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


* Re: [RFC, PATCH 0/7] XFS: dynamic busy extent tracking
  2008-10-09 18:17 ` [RFC, PATCH 0/7] XFS: dynamic " Martin Steigerwald
@ 2008-10-09 22:33   ` Dave Chinner
  2008-10-10  7:11     ` Martin Steigerwald
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2008-10-09 22:33 UTC (permalink / raw)
  To: Martin Steigerwald; +Cc: linux-xfs

On Thu, Oct 09, 2008 at 08:17:32PM +0200, Martin Steigerwald wrote:
> 
> Hi Dave,
> 
> On Wednesday, 08 October 2008, Dave Chinner wrote:
> > The busy extent tracking in XFS is currently very static and has
> > some performance issues. We can only track 128 busy extents per AG,
> > and when we overflow this we fall back to synchronous transactions.
> > Also, every time we re-use a busy extent we cause a synchronous log
> > force, which stops all allocation and freeing in that AG while the
> > log force is in progress.
> 
> Could this accelerate
> 
> tar -xf linux-2.6.26.tar.gz
> rm -r linux-2.6.26

Not really. There is very little reuse of freed extents in this
type of workload. It's when you have 10 users on the filesystem
all doing that sort of thing that it will make a difference.

> ?
> 
> A student in the Linux Performance Tuning course I am holding this week
> compared this with ext3, even with the improved mkfs.xfs options (but
> without lazy-count=1, because mkfs.xfs from Debian Etch is too old) and
> even with noop as the IO scheduler. AFAIR XFS took roughly 3-4 times as
> long as ext3; I did not note the exact numbers. This was with 2.6.25. I
> can repeat the test locally with 2.6.26.5 if wanted.

Yes, that's par for the course. XFS journals transactions almost
immediately, whereas ext3 gathers lots of changes in memory and
checkpoints infrequently. Hence XFS writes a lot more to the
journal and is therefore slower. The dynamic extent tracking is a
necessary step to moving the XFS journalling to a more
checkpoint-like setup which would perform much less journal
I/O and hence run substantially faster....

See the asynchronous transaction aggregation section here:

http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [RFC, PATCH 0/7] XFS: dynamic busy extent tracking
  2008-10-09 22:33   ` Dave Chinner
@ 2008-10-10  7:11     ` Martin Steigerwald
  0 siblings, 0 replies; 14+ messages in thread
From: Martin Steigerwald @ 2008-10-10  7:11 UTC (permalink / raw)
  To: linux-xfs; +Cc: Dave Chinner

On Friday, 10 October 2008, you wrote:
> On Thu, Oct 09, 2008 at 08:17:32PM +0200, Martin Steigerwald wrote:
> > Hi Dave,

[...]

> > A student in the Linux Performance Tuning course I am holding this
> > week compared this with ext3, even with the improved mkfs.xfs options
> > (but without lazy-count=1, because mkfs.xfs from Debian Etch is too
> > old) and even with noop as the IO scheduler. AFAIR XFS took roughly
> > 3-4 times as long as ext3; I did not note the exact numbers. This was
> > with 2.6.25. I can repeat the test locally with 2.6.26.5 if wanted.
>
> Yes, that's par for the course. XFS journals transactions almost
> immediately, whereas ext3 gathers lots of changes in memory and
> checkpoints infrequently. Hence XFS writes a lot more to the
> journal and is therefore slower. The dynamic extent tracking is a
> necessary step to moving the XFS journalling to a more
> checkpoint-like setup which would perform much less journal
> I/O and hence run substantially faster....
>
> See the asynchronous transaction aggregation section here:
>
> http://xfs.org/index.php/Improving_Metadata_Performance_By_Reducing_Journal_Overhead

Thanks for the info Dave.

I still have your three mails about future improvements on XFS on my 
reading list. I just read a bit of the first one.

Ciao,
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7


Thread overview: 14+ messages
2008-10-07 22:09 [RFC, PATCH 0/7] XFS: dynamic busy extent tracking Dave Chinner
2008-10-07 22:09 ` [PATCH 1/7] XFS: rename xfs_get_perag Dave Chinner
2008-10-08 18:41   ` Christoph Hellwig
2008-10-07 22:09 ` [PATCH 2/7] XFS: replace fixed size busy extent array with an rbtree Dave Chinner
2008-10-08 18:49   ` Christoph Hellwig
2008-10-09  0:06     ` Dave Chinner
2008-10-07 22:09 ` [PATCH 3/7] XFS: Don't immediately reallocate busy extents Dave Chinner
2008-10-07 22:09 ` [PATCH 4/7] XFS: Don't use log forces when busy extents are allocated Dave Chinner
2008-10-07 22:09 ` [PATCH 5/7] XFS: Do not classify freed allocation btree blocks as busy Dave Chinner
2008-10-07 22:09 ` [PATCH 6/7] XFS: Avoid busy extent ranges rather than the entire extent Dave Chinner
2008-10-07 22:09 ` [PATCH 7/7] XFS: Simplify transaction busy extent tracking Dave Chinner
2008-10-09 18:17 ` [RFC, PATCH 0/7] XFS: dynamic " Martin Steigerwald
2008-10-09 22:33   ` Dave Chinner
2008-10-10  7:11     ` Martin Steigerwald
