* [PATCH 0/5] improved busy extent handling
@ 2011-03-28 21:06 Christoph Hellwig
2011-03-28 21:06 ` [PATCH 1/5] xfs: optimize AGFL refills Christoph Hellwig
` (5 more replies)
0 siblings, 6 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-28 21:06 UTC (permalink / raw)
To: xfs
This series optimizes how XFS deals with busy extents. It starts to
track them exactly, and allows reuse where possible (metadata to metadata)
or else tries to avoid busy extents during allocations. This means
we don't have a single log force due to busy extents during either
xfstests, compilebench or postmark on my test system, which can easily
be tracked using the new tracepoints added in the last patch.
This is a repost of the previous series and should address all review
comments. The discard support, which relies on the exact busy extent
tracking, has been dropped temporarily until I can fix up some issues
that I found during testing.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 1/5] xfs: optimize AGFL refills
2011-03-28 21:06 [PATCH 0/5] improved busy extent handling Christoph Hellwig
@ 2011-03-28 21:06 ` Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-28 21:06 ` [PATCH 2/5] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
` (4 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-28 21:06 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-optimize-freelist-refills --]
[-- Type: text/plain, Size: 3716 bytes --]
While we need to make sure we do not reuse busy extents, there is no need
to force out busy extents when moving them between the AGFL and the
freespace btree as we still take care of that when doing the real allocation.
To avoid the log force when just moving extents between the different free
space tracking structures, move the busy search out of
xfs_alloc_get_freelist into the callers that need it, and move the busy
list insert from xfs_free_ag_extent, which is used both by AGFL refills
and real allocation, to xfs_free_extent, which is only used by the latter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-27 23:52:29.004480044 +0200
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-28 13:42:50.682839138 +0200
@@ -1326,6 +1326,8 @@ xfs_alloc_ag_vextent_small(
if (error)
goto error0;
if (fbno != NULLAGBLOCK) {
+ if (xfs_alloc_busy_search(args->mp, args->agno, fbno, 1))
+ xfs_trans_set_sync(args->tp);
if (args->userdata) {
xfs_buf_t *bp;
@@ -1617,18 +1619,6 @@ xfs_free_ag_extent(
trace_xfs_free_extent(mp, agno, bno, len, isfl, haveleft, haveright);
- /*
- * Since blocks move to the free list without the coordination
- * used in xfs_bmap_finish, we can't allow block to be available
- * for reallocation and non-transaction writing (user data)
- * until we know that the transaction that moved it to the free
- * list is permanently on disk. We track the blocks by declaring
- * these blocks as "busy"; the busy list is maintained on a per-ag
- * basis and each transaction records which entries should be removed
- * when the iclog commits to disk. If a busy block is allocated,
- * the iclog is pushed up to the LSN that freed the block.
- */
- xfs_alloc_busy_insert(tp, agno, bno, len);
return 0;
error0:
@@ -1923,21 +1913,6 @@ xfs_alloc_get_freelist(
xfs_alloc_log_agf(tp, agbp, logflags);
*bnop = bno;
- /*
- * As blocks are freed, they are added to the per-ag busy list and
- * remain there until the freeing transaction is committed to disk.
- * Now that we have allocated blocks, this list must be searched to see
- * if a block is being reused. If one is, then the freeing transaction
- * must be pushed to disk before this transaction.
- *
- * We do this by setting the current transaction to a sync transaction
- * which guarantees that the freeing transaction is on disk before this
- * transaction. This is done instead of a synchronous log force here so
- * that we don't sit and wait with the AGF locked in the transaction
- * during the log force.
- */
- if (xfs_alloc_busy_search(mp, be32_to_cpu(agf->agf_seqno), bno, 1))
- xfs_trans_set_sync(tp);
return 0;
}
@@ -2407,6 +2382,8 @@ xfs_free_extent(
be32_to_cpu(XFS_BUF_TO_AGF(args.agbp)->agf_length));
#endif
error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
+ if (!error)
+ xfs_alloc_busy_insert(tp, args.agno, args.agbno, len);
error0:
xfs_perag_put(args.pag);
return error;
Index: xfs/fs/xfs/xfs_alloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc_btree.c 2011-03-27 23:52:29.008480632 +0200
+++ xfs/fs/xfs/xfs_alloc_btree.c 2011-03-28 13:42:49.462839006 +0200
@@ -94,6 +94,8 @@ xfs_allocbt_alloc_block(
*stat = 0;
return 0;
}
+ if (xfs_alloc_busy_search(cur->bc_mp, cur->bc_private.a.agno, bno, 1))
+ xfs_trans_set_sync(cur->bc_tp);
xfs_trans_agbtree_delta(cur->bc_tp, 1);
new->s = cpu_to_be32(bno);
* [PATCH 2/5] xfs: do not immediately reuse busy extent ranges
2011-03-28 21:06 [PATCH 0/5] improved busy extent handling Christoph Hellwig
2011-03-28 21:06 ` [PATCH 1/5] xfs: optimize AGFL refills Christoph Hellwig
@ 2011-03-28 21:06 ` Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-28 21:06 ` [PATCH 3/5] xfs: exact busy extent tracking Christoph Hellwig
` (3 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-28 21:06 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-skip-busy-extents --]
[-- Type: text/plain, Size: 21067 bytes --]
Every time we reallocate a busy extent, we cause a synchronous log force
to occur to ensure the freeing transaction is on disk before we continue
and use the newly allocated extent. This is extremely sub-optimal as we
have to mark every transaction with blocks that get reused as synchronous.
Instead of searching the busy extent list after deciding on the extent to
allocate, check each candidate extent during the allocation decisions as
to whether they are in the busy list. If they are in the busy list, we
trim the busy range out of the extent we have found and determine if that
trimmed range is still OK for allocation. In many cases, this check can
be incorporated into the allocation extent alignment code which already
does trimming of the found extent before determining if it is a valid
candidate for allocation.
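As a rough illustration of the trimming described above (not the code from
this patch), the following standalone C sketch trims a single busy range out
of a candidate free extent and then applies the alignment round-up. The type
and function names (agblock_t, extlen_t, trim_one_busy, align_extent) are
hypothetical stand-ins. The real helper, xfs_alloc_busy_trim, walks the
per-AG busy rbtree under pagb_lock and may trim against several busy extents.

```c
#include <stdint.h>

/* Hypothetical stand-ins for xfs_agblock_t / xfs_extlen_t. */
typedef uint32_t agblock_t;
typedef uint32_t extlen_t;

struct extent {
	agblock_t	bno;	/* first block of the extent */
	extlen_t	len;	/* length in blocks, 0 means "nothing usable" */
};

/*
 * Trim one busy range [bbno, bbno + blen) out of the free extent
 * [fbno, fbno + flen).  Returns the remaining usable extent.
 */
static struct extent
trim_one_busy(agblock_t fbno, extlen_t flen, agblock_t bbno, extlen_t blen,
	      extlen_t minlen, extlen_t maxlen)
{
	agblock_t	fend = fbno + flen;
	agblock_t	bend = bbno + blen;
	struct extent	r = { fbno, flen };

	if (fend <= bbno || fbno >= bend)
		return r;		/* no overlap, nothing to trim */

	if (bbno <= fbno) {
		if (fend <= bend) {
			r.len = 0;	/* extent is entirely busy */
			return r;
		}
		fbno = bend;		/* busy range covers the start */
	} else if (bend >= fend) {
		fend = bbno;		/* busy range covers the end */
	} else if (bbno - fbno >= maxlen) {
		fend = bbno;		/* left piece satisfies maxlen */
	} else if (fend - bend >= maxlen * 4) {
		fbno = bend;		/* right piece is much larger */
	} else if (bbno - fbno >= minlen) {
		fend = bbno;		/* left piece satisfies minlen */
	} else {
		r.len = 0;		/* neither piece is usable */
		return r;
	}

	r.bno = fbno;
	r.len = fend - fbno;
	return r;
}

/* Round the start of a trimmed extent up to the requested alignment. */
static struct extent
align_extent(struct extent e, extlen_t alignment, extlen_t minlen)
{
	if (alignment > 1 && e.len >= minlen) {
		agblock_t	aligned =
			((e.bno + alignment - 1) / alignment) * alignment;
		extlen_t	diff = aligned - e.bno;

		e.len = diff >= e.len ? 0 : e.len - diff;
		e.bno = aligned;
	}
	return e;
}
```

For example, with a free extent of 50 blocks starting at block 100 and a busy
range of 20 blocks starting at block 90, trim_one_busy leaves 40 blocks
starting at block 110. Aligning that to 8 blocks moves the start to block 112
and leaves 38 blocks. The caller still has to compare the result against
minlen.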
Based on two earlier patches from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-28 14:17:43.358838528 +0200
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-28 16:02:05.469343820 +0200
@@ -41,19 +41,13 @@
#define XFSA_FIXUP_BNO_OK 1
#define XFSA_FIXUP_CNT_OK 2
-/*
- * Prototypes for per-ag allocation routines
- */
-
STATIC int xfs_alloc_ag_vextent_exact(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_near(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
- xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
-
-/*
- * Internal functions.
- */
+ xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
+STATIC void xfs_alloc_busy_trim(struct xfs_alloc_arg *,
+ xfs_agblock_t, xfs_extlen_t, xfs_agblock_t *, xfs_extlen_t *);
/*
* Lookup the record equal to [bno, len] in the btree given by cur.
@@ -154,19 +148,21 @@ xfs_alloc_compute_aligned(
xfs_extlen_t *reslen) /* result length */
{
xfs_agblock_t bno;
- xfs_extlen_t diff;
xfs_extlen_t len;
- if (args->alignment > 1 && foundlen >= args->minlen) {
- bno = roundup(foundbno, args->alignment);
- diff = bno - foundbno;
- len = diff >= foundlen ? 0 : foundlen - diff;
+ /* Trim busy sections out of found extent */
+ xfs_alloc_busy_trim(args, foundbno, foundlen, &bno, &len);
+
+ if (args->alignment > 1 && len >= args->minlen) {
+ xfs_agblock_t aligned_bno = roundup(bno, args->alignment);
+ xfs_extlen_t diff = aligned_bno - bno;
+
+ *resbno = aligned_bno;
+ *reslen = diff >= len ? 0 : len - diff;
} else {
- bno = foundbno;
- len = foundlen;
+ *resbno = bno;
+ *reslen = len;
}
- *resbno = bno;
- *reslen = len;
}
/*
@@ -541,16 +537,8 @@ xfs_alloc_ag_vextent(
if (error)
return error;
- /*
- * Search the busylist for these blocks and mark the
- * transaction as synchronous if blocks are found. This
- * avoids the need to block due to a synchronous log
- * force to ensure correct ordering as the synchronous
- * transaction will guarantee that for us.
- */
- if (xfs_alloc_busy_search(args->mp, args->agno,
- args->agbno, args->len))
- xfs_trans_set_sync(args->tp);
+ ASSERT(!xfs_alloc_busy_search(args->mp, args->agno,
+ args->agbno, args->len));
}
if (!args->isfl) {
@@ -577,14 +565,14 @@ xfs_alloc_ag_vextent_exact(
{
xfs_btree_cur_t *bno_cur;/* by block-number btree cursor */
xfs_btree_cur_t *cnt_cur;/* by count btree cursor */
- xfs_agblock_t end; /* end of allocated extent */
int error;
xfs_agblock_t fbno; /* start block of found extent */
- xfs_agblock_t fend; /* end block of found extent */
xfs_extlen_t flen; /* length of found extent */
+ xfs_agblock_t tbno; /* start block of trimmed extent */
+ xfs_extlen_t tlen; /* length of trimmed extent */
+ xfs_agblock_t tend; /* end block of trimmed extent */
+ xfs_agblock_t end; /* end of allocated extent */
int i; /* success/failure of operation */
- xfs_agblock_t maxend; /* end of maximal extent */
- xfs_agblock_t minend; /* end of minimal extent */
xfs_extlen_t rlen; /* length of returned extent */
ASSERT(args->alignment == 1);
@@ -614,14 +602,22 @@ xfs_alloc_ag_vextent_exact(
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
ASSERT(fbno <= args->agbno);
- minend = args->agbno + args->minlen;
- maxend = args->agbno + args->maxlen;
- fend = fbno + flen;
/*
- * Give up if the freespace isn't long enough for the minimum request.
+ * Check for overlapping busy extents.
+ */
+ xfs_alloc_busy_trim(args, fbno, flen, &tbno, &tlen);
+
+ /*
+ * Give up if the start of the extent is busy, or the freespace isn't
+ * long enough for the minimum request.
*/
- if (fend < minend)
+ if (tbno > args->agbno)
+ goto not_found;
+ if (tlen < args->minlen)
+ goto not_found;
+ tend = tbno + tlen;
+ if (tend < args->agbno + args->minlen)
goto not_found;
/*
@@ -630,14 +626,14 @@ xfs_alloc_ag_vextent_exact(
*
* Fix the length according to mod and prod if given.
*/
- end = XFS_AGBLOCK_MIN(fend, maxend);
+ end = XFS_AGBLOCK_MIN(tend, args->agbno + args->maxlen);
args->len = end - args->agbno;
xfs_alloc_fix_len(args);
if (!xfs_alloc_fix_minleft(args))
goto not_found;
rlen = args->len;
- ASSERT(args->agbno + rlen <= fend);
+ ASSERT(args->agbno + rlen <= tend);
end = args->agbno + rlen;
/*
@@ -686,11 +682,11 @@ xfs_alloc_find_best_extent(
struct xfs_btree_cur **scur, /* searching cursor */
xfs_agblock_t gdiff, /* difference for search comparison */
xfs_agblock_t *sbno, /* extent found by search */
- xfs_extlen_t *slen,
- xfs_extlen_t *slena, /* aligned length */
+ xfs_extlen_t *slen, /* extent length */
+ xfs_agblock_t *sbnoa, /* aligned extent found by search */
+ xfs_extlen_t *slena, /* aligned extent length */
int dir) /* 0 = search right, 1 = search left */
{
- xfs_agblock_t bno;
xfs_agblock_t new;
xfs_agblock_t sdiff;
int error;
@@ -708,16 +704,16 @@ xfs_alloc_find_best_extent(
if (error)
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- xfs_alloc_compute_aligned(args, *sbno, *slen, &bno, slena);
+ xfs_alloc_compute_aligned(args, *sbno, *slen, sbnoa, slena);
/*
* The good extent is closer than this one.
*/
if (!dir) {
- if (bno >= args->agbno + gdiff)
+ if (*sbnoa >= args->agbno + gdiff)
goto out_use_good;
} else {
- if (bno <= args->agbno - gdiff)
+ if (*sbnoa <= args->agbno - gdiff)
goto out_use_good;
}
@@ -729,8 +725,8 @@ xfs_alloc_find_best_extent(
xfs_alloc_fix_len(args);
sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, *sbno,
- *slen, &new);
+ args->alignment, *sbnoa,
+ *slena, &new);
/*
* Choose closer size and invalidate other cursor.
@@ -780,7 +776,7 @@ xfs_alloc_ag_vextent_near(
xfs_agblock_t gtbnoa; /* aligned ... */
xfs_extlen_t gtdiff; /* difference to right side entry */
xfs_extlen_t gtlen; /* length of right side entry */
- xfs_extlen_t gtlena = 0; /* aligned ... */
+ xfs_extlen_t gtlena; /* aligned ... */
xfs_agblock_t gtnew; /* useful start bno of right side */
int error; /* error code */
int i; /* result code, temporary */
@@ -789,9 +785,10 @@ xfs_alloc_ag_vextent_near(
xfs_agblock_t ltbnoa; /* aligned ... */
xfs_extlen_t ltdiff; /* difference to left side entry */
xfs_extlen_t ltlen; /* length of left side entry */
- xfs_extlen_t ltlena = 0; /* aligned ... */
+ xfs_extlen_t ltlena; /* aligned ... */
xfs_agblock_t ltnew; /* useful start bno of left side */
xfs_extlen_t rlen; /* length of returned extent */
+ int forced = 0;
#if defined(DEBUG) && defined(__KERNEL__)
/*
* Randomly don't execute the first algorithm.
@@ -800,13 +797,20 @@ xfs_alloc_ag_vextent_near(
dofirst = random32() & 1;
#endif
+
+restart:
+ bno_cur_lt = NULL;
+ bno_cur_gt = NULL;
+ ltlen = 0;
+ gtlena = 0;
+ ltlena = 0;
+
/*
* Get a cursor for the by-size btree.
*/
cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
args->agno, XFS_BTNUM_CNT);
- ltlen = 0;
- bno_cur_lt = bno_cur_gt = NULL;
+
/*
* See if there are any free extents as big as maxlen.
*/
@@ -822,11 +826,13 @@ xfs_alloc_ag_vextent_near(
goto error0;
if (i == 0 || ltlen == 0) {
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_near_noentry(args);
return 0;
}
ASSERT(i == 1);
}
args->wasfromfl = 0;
+
/*
* First algorithm.
* If the requested extent is large wrt the freespaces available
@@ -890,7 +896,7 @@ xfs_alloc_ag_vextent_near(
if (args->len < blen)
continue;
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, ltbno, ltlen, &ltnew);
+ args->alignment, ltbnoa, ltlena, &ltnew);
if (ltnew != NULLAGBLOCK &&
(args->len > blen || ltdiff < bdiff)) {
bdiff = ltdiff;
@@ -1042,11 +1048,12 @@ xfs_alloc_ag_vextent_near(
args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
xfs_alloc_fix_len(args);
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, ltbno, ltlen, &ltnew);
+ args->alignment, ltbnoa, ltlena, &ltnew);
error = xfs_alloc_find_best_extent(args,
&bno_cur_lt, &bno_cur_gt,
- ltdiff, &gtbno, &gtlen, &gtlena,
+ ltdiff, &gtbno, &gtlen,
+ &gtbnoa, &gtlena,
0 /* search right */);
} else {
ASSERT(gtlena >= args->minlen);
@@ -1057,11 +1064,12 @@ xfs_alloc_ag_vextent_near(
args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
xfs_alloc_fix_len(args);
gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, gtbno, gtlen, &gtnew);
+ args->alignment, gtbnoa, gtlena, &gtnew);
error = xfs_alloc_find_best_extent(args,
&bno_cur_gt, &bno_cur_lt,
- gtdiff, &ltbno, &ltlen, &ltlena,
+ gtdiff, &ltbno, &ltlen,
+ &ltbnoa, &ltlena,
1 /* search left */);
}
@@ -1073,6 +1081,12 @@ xfs_alloc_ag_vextent_near(
* If we couldn't get anything, give up.
*/
if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
+ if (!forced++) {
+ trace_xfs_alloc_near_busy(args);
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+
trace_xfs_alloc_size_neither(args);
args->agbno = NULLAGBLOCK;
return 0;
@@ -1107,12 +1121,13 @@ xfs_alloc_ag_vextent_near(
return 0;
}
rlen = args->len;
- (void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment, ltbno,
- ltlen, &ltnew);
+ (void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
+ ltbnoa, ltlena, &ltnew);
ASSERT(ltnew >= ltbno);
- ASSERT(ltnew + rlen <= ltbno + ltlen);
+ ASSERT(ltnew + rlen <= ltbnoa + ltlena);
ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
args->agbno = ltnew;
+
if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
ltnew, rlen, XFSA_FIXUP_BNO_OK)))
goto error0;
@@ -1155,26 +1170,35 @@ xfs_alloc_ag_vextent_size(
int i; /* temp status variable */
xfs_agblock_t rbno; /* returned block number */
xfs_extlen_t rlen; /* length of returned extent */
+ int forced = 0;
+restart:
/*
* Allocate and initialize a cursor for the by-size btree.
*/
cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
args->agno, XFS_BTNUM_CNT);
bno_cur = NULL;
+
/*
* Look for an entry >= maxlen+alignment-1 blocks.
*/
if ((error = xfs_alloc_lookup_ge(cnt_cur, 0,
args->maxlen + args->alignment - 1, &i)))
goto error0;
+
/*
- * If none, then pick up the last entry in the tree unless the
- * tree is empty.
- */
- if (!i) {
- if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &fbno,
- &flen, &i)))
+ * If none or we have busy extents that we cannot allocate from, then
+ * we have to settle for a smaller extent. In the case that there are
+ * no large extents, this will return the last entry in the tree unless
+ * the tree is empty. In the case that there are only busy large
+ * extents, this will return the largest small extent unless there
+ * are no smaller extents available.
+ */
+ if (!i || forced > 1) {
+ error = xfs_alloc_ag_vextent_small(args, cnt_cur,
+ &fbno, &flen, &i);
+ if (error)
goto error0;
if (i == 0 || flen == 0) {
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
@@ -1182,22 +1206,56 @@ xfs_alloc_ag_vextent_size(
return 0;
}
ASSERT(i == 1);
- }
- /*
- * There's a freespace as big as maxlen+alignment-1, get it.
- */
- else {
- if ((error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i)))
- goto error0;
- XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- }
+ xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
+ } else {
+ /*
+ * Search for a non-busy extent that is large enough.
+ * If we are at low space, don't check, or if we fall off
+ * the end of the btree, turn off the busy check and
+ * restart.
+ */
+ for (;;) {
+ error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
+ if (error)
+ goto error0;
+ XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+
+ xfs_alloc_compute_aligned(args, fbno, flen,
+ &rbno, &rlen);
+
+ if (rlen >= args->maxlen)
+ break;
+
+ error = xfs_btree_increment(cnt_cur, 0, &i);
+ if (error)
+ goto error0;
+ if (i == 0) {
+ /*
+ * Our only valid extents must have been busy.
+ * Make it unbusy by forcing the log out and
+ * retrying. If we've been here before, forcing
+ * the log isn't making the extents available,
+ * which means they have probably been freed in
+ * this transaction. In that case, we have to
+ * give up on them and we'll attempt a minlen
+ * allocation the next time around.
+ */
+ xfs_btree_del_cursor(cnt_cur,
+ XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_busy(args);
+ if (!forced++)
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+ }
+ }
+
/*
* In the first case above, we got the last entry in the
* by-size btree. Now we check to see if the space hits maxlen
* once aligned; if not, we search left for something better.
* This can't happen in the second case above.
*/
- xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
XFS_WANT_CORRUPTED_GOTO(rlen == 0 ||
(rlen <= flen && rbno + rlen <= fbno + flen), error0);
@@ -1251,13 +1309,19 @@ xfs_alloc_ag_vextent_size(
* Fix up the length.
*/
args->len = rlen;
- xfs_alloc_fix_len(args);
- if (rlen < args->minlen || !xfs_alloc_fix_minleft(args)) {
- xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
- trace_xfs_alloc_size_nominleft(args);
- args->agbno = NULLAGBLOCK;
- return 0;
+ if (rlen < args->minlen) {
+ if (!forced++) {
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_busy(args);
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+ goto out_nominleft;
}
+ xfs_alloc_fix_len(args);
+
+ if (!xfs_alloc_fix_minleft(args))
+ goto out_nominleft;
rlen = args->len;
XFS_WANT_CORRUPTED_GOTO(rlen <= flen, error0);
/*
@@ -1287,6 +1351,12 @@ error0:
if (bno_cur)
xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
return error;
+
+out_nominleft:
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_nominleft(args);
+ args->agbno = NULLAGBLOCK;
+ return 0;
}
/*
@@ -2634,6 +2704,178 @@ xfs_alloc_busy_search(
return match;
}
+/*
+ * For a given extent [fbno, flen], search the busy extent list
+ * to find a subset of the extent that is not busy.
+ */
+STATIC void
+xfs_alloc_busy_trim(
+ struct xfs_alloc_arg *args,
+ xfs_agblock_t bno,
+ xfs_extlen_t len,
+ xfs_agblock_t *rbno,
+ xfs_extlen_t *rlen)
+{
+ xfs_agblock_t fbno = bno;
+ xfs_extlen_t flen = len;
+ struct rb_node *rbp;
+
+ ASSERT(flen > 0);
+
+ spin_lock(&args->pag->pagb_lock);
+ rbp = args->pag->pagb_tree.rb_node;
+ while (rbp && flen >= args->minlen) {
+ struct xfs_busy_extent *busyp =
+ rb_entry(rbp, struct xfs_busy_extent, rb_node);
+ xfs_agblock_t fend = fbno + flen;
+ xfs_agblock_t bbno = busyp->bno;
+ xfs_agblock_t bend = bbno + busyp->length;
+
+ if (fend <= bbno) {
+ rbp = rbp->rb_left;
+ continue;
+ } else if (fbno >= bend) {
+ rbp = rbp->rb_right;
+ continue;
+ }
+
+ if (bbno <= fbno) {
+ /* start overlap */
+
+ /*
+ * Case 1:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +---------+
+ * fbno fend
+ *
+ * Case 2:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-------------+
+ * fbno fend
+ *
+ * Case 3:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-------------+
+ * fbno fend
+ *
+ * Case 4:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------+
+ * fbno fend
+ *
+ * No unbusy region in extent, return failure.
+ */
+ if (fend <= bend)
+ goto fail;
+
+ /*
+ * Case 5:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +----------------------+
+ * fbno fend
+ *
+ * Case 6:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Needs to be trimmed to:
+ * +-------+
+ * fbno fend
+ */
+ fbno = bend;
+ } else if (bend >= fend) {
+ /* end overlap */
+
+ /*
+ * Case 7:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +------------------+
+ * fbno fend
+ *
+ * Case 8:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Needs to be trimmed to:
+ * +-------+
+ * fbno fend
+ */
+ fend = bbno;
+ } else {
+ /* middle overlap */
+
+ /*
+ * Case 9:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------------------------+
+ * fbno fend
+ *
+ * Can be trimmed to:
+ * +-------+ OR +-------+
+ * fbno fend fbno fend
+ *
+ * Backward allocation leads to significant
+ * fragmentation of directories, which degrades
+ * directory performance, therefore we always want to
+ * choose the option that produces forward allocation
+ * patterns.
+ * Preferring the lower bno extent will make the next
+ * request use "fend" as the start of the next
+ * allocation; if the segment is no longer busy at
+ * that point, we'll get a contiguous allocation, but
+ * even if it is still busy, we will get a forward
+ * allocation.
+ * We try to avoid choosing the segment at "bend",
+ * because that can lead to the next allocation
+ * taking the segment at "fbno", which would be a
+ * backward allocation. We only use the segment at
+ * "fbno" if it is much larger than the current
+ * requested size, because in that case there's a
+ * good chance subsequent allocations will be
+ * contiguous.
+ */
+ if (bbno - fbno >= args->maxlen) {
+ /* left candidate fits perfect */
+ fend = bbno;
+ } else if (fend - bend >= args->maxlen * 4) {
+ /* right candidate has enough free space */
+ fbno = bend;
+ } else if (bbno - fbno >= args->minlen) {
+ /* left candidate fits minimum requirement */
+ fend = bbno;
+ } else {
+ goto fail;
+ }
+ }
+
+ flen = fend - fbno;
+ }
+ spin_unlock(&args->pag->pagb_lock);
+
+ *rbno = fbno;
+ *rlen = flen;
+ return;
+fail:
+ /*
+ * Return a zero extent length as failure indication. All callers
+ * re-check if the trimmed extent satisfies the minlen requirement.
+ */
+ spin_unlock(&args->pag->pagb_lock);
+ *rbno = fbno;
+ *rlen = 0;
+}
+
void
xfs_alloc_busy_clear(
struct xfs_mount *mp,
Index: xfs/fs/xfs/linux-2.6/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h 2011-03-28 14:11:07.546838286 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_trace.h 2011-03-28 16:01:54.018838901 +0200
@@ -1433,11 +1433,14 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first)
DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
+DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
+DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
DEFINE_ALLOC_EVENT(xfs_alloc_size_neither);
DEFINE_ALLOC_EVENT(xfs_alloc_size_noentry);
DEFINE_ALLOC_EVENT(xfs_alloc_size_nominleft);
DEFINE_ALLOC_EVENT(xfs_alloc_size_done);
DEFINE_ALLOC_EVENT(xfs_alloc_size_error);
+DEFINE_ALLOC_EVENT(xfs_alloc_size_busy);
DEFINE_ALLOC_EVENT(xfs_alloc_small_freelist);
DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
* [PATCH 3/5] xfs: exact busy extent tracking
2011-03-28 21:06 [PATCH 0/5] improved busy extent handling Christoph Hellwig
2011-03-28 21:06 ` [PATCH 1/5] xfs: optimize AGFL refills Christoph Hellwig
2011-03-28 21:06 ` [PATCH 2/5] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
@ 2011-03-28 21:06 ` Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-28 21:06 ` [PATCH 4/5] xfs: allow reusing busy extents where safe Christoph Hellwig
` (2 subsequent siblings)
5 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-28 21:06 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-better-busy-extent-tracking --]
[-- Type: text/plain, Size: 16729 bytes --]
Update the extent tree in case we have to reuse a busy extent, so that it
is always kept up to date. This is done by replacing the busy list searches
with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
in case of a reuse. Also replace setting transactions to sync with forcing
the log out in case we found a busy extent to reuse. This makes the code a
lot simpler, and is required for discard support later on. While it
will cause performance regressions with just this patch applied, the impact
is more than mitigated by the next patch in the series.
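To illustrate what exact tracking buys (a toy model only, not the code from
this patch): once busy extents in the tree never overlap, an insert can assert
disjointness instead of merging ranges, and an overlap search can descend the
tree by start block alone. The sketch below uses a plain unbalanced binary
tree and the hypothetical names busy_node, busy_insert and busy_search. The
kernel code uses an rbtree protected by pagb_lock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct xfs_busy_extent. */
struct busy_node {
	uint32_t		bno;	/* first busy block */
	uint32_t		len;	/* number of busy blocks */
	struct busy_node	*left;
	struct busy_node	*right;
};

/*
 * Insert a busy extent keyed by start block.  With exact tracking the
 * new extent must not overlap any extent met on the way down, so
 * overlaps are asserted against rather than merged.
 */
static void
busy_insert(struct busy_node **root, struct busy_node *new)
{
	struct busy_node	**p = root;

	new->left = new->right = NULL;
	while (*p) {
		struct busy_node	*cur = *p;

		if (new->bno < cur->bno) {
			assert(new->bno + new->len <= cur->bno);
			p = &cur->left;
		} else {
			assert(new->bno >= cur->bno + cur->len);
			p = &cur->right;
		}
	}
	*p = new;
}

/* Return 1 if [bno, bno + len) overlaps any busy extent, else 0. */
static int
busy_search(const struct busy_node *n, uint32_t bno, uint32_t len)
{
	while (n) {
		if (bno + len <= n->bno)
			n = n->left;	/* query lies entirely below n */
		else if (bno >= n->bno + n->len)
			n = n->right;	/* query lies entirely above n */
		else
			return 1;	/* ranges intersect */
	}
	return 0;
}
```

Because the entries are disjoint, a query that lies entirely below or above a
node can only overlap entries in the corresponding subtree, which is why a
single descent is enough in this model.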
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-28 16:06:17.806838570 +0200
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-28 16:09:32.489839705 +0200
@@ -1396,8 +1396,8 @@ xfs_alloc_ag_vextent_small(
if (error)
goto error0;
if (fbno != NULLAGBLOCK) {
- if (xfs_alloc_busy_search(args->mp, args->agno, fbno, 1))
- xfs_trans_set_sync(args->tp);
+ xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1);
+
if (args->userdata) {
xfs_buf_t *bp;
@@ -2459,100 +2459,6 @@ error0:
return error;
}
-
-/*
- * AG Busy list management
- * The busy list contains block ranges that have been freed but whose
- * transactions have not yet hit disk. If any block listed in a busy
- * list is reused, the transaction that freed it must be forced to disk
- * before continuing to use the block.
- *
- * xfs_alloc_busy_insert - add to the per-ag busy list
- * xfs_alloc_busy_clear - remove an item from the per-ag busy list
- * xfs_alloc_busy_search - search for a busy extent
- */
-
-/*
- * Insert a new extent into the busy tree.
- *
- * The busy extent tree is indexed by the start block of the busy extent.
- * there can be multiple overlapping ranges in the busy extent tree but only
- * ever one entry at a given start block. The reason for this is that
- * multi-block extents can be freed, then smaller chunks of that extent
- * allocated and freed again before the first transaction commit is on disk.
- * If the exact same start block is freed a second time, we have to wait for
- * that busy extent to pass out of the tree before the new extent is inserted.
- * There are two main cases we have to handle here.
- *
- * The first case is a transaction that triggers a "free - allocate - free"
- * cycle. This can occur during btree manipulations as a btree block is freed
- * to the freelist, then allocated from the free list, then freed again. In
- * this case, the second extxpnet free is what triggers the duplicate and as
- * such the transaction IDs should match. Because the extent was allocated in
- * this transaction, the transaction must be marked as synchronous. This is
- * true for all cases where the free/alloc/free occurs in the one transaction,
- * hence the addition of the ASSERT(tp->t_flags & XFS_TRANS_SYNC) to this case.
- * This serves to catch violations of the second case quite effectively.
- *
- * The second case is where the free/alloc/free occur in different
- * transactions. In this case, the thread freeing the extent the second time
- * can't mark the extent busy immediately because it is already tracked in a
- * transaction that may be committing. When the log commit for the existing
- * busy extent completes, the busy extent will be removed from the tree. If we
- * allow the second busy insert to continue using that busy extent structure,
- * it can be freed before this transaction is safely in the log. Hence our
- * only option in this case is to force the log to remove the existing busy
- * extent from the list before we insert the new one with the current
- * transaction ID.
- *
- * The problem we are trying to avoid in the free-alloc-free in separate
- * transactions is most easily described with a timeline:
- *
- * Thread 1 Thread 2 Thread 3 xfslogd
- * xact alloc
- * free X
- * mark busy
- * commit xact
- * free xact
- * xact alloc
- * alloc X
- * busy search
- * mark xact sync
- * commit xact
- * free xact
- * force log
- * checkpoint starts
- * ....
- * xact alloc
- * free X
- * mark busy
- * finds match
- * *** KABOOM! ***
- * ....
- * log IO completes
- * unbusy X
- * checkpoint completes
- *
- * By issuing a log force in thread 3 @ "KABOOM", the thread will block until
- * the checkpoint completes, and the busy extent it matched will have been
- * removed from the tree when it is woken. Hence it can then continue safely.
- *
- * However, to ensure this matching process is robust, we need to use the
- * transaction ID for identifying transaction, as delayed logging results in
- * the busy extent and transaction lifecycles being different. i.e. the busy
- * extent is active for a lot longer than the transaction. Hence the
- * transaction structure can be freed and reallocated, then mark the same
- * extent busy again in the new transaction. In this case the new transaction
- * will have a different tid but can have the same address, and hence we need
- * to check against the tid.
- *
- * Future: for delayed logging, we could avoid the log force if the extent was
- * first freed in the current checkpoint sequence. This, however, requires the
- * ability to pin the current checkpoint in memory until this transaction
- * commits to ensure that both the original free and the current one combine
- * logically into the one checkpoint. If the checkpoint sequences are
- * different, however, we still need to wait on a log force.
- */
void
xfs_alloc_busy_insert(
struct xfs_trans *tp,
@@ -2564,9 +2470,7 @@ xfs_alloc_busy_insert(
struct xfs_busy_extent *busyp;
struct xfs_perag *pag;
struct rb_node **rbp;
- struct rb_node *parent;
- int match;
-
+ struct rb_node *parent = NULL;
new = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
if (!new) {
@@ -2583,66 +2487,28 @@ xfs_alloc_busy_insert(
new->agno = agno;
new->bno = bno;
new->length = len;
- new->tid = xfs_log_get_trans_ident(tp);
-
INIT_LIST_HEAD(&new->list);
/* trace before insert to be able to see failed inserts */
trace_xfs_alloc_busy(tp, agno, bno, len, 0);
pag = xfs_perag_get(tp->t_mountp, new->agno);
-restart:
spin_lock(&pag->pagb_lock);
rbp = &pag->pagb_tree.rb_node;
- parent = NULL;
- busyp = NULL;
- match = 0;
- while (*rbp && match >= 0) {
+ while (*rbp) {
parent = *rbp;
busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
if (new->bno < busyp->bno) {
- /* may overlap, but exact start block is lower */
rbp = &(*rbp)->rb_left;
- if (new->bno + new->length > busyp->bno)
- match = busyp->tid == new->tid ? 1 : -1;
+ ASSERT(new->bno + new->length <= busyp->bno);
} else if (new->bno > busyp->bno) {
- /* may overlap, but exact start block is higher */
rbp = &(*rbp)->rb_right;
- if (bno < busyp->bno + busyp->length)
- match = busyp->tid == new->tid ? 1 : -1;
+ ASSERT(bno >= busyp->bno + busyp->length);
} else {
- match = busyp->tid == new->tid ? 1 : -1;
- break;
+ ASSERT(0);
}
}
- if (match < 0) {
- /* overlap marked busy in different transaction */
- spin_unlock(&pag->pagb_lock);
- xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
- goto restart;
- }
- if (match > 0) {
- /*
- * overlap marked busy in same transaction. Update if exact
- * start block match, otherwise combine the busy extents into
- * a single range.
- */
- if (busyp->bno == new->bno) {
- busyp->length = max(busyp->length, new->length);
- spin_unlock(&pag->pagb_lock);
- ASSERT(tp->t_flags & XFS_TRANS_SYNC);
- xfs_perag_put(pag);
- kmem_free(new);
- return;
- }
- rb_erase(&busyp->rb_node, &pag->pagb_tree);
- new->length = max(busyp->bno + busyp->length,
- new->bno + new->length) -
- min(busyp->bno, new->bno);
- new->bno = min(busyp->bno, new->bno);
- } else
- busyp = NULL;
rb_link_node(&new->rb_node, parent, rbp);
rb_insert_color(&new->rb_node, &pag->pagb_tree);
@@ -2650,7 +2516,6 @@ restart:
list_add(&new->list, &tp->t_busy);
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
- kmem_free(busyp);
}
/*
@@ -2705,6 +2570,162 @@ xfs_alloc_busy_search(
}
/*
+ * The found free extent [fbno, fend] overlaps part or all of the given busy
+ * extent. If the overlap covers the beginning, the end, or all of the busy
+ * extent, the overlapping portion can be made unbusy and used for the
+ * allocation. We can't split a busy extent because we can't modify a
+ * transaction/CIL context busy list, but we can update an entry's block
+ * number or length.
+ *
+ * The caller will force the log and re-check the busy list after returning
+ * from this function.
+ */
+STATIC void
+xfs_alloc_busy_update_extent(
+ struct xfs_perag *pag,
+ struct xfs_busy_extent *busyp,
+ xfs_agblock_t fbno,
+ xfs_agblock_t fend)
+{
+ xfs_agblock_t bbno = busyp->bno;
+ xfs_agblock_t bend = bbno + busyp->length;
+
+ if (bbno < fbno && bend > fend) {
+ /*
+ * Case 1:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +---------+
+ * fbno fend
+ */
+
+ /*
+ * We would have to split the busy extent to be able to track
+ * it correctly, which we cannot do because we would have to
+ * modify the list of busy extents attached to the transaction
+ * or CIL context, which is immutable.
+ *
+ * Let the caller force out the log to clear the busy extents
+ * and retry the search.
+ */
+ } else if (bbno >= fbno && bend <= fend) {
+ /*
+ * Case 2:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------+
+ * fbno fend
+ *
+ * Case 3:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Case 4:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Case 5:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------------------------+
+ * fbno fend
+ *
+ */
+
+ /*
+ * The busy extent is fully covered by the extent we are
+ * allocating, and can simply be removed from the rbtree.
+ * However we cannot remove it from the immutable list
+ * tracking busy extents in the transaction or CIL context,
+ * so set the length to zero to mark it invalid.
+ */
+ rb_erase(&busyp->rb_node, &pag->pagb_tree);
+ busyp->length = 0;
+ } else if (bbno == fbno) {
+ /*
+ * Case 6:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +---------+
+ * fbno fend
+ *
+ * Case 7:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +------------------+
+ * fbno fend
+ *
+ */
+ busyp->bno = fend;
+ } else if (bend == fend) {
+ /*
+ * Case 8:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-------------+
+ * fbno fend
+ *
+ * Case 9:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +----------------------+
+ * fbno fend
+ */
+
+ busyp->length = fbno - busyp->bno;
+ } else {
+ ASSERT(0);
+ }
+}
+
+
+/*
+ * For a given extent [fbno, flen], make sure we can reuse it safely.
+ */
+void
+xfs_alloc_busy_reuse(
+ struct xfs_trans *tp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t fbno,
+ xfs_extlen_t flen)
+{
+ struct xfs_perag *pag;
+ struct rb_node *rbp;
+
+ ASSERT(flen > 0);
+
+ pag = xfs_perag_get(tp->t_mountp, agno);
+ spin_lock(&pag->pagb_lock);
+ rbp = pag->pagb_tree.rb_node;
+ while (rbp) {
+ struct xfs_busy_extent *busyp =
+ rb_entry(rbp, struct xfs_busy_extent, rb_node);
+ xfs_agblock_t fend = fbno + flen;
+ xfs_agblock_t bbno = busyp->bno;
+ xfs_agblock_t bend = bbno + busyp->length;
+
+ if (fend <= bbno) {
+ rbp = rbp->rb_left;
+ continue;
+ } else if (fbno >= bend) {
+ rbp = rbp->rb_right;
+ continue;
+ }
+
+ xfs_alloc_busy_update_extent(pag, busyp, fbno, fbno + flen);
+
+ spin_unlock(&pag->pagb_lock);
+ xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
+ }
+ spin_unlock(&pag->pagb_lock);
+ xfs_perag_put(pag);
+}
+
+/*
* For a given extent [fbno, flen], search the busy extent list
* to find a subset of the extent that is not busy.
*/
@@ -2886,14 +2907,12 @@ xfs_alloc_busy_clear(
trace_xfs_alloc_unbusy(mp, busyp->agno, busyp->bno,
busyp->length);
- ASSERT(xfs_alloc_busy_search(mp, busyp->agno, busyp->bno,
- busyp->length) == 1);
-
list_del_init(&busyp->list);
pag = xfs_perag_get(mp, busyp->agno);
spin_lock(&pag->pagb_lock);
- rb_erase(&busyp->rb_node, &pag->pagb_tree);
+ if (busyp->length)
+ rb_erase(&busyp->rb_node, &pag->pagb_tree);
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
Index: xfs/fs/xfs/xfs_alloc.h
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.h 2011-03-28 16:06:17.822840017 +0200
+++ xfs/fs/xfs/xfs_alloc.h 2011-03-28 16:06:23.013344460 +0200
@@ -145,6 +145,10 @@ xfs_alloc_busy_clear(struct xfs_mount *m
int
xfs_alloc_busy_search(struct xfs_mount *mp, xfs_agnumber_t agno,
xfs_agblock_t bno, xfs_extlen_t len);
+
+void
+xfs_alloc_busy_reuse(struct xfs_trans *tp, xfs_agnumber_t agno,
+ xfs_agblock_t fbno, xfs_extlen_t flen);
#endif /* __KERNEL__ */
/*
Index: xfs/fs/xfs/xfs_alloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc_btree.c 2011-03-28 16:06:17.830839717 +0200
+++ xfs/fs/xfs/xfs_alloc_btree.c 2011-03-28 16:06:23.025342125 +0200
@@ -94,8 +94,8 @@ xfs_allocbt_alloc_block(
*stat = 0;
return 0;
}
- if (xfs_alloc_busy_search(cur->bc_mp, cur->bc_private.a.agno, bno, 1))
- xfs_trans_set_sync(cur->bc_tp);
+
+ xfs_alloc_busy_reuse(cur->bc_tp, cur->bc_private.a.agno, bno, 1);
xfs_trans_agbtree_delta(cur->bc_tp, 1);
new->s = cpu_to_be32(bno);
Index: xfs/fs/xfs/xfs_ag.h
===================================================================
--- xfs.orig/fs/xfs/xfs_ag.h 2011-03-28 16:06:17.842839071 +0200
+++ xfs/fs/xfs/xfs_ag.h 2011-03-28 16:06:23.037341448 +0200
@@ -187,7 +187,6 @@ struct xfs_busy_extent {
xfs_agnumber_t agno;
xfs_agblock_t bno;
xfs_extlen_t length;
- xlog_tid_t tid; /* transaction that created this */
};
/*
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c 2011-03-28 16:06:17.854837887 +0200
+++ xfs/fs/xfs/xfs_log.c 2011-03-28 16:06:20.390839609 +0200
@@ -3248,13 +3248,6 @@ xfs_log_ticket_get(
return ticket;
}
-xlog_tid_t
-xfs_log_get_trans_ident(
- struct xfs_trans *tp)
-{
- return tp->t_ticket->t_tid;
-}
-
/*
* Allocate and initialise a new log ticket.
*/
Index: xfs/fs/xfs/xfs_log.h
===================================================================
--- xfs.orig/fs/xfs/xfs_log.h 2011-03-28 16:06:17.866839600 +0200
+++ xfs/fs/xfs/xfs_log.h 2011-03-28 16:06:20.394838660 +0200
@@ -189,8 +189,6 @@ void xlog_iodone(struct xfs_buf *);
struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
void xfs_log_ticket_put(struct xlog_ticket *ticket);
-xlog_tid_t xfs_log_get_trans_ident(struct xfs_trans *tp);
-
void xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
struct xfs_log_vec *log_vector,
xfs_lsn_t *commit_lsn, int flags);
Index: xfs/fs/xfs/xfs_log_priv.h
===================================================================
--- xfs.orig/fs/xfs/xfs_log_priv.h 2011-03-28 16:06:17.878839137 +0200
+++ xfs/fs/xfs/xfs_log_priv.h 2011-03-28 16:06:20.398888649 +0200
@@ -144,6 +144,8 @@ static inline uint xlog_get_client_id(__
#define XLOG_RECOVERY_NEEDED 0x4 /* log was recovered */
#define XLOG_IO_ERROR 0x8 /* log hit an I/O error, and being
shutdown */
+typedef __uint32_t xlog_tid_t;
+
#ifdef __KERNEL__
/*
Index: xfs/fs/xfs/xfs_types.h
===================================================================
--- xfs.orig/fs/xfs/xfs_types.h 2011-03-28 16:06:17.894838713 +0200
+++ xfs/fs/xfs/xfs_types.h 2011-03-28 16:06:20.402904478 +0200
@@ -73,8 +73,6 @@ typedef __int32_t xfs_tid_t; /* transact
typedef __uint32_t xfs_dablk_t; /* dir/attr block number (in file) */
typedef __uint32_t xfs_dahash_t; /* dir/attr hash value */
-typedef __uint32_t xlog_tid_t; /* transaction ID type */
-
/*
* These types are 64 bits on disk but are either 32 or 64 bits in memory.
* Disk based types:
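
The overlap handling that xfs_alloc_busy_update_extent() introduces in this
patch can be sketched in userspace C. This is a minimal sketch, not the
patch's code: the names (agblock_t, struct busy_extent, enum busy_action,
busy_update) are hypothetical, ranges are treated as half-open [start, end),
and the rbtree removal is left out. Unlike the posted hunk, the sketch trims on
any head or tail overlap (it tests fend < bend rather than bbno == fbno) and
recomputes the length when the head is trimmed.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the XFS definitions. */
typedef uint32_t agblock_t;

struct busy_extent {
	agblock_t bno;		/* first busy block */
	agblock_t length;	/* busy block count; 0 marks the entry invalid */
};

enum busy_action {
	BUSY_NEEDS_LOG_FORCE,	/* reused range lies strictly inside the busy extent */
	BUSY_REMOVED,		/* busy extent fully covered by the reused range */
	BUSY_TRIMMED_FRONT,	/* head of the busy extent was reused */
	BUSY_TRIMMED_BACK,	/* tail of the busy extent was reused */
};

/*
 * Adjust busy extent *b for reuse of the range [fbno, fend).
 * The caller guarantees that the two ranges overlap.
 */
static enum busy_action
busy_update(struct busy_extent *b, agblock_t fbno, agblock_t fend)
{
	agblock_t bbno = b->bno;
	agblock_t bend = bbno + b->length;

	/* Busy blocks remain on both sides: would need a split, so give up. */
	if (bbno < fbno && bend > fend)
		return BUSY_NEEDS_LOG_FORCE;

	/* Every busy block is reused: invalidate the entry. */
	if (bbno >= fbno && bend <= fend) {
		b->length = 0;
		return BUSY_REMOVED;
	}

	/* Reused range covers the head: the busy part now starts at fend. */
	if (fend < bend) {
		b->bno = fend;
		b->length = bend - fend;
		return BUSY_TRIMMED_FRONT;
	}

	/* Reused range covers the tail: the busy part now ends at fbno. */
	b->length = fbno - bbno;
	return BUSY_TRIMMED_BACK;
}
```

For a busy extent covering blocks [10, 20), reusing [5, 14) leaves [14, 20)
busy, while reusing [12, 15) cannot be expressed without a split and is the
case that still needs the log force.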
* [PATCH 4/5] xfs: allow reusing busy extents where safe
2011-03-28 21:06 [PATCH 0/5] improved busy extent handling Christoph Hellwig
` (2 preceding siblings ...)
2011-03-28 21:06 ` [PATCH 3/5] xfs: exact busy extent tracking Christoph Hellwig
@ 2011-03-28 21:06 ` Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-28 21:06 ` [PATCH 5/5] xfs: update busy extent tracing Christoph Hellwig
2011-03-29 19:04 ` [PATCH 0/5] improved busy extent handling Alex Elder
5 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-28 21:06 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-simplify-user-allocations --]
[-- Type: text/plain, Size: 17598 bytes --]
Allow reusing any busy extent for metadata allocations, and reusing busy
userdata extents for userdata allocations. Most of the complexity is in
propagating the userdata information from the XFS_BMAPI_METADATA flag
passed to xfs_bunmapi down into the low-level extent freeing routines. After
that we can just track what type of busy extent we have and treat it accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-28 16:09:32.000000000 +0200
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-28 16:14:49.253338527 +0200
@@ -1396,7 +1396,8 @@ xfs_alloc_ag_vextent_small(
if (error)
goto error0;
if (fbno != NULLAGBLOCK) {
- xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1);
+ xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1,
+ args->userdata);
if (args->userdata) {
xfs_buf_t *bp;
@@ -2431,7 +2432,8 @@ int /* error */
xfs_free_extent(
xfs_trans_t *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
- xfs_extlen_t len) /* length of extent */
+ xfs_extlen_t len, /* length of extent */
+ bool userdata)
{
xfs_alloc_arg_t args;
int error;
@@ -2444,6 +2446,7 @@ xfs_free_extent(
ASSERT(args.agno < args.mp->m_sb.sb_agcount);
args.agbno = XFS_FSB_TO_AGBNO(args.mp, bno);
args.pag = xfs_perag_get(args.mp, args.agno);
+ args.userdata = userdata;
if ((error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING)))
goto error0;
#ifdef DEBUG
@@ -2453,7 +2456,7 @@ xfs_free_extent(
#endif
error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
if (!error)
- xfs_alloc_busy_insert(tp, args.agno, args.agbno, len);
+ xfs_alloc_busy_insert(tp, args.agno, args.agbno, len, userdata);
error0:
xfs_perag_put(args.pag);
return error;
@@ -2464,7 +2467,8 @@ xfs_alloc_busy_insert(
struct xfs_trans *tp,
xfs_agnumber_t agno,
xfs_agblock_t bno,
- xfs_extlen_t len)
+ xfs_extlen_t len,
+ bool userdata)
{
struct xfs_busy_extent *new;
struct xfs_busy_extent *busyp;
@@ -2487,6 +2491,7 @@ xfs_alloc_busy_insert(
new->agno = agno;
new->bno = bno;
new->length = len;
+ new->flags = userdata ? XFS_ALLOC_BUSY_USERDATA : 0;
INIT_LIST_HEAD(&new->list);
/* trace before insert to be able to see failed inserts */
@@ -2569,6 +2574,12 @@ xfs_alloc_busy_search(
return match;
}
+enum {
+ XFS_BUSY_REUSE_OK,
+ XFS_BUSY_LOG_FORCE,
+ XFS_BUSY_RESCAN,
+};
+
/*
* The found free extent [fbno, fend] overlaps part or all of the given busy
* extent. If the overlap covers the beginning, the end, or all of the busy
@@ -2580,7 +2591,7 @@ xfs_alloc_busy_search(
* The caller will force the log and re-check the busy list after returning
* from this function.
*/
-STATIC void
+STATIC int
xfs_alloc_busy_update_extent(
struct xfs_perag *pag,
struct xfs_busy_extent *busyp,
@@ -2608,6 +2619,7 @@ xfs_alloc_busy_update_extent(
* Let the caller force out the log to clear the busy extents
* and retry the search.
*/
+ return XFS_BUSY_LOG_FORCE;
} else if (bbno >= fbno && bend <= fend) {
/*
* Case 2:
@@ -2645,6 +2657,7 @@ xfs_alloc_busy_update_extent(
*/
rb_erase(&busyp->rb_node, &pag->pagb_tree);
busyp->length = 0;
+ return XFS_BUSY_RESCAN;
} else if (bbno == fbno) {
/*
* Case 6:
@@ -2680,6 +2693,8 @@ xfs_alloc_busy_update_extent(
} else {
ASSERT(0);
}
+
+ return XFS_BUSY_REUSE_OK;
}
@@ -2691,7 +2706,8 @@ xfs_alloc_busy_reuse(
struct xfs_trans *tp,
xfs_agnumber_t agno,
xfs_agblock_t fbno,
- xfs_extlen_t flen)
+ xfs_extlen_t flen,
+ bool userdata)
{
struct xfs_perag *pag;
struct rb_node *rbp;
@@ -2699,6 +2715,7 @@ xfs_alloc_busy_reuse(
ASSERT(flen > 0);
pag = xfs_perag_get(tp->t_mountp, agno);
+restart:
spin_lock(&pag->pagb_lock);
rbp = pag->pagb_tree.rb_node;
while (rbp) {
@@ -2707,6 +2724,7 @@ xfs_alloc_busy_reuse(
xfs_agblock_t fend = fbno + flen;
xfs_agblock_t bbno = busyp->bno;
xfs_agblock_t bend = bbno + busyp->length;
+ int ret;
if (fend <= bbno) {
rbp = rbp->rb_left;
@@ -2716,10 +2734,21 @@ xfs_alloc_busy_reuse(
continue;
}
- xfs_alloc_busy_update_extent(pag, busyp, fbno, fbno + flen);
-
- spin_unlock(&pag->pagb_lock);
- xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
+ ret = xfs_alloc_busy_update_extent(pag, busyp,
+ fbno, fbno + flen);
+ if (ret != XFS_BUSY_REUSE_OK || userdata) {
+ spin_unlock(&pag->pagb_lock);
+ if (ret == XFS_BUSY_LOG_FORCE)
+ xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
+ goto restart;
+ }
+#if 0
+ /*
+ * No more busy extents to search.
+ */
+ if (bbno <= fbno && bend >= fend)
+ break;
+#endif
}
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
@@ -2743,6 +2772,11 @@ xfs_alloc_busy_trim(
ASSERT(flen > 0);
+ if (!args->userdata) {
+ xfs_alloc_busy_reuse(args->tp, args->agno, fbno, flen, false);
+ goto out;
+ }
+
spin_lock(&args->pag->pagb_lock);
rbp = args->pag->pagb_tree.rb_node;
while (rbp && flen >= args->minlen) {
@@ -2883,7 +2917,7 @@ xfs_alloc_busy_trim(
flen = fend - fbno;
}
spin_unlock(&args->pag->pagb_lock);
-
+out:
*rbno = fbno;
*rlen = flen;
return;
Index: xfs/fs/xfs/xfs_alloc.h
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.h 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_alloc.h 2011-03-28 16:10:24.930841761 +0200
@@ -137,7 +137,7 @@ xfs_alloc_longest_free_extent(struct xfs
#ifdef __KERNEL__
void
xfs_alloc_busy_insert(struct xfs_trans *tp, xfs_agnumber_t agno,
- xfs_agblock_t bno, xfs_extlen_t len);
+ xfs_agblock_t bno, xfs_extlen_t len, bool userdata);
void
xfs_alloc_busy_clear(struct xfs_mount *mp, struct xfs_busy_extent *busyp);
@@ -148,7 +148,7 @@ xfs_alloc_busy_search(struct xfs_mount *
void
xfs_alloc_busy_reuse(struct xfs_trans *tp, xfs_agnumber_t agno,
- xfs_agblock_t fbno, xfs_extlen_t flen);
+ xfs_agblock_t fbno, xfs_extlen_t flen, bool userdata);
#endif /* __KERNEL__ */
/*
@@ -224,7 +224,8 @@ int /* error */
xfs_free_extent(
struct xfs_trans *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
- xfs_extlen_t len); /* length of extent */
+ xfs_extlen_t len, /* length of extent */
+ bool userdata);
int /* error */
xfs_alloc_lookup_le(
Index: xfs/fs/xfs/xfs_alloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc_btree.c 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_alloc_btree.c 2011-03-28 16:10:24.938837964 +0200
@@ -95,7 +95,7 @@ xfs_allocbt_alloc_block(
return 0;
}
- xfs_alloc_busy_reuse(cur->bc_tp, cur->bc_private.a.agno, bno, 1);
+ xfs_alloc_busy_reuse(cur->bc_tp, cur->bc_private.a.agno, bno, 1, false);
xfs_trans_agbtree_delta(cur->bc_tp, 1);
new->s = cpu_to_be32(bno);
@@ -120,18 +120,8 @@ xfs_allocbt_free_block(
if (error)
return error;
- /*
- * Since blocks move to the free list without the coordination used in
- * xfs_bmap_finish, we can't allow block to be available for
- * reallocation and non-transaction writing (user data) until we know
- * that the transaction that moved it to the free list is permanently
- * on disk. We track the blocks by declaring these blocks as "busy";
- * the busy list is maintained on a per-ag basis and each transaction
- * records which entries should be removed when the iclog commits to
- * disk. If a busy block is allocated, the iclog is pushed up to the
- * LSN that freed the block.
- */
- xfs_alloc_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1);
+ xfs_alloc_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno,
+ 1, false);
xfs_trans_agbtree_delta(cur->bc_tp, -1);
return 0;
}
Index: xfs/fs/xfs/xfs_ag.h
===================================================================
--- xfs.orig/fs/xfs/xfs_ag.h 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_ag.h 2011-03-28 16:10:24.938837964 +0200
@@ -187,6 +187,8 @@ struct xfs_busy_extent {
xfs_agnumber_t agno;
xfs_agblock_t bno;
xfs_extlen_t length;
+ unsigned int flags;
+#define XFS_ALLOC_BUSY_USERDATA 0x01 /* freed data extents */
};
/*
Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c 2011-03-28 16:06:23.049342208 +0200
+++ xfs/fs/xfs/xfs_bmap.c 2011-03-28 16:10:24.942837745 +0200
@@ -180,22 +180,6 @@ xfs_bmap_btree_to_extents(
int whichfork); /* data or attr fork */
/*
- * Called by xfs_bmapi to update file extent records and the btree
- * after removing space (or undoing a delayed allocation).
- */
-STATIC int /* error */
-xfs_bmap_del_extent(
- xfs_inode_t *ip, /* incore inode pointer */
- xfs_trans_t *tp, /* current trans pointer */
- xfs_extnum_t idx, /* extent number to update/insert */
- xfs_bmap_free_t *flist, /* list of extents to be freed */
- xfs_btree_cur_t *cur, /* if null, not a btree */
- xfs_bmbt_irec_t *new, /* new data to add to file extents */
- int *logflagsp,/* inode logging flags */
- int whichfork, /* data or attr fork */
- int rsvd); /* OK to allocate reserved blocks */
-
-/*
* Remove the entry "free" from the free item list. Prev points to the
* previous entry, unless "free" is the head of the list.
*/
@@ -2811,7 +2795,7 @@ xfs_bmap_btree_to_extents(
cblock = XFS_BUF_TO_BLOCK(cbp);
if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
return error;
- xfs_bmap_add_free(cbno, 1, cur->bc_private.b.flist, mp);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, 0);
ip->i_d.di_nblocks--;
xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
xfs_trans_binval(tp, cbp);
@@ -2838,8 +2822,7 @@ xfs_bmap_del_extent(
xfs_btree_cur_t *cur, /* if null, not a btree */
xfs_bmbt_irec_t *del, /* data to remove from extents */
int *logflagsp, /* inode logging flags */
- int whichfork, /* data or attr fork */
- int rsvd) /* OK to allocate reserved blocks */
+ int flags) /* XFS_BMAPI_* flags */
{
xfs_filblks_t da_new; /* new delay-alloc indirect blocks */
xfs_filblks_t da_old; /* old delay-alloc indirect blocks */
@@ -2849,7 +2832,6 @@ xfs_bmap_del_extent(
int do_fx; /* free extent at end of routine */
xfs_bmbt_rec_host_t *ep; /* current extent entry pointer */
int error; /* error return value */
- int flags; /* inode logging flags */
xfs_bmbt_irec_t got; /* current extent entry */
xfs_fileoff_t got_endoff; /* first offset past got */
int i; /* temp state */
@@ -2861,12 +2843,17 @@ xfs_bmap_del_extent(
uint qfield; /* quota field to update */
xfs_filblks_t temp; /* for indirect length calculations */
xfs_filblks_t temp2; /* for indirect length calculations */
- int state = 0;
+ int state, whichfork;
XFS_STATS_INC(xs_del_exlist);
- if (whichfork == XFS_ATTR_FORK)
- state |= BMAP_ATTRFORK;
+ if (flags & XFS_BMAPI_ATTRFORK) {
+ whichfork = XFS_ATTR_FORK;
+ state = BMAP_ATTRFORK;
+ } else {
+ whichfork = XFS_DATA_FORK;
+ state = 0;
+ }
mp = ip->i_mount;
ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -3121,9 +3108,13 @@ xfs_bmap_del_extent(
/*
* If we need to, add to list of extents to delete.
*/
- if (do_fx)
- xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist,
- mp);
+ if (do_fx) {
+ xfs_bmap_add_free(mp, flist, del->br_startblock,
+ del->br_blockcount,
+ (flags & XFS_BMAPI_METADATA) ? 0 :
+ XFS_BFI_USERDATA);
+ }
+
/*
* Adjust inode # blocks in the file.
*/
@@ -3142,7 +3133,9 @@ xfs_bmap_del_extent(
ASSERT(da_old >= da_new);
if (da_old > da_new) {
xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
- (int64_t)(da_old - da_new), rsvd);
+ (int64_t)(da_old - da_new),
+ !!(flags & XFS_BMAPI_RSVBLOCKS));
+
}
done:
*logflagsp = flags;
@@ -3723,10 +3716,11 @@ error0:
/* ARGSUSED */
void
xfs_bmap_add_free(
+ struct xfs_mount *mp, /* mount point structure */
+ struct xfs_bmap_free *flist, /* list of extents */
xfs_fsblock_t bno, /* fs block number of extent */
xfs_filblks_t len, /* length of extent */
- xfs_bmap_free_t *flist, /* list of extents */
- xfs_mount_t *mp) /* mount point structure */
+ unsigned int flags)
{
xfs_bmap_free_item_t *cur; /* current (next) element */
xfs_bmap_free_item_t *new; /* new element */
@@ -3750,6 +3744,7 @@ xfs_bmap_add_free(
new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
new->xbfi_startblock = bno;
new->xbfi_blockcount = (xfs_extlen_t)len;
+ new->xbfi_flags = flags;
for (prev = NULL, cur = flist->xbf_first;
cur != NULL;
prev = cur, cur = cur->xbfi_next) {
@@ -3883,8 +3878,11 @@ xfs_bmap_finish(
efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
for (free = flist->xbf_first; free != NULL; free = next) {
next = free->xbfi_next;
- if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
- free->xbfi_blockcount))) {
+
+ error = xfs_free_extent(ntp, free->xbfi_startblock,
+ free->xbfi_blockcount,
+ !!(free->xbfi_flags & XFS_BFI_USERDATA));
+ if (error) {
/*
* The bmap free list will be cleaned up at a
* higher level. The EFI will be canceled when
@@ -5278,7 +5276,7 @@ xfs_bunmapi(
goto error0;
}
error = xfs_bmap_del_extent(ip, tp, lastx, flist, cur, &del,
- &tmp_logflags, whichfork, rsvd);
+ &tmp_logflags, flags);
logflags |= tmp_logflags;
if (error)
goto error0;
Index: xfs/fs/xfs/xfs_bmap.h
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.h 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_bmap.h 2011-03-28 16:10:24.950887404 +0200
@@ -35,6 +35,8 @@ typedef struct xfs_bmap_free_item
{
xfs_fsblock_t xbfi_startblock;/* starting fs block number */
xfs_extlen_t xbfi_blockcount;/* number of blocks in extent */
+ unsigned int xbfi_flags;
+#define XFS_BFI_USERDATA 0x01 /* userdata extent */
struct xfs_bmap_free_item *xbfi_next; /* link to next entry */
} xfs_bmap_free_item_t;
@@ -188,10 +190,11 @@ xfs_bmap_add_attrfork(
*/
void
xfs_bmap_add_free(
+ struct xfs_mount *mp, /* mount point structure */
+ struct xfs_bmap_free *flist, /* list of extents */
xfs_fsblock_t bno, /* fs block number of extent */
xfs_filblks_t len, /* length of extent */
- xfs_bmap_free_t *flist, /* list of extents */
- struct xfs_mount *mp); /* mount point structure */
+ unsigned int flags);
/*
* Routine to clean up the free list data structure when
Index: xfs/fs/xfs/xfs_bmap_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap_btree.c 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_bmap_btree.c 2011-03-28 16:10:24.950887404 +0200
@@ -598,7 +598,7 @@ xfs_bmbt_free_block(
struct xfs_trans *tp = cur->bc_tp;
xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
- xfs_bmap_add_free(fsbno, 1, cur->bc_private.b.flist, mp);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, 0);
ip->i_d.di_nblocks--;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
Index: xfs/fs/xfs/xfs_fsops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_fsops.c 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_fsops.c 2011-03-28 16:10:24.954839193 +0200
@@ -344,7 +344,7 @@ xfs_growfs_data_private(
* Free the new space.
*/
error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno,
- be32_to_cpu(agf->agf_length) - new), new);
+ be32_to_cpu(agf->agf_length) - new), new, false);
if (error) {
goto error0;
}
Index: xfs/fs/xfs/xfs_ialloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_ialloc.c 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_ialloc.c 2011-03-28 16:10:24.954839193 +0200
@@ -1154,9 +1154,10 @@ xfs_difree(
goto error0;
}
- xfs_bmap_add_free(XFS_AGB_TO_FSB(mp,
- agno, XFS_INO_TO_AGBNO(mp,rec.ir_startino)),
- XFS_IALLOC_BLOCKS(mp), flist, mp);
+ xfs_bmap_add_free(mp, flist,
+ XFS_AGB_TO_FSB(mp, agno,
+ XFS_INO_TO_AGBNO(mp,rec.ir_startino)),
+ XFS_IALLOC_BLOCKS(mp), 0);
} else {
*delete = 0;
Index: xfs/fs/xfs/xfs_ialloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_ialloc_btree.c 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_ialloc_btree.c 2011-03-28 16:10:24.954839193 +0200
@@ -117,7 +117,7 @@ xfs_inobt_free_block(
int error;
fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
- error = xfs_free_extent(cur->bc_tp, fsbno, 1);
+ error = xfs_free_extent(cur->bc_tp, fsbno, 1, false);
if (error)
return error;
Index: xfs/fs/xfs/xfs_log_recover.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log_recover.c 2011-03-28 16:06:23.000000000 +0200
+++ xfs/fs/xfs/xfs_log_recover.c 2011-03-28 16:10:24.958839336 +0200
@@ -2907,8 +2907,9 @@ xlog_recover_process_efi(
efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
for (i = 0; i < efip->efi_format.efi_nextents; i++) {
- extp = &(efip->efi_format.efi_extents[i]);
- error = xfs_free_extent(tp, extp->ext_start, extp->ext_len);
+ extp = &efip->efi_format.efi_extents[i];
+ error = xfs_free_extent(tp, extp->ext_start, extp->ext_len,
+ false);
if (error)
goto abort_error;
xfs_trans_log_efd_extent(tp, efdp, extp->ext_start,
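
The flag propagation described in the commit message can be followed end to
end in a small userspace sketch. It only mirrors the three conversions visible
in the hunks above (xfs_bmap_del_extent, xfs_bmap_finish and
xfs_alloc_busy_insert). The helper names are hypothetical, and the value used
for BMAPI_METADATA is a placeholder, since the patch does not show the real
definition; the two 0x01 values are the ones the patch adds.

```c
#include <stdbool.h>

#define BMAPI_METADATA		0x002u	/* placeholder value, not from the patch */
#define BFI_USERDATA		0x01u	/* mirrors XFS_BFI_USERDATA */
#define ALLOC_BUSY_USERDATA	0x01u	/* mirrors XFS_ALLOC_BUSY_USERDATA */

/* Step 1, as in xfs_bmap_del_extent: unmap flags to free-list item flags. */
static unsigned int
bmapi_to_bfi(unsigned int bmapi_flags)
{
	return (bmapi_flags & BMAPI_METADATA) ? 0 : BFI_USERDATA;
}

/* Step 2, as in xfs_bmap_finish: free-list item flags to a bool argument. */
static bool
bfi_to_userdata(unsigned int bfi_flags)
{
	return !!(bfi_flags & BFI_USERDATA);
}

/* Step 3, as in xfs_alloc_busy_insert: bool argument to busy extent flags. */
static unsigned int
userdata_to_busy(bool userdata)
{
	return userdata ? ALLOC_BUSY_USERDATA : 0;
}

/* The whole chain: which busy flags an unmap with these flags ends up with. */
static unsigned int
busy_flags_for_unmap(unsigned int bmapi_flags)
{
	return userdata_to_busy(bfi_to_userdata(bmapi_to_bfi(bmapi_flags)));
}
```

In this sketch an unmap without the metadata flag ends up as a userdata busy
extent, and every caller that frees blocks directly (the btree, growfs and log
recovery paths above pass false or 0) ends up with a metadata busy extent.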
* [PATCH 5/5] xfs: update busy extent tracing
2011-03-28 21:06 [PATCH 0/5] improved busy extent handling Christoph Hellwig
` (3 preceding siblings ...)
2011-03-28 21:06 ` [PATCH 4/5] xfs: allow reusing busy extents where safe Christoph Hellwig
@ 2011-03-28 21:06 ` Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-29 19:04 ` [PATCH 0/5] improved busy extent handling Alex Elder
5 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-28 21:06 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-busy-trace-update --]
[-- Type: text/plain, Size: 6169 bytes --]
Add new tracepoints for the new busy extent handling helpers, and update the
existing ones to use a common class. Also drop the busysearch tracepoint
now that a plain busylist search only happens in a debug assert and the
FITRIM handler.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/linux-2.6/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h 2011-03-28 16:02:55.000000000 +0200
+++ xfs/fs/xfs/linux-2.6/xfs_trace.h 2011-03-28 16:46:51.373340927 +0200
@@ -1151,44 +1151,7 @@ TRACE_EVENT(xfs_bunmap,
);
-#define XFS_BUSY_SYNC \
- { 0, "async" }, \
- { 1, "sync" }
-
-TRACE_EVENT(xfs_alloc_busy,
- TP_PROTO(struct xfs_trans *trans, xfs_agnumber_t agno,
- xfs_agblock_t agbno, xfs_extlen_t len, int sync),
- TP_ARGS(trans, agno, agbno, len, sync),
- TP_STRUCT__entry(
- __field(dev_t, dev)
- __field(struct xfs_trans *, tp)
- __field(int, tid)
- __field(xfs_agnumber_t, agno)
- __field(xfs_agblock_t, agbno)
- __field(xfs_extlen_t, len)
- __field(int, sync)
- ),
- TP_fast_assign(
- __entry->dev = trans->t_mountp->m_super->s_dev;
- __entry->tp = trans;
- __entry->tid = trans->t_ticket->t_tid;
- __entry->agno = agno;
- __entry->agbno = agbno;
- __entry->len = len;
- __entry->sync = sync;
- ),
- TP_printk("dev %d:%d trans 0x%p tid 0x%x agno %u agbno %u len %u %s",
- MAJOR(__entry->dev), MINOR(__entry->dev),
- __entry->tp,
- __entry->tid,
- __entry->agno,
- __entry->agbno,
- __entry->len,
- __print_symbolic(__entry->sync, XFS_BUSY_SYNC))
-
-);
-
-TRACE_EVENT(xfs_alloc_unbusy,
+DECLARE_EVENT_CLASS(xfs_busy_class,
TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
xfs_agblock_t agbno, xfs_extlen_t len),
TP_ARGS(mp, agno, agbno, len),
@@ -1210,35 +1173,45 @@ TRACE_EVENT(xfs_alloc_unbusy,
__entry->agbno,
__entry->len)
);
+#define DEFINE_BUSY_EVENT(name) \
+DEFINE_EVENT(xfs_busy_class, name, \
+ TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
+ xfs_agblock_t agbno, xfs_extlen_t len), \
+ TP_ARGS(mp, agno, agbno, len))
+DEFINE_BUSY_EVENT(xfs_alloc_busy);
+DEFINE_BUSY_EVENT(xfs_alloc_busy_enomem);
+DEFINE_BUSY_EVENT(xfs_alloc_busy_force);
+DEFINE_BUSY_EVENT(xfs_alloc_busy_reuse);
+DEFINE_BUSY_EVENT(xfs_alloc_busy_clear);
-#define XFS_BUSY_STATES \
- { 0, "missing" }, \
- { 1, "found" }
-
-TRACE_EVENT(xfs_alloc_busysearch,
+TRACE_EVENT(xfs_alloc_busy_trim,
TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
- xfs_agblock_t agbno, xfs_extlen_t len, int found),
- TP_ARGS(mp, agno, agbno, len, found),
+ xfs_agblock_t agbno, xfs_extlen_t len,
+ xfs_agblock_t tbno, xfs_extlen_t tlen),
+ TP_ARGS(mp, agno, agbno, len, tbno, tlen),
TP_STRUCT__entry(
__field(dev_t, dev)
__field(xfs_agnumber_t, agno)
__field(xfs_agblock_t, agbno)
__field(xfs_extlen_t, len)
- __field(int, found)
+ __field(xfs_agblock_t, tbno)
+ __field(xfs_extlen_t, tlen)
),
TP_fast_assign(
__entry->dev = mp->m_super->s_dev;
__entry->agno = agno;
__entry->agbno = agbno;
__entry->len = len;
- __entry->found = found;
+ __entry->tbno = tbno;
+ __entry->tlen = tlen;
),
- TP_printk("dev %d:%d agno %u agbno %u len %u %s",
+ TP_printk("dev %d:%d agno %u agbno %u len %u tbno %u tlen %u",
MAJOR(__entry->dev), MINOR(__entry->dev),
__entry->agno,
__entry->agbno,
__entry->len,
- __print_symbolic(__entry->found, XFS_BUSY_STATES))
+ __entry->tbno,
+ __entry->tlen)
);
TRACE_EVENT(xfs_trans_commit_lsn,
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-28 16:14:49.000000000 +0200
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-28 16:46:51.397342798 +0200
@@ -2483,7 +2483,7 @@ xfs_alloc_busy_insert(
* block, make this a synchronous transaction to insure that
* the block is not reused before this transaction commits.
*/
- trace_xfs_alloc_busy(tp, agno, bno, len, 1);
+ trace_xfs_alloc_busy_enomem(tp->t_mountp, agno, bno, len);
xfs_trans_set_sync(tp);
return;
}
@@ -2495,7 +2495,7 @@ xfs_alloc_busy_insert(
INIT_LIST_HEAD(&new->list);
/* trace before insert to be able to see failed inserts */
- trace_xfs_alloc_busy(tp, agno, bno, len, 0);
+ trace_xfs_alloc_busy(tp->t_mountp, agno, bno, len);
pag = xfs_perag_get(tp->t_mountp, new->agno);
spin_lock(&pag->pagb_lock);
@@ -2569,7 +2569,6 @@ xfs_alloc_busy_search(
}
}
spin_unlock(&pag->pagb_lock);
- trace_xfs_alloc_busysearch(mp, agno, bno, len, !!match);
xfs_perag_put(pag);
return match;
}
@@ -2738,10 +2737,16 @@ restart:
fbno, fbno + flen);
if (ret != XFS_BUSY_REUSE_OK || userdata) {
spin_unlock(&pag->pagb_lock);
- if (ret == XFS_BUSY_LOG_FORCE)
+ if (ret == XFS_BUSY_LOG_FORCE) {
+ trace_xfs_alloc_busy_force(tp->t_mountp, agno,
+ fbno, flen);
xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
+ }
goto restart;
}
+
+ trace_xfs_alloc_busy_reuse(tp->t_mountp, agno, fbno, flen);
+
#if 0
/*
* No more busy extents to search.
@@ -2918,6 +2923,8 @@ xfs_alloc_busy_trim(
}
spin_unlock(&args->pag->pagb_lock);
out:
+ if (fbno != bno || flen != len)
+ trace_xfs_alloc_busy_trim(args->mp, args->agno, bno, len, fbno, flen);
*rbno = fbno;
*rlen = flen;
return;
@@ -2927,6 +2934,7 @@ fail:
* re-check if the trimmed extent satisfies the minlen requirement.
*/
spin_unlock(&args->pag->pagb_lock);
+ trace_xfs_alloc_busy_trim(args->mp, args->agno, bno, len, fbno, 0);
*rbno = fbno;
*rlen = 0;
}
@@ -2938,15 +2946,15 @@ xfs_alloc_busy_clear(
{
struct xfs_perag *pag;
- trace_xfs_alloc_unbusy(mp, busyp->agno, busyp->bno,
- busyp->length);
-
list_del_init(&busyp->list);
pag = xfs_perag_get(mp, busyp->agno);
spin_lock(&pag->pagb_lock);
- if (busyp->length)
+ if (busyp->length) {
+ trace_xfs_alloc_busy_clear(mp, busyp->agno, busyp->bno,
+ busyp->length);
rb_erase(&busyp->rb_node, &pag->pagb_tree);
+ }
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
* Re: [PATCH 1/5] xfs: optimize AGFL refills
2011-03-28 21:06 ` [PATCH 1/5] xfs: optimize AGFL refills Christoph Hellwig
@ 2011-03-29 19:04 ` Alex Elder
0 siblings, 0 replies; 14+ messages in thread
From: Alex Elder @ 2011-03-29 19:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, 2011-03-28 at 17:06 -0400, Christoph Hellwig wrote:
> While we need to make sure we do not reuse busy extents, there is no need
> to force out busy extents when moving them between the AGFL and the
> freespace btree as we still take care of that when doing the real allocation.
>
> To avoid the log force when just moving extents from the different free
> space tracking structures, move the busy search out of
> xfs_alloc_get_freelist into the callers that need it, and move the busy
> list insert from xfs_free_ag_extent which is used both by AGFL refills
> and real allocation to xfs_free_extent, which is only used by the latter.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
* Re: [PATCH 2/5] xfs: do not immediately reuse busy extent ranges
2011-03-28 21:06 ` [PATCH 2/5] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
@ 2011-03-29 19:04 ` Alex Elder
0 siblings, 0 replies; 14+ messages in thread
From: Alex Elder @ 2011-03-29 19:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, 2011-03-28 at 17:06 -0400, Christoph Hellwig wrote:
> Every time we reallocate a busy extent, we cause a synchronous log force
> to occur to ensure the freeing transaction is on disk before we continue
> and use the newly allocated extent. This is extremely sub-optimal as we
> have to mark every transaction with blocks that get reused as synchronous.
>
> Instead of searching the busy extent list after deciding on the extent to
> allocate, check each candidate extent during the allocation decisions as
> to whether they are in the busy list. If they are in the busy list, we
> trim the busy range out of the extent we have found and determine if that
> trimmed range is still OK for allocation. In many cases, this check can
> be incorporated into the allocation extent alignment code which already
> does trimming of the found extent before determining if it is a valid
> candidate for allocation.
>
> Based on two earlier patches from Dave Chinner.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
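
For readers following the thread, the trimming step described in the
commit message quoted above can be illustrated with a small stand-alone
sketch. This is not the code from the patch: the struct, the function
names trim_busy() and still_usable(), and the choice to keep the larger
remaining piece are assumptions made for illustration, and it handles
only a single busy range rather than searching a per-AG busy extent tree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the AG block number and extent length types. */
typedef uint32_t agblock_t;
typedef uint32_t extlen_t;

struct extent {
	agblock_t bno;	/* first block of the extent */
	extlen_t  len;	/* number of blocks */
};

/*
 * Trim one busy range out of a candidate free extent.
 * If the busy range splits the candidate, keep the larger remaining
 * piece (the left one on a tie). A result with len == 0 means the
 * candidate is entirely busy.
 */
struct extent
trim_busy(struct extent cand, struct extent busy)
{
	agblock_t cend = cand.bno + cand.len;	/* one past the last block */
	agblock_t bend = busy.bno + busy.len;
	struct extent left, right;

	assert(cand.len > 0 && busy.len > 0);

	/* No overlap: the candidate can be used unchanged. */
	if (bend <= cand.bno || busy.bno >= cend)
		return cand;

	/* Piece of the candidate in front of the busy range, if any. */
	left.bno = cand.bno;
	left.len = busy.bno > cand.bno ? busy.bno - cand.bno : 0;

	/* Piece of the candidate behind the busy range, if any. */
	right.bno = bend < cend ? bend : cend;
	right.len = bend < cend ? cend - bend : 0;

	return left.len >= right.len ? left : right;
}

/* Re-check the trimmed extent against the caller's minimum length. */
int
still_usable(struct extent e, extlen_t minlen)
{
	return e.len >= minlen;
}
```

For example, trimming a busy range covering blocks 120..129 out of a
candidate covering blocks 100..149 leaves 20 blocks on each side; the
sketch returns the left piece, which the caller would then check against
its minimum length before deciding whether to allocate from it.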
* Re: [PATCH 3/5] xfs: exact busy extent tracking
2011-03-28 21:06 ` [PATCH 3/5] xfs: exact busy extent tracking Christoph Hellwig
@ 2011-03-29 19:04 ` Alex Elder
0 siblings, 0 replies; 14+ messages in thread
From: Alex Elder @ 2011-03-29 19:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, 2011-03-28 at 17:06 -0400, Christoph Hellwig wrote:
> Update the extent tree in case we have to reuse a busy extent, so that it
> is always kept up to date. This is done by replacing the busy list searches
> with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
> in case of a reuse. Also replace setting transactions to sync with forcing
> the log out in case we found a busy extent to reuse. This makes the code a
> lot simpler, and is required for discard support later on. While it
> will cause performance regressions with just this patch applied, the impact
> is more than mitigated by the next patch in the series.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good. xfs_alloc_busy_update_extent() is a better name
than xfs_alloc_busy_try_reuse() was.
Reviewed-by: Alex Elder <aelder@sgi.com>
* Re: [PATCH 4/5] xfs: allow reusing busy extents where safe
2011-03-28 21:06 ` [PATCH 4/5] xfs: allow reusing busy extents where safe Christoph Hellwig
@ 2011-03-29 19:04 ` Alex Elder
2011-03-31 8:30 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Alex Elder @ 2011-03-29 19:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, 2011-03-28 at 17:06 -0400, Christoph Hellwig wrote:
> Allow reusing any busy extent for metadata allocations, and reusing busy
> userdata extents for userdata allocations. Most of the complexity is
> propagating the userdata information from the XFS_BMAPI_METADATA flag
> to xfs_bunmapi into the low-level extent freeing routines. After that
> we can just track what type of busy extent we have and treat it accordingly.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
The use of an enum value returned from
xfs_alloc_busy_update_extent() is a good improvement.
I'll issue the caveat here that I did not look through
it this time as carefully as the first time. My main
concern was about the validity of reusing busy user data
extents for user data, and as before I'll say I accept
that it's OK, but I haven't worked through in my own
mind that it is indeed safe. If I find the time to do
it I'll look this one over again for reassurance...
But aside from that, it looks good to me.
Reviewed-by: Alex Elder <aelder@sgi.com>
* Re: [PATCH 5/5] xfs: update busy extent tracing
2011-03-28 21:06 ` [PATCH 5/5] xfs: update busy extent tracing Christoph Hellwig
@ 2011-03-29 19:04 ` Alex Elder
0 siblings, 0 replies; 14+ messages in thread
From: Alex Elder @ 2011-03-29 19:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, 2011-03-28 at 17:06 -0400, Christoph Hellwig wrote:
> Add new tracepoints for the new busy extent handling helpers, and update the
> existing ones to use a common class. Also drop the busysearch tracepoint
> now that a plain busylist search only happens in a debug assert and the
> FITRIM handler.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
* Re: [PATCH 0/5] improved busy extent handling
2011-03-28 21:06 [PATCH 0/5] improved busy extent handling Christoph Hellwig
` (4 preceding siblings ...)
2011-03-28 21:06 ` [PATCH 5/5] xfs: update busy extent tracing Christoph Hellwig
@ 2011-03-29 19:04 ` Alex Elder
2011-03-30 10:14 ` Christoph Hellwig
5 siblings, 1 reply; 14+ messages in thread
From: Alex Elder @ 2011-03-29 19:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Mon, 2011-03-28 at 17:06 -0400, Christoph Hellwig wrote:
> This series optimizes how XFS deals with busy extents. It starts to
> track them exactly, and allows reuses where possible (metadata to metadata)
> or else tries to avoid busy extents during allocations. This means
> we don't have a single log force due to busy extents during either
> xfstests, compilebench or postmark on my testsystem, which can easily
> be tracked using the new tracepoints added in the last patch.
>
> This is a repost of the previous series and should address all review
> comments. The discard support, which relies on the exact busy extent
> tracking has been dropped temporarily until I can fix up some issues
> that I found during testing.
I've reviewed the series and it looks good to me.
Unless someone else has comments that deserve some
action I can take these in as-is. I'll wait for
a bit though, to let the 2.6.39 merge window settle.
I'm still curious what this (clipping busy blocks
from consideration when allocating) will do to
allocation patterns. Probably not much except in
somewhat extreme conditions (when they likely
won't be that good already). The "goodness" of
allocation patterns is a bit hard to characterize
anyway.
-Alex
* Re: [PATCH 0/5] improved busy extent handling
2011-03-29 19:04 ` [PATCH 0/5] improved busy extent handling Alex Elder
@ 2011-03-30 10:14 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-30 10:14 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Tue, Mar 29, 2011 at 02:04:35PM -0500, Alex Elder wrote:
> I've reviewed the series and it looks good to me.
> Unless someone else has comments that deserve some
> action I can take these in as-is. I'll wait for
> a bit though, to let the 2.6.39 merge window settle.
Yes, no need to rush - I just wanted to get it out for more
review ASAP.
* Re: [PATCH 4/5] xfs: allow reusing busy extents where safe
2011-03-29 19:04 ` Alex Elder
@ 2011-03-31 8:30 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-03-31 8:30 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Tue, Mar 29, 2011 at 02:04:28PM -0500, Alex Elder wrote:
> On Mon, 2011-03-28 at 17:06 -0400, Christoph Hellwig wrote:
> > Allow reusing any busy extent for metadata allocations, and reusing busy
> > userdata extents for userdata allocations. Most of the complexity is
> > propagating the userdata information from the XFS_BMAPI_METADATA flag
> > to xfs_bunmapi into the low-level extent freeing routines. After that
> > we can just track what type of busy extent we have and treat it accordingly.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> The use of an enum value returned from
> xfs_alloc_busy_update_extent() is a good improvement.
>
> I'll issue the caveat here that I did not look through
> it this time as carefully as the first time. My main
> concern was about the validity of reusing busy user data
> extents for user data, and as before I'll say I accept
> that it's OK, but I haven't worked through in my own
> > mind that it is indeed safe. If I find the time to do
> > it I'll look this one over again for reassurance...
This version doesn't actually allow userdata reallocations anymore,
I just forgot to update the patch description.
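
The resulting reuse rule can be sketched as a tiny decision function.
This is only an illustration of the policy as summarized in the cover
letter ("metadata to metadata") combined with the clarification above;
the enum, the function names and the two boolean parameters are
hypothetical and do not come from the patch, which tracks the extent
type in the busy extent itself and also has a log-force path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome for a candidate that overlaps a busy extent. */
enum busy_action {
	BUSY_REUSE,	/* allocate from the busy range right away */
	BUSY_AVOID	/* trim the busy range out or look elsewhere */
};

/*
 * Assumed policy: a busy extent may be reused only when both the freed
 * extent and the new allocation are metadata. Anything involving user
 * data avoids the busy range.
 */
enum busy_action
busy_policy(bool busy_is_userdata, bool alloc_is_userdata)
{
	if (!busy_is_userdata && !alloc_is_userdata)
		return BUSY_REUSE;
	return BUSY_AVOID;
}

/* Human-readable name, e.g. for a trace or debug message. */
const char *
busy_action_name(enum busy_action a)
{
	switch (a) {
	case BUSY_REUSE:
		return "reuse";
	case BUSY_AVOID:
		return "avoid";
	}
	assert(0 && "unknown busy_action");
	return "unknown";
}
```

Keeping the decision in one place like this makes it easy to see that
user data allocations never land on a busy range in this sketch, which
is the case Alex was concerned about.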
end of thread, newest message ~2011-03-31 8:27 UTC
Thread overview: 14+ messages
2011-03-28 21:06 [PATCH 0/5] improved busy extent handling Christoph Hellwig
2011-03-28 21:06 ` [PATCH 1/5] xfs: optimize AGFL refills Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-28 21:06 ` [PATCH 2/5] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-28 21:06 ` [PATCH 3/5] xfs: exact busy extent tracking Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-28 21:06 ` [PATCH 4/5] xfs: allow reusing busy extents where safe Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-31 8:30 ` Christoph Hellwig
2011-03-28 21:06 ` [PATCH 5/5] xfs: update busy extent tracing Christoph Hellwig
2011-03-29 19:04 ` Alex Elder
2011-03-29 19:04 ` [PATCH 0/5] improved busy extent handling Alex Elder
2011-03-30 10:14 ` Christoph Hellwig