* [PATCH 0/3] avoid busy extents during user data allocations
@ 2011-03-04 12:59 Christoph Hellwig
2011-03-04 12:59 ` [PATCH 1/3] xfs: clean up the xfs_alloc_compute_aligned calling convention Christoph Hellwig
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-04 12:59 UTC (permalink / raw)
To: xfs
This patchset adds support to trim down extents
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 1/3] xfs: clean up the xfs_alloc_compute_aligned calling convention
2011-03-04 12:59 [PATCH 0/3] avoid busy extents during user data allocations Christoph Hellwig
@ 2011-03-04 12:59 ` Christoph Hellwig
2011-03-04 12:59 ` [PATCH 3/3] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-03-04 13:19 ` [PATCH 0/3] avoid busy extents during user data allocations Christoph Hellwig
2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-04 12:59 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-cleanup-xfs_alloc_compute_aligned --]
[-- Type: text/plain, Size: 4210 bytes --]
Pass an xfs_alloc_arg structure to xfs_alloc_compute_aligned and derive
the alignment and minlen parameters from it. This cleans up the existing
callers, and we'll need even more information from the xfs_alloc_arg
in subsequent patches. Based on a patch from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-01-03 13:06:52.386254734 +0100
+++ xfs/fs/xfs/xfs_alloc.c 2011-01-03 13:07:19.545002883 +0100
@@ -147,10 +147,9 @@ xfs_alloc_get_rec(
*/
STATIC void
xfs_alloc_compute_aligned(
+ xfs_alloc_arg_t *args, /* allocation argument structure */
xfs_agblock_t foundbno, /* starting block in found extent */
xfs_extlen_t foundlen, /* length in found extent */
- xfs_extlen_t alignment, /* alignment for allocation */
- xfs_extlen_t minlen, /* minimum length for allocation */
xfs_agblock_t *resbno, /* result block number */
xfs_extlen_t *reslen) /* result length */
{
@@ -158,8 +157,8 @@ xfs_alloc_compute_aligned(
xfs_extlen_t diff;
xfs_extlen_t len;
- if (alignment > 1 && foundlen >= minlen) {
- bno = roundup(foundbno, alignment);
+ if (args->alignment > 1 && foundlen >= args->minlen) {
+ bno = roundup(foundbno, args->alignment);
diff = bno - foundbno;
len = diff >= foundlen ? 0 : foundlen - diff;
} else {
@@ -693,8 +692,7 @@ xfs_alloc_find_best_extent(
if (error)
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- xfs_alloc_compute_aligned(*sbno, *slen, args->alignment,
- args->minlen, &bno, slena);
+ xfs_alloc_compute_aligned(args, *sbno, *slen, &bno, slena);
/*
* The good extent is closer than this one.
@@ -866,8 +864,8 @@ xfs_alloc_ag_vextent_near(
if ((error = xfs_alloc_get_rec(cnt_cur, <bno, <len, &i)))
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- xfs_alloc_compute_aligned(ltbno, ltlen, args->alignment,
- args->minlen, <bnoa, <lena);
+ xfs_alloc_compute_aligned(args, ltbno, ltlen,
+ <bnoa, <lena);
if (ltlena < args->minlen)
continue;
args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
@@ -987,8 +985,8 @@ xfs_alloc_ag_vextent_near(
if ((error = xfs_alloc_get_rec(bno_cur_lt, <bno, <len, &i)))
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- xfs_alloc_compute_aligned(ltbno, ltlen, args->alignment,
- args->minlen, <bnoa, <lena);
+ xfs_alloc_compute_aligned(args, ltbno, ltlen,
+ <bnoa, <lena);
if (ltlena >= args->minlen)
break;
if ((error = xfs_btree_decrement(bno_cur_lt, 0, &i)))
@@ -1003,8 +1001,8 @@ xfs_alloc_ag_vextent_near(
if ((error = xfs_alloc_get_rec(bno_cur_gt, >bno, >len, &i)))
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- xfs_alloc_compute_aligned(gtbno, gtlen, args->alignment,
- args->minlen, >bnoa, >lena);
+ xfs_alloc_compute_aligned(args, gtbno, gtlen,
+ >bnoa, >lena);
if (gtlena >= args->minlen)
break;
if ((error = xfs_btree_increment(bno_cur_gt, 0, &i)))
@@ -1183,8 +1181,7 @@ xfs_alloc_ag_vextent_size(
* once aligned; if not, we search left for something better.
* This can't happen in the second case above.
*/
- xfs_alloc_compute_aligned(fbno, flen, args->alignment, args->minlen,
- &rbno, &rlen);
+ xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
XFS_WANT_CORRUPTED_GOTO(rlen == 0 ||
(rlen <= flen && rbno + rlen <= fbno + flen), error0);
@@ -1209,8 +1206,8 @@ xfs_alloc_ag_vextent_size(
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
if (flen < bestrlen)
break;
- xfs_alloc_compute_aligned(fbno, flen, args->alignment,
- args->minlen, &rbno, &rlen);
+ xfs_alloc_compute_aligned(args, fbno, flen,
+ &rbno, &rlen);
rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
XFS_WANT_CORRUPTED_GOTO(rlen == 0 ||
(rlen <= flen && rbno + rlen <= fbno + flen),
* [PATCH 3/3] xfs: do not immediately reuse busy extent ranges
2011-03-04 12:59 [PATCH 0/3] avoid busy extents during user data allocations Christoph Hellwig
2011-03-04 12:59 ` [PATCH 1/3] xfs: clean up the xfs_alloc_compute_aligned calling convention Christoph Hellwig
@ 2011-03-04 12:59 ` Christoph Hellwig
2011-03-07 23:01 ` Alex Elder
2011-03-04 13:19 ` [PATCH 0/3] avoid busy extents during user data allocations Christoph Hellwig
2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-04 12:59 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-skip-busy-extents --]
[-- Type: text/plain, Size: 21202 bytes --]
Every time we reallocate a busy extent, we cause a synchronous log force
to occur to ensure the freeing transaction is on disk before we continue
and use the newly allocated extent. This is extremely sub-optimal as we
have to mark every transaction with blocks that get reused as synchronous.
Instead of searching the busy extent list after deciding on the extent to
allocate, check each candidate extent during the allocation decisions as
to whether they are in the busy list. If they are in the busy list, we
trim the busy range out of the extent we have found and determine if that
trimmed range is still OK for allocation. In many cases, this check can
be incorporated into the allocation extent alignment code which already
does trimming of the found extent before determining if it is a valid
candidate for allocation.
[hch: merged two earlier patches from Dave and fixed various bugs]
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-02 12:18:01.599040095 -0500
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-02 12:19:10.599027233 -0500
@@ -41,19 +41,13 @@
#define XFSA_FIXUP_BNO_OK 1
#define XFSA_FIXUP_CNT_OK 2
-/*
- * Prototypes for per-ag allocation routines
- */
-
STATIC int xfs_alloc_ag_vextent_exact(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_near(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
- xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
-
-/*
- * Internal functions.
- */
+ xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
+STATIC void xfs_alloc_busy_trim(struct xfs_alloc_arg *,
+ xfs_agblock_t, xfs_extlen_t, xfs_agblock_t *, xfs_extlen_t *);
/*
* Lookup the record equal to [bno, len] in the btree given by cur.
@@ -154,19 +148,21 @@ xfs_alloc_compute_aligned(
xfs_extlen_t *reslen) /* result length */
{
xfs_agblock_t bno;
- xfs_extlen_t diff;
xfs_extlen_t len;
- if (args->alignment > 1 && foundlen >= args->minlen) {
- bno = roundup(foundbno, args->alignment);
- diff = bno - foundbno;
- len = diff >= foundlen ? 0 : foundlen - diff;
+ /* Trim busy sections out of found extent */
+ xfs_alloc_busy_trim(args, foundbno, foundlen, &bno, &len);
+
+ if (args->alignment > 1 && len >= args->minlen) {
+ xfs_agblock_t aligned_bno = roundup(bno, args->alignment);
+ xfs_extlen_t diff = aligned_bno - bno;
+
+ *resbno = aligned_bno;
+ *reslen = diff >= len ? 0 : len - diff;
} else {
- bno = foundbno;
- len = foundlen;
+ *resbno = bno;
+ *reslen = len;
}
- *resbno = bno;
- *reslen = len;
}
/*
@@ -541,16 +537,8 @@ xfs_alloc_ag_vextent(
if (error)
return error;
- /*
- * Search the busylist for these blocks and mark the
- * transaction as synchronous if blocks are found. This
- * avoids the need to block due to a synchronous log
- * force to ensure correct ordering as the synchronous
- * transaction will guarantee that for us.
- */
- if (xfs_alloc_busy_search(args->mp, args->agno,
- args->agbno, args->len))
- xfs_trans_set_sync(args->tp);
+ ASSERT(!xfs_alloc_busy_search(args->mp, args->agno,
+ args->agbno, args->len));
}
if (!args->isfl) {
@@ -577,14 +565,14 @@ xfs_alloc_ag_vextent_exact(
{
xfs_btree_cur_t *bno_cur;/* by block-number btree cursor */
xfs_btree_cur_t *cnt_cur;/* by count btree cursor */
- xfs_agblock_t end; /* end of allocated extent */
int error;
xfs_agblock_t fbno; /* start block of found extent */
- xfs_agblock_t fend; /* end block of found extent */
xfs_extlen_t flen; /* length of found extent */
+ xfs_agblock_t tbno; /* start block of trimmed extent */
+ xfs_extlen_t tlen; /* length of trimmed extent */
+ xfs_agblock_t tend; /* end block of trimmed extent */
+ xfs_agblock_t end; /* end of allocated extent */
int i; /* success/failure of operation */
- xfs_agblock_t maxend; /* end of maximal extent */
- xfs_agblock_t minend; /* end of minimal extent */
xfs_extlen_t rlen; /* length of returned extent */
ASSERT(args->alignment == 1);
@@ -614,14 +602,22 @@ xfs_alloc_ag_vextent_exact(
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
ASSERT(fbno <= args->agbno);
- minend = args->agbno + args->minlen;
- maxend = args->agbno + args->maxlen;
- fend = fbno + flen;
/*
- * Give up if the freespace isn't long enough for the minimum request.
+ * Check for overlapping busy extents.
+ */
+ xfs_alloc_busy_trim(args, fbno, flen, &tbno, &tlen);
+
+ /*
+ * Give up if the start of the extent is busy, or the freespace isn't
+ * long enough for the minimum request.
*/
- if (fend < minend)
+ if (tbno > args->agbno)
+ goto not_found;
+ if (tlen < args->minlen)
+ goto not_found;
+ tend = tbno + tlen;
+ if (tend < args->agbno + args->minlen)
goto not_found;
/*
@@ -630,14 +626,14 @@ xfs_alloc_ag_vextent_exact(
*
* Fix the length according to mod and prod if given.
*/
- end = XFS_AGBLOCK_MIN(fend, maxend);
+ end = XFS_AGBLOCK_MIN(tend, args->agbno + args->maxlen);
args->len = end - args->agbno;
xfs_alloc_fix_len(args);
if (!xfs_alloc_fix_minleft(args))
goto not_found;
rlen = args->len;
- ASSERT(args->agbno + rlen <= fend);
+ ASSERT(args->agbno + rlen <= tend);
end = args->agbno + rlen;
/*
@@ -686,11 +682,11 @@ xfs_alloc_find_best_extent(
struct xfs_btree_cur **scur, /* searching cursor */
xfs_agblock_t gdiff, /* difference for search comparison */
xfs_agblock_t *sbno, /* extent found by search */
- xfs_extlen_t *slen,
- xfs_extlen_t *slena, /* aligned length */
+ xfs_extlen_t *slen, /* extent length */
+ xfs_agblock_t *sbnoa, /* aligned extent found by search */
+ xfs_extlen_t *slena, /* aligned extent length */
int dir) /* 0 = search right, 1 = search left */
{
- xfs_agblock_t bno;
xfs_agblock_t new;
xfs_agblock_t sdiff;
int error;
@@ -708,16 +704,16 @@ xfs_alloc_find_best_extent(
if (error)
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- xfs_alloc_compute_aligned(args, *sbno, *slen, &bno, slena);
+ xfs_alloc_compute_aligned(args, *sbno, *slen, sbnoa, slena);
/*
* The good extent is closer than this one.
*/
if (!dir) {
- if (bno >= args->agbno + gdiff)
+ if (*sbnoa >= args->agbno + gdiff)
goto out_use_good;
} else {
- if (bno <= args->agbno - gdiff)
+ if (*sbnoa <= args->agbno - gdiff)
goto out_use_good;
}
@@ -729,8 +725,8 @@ xfs_alloc_find_best_extent(
xfs_alloc_fix_len(args);
sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, *sbno,
- *slen, &new);
+ args->alignment, *sbnoa,
+ *slena, &new);
/*
* Choose closer size and invalidate other cursor.
@@ -780,7 +776,7 @@ xfs_alloc_ag_vextent_near(
xfs_agblock_t gtbnoa; /* aligned ... */
xfs_extlen_t gtdiff; /* difference to right side entry */
xfs_extlen_t gtlen; /* length of right side entry */
- xfs_extlen_t gtlena = 0; /* aligned ... */
+ xfs_extlen_t gtlena; /* aligned ... */
xfs_agblock_t gtnew; /* useful start bno of right side */
int error; /* error code */
int i; /* result code, temporary */
@@ -789,9 +785,10 @@ xfs_alloc_ag_vextent_near(
xfs_agblock_t ltbnoa; /* aligned ... */
xfs_extlen_t ltdiff; /* difference to left side entry */
xfs_extlen_t ltlen; /* length of left side entry */
- xfs_extlen_t ltlena = 0; /* aligned ... */
+ xfs_extlen_t ltlena; /* aligned ... */
xfs_agblock_t ltnew; /* useful start bno of left side */
xfs_extlen_t rlen; /* length of returned extent */
+ int forced = 0;
#if defined(DEBUG) && defined(__KERNEL__)
/*
* Randomly don't execute the first algorithm.
@@ -800,13 +797,20 @@ xfs_alloc_ag_vextent_near(
dofirst = random32() & 1;
#endif
+
+restart:
+ bno_cur_lt = NULL;
+ bno_cur_gt = NULL;
+ ltlen = 0;
+ gtlena = 0;
+ ltlena = 0;
+
/*
* Get a cursor for the by-size btree.
*/
cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
args->agno, XFS_BTNUM_CNT);
- ltlen = 0;
- bno_cur_lt = bno_cur_gt = NULL;
+
/*
* See if there are any free extents as big as maxlen.
*/
@@ -822,11 +826,13 @@ xfs_alloc_ag_vextent_near(
goto error0;
if (i == 0 || ltlen == 0) {
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_near_noentry(args);
return 0;
}
ASSERT(i == 1);
}
args->wasfromfl = 0;
+
/*
* First algorithm.
* If the requested extent is large wrt the freespaces available
@@ -890,7 +896,7 @@ xfs_alloc_ag_vextent_near(
if (args->len < blen)
continue;
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, ltbno, ltlen, <new);
+ args->alignment, ltbnoa, ltlena, <new);
if (ltnew != NULLAGBLOCK &&
(args->len > blen || ltdiff < bdiff)) {
bdiff = ltdiff;
@@ -1042,11 +1048,12 @@ xfs_alloc_ag_vextent_near(
args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
xfs_alloc_fix_len(args);
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, ltbno, ltlen, <new);
+ args->alignment, ltbnoa, ltlena, <new);
error = xfs_alloc_find_best_extent(args,
&bno_cur_lt, &bno_cur_gt,
- ltdiff, >bno, >len, >lena,
+ ltdiff, >bno, >len,
+ >bnoa, >lena,
0 /* search right */);
} else {
ASSERT(gtlena >= args->minlen);
@@ -1057,11 +1064,12 @@ xfs_alloc_ag_vextent_near(
args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
xfs_alloc_fix_len(args);
gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, gtbno, gtlen, >new);
+ args->alignment, gtbnoa, gtlena, >new);
error = xfs_alloc_find_best_extent(args,
&bno_cur_gt, &bno_cur_lt,
- gtdiff, <bno, <len, <lena,
+ gtdiff, <bno, <len,
+ <bnoa, <lena,
1 /* search left */);
}
@@ -1073,6 +1081,12 @@ xfs_alloc_ag_vextent_near(
* If we couldn't get anything, give up.
*/
if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
+ if (!forced++) {
+ trace_xfs_alloc_near_busy(args);
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+
trace_xfs_alloc_size_neither(args);
args->agbno = NULLAGBLOCK;
return 0;
@@ -1107,12 +1121,13 @@ xfs_alloc_ag_vextent_near(
return 0;
}
rlen = args->len;
- (void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment, ltbno,
- ltlen, <new);
+ (void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
+ ltbnoa, ltlena, <new);
ASSERT(ltnew >= ltbno);
- ASSERT(ltnew + rlen <= ltbno + ltlen);
+ ASSERT(ltnew + rlen <= ltbnoa + ltlena);
ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
args->agbno = ltnew;
+
if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
ltnew, rlen, XFSA_FIXUP_BNO_OK)))
goto error0;
@@ -1155,26 +1170,35 @@ xfs_alloc_ag_vextent_size(
int i; /* temp status variable */
xfs_agblock_t rbno; /* returned block number */
xfs_extlen_t rlen; /* length of returned extent */
+ int forced = 0;
+restart:
/*
* Allocate and initialize a cursor for the by-size btree.
*/
cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
args->agno, XFS_BTNUM_CNT);
bno_cur = NULL;
+
/*
* Look for an entry >= maxlen+alignment-1 blocks.
*/
if ((error = xfs_alloc_lookup_ge(cnt_cur, 0,
args->maxlen + args->alignment - 1, &i)))
goto error0;
+
/*
- * If none, then pick up the last entry in the tree unless the
- * tree is empty.
- */
- if (!i) {
- if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &fbno,
- &flen, &i)))
+ * If none or we have busy extents that we cannot allocate from, then
+ * we have to settle for a smaller extent. In the case that there are
+ * no large extents, this will return the last entry in the tree unless
+ * the tree is empty. In the case that there are only busy large
+ * extents, this will return the largest small extent unless there
+ * are no smaller extents available.
+ */
+ if (!i || forced > 1) {
+ error = xfs_alloc_ag_vextent_small(args, cnt_cur,
+ &fbno, &flen, &i);
+ if (error)
goto error0;
if (i == 0 || flen == 0) {
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
@@ -1182,22 +1206,56 @@ xfs_alloc_ag_vextent_size(
return 0;
}
ASSERT(i == 1);
- }
- /*
- * There's a freespace as big as maxlen+alignment-1, get it.
- */
- else {
- if ((error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i)))
- goto error0;
- XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- }
+ xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
+ } else {
+ /*
+ * Search for a non-busy extent that is large enough.
+ * If we are at low space, don't check, or if we fall off
+ * the end of the btree, turn off the busy check and
+ * restart.
+ */
+ for (;;) {
+ error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
+ if (error)
+ goto error0;
+ XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+
+ xfs_alloc_compute_aligned(args, fbno, flen,
+ &rbno, &rlen);
+
+ if (rlen >= args->maxlen)
+ break;
+
+ error = xfs_btree_increment(cnt_cur, 0, &i);
+ if (error)
+ goto error0;
+ if (i == 0) {
+ /*
+ * Our only valid extents must have been busy.
+ * Make it unbusy by forcing the log out and
+ * retrying. If we've been here before, forcing
+ * the log isn't making the extents available,
+ * which means they have probably been freed in
+ * this transaction. In that case, we have to
+ * give up on them and we'll attempt a minlen
+ * allocation the next time around.
+ */
+ xfs_btree_del_cursor(cnt_cur,
+ XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_busy(args);
+ if (!forced++)
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+ }
+ }
+
/*
* In the first case above, we got the last entry in the
* by-size btree. Now we check to see if the space hits maxlen
* once aligned; if not, we search left for something better.
* This can't happen in the second case above.
*/
- xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
XFS_WANT_CORRUPTED_GOTO(rlen == 0 ||
(rlen <= flen && rbno + rlen <= fbno + flen), error0);
@@ -1251,13 +1309,19 @@ xfs_alloc_ag_vextent_size(
* Fix up the length.
*/
args->len = rlen;
- xfs_alloc_fix_len(args);
- if (rlen < args->minlen || !xfs_alloc_fix_minleft(args)) {
- xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
- trace_xfs_alloc_size_nominleft(args);
- args->agbno = NULLAGBLOCK;
- return 0;
+ if (rlen < args->minlen) {
+ if (!forced++) {
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_busy(args);
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+ goto out_nominleft;
}
+ xfs_alloc_fix_len(args);
+
+ if (!xfs_alloc_fix_minleft(args))
+ goto out_nominleft;
rlen = args->len;
XFS_WANT_CORRUPTED_GOTO(rlen <= flen, error0);
/*
@@ -1287,6 +1351,12 @@ error0:
if (bno_cur)
xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
return error;
+
+out_nominleft:
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_nominleft(args);
+ args->agbno = NULLAGBLOCK;
+ return 0;
}
/*
@@ -2657,6 +2727,177 @@ xfs_alloc_busy_search(
return match;
}
+/*
+ * For a given extent [fbno, flen], search the busy extent list
+ * to find a subset of the extent that is not busy.
+ */
+STATIC void
+xfs_alloc_busy_trim(
+ struct xfs_alloc_arg *args,
+ xfs_agblock_t fbno,
+ xfs_extlen_t flen,
+ xfs_agblock_t *rbno,
+ xfs_extlen_t *rlen)
+{
+ struct rb_node *rbp;
+
+ ASSERT(flen > 0);
+
+ spin_lock(&args->pag->pagb_lock);
+ rbp = args->pag->pagb_tree.rb_node;
+ while (rbp && flen >= args->minlen) {
+ struct xfs_busy_extent *busyp =
+ rb_entry(rbp, struct xfs_busy_extent, rb_node);
+ xfs_agblock_t fend = fbno + flen;
+ xfs_agblock_t bbno = busyp->bno;
+ xfs_agblock_t bend = bbno + busyp->length;
+
+ if (fbno + flen <= bbno) {
+ rbp = rbp->rb_left;
+ continue;
+ } else if (fbno >= bend) {
+ rbp = rbp->rb_right;
+ continue;
+ }
+
+ if (bbno <= fbno) {
+ /* start overlap */
+ ASSERT(bend > fbno);
+ ASSERT(bend <= fend);
+
+ /*
+ * Case 1:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +---------+
+ * bno end
+ *
+ * Case 2:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-------------+
+ * bno end
+ *
+ * Case 3:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------+
+ * bno end
+ *
+ * No unbusy region in extent, return failure.
+ */
+ if (fend <= bend)
+ goto fail;
+
+ /*
+ * Case 4:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +----------------------+
+ * bno end
+ *
+ * Case 5:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * bno end
+ *
+ * Needs to be trimmed to:
+ * +-------+
+ * bno end
+ */
+ fbno = bend;
+ } else if (bend >= fend) {
+ /* end overlap */
+
+ /*
+ * Case 6:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +------------------+
+ * bno end
+ *
+ * Case 7:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * bno end
+ *
+ * Needs to be trimmed to:
+ * +-------+
+ * bno end
+ */
+ fend = bbno;
+ } else {
+ /* middle overlap */
+
+ /*
+ * Case 9:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------------------------+
+ * bno end
+ *
+ * Can be trimmed to:
+ * +-------+ OR +-------+
+ * bno end bno end
+ *
+ * We prefer the lower bno extent because the next
+ * allocation for this inode will use "end" as the
+ * target for first block. If the busy segment has
+ * cleared, this will get a contiguous allocation next
+ * time around; if the busy segment has not cleared,
+ * it will get an allocation at bend, which is a forward
+ * allocation.
+ *
+ * If we choose segment at bend, and this remains the
+ * best extent for the next allocation (e.g. NEAR_BNO
+ * allocation) we'll next allocate at bno, which will
+ * give us backwards allocation. We already know that
+ * backwards allocation direction causes significant
+ * fragmentation of directories and degradation of
+ * directory performance.
+ *
+ * Always choose the option that produces forward
+ * allocation patterns so that sequential reads and
+ * writes only ever seek in one direction. Only choose
+ * the higher bno extent if the remaining unused extent
+ * length is much larger than the current allocation
+ * request, promising us a contiguous allocation in
+ * the following free space.
+ */
+
+ if (bbno - fbno >= args->maxlen) {
+ /* left candidate fits perfect */
+ fend = bbno;
+ } else if (fend - bend >= args->maxlen * 4) {
+ /* right candidate has enough free space */
+ fbno = bend;
+ } else if (bbno - fbno >= args->minlen) {
+ /* left candidate fits minimum requirement */
+ fend = bbno;
+ } else {
+ goto fail;
+ }
+ }
+
+ flen = fend - fbno;
+ }
+ spin_unlock(&args->pag->pagb_lock);
+
+ *rbno = fbno;
+ *rlen = flen;
+ return;
+fail:
+ /*
+ * Return a zero extent length as a failure indication. All callers
+ * re-check if the trimmed extent satisfies the minlen requirement.
+ */
+ spin_unlock(&args->pag->pagb_lock);
+ *rbno = fbno;
+ *rlen = 0;
+}
+
void
xfs_alloc_busy_clear(
struct xfs_mount *mp,
Index: xfs/fs/xfs/linux-2.6/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h 2011-03-02 12:17:26.235027219 -0500
+++ xfs/fs/xfs/linux-2.6/xfs_trace.h 2011-03-02 12:18:02.011028461 -0500
@@ -1433,11 +1433,14 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first)
DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
+DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
+DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
DEFINE_ALLOC_EVENT(xfs_alloc_size_neither);
DEFINE_ALLOC_EVENT(xfs_alloc_size_noentry);
DEFINE_ALLOC_EVENT(xfs_alloc_size_nominleft);
DEFINE_ALLOC_EVENT(xfs_alloc_size_done);
DEFINE_ALLOC_EVENT(xfs_alloc_size_error);
+DEFINE_ALLOC_EVENT(xfs_alloc_size_busy);
DEFINE_ALLOC_EVENT(xfs_alloc_small_freelist);
DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
* Re: [PATCH 0/3] avoid busy extents during user data allocations
2011-03-04 12:59 [PATCH 0/3] avoid busy extents during user data allocations Christoph Hellwig
2011-03-04 12:59 ` [PATCH 1/3] xfs: clean up the xfs_alloc_compute_aligned calling convention Christoph Hellwig
2011-03-04 12:59 ` [PATCH 3/3] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
@ 2011-03-04 13:19 ` Christoph Hellwig
2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-04 13:19 UTC (permalink / raw)
To: xfs
On Fri, Mar 04, 2011 at 07:59:53AM -0500, Christoph Hellwig wrote:
> This patchset adds support to trim down extents
Sorry, sent this out before finishing up the introduction.
This patchset adds support for trimming down allocations of user data
to avoid busy extents. I'm actually not quite sure it's overly useful
in this form, as we're much better off allowing free reallocation
between data extents, and only avoiding busy extents coming from freed
metadata. Nevertheless I'd like to get a review of the new search
algorithm in patch 3, especially the comments explaining it all,
which are based on a mail from Dave.
Patches 1 and 2 on the other hand are simple cleanups which I think
should go into the tree ASAP.
The other patches from the previous submission are back to the drawing
board. Implementing Dave's suggestion of skipping busy extents for
metadata to user data reallocation promises to give a lot of speedups,
but making it work with the requirement to track freed extents for
discard purposes isn't quite trivial: we might have to remove extents
from the busy list during reallocations, which requires additional
infrastructure to lock the list of busy extents in the transaction / CIL
context that isn't there yet, plus additional exclusion of allocations
from ongoing discards.
* Re: [PATCH 3/3] xfs: do not immediately reuse busy extent ranges
2011-03-04 12:59 ` [PATCH 3/3] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
@ 2011-03-07 23:01 ` Alex Elder
2011-03-09 11:11 ` Christoph Hellwig
0 siblings, 1 reply; 6+ messages in thread
From: Alex Elder @ 2011-03-07 23:01 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Fri, 2011-03-04 at 07:59 -0500, Christoph Hellwig wrote:
> plain text document attachment (xfs-skip-busy-extents)
> Every time we reallocate a busy extent, we cause a synchronous log force
> to occur to ensure the freeing transaction is on disk before we continue
> and use the newly allocated extent. This is extremely sub-optimal as we
> have to mark every transaction with blocks that get reused as synchronous.
>
> Instead of searching the busy extent list after deciding on the extent to
> allocate, check each candidate extent during the allocation decisions as
> to whether they are in the busy list. If they are in the busy list, we
> trim the busy range out of the extent we have found and determine if that
> trimmed range is still OK for allocation. In many cases, this check can
> be incorporated into the allocation extent alignment code which already
> does trimming of the found extent before determining if it is a valid
> candidate for allocation.
This time I just scanned most of the change, since it appears
it's almost the same except for the (renamed) xfs_alloc_busy_trim()
function. It looks correct now, but I have a few things for you
to consider. I'll wait for your response in case you want to
change anything. After that I'll pull in the three patches
(probably tomorrow).
> [hch: merged two earlier patches from Dave and fixed various bugs]
>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: xfs/fs/xfs/xfs_alloc.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-02 12:18:01.599040095 -0500
> +++ xfs/fs/xfs/xfs_alloc.c 2011-03-02 12:19:10.599027233 -0500
. . .
> @@ -2657,6 +2727,177 @@ xfs_alloc_busy_search(
> return match;
> }
>
> +/*
> + * For a given extent [fbno, flen], search the busy extent list
> + * to find a subset of the extent that is not busy.
> + */
I agree that the notation (from Dave) that you use here
is very helpful in visualizing what's going on. But
the underlying code is pretty simple, and it gets somewhat
lost in the comments I think. I therefore would kind of
prefer to have the explanation moved up above the function.
It clearly labels the cases being treated, and references
to those can be put in the code, below.
(This is a style thing, so if you feel strongly that it's
better as you have it, so be it.)
> +STATIC void
> +xfs_alloc_busy_trim(
> + struct xfs_alloc_arg *args,
> + xfs_agblock_t fbno,
> + xfs_extlen_t flen,
> + xfs_agblock_t *rbno,
> + xfs_extlen_t *rlen)
> +{
> + struct rb_node *rbp;
> +
> + ASSERT(flen > 0);
> +
> + spin_lock(&args->pag->pagb_lock);
> + rbp = args->pag->pagb_tree.rb_node;
> + while (rbp && flen >= args->minlen) {
> + struct xfs_busy_extent *busyp =
> + rb_entry(rbp, struct xfs_busy_extent, rb_node);
> + xfs_agblock_t fend = fbno + flen;
All the nice diagrams refer to the variables "fbno"
and "fend" using "bno" and "end". I think you should
either drop the "f" in the variables or add it to
the comments.
> + xfs_agblock_t bbno = busyp->bno;
> + xfs_agblock_t bend = bbno + busyp->length;
> +
> + if (fbno + flen <= bbno) {
if (fend <= bbno) {
> + rbp = rbp->rb_left;
> + continue;
> + } else if (fbno >= bend) {
> + rbp = rbp->rb_right;
> + continue;
> + }
> +
> + if (bbno <= fbno) {
> + /* start overlap */
> + ASSERT(bend > fbno);
> + ASSERT(bend <= fend);
This assertion is wrong (Case 1 is an example).
The only things you know are:
bbno <= fbno
bbno < bend therefore (fbno < bend)
and
fbno < fend
but you don't know the relationship between bend and fend.
> +
> + /*
> + * Case 1:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +---------+
> + * bno end
> + *
As long as you're enumerating all the cases, there's
one that you don't mention (but which is covered in
this block):
* bbno bend
* +BBBBBBBBBBBBBB+
* +---------+
* bno end
*
I think this should be added to the description
for completeness.
> + * Case 2:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +-------------+
> + * bno end
> + *
> +
. . .
> + } else {
> + /* middle overlap */
> +
> + /*
> + * Case 9:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +-----------------------------------+
> + * bno end
> + *
> + * Can be trimmed to:
> + * +-------+ OR +-------+
> + * bno end bno end
> + *
The following block of text explains things, but it might
be clearer if it's rearranged a bit:
- Backward allocation leads to significant fragmentation
of directories, which degrades directory performance.
- Therefore we always want to choose the option that
produces forward allocation patterns.
- Preferring the lower bno extent will make the next
request use "end" as the start of the next allocation.
If the segment is no longer busy at that point, we'll
then get a contiguous allocation, but even if it is
still busy, we'll get a forward allocation.
- We try to avoid choosing the segment at "bend",
because that can lead to the next allocation taking
the segment at "bno"--which would be a backward
allocation.
- We only use the segment at "bno" if it is much larger
than the current requested size, because in that case
there's a good chance subsequent allocations will
be contiguous.
(Something like that anyway, I just wanted to provide
an example rather than just saying "it's wrong.")
> + * We prefer the lower bno extent because the next
> + * allocation for this inode will use "end" as the
> + * target for first block. If the busy segment has
> + * cleared, this will get a contiguous allocation next
> + * time around; if the busy segment has not cleared,
> + * it will get an allocation at bend, which is a forward
> + * allocation.
> + *
> + * If we choose segment at bend, and this remains the
> + * best extent for the next allocation (e.g. NEAR_BNO
> + * allocation) we'll next allocate at bno, which will
> + * give us backwards allocation. We already know that
> + * backwards allocation direction causes significant
> + * fragmentation of directories and degradation of
> + * directory performance.
> + *
> + * Always choose the option that produces forward
> + * allocation patterns so that sequential reads and
> + * writes only ever seek in one direction. Only choose
> + * the higher bno extent if the remaining unused extent
> + * length is much larger than the current allocation
> + * request, promising us a contiguous allocation in
> + * the following free space.
> + */
> +
> + if (bbno - fbno >= args->maxlen) {
> + /* left candidate fits perfect */
> + fend = bbno;
> + } else if (fend - bend >= args->maxlen * 4) {
This magic value 4 ought to be justified, or experimented
with, or possibly set as a tunable (for the time being).
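The middle-overlap decision being questioned can be sketched as a
standalone helper (the structure and the factor of 4 follow the quoted
hunk, but the names and the fallback behaviour are an illustration,
not the actual XFS implementation):

```c
/*
 * Trim a free extent [fbno, fend) around a busy extent [bbno, bend)
 * that sits strictly inside it, preferring the lower segment so that
 * allocation proceeds in the forward direction.  Illustrative sketch.
 */
static void
trim_middle_overlap(unsigned fbno, unsigned fend,
		    unsigned bbno, unsigned bend,
		    unsigned maxlen,
		    unsigned *rbno, unsigned *rlen)
{
	if (bbno - fbno >= maxlen) {
		/* The lower segment already satisfies the request. */
		*rbno = fbno;
		*rlen = bbno - fbno;
	} else if (fend - bend >= maxlen * 4) {
		/*
		 * Take the higher segment only when it is much larger
		 * than the request; the "4" is the unexplained magic
		 * value the review asks about.
		 */
		*rbno = bend;
		*rlen = fend - bend;
	} else {
		/* Otherwise keep the lower segment to preserve a
		 * forward allocation pattern. */
		*rbno = fbno;
		*rlen = bbno - fbno;
	}
}
```

The heuristic is visible in the branch conditions: only a high-side
remainder at least four times maxlen overrides the forward-allocation
preference.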
* Re: [PATCH 3/3] xfs: do not immediately reuse busy extent ranges
2011-03-07 23:01 ` Alex Elder
@ 2011-03-09 11:11 ` Christoph Hellwig
0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2011-03-09 11:11 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Mon, Mar 07, 2011 at 05:01:36PM -0600, Alex Elder wrote:
> This time I just scanned most of the change, since it appears
> it's almost the same except for the (renamed) xfs_alloc_busy_trim()
> function.
Yes.
> It looks correct now, but I have a few things for you
> to consider. I'll wait for your response in case you want to
> change anything. After that I'll pull in the three patches
> (probably tomorrow).
For now please just pull the first two. There's a fair chance number
three will change based on how the discard work goes.
> I agree that the notation (from Dave) that you use here
> is very helpful in visualizing what's going on. But
> the underlying code is pretty simple, and it gets somewhat
> lost in the comments I think. I therefore would kind of
> prefer to have the explanation moved up above the function.
> It clearly labels the cases being treated, and references
> to those can be put in the code, below.
>
> (This is a style thing, so if you feel strongly that it's
> better as you have it, so be it.)
I tried that before, but matching the cases to numbers in comments
wasn't very readable so I switched to this notation.
> All the nice diagrams refer to the variable "fbno"
> and "fend" using "bno" and "end. I think you should
> either drop the "f" in the variables or add it to
> the comments.
Indeed. I did a last minute cleanup to consolidate the duplicate
variables and didn't update the comments.
> (Something like that anyway, I just wanted to provide
> an example rather than just saying "it's wrong.")
Ok.
end of thread, other threads:[~2011-03-09 11:08 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-03-04 12:59 [PATCH 0/3] avoid busy extents during user data allocations Christoph Hellwig
2011-03-04 12:59 ` [PATCH 1/3] xfs: clean up the xfs_alloc_compute_aligned calling convention Christoph Hellwig
2011-03-04 12:59 ` [PATCH 3/3] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-03-07 23:01 ` Alex Elder
2011-03-09 11:11 ` Christoph Hellwig
2011-03-04 13:19 ` [PATCH 0/3] avoid busy extents during user data allocations Christoph Hellwig