From: Dave Chinner <david@fromorbit.com>
To: linux-xfs@vger.kernel.org
Subject: [PATCH 4/5] xfs: speed up directory bestfree block scanning
Date: Thu, 29 Aug 2019 16:30:41 +1000 [thread overview]
Message-ID: <20190829063042.22902-5-david@fromorbit.com> (raw)
In-Reply-To: <20190829063042.22902-1-david@fromorbit.com>
From: Dave Chinner <dchinner@redhat.com>
When running a "create millions inodes in a directory" test
recently, I noticed we were spending a huge amount of time
converting freespace block headers from disk format to in-memory
format:
31.47% [kernel] [k] xfs_dir2_node_addname
17.86% [kernel] [k] xfs_dir3_free_hdr_from_disk
3.55% [kernel] [k] xfs_dir3_free_bests_p
We shouldn't be hitting the best free block scanning code so hard
when doing sequential directory creates, and it turns out there's
a highly suboptimal loop searching the the best free array in
the freespace block - it decodes the block header before checking
each entry inside a loop, instead of decoding the header once before
running the entry search loop.
This makes a massive difference to create rates. Profile now looks
like this:
13.15% [kernel] [k] xfs_dir2_node_addname
3.52% [kernel] [k] xfs_dir3_leaf_check_int
3.11% [kernel] [k] xfs_log_commit_cil
And the wall time/average file create rate differences are
just as stark:
create time(sec) / rate (files/s)
File count vanilla patched
10k 0.41 / 24.3k 0.42 / 23.8k
20k 0.74 / 27.0k 0.76 / 26.3k
100k 3.81 / 26.4k 3.47 / 28.8k
200k 8.58 / 23.3k 7.19 / 27.8k
1M 85.69 / 11.7k 48.53 / 20.6k
2M 280.31 / 7.1k 130.14 / 15.3k
The larger the directory, the bigger the performance improvement.
Signed-Off-By: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/libxfs/xfs_dir2_node.c | 101 +++++++++++++---------------------
1 file changed, 37 insertions(+), 64 deletions(-)
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 1d3d1c9b5961..c6a1d1cc638f 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1751,8 +1751,8 @@ xfs_dir2_node_find_freeblk(
xfs_dir2_db_t dbno = -1;
xfs_dir2_db_t fbno = -1;
xfs_fileoff_t fo;
- __be16 *bests;
- int findex;
+ __be16 *bests = NULL;
+ int findex = 0;
int error;
/*
@@ -1764,16 +1764,15 @@ xfs_dir2_node_find_freeblk(
fbp = fblk->bp;
free = fbp->b_addr;
findex = fblk->index;
+ bests = dp->d_ops->free_bests_p(free);
+ dp->d_ops->free_hdr_from_disk(&freehdr, free);
if (findex >= 0) {
/* caller already found the freespace for us. */
- bests = dp->d_ops->free_bests_p(free);
- dp->d_ops->free_hdr_from_disk(&freehdr, free);
-
ASSERT(findex < freehdr.nvalid);
ASSERT(be16_to_cpu(bests[findex]) != NULLDATAOFF);
ASSERT(be16_to_cpu(bests[findex]) >= length);
dbno = freehdr.firstdb + findex;
- goto out;
+ goto found_block;
}
/*
@@ -1783,8 +1782,6 @@ xfs_dir2_node_find_freeblk(
ifbno = fblk->blkno;
fbno = ifbno;
}
- ASSERT(dbno == -1);
- findex = 0;
/*
* If we don't have a data block yet, we're going to scan the freespace
@@ -1802,69 +1799,45 @@ xfs_dir2_node_find_freeblk(
/*
* While we haven't identified a data block, search the freeblock
- * data for a good data block. If we find a null freeblock entry,
- * indicating a hole in the data blocks, remember that.
+ * data for a data block with enough free space in it.
*/
- while (dbno == -1) {
- /*
- * If we don't have a freeblock in hand, get the next one.
- */
- if (fbp == NULL) {
- /*
- * If it's ifbno we already looked at it.
- */
- if (++fbno == ifbno)
- fbno++;
- /*
- * If it's off the end we're done.
- */
- if (fbno >= lastfbno)
- break;
- /*
- * Read the block. There can be holes in the
- * freespace blocks, so this might not succeed.
- * This should be really rare, so there's no reason
- * to avoid it.
- */
- error = xfs_dir2_free_try_read(tp, dp,
- xfs_dir2_db_to_da(args->geo, fbno),
- &fbp);
- if (error)
- return error;
- if (!fbp)
- continue;
- free = fbp->b_addr;
- findex = 0;
- }
+ for ( ; fbno < lastfbno; fbno++) {
+ /* If it's ifbno we already looked at it. */
+ if (fbno == ifbno)
+ continue;
+
/*
- * Look at the current free entry. Is it good enough?
- *
- * The bests initialisation should be where the bufer is read in
- * the above branch. But gcc is too stupid to realise that bests
- * and the freehdr are actually initialised if they are placed
- * there, so we have to do it here to avoid warnings. Blech.
+ * Read the block. There can be holes in the freespace blocks,
+ * so this might not succeed. This should be really rare, so
+ * there's no reason to avoid it.
*/
+ error = xfs_dir2_free_try_read(tp, dp,
+ xfs_dir2_db_to_da(args->geo, fbno),
+ &fbp);
+ if (error)
+ return error;
+ if (!fbp)
+ continue;
+
+ findex = 0;
+ free = fbp->b_addr;
bests = dp->d_ops->free_bests_p(free);
dp->d_ops->free_hdr_from_disk(&freehdr, free);
- if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
- be16_to_cpu(bests[findex]) >= length)
- dbno = freehdr.firstdb + findex;
- else {
- /*
- * Are we done with the freeblock?
- */
- if (++findex == freehdr.nvalid) {
- /*
- * Drop the block.
- */
- xfs_trans_brelse(tp, fbp);
- fbp = NULL;
- if (fblk && fblk->bp)
- fblk->bp = NULL;
+
+ /* Scan the free entry array for a large enough free space. */
+ do {
+ if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
+ be16_to_cpu(bests[findex]) >= length) {
+ dbno = freehdr.firstdb + findex;
+ goto found_block;
}
- }
+ } while (++findex < freehdr.nvalid);
+
+ /* Didn't find free space, go on to next free block */
+ xfs_trans_brelse(tp, fbp);
}
-out:
+
+found_block:
*dbnop = dbno;
*fbpp = fbp;
*findexp = findex;
--
2.23.0.rc1
next prev parent reply other threads:[~2019-08-29 6:30 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-29 6:30 [PATCH V2 0/5] xfs: speed up large directory modifications Dave Chinner
2019-08-29 6:30 ` [PATCH 1/5] xfs: move xfs_dir2_addname() Dave Chinner
2019-08-29 7:59 ` Christoph Hellwig
2019-08-29 6:30 ` [PATCH 2/5] xfs: factor data block addition from xfs_dir2_node_addname_int() Dave Chinner
2019-08-29 8:05 ` Christoph Hellwig
2019-08-29 8:34 ` Dave Chinner
2019-08-29 6:30 ` [PATCH 3/5] xfs: factor free block index lookup " Dave Chinner
2019-08-29 8:10 ` Christoph Hellwig
2019-08-29 8:35 ` Dave Chinner
2019-08-29 6:30 ` Dave Chinner [this message]
2019-08-29 8:18 ` [PATCH 4/5] xfs: speed up directory bestfree block scanning Christoph Hellwig
2019-08-29 8:45 ` Dave Chinner
2019-08-29 8:47 ` Christoph Hellwig
2019-08-29 8:55 ` Dave Chinner
2019-08-29 8:25 ` Christoph Hellwig
2019-08-29 9:31 ` Dave Chinner
2019-08-29 9:33 ` Christoph Hellwig
2019-08-29 6:30 ` [PATCH 5/5] xfs: reverse search directory freespace indexes Dave Chinner
2019-08-29 8:23 ` Christoph Hellwig
2019-08-29 9:14 ` Dave Chinner
-- strict thread matches above, loose matches on Subject: below --
2019-08-29 10:47 [PATCH v3 0/5] xfs: speed up large directory modifications Dave Chinner
2019-08-29 10:47 ` [PATCH 4/5] xfs: speed up directory bestfree block scanning Dave Chinner
2019-08-29 21:18 ` Darrick J. Wong
2019-08-30 5:24 ` Christoph Hellwig
2018-10-24 22:57 [PATCH 0/5] xfs: speed up large directory modifications Dave Chinner
2018-10-24 22:57 ` [PATCH 4/5] xfs: speed up directory bestfree block scanning Dave Chinner
2018-10-26 10:24 ` Christoph Hellwig
2018-10-26 10:58 ` Dave Chinner
2018-10-26 11:56 ` Christoph Hellwig
2018-10-26 12:12 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190829063042.22902-5-david@fromorbit.com \
--to=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox