From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH 4/5] xfs: speed up directory bestfree block scanning
Date: Thu, 29 Aug 2019 14:18:14 -0700 [thread overview]
Message-ID: <20190829211814.GL5354@magnolia> (raw)
In-Reply-To: <20190829104710.28239-5-david@fromorbit.com>
On Thu, Aug 29, 2019 at 08:47:09PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When running a "create millions inodes in a directory" test
> recently, I noticed we were spending a huge amount of time
> converting freespace block headers from disk format to in-memory
> format:
>
> 31.47% [kernel] [k] xfs_dir2_node_addname
> 17.86% [kernel] [k] xfs_dir3_free_hdr_from_disk
> 3.55% [kernel] [k] xfs_dir3_free_bests_p
>
> We shouldn't be hitting the best free block scanning code so hard
> when doing sequential directory creates, and it turns out there's
> a highly suboptimal loop searching the the best free array in
> the freespace block - it decodes the block header before checking
> each entry inside a loop, instead of decoding the header once before
> running the entry search loop.
>
> This makes a massive difference to create rates. Profile now looks
> like this:
>
> 13.15% [kernel] [k] xfs_dir2_node_addname
> 3.52% [kernel] [k] xfs_dir3_leaf_check_int
> 3.11% [kernel] [k] xfs_log_commit_cil
>
> And the wall time/average file create rate differences are
> just as stark:
>
> create time(sec) / rate (files/s)
> File count vanilla patched
> 10k 0.41 / 24.3k 0.42 / 23.8k
> 20k 0.74 / 27.0k 0.76 / 26.3k
> 100k 3.81 / 26.4k 3.47 / 28.8k
> 200k 8.58 / 23.3k 7.19 / 27.8k
> 1M 85.69 / 11.7k 48.53 / 20.6k
> 2M 280.31 / 7.1k 130.14 / 15.3k
>
> The larger the directory, the bigger the performance improvement.
>
> Signed-Off-By: Dave Chinner <dchinner@redhat.com>
It took me a long time to figure out what the old code even did.
It's much easier to figure out what's going on in the new version.
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
--D
> ---
> fs/xfs/libxfs/xfs_dir2_node.c | 97 ++++++++++++-----------------------
> 1 file changed, 34 insertions(+), 63 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> index 93254f45a5f9..a81f56d9e538 100644
> --- a/fs/xfs/libxfs/xfs_dir2_node.c
> +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> @@ -1750,8 +1750,8 @@ xfs_dir2_node_find_freeblk(
> xfs_dir2_db_t dbno = -1;
> xfs_dir2_db_t fbno = -1;
> xfs_fileoff_t fo;
> - __be16 *bests;
> - int findex;
> + __be16 *bests = NULL;
> + int findex = 0;
> int error;
>
> /*
> @@ -1781,14 +1781,14 @@ xfs_dir2_node_find_freeblk(
> */
> ifbno = fblk->blkno;
> fbno = ifbno;
> + xfs_trans_brelse(tp, fbp);
> + fbp = NULL;
> + fblk->bp = NULL;
> }
> - ASSERT(dbno == -1);
> - findex = 0;
>
> /*
> * If we don't have a data block yet, we're going to scan the freespace
> - * blocks looking for one. Figure out what the highest freespace block
> - * number is.
> + * data for a data block with enough free space in it.
> */
> error = xfs_bmap_last_offset(dp, &fo, XFS_DATA_FORK);
> if (error)
> @@ -1799,70 +1799,41 @@ xfs_dir2_node_find_freeblk(
> if (fbno == -1)
> fbno = xfs_dir2_byte_to_db(args->geo, XFS_DIR2_FREE_OFFSET);
>
> - /*
> - * While we haven't identified a data block, search the freeblock
> - * data for a good data block. If we find a null freeblock entry,
> - * indicating a hole in the data blocks, remember that.
> - */
> - while (dbno == -1) {
> - /*
> - * If we don't have a freeblock in hand, get the next one.
> - */
> - if (fbp == NULL) {
> - /*
> - * If it's ifbno we already looked at it.
> - */
> - if (++fbno == ifbno)
> - fbno++;
> - /*
> - * If it's off the end we're done.
> - */
> - if (fbno >= lastfbno)
> - break;
> - /*
> - * Read the block. There can be holes in the
> - * freespace blocks, so this might not succeed.
> - * This should be really rare, so there's no reason
> - * to avoid it.
> - */
> - error = xfs_dir2_free_try_read(tp, dp,
> - xfs_dir2_db_to_da(args->geo, fbno),
> - &fbp);
> - if (error)
> - return error;
> - if (!fbp)
> - continue;
> - free = fbp->b_addr;
> - findex = 0;
> - }
> + for ( ; fbno < lastfbno; fbno++) {
> + /* If it's ifbno we already looked at it. */
> + if (fbno == ifbno)
> + continue;
> +
> /*
> - * Look at the current free entry. Is it good enough?
> - *
> - * The bests initialisation should be where the bufer is read in
> - * the above branch. But gcc is too stupid to realise that bests
> - * and the freehdr are actually initialised if they are placed
> - * there, so we have to do it here to avoid warnings. Blech.
> + * Read the block. There can be holes in the freespace blocks,
> + * so this might not succeed. This should be really rare, so
> + * there's no reason to avoid it.
> */
> + error = xfs_dir2_free_try_read(tp, dp,
> + xfs_dir2_db_to_da(args->geo, fbno),
> + &fbp);
> + if (error)
> + return error;
> + if (!fbp)
> + continue;
> +
> + free = fbp->b_addr;
> bests = dp->d_ops->free_bests_p(free);
> dp->d_ops->free_hdr_from_disk(&freehdr, free);
> - if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> - be16_to_cpu(bests[findex]) >= length)
> - dbno = freehdr.firstdb + findex;
> - else {
> - /*
> - * Are we done with the freeblock?
> - */
> - if (++findex == freehdr.nvalid) {
> - /*
> - * Drop the block.
> - */
> - xfs_trans_brelse(tp, fbp);
> - fbp = NULL;
> - if (fblk && fblk->bp)
> - fblk->bp = NULL;
> +
> + /* Scan the free entry array for a large enough free space. */
> + for (findex = 0; findex < freehdr.nvalid; findex++) {
> + if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> + be16_to_cpu(bests[findex]) >= length) {
> + dbno = freehdr.firstdb + findex;
> + goto found_block;
> }
> }
> +
> + /* Didn't find free space, go on to next free block */
> + xfs_trans_brelse(tp, fbp);
> }
> +
> found_block:
> *dbnop = dbno;
> *fbpp = fbp;
> --
> 2.23.0.rc1
>
next prev parent reply other threads:[~2019-08-29 21:20 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-08-29 10:47 [PATCH v3 0/5] xfs: speed up large directory modifications Dave Chinner
2019-08-29 10:47 ` [PATCH 1/5] xfs: move xfs_dir2_addname() Dave Chinner
2019-08-29 20:52 ` Darrick J. Wong
2019-08-30 5:22 ` Christoph Hellwig
2019-08-29 10:47 ` [PATCH 2/5] xfs: factor data block addition from xfs_dir2_node_addname_int() Dave Chinner
2019-08-29 21:02 ` Darrick J. Wong
2019-08-31 1:00 ` Dave Chinner
2019-08-29 10:47 ` [PATCH 3/5] xfs: factor free block index lookup " Dave Chinner
2019-08-29 21:07 ` Darrick J. Wong
2019-08-29 10:47 ` [PATCH 4/5] xfs: speed up directory bestfree block scanning Dave Chinner
2019-08-29 21:18 ` Darrick J. Wong [this message]
2019-08-30 5:24 ` Christoph Hellwig
2019-08-29 10:47 ` [PATCH 5/5] xfs: reverse search directory freespace indexes Dave Chinner
2019-08-29 21:20 ` Darrick J. Wong
2019-08-30 5:24 ` Christoph Hellwig
-- strict thread matches above, loose matches on Subject: below --
2019-08-29 6:30 [PATCH V2 0/5] xfs: speed up large directory modifications Dave Chinner
2019-08-29 6:30 ` [PATCH 4/5] xfs: speed up directory bestfree block scanning Dave Chinner
2019-08-29 8:18 ` Christoph Hellwig
2019-08-29 8:45 ` Dave Chinner
2019-08-29 8:47 ` Christoph Hellwig
2019-08-29 8:55 ` Dave Chinner
2019-08-29 8:25 ` Christoph Hellwig
2019-08-29 9:31 ` Dave Chinner
2019-08-29 9:33 ` Christoph Hellwig
2018-10-24 22:57 [PATCH 0/5] xfs: speed up large directory modifications Dave Chinner
2018-10-24 22:57 ` [PATCH 4/5] xfs: speed up directory bestfree block scanning Dave Chinner
2018-10-26 10:24 ` Christoph Hellwig
2018-10-26 10:58 ` Dave Chinner
2018-10-26 11:56 ` Christoph Hellwig
2018-10-26 12:12 ` Christoph Hellwig
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20190829211814.GL5354@magnolia \
--to=darrick.wong@oracle.com \
--cc=david@fromorbit.com \
--cc=linux-xfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox