From: Eric Sandeen <sandeen@sandeen.net>
To: Dave Chinner <david@fromorbit.com>, xfs@oss.sgi.com
Subject: Re: [PATCH 2/2] repair: don't grind CPUs with large extent lists
Date: Thu, 08 May 2014 22:01:37 -0500
Message-ID: <536C4511.3050201@sandeen.net>
In-Reply-To: <1399598222-4349-3-git-send-email-david@fromorbit.com>
On 5/8/14, 8:17 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> When repairing a large filesystem with fragmented files, xfs_repair
> can grind to a halt burning multiple CPUs in blkmap_set_ext().
> blkmap_set_ext() inserts extents into the blockmap for the inode
> fork and it keeps them in order, even if the inserts are not done in
> order.
>
> The ordered insert is highly inefficient - it starts at the first
> extent and simply walks the array to find the insertion point, i.e.
> it is an O(n) operation. When we have a fragmented file with a
> large number of extents, the entire mapping operation becomes very
> expensive.
>
> The thing is, we are doing the insertion from an *ordered btree
> scan* which is inserting the extents in ascending offset order.
> IOWs, we are always inserting the extent at the end of the array
> after searching the entire array. i.e. the mapping operation cost is
> O(N^2).
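To put rough numbers on that: with N extents arriving in ascending offset
order, insert number i scans all i existing entries before landing at the
end, so the total work is about N(N-1)/2 comparisons. For a hypothetical
million-extent file that is on the order of 5 * 10^11 operations, which
fits the "burning CPUs, making no progress" symptom.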
>
> Fix this simply by reversing the order of the insert slot search:
> start at the end of the blockmap array, where almost all
> insertions land. This brings the overhead of each insertion down
> to O(1) complexity, which in turn reduces the overall map building
> to an O(N) operation, and so performance degrades linearly with
> increasing extent counts rather than quadratically.
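By the same rough arithmetic, the reversed search costs one comparison per
in-order insert, so the hypothetical million-extent file above drops from
~5 * 10^11 comparisons to ~10^6.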
>
> The result is that the test filesystem (27TB, 30M inodes, at ENOSPC)
> takes 5m10s to *fully repair* on my test system, rather than getting
> 15 (of 60) AGs into phase three and sitting there burning 3-4 CPUs
> making no progress for over half an hour.
Did the blkmap_grow() changes sneak in here accidentally?
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  repair/bmap.c | 36 ++++++++++++++++++++++++++++--------
>  1 file changed, 28 insertions(+), 8 deletions(-)
>
> diff --git a/repair/bmap.c b/repair/bmap.c
> index 14161cb..1e913c5 100644
> --- a/repair/bmap.c
> +++ b/repair/bmap.c
> @@ -260,7 +260,15 @@ blkmap_grow(
>  {
>          pthread_key_t   key = dblkmap_key;
>          blkmap_t        *new_blkmap;
> -        int             new_naexts = blkmap->naexts + 4;
> +        int             new_naexts;
> +
> +        /* reduce the number of reallocations for large files */
> +        if (blkmap->naexts < 1000)
> +                new_naexts = blkmap->naexts + 4;
> +        else if (blkmap->naexts < 10000)
> +                new_naexts = blkmap->naexts + 100;
> +        else
> +                new_naexts = blkmap->naexts + 1000;
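A hedged aside on the growth policy (assuming each grow forces realloc to
copy, which it may not): with a fixed increment c, reaching N extents takes
about N/c reallocations copying up to N entries each, so worst-case copy
cost is still O(N^2/c) - the larger steps only shrink the constant. A
geometric policy (e.g. doubling naexts) would bound growth at amortized
O(1) per insert.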
>  
>          if (pthread_getspecific(key) != blkmap) {
>                  key = ablkmap_key;
> @@ -308,7 +316,7 @@ blkmap_set_ext(
>          xfs_dfilblks_t  c)
>  {
>          blkmap_t        *blkmap = *blkmapp;
> -        xfs_extnum_t    i;
> +        xfs_extnum_t    i = 0;
I wonder if rather than initializing i here...
>  
>          if (blkmap->nexts == blkmap->naexts) {
>                  blkmap = blkmap_grow(blkmap);
> @@ -318,15 +326,27 @@ blkmap_set_ext(
>          }
>  
>          ASSERT(blkmap->nexts < blkmap->naexts);
> -        for (i = 0; i < blkmap->nexts; i++) {
> -                if (blkmap->exts[i].startoff > o) {
> -                        memmove(blkmap->exts + i + 1,
> -                                blkmap->exts + i,
> -                                sizeof(bmap_ext_t) * (blkmap->nexts - i));
> +
> +        if (blkmap->nexts == 0)
... setting i = 0 here would be a bit more explicit & clear?
> +                goto insert;
> +
> +        /*
> +         * Do a reverse order search as the most common insertion during
> +         * a bmapbt scan is at the end of the map. Use "plus 1" indexing
> +         * for the loop counter so when we break out we are at the correct index
> +         * for insertion.
> +         */
perhaps say why it's most common at the end?
> +        for (i = blkmap->nexts; i > 0; i--) {
> +                if (blkmap->exts[i - 1].startoff < o)
>                          break;
> -                }
>          }
>  
> +        /* make space for the new extent */
> +        memmove(blkmap->exts + i + 1,
> +                blkmap->exts + i,
> +                sizeof(bmap_ext_t) * (blkmap->nexts - i));
> +
> +insert:
>          blkmap->exts[i].startoff = o;
>          blkmap->exts[i].startblock = b;
>          blkmap->exts[i].blockcount = c;
>
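For anyone who wants to poke at the complexity argument outside of
xfs_repair, here is a minimal standalone sketch of the same reverse-search
insert - hypothetical names and types, not the actual repair/bmap.c code:

/*
 * Minimal sketch of a reverse-search sorted insert - hypothetical
 * names, not the actual repair/bmap.c code.
 */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ext {
        unsigned long long      startoff;
};

static struct ext       *exts;
static int              nexts, naexts;

static void
set_ext(unsigned long long o)
{
        int     i;

        if (nexts == naexts) {
                /* geometric growth keeps reallocation amortized O(1) */
                naexts = naexts ? naexts * 2 : 4;
                exts = realloc(exts, naexts * sizeof(*exts));
                assert(exts);
        }

        /*
         * Search backwards: for in-order insertion the first comparison
         * succeeds, so the common case is O(1).
         */
        for (i = nexts; i > 0; i--) {
                if (exts[i - 1].startoff < o)
                        break;
        }

        /* make space for the new extent */
        memmove(exts + i + 1, exts + i, sizeof(*exts) * (nexts - i));
        exts[i].startoff = o;
        nexts++;
}

int
main(void)
{
        int     i;

        /* in-order inserts: one comparison each, O(N) overall */
        for (i = 0; i < 1000000; i++)
                set_ext(i);

        for (i = 1; i < nexts; i++)
                assert(exts[i - 1].startoff < exts[i].startoff);
        printf("%d extents, all in order\n", nexts);
        free(exts);
        return 0;
}

On typical hardware the in-order test should finish in well under a
second; a forward-scan variant of the same loop would be doing roughly
5 * 10^11 comparisons instead.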