From: Eric Sandeen
Date: Thu, 08 May 2014 22:01:37 -0500
Subject: Re: [PATCH 2/2] repair: don't grind CPUs with large extent lists
To: Dave Chinner, xfs@oss.sgi.com
In-Reply-To: <1399598222-4349-3-git-send-email-david@fromorbit.com>

On 5/8/14, 8:17 PM, Dave Chinner wrote:
> From: Dave Chinner
>
> When repairing a large filesystem with fragmented files, xfs_repair
> can grind to a halt burning multiple CPUs in blkmap_set_ext().
> blkmap_set_ext() inserts extents into the blockmap for the inode
> fork and keeps them in order, even if the inserts are not done in
> order.
>
> The ordered insert is highly inefficient - it starts at the first
> extent and simply walks the array to find the insertion point, i.e.
> it is an O(n) operation. When we have a fragmented file with a
> large number of extents, the entire mapping operation becomes very
> expensive.
>
> The thing is, we are doing the insertion from an *ordered btree
> scan* which is inserting the extents in ascending offset order.
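To make the cost argument concrete: a front-to-back scan pays O(n) per insert when input arrives in ascending offset order, while scanning from the end exits on the first comparison. Here's a minimal, self-contained sketch of that reverse-order slot search, using simplified hypothetical types rather than the actual repair/bmap.c structures:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for bmap_ext_t. */
typedef struct {
	long	startoff;
} ext_t;

#define MAX_EXTS	16

/*
 * Insert into a sorted array, searching for the slot from the end.
 * For ascending-offset input (what an ordered bmapbt scan produces),
 * the loop exits on the first comparison, so each insert is O(1) and
 * building the whole map is O(N) instead of O(N^2).
 */
static void
insert_ext(ext_t *exts, int *nexts, long o)
{
	int	i;

	for (i = *nexts; i > 0; i--) {
		if (exts[i - 1].startoff < o)
			break;
	}
	/* make space for the new extent */
	memmove(exts + i + 1, exts + i, sizeof(ext_t) * (*nexts - i));
	exts[i].startoff = o;
	(*nexts)++;
}
```

Note that when the array is empty the loop body never runs and i stays at 0, so the insert-into-empty-map case falls out of the loop naturally in this sketch.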
> IOWs, we are always inserting the extent at the end of the array
> after searching the entire array, i.e. the mapping operation cost is
> O(N^2).
>
> Fix this simply by reversing the order of the insert slot search.
> Start at the end of the blockmap array, where almost all insertions
> occur, which brings the overhead of each insertion down to O(1)
> complexity. This, in turn, reduces the overall map building
> operation to O(N), and so performance degrades linearly with
> increasing extent counts rather than quadratically.
>
> The result is that the test filesystem (27TB, 30M inodes, at ENOSPC)
> takes 5m10s to *fully repair* on my test system, rather than getting
> 15 (of 60) AGs into phase three and sitting there burning 3-4 CPUs
> making no progress for over half an hour.

Did the blkmap_grow() changes sneak in here accidentally?

> Signed-off-by: Dave Chinner
> ---
>  repair/bmap.c | 36 ++++++++++++++++++++++++++++--------
>  1 file changed, 28 insertions(+), 8 deletions(-)
>
> diff --git a/repair/bmap.c b/repair/bmap.c
> index 14161cb..1e913c5 100644
> --- a/repair/bmap.c
> +++ b/repair/bmap.c
> @@ -260,7 +260,15 @@ blkmap_grow(
>  {
>  	pthread_key_t	key = dblkmap_key;
>  	blkmap_t	*new_blkmap;
> -	int		new_naexts = blkmap->naexts + 4;
> +	int		new_naexts;
> +
> +	/* reduce the number of reallocations for large files */
> +	if (blkmap->naexts < 1000)
> +		new_naexts = blkmap->naexts + 4;
> +	else if (blkmap->naexts < 10000)
> +		new_naexts = blkmap->naexts + 100;
> +	else
> +		new_naexts = blkmap->naexts + 1000;
>
>  	if (pthread_getspecific(key) != blkmap) {
>  		key = ablkmap_key;
> @@ -308,7 +316,7 @@ blkmap_set_ext(
>  	xfs_dfilblks_t	c)
>  {
>  	blkmap_t	*blkmap = *blkmapp;
> -	xfs_extnum_t	i;
> +	xfs_extnum_t	i = 0;

I wonder if rather than initializing i here...
>
>  	if (blkmap->nexts == blkmap->naexts) {
>  		blkmap = blkmap_grow(blkmap);
> @@ -318,15 +326,27 @@
>  	}
>
>  	ASSERT(blkmap->nexts < blkmap->naexts);
> -	for (i = 0; i < blkmap->nexts; i++) {
> -		if (blkmap->exts[i].startoff > o) {
> -			memmove(blkmap->exts + i + 1,
> -				blkmap->exts + i,
> -				sizeof(bmap_ext_t) * (blkmap->nexts - i));
> +
> +	if (blkmap->nexts == 0)

... setting i = 0 here would be a bit more explicit & clear?

> +		goto insert;
> +
> +	/*
> +	 * Do a reverse order search as the most common insertion during
> +	 * a bmapbt scan is at the end of the map. Use "plus 1" indexing
> +	 * for the loop counter so when we break out we are at the correct
> +	 * index for insertion.
> +	 */

Perhaps say why it's most common at the end?

> +	for (i = blkmap->nexts; i > 0; i--) {
> +		if (blkmap->exts[i - 1].startoff < o)
>  			break;
> -		}
>  	}
>
> +	/* make space for the new extent */
> +	memmove(blkmap->exts + i + 1,
> +		blkmap->exts + i,
> +		sizeof(bmap_ext_t) * (blkmap->nexts - i));
> +
> +insert:
>  	blkmap->exts[i].startoff = o;
>  	blkmap->exts[i].startblock = b;
>  	blkmap->exts[i].blockcount = c;

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
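For what it's worth, the blkmap_grow() change is easy to reason about in isolation. Here's a standalone sketch of the stepped growth policy from the hunk above (the helper name is hypothetical, not a function in repair/bmap.c):

```c
#include <assert.h>

/*
 * Stepped growth: larger extent maps grow in bigger increments,
 * cutting the number of reallocations for large files while keeping
 * the worst-case over-allocation bounded (at most 1000 entries).
 */
static int
next_naexts(int naexts)
{
	if (naexts < 1000)
		return naexts + 4;
	if (naexts < 10000)
		return naexts + 100;
	return naexts + 1000;
}
```

Since the steps are additive rather than geometric, the total copy cost across all grows is still quadratic in the final extent count, just with a much smaller constant than growing by 4 every time; doubling would make it linear at the cost of up to 2x slack space.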