From: Michael Haggerty <mhagger@alum.mit.edu>
To: Jeff King <peff@peff.net>
Cc: Martin Fick <mfick@codeaurora.org>,
git discussion list <git@vger.kernel.org>,
Junio C Hamano <gitster@pobox.com>
Subject: Re: remove_duplicates() in builtin/fetch-pack.c is O(N^2)
Date: Thu, 24 May 2012 14:05:16 +0200 [thread overview]
Message-ID: <4FBE23FC.5070405@alum.mit.edu> (raw)
In-Reply-To: <20120522173355.GC11600@sigill.intra.peff.net>
On 05/22/2012 07:33 PM, Jeff King wrote:
> On Tue, May 22, 2012 at 03:28:32PM +0200, Michael Haggerty wrote:
>
>>> When I try it with both 'next' and v1.7.10, I see that the latter is
>>> much faster. I did my tests with a warm cache, but the interesting
>>> number is the CPU time, which is quite different.
>>
>> I cannot reproduce anything as big as the performance regression that
>> you see. I did find a regression 9.5 s -> 10.1 s caused by
>>
>> 5fa044184 find_containing_dir(): use strbuf in implementation of this
>> function
>>
>> It is fixed by the patch that I just sent to the mailing list [1],
>> which sizes the strbuf in that function to strlen(refname) instead of
>> PATH_MAX. Since your experiments suggest that the performance
>> regression is related to the size of the repository contents, it
>> could be that the same test produces more memory pressure on your
>> system and therefore a larger effect. Please try the patch and tell
>> me if it fixes the problem for you.
>
> That patch drops about a second off of the slow case, but the main
> problem still remains. Just to be sure we are both doing the exact same
> thing, here is a complete reproduction recipe:
>
> GIT=/path/to/your/git/checkout
> RAILS=/path/to/unpacked/rails.git
>
> cd $GIT&&
> git checkout 432ad41e60cedb87ceec446ab034d46a53f5f9d8^&&
> make&&
>
> cd $RAILS&&
> time $GIT/bin-wrappers/git fetch . refs/*:refs/*&&
>
> cd $GIT&&
> git checkout 432ad41e60cedb87ceec446ab034d46a53f5f9d8&&
> make&&
>
> cd $RAILS&&
> time $GIT/bin-wrappers/git fetch . refs/*:refs/*
>
> produces:
>
> [before]
> real 0m9.128s
> user 0m9.369s
> sys 0m0.976s
>
> [after]
> real 0m15.926s
> user 0m16.181s
> sys 0m0.984s
>
> I don't think memory pressure is involved. The git process maxes out at
> ~300M, and this machine has 7.5G available.
>
> I wonder why we are getting different results. Could it be compilation
> options? As I mentioned, I compile with -O0, but I got similar results
> with -O3.
Thanks for the idiot-proof reproduction steps. I don't know what I was
doing differently last time, but now I can reproduce your slowdown.
The thing that triggers the problem is a reference directory
("refs/remotes/8514/pull/") with 3810 sub-namespaces
("refs/remotes/8514/pull/1091", "refs/remotes/8514/pull/1092", etc),
each subnamespace containing only one or two references. It wouldn't be
a problem having so many *references* in a namespace, because references
are just added to the end of the directory unsorted, and typically only
sorted once after all of them have been added. But every time that a
sub-namespace is accessed, the namespace has to be searched to see if
that sub-namespace already exists. The searching requires the namespace
to be sorted. So, for example, when adding the following sequence of
references:
1. refs/remotes/8514/pull/1091/head
2. refs/remotes/8514/pull/1091/merge
3. refs/remotes/8514/pull/1092/head
4. refs/remotes/8514/pull/1092/merge
5. refs/remotes/8514/pull/1093/head
6. refs/remotes/8514/pull/1093/merge
At step 1, sub-namespace "refs/remotes/8514/pull/1091/" is added to
"refs/remotes/8514/pull/". This makes the code think that
"refs/remotes/8514/pull/" is unsorted.
Step 2 is not problematic; the new references is added to
"refs/remotes/8514/pull/1091/", but adding a reference doesn't require
the ref_dir to be sorted.
At step 3, namespace "refs/remotes/8514/pull/" is first checked to see
if sub-namespace "refs/remotes/8514/pull/1092/" already exists. This
search requires namespace "refs/remotes/8514/pull/" to be sorted because
step 1 caused it to be considered unsorted.
Again at step 5, namespace "refs/remotes/8514/pull/" needs to be sorted,
and so on every time a subnamespace is added.
I will submit a patch shortly.
Michael
--
Michael Haggerty
mhagger@alum.mit.edu
http://softwareswirl.blogspot.com/
next prev parent reply other threads:[~2012-05-24 12:12 UTC|newest]
Thread overview: 46+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-05-21 8:13 remove_duplicates() in builtin/fetch-pack.c is O(N^2) Michael Haggerty
2012-05-21 9:09 ` Junio C Hamano
2012-05-21 9:42 ` demerphq
2012-05-21 17:45 ` Jeff King
2012-05-21 22:14 ` Jeff King
2012-05-21 22:17 ` [PATCH 1/5] fetch-pack: sort incoming heads Jeff King
2012-05-22 20:08 ` Junio C Hamano
2012-05-22 20:23 ` Jeff King
2012-05-24 6:04 ` Jeff King
2012-05-21 22:17 ` [PATCH 2/5] fetch-pack: avoid quadratic behavior in remove_duplicates Jeff King
2012-05-21 22:19 ` [PATCH 3/5] add sorting infrastructure for list refs Jeff King
2012-05-21 22:19 ` [PATCH 4/5] fetch-pack: sort the list of incoming refs Jeff King
2012-05-21 22:23 ` [PATCH 5/5] fetch-pack: avoid quadratic loop in filter_refs Jeff King
2012-05-22 20:16 ` Junio C Hamano
2012-05-21 23:52 ` remove_duplicates() in builtin/fetch-pack.c is O(N^2) Jeff King
2012-05-22 0:07 ` Jeff King
2012-05-22 3:59 ` Michael Haggerty
2012-05-22 4:11 ` Jeff King
2012-05-22 7:15 ` Michael Haggerty
2012-05-22 7:37 ` Jeff King
2012-05-22 13:28 ` Michael Haggerty
2012-05-22 17:33 ` Jeff King
2012-05-24 12:05 ` Michael Haggerty [this message]
2012-05-25 0:17 ` Martin Fick
2012-05-25 0:39 ` Jeff King
2012-05-25 0:54 ` Martin Fick
2012-05-25 1:04 ` Jeff King
2012-05-25 1:32 ` Martin Fick
2012-05-25 6:50 ` Jeff King
2012-05-22 12:18 ` Nguyen Thai Ngoc Duy
2012-05-22 13:35 ` Michael Haggerty
2012-05-22 17:01 ` Jeff King
2012-05-22 17:35 ` Junio C Hamano
2012-05-22 17:46 ` Jeff King
2012-05-24 4:54 ` Michael Haggerty
2012-05-23 1:20 ` Nguyen Thai Ngoc Duy
2012-05-22 16:16 ` Junio C Hamano
2012-05-21 18:15 ` Martin Fick
2012-05-21 19:41 ` Jeff King
2012-05-21 22:08 ` Junio C Hamano
2012-05-21 22:24 ` Jeff King
2012-05-22 5:51 ` Martin Fick
2012-05-22 18:21 ` Jeff King
2012-05-22 22:19 ` Martin Fick
2012-05-22 23:23 ` Junio C Hamano
2012-05-23 0:46 ` Martin Fick
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4FBE23FC.5070405@alum.mit.edu \
--to=mhagger@alum.mit.edu \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=mfick@codeaurora.org \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.