From: Thomas Rast <trast@student.ethz.ch>
To: Nguyen Thai Ngoc Duy <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: in_merge_bases() is too expensive for recent "pu" update
Date: Thu, 23 Aug 2012 16:20:41 +0200 [thread overview]
Message-ID: <878vd5k7uu.fsf@thomas.inf.ethz.ch> (raw)
In-Reply-To: <CACsJy8C-VxzwigyUDHnUkXN7vhB+93X96pH9MvgB0ps7v-_NmQ@mail.gmail.com> (Nguyen Thai Ngoc Duy's message of "Thu, 23 Aug 2012 19:32:33 +0700")
Nguyen Thai Ngoc Duy <pclouds@gmail.com> writes:
> I just did a "git fetch". It took 19 seconds (measured with
> gettimeofday) to finish in_merge_bases() in update_local_ref() in
> fetch.c, just to print this line
>
> + a4f2db3...b95a282 pu -> origin/pu (forced update)
>
> It's partly my fault because I'm on gcc -O0. But should it not take
> that much time for one line? As a grumpy user, I'd be perfectly ok
> with git saying "probably forced update, check it yourself" if the
> spent time exceeds half a second. On the same token, if diffstat takes
> too long for git-pull, then perhaps just stop and print "do it
> yourself".
I missed out on the a4f2db3 update, so I can't test exactly this, but my
reflog tells the I had before b95a282 was d3dd556. Computing all
merge-bases takes very long indeed:
$ time git merge-base --all d3dd556 b95a282 | wc -l
51
real 0m4.877s
user 0m4.829s
sys 0m0.026s
And I think the reason is that the algorithm is just doing too much.
From a quick glance, it appears to first compute the result list rslt[]
of merge bases, then for each pair attempts to eliminate one of them by
determining that rslt[i] or rslt[j] is also a merge base of rslt[j:].
It is thus spending quadratic time in the number of merge-bases, times
the size of history (ok, I can't actually come up with an example that
does the latter).
At the very least it should be possible to change in_merge_bases() to
not do any of the post-filtering; perhaps like the patch below. It
passes the test suite. The whole "merge bases of A and a list of Bs"
thing is blowing my overheated mind, though, so I'm not able to convince
myself that it is correct in all cases.
Even if this turns out to be flawed, we should also identify uses of
in_merge_bases() where the real question was is_descendant_of() [I
somewhat suspect that's all of them], and then replace is_descendant_of
with a much cheaper lookup. This can be as simple as propagating a mark
from the candidate until it either goes beyond all possible ancestors,
or hits one of them.
By the way, the internal slowness of git-merge-base also affects the
A...B syntax. For example,
git rev-list --left-right --count @{upstream}...
is used by the __git_ps1 code to determine the prompt display, which
just got very slow for me today. Again, it should be easy to figure out
the boundary of the symmetric difference simply by propagating two
marks. I do not think that the result of A...B actually depends on
figuring out exactly what the merge bases are, simply excluding *any*
candidate without pruning is enough.
diff --git i/commit.c w/commit.c
index 65a8485..70427ab 100644
--- i/commit.c
+++ w/commit.c
@@ -837,10 +837,13 @@ int in_merge_bases(struct commit *commit, struct commit **reference, int num)
struct commit_list *bases, *b;
int ret = 0;
- if (num == 1)
- bases = get_merge_bases(commit, *reference, 1);
- else
+ if (num != 1)
die("not yet");
+
+ bases = merge_bases_many(commit, 1, reference);
+ clear_commit_marks(commit, all_flags);
+ clear_commit_marks(*reference, all_flags);
+
for (b = bases; b; b = b->next) {
if (!hashcmp(commit->object.sha1, b->item->object.sha1)) {
ret = 1;
--
Thomas Rast
trast@{inf,student}.ethz.ch
next prev parent reply other threads:[~2012-08-23 14:20 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-08-23 12:32 in_merge_bases() is too expensive for recent "pu" update Nguyen Thai Ngoc Duy
2012-08-23 14:20 ` Thomas Rast [this message]
2012-08-23 18:08 ` Junio C Hamano
2012-08-23 20:42 ` Junio C Hamano
2012-08-23 21:03 ` Junio C Hamano
2012-08-24 9:32 ` Thomas Rast
2012-08-24 15:50 ` Junio C Hamano
2012-08-24 9:43 ` Thomas Rast
2012-08-24 15:15 ` Jeff King
2012-08-24 16:15 ` Junio C Hamano
2012-08-24 15:52 ` Junio C Hamano
2012-08-24 11:42 ` Nguyen Thai Ngoc Duy
2012-08-24 11:51 ` Nguyen Thai Ngoc Duy
2012-08-28 1:50 ` Junio C Hamano
2012-08-28 8:12 ` Thomas Rast
2012-08-28 16:03 ` Junio C Hamano
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=878vd5k7uu.fsf@thomas.inf.ethz.ch \
--to=trast@student.ethz.ch \
--cc=git@vger.kernel.org \
--cc=pclouds@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).