* git merge performance problem..
@ 2006-07-15 21:48 Linus Torvalds
2006-07-16 3:18 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2006-07-15 21:48 UTC (permalink / raw)
To: Junio C Hamano, Git Mailing List
Junio, I think there is something wrong with git-merge. It sometimes takes
up to ten seconds, and it's stuck at the
git-show-branch --independent "$head" "$@"
call.
I don't know quite what that thing is even meant to do (we do already know
the parents, why do we do something special here?) but even apart from
that, the whole thing must be doing something seriously wrong, since it
takes so long. Does it check the whole commit history?
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git merge performance problem..
2006-07-15 21:48 git merge performance problem Linus Torvalds
@ 2006-07-16 3:18 ` Junio C Hamano
2006-07-16 4:24 ` Junio C Hamano
2006-07-16 4:26 ` Linus Torvalds
0 siblings, 2 replies; 6+ messages in thread
From: Junio C Hamano @ 2006-07-16 3:18 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Git Mailing List
Linus Torvalds <torvalds@osdl.org> writes:
> Junio, I think there is something wrong with git-merge. It sometimes takes
> up to ten seconds, and it's stuck at the
>
> git-show-branch --independent "$head" "$@"
>
> call.
>
> I don't know quite what that thing is even meant to do (we do already know
> the parents, why do we do something special here?) but even apart from
> that, the whole thing must be doing something seriously wrong, since it
> takes so long. Does it check the whole commit history?
The code is to cull redundant parents primarily in octopus and
is not strictly necessary. Can I have the $head and $@ (the
other merge parents, but in your case you never do an octopus so
that would be the other branch head) to see what is going on
please? It should not descend down the history all the way but
with the recent changes to the object marking/unmarking code it
is possible we might have broken something.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git merge performance problem..
2006-07-16 3:18 ` Junio C Hamano
@ 2006-07-16 4:24 ` Junio C Hamano
2006-07-16 4:43 ` Linus Torvalds
2006-07-16 4:26 ` Linus Torvalds
1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2006-07-16 4:24 UTC (permalink / raw)
To: git
Junio C Hamano <junkio@cox.net> writes:
> Linus Torvalds <torvalds@osdl.org> writes:
>
>> Junio, I think there is something wrong with git-merge. It sometimes takes
>> up to ten seconds, and it's stuck at the
>>
>> git-show-branch --independent "$head" "$@"
>>
>> call.
>>
>> I don't know quite what that thing is even meant to do (we do already know
>> the parents, why do we do something special here?) but even apart from
>> that, the whole thing must be doing something seriously wrong, since it
>> takes so long. Does it check the whole commit history?
>
> The code is to cull redundant parents primarily in octopus and
> is not strictly necessary.
Wrong. The commit log says it was to remove redundant parents;
I think this as a reaction after seeing a few incorrectly made
merge commits in the kernel archive.
> ..., but in your case you never do an octopus so
> that would be the other branch head) to see what is going on
> please?
Disregard this request please. I see a few commits that this
step takes a long time to process in the kernel archive. The
last merge before you left to Ottawa was one of them.
b5032a5 48ce8b0
I do not think we need to do the --independent check there
especially for two-head cases because we have already done
fast-forward and up-to-date tests when we get there, so let's
revert that part from commit 6ea2334.
-- >8 --
diff --git a/git-merge.sh b/git-merge.sh
index 24e3b50..ee41077 100755
--- a/git-merge.sh
+++ b/git-merge.sh
@@ -314,7 +314,11 @@ # If we have a resulting tree, that mean
# auto resolved the merge cleanly.
if test '' != "$result_tree"
then
- parents=$(git-show-branch --independent "$head" "$@" | sed -e 's/^/-p /')
+ parents="-p $head"
+ for remote
+ do
+ parents="$parents -p $remote"
+ done
result_commit=$(echo "$merge_msg" | git-commit-tree $result_tree $parents) || exit
finish "$result_commit" "Merge $result_commit, made by $wt_strategy."
dropsave
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: git merge performance problem..
2006-07-16 3:18 ` Junio C Hamano
2006-07-16 4:24 ` Junio C Hamano
@ 2006-07-16 4:26 ` Linus Torvalds
1 sibling, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2006-07-16 4:26 UTC (permalink / raw)
To: Junio C Hamano; +Cc: Git Mailing List
On Sat, 15 Jul 2006, Junio C Hamano wrote:
>
> The code is to cull redundant parents primarily in octopus and
> is not strictly necessary. Can I have the $head and $@ (the
> other merge parents, but in your case you never do an octopus so
> that would be the other branch head) to see what is going on
> please?
I think it was commit b20e481 that I reacted to this time, merging b5032a5
and 48ce8b0.
Ie, lookie here:
[torvalds@evo linux]$ time git merge-base --all b5032a5 48ce8b0
672c6108a51bf559d19595d9f8193dfd81f0f752
real 0m1.426s
user 0m1.404s
sys 0m0.016s
so it can find a merge-base in 1.4 seconds, which should certainly
guarantee that they are independent. Then:
[torvalds@evo linux]$ time git-show-branch --independent b5032a5 48ce8b0
b5032a50aea76b6230db74b1d171a7f56b204bb7
48ce8b056c88920c8ac187781048f5dae33c81b9
real 0m30.657s
user 0m30.414s
sys 0m0.076s
Whee. Half a minute. Ok, so this is on my laptop (I'm oat the airport
right now), so it was probably twice as fast on my desktop, but that is
still not acceptable.
I really don't know what it's doing, because
[torvalds@evo linux]$ time git-rev-list b5032a5 48ce8b0 > /dev/null
real 0m3.248s
user 0m2.588s
sys 0m0.036s
so it's really doing something very expensive - more so than just parsing
the commits.
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git merge performance problem..
2006-07-16 4:24 ` Junio C Hamano
@ 2006-07-16 4:43 ` Linus Torvalds
2006-07-16 6:57 ` Junio C Hamano
0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2006-07-16 4:43 UTC (permalink / raw)
To: Junio C Hamano; +Cc: git
On Sat, 15 Jul 2006, Junio C Hamano wrote:
> Junio C Hamano <junkio@cox.net> writes:
> >
> > The code is to cull redundant parents primarily in octopus and
> > is not strictly necessary.
>
> Wrong. The commit log says it was to remove redundant parents;
> I think this as a reaction after seeing a few incorrectly made
> merge commits in the kernel archive.
Arguing with yourself? ;)
But that should already have been handled by the fact that we did the
merge-base improvements. So I don't really see why we'd need the extremely
heavy git-show-branch.
I think the problems we had with rmk generating patches that had two
parents but really were _meant_ to be regular patches were due to an
independent problem, namely that we'd commit with a stale MERGE_HEAD from
a previous (failed) merge that was never done.
I think. It's a long time ago.
> Disregard this request please. I see a few commits that this
> step takes a long time to process in the kernel archive. The
> last merge before you left to Ottawa was one of them.
>
> b5032a5 48ce8b0
Yup.
And your patch will obviously fix it (by not calling git-show-branch at
all), but I'm still left wondering why git-show-branch took that long in
the first place. Half a minute when traversing the whole commit history
only takes three seconds (as per my previous email)?
Now, as long as nothing I use actually ends up using git-show-branch, I
won't care, but maybe a sign that something else can be improved?
Traditionally, what has made things _that_ slow has almost always been
logic that traverses all sides of a merge, without having the logic to
ignore already-seen commits (so each merge basically doubles the number of
commits we will traverse, and the problem size goes from O(n+m) to O(m^2)
where 'n' is number of commits, and 'm' is number of merges.
Or is git-show-branch doing something else really expensive?
Linus
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: git merge performance problem..
2006-07-16 4:43 ` Linus Torvalds
@ 2006-07-16 6:57 ` Junio C Hamano
0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2006-07-16 6:57 UTC (permalink / raw)
To: Linus Torvalds; +Cc: git
Linus Torvalds <torvalds@osdl.org> writes:
> On Sat, 15 Jul 2006, Junio C Hamano wrote:
>> Junio C Hamano <junkio@cox.net> writes:
>> >
>> > The code is to cull redundant parents primarily in octopus and
>> > is not strictly necessary.
>>
>> Wrong. The commit log says it was to remove redundant parents;
>> I think this as a reaction after seeing a few incorrectly made
>> merge commits in the kernel archive.
>
> Arguing with yourself? ;)
Yes, I was kind of tired ;-)
> And your patch will obviously fix it (by not calling git-show-branch at
> all), but I'm still left wondering why git-show-branch took that long in
> the first place. Half a minute when traversing the whole commit history
> only takes three seconds (as per my previous email)?
>
> Now, as long as nothing I use actually ends up using git-show-branch, I
> won't care, but maybe a sign that something else can be improved?
I instrumented builtin-show-branch.c::join_revs() and
commit.c::merge-bases(), and run another problematic case
between commits 1d3737 and 7f2857. They traverse exactly the
same commits in the same order. After all in two head case they
are essentially the same algorithm, modulo that the heuristics
with horizon effect has not been removed from join_revs() yet.
Similarly, "merge-base --all" vs "show-branch --merge-base"
between these commits has the same issue. Although they
traverse exactly the same set of commits, the former takes 10x
longer for some reason.
And (drumroll please), thanks to gprof, the guilty party turns
out to be insert_by_date() in mark_seen().
I think this will fix both problems.
-- >8 --
show-branch: fix performance problem.
The core function used in show-branch, join_revs(), was supposed
to be exactly the same algorithm as merge_bases(), except that
it was a version enhanced for use with more than two heads.
However, it needed to mark and keep a list of all the commits it
has seen, because it needed them for its semi-graphical output.
The function to implement this list, mark_seen(), stupidly used
insert_by_date(), when it did not need to keep the list sorted
during its processing. This made "show-branch --merge-base"
more than 20x slower compared to "merge-base --all" in some
cases (e.g. between b5032a5 and 48ce8b0 in the Linux 2.6 kernel
archive). The performance of "show-branch --independent"
suffered from the same reason.
This patch sorts the resulting list after the list traversal
just once to fix these problems.
Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/builtin-show-branch.c b/builtin-show-branch.c
index 260cb22..3d240ca 100644
--- a/builtin-show-branch.c
+++ b/builtin-show-branch.c
@@ -172,7 +172,7 @@ static void name_commits(struct commit_l
static int mark_seen(struct commit *commit, struct commit_list **seen_p)
{
if (!commit->object.flags) {
- insert_by_date(commit, seen_p);
+ commit_list_insert(commit, seen_p);
return 1;
}
return 0;
@@ -218,9 +218,8 @@ static void join_revs(struct commit_list
* Postprocess to complete well-poisoning.
*
* At this point we have all the commits we have seen in
- * seen_p list (which happens to be sorted chronologically but
- * it does not really matter). Mark anything that can be
- * reached from uninteresting commits not interesting.
+ * seen_p list. Mark anything that can be reached from
+ * uninteresting commits not interesting.
*/
for (;;) {
int changed = 0;
@@ -701,6 +700,8 @@ int cmd_show_branch(int ac, const char *
if (0 <= extra)
join_revs(&list, &seen, num_rev, extra);
+ sort_by_date(&seen);
+
if (merge_base)
return show_merge_base(seen, num_rev);
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2006-07-16 6:57 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-15 21:48 git merge performance problem Linus Torvalds
2006-07-16 3:18 ` Junio C Hamano
2006-07-16 4:24 ` Junio C Hamano
2006-07-16 4:43 ` Linus Torvalds
2006-07-16 6:57 ` Junio C Hamano
2006-07-16 4:26 ` Linus Torvalds
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).