git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* git merge performance problem..
@ 2006-07-15 21:48 Linus Torvalds
  2006-07-16  3:18 ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2006-07-15 21:48 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List


Junio, I think there is something wrong with git-merge. It sometimes takes 
up to ten seconds, and it's stuck at the

	git-show-branch --independent "$head" "$@"

call.

I don't know quite what that thing is even meant to do (we do already know 
the parents, why do we do something special here?) but even apart from 
that, the whole thing must be doing something seriously wrong, since it 
takes so long. Does it check the whole commit history?

		Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git merge performance problem..
  2006-07-15 21:48 git merge performance problem Linus Torvalds
@ 2006-07-16  3:18 ` Junio C Hamano
  2006-07-16  4:24   ` Junio C Hamano
  2006-07-16  4:26   ` Linus Torvalds
  0 siblings, 2 replies; 6+ messages in thread
From: Junio C Hamano @ 2006-07-16  3:18 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List

Linus Torvalds <torvalds@osdl.org> writes:

> Junio, I think there is something wrong with git-merge. It sometimes takes 
> up to ten seconds, and it's stuck at the
>
> 	git-show-branch --independent "$head" "$@"
>
> call.
>
> I don't know quite what that thing is even meant to do (we do already know 
> the parents, why do we do something special here?) but even apart from 
> that, the whole thing must be doing something seriously wrong, since it 
> takes so long. Does it check the whole commit history?

The code is to cull redundant parents primarily in octopus and
is not strictly necessary.  Can I have the $head and $@ (the
other merge parents, but in your case you never do an octopus so
that would be the other branch head) to see what is going on
please?  It should not descend down the history all the way but
with the recent changes to the object marking/unmarking code it
is possible we might have broken something.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git merge performance problem..
  2006-07-16  3:18 ` Junio C Hamano
@ 2006-07-16  4:24   ` Junio C Hamano
  2006-07-16  4:43     ` Linus Torvalds
  2006-07-16  4:26   ` Linus Torvalds
  1 sibling, 1 reply; 6+ messages in thread
From: Junio C Hamano @ 2006-07-16  4:24 UTC (permalink / raw)
  To: git

Junio C Hamano <junkio@cox.net> writes:

> Linus Torvalds <torvalds@osdl.org> writes:
>
>> Junio, I think there is something wrong with git-merge. It sometimes takes 
>> up to ten seconds, and it's stuck at the
>>
>> 	git-show-branch --independent "$head" "$@"
>>
>> call.
>>
>> I don't know quite what that thing is even meant to do (we do already know 
>> the parents, why do we do something special here?) but even apart from 
>> that, the whole thing must be doing something seriously wrong, since it 
>> takes so long. Does it check the whole commit history?
>
> The code is to cull redundant parents primarily in octopus and
> is not strictly necessary.

Wrong.  The commit log says it was to remove redundant parents;
I think this as a reaction after seeing a few incorrectly made
merge commits in the kernel archive.

> ..., but in your case you never do an octopus so
> that would be the other branch head) to see what is going on
> please?

Disregard this request please.  I see a few commits that this
step takes a long time to process in the kernel archive.  The
last merge before you left to Ottawa was one of them.

b5032a5 48ce8b0

I do not think we need to do the --independent check there
especially for two-head cases because we have already done
fast-forward and up-to-date tests when we get there, so let's
revert that part from commit 6ea2334.

-- >8 --

diff --git a/git-merge.sh b/git-merge.sh
index 24e3b50..ee41077 100755
--- a/git-merge.sh
+++ b/git-merge.sh
@@ -314,7 +314,11 @@ # If we have a resulting tree, that mean
 # auto resolved the merge cleanly.
 if test '' != "$result_tree"
 then
-    parents=$(git-show-branch --independent "$head" "$@" | sed -e 's/^/-p /')
+    parents="-p $head"
+    for remote
+    do
+	parents="$parents -p $remote"
+    done
     result_commit=$(echo "$merge_msg" | git-commit-tree $result_tree $parents) || exit
     finish "$result_commit" "Merge $result_commit, made by $wt_strategy."
     dropsave

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: git merge performance problem..
  2006-07-16  3:18 ` Junio C Hamano
  2006-07-16  4:24   ` Junio C Hamano
@ 2006-07-16  4:26   ` Linus Torvalds
  1 sibling, 0 replies; 6+ messages in thread
From: Linus Torvalds @ 2006-07-16  4:26 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Git Mailing List



On Sat, 15 Jul 2006, Junio C Hamano wrote:
> 
> The code is to cull redundant parents primarily in octopus and
> is not strictly necessary.  Can I have the $head and $@ (the
> other merge parents, but in your case you never do an octopus so
> that would be the other branch head) to see what is going on
> please?

I think it was commit b20e481 that I reacted to this time, merging b5032a5 
and 48ce8b0.

Ie, lookie here:

	[torvalds@evo linux]$ time git merge-base --all b5032a5 48ce8b0
	672c6108a51bf559d19595d9f8193dfd81f0f752
	
	real    0m1.426s
	user    0m1.404s
	sys     0m0.016s

so it can find a merge-base in 1.4 seconds, which should certainly 
guarantee that they are independent. Then:

	[torvalds@evo linux]$ time git-show-branch --independent b5032a5 48ce8b0
	b5032a50aea76b6230db74b1d171a7f56b204bb7
	48ce8b056c88920c8ac187781048f5dae33c81b9
	
	real    0m30.657s
	user    0m30.414s
	sys     0m0.076s

Whee. Half a minute. Ok, so this is on my laptop (I'm oat the airport 
right now), so it was probably twice as fast on my desktop, but that is 
still not acceptable.

I really don't know what it's doing, because

	[torvalds@evo linux]$ time git-rev-list b5032a5 48ce8b0 > /dev/null 
	
	real    0m3.248s
	user    0m2.588s
	sys     0m0.036s

so it's really doing something very expensive - more so than just parsing 
the commits.

		Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git merge performance problem..
  2006-07-16  4:24   ` Junio C Hamano
@ 2006-07-16  4:43     ` Linus Torvalds
  2006-07-16  6:57       ` Junio C Hamano
  0 siblings, 1 reply; 6+ messages in thread
From: Linus Torvalds @ 2006-07-16  4:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git



On Sat, 15 Jul 2006, Junio C Hamano wrote:
> Junio C Hamano <junkio@cox.net> writes:
> >
> > The code is to cull redundant parents primarily in octopus and
> > is not strictly necessary.
> 
> Wrong.  The commit log says it was to remove redundant parents;
> I think this as a reaction after seeing a few incorrectly made
> merge commits in the kernel archive.

Arguing with yourself? ;)

But that should already have been handled by the fact that we did the 
merge-base improvements. So I don't really see why we'd need the extremely 
heavy git-show-branch.

I think the problems we had with rmk generating patches that had two 
parents but really were _meant_ to be regular patches were due to an 
independent problem, namely that we'd commit with a stale MERGE_HEAD from 
a previous (failed) merge that was never done.

I think. It's a long time ago.

> Disregard this request please.  I see a few commits that this
> step takes a long time to process in the kernel archive.  The
> last merge before you left to Ottawa was one of them.
> 
> b5032a5 48ce8b0

Yup.

And your patch will obviously fix it (by not calling git-show-branch at 
all), but I'm still left wondering why git-show-branch took that long in 
the first place. Half a minute when traversing the whole commit history 
only takes three seconds (as per my previous email)?

Now, as long as nothing I use actually ends up using git-show-branch, I 
won't care, but maybe a sign that something else can be improved?

Traditionally, what has made things _that_ slow has almost always been 
logic that traverses all sides of a merge, without having the logic to 
ignore already-seen commits (so each merge basically doubles the number of 
commits we will traverse, and the problem size goes from O(n+m) to O(m^2) 
where 'n' is number of commits, and 'm' is number of merges.

Or is git-show-branch doing something else really expensive?

		Linus

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: git merge performance problem..
  2006-07-16  4:43     ` Linus Torvalds
@ 2006-07-16  6:57       ` Junio C Hamano
  0 siblings, 0 replies; 6+ messages in thread
From: Junio C Hamano @ 2006-07-16  6:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

Linus Torvalds <torvalds@osdl.org> writes:

> On Sat, 15 Jul 2006, Junio C Hamano wrote:
>> Junio C Hamano <junkio@cox.net> writes:
>> >
>> > The code is to cull redundant parents primarily in octopus and
>> > is not strictly necessary.
>> 
>> Wrong.  The commit log says it was to remove redundant parents;
>> I think this as a reaction after seeing a few incorrectly made
>> merge commits in the kernel archive.
>
> Arguing with yourself? ;)

Yes, I was kind of tired ;-)

> And your patch will obviously fix it (by not calling git-show-branch at 
> all), but I'm still left wondering why git-show-branch took that long in 
> the first place. Half a minute when traversing the whole commit history 
> only takes three seconds (as per my previous email)?
>
> Now, as long as nothing I use actually ends up using git-show-branch, I 
> won't care, but maybe a sign that something else can be improved?

I instrumented builtin-show-branch.c::join_revs() and
commit.c::merge-bases(), and run another problematic case
between commits 1d3737 and 7f2857.  They traverse exactly the
same commits in the same order.  After all in two head case they
are essentially the same algorithm, modulo that the heuristics
with horizon effect has not been removed from join_revs() yet.

Similarly, "merge-base --all" vs "show-branch --merge-base"
between these commits has the same issue.  Although they
traverse exactly the same set of commits, the former takes 10x
longer for some reason.

And (drumroll please), thanks to gprof, the guilty party turns
out to be insert_by_date() in mark_seen().

I think this will fix both problems.

-- >8 --
show-branch: fix performance problem.

The core function used in show-branch, join_revs(), was supposed
to be exactly the same algorithm as merge_bases(), except that
it was a version enhanced for use with more than two heads.
However, it needed to mark and keep a list of all the commits it
has seen, because it needed them for its semi-graphical output.
The function to implement this list, mark_seen(), stupidly used
insert_by_date(), when it did not need to keep the list sorted
during its processing.  This made "show-branch --merge-base"
more than 20x slower compared to "merge-base --all" in some
cases (e.g. between b5032a5 and 48ce8b0 in the Linux 2.6 kernel
archive).  The performance of "show-branch --independent"
suffered from the same reason.

This patch sorts the resulting list after the list traversal
just once to fix these problems.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---
diff --git a/builtin-show-branch.c b/builtin-show-branch.c
index 260cb22..3d240ca 100644
--- a/builtin-show-branch.c
+++ b/builtin-show-branch.c
@@ -172,7 +172,7 @@ static void name_commits(struct commit_l
 static int mark_seen(struct commit *commit, struct commit_list **seen_p)
 {
 	if (!commit->object.flags) {
-		insert_by_date(commit, seen_p);
+		commit_list_insert(commit, seen_p);
 		return 1;
 	}
 	return 0;
@@ -218,9 +218,8 @@ static void join_revs(struct commit_list
 	 * Postprocess to complete well-poisoning.
 	 *
 	 * At this point we have all the commits we have seen in
-	 * seen_p list (which happens to be sorted chronologically but
-	 * it does not really matter).  Mark anything that can be
-	 * reached from uninteresting commits not interesting.
+	 * seen_p list.  Mark anything that can be reached from
+	 * uninteresting commits not interesting.
 	 */
 	for (;;) {
 		int changed = 0;
@@ -701,6 +700,8 @@ int cmd_show_branch(int ac, const char *
 	if (0 <= extra)
 		join_revs(&list, &seen, num_rev, extra);
 
+	sort_by_date(&seen);
+
 	if (merge_base)
 		return show_merge_base(seen, num_rev);
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2006-07-16  6:57 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-15 21:48 git merge performance problem Linus Torvalds
2006-07-16  3:18 ` Junio C Hamano
2006-07-16  4:24   ` Junio C Hamano
2006-07-16  4:43     ` Linus Torvalds
2006-07-16  6:57       ` Junio C Hamano
2006-07-16  4:26   ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).