Bottlenecks in git merge

* Bottlenecks in git merge
@ 2006-01-31 21:33 Peter Eriksen
  2006-01-31 23:06 ` Junio C Hamano
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Peter Eriksen @ 2006-01-31 21:33 UTC (permalink / raw)
  To: Git Mailing List

Hello, 

In connection with Ian Molton's question about merge, I have played a
little with 'git merge' on the kernel sources.  What I find is that a
merge can take quite some time, but I'm not sure where exactly that time
goes.  Here are the times I got:

Recursive (default):  4m22.282s
Resolve (-s resolve): 3m23.548s


What is taking so long?

Regards,

Peter


==============================>8==================
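#!/bin/sh
#  Run from linux-2.6
#  STRATEGY is assumed to be set by the caller: empty for the default
#  (recursive), or "-s resolve".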

change_readme() {
	sed -i "s/are much better/are better/" README
}

git checkout -f master
git branch -d test
git checkout -b test v2.6.12
change_readme
git commit -a -m "Work, work, work"
time git merge $STRATEGY "Merging happily." HEAD v2.6.15

* Re: Bottlenecks in git merge
  2006-01-31 21:33 Bottlenecks in git merge Peter Eriksen
@ 2006-01-31 23:06 ` Junio C Hamano
  2006-01-31 23:35   ` Petr Baudis
  2006-01-31 23:45   ` Linus Torvalds
  2006-01-31 23:27 ` Petr Baudis
  2006-02-04  7:31 ` [PATCH] read-tree --aggressive Junio C Hamano
  2 siblings, 2 replies; 11+ messages in thread
From: Junio C Hamano @ 2006-01-31 23:06 UTC (permalink / raw)
  To: Peter Eriksen; +Cc: Git Mailing List

"Peter Eriksen" <s022018@student.dtu.dk> writes:

> Recursive (default):  4m22.282s
> Resolve (-s resolve): 3m23.548s
>
> What is taking so long?

I am actually surprised that recursive is not much slower than
resolve.  I expected to see a bigger difference for a merge like
this.

> git checkout -b test v2.6.12
> change_readme
> git commit -a -m "Work, work, work"
> time git merge $STRATEGY "Merging happily." HEAD v2.6.15

You are merging a variant of v2.6.12 and v2.6.15.  Each of these
two official revisions has roughly 18,000 files, and they differ
at 10,723 files among them.

With an up-to-date index that has small changes from v2.6.12,
merging these two revisions using read-tree -m to do the trivial
merge (the part that comes before recursive/resolve) leaves
about 850 files to be resolved in the working tree.  For these
files, you need to do an equivalent of merge-one-file to merge
the differences (in this particular case, most of them are
"removed in one but unchanged in the other" kind).  In addition,
you have to checkout the result of the merge, which means you
need to update at least 10,723 files.
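
A minimal sketch of how the number of paths left unresolved after the
trivial merge could be counted.  It assumes the unmerged entries are
listed in the format printed by 'git ls-files -u' (mode, object name,
stage, a tab, then the path); the helper name count_unmerged_paths is
made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: count distinct paths in `git ls-files -u` style
# input on stdin, one "<mode> <sha1> <stage><TAB><path>" line per stage.
count_unmerged_paths() {
    # Keep only the path column, collapse the per-stage duplicates,
    # and strip the padding some wc implementations add.
    cut -f2 | sort -u | wc -l | tr -d ' '
}

# Usage, inside a repository with a merge in progress:
#   git ls-files -u | count_unmerged_paths
```

A path that is unmerged shows up once per stage it occupies, which is
why the paths are de-duplicated before counting.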

I suspect that it might make things quite a bit faster if we resolved
cases 8 and 10 (see either Documentation/technical/trivial-merge
or t/t1000) in the index for this particular case, but it has
correctness issues.  A merge strategy may want to say "This file
was removed by the other branch while it stayed on our branch;
but this is not a remove but actually a rename", and do
something different from what merge-one-file does, and resolving
these cases in index closes the door for that possibility.

* Re: Bottlenecks in git merge
  2006-01-31 21:33 Bottlenecks in git merge Peter Eriksen
  2006-01-31 23:06 ` Junio C Hamano
@ 2006-01-31 23:27 ` Petr Baudis
  2006-02-04  7:31 ` [PATCH] read-tree --aggressive Junio C Hamano
  2 siblings, 0 replies; 11+ messages in thread
From: Petr Baudis @ 2006-01-31 23:27 UTC (permalink / raw)
  To: Git Mailing List

Hello,

Dear diary, on Tue, Jan 31, 2006 at 10:33:14PM CET, I got a letter
where Peter Eriksen <s022018@student.dtu.dk> said that...
> In connection with Ian Molton's question about merge, I have played a
> little with 'git merge' on the kernel sources.  What I find is that a
> merge can take quite some time, but I'm not sure where exactly that time
> goes.  Here are the times I got:
> 
> Recursive (default):  4m22.282s
> Resolve (-s resolve): 3m23.548s
> 
> 
> What is taking so long?

It is difficult for me to benchmark this, since everything required for the
merge (that is, both all the objects and the whole working tree) just
won't fit into my caches (or Linux at least won't let it stay there for
long enough). I ended up repeatedly calling the subcommands, but that
obviously is not a real world usage pattern. Proportionally, the
significant eaters of time for cg-merge (similar to -s resolve) are:

git-merge-base       --- 1s cached, 10s to 20s uncached
git-read-tree -m     --- 1s cached, 10s or more uncached
git-read-tree -m -u  --- 1m50s w/ heavy disk activity, but a big part
                         of it is writing blocks
git-merge-index -a \
    -o /bin/true     --- 1s cached
git-merge-index -a \
    -o ~/cogito/cg-Xmergefile
                     --- 1m27s with some disk activity (44s user, 20s sys);
                         cg-Xmergefile is very similar to
                         git-merge-one-file
Note that the time spent by git-read-tree here is just checking out the
new file versions, which is inevitable. ;-)

The real killer here is therefore git-merge-one-file. Most frequent hits
here are probably of the added-in-one case, resulting in two more
fork()s, reloading the index like mad all the time.

Comparing cg-merge to git-merge, one difference is that git-merge tries
to do kind of "trivial" merge first (apparently even if -s was passed to
it), the point of which kind of escapes me if you are using the resolve
strategy, but which causes two git-update-index calls - even one can
take a good half a minute or more if your cache is cold.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

* Re: Bottlenecks in git merge
  2006-01-31 23:06 ` Junio C Hamano
@ 2006-01-31 23:35   ` Petr Baudis
  2006-02-01  0:43     ` Junio C Hamano
  2006-01-31 23:45   ` Linus Torvalds
  1 sibling, 1 reply; 11+ messages in thread
From: Petr Baudis @ 2006-01-31 23:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Peter Eriksen, Git Mailing List

Dear diary, on Wed, Feb 01, 2006 at 12:06:57AM CET, I got a letter
where Junio C Hamano <junkio@cox.net> said that...
> "Peter Eriksen" <s022018@student.dtu.dk> writes:
> 
> > Recursive (default):  4m22.282s
> > Resolve (-s resolve): 3m23.548s
> >
> > What is taking so long?
> 
> I am actually surprised that recursive is not much slower than
> resolve.  I expected to see a bigger difference for a merge like
> this.
> 
> > git checkout -b test v2.6.12
> > change_readme
> > git commit -a -m "Work, work, work"
> > time git merge $STRATEGY "Merging happily." HEAD v2.6.15
> 
> You are merging a variant of v2.6.12 and v2.6.15.  Each of these
> two official revisions has roughly 18,000 files, and they differ
> at 10,723 files among them.
> 
> With an up-to-date index that has small changes from v2.6.12,
> merging these two revisions using read-tree -m to do the trivial
> merge (the part that comes before recursive/resolve) leaves
> about 850 files to be resolved in the working tree.  For these
> files, you need to do an equivalent of merge-one-file to merge
> the differences (in this particular case, most of them are
> "removed in one but unchanged in the other" kind).  In addition,
> you have to checkout the result of the merge, which means you
> need to update at least 10,723 files.
> 
> I suspect that it might make things quite a bit faster if we resolved
> cases 8 and 10 (see either Documentation/technical/trivial-merge
> or t/t1000) in the index for this particular case, but it has
> correctness issues.  A merge strategy may want to say "This file
> was removed by the other branch while it stayed on our branch;
> but this is not a remove but actually a rename", and do
> something different from what merge-one-file does, and resolving
> these cases in index closes the door for that possibility.

What about letting the file-handler actually tell merge-index what to
do? merge-index could make a fifo at fd 3 for it (we might fork a
special buffering process for it to avoid PIPE_BUF issues) and let it
write there a sequence of lines like:

	path\0{add|remove|update} {workingcopy|<sha1> <mode>}

That would avoid many in-file-handler forks and especially perpetual
reloading and rewriting of the index file, which _seems_ to be the main
time waster according to my somewhat fuzzy benchmarks.
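
A rough sketch of what a file handler could look like under this
proposal.  Nothing here exists in git-merge-index: the fd 3 channel is
only the idea described above, and the function name emit_action and
its argument order are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical: write one instruction for merge-index to fd 3, using the
# proposed "path\0{add|remove|update} {workingcopy|<sha1> <mode>}" layout.
#   $1 = path, $2 = action, $3 = "workingcopy" or "<sha1> <mode>"
emit_action() {
    printf '%s\0%s %s\n' "$1" "$2" "$3" >&3
}

# Usage (fd 3 redirected to stdout here only so the output is visible):
#   emit_action README update workingcopy 3>&1 | tr '\0' '|'
```

The point of the design is that the handler only describes the change;
a single long-lived process would apply all of them to the index once,
instead of every handler invocation reloading and rewriting it.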

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

* Re: Bottlenecks in git merge
  2006-01-31 23:06 ` Junio C Hamano
  2006-01-31 23:35   ` Petr Baudis
@ 2006-01-31 23:45   ` Linus Torvalds
  2006-02-01  0:50     ` Petr Baudis
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2006-01-31 23:45 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Peter Eriksen, Git Mailing List

On Tue, 31 Jan 2006, Junio C Hamano wrote:
> 
> I am actually surprised that recursive is not much slower than
> resolve.  I expected to see a bigger difference for a merge like
> this.

Well, if most of the cost is just the trivial single-file merges and the 
fact that we have to update a ton of files from an old version (and it 
probably is), the difference between the trivial and the recursive merge 
is not going to be huge.

> With an up-to-date index that has small changes from v2.6.12,
> merging these two revisions using read-tree -m to do the trivial
> merge (the part that comes before recursive/resolve) leaves
> about 850 files to be resolved in the working tree.  For these
> files, you need to do an equivalent of merge-one-file to merge
> the differences (in this particular case, most of them are
> "removed in one but unchanged in the other" kind).  In addition,
> you have to checkout the result of the merge, which means you
> need to update at least 10,723 files.

It would be interesting to see how big the "resolve 850 files" part is vs 
the "check out 10k+ files" part.

In particular, if the "resolve 850 files" is a noticeable portion of it, 
then the right thing to do may be to just re-write git-merge-one-file.sh 
in C. Right now, almost _all_ of the expense of that thing is just the 
shell interpreter startup. The actual actions it does are usually fairly 
cheap.

(yes, a real three-way merge is more expensive, but I suspect that even 
that isn't much more expensive than starting up an invocation of "bash". 
The other actions that merge-one-file does are _really_ trivial).

In fact, we could hardcode the "git-merge-one-file" behaviour inside 
"git-merge-index". 

Now, that won't help "recursive" (which doesn't use git-merge-one-file at 
all, and does it all by hand), but it would be an interesting test to 
make, because if it makes the simpler "-s resolve" merge even faster, then 
we know that this is likely a large portion of the time.

Then, somebody would have to consider what to do about 
git-merge-recursive. For example, if the _common_ case is "modified in 
both, but differently", and they merge cleanly, maybe the recursive merge 
could handle those separately and fast with a special "git-merge-one-file" 
invocation (just to cut down the number of files that it needs to think 
more about).

		Linus

* Re: Bottlenecks in git merge
  2006-01-31 23:35   ` Petr Baudis
@ 2006-02-01  0:43     ` Junio C Hamano
  0 siblings, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2006-02-01  0:43 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

Petr Baudis <pasky@suse.cz> writes:

> What about letting the file-handler actually tell merge-index what to
> do? merge-index could make a fifo at fd 3 for it (we might fork a
> special buffering process for it to avoid PIPE_BUF issues) and let it
> write there a sequence of lines like...

Yes.  That is sensible.

A merge handler that is willing to look at the current index
stages (and merge-recursive certainly is capable of doing that)
can open an outgoing pipe to 'git-update-index --index-info' and
drive it.  The command syntax is a bit different from what you
wrote in your message and I think if we go this route we should
make --index-info a bit easier to use by allowing it to accept
stage 0 entries.
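
A minimal sketch of driving the --index-info interface from a handler,
assuming the "<mode> <sha1> <stage><TAB><path>" input form.  The helper
name index_info_line is invented for illustration, and feeding it a
stage of 0 relies on the change below.

```shell
#!/bin/sh
# Hypothetical helper: format one entry for `git update-index --index-info`.
#   $1 = mode, $2 = object name, $3 = stage (0-3), $4 = path
index_info_line() {
    printf '%s %s %s\t%s\n' "$1" "$2" "$3" "$4"
}

# Usage: resolve README to a known blob as a stage 0 entry, with a
# single update-index process reading all the lines:
#   index_info_line 100644 "$sha1" 0 README | git update-index --index-info
```

Many such lines can be written down one pipe, so the index is read and
written once rather than once per path.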

-- >8 --
update-index --index-info: allow stage 0 entries.

Somehow we did not allow stuffing the index with stage 0 entries
through the --index-info interface.  I cannot think of a reason to
forbid it offhand.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
diff --git a/update-index.c b/update-index.c
index 2361e41..94436dd 100644
--- a/update-index.c
+++ b/update-index.c
@@ -303,7 +303,7 @@ static void read_index_info(int line_ter
 		if (!tab || tab - ptr < 41)
 			goto bad_line;

-		if (tab[-2] == ' ' && '1' <= tab[-1] && tab[-1] <= '3') {
+		if (tab[-2] == ' ' && '0' <= tab[-1] && tab[-1] <= '3') {
 			stage = tab[-1] - '0';
 			ptr = tab + 1; /* point at the head of path */
 			tab = tab - 2; /* point at tail of sha1 */

* Re: Bottlenecks in git merge
  2006-01-31 23:45   ` Linus Torvalds
@ 2006-02-01  0:50     ` Petr Baudis
  2006-02-01  1:04       ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: Petr Baudis @ 2006-02-01  0:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Peter Eriksen, Git Mailing List

Dear diary, on Wed, Feb 01, 2006 at 12:45:27AM CET, I got a letter
where Linus Torvalds <torvalds@osdl.org> said that...
> It would be interesting to see how big the "resolve 850 files" part is vs 
> the "check out 10k+ files" part.

See my other mail.

> In particular, if the "resolve 850 files" is a noticeable portion of it, 
> then the right thing to do may be to just re-write git-merge-one-file.sh 
> in C. Right now, almost _all_ of the expense of that thing is just the 
> shell interpreter startup. The actual actions it does are usually fairly 
> cheap.

Nope.

xpasky@machine[0:0]~/linux-2.6.git$ echo -e '#!/bin/sh\n/bin/true' >r && chmod a+x r
xpasky@machine[0:0]~/linux-2.6.git$ time git-merge-index -o ./r -a

real    0m3.827s
user    0m1.788s
sys     0m2.004s
xpasky@machine[0:0]~/linux-2.6.git$ time git-merge-index -o ~/git-pb/git-merge-one-file -a
[lots of "Removing"]

real    1m21.773s
user    0m30.806s
sys     0m13.248s

The costs are apparently in git-update-index, not in the shell.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
Of the 3 great composers Mozart tells us what it's like to be human,
Beethoven tells us what it's like to be Beethoven and Bach tells us
what it's like to be the universe.  -- Douglas Adams

* Re: Bottlenecks in git merge
  2006-02-01  0:50     ` Petr Baudis
@ 2006-02-01  1:04       ` Linus Torvalds
  0 siblings, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2006-02-01  1:04 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Junio C Hamano, Peter Eriksen, Git Mailing List

On Wed, 1 Feb 2006, Petr Baudis wrote:
> 
> xpasky@machine[0:0]~/linux-2.6.git$ echo -e '#!/bin/sh\n/bin/true' >r && chmod a+x r
> xpasky@machine[0:0]~/linux-2.6.git$ time git-merge-index -o ./r -a
> 
> real    0m3.827s
> user    0m1.788s
> sys     0m2.004s
> xpasky@machine[0:0]~/linux-2.6.git$ time git-merge-index -o ~/git-pb/git-merge-one-file -a
> [lots of "Removing"]
> 
> real    1m21.773s
> user    0m30.806s
> sys     0m13.248s
> 
> The costs are apparently in git-update-index, not in the shell.

Btw, this is where "oprofile" really shines. You can get exact listings of 
which symbols in which programs are taking up CPU-time, even when the time 
is spent in hundreds of different executions of a program.

Me, I'm too lazy to use it often, but not too lazy to point others towards 
it.

You do need to have kernel support for it compiled in, and it works better 
on certain CPU's (that have supported counters) than on others.

		Linus

* [PATCH] read-tree --aggressive
  2006-01-31 21:33 Bottlenecks in git merge Peter Eriksen
  2006-01-31 23:06 ` Junio C Hamano
  2006-01-31 23:27 ` Petr Baudis
@ 2006-02-04  7:31 ` Junio C Hamano
  2006-02-04 11:52   ` Peter Eriksen
  2006-02-04 12:56   ` Junio C Hamano
  2 siblings, 2 replies; 11+ messages in thread
From: Junio C Hamano @ 2006-02-04  7:31 UTC (permalink / raw)
  To: Peter Eriksen; +Cc: Git Mailing List, Linus Torvalds, Fredrik Kuivinen

"Peter Eriksen" <s022018@student.dtu.dk> writes:

> In connection with Ian Molton's question about merge, I have played a
> little with 'git merge' on the kernel sources.  What I find is that a
> merge can take quite some time, but I'm not sure where exactly that time
> goes.  Here are the times I got:
>
> Recursive (default):  4m22.282s
> Resolve (-s resolve): 3m23.548s

In your sample script, you do not disable the post-merge diff,
which is typically one of the most expensive parts in the whole
merge, and I am wondering how fast a machine you are using to
get 4 minutes.  The post-merge diff is generated by piping the
output of 'diff-tree -M' to 'apply --stat --summary', and that
step alone takes about 12 minutes wallclock time on my box X-<.

Since my box is not as fast as yours, I've eliminated the
post-merge diff step and tried your final merge step like this:

	$ time git merge --no-summary -s resolve \
            'Merging happily' HEAD v2.6.15 >/dev/null

and got this:

        real	2m15.737s
        user	1m43.320s
        sys	0m26.690s

With the attached patch, the most expensive part, which is the
repeated invocation of git-merge-one-file to remove many deleted
paths, is eliminated.  The result is this.

        real	0m20.311s
        user	0m15.780s
        sys	0m4.150s
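
As a worked example, the ratio between the two wallclock times above
can be computed like this.  The helper names to_seconds and speedup are
made up for illustration; only the two "real" figures come from the
measurements.

```shell
#!/bin/sh
# Convert a `time`-style value such as "2m15.737s" to seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times faster the second time is than the first,
# rounded to one decimal place.
speedup() {
    awk -v b="$(to_seconds "$1")" -v a="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", b / a }'
}

speedup 2m15.737s 0m20.311s   # prints 6.7
```

That is 135.737s against 20.311s on the same box, with the post-merge
diff disabled in both runs.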

This patch would not help recursive strategy, though.  Calling
read-tree with --aggressive flag essentially disables the
benefit we would expect to get from it -- rename detection.
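
The cases the new flag resolves can be sketched in shell as below.
This is a simplification, not the code in the patch: it assumes a
single common ancestor, ignores directory/file conflicts, and the
function name aggressive_resolve and its output words are invented for
illustration.  An empty argument stands for "path absent in that tree".

```shell
#!/bin/sh
# Hypothetical model of the --aggressive rule for one path.
#   $1 = blob in the ancestor, $2 = blob in HEAD, $3 = blob in the remote
aggressive_resolve() {
    base=$1 head=$2 remote=$3
    if [ -z "$head" ] && [ -z "$remote" ]; then
        echo "delete"              # deleted in both
    elif [ -z "$head" ] && [ -n "$base" ] && [ "$remote" = "$base" ]; then
        echo "delete"              # deleted in HEAD, unchanged in remote
    elif [ -z "$remote" ] && [ -n "$base" ] && [ "$head" = "$base" ]; then
        echo "delete"              # deleted in remote, unchanged in HEAD
    elif [ -z "$base" ] && [ -n "$head" ] && [ "$head" = "$remote" ]; then
        echo "keep $head"          # added in both, identically
    else
        echo "unresolved"          # not covered by this rule
    fi
}

# Usage:
#   aggressive_resolve "$base_blob" "" "$base_blob"
```

A path reported as "unresolved" here is simply not handled by this rule
and falls through to the rest of the 3-way merge logic.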

-- >8 --
A new flag --aggressive makes read-tree resolve, inside the index
during its 3-way merge, the cases we traditionally resolved with
the external git-merge-one-file.

git-merge-octopus and git-merge-resolve use this flag before
running git-merge-index with git-merge-one-file.

Signed-off-by: Junio C Hamano <junkio@cox.net>

---

 git-merge-octopus.sh |    2 +-
 git-merge-resolve.sh |    2 +-
 read-tree.c          |   32 ++++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

2a4bb6bc618bdad6529d9ffe361bc8b7dd28a56c
diff --git a/git-merge-octopus.sh b/git-merge-octopus.sh
index eb74f96..eb3f473 100755
--- a/git-merge-octopus.sh
+++ b/git-merge-octopus.sh
@@ -90,7 +90,7 @@ do
 	NON_FF_MERGE=1
 
 	echo "Trying simple merge with $SHA1"
-	git-read-tree -u -m $common $MRT $SHA1 || exit 2
+	git-read-tree -u -m --aggressive  $common $MRT $SHA1 || exit 2
 	next=$(git-write-tree 2>/dev/null)
 	if test $? -ne 0
 	then
diff --git a/git-merge-resolve.sh b/git-merge-resolve.sh
index 966e81f..0a8ef21 100755
--- a/git-merge-resolve.sh
+++ b/git-merge-resolve.sh
@@ -38,7 +38,7 @@ then
 fi
 
 git-update-index --refresh 2>/dev/null
-git-read-tree -u -m $bases $head $remotes || exit 2
+git-read-tree -u -m --aggressive $bases $head $remotes || exit 2
 echo "Trying simple merge."
 if result_tree=$(git-write-tree  2>/dev/null)
 then
diff --git a/read-tree.c b/read-tree.c
index a46c6fe..5580f15 100644
--- a/read-tree.c
+++ b/read-tree.c
@@ -15,6 +15,7 @@ static int update = 0;
 static int index_only = 0;
 static int nontrivial_merge = 0;
 static int trivial_merges_only = 0;
+static int aggressive = 0;
 
 static int head_idx = -1;
 static int merge_size = 0;
@@ -424,11 +425,14 @@ static int threeway_merge(struct cache_e
 	int df_conflict_remote = 0;
 
 	int any_anc_missing = 0;
+	int no_anc_exists = 1;
 	int i;
 
 	for (i = 1; i < head_idx; i++) {
 		if (!stages[i])
 			any_anc_missing = 1;
+		else
+			no_anc_exists = 0;
 	}
 
 	index = stages[0];
@@ -489,6 +493,29 @@ static int threeway_merge(struct cache_e
 	if (!head && !remote && any_anc_missing)
 		return 0;
 
+	/* Under the new "aggressive" rule, we resolve mostly trivial
+	 * cases that we historically had git-merge-one-file resolve.
+	 */
+	if (aggressive) {
+		int head_deleted = !head && !df_conflict_head;
+		int remote_deleted = !remote && !df_conflict_remote;
+		/*
+		 * Deleted in both.
+		 * Deleted in one and unchanged in the other.
+		 */
+		if ((head_deleted && remote_deleted) ||
+		    (head_deleted && remote && remote_match) ||
+		    (remote_deleted && head && head_match))
+			return 0;
+
+		/*
+		 * Added in both, identically.
+		 */
+		if (no_anc_exists && head && remote && same(head, remote))
+			return merged_entry(head, index);
+
+	}
+
 	/* Below are "no merge" cases, which require that the index be
 	 * up-to-date to avoid the files getting overwritten with
 	 * conflict resolution files. 
@@ -677,6 +704,11 @@ int main(int argc, char **argv)
 			continue;
 		}
 
+		if (!strcmp(arg, "--aggressive")) {
+			aggressive = 1;
+			continue;
+		}
+
 		/* "-m" stands for "merge", meaning we start in stage 1 */
 		if (!strcmp(arg, "-m")) {
 			if (stage || merge)
-- 
1.1.6.ge2129

* Re: [PATCH] read-tree --aggressive
  2006-02-04  7:31 ` [PATCH] read-tree --aggressive Junio C Hamano
@ 2006-02-04 11:52   ` Peter Eriksen
  2006-02-04 12:56   ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Peter Eriksen @ 2006-02-04 11:52 UTC (permalink / raw)
  To: Git Mailing List

On Fri, Feb 03, 2006 at 11:31:13PM -0800, Junio C Hamano wrote:
> "Peter Eriksen" <s022018@student.dtu.dk> writes:
> 
> > In connection with Ian Molton's question about merge, I have played a
> > little with 'git merge' on the kernel sources.  What I find is that a
> > merge can take quite some time, but I'm not sure where exactly that time
> > goes.  Here are the times I got:
> >
> > Recursive (default):  4m22.282s
> > Resolve (-s resolve): 3m23.548s
> 
> In your sample script, you do not disable the post-merge diff,
> which is typically one of the most expensive parts in the whole
> merge, and I am wondering how fast a machine you are using to
> get 4 minutes.  The post-merge diff is generated by piping the
> output of 'diff-tree -M' to 'apply --stat --summary', and that
> step alone takes about 12 minutes wallclock time on my box X-<.
> 
> Since my box is not as fast as yours, I've eliminated the
> post-merge diff step and tried your final merge step like this:
> 
> 	$ time git merge --no-summary -s resolve \
>             'Merging happily' HEAD v2.6.15 >/dev/null
> 
> and got this:
> 
>         real	2m15.737s
>         user	1m43.320s
>         sys	0m26.690s

I got this:

real    0m51.661s
user    0m28.302s
sys     0m8.949s

> With the attached patch, the most expensive part, which is the
> repeated invocation of git-merge-one-file to remove many deleted
> paths, is eliminated.  The result is this.
> 
>         real	0m20.311s
>         user	0m15.780s
>         sys	0m4.150s

I got this:

real    0m20.221s
user    0m6.456s
sys     0m1.828s

> This patch would not help recursive strategy, though.  Calling
> read-tree with --aggressive flag essentially disables the
> benefit we would expect to get from it -- rename detection.

Aha, so now I better understand where all the time goes.  Most of the
time is spent calculating the merge summary.  After that, the bottleneck
was the large number of git-merge-one-file invocations.

With the aggressive patch applied, it feels like the merge is mostly I/O
bound, which might explain why we get similar running times.  I did get
one run of 12s, but that was a lucky shot, I guess.  Repeated runs took
between 15s and 21s, but most were close to 20s.

Thanks,

Peter

* Re: [PATCH] read-tree --aggressive
  2006-02-04  7:31 ` [PATCH] read-tree --aggressive Junio C Hamano
  2006-02-04 11:52   ` Peter Eriksen
@ 2006-02-04 12:56   ` Junio C Hamano
  1 sibling, 0 replies; 11+ messages in thread
From: Junio C Hamano @ 2006-02-04 12:56 UTC (permalink / raw)
  To: git; +Cc: Daniel Barkalow, Linus Torvalds, Fredrik Kuivinen

Junio C Hamano <junkio@cox.net> writes:

> This patch would not help recursive strategy, though.  Calling
> read-tree with --aggressive flag essentially disables the
> benefit we would expect to get from it -- rename detection.

I think we could fairly easily tweak this by trying at least
half of the rename detection inside read-tree.  That would make
it usable by merge-recursive as well.

Instead of doing "aggressive" in the threeway_merge function
first, we can process the stages without it in the first pass,
and run an equivalent of diff-stages -M internally between stage
#2 and stage #3, and keep the matched paths unmerged (we need to
mark these paths somehow).  After that, we can do "aggressive"
collapsing to get rid of the trivial merges that recursive
does not have to look at for renaming merges.

If we are ambitious, we could go further.  We could actually
move the stages around after running the rename detection diff
between stages #2 and #3, along with working tree files as
needed.  Then merge-resolve would be able to do the renaming
merge similar to merge-recursive.
