* limiting rename detection during merge is a really bad idea
@ 2008-02-11 6:19 Steffen Prohaska
2008-02-11 7:42 ` Marco Costalba
2008-02-11 7:48 ` Jeff King
0 siblings, 2 replies; 11+ messages in thread
From: Steffen Prohaska @ 2008-02-11 6:19 UTC (permalink / raw)
To: Git Mailing List
I almost lost faith in git's merging capabilities because merge-
recursive limits rename detection to the default limit if not
configured otherwise. I tried to do a large merge and was pretty
convinced that git would handle the merge easily. But it
unexpectedly failed to detect the renames. The reason is that
merge-recursive uses the diff_rename_limit_default, which is 100,
and this was too low in my case.
After debugging all the merge and diff machinery I found out that
I just need to set diff.renamelimit=0 and everything works smoothly.
Well, it was an interesting debugging experience and I learnt a
lot about the diffcore. Nonetheless this was one of the worst
experiences I had with git and it kept me from doing more important
work.
I think that limiting rename detection during merge is a really
bad idea. Either we should set it to unlimited, or at least we
should print a BIG WARNING that rename detection is limited
during the merge. I'd propose to override diff.renamelimit
to unlimited for a merge, even if diff.renamelimit is explicitly
configured by the user. It doesn't make sense not to detect
renames during a merge.
Opinions?
Steffen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 6:19 limiting rename detection during merge is a really bad idea Steffen Prohaska
@ 2008-02-11 7:42 ` Marco Costalba
2008-02-11 7:48 ` Jeff King
1 sibling, 0 replies; 11+ messages in thread
From: Marco Costalba @ 2008-02-11 7:42 UTC (permalink / raw)
To: Steffen Prohaska; +Cc: Git Mailing List
On Feb 11, 2008 7:19 AM, Steffen Prohaska <prohaska@zib.de> wrote:
>
>
> It doesn't make sense not to detect
> renames during a merge.
>
> Opinions?
>
I agree 100%, merge is a very important and sensible task and git
should push as much as possible to avoid fails.
A failed merge is always an hassle for the developer also in simple
cases and when a successful merge is possible git should really try to
achieve that also if it takes a little time more.
So I second overriding diff.renamelimit in this case.
Marco
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 6:19 limiting rename detection during merge is a really bad idea Steffen Prohaska
2008-02-11 7:42 ` Marco Costalba
@ 2008-02-11 7:48 ` Jeff King
2008-02-11 7:55 ` Marco Costalba
2008-02-11 10:41 ` Lars Hjemli
1 sibling, 2 replies; 11+ messages in thread
From: Jeff King @ 2008-02-11 7:48 UTC (permalink / raw)
To: Steffen Prohaska; +Cc: Git Mailing List
On Mon, Feb 11, 2008 at 07:19:32AM +0100, Steffen Prohaska wrote:
> I think that limiting rename detection during merge is a really
> bad idea. Either we should set it to unlimited, or at least we
> should print a BIG WARNING that rename detection is limited
> during the merge. I'd propose to override diff.renamelimit
> to unlimited for a merge, even if diff.renamelimit is explicitly
> configured by the user. It doesn't make sense not to detect
> renames during a merge.
>
> Opinions?
The point of diff.renamelimit was that some rename detection is
literally so time-consuming that we might as well not bother starting
it. The number '100' was pulled out of Linus', er, hat. Perhaps a better
argument is that the renamelimit should be set much higher to accomplish
that goal?
It may also be that multiple rename limits are appropriate. I don't mind
waiting 30 seconds for rename detection during a particularly tricky
merge. I probably do when running 'git-log -p'.
-Peff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 7:48 ` Jeff King
@ 2008-02-11 7:55 ` Marco Costalba
2008-02-11 8:03 ` Jeff King
2008-02-11 10:41 ` Lars Hjemli
1 sibling, 1 reply; 11+ messages in thread
From: Marco Costalba @ 2008-02-11 7:55 UTC (permalink / raw)
To: Jeff King; +Cc: Steffen Prohaska, Git Mailing List
On Feb 11, 2008 8:48 AM, Jeff King <peff@peff.net> wrote:
>
> It may also be that multiple rename limits are appropriate. I don't mind
> waiting 30 seconds for rename detection during a particularly tricky
> merge. I probably do when running 'git-log -p'.
>
That's exactly my point: ignoring diff.renamelimit in case of merges.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 7:55 ` Marco Costalba
@ 2008-02-11 8:03 ` Jeff King
0 siblings, 0 replies; 11+ messages in thread
From: Jeff King @ 2008-02-11 8:03 UTC (permalink / raw)
To: Marco Costalba; +Cc: Steffen Prohaska, Git Mailing List
On Mon, Feb 11, 2008 at 08:55:07AM +0100, Marco Costalba wrote:
> On Feb 11, 2008 8:48 AM, Jeff King <peff@peff.net> wrote:
> >
> > It may also be that multiple rename limits are appropriate. I don't mind
> > waiting 30 seconds for rename detection during a particularly tricky
> > merge. I probably do when running 'git-log -p'.
> >
>
> That's exactly my point: ignoring diff.renamelimit in case of merges.
I think you missed my point. You never want to _ignore_ it. You want to
set it higher.
-Peff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 7:48 ` Jeff King
2008-02-11 7:55 ` Marco Costalba
@ 2008-02-11 10:41 ` Lars Hjemli
2008-02-11 11:08 ` Jeff King
1 sibling, 1 reply; 11+ messages in thread
From: Lars Hjemli @ 2008-02-11 10:41 UTC (permalink / raw)
To: Jeff King; +Cc: Steffen Prohaska, Git Mailing List
On Feb 11, 2008 8:48 AM, Jeff King <peff@peff.net> wrote:
> On Mon, Feb 11, 2008 at 07:19:32AM +0100, Steffen Prohaska wrote:
>
> > I think that limiting rename detection during merge is a really
> > bad idea. Either we should set it to unlimited, or at least we
> > should print a BIG WARNING that rename detection is limited
> > during the merge. I'd propose to override diff.renamelimit
> > to unlimited for a merge, even if diff.renamelimit is explicitly
> > configured by the user. It doesn't make sense not to detect
> > renames during a merge.
> >
> > Opinions?
>
> The point of diff.renamelimit was that some rename detection is
> literally so time-consuming that we might as well not bother starting
> it. The number '100' was pulled out of Linus', er, hat.
FWIW, prior to 07b45f8c merge-recursive ignored diff.renamelimit. The
effect of this was that 'git diff HEAD somebranch' could detect
renames which 'git merge somebranch' couldn't; Teaching
merge-recursive about diff.renamelimit made sense IMHO since 'git
merge' then would agree with 'git diff' regarding renames.
> It may also be that multiple rename limits are appropriate. I don't mind
> waiting 30 seconds for rename detection during a particularly tricky
> merge. I probably do when running 'git-log -p'.
Yeah, I guess we could add support for merge.renamelimit in addition
to diff.renamelimit (i.e. use diff.renamelimit if merge.renamelimit is
unspecified). And/or add the -l option of git-diff-* to
git-merge.sh/merge-recursive.c.
--
larsh
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 10:41 ` Lars Hjemli
@ 2008-02-11 11:08 ` Jeff King
2008-02-11 11:20 ` Santi Béjar
2008-02-11 11:35 ` Jeff King
0 siblings, 2 replies; 11+ messages in thread
From: Jeff King @ 2008-02-11 11:08 UTC (permalink / raw)
To: Lars Hjemli; +Cc: Steffen Prohaska, Git Mailing List
On Mon, Feb 11, 2008 at 11:41:05AM +0100, Lars Hjemli wrote:
> > It may also be that multiple rename limits are appropriate. I don't mind
> > waiting 30 seconds for rename detection during a particularly tricky
> > merge. I probably do when running 'git-log -p'.
>
> Yeah, I guess we could add support for merge.renamelimit in addition
> to diff.renamelimit (i.e. use diff.renamelimit if merge.renamelimit is
> unspecified). And/or add the -l option of git-diff-* to
> git-merge.sh/merge-recursive.c.
Perhaps we should first figure out what reasonable default values are. I
think the reported problem was not "Oh, I didn't expect my tweaked
diff.renamelimit to affect merge" but rather "I didn't tweak
diff.renamelimit at all".
The mega-commit I was playing with that caused Linus to suggest
diff.renamelimit in the first place is 374 by 641 (src by dest) and
completes in ~15 minutes. The case recently reported in "git-revert is a
memory hog" is 3541 by 8043, and doesn't complete ever. We limit to 100
by 100 by default.
Steffen, can you tell us how large your rename set is (unfortunately,
there is no way to get this information easily; you can stop
merge-recursive in the debugger at diffcore-rename.c:457 and look at
num_src and num_create). And how long it takes to run with
diff.renamelimit turned off?
That might give us a clue where a reasonable value is.
-Peff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 11:08 ` Jeff King
@ 2008-02-11 11:20 ` Santi Béjar
2008-02-11 11:40 ` Jeff King
2008-02-11 13:29 ` Steffen Prohaska
2008-02-11 11:35 ` Jeff King
1 sibling, 2 replies; 11+ messages in thread
From: Santi Béjar @ 2008-02-11 11:20 UTC (permalink / raw)
To: Jeff King; +Cc: Lars Hjemli, Steffen Prohaska, Git Mailing List
On Feb 11, 2008 12:08 PM, Jeff King <peff@peff.net> wrote:
> On Mon, Feb 11, 2008 at 11:41:05AM +0100, Lars Hjemli wrote:
>
> > > It may also be that multiple rename limits are appropriate. I don't mind
> > > waiting 30 seconds for rename detection during a particularly tricky
> > > merge. I probably do when running 'git-log -p'.
> >
> > Yeah, I guess we could add support for merge.renamelimit in addition
> > to diff.renamelimit (i.e. use diff.renamelimit if merge.renamelimit is
> > unspecified). And/or add the -l option of git-diff-* to
> > git-merge.sh/merge-recursive.c.
>
> Perhaps we should first figure out what reasonable default values are. I
> think the reported problem was not "Oh, I didn't expect my tweaked
> diff.renamelimit to affect merge" but rather "I didn't tweak
> diff.renamelimit at all".
>
> The mega-commit I was playing with that caused Linus to suggest
> diff.renamelimit in the first place is 374 by 641 (src by dest) and
> completes in ~15 minutes. The case recently reported in "git-revert is a
> memory hog" is 3541 by 8043, and doesn't complete ever. We limit to 100
> by 100 by default.
>
> Steffen, can you tell us how large your rename set is (unfortunately,
> there is no way to get this information easily; you can stop
> merge-recursive in the debugger at diffcore-rename.c:457 and look at
> num_src and num_create). And how long it takes to run with
> diff.renamelimit turned off?
>
> That might give us a clue where a reasonable value is.
Additionally git could warn if the limit is reached in the merge.
Santi
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 11:08 ` Jeff King
2008-02-11 11:20 ` Santi Béjar
@ 2008-02-11 11:35 ` Jeff King
1 sibling, 0 replies; 11+ messages in thread
From: Jeff King @ 2008-02-11 11:35 UTC (permalink / raw)
To: Lars Hjemli; +Cc: Steffen Prohaska, Git Mailing List
On Mon, Feb 11, 2008 at 06:08:16AM -0500, Jeff King wrote:
> The mega-commit I was playing with that caused Linus to suggest
> diff.renamelimit in the first place is 374 by 641 (src by dest) and
> completes in ~15 minutes. The case recently reported in "git-revert is a
> memory hog" is 3541 by 8043, and doesn't complete ever. We limit to 100
> by 100 by default.
I tried putting together a simple test script to see how much processing
time was taken. It basically does rename detection on an NxN matrix.
Each file is about 21K (the average size of a file in the linux-2.6
repository).
The results are what you would expect from an O(n^2) algorithm:
N CPU seconds
10 0.43
100 0.44
200 1.40
400 4.87
800 18.08
1000 27.82
So for average repositories, we could probably bump the default rename
limit by a factor of 10 for merges (though I think I would keep it under
400 or so for "git log"). Note that this conflicts with the
"mega-commit" I mentioned above; that repository has a much larger
average file size (around 1M). But I think it makes sense to set the
default based on more common repositories.
An alternative would be to set an alarm and just give up on rename
detection after N seconds (which is really what the user wants anyway).
My test script is below (it references the file 'sample', which is
actually a copy of arch/m68/Kconfig, which just happens to have a
size close to the repository mean).
-Peff
-- >8 --
#!/bin/sh
n=$1; shift
rm -rf repo
mkdir repo && cd repo
git init
mkdata() {
mkdir $1
for i in `seq 1 $2`; do
(sed "s/^/$i /" <../sample
echo tag: $1
) >$1/$i
done
}
mkdata initial $n
git add .
git commit -m initial
mkdata new $n
git add .
rm -rf initial
git commit -a -m new
time git-diff-tree -M -l0 --summary HEAD^ HEAD
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 11:20 ` Santi Béjar
@ 2008-02-11 11:40 ` Jeff King
2008-02-11 13:29 ` Steffen Prohaska
1 sibling, 0 replies; 11+ messages in thread
From: Jeff King @ 2008-02-11 11:40 UTC (permalink / raw)
To: Santi Béjar; +Cc: Lars Hjemli, Steffen Prohaska, Git Mailing List
On Mon, Feb 11, 2008 at 12:20:37PM +0100, Santi Béjar wrote:
> Additionally git could warn if the limit is reached in the merge.
I think that is reasonable. It feels a little wrong, though, to be
spewing user messages from such a deep library-ish function (and those
messages will probably be lost for commands with a pager). But something
like this?
---
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 3d37725..31941bc 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -468,10 +468,11 @@ void diffcore_rename(struct diff_options *options)
*/
if (rename_limit <= 0 || rename_limit > 32767)
rename_limit = 32767;
- if (num_create > rename_limit && num_src > rename_limit)
- goto cleanup;
- if (num_create * num_src > rename_limit * rename_limit)
+ if ((num_create > rename_limit && num_src > rename_limit) ||
+ (num_create * num_src > rename_limit * rename_limit)) {
+ warning("too many files, skipping inexact rename detection");
goto cleanup;
+ }
mx = xmalloc(sizeof(*mx) * num_create * num_src);
for (dst_cnt = i = 0; i < rename_dst_nr; i++) {
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: limiting rename detection during merge is a really bad idea
2008-02-11 11:20 ` Santi Béjar
2008-02-11 11:40 ` Jeff King
@ 2008-02-11 13:29 ` Steffen Prohaska
1 sibling, 0 replies; 11+ messages in thread
From: Steffen Prohaska @ 2008-02-11 13:29 UTC (permalink / raw)
To: Santi Béjar, Jeff King; +Cc: Lars Hjemli, Git Mailing List
On Feb 11, 2008, at 12:20 PM, Santi Béjar wrote:
>> Steffen, can you tell us how large your rename set is (unfortunately,
>> there is no way to get this information easily; you can stop
>> merge-recursive in the debugger at diffcore-rename.c:457 and look at
>> num_src and num_create). And how long it takes to run with
>> diff.renamelimit turned off?
I don't know the size. I can only say that it took a few
seconds on a MacBook Pro.
>> That might give us a clue where a reasonable value is.
>
> Additionally git could warn if the limit is reached in the merge.
I think this would have saved me. I did not know about the
rename limit. Therefore I did not understand why the merge
failed. If git merge had reported that the rename detection
limit was reached, I probably would have looked up what this
limit is about. At least git should tell when rename detection
gets disabled. Users expect that renames work and git should tell
the user when it does not even try to detect renames.
Personally, I can easily switch to a larger machine if this
helps completing a merge and I think I would do this. But
I also know that I shouldn't expect too much from a larger
machine for an O(n^2) algorithm.
Steffen
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2008-02-11 13:29 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-02-11 6:19 limiting rename detection during merge is a really bad idea Steffen Prohaska
2008-02-11 7:42 ` Marco Costalba
2008-02-11 7:48 ` Jeff King
2008-02-11 7:55 ` Marco Costalba
2008-02-11 8:03 ` Jeff King
2008-02-11 10:41 ` Lars Hjemli
2008-02-11 11:08 ` Jeff King
2008-02-11 11:20 ` Santi Béjar
2008-02-11 11:40 ` Jeff King
2008-02-11 13:29 ` Steffen Prohaska
2008-02-11 11:35 ` Jeff King
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).