* [PATCH] Additional merge-base tests @ 2006-07-04 3:55 A Large Angry SCM 2006-07-04 5:47 ` Junio C Hamano 0 siblings, 1 reply; 26+ messages in thread From: A Large Angry SCM @ 2006-07-04 3:55 UTC (permalink / raw) To: Junio C Hamano, git Signed-off-by: A Large Angry SCM <gitzilla@gmail.com> --- This demonstrates a problem with git-merge-base. t6010-merge-base.sh | 33 +++++++++++++++++++++++++++++++++ 1 files changed, 33 insertions(+) diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh index 1dce123..9a815bd 100755 --- a/t/t6010-merge-base.sh +++ b/t/t6010-merge-base.sh @@ -44,6 +44,31 @@ A=$(doit 1 A $B) G=$(doit 7 G $B $E) H=$(doit 8 H $A $F) +# Setup for second test set +# +# PL PR +# / \/ \ +# L2 C2 R2 +# | | | +# L1 C1 R1 +# | | | +# L0 C0 R0 +# \ | / +# S + +S=$(doit 0 S) +C0=$(doit -3 C0 $S) +L0=$(doit 2 L0 $S) +R0=$(doit 2 R0 $S) +C1=$(doit -2 C1 $C0) +L1=$(doit 3 L1 $L0) +R1=$(doit 3 R1 $R0) +C2=$(doit -1 C2 $C1) +L2=$(doit 4 L2 $L1) +R2=$(doit 4 R2 $R1) +PL=$(doit 1 PL $L2 $C2) +PR=$(doit 1 PR $C2 $R2) + test_expect_success 'compute merge-base (single)' \ 'MB=$(git-merge-base G H) && expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/B"' @@ -56,4 +81,12 @@ test_expect_success 'compute merge-base 'MB=$(git-show-branch --merge-base G H) && expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/B"' +test_expect_success 'compute merge-base (single)' \ + 'MB=$(git-merge-base PL PR) && + expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/C2"' + +test_expect_success 'compute merge-base (all)' \ + 'MB=$(git-merge-base --all PL PR) && + expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/C2"' + test_done ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 3:55 [PATCH] Additional merge-base tests A Large Angry SCM @ 2006-07-04 5:47 ` Junio C Hamano 2006-07-04 6:41 ` A Large Angry SCM 0 siblings, 1 reply; 26+ messages in thread From: Junio C Hamano @ 2006-07-04 5:47 UTC (permalink / raw) To: gitzilla; +Cc: git, torvalds A Large Angry SCM <gitzilla@gmail.com> writes: > This demonstrates a problem with git-merge-base. > > +# Setup for second test set > +# > +# PL PR > +# / \/ \ > +# L2 C2 R2 > +# | | | > +# L1 C1 R1 > +# | | | > +# L0 C0 R0 > +# \ | / > +# S Cute. This is a good demonstration that merge-base may not give you minimal set for pathological cases. If you want to be through you could traverse everything to make sure we do not say 'S' is relevant, but that is quite expensive, so I think there will always be artifacts of horizon effect like this no matter how you try to catch it (didn't I keep saying that already?). However, I do not think it is really a "problem". At least what "merge-base --all" did not miss any, that should be OK. I think the practical way to proceed is to say that the test condition should really check that we do not _omit_ C2 in the merge-base --all output. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 5:47 ` Junio C Hamano @ 2006-07-04 6:41 ` A Large Angry SCM 2006-07-04 7:17 ` Junio C Hamano 2006-07-04 7:45 ` Junio C Hamano 0 siblings, 2 replies; 26+ messages in thread From: A Large Angry SCM @ 2006-07-04 6:41 UTC (permalink / raw) To: Junio C Hamano, git; +Cc: torvalds Junio C Hamano wrote: > A Large Angry SCM <gitzilla@gmail.com> writes: > >> This demonstrates a problem with git-merge-base. >> >> +# Setup for second test set >> +# >> +# PL PR >> +# / \/ \ >> +# L2 C2 R2 >> +# | | | >> +# L1 C1 R1 >> +# | | | >> +# L0 C0 R0 >> +# \ | / >> +# S > > Cute. > > This is a good demonstration that merge-base may not give you > minimal set for pathological cases. If you want to be through > you could traverse everything to make sure we do not say 'S' is > relevant, but that is quite expensive, so I think there will > always be artifacts of horizon effect like this no matter how > you try to catch it (didn't I keep saying that already?). Not _that_ pathological in practice, given that you can't really depend on the timestamps in a distributed SCM. The problem is in mark_reachable_commits(); it is either superfluous or it needs to parse_commit() those commits that haven't been parsed yet that it needs to traverse. > However, I do not think it is really a "problem". At least what > "merge-base --all" did not miss any, that should be OK. The degree of the problem is, admittedly, situational. > I think the practical way to proceed is to say that the test > condition should really check that we do not _omit_ C2 in the > merge-base --all output. I do not believe that the (current) code will miss any bases but it can certainly return bases that are reachable from other bases. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 6:41 ` A Large Angry SCM @ 2006-07-04 7:17 ` Junio C Hamano 2006-07-04 7:45 ` Junio C Hamano 1 sibling, 0 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-04 7:17 UTC (permalink / raw) To: gitzilla; +Cc: git, torvalds A Large Angry SCM <gitzilla@gmail.com> writes: > Junio C Hamano wrote: >> A Large Angry SCM <gitzilla@gmail.com> writes: >> >>> This demonstrates a problem with git-merge-base. >>> +# Setup for second test set >>> +# >>> +# PL PR >>> +# / \/ \ >>> +# L2 C2 R2 >>> +# | | | >>> +# L1 C1 R1 >>> +# | | | >>> +# L0 C0 R0 >>> +# \ | / >>> +# S >>... > Not _that_ pathological in practice, given that you can't really > depend on the timestamps in a distributed SCM. After I looked at the timestamps you assigned to these sequences I fully agree they are not pathological at all. For each "repository owner" who makes a single strand of pearls, time seems to be flowing monotonically. 1 1 / \/ \ 4 -1 4 | | | 3 -2 3 | | | 2 -3 2 \ | / 0 ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 6:41 ` A Large Angry SCM 2006-07-04 7:17 ` Junio C Hamano @ 2006-07-04 7:45 ` Junio C Hamano 2006-07-04 8:23 ` Johannes Schindelin ` (2 more replies) 1 sibling, 3 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-04 7:45 UTC (permalink / raw) To: gitzilla; +Cc: git A Large Angry SCM <gitzilla@gmail.com> writes: >> This is a good demonstration that merge-base may not give you >> minimal set for pathological cases. If you want to be through >> you could traverse everything to make sure we do not say 'S' is >> relevant, but that is quite expensive, so I think there will >> always be artifacts of horizon effect like this no matter how >> you try to catch it (didn't I keep saying that already?). > > The problem is in mark_reachable_commits(); it is either superfluous > or it needs to parse_commit() those commits that haven't been parsed > yet that it needs to traverse. Yes, you could traverse everything. But that is not practical. We have known that the clean-up pass has this horizon effect, and it is a compromise. If you apply this testing patch on top of yours, you will see that parsing more commits at that point makes the clean-up pass go all the way down to the root commit. We may alternatively not use the clean-up pass at all, but I suspect that might give us many false positives. I don't remember the details but I think we added it while fixing merge-base in the real life situation. It may be interesting to run tests on real merges (I believe the kernel repository has a handful merges that have more than one merge bases) to see how effective the current clean-up pass is. It may turn out to be ineffective in practice, in which case we could kill it off. diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh index 9a815bd..4c6ed5c 100755 --- a/t/t6010-merge-base.sh +++ b/t/t6010-merge-base.sh @@ -89,4 +89,33 @@ test_expect_success 'compute merge-base 'MB=$(git-merge-base --all PL PR) && expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/C2"' +# Setup third set +# +# S-U0-U1-U2-U3-U4 +# \ X +# D0-D1-D2-D3-D4 + +U0=$(doit 1 U0 $S) +D0=$(doit 1 D0 $S) +U1=$(doit 2 U1 $U0) +D1=$(doit 2 D1 $D0) +U2=$(doit 3 U2 $U1) +D2=$(doit 3 D2 $D1) +U3=$(doit 4 U3 $U2) +D3=$(doit 4 D3 $D2) +U4=$(doit 5 U4 $U3 $D3) +D4=$(doit 5 D4 $D3 $U3) + +test_expect_success 'compute merge-base' ' + + git merge-base --all U4 D4 >out 2>err + if grep tags/S err + then + echo "went all the way down to S -- very unhappy" + false + else + echo "stopped before going too far" + fi +' + test_done diff --git a/merge-base.c b/merge-base.c index 4856ca0..daab296 100644 --- a/merge-base.c +++ b/merge-base.c @@ -6,8 +6,35 @@ #define PARENT1 1 #define PARENT2 2 #define UNINTERESTING 4 +static void debug_list(struct commit_list *l, const char *msg) +{ + fprintf(stderr, "%s\n", msg); + while (l) { + char buf[1024]; + int parsed; + struct commit *commit = l->item; + l = l->next; + parsed = commit->object.parsed; + + if (parsed) { + pretty_print_commit(CMIT_FMT_ONELINE, commit, + ~0UL, buf, sizeof(buf), 7, + NULL, NULL); + } + else { + sprintf(buf, "git-name-rev %s 1>&2", + sha1_to_hex(commit->object.sha1)); + system(buf); + strcpy(buf, sha1_to_hex(commit->object.sha1)); + } + fprintf(stderr, "%d %d %s\n", commit->object.flags, + parsed, buf); + } +} + static struct commit *interesting(struct commit_list *list) { + debug_list(list, "in interesting()"); while (list) { struct commit *commit = list->item; list = list->next; @@ -134,6 +161,9 @@ static void mark_reachable_commits(struc /* * Postprocess to fully contaminate the well. */ + debug_list(list, "list at top of mark-reachable"); + debug_list(result, "result at top of mark-reachable"); + for (tmp = result; tmp; tmp = tmp->next) { struct commit *c = tmp->item; /* Reinject uninteresting ones to list, @@ -142,10 +172,12 @@ static void mark_reachable_commits(struc if (c->object.flags & UNINTERESTING) commit_list_insert(c, &list); } + while (list) { struct commit *c = list->item; struct commit_list *parents; + debug_list(list, "list in mark-reachable postprocessing"); tmp = list; list = list->next; free(tmp); @@ -155,6 +187,15 @@ static void mark_reachable_commits(struc * parse new ones (we already parsed all the relevant * ones). */ + + /* Parsing object here which is a disaster; + * let's demonstrate it. + */ +#if 1 + if (!c->object.parsed) + parse_commit(c); +#endif + parents = c->parents; while (parents) { struct commit *p = parents->item; @@ -164,6 +205,7 @@ static void mark_reachable_commits(struc commit_list_insert(p, &list); } } + debug_list(result, "result in mark-reachable postprocessing"); } } @@ -196,6 +238,7 @@ static int merge_base(struct commit *rev free(tmp); if (flags == 3) { insert_by_date(commit, &result); + debug_list(result, "a new result"); /* Mark parents of a found merge uninteresting */ flags |= UNINTERESTING; @@ -218,6 +261,7 @@ static int merge_base(struct commit *rev if (result->next && list) mark_reachable_commits(result, list); + debug_list(result, "final result"); while (result) { struct commit *commit = result->item; result = result->next; ^ permalink raw reply related [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 7:45 ` Junio C Hamano @ 2006-07-04 8:23 ` Johannes Schindelin 2006-07-04 10:38 ` Junio C Hamano ` (2 more replies) 2006-07-04 20:08 ` A Large Angry SCM 2006-07-05 0:59 ` Junio C Hamano 2 siblings, 3 replies; 26+ messages in thread From: Johannes Schindelin @ 2006-07-04 8:23 UTC (permalink / raw) To: Junio C Hamano; +Cc: gitzilla, git Hi, On Tue, 4 Jul 2006, Junio C Hamano wrote: > A Large Angry SCM <gitzilla@gmail.com> writes: > > >> This is a good demonstration that merge-base may not give you > >> minimal set for pathological cases. If you want to be through > >> you could traverse everything to make sure we do not say 'S' is > >> relevant, but that is quite expensive, so I think there will > >> always be artifacts of horizon effect like this no matter how > >> you try to catch it (didn't I keep saying that already?). > > > > The problem is in mark_reachable_commits(); it is either superfluous > > or it needs to parse_commit() those commits that haven't been parsed > > yet that it needs to traverse. > > Yes, you could traverse everything. But that is not practical. > We have known that the clean-up pass has this horizon effect, > and it is a compromise. We could introduce a time.maximumSkew variable, and just walk only that much further when traversing the commits. So, if you do not trust your clients to have a proper ntp setup, just say "I trust my peers to be off at most 1 day". That would save lots vs traverse-everything. Ciao, Dscho ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 8:23 ` Johannes Schindelin @ 2006-07-04 10:38 ` Junio C Hamano 2006-07-04 11:16 ` Jakub Narebski 2006-07-04 11:40 ` Johannes Schindelin 2006-07-04 20:18 ` A Large Angry SCM 2006-07-04 21:15 ` Junio C Hamano 2 siblings, 2 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-04 10:38 UTC (permalink / raw) To: Johannes Schindelin; +Cc: gitzilla, git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > We could introduce a time.maximumSkew variable, and just walk only > that much further when traversing the commits. > > So, if you do not trust your clients to have a proper ntp setup, just say > "I trust my peers to be off at most 1 day". That would save lots vs > traverse-everything. The problem ALASCM's example demonstrates does rely on clock skews. The timestamps used in the example looked like this: 1 1 / \/ \ 4 -1 4 | | | 3 -2 3 | | | 2 -3 2 \ | / 0 The crucial clock skew the case relies on is that the tip of the middle branch (-1) is older than the common commit (0). But the topmost commits with timestamp 1 could be with timestamp 5 to correct the clock skew and still make the example "fail". 5 5 / \/ \ 4 -1 4 | | | 3 -2 3 | | | 2 -3 2 \ | / 0 However, I am not sure how you are going to use that maximumSkew variable. The evil owner of the middle branch may have started running a "git am" to commit 4-patch series just when the machine's clock jumped back by 3 seconds, at the pace of 1 patch a second. Then he pushes '0' out on "master" branch, and the three commits on top of that on "next" branch. Two days later, two friends build left and right strands of pearls based on the "master" branch of the evil owner of the middle branch. Maybe they do that one patch a day. On the fifth day, they both merge the "next" branch. The point is that it does not require a very large clock skew to trigger this. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 10:38 ` Junio C Hamano @ 2006-07-04 11:16 ` Jakub Narebski 2006-07-04 11:35 ` Johannes Schindelin 2006-07-04 11:40 ` Johannes Schindelin 1 sibling, 1 reply; 26+ messages in thread From: Jakub Narebski @ 2006-07-04 11:16 UTC (permalink / raw) To: git Junio C Hamano wrote: > The problem ALASCM's example demonstrates does rely on clock > skews. The timestamps used in the example looked like this: > > > 1 1 > / \/ \ > 4 -1 4 > | | | > 3 -2 3 > | | | > 2 -3 2 > \ | / > 0 > > The crucial clock skew the case relies on is that the tip of the > middle branch (-1) is older than the common commit (0). But the > topmost commits with timestamp 1 could be with timestamp 5 to > correct the clock skew and still make the example "fail". > > 5 5 > / \/ \ > 4 -1 4 > | | | > 3 -2 3 > | | | > 2 -3 2 > \ | / > 0 So would putting timestamp for merge be MAX(now, parents timestamps) solve the problem? -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 11:16 ` Jakub Narebski @ 2006-07-04 11:35 ` Johannes Schindelin 0 siblings, 0 replies; 26+ messages in thread From: Johannes Schindelin @ 2006-07-04 11:35 UTC (permalink / raw) To: Jakub Narebski; +Cc: git Hi, On Tue, 4 Jul 2006, Jakub Narebski wrote: > Junio C Hamano wrote: > > > > The problem ALASCM's example demonstrates does rely on clock > > skews. The timestamps used in the example looked like this: > > > > > > 1 1 > > / \/ \ > > 4 -1 4 > > | | | > > 3 -2 3 > > | | | > > 2 -3 2 > > \ | / > > 0 > > > > The crucial clock skew the case relies on is that the tip of the > > middle branch (-1) is older than the common commit (0). But the > > topmost commits with timestamp 1 could be with timestamp 5 to > > correct the clock skew and still make the example "fail". > > > > 5 5 > > / \/ \ > > 4 -1 4 > > | | | > > 3 -2 3 > > | | | > > 2 -3 2 > > \ | / > > 0 > > So would putting timestamp for merge be MAX(now, parents timestamps) > solve the problem? If there is an evil committer, the parents could have bogus timestamps, too. But then, I would not pull from such an evil person... Ciao, Dscho ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 10:38 ` Junio C Hamano 2006-07-04 11:16 ` Jakub Narebski @ 2006-07-04 11:40 ` Johannes Schindelin 1 sibling, 0 replies; 26+ messages in thread From: Johannes Schindelin @ 2006-07-04 11:40 UTC (permalink / raw) To: Junio C Hamano; +Cc: gitzilla, git Hi, On Tue, 4 Jul 2006, Junio C Hamano wrote: > However, I am not sure how you are going to use that maximumSkew > variable. My idea was to continue traversing the merge base's ancestors, marking them UNINTERESTING, until hitting a commit which is maximumSkew older than the merge base (and not just stop at the merge base, as is the case right now, and neither continue traversing in eternity like suggested). This would not help _evil_ cases (i.e. intentional), but most certainly your regular clock skew in a Microsoft network. Ciao, Dscho ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 8:23 ` Johannes Schindelin 2006-07-04 10:38 ` Junio C Hamano @ 2006-07-04 20:18 ` A Large Angry SCM 2006-07-04 21:15 ` Junio C Hamano 2 siblings, 0 replies; 26+ messages in thread From: A Large Angry SCM @ 2006-07-04 20:18 UTC (permalink / raw) To: Johannes Schindelin, git; +Cc: Junio C Hamano Johannes Schindelin wrote: > Hi, > > On Tue, 4 Jul 2006, Junio C Hamano wrote: > >> A Large Angry SCM <gitzilla@gmail.com> writes: >> >>>> This is a good demonstration that merge-base may not give you >>>> minimal set for pathological cases. If you want to be through >>>> you could traverse everything to make sure we do not say 'S' is >>>> relevant, but that is quite expensive, so I think there will >>>> always be artifacts of horizon effect like this no matter how >>>> you try to catch it (didn't I keep saying that already?). >>> The problem is in mark_reachable_commits(); it is either superfluous >>> or it needs to parse_commit() those commits that haven't been parsed >>> yet that it needs to traverse. >> Yes, you could traverse everything. But that is not practical. >> We have known that the clean-up pass has this horizon effect, >> and it is a compromise. > > We could introduce a time.maximumSkew variable, and just walk only > that much further when traversing the commits. > > So, if you do not trust your clients to have a proper ntp setup, just say > "I trust my peers to be off at most 1 day". That would save lots vs > traverse-everything. The fuzz would only serve to mask, even more, that the heuristic is broken. But, it would also allow the (broken) heuristic to be used _and_ let the user decide how much effort may be used to find the correct bases. If this happens, it should be (yet another) user configurable; either, per repository, command line, or both. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 8:23 ` Johannes Schindelin 2006-07-04 10:38 ` Junio C Hamano 2006-07-04 20:18 ` A Large Angry SCM @ 2006-07-04 21:15 ` Junio C Hamano 2006-07-04 22:25 ` Johannes Schindelin 2006-07-05 8:39 ` Josef Weidendorfer 2 siblings, 2 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-04 21:15 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > We could introduce a time.maximumSkew variable, and just walk only > that much further when traversing the commits. We could have had "commit generation number" in the commit object header, and use that instead of commit timestamps for these traversal purposes. The generation number for a commit is defined to be max(generation number of its parents)+1 and we prime the recursive this definition by defining the generation number for the root commit to be one. A moral equivalent alternative would be to notice that the commit timestamp we are going to use to create for a new commit is smaller than one of the commit timestamps of its parent commits and adjust the commit timestamp in such a case by git-commit-tree, perhaps with a warning. These are pretty much water under the bridge by now, for two reasons. One is that I think it is better to make the tools that use get_merge_bases() prepared for the case the function includes suboptimal bases anyway, and the other is once we do that this is not a strong enough justification to modify the commit object format. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 21:15 ` Junio C Hamano @ 2006-07-04 22:25 ` Johannes Schindelin 2006-07-04 22:42 ` Junio C Hamano 2006-07-04 23:07 ` A Large Angry SCM 2006-07-05 8:39 ` Josef Weidendorfer 1 sibling, 2 replies; 26+ messages in thread From: Johannes Schindelin @ 2006-07-04 22:25 UTC (permalink / raw) To: Junio C Hamano; +Cc: git Hi, On Tue, 4 Jul 2006, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > We could introduce a time.maximumSkew variable, and just walk only > > that much further when traversing the commits. > > We could have had "commit generation number" in the commit > object header, and use that instead of commit timestamps for > these traversal purposes. The generation number for a commit is > defined to be max(generation number of its parents)+1 and we > prime the recursive this definition by defining the generation > number for the root commit to be one. Are you really, really sure this is a remedy? I, for one, am quite sure of the opposite. What you propose is just another time scale, only this time, it is not universally true (not even minus local incompetence to keep the clock accurate). If that should be not true, you always could rely on topo order. Which does not seem to solve the problem for you. Ciao, Dscho ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 22:25 ` Johannes Schindelin @ 2006-07-04 22:42 ` Junio C Hamano 2006-07-04 23:07 ` A Large Angry SCM 1 sibling, 0 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-04 22:42 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > If that should be not true, you always could rely on topo order. Which > does not seem to solve the problem for you. The computation of merge-base is about computing topo order cheaply, so that is a recursive definition of the problem, not a solution, I am afraid. With the generation counter, we know the clean-up phase needs to parse and traverse unparsed parents with the same or higher generation counter than the lowest we have in the result list, which would limit our clean-up traversal. In order to look at the generation number of parent, we would need to parse it, so we would end up parsing one level more than needed at the edge, though. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 22:25 ` Johannes Schindelin 2006-07-04 22:42 ` Junio C Hamano @ 2006-07-04 23:07 ` A Large Angry SCM 2006-07-04 23:14 ` Jakub Narebski ` (2 more replies) 1 sibling, 3 replies; 26+ messages in thread From: A Large Angry SCM @ 2006-07-04 23:07 UTC (permalink / raw) To: Johannes Schindelin, git; +Cc: Junio C Hamano Johannes Schindelin wrote: > Hi, > > On Tue, 4 Jul 2006, Junio C Hamano wrote: > >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> >>> We could introduce a time.maximumSkew variable, and just walk only >>> that much further when traversing the commits. >> We could have had "commit generation number" in the commit >> object header, and use that instead of commit timestamps for >> these traversal purposes. The generation number for a commit is >> defined to be max(generation number of its parents)+1 and we >> prime the recursive this definition by defining the generation >> number for the root commit to be one. > > Are you really, really sure this is a remedy? I, for one, am quite sure of > the opposite. What you propose is just another time scale, only this time, > it is not universally true (not even minus local incompetence to keep the > clock accurate). It works[*] and it does what using the timestamp was trying to do. Namely, work from "more recent" (or "closer") commits toward "older" (or "farther") commits until you've gone past the point you care about. It's a little late to be changing the structure of a commit and you'd have to deal with some size/scale issues, but it's do-able. A better idea may be to generate and keep the generation number on a per repository basis, and you'd be able to work around changing grafts. [*] Grafts do _really_ nasty things to this. Just like clock skew does now. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 23:07 ` A Large Angry SCM @ 2006-07-04 23:14 ` Jakub Narebski 2006-07-04 23:39 ` Junio C Hamano 2006-07-05 0:26 ` A Large Angry SCM 2006-07-05 0:24 ` Junio C Hamano 2006-07-05 7:56 ` Johannes Schindelin 2 siblings, 2 replies; 26+ messages in thread From: Jakub Narebski @ 2006-07-04 23:14 UTC (permalink / raw) To: git A Large Angry SCM wrote: > It works[*] and it does what using the timestamp was trying to do. > Namely, work from "more recent" (or "closer") commits toward "older" (or > "farther") commits until you've gone past the point you care about. > > It's a little late to be changing the structure of a commit and you'd > have to deal with some size/scale issues, but it's do-able. A better > idea may be to generate and keep the generation number on a per > repository basis, and you'd be able to work around changing grafts. What about timestamp = MAX(now(), timestamps of parents) idea, which doesn't need changing the structure of a commit? -- Jakub Narebski Warsaw, Poland ShadeHawk on #git ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 23:14 ` Jakub Narebski @ 2006-07-04 23:39 ` Junio C Hamano 2006-07-05 0:26 ` A Large Angry SCM 1 sibling, 0 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-04 23:39 UTC (permalink / raw) To: jnareb; +Cc: git Jakub Narebski <jnareb@gmail.com> writes: > What about timestamp = MAX(now(), timestamps of parents) idea, which > doesn't need changing the structure of a commit? It changes the semantics without change syntax, which is not any better (and in fact I think it is worse). ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 23:14 ` Jakub Narebski 2006-07-04 23:39 ` Junio C Hamano @ 2006-07-05 0:26 ` A Large Angry SCM 1 sibling, 0 replies; 26+ messages in thread From: A Large Angry SCM @ 2006-07-05 0:26 UTC (permalink / raw) To: Jakub Narebski, git Jakub Narebski wrote: > A Large Angry SCM wrote: > >> It works[*] and it does what using the timestamp was trying to do. >> Namely, work from "more recent" (or "closer") commits toward "older" (or >> "farther") commits until you've gone past the point you care about. >> >> It's a little late to be changing the structure of a commit and you'd >> have to deal with some size/scale issues, but it's do-able. A better >> idea may be to generate and keep the generation number on a per >> repository basis, and you'd be able to work around changing grafts. > > What about timestamp = MAX(now(), timestamps of parents) idea, which > doesn't need changing the structure of a commit? > So, do you really want your name as committer on a commit with a date 300 years in the future because one of the parents had a bad date? ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 23:07 ` A Large Angry SCM 2006-07-04 23:14 ` Jakub Narebski @ 2006-07-05 0:24 ` Junio C Hamano 2006-07-05 7:56 ` Johannes Schindelin 2 siblings, 0 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-05 0:24 UTC (permalink / raw) To: gitzilla, Johannes Schindelin; +Cc: git A Large Angry SCM <gitzilla@gmail.com> writes: > > It works[*] and it does what using the timestamp was trying to > do. Namely, work from "more recent" (or "closer") commits toward > "older" (or "farther") commits until you've gone past the point you > care about. If you really really care, now we have clear_commit_marks() with get_merge_bases() infrastructure in, you _could_ run another round of get_merge_bases() on the commit on the result list to see which ones are reachable from others by performing an equivalent of fast-forward/already-up-to-date check. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 23:07 ` A Large Angry SCM 2006-07-04 23:14 ` Jakub Narebski 2006-07-05 0:24 ` Junio C Hamano @ 2006-07-05 7:56 ` Johannes Schindelin 2006-07-05 16:15 ` A Large Angry SCM 2 siblings, 1 reply; 26+ messages in thread From: Johannes Schindelin @ 2006-07-05 7:56 UTC (permalink / raw) To: A Large Angry SCM; +Cc: git, Junio C Hamano Hi, On Tue, 4 Jul 2006, A Large Angry SCM wrote: > Johannes Schindelin wrote: > > Hi, > > > > On Tue, 4 Jul 2006, Junio C Hamano wrote: > > > > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > > > > > We could introduce a time.maximumSkew variable, and just walk only that > > > > much further when traversing the commits. > > > We could have had "commit generation number" in the commit > > > object header, and use that instead of commit timestamps for > > > these traversal purposes. The generation number for a commit is > > > defined to be max(generation number of its parents)+1 and we > > > prime the recursive this definition by defining the generation > > > number for the root commit to be one. > > > > Are you really, really sure this is a remedy? I, for one, am quite sure of > > the opposite. What you propose is just another time scale, only this time, > > it is not universally true (not even minus local incompetence to keep the > > clock accurate). > > It works[*] and it does what using the timestamp was trying to do. Namely, > work from "more recent" (or "closer") commits toward "older" (or "farther") > commits until you've gone past the point you care about. > > It's a little late to be changing the structure of a commit and you'd have to > deal with some size/scale issues, but it's do-able. A better idea may be to > generate and keep the generation number on a per repository basis, and you'd > be able to work around changing grafts. Like, inside the cache? I dunno. IMHO it is way too late to change the structure of a commit in that particular manner, _plus_ you would get overflow issues. > [*] Grafts do _really_ nasty things to this. Just like clock skew does now. Grafts can do much nastier things to you, for example having a circular history. _But_ they cannot do that nasty thing outside of your repo. Clock skews can. Ciao, Dscho ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-05 7:56 ` Johannes Schindelin @ 2006-07-05 16:15 ` A Large Angry SCM 2006-07-05 17:04 ` Johannes Schindelin 0 siblings, 1 reply; 26+ messages in thread From: A Large Angry SCM @ 2006-07-05 16:15 UTC (permalink / raw) To: Johannes Schindelin; +Cc: git, Junio C Hamano Johannes Schindelin wrote: > Hi, > > On Tue, 4 Jul 2006, A Large Angry SCM wrote: > >> Johannes Schindelin wrote: >>> Hi, >>> >>> On Tue, 4 Jul 2006, Junio C Hamano wrote: >>> >>>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >>>> >>>>> We could introduce a time.maximumSkew variable, and just walk only that >>>>> much further when traversing the commits. >>>> We could have had "commit generation number" in the commit >>>> object header, and use that instead of commit timestamps for >>>> these traversal purposes. The generation number for a commit is >>>> defined to be max(generation number of its parents)+1 and we >>>> prime the recursive this definition by defining the generation >>>> number for the root commit to be one. >>> Are you really, really sure this is a remedy? I, for one, am quite sure of >>> the opposite. What you propose is just another time scale, only this time, >>> it is not universally true (not even minus local incompetence to keep the >>> clock accurate). >> It works[*] and it does what using the timestamp was trying to do. Namely, >> work from "more recent" (or "closer") commits toward "older" (or "farther") >> commits until you've gone past the point you care about. >> >> It's a little late to be changing the structure of a commit and you'd have to >> deal with some size/scale issues, but it's do-able. A better idea may be to >> generate and keep the generation number on a per repository basis, and you'd >> be able to work around changing grafts. > > Like, inside the cache? I dunno. IMHO it is way too late to change the > structure of a commit in that particular manner, _plus_ you would get > overflow issues. Your don't need to change the commit object, create some repository specific, local, auxiliary information. Overflow should not be a problem until a path length to a root commit exceeds the machine word size. But is it really worth the work? Does it help anything other than merge-base? >> [*] Grafts do _really_ nasty things to this. Just like clock skew does now. > > Grafts can do much nastier things to you, for example having a circular > history. _But_ they cannot do that nasty thing outside of your repo. Clock > skews can. If grafts in your repository create a cycle, the misbehavior of merge-base should be among the least of your concerns. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-05 16:15 ` A Large Angry SCM @ 2006-07-05 17:04 ` Johannes Schindelin 0 siblings, 0 replies; 26+ messages in thread From: Johannes Schindelin @ 2006-07-05 17:04 UTC (permalink / raw) To: A Large Angry SCM; +Cc: git, Junio C Hamano Hi, On Wed, 5 Jul 2006, A Large Angry SCM wrote: > If grafts in your repository create a cycle, the misbehavior of merge-base > should be among the least of your concerns. Right. BUT grafts can be very helpful to connect branches which were independently imported into git. And in these cases, the clockSkew really is no clockSkew. But in that case, the generation number would have to be recalculated also. Anyway, I think that it should be a configurable, which defaults to off, i.e. in the normal case merge-base should behave as it does now. And if we have that configurable, we might as well take the safe but dumb approach to just traverse _everything_. Ciao, Dscho ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 21:15 ` Junio C Hamano 2006-07-04 22:25 ` Johannes Schindelin @ 2006-07-05 8:39 ` Josef Weidendorfer 2006-07-05 14:28 ` A Large Angry SCM 1 sibling, 1 reply; 26+ messages in thread From: Josef Weidendorfer @ 2006-07-05 8:39 UTC (permalink / raw) To: Junio C Hamano; +Cc: Johannes Schindelin, git On Tuesday 04 July 2006 23:15, Junio C Hamano wrote: > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > > We could introduce a time.maximumSkew variable, and just walk only > > that much further when traversing the commits. > > We could have had "commit generation number" in the commit > object header, and use that instead of commit timestamps for > these traversal purposes. Isn't this "commit generation number" information that can be regenerated on the fly, i.e. a perfect fit for data to be stored in a persistant cache, e.g. in ".git/tmp/virtual-commit-timestamps"? Josef ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-05 8:39 ` Josef Weidendorfer @ 2006-07-05 14:28 ` A Large Angry SCM 0 siblings, 0 replies; 26+ messages in thread From: A Large Angry SCM @ 2006-07-05 14:28 UTC (permalink / raw) To: Josef Weidendorfer, git; +Cc: Junio C Hamano, Johannes Schindelin Josef Weidendorfer wrote: > On Tuesday 04 July 2006 23:15, Junio C Hamano wrote: >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> >>> We could introduce a time.maximumSkew variable, and just walk only >>> that much further when traversing the commits. >> We could have had "commit generation number" in the commit >> object header, and use that instead of commit timestamps for >> these traversal purposes. > > Isn't this "commit generation number" information that can be > regenerated on the fly, i.e. a perfect fit for data to be stored > in a persistant cache, e.g. in ".git/tmp/virtual-commit-timestamps"? Yes ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 7:45 ` Junio C Hamano 2006-07-04 8:23 ` Johannes Schindelin @ 2006-07-04 20:08 ` A Large Angry SCM 2006-07-05 0:59 ` Junio C Hamano 2 siblings, 0 replies; 26+ messages in thread From: A Large Angry SCM @ 2006-07-04 20:08 UTC (permalink / raw) To: Junio C Hamano, git Junio C Hamano wrote: > A Large Angry SCM <gitzilla@gmail.com> writes: > >>> This is a good demonstration that merge-base may not give you >>> minimal set for pathological cases. If you want to be through >>> you could traverse everything to make sure we do not say 'S' is >>> relevant, but that is quite expensive, so I think there will >>> always be artifacts of horizon effect like this no matter how >>> you try to catch it (didn't I keep saying that already?). >> The problem is in mark_reachable_commits(); it is either superfluous >> or it needs to parse_commit() those commits that haven't been parsed >> yet that it needs to traverse. > > Yes, you could traverse everything. But that is not practical. > We have known that the clean-up pass has this horizon effect, > and it is a compromise. The clean-up pass was devised to eliminate bases that are reachable from other bases. It just doesn't look hard enough. > If you apply this testing patch on top of yours, you will see > that parsing more commits at that point makes the clean-up > pass go all the way down to the root commit. Yes, I was aware of graphs that would have that behavior. The root of the problem is that the heuristic, that attempts to use timestamps to detect that a commit is _not_ reachable from a given commit, relies on the timestamps of commits with a reachability relationship to have a relationship that matches the graph. > We may alternatively not use the clean-up pass at all, but I > suspect that might give us many false positives. I don't > remember the details but I think we added it while fixing > merge-base in the real life situation. The history of the clean-up pass is that before it was added, git-merge-base was returning a base reachable from another base, and the base returned was, in some significant way, worse for merging. My construct demonstrates that the clean-up pass only deals with special case. > It may be interesting to run tests on real merges (I believe the > kernel repository has a handful merges that have more than one > merge bases) to see how effective the current clean-up pass is. > It may turn out to be ineffective in practice, in which case we > could kill it off. Although a very important set of repositories to Git, the linux kernel repositories may no longer be representative of the diversity of Git use. Still, it would be interesting to know the outcome. ^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: [PATCH] Additional merge-base tests 2006-07-04 7:45 ` Junio C Hamano 2006-07-04 8:23 ` Johannes Schindelin 2006-07-04 20:08 ` A Large Angry SCM @ 2006-07-05 0:59 ` Junio C Hamano 2 siblings, 0 replies; 26+ messages in thread From: Junio C Hamano @ 2006-07-05 0:59 UTC (permalink / raw) To: git Junio C Hamano <junkio@cox.net> writes: > It may be interesting to run tests on real merges (I believe the > kernel repository has a handful merges that have more than one > merge bases) to see how effective the current clean-up pass is. > It may turn out to be ineffective in practice, in which case we > could kill it off. So I counted. There are 23 commits in the kernel history that have more than one merge-bases. The current merge-base code tells us that all of them have two merge-bases. None of them suffers from the horizon effect; the two bases are not ancestor/descendant of each other. A good news is that get_merge_bases() gives the same answer without the clean-up pass mark_reachable_commits(). d0e5f39f1ee2e55d140064bb6d74c8bad25d71d0 361ea93cbff0e42cbc6a0f3c7a8238db9ed15648 4b2d9cf00962d0a0e697f887f3ecaa155cbde555 ba9b28d19a3251bb1dfe6a6f8cc89b96fb85f683 db21e578e551421d76641d72cb3f8296ed3f9e61 b425c8c5922562c562dc55a636c3c8d758ed6d17 2e9ff56efbc005ab2b92b68df65940c7459446c6 75e47b36004d136edff68295420424cba3a5ccd0 c45ae87ec9d03c9adfc466a6b560cb38b154813a 09e4f9029da1b53e835555c353a89c36b71233b0 0b310f36d7d96e27f6941ec0f9b95e15142f1e78 db9ace7083dbdcc3d02bdd6a1d26132c80b5b726 80c7af4074cbb4cb6be5d35c443ea6d5e8838a84 701db69d6647f61e4660c9102d7f2fd5dffc203d 5e3c2b95dd560baa41b08c8f5f00bbd6fbeebdcb c7fb577e2a6cb04732541f2dc402bd46747f7558 ba9b543d5bec0a7605952e2ba501fb8b0f3b6407 84ffa747520edd4556b136bdfc9df9eb1673ce12 da28c12089dfcfb8695b6b555cdb8e03dda2b690 3190186362466658f01b2e354e639378ce07e1a9 0c168775709faa74c1b87f1e61046e0c51ade7f3 0e396ee43e445cb7c215a98da4e76d0ce354d9d7 467ca22d3371f132ee225a5591a1ed0cd518cb3d ^ permalink raw reply [flat|nested] 26+ messages in thread
end of thread, other threads:[~2006-07-05 17:05 UTC | newest] Thread overview: 26+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-07-04 3:55 [PATCH] Additional merge-base tests A Large Angry SCM 2006-07-04 5:47 ` Junio C Hamano 2006-07-04 6:41 ` A Large Angry SCM 2006-07-04 7:17 ` Junio C Hamano 2006-07-04 7:45 ` Junio C Hamano 2006-07-04 8:23 ` Johannes Schindelin 2006-07-04 10:38 ` Junio C Hamano 2006-07-04 11:16 ` Jakub Narebski 2006-07-04 11:35 ` Johannes Schindelin 2006-07-04 11:40 ` Johannes Schindelin 2006-07-04 20:18 ` A Large Angry SCM 2006-07-04 21:15 ` Junio C Hamano 2006-07-04 22:25 ` Johannes Schindelin 2006-07-04 22:42 ` Junio C Hamano 2006-07-04 23:07 ` A Large Angry SCM 2006-07-04 23:14 ` Jakub Narebski 2006-07-04 23:39 ` Junio C Hamano 2006-07-05 0:26 ` A Large Angry SCM 2006-07-05 0:24 ` Junio C Hamano 2006-07-05 7:56 ` Johannes Schindelin 2006-07-05 16:15 ` A Large Angry SCM 2006-07-05 17:04 ` Johannes Schindelin 2006-07-05 8:39 ` Josef Weidendorfer 2006-07-05 14:28 ` A Large Angry SCM 2006-07-04 20:08 ` A Large Angry SCM 2006-07-05 0:59 ` Junio C Hamano
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).