git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Additional merge-base tests
@ 2006-07-04  3:55 A Large Angry SCM
  2006-07-04  5:47 ` Junio C Hamano
  0 siblings, 1 reply; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-04  3:55 UTC (permalink / raw)
  To: Junio C Hamano, git

Signed-off-by: A Large Angry SCM <gitzilla@gmail.com>
---
This demonstrates a problem with git-merge-base.

 t6010-merge-base.sh |   33 +++++++++++++++++++++++++++++++++
 1 files changed, 33 insertions(+)

diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
index 1dce123..9a815bd 100755
--- a/t/t6010-merge-base.sh
+++ b/t/t6010-merge-base.sh
@@ -44,6 +44,31 @@ A=$(doit 1 A $B)
 G=$(doit 7 G $B $E)
 H=$(doit 8 H $A $F)
 
+# Setup for second test set
+#
+#   PL  PR
+#  /  \/  \
+# L2  C2  R2
+# |   |   |
+# L1  C1  R1
+# |   |   |
+# L0  C0  R0
+#   \ |  /
+#     S
+
+S=$(doit  0 S)
+C0=$(doit -3 C0 $S)
+L0=$(doit  2 L0 $S)
+R0=$(doit  2 R0 $S)
+C1=$(doit -2 C1 $C0)
+L1=$(doit  3 L1 $L0)
+R1=$(doit  3 R1 $R0)
+C2=$(doit -1 C2 $C1)
+L2=$(doit  4 L2 $L1)
+R2=$(doit  4 R2 $R1)
+PL=$(doit  1 PL $L2 $C2)
+PR=$(doit  1 PR $C2 $R2)
+
 test_expect_success 'compute merge-base (single)' \
     'MB=$(git-merge-base G H) &&
      expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/B"'
@@ -56,4 +81,12 @@ test_expect_success 'compute merge-base 
     'MB=$(git-show-branch --merge-base G H) &&
      expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/B"'
 
+test_expect_success 'compute merge-base (single)' \
+    'MB=$(git-merge-base PL PR) &&
+     expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/C2"'
+
+test_expect_success 'compute merge-base (all)' \
+    'MB=$(git-merge-base --all PL PR) &&
+     expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/C2"'
+
 test_done

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  3:55 [PATCH] Additional merge-base tests A Large Angry SCM
@ 2006-07-04  5:47 ` Junio C Hamano
  2006-07-04  6:41   ` A Large Angry SCM
  0 siblings, 1 reply; 26+ messages in thread
From: Junio C Hamano @ 2006-07-04  5:47 UTC (permalink / raw)
  To: gitzilla; +Cc: git, torvalds

A Large Angry SCM <gitzilla@gmail.com> writes:

> This demonstrates a problem with git-merge-base.
>  
> +# Setup for second test set
> +#
> +#   PL  PR
> +#  /  \/  \
> +# L2  C2  R2
> +# |   |   |
> +# L1  C1  R1
> +# |   |   |
> +# L0  C0  R0
> +#   \ |  /
> +#     S

Cute.

This is a good demonstration that merge-base may not give you
minimal set for pathological cases.  If you want to be through
you could traverse everything to make sure we do not say 'S' is
relevant, but that is quite expensive, so I think there will
always be artifacts of horizon effect like this no matter how
you try to catch it (didn't I keep saying that already?).

However, I do not think it is really a "problem".  At least what
"merge-base --all" did not miss any, that should be OK.

I think the practical way to proceed is to say that the test
condition should really check that we do not _omit_ C2 in the
merge-base --all output.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  5:47 ` Junio C Hamano
@ 2006-07-04  6:41   ` A Large Angry SCM
  2006-07-04  7:17     ` Junio C Hamano
  2006-07-04  7:45     ` Junio C Hamano
  0 siblings, 2 replies; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-04  6:41 UTC (permalink / raw)
  To: Junio C Hamano, git; +Cc: torvalds

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
> 
>> This demonstrates a problem with git-merge-base.
>>  
>> +# Setup for second test set
>> +#
>> +#   PL  PR
>> +#  /  \/  \
>> +# L2  C2  R2
>> +# |   |   |
>> +# L1  C1  R1
>> +# |   |   |
>> +# L0  C0  R0
>> +#   \ |  /
>> +#     S
> 
> Cute.
> 
> This is a good demonstration that merge-base may not give you
> minimal set for pathological cases.  If you want to be through
> you could traverse everything to make sure we do not say 'S' is
> relevant, but that is quite expensive, so I think there will
> always be artifacts of horizon effect like this no matter how
> you try to catch it (didn't I keep saying that already?).

Not _that_ pathological in practice, given that you can't really depend 
on the timestamps in a distributed SCM.

The problem is in mark_reachable_commits(); it is either superfluous or 
it needs to parse_commit() those commits that haven't been parsed yet 
that it needs to traverse.

> However, I do not think it is really a "problem".  At least what
> "merge-base --all" did not miss any, that should be OK.

The degree of the problem is, admittedly, situational.

> I think the practical way to proceed is to say that the test
> condition should really check that we do not _omit_ C2 in the
> merge-base --all output.

I do not believe that the (current) code will miss any bases but it can 
certainly return bases that are reachable from other bases.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  6:41   ` A Large Angry SCM
@ 2006-07-04  7:17     ` Junio C Hamano
  2006-07-04  7:45     ` Junio C Hamano
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-04  7:17 UTC (permalink / raw)
  To: gitzilla; +Cc: git, torvalds

A Large Angry SCM <gitzilla@gmail.com> writes:

> Junio C Hamano wrote:
>> A Large Angry SCM <gitzilla@gmail.com> writes:
>>
>>> This demonstrates a problem with git-merge-base.
>>>  +# Setup for second test set
>>> +#
>>> +#   PL  PR
>>> +#  /  \/  \
>>> +# L2  C2  R2
>>> +# |   |   |
>>> +# L1  C1  R1
>>> +# |   |   |
>>> +# L0  C0  R0
>>> +#   \ |  /
>>> +#     S
>>...
> Not _that_ pathological in practice, given that you can't really
> depend on the timestamps in a distributed SCM.

After I looked at the timestamps you assigned to these sequences
I fully agree they are not pathological at all.  For each
"repository owner" who makes a single strand of pearls, time
seems to be flowing monotonically.

   1   1
  /  \/  \
 4  -1   4
 |   |   |
 3  -2   3
 |   |   |
 2  -3   2
   \ |  /
     0

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  6:41   ` A Large Angry SCM
  2006-07-04  7:17     ` Junio C Hamano
@ 2006-07-04  7:45     ` Junio C Hamano
  2006-07-04  8:23       ` Johannes Schindelin
                         ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-04  7:45 UTC (permalink / raw)
  To: gitzilla; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

>> This is a good demonstration that merge-base may not give you
>> minimal set for pathological cases.  If you want to be through
>> you could traverse everything to make sure we do not say 'S' is
>> relevant, but that is quite expensive, so I think there will
>> always be artifacts of horizon effect like this no matter how
>> you try to catch it (didn't I keep saying that already?).
>
> The problem is in mark_reachable_commits(); it is either superfluous
> or it needs to parse_commit() those commits that haven't been parsed
> yet that it needs to traverse.

Yes, you could traverse everything.  But that is not practical.
We have known that the clean-up pass has this horizon effect,
and it is a compromise.

If you apply this testing patch on top of yours, you will see
that parsing more commits at that point makes the clean-up
pass go all the way down to the root commit.

We may alternatively not use the clean-up pass at all, but I
suspect that might give us many false positives.  I don't
remember the details but I think we added it while fixing
merge-base in the real life situation.

It may be interesting to run tests on real merges (I believe the
kernel repository has a handful merges that have more than one
merge bases) to see how effective the current clean-up pass is.
It may turn out to be ineffective in practice, in which case we
could kill it off.


diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
index 9a815bd..4c6ed5c 100755
--- a/t/t6010-merge-base.sh
+++ b/t/t6010-merge-base.sh
@@ -89,4 +89,33 @@ test_expect_success 'compute merge-base 
     'MB=$(git-merge-base --all PL PR) &&
      expr "$(git-name-rev "$MB")" : "[0-9a-f]* tags/C2"'
 
+# Setup third set
+# 
+# S-U0-U1-U2-U3-U4
+#  \           X
+#   D0-D1-D2-D3-D4
+
+U0=$(doit 1 U0 $S)
+D0=$(doit 1 D0 $S)
+U1=$(doit 2 U1 $U0)
+D1=$(doit 2 D1 $D0)
+U2=$(doit 3 U2 $U1)
+D2=$(doit 3 D2 $D1)
+U3=$(doit 4 U3 $U2)
+D3=$(doit 4 D3 $D2)
+U4=$(doit 5 U4 $U3 $D3)
+D4=$(doit 5 D4 $D3 $U3)
+
+test_expect_success 'compute merge-base' '
+
+	git merge-base --all U4 D4 >out 2>err 
+	if grep tags/S err
+	then
+		echo "went all the way down to S -- very unhappy"
+		false
+	else
+		echo "stopped before going too far"
+	fi
+'
+
 test_done
diff --git a/merge-base.c b/merge-base.c
index 4856ca0..daab296 100644
--- a/merge-base.c
+++ b/merge-base.c
@@ -6,8 +6,35 @@ #define PARENT1 1
 #define PARENT2 2
 #define UNINTERESTING 4
 
+static void debug_list(struct commit_list *l, const char *msg)
+{
+	fprintf(stderr, "%s\n", msg);
+	while (l) {
+		char buf[1024];
+		int parsed;
+		struct commit *commit = l->item;
+		l = l->next;
+		parsed = commit->object.parsed;
+
+		if (parsed) {
+			pretty_print_commit(CMIT_FMT_ONELINE, commit,
+					    ~0UL, buf, sizeof(buf), 7,
+					    NULL, NULL);
+		}
+		else {
+			sprintf(buf, "git-name-rev %s 1>&2",
+				sha1_to_hex(commit->object.sha1));
+			system(buf);
+			strcpy(buf, sha1_to_hex(commit->object.sha1));
+		}
+		fprintf(stderr, "%d %d %s\n", commit->object.flags,
+			parsed, buf);
+	}
+}
+
 static struct commit *interesting(struct commit_list *list)
 {
+	debug_list(list, "in interesting()");
 	while (list) {
 		struct commit *commit = list->item;
 		list = list->next;
@@ -134,6 +161,9 @@ static void mark_reachable_commits(struc
 	/*
 	 * Postprocess to fully contaminate the well.
 	 */
+	debug_list(list, "list at top of mark-reachable");
+	debug_list(result, "result at top of mark-reachable");
+
 	for (tmp = result; tmp; tmp = tmp->next) {
 		struct commit *c = tmp->item;
 		/* Reinject uninteresting ones to list,
@@ -142,10 +172,12 @@ static void mark_reachable_commits(struc
 		if (c->object.flags & UNINTERESTING)
 			commit_list_insert(c, &list);
 	}
+
 	while (list) {
 		struct commit *c = list->item;
 		struct commit_list *parents;
 
+		debug_list(list, "list in mark-reachable postprocessing");
 		tmp = list;
 		list = list->next;
 		free(tmp);
@@ -155,6 +187,15 @@ static void mark_reachable_commits(struc
 		 * parse new ones (we already parsed all the relevant
 		 * ones).
 		 */
+		
+		/* Parsing object here which is a disaster;
+		 * let's demonstrate it.
+		 */
+#if 1
+		if (!c->object.parsed)
+			parse_commit(c);
+#endif
+
 		parents = c->parents;
 		while (parents) {
 			struct commit *p = parents->item;
@@ -164,6 +205,7 @@ static void mark_reachable_commits(struc
 				commit_list_insert(p, &list);
 			}
 		}
+		debug_list(result, "result in mark-reachable postprocessing");
 	}
 }
 
@@ -196,6 +238,7 @@ static int merge_base(struct commit *rev
 		free(tmp);
 		if (flags == 3) {
 			insert_by_date(commit, &result);
+			debug_list(result, "a new result");
 
 			/* Mark parents of a found merge uninteresting */
 			flags |= UNINTERESTING;
@@ -218,6 +261,7 @@ static int merge_base(struct commit *rev
 	if (result->next && list)
 		mark_reachable_commits(result, list);
 
+	debug_list(result, "final result");
 	while (result) {
 		struct commit *commit = result->item;
 		result = result->next;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  7:45     ` Junio C Hamano
@ 2006-07-04  8:23       ` Johannes Schindelin
  2006-07-04 10:38         ` Junio C Hamano
                           ` (2 more replies)
  2006-07-04 20:08       ` A Large Angry SCM
  2006-07-05  0:59       ` Junio C Hamano
  2 siblings, 3 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-07-04  8:23 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: gitzilla, git

Hi,

On Tue, 4 Jul 2006, Junio C Hamano wrote:

> A Large Angry SCM <gitzilla@gmail.com> writes:
> 
> >> This is a good demonstration that merge-base may not give you
> >> minimal set for pathological cases.  If you want to be through
> >> you could traverse everything to make sure we do not say 'S' is
> >> relevant, but that is quite expensive, so I think there will
> >> always be artifacts of horizon effect like this no matter how
> >> you try to catch it (didn't I keep saying that already?).
> >
> > The problem is in mark_reachable_commits(); it is either superfluous
> > or it needs to parse_commit() those commits that haven't been parsed
> > yet that it needs to traverse.
> 
> Yes, you could traverse everything.  But that is not practical.
> We have known that the clean-up pass has this horizon effect,
> and it is a compromise.

We could introduce a time.maximumSkew variable, and just walk only 
that much further when traversing the commits.

So, if you do not trust your clients to have a proper ntp setup, just say 
"I trust my peers to be off at most 1 day". That would save lots vs 
traverse-everything.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  8:23       ` Johannes Schindelin
@ 2006-07-04 10:38         ` Junio C Hamano
  2006-07-04 11:16           ` Jakub Narebski
  2006-07-04 11:40           ` Johannes Schindelin
  2006-07-04 20:18         ` A Large Angry SCM
  2006-07-04 21:15         ` Junio C Hamano
  2 siblings, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-04 10:38 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: gitzilla, git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> We could introduce a time.maximumSkew variable, and just walk only 
> that much further when traversing the commits.
>
> So, if you do not trust your clients to have a proper ntp setup, just say 
> "I trust my peers to be off at most 1 day". That would save lots vs 
> traverse-everything.

The problem ALASCM's example demonstrates does rely on clock
skews.  The timestamps used in the example looked like this:


   1   1
  /  \/  \
 4  -1   4
 |   |   |
 3  -2   3
 |   |   |
 2  -3   2
   \ |  /
     0

The crucial clock skew the case relies on is that the tip of the
middle branch (-1) is older than the common commit (0).  But the
topmost commits with timestamp 1 could be with timestamp 5 to
correct the clock skew and still make the example "fail".

   5   5
  /  \/  \
 4  -1   4
 |   |   |
 3  -2   3
 |   |   |
 2  -3   2
   \ |  /
     0

However, I am not sure how you are going to use that maximumSkew
variable.  The evil owner of the middle branch may have started
running a "git am" to commit 4-patch series just when the
machine's clock jumped back by 3 seconds, at the pace of 1 patch
a second.  Then he pushes '0' out on "master" branch, and the
three commits on top of that on "next" branch.

Two days later, two friends build left and right strands of
pearls based on the "master" branch of the evil owner of the
middle branch.  Maybe they do that one patch a day.  On the
fifth day, they both merge the "next" branch.

The point is that it does not require a very large clock skew to
trigger this.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 10:38         ` Junio C Hamano
@ 2006-07-04 11:16           ` Jakub Narebski
  2006-07-04 11:35             ` Johannes Schindelin
  2006-07-04 11:40           ` Johannes Schindelin
  1 sibling, 1 reply; 26+ messages in thread
From: Jakub Narebski @ 2006-07-04 11:16 UTC (permalink / raw)
  To: git

Junio C Hamano wrote:


> The problem ALASCM's example demonstrates does rely on clock
> skews.  The timestamps used in the example looked like this:
> 
> 
>    1   1
>   /  \/  \
>  4  -1   4
>  |   |   |
>  3  -2   3
>  |   |   |
>  2  -3   2
>    \ |  /
>      0
> 
> The crucial clock skew the case relies on is that the tip of the
> middle branch (-1) is older than the common commit (0).  But the
> topmost commits with timestamp 1 could be with timestamp 5 to
> correct the clock skew and still make the example "fail".
> 
>    5   5
>   /  \/  \
>  4  -1   4
>  |   |   |
>  3  -2   3
>  |   |   |
>  2  -3   2
>    \ |  /
>      0

So would putting timestamp for merge be MAX(now, parents timestamps)
solve the problem?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 11:16           ` Jakub Narebski
@ 2006-07-04 11:35             ` Johannes Schindelin
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-07-04 11:35 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git

Hi,

On Tue, 4 Jul 2006, Jakub Narebski wrote:

> Junio C Hamano wrote:
> 
> 
> > The problem ALASCM's example demonstrates does rely on clock
> > skews.  The timestamps used in the example looked like this:
> > 
> > 
> >    1   1
> >   /  \/  \
> >  4  -1   4
> >  |   |   |
> >  3  -2   3
> >  |   |   |
> >  2  -3   2
> >    \ |  /
> >      0
> > 
> > The crucial clock skew the case relies on is that the tip of the
> > middle branch (-1) is older than the common commit (0).  But the
> > topmost commits with timestamp 1 could be with timestamp 5 to
> > correct the clock skew and still make the example "fail".
> > 
> >    5   5
> >   /  \/  \
> >  4  -1   4
> >  |   |   |
> >  3  -2   3
> >  |   |   |
> >  2  -3   2
> >    \ |  /
> >      0
> 
> So would putting timestamp for merge be MAX(now, parents timestamps)
> solve the problem?

If there is an evil committer, the parents could have bogus timestamps, 
too. But then, I would not pull from such an evil person...

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 10:38         ` Junio C Hamano
  2006-07-04 11:16           ` Jakub Narebski
@ 2006-07-04 11:40           ` Johannes Schindelin
  1 sibling, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-07-04 11:40 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: gitzilla, git

Hi,

On Tue, 4 Jul 2006, Junio C Hamano wrote:

> However, I am not sure how you are going to use that maximumSkew
> variable.

My idea was to continue traversing the merge base's ancestors, marking 
them UNINTERESTING, until hitting a commit which is maximumSkew older than 
the merge base (and not just stop at the merge base, as is the case right 
now, and neither continue traversing in eternity like suggested).

This would not help _evil_ cases (i.e. intentional), but most certainly 
your regular clock skew in a Microsoft network.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  7:45     ` Junio C Hamano
  2006-07-04  8:23       ` Johannes Schindelin
@ 2006-07-04 20:08       ` A Large Angry SCM
  2006-07-05  0:59       ` Junio C Hamano
  2 siblings, 0 replies; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-04 20:08 UTC (permalink / raw)
  To: Junio C Hamano, git

Junio C Hamano wrote:
> A Large Angry SCM <gitzilla@gmail.com> writes:
> 
>>> This is a good demonstration that merge-base may not give you
>>> minimal set for pathological cases.  If you want to be through
>>> you could traverse everything to make sure we do not say 'S' is
>>> relevant, but that is quite expensive, so I think there will
>>> always be artifacts of horizon effect like this no matter how
>>> you try to catch it (didn't I keep saying that already?).
>> The problem is in mark_reachable_commits(); it is either superfluous
>> or it needs to parse_commit() those commits that haven't been parsed
>> yet that it needs to traverse.
> 
> Yes, you could traverse everything.  But that is not practical.
> We have known that the clean-up pass has this horizon effect,
> and it is a compromise.

The clean-up pass was devised to eliminate bases that are reachable from 
other bases. It just doesn't look hard enough.

> If you apply this testing patch on top of yours, you will see
> that parsing more commits at that point makes the clean-up
> pass go all the way down to the root commit.

Yes, I was aware of graphs that would have that behavior.

The root of the problem is that the heuristic, that attempts to use 
timestamps to detect that a commit is _not_ reachable from a given 
commit, relies on the timestamps of commits with a reachability 
relationship to have a relationship that matches the graph.

> We may alternatively not use the clean-up pass at all, but I
> suspect that might give us many false positives.  I don't
> remember the details but I think we added it while fixing
> merge-base in the real life situation.

The history of the clean-up pass is that before it was added, 
git-merge-base was returning a base reachable from another base, and the 
base returned was, in some significant way, worse for merging. My 
construct demonstrates that the clean-up pass only deals with special case.

> It may be interesting to run tests on real merges (I believe the
> kernel repository has a handful merges that have more than one
> merge bases) to see how effective the current clean-up pass is.
> It may turn out to be ineffective in practice, in which case we
> could kill it off.

Although a very important set of repositories to Git, the linux kernel 
repositories may no longer be representative of the diversity of Git 
use. Still, it would be interesting to know the outcome.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  8:23       ` Johannes Schindelin
  2006-07-04 10:38         ` Junio C Hamano
@ 2006-07-04 20:18         ` A Large Angry SCM
  2006-07-04 21:15         ` Junio C Hamano
  2 siblings, 0 replies; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-04 20:18 UTC (permalink / raw)
  To: Johannes Schindelin, git; +Cc: Junio C Hamano

Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 4 Jul 2006, Junio C Hamano wrote:
> 
>> A Large Angry SCM <gitzilla@gmail.com> writes:
>>
>>>> This is a good demonstration that merge-base may not give you
>>>> minimal set for pathological cases.  If you want to be through
>>>> you could traverse everything to make sure we do not say 'S' is
>>>> relevant, but that is quite expensive, so I think there will
>>>> always be artifacts of horizon effect like this no matter how
>>>> you try to catch it (didn't I keep saying that already?).
>>> The problem is in mark_reachable_commits(); it is either superfluous
>>> or it needs to parse_commit() those commits that haven't been parsed
>>> yet that it needs to traverse.
>> Yes, you could traverse everything.  But that is not practical.
>> We have known that the clean-up pass has this horizon effect,
>> and it is a compromise.
> 
> We could introduce a time.maximumSkew variable, and just walk only 
> that much further when traversing the commits.
> 
> So, if you do not trust your clients to have a proper ntp setup, just say 
> "I trust my peers to be off at most 1 day". That would save lots vs 
> traverse-everything.

The fuzz would only serve to mask, even more, that the heuristic is 
broken. But, it would also allow the (broken) heuristic to be used _and_ 
let the user decide how much effort may be used to find the correct bases.

If this happens, it should be (yet another) user configurable; either, 
per repository, command line, or both.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  8:23       ` Johannes Schindelin
  2006-07-04 10:38         ` Junio C Hamano
  2006-07-04 20:18         ` A Large Angry SCM
@ 2006-07-04 21:15         ` Junio C Hamano
  2006-07-04 22:25           ` Johannes Schindelin
  2006-07-05  8:39           ` Josef Weidendorfer
  2 siblings, 2 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-04 21:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> We could introduce a time.maximumSkew variable, and just walk only 
> that much further when traversing the commits.

We could have had "commit generation number" in the commit
object header, and use that instead of commit timestamps for
these traversal purposes.  The generation number for a commit is
defined to be max(generation number of its parents)+1 and we
prime the recursive this definition by defining the generation
number for the root commit to be one.  A moral equivalent
alternative would be to notice that the commit timestamp we are
going to use to create for a new commit is smaller than one of
the commit timestamps of its parent commits and adjust the
commit timestamp in such a case by git-commit-tree, perhaps with
a warning.

These are pretty much water under the bridge by now, for two
reasons.  One is that I think it is better to make the tools
that use get_merge_bases() prepared for the case the function
includes suboptimal bases anyway, and the other is once we do
that this is not a strong enough justification to modify the
commit object format.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 21:15         ` Junio C Hamano
@ 2006-07-04 22:25           ` Johannes Schindelin
  2006-07-04 22:42             ` Junio C Hamano
  2006-07-04 23:07             ` A Large Angry SCM
  2006-07-05  8:39           ` Josef Weidendorfer
  1 sibling, 2 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-07-04 22:25 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

Hi,

On Tue, 4 Jul 2006, Junio C Hamano wrote:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > We could introduce a time.maximumSkew variable, and just walk only 
> > that much further when traversing the commits.
> 
> We could have had "commit generation number" in the commit
> object header, and use that instead of commit timestamps for
> these traversal purposes.  The generation number for a commit is
> defined to be max(generation number of its parents)+1 and we
> prime the recursive this definition by defining the generation
> number for the root commit to be one.

Are you really, really sure this is a remedy? I, for one, am quite sure of 
the opposite. What you propose is just another time scale, only this time, 
it is not universally true (not even minus local incompetence to keep the 
clock accurate).

If that should be not true, you always could rely on topo order. Which 
does not seem to solve the problem for you.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 22:25           ` Johannes Schindelin
@ 2006-07-04 22:42             ` Junio C Hamano
  2006-07-04 23:07             ` A Large Angry SCM
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-04 22:42 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> If that should be not true, you always could rely on topo order. Which 
> does not seem to solve the problem for you.

The computation of merge-base is about computing topo order
cheaply, so that is a recursive definition of the problem, not a
solution, I am afraid.  

With the generation counter, we know the clean-up phase needs to
parse and traverse unparsed parents with the same or higher
generation counter than the lowest we have in the result list,
which would limit our clean-up traversal.  In order to look at
the generation number of parent, we would need to parse it, so
we would end up parsing one level more than needed at the edge,
though.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 22:25           ` Johannes Schindelin
  2006-07-04 22:42             ` Junio C Hamano
@ 2006-07-04 23:07             ` A Large Angry SCM
  2006-07-04 23:14               ` Jakub Narebski
                                 ` (2 more replies)
  1 sibling, 3 replies; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-04 23:07 UTC (permalink / raw)
  To: Johannes Schindelin, git; +Cc: Junio C Hamano

Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 4 Jul 2006, Junio C Hamano wrote:
> 
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>>> We could introduce a time.maximumSkew variable, and just walk only 
>>> that much further when traversing the commits.
>> We could have had "commit generation number" in the commit
>> object header, and use that instead of commit timestamps for
>> these traversal purposes.  The generation number for a commit is
>> defined to be max(generation number of its parents)+1 and we
>> prime the recursive this definition by defining the generation
>> number for the root commit to be one.
> 
> Are you really, really sure this is a remedy? I, for one, am quite sure of 
> the opposite. What you propose is just another time scale, only this time, 
> it is not universally true (not even minus local incompetence to keep the 
> clock accurate).

It works[*] and it does what using the timestamp was trying to do. 
Namely, work from "more recent" (or "closer") commits toward "older" (or 
"farther") commits until you've gone past the point you care about.

It's a little late to be changing the structure of a commit and you'd 
have to deal with some size/scale issues, but it's do-able. A better 
idea may be to generate and keep the generation number on a per 
repository basis, and you'd be able to work around changing grafts.

[*] Grafts do _really_ nasty things to this. Just like clock skew does now.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 23:07             ` A Large Angry SCM
@ 2006-07-04 23:14               ` Jakub Narebski
  2006-07-04 23:39                 ` Junio C Hamano
  2006-07-05  0:26                 ` A Large Angry SCM
  2006-07-05  0:24               ` Junio C Hamano
  2006-07-05  7:56               ` Johannes Schindelin
  2 siblings, 2 replies; 26+ messages in thread
From: Jakub Narebski @ 2006-07-04 23:14 UTC (permalink / raw)
  To: git

A Large Angry SCM wrote:

> It works[*] and it does what using the timestamp was trying to do. 
> Namely, work from "more recent" (or "closer") commits toward "older" (or 
> "farther") commits until you've gone past the point you care about.
> 
> It's a little late to be changing the structure of a commit and you'd 
> have to deal with some size/scale issues, but it's do-able. A better 
> idea may be to generate and keep the generation number on a per 
> repository basis, and you'd be able to work around changing grafts.

What about timestamp = MAX(now(), timestamps of parents) idea, which
doesn't need changing the structure of a commit?

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 23:14               ` Jakub Narebski
@ 2006-07-04 23:39                 ` Junio C Hamano
  2006-07-05  0:26                 ` A Large Angry SCM
  1 sibling, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-04 23:39 UTC (permalink / raw)
  To: jnareb; +Cc: git

Jakub Narebski <jnareb@gmail.com> writes:

> What about timestamp = MAX(now(), timestamps of parents) idea, which
> doesn't need changing the structure of a commit?

It changes the semantics without change syntax, which is not any
better (and in fact I think it is worse).

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 23:07             ` A Large Angry SCM
  2006-07-04 23:14               ` Jakub Narebski
@ 2006-07-05  0:24               ` Junio C Hamano
  2006-07-05  7:56               ` Johannes Schindelin
  2 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-05  0:24 UTC (permalink / raw)
  To: gitzilla, Johannes Schindelin; +Cc: git

A Large Angry SCM <gitzilla@gmail.com> writes:

>
> It works[*] and it does what using the timestamp was trying to
> do. Namely, work from "more recent" (or "closer") commits toward
> "older" (or "farther") commits until you've gone past the point you
> care about.

If you really really care, now we have clear_commit_marks() with
get_merge_bases() infrastructure in, you _could_ run another
round of get_merge_bases() on the commit on the result list to
see which ones are reachable from others by performing an
equivalent of fast-forward/already-up-to-date check.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 23:14               ` Jakub Narebski
  2006-07-04 23:39                 ` Junio C Hamano
@ 2006-07-05  0:26                 ` A Large Angry SCM
  1 sibling, 0 replies; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-05  0:26 UTC (permalink / raw)
  To: Jakub Narebski, git

Jakub Narebski wrote:
> A Large Angry SCM wrote:
> 
>> It works[*] and it does what using the timestamp was trying to do. 
>> Namely, work from "more recent" (or "closer") commits toward "older" (or 
>> "farther") commits until you've gone past the point you care about.
>>
>> It's a little late to be changing the structure of a commit and you'd 
>> have to deal with some size/scale issues, but it's do-able. A better 
>> idea may be to generate and keep the generation number on a per 
>> repository basis, and you'd be able to work around changing grafts.
> 
> What about timestamp = MAX(now(), timestamps of parents) idea, which
> doesn't need changing the structure of a commit?
> 

So, do you really want your name as committer on a commit with a date 
300 years in the future because one of the parents had a bad date?

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04  7:45     ` Junio C Hamano
  2006-07-04  8:23       ` Johannes Schindelin
  2006-07-04 20:08       ` A Large Angry SCM
@ 2006-07-05  0:59       ` Junio C Hamano
  2 siblings, 0 replies; 26+ messages in thread
From: Junio C Hamano @ 2006-07-05  0:59 UTC (permalink / raw)
  To: git

Junio C Hamano <junkio@cox.net> writes:

> It may be interesting to run tests on real merges (I believe the
> kernel repository has a handful merges that have more than one
> merge bases) to see how effective the current clean-up pass is.
> It may turn out to be ineffective in practice, in which case we
> could kill it off.

So I counted.

There are 23 commits in the kernel history that have more than
one merge-bases.  The current merge-base code tells us that all
of them have two merge-bases.

None of them suffers from the horizon effect; the two bases
are not ancestor/descendant of each other.

A good news is that get_merge_bases() gives the same answer
without the clean-up pass mark_reachable_commits().

d0e5f39f1ee2e55d140064bb6d74c8bad25d71d0
361ea93cbff0e42cbc6a0f3c7a8238db9ed15648
4b2d9cf00962d0a0e697f887f3ecaa155cbde555
ba9b28d19a3251bb1dfe6a6f8cc89b96fb85f683
db21e578e551421d76641d72cb3f8296ed3f9e61
b425c8c5922562c562dc55a636c3c8d758ed6d17
2e9ff56efbc005ab2b92b68df65940c7459446c6
75e47b36004d136edff68295420424cba3a5ccd0
c45ae87ec9d03c9adfc466a6b560cb38b154813a
09e4f9029da1b53e835555c353a89c36b71233b0
0b310f36d7d96e27f6941ec0f9b95e15142f1e78
db9ace7083dbdcc3d02bdd6a1d26132c80b5b726
80c7af4074cbb4cb6be5d35c443ea6d5e8838a84
701db69d6647f61e4660c9102d7f2fd5dffc203d
5e3c2b95dd560baa41b08c8f5f00bbd6fbeebdcb
c7fb577e2a6cb04732541f2dc402bd46747f7558
ba9b543d5bec0a7605952e2ba501fb8b0f3b6407
84ffa747520edd4556b136bdfc9df9eb1673ce12
da28c12089dfcfb8695b6b555cdb8e03dda2b690
3190186362466658f01b2e354e639378ce07e1a9
0c168775709faa74c1b87f1e61046e0c51ade7f3
0e396ee43e445cb7c215a98da4e76d0ce354d9d7
467ca22d3371f132ee225a5591a1ed0cd518cb3d

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 23:07             ` A Large Angry SCM
  2006-07-04 23:14               ` Jakub Narebski
  2006-07-05  0:24               ` Junio C Hamano
@ 2006-07-05  7:56               ` Johannes Schindelin
  2006-07-05 16:15                 ` A Large Angry SCM
  2 siblings, 1 reply; 26+ messages in thread
From: Johannes Schindelin @ 2006-07-05  7:56 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: git, Junio C Hamano

Hi,

On Tue, 4 Jul 2006, A Large Angry SCM wrote:

> Johannes Schindelin wrote:
> > Hi,
> > 
> > On Tue, 4 Jul 2006, Junio C Hamano wrote:
> > 
> > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> > > 
> > > > We could introduce a time.maximumSkew variable, and just walk only that
> > > > much further when traversing the commits.
> > > We could have had "commit generation number" in the commit
> > > object header, and use that instead of commit timestamps for
> > > these traversal purposes.  The generation number for a commit is
> > > defined to be max(generation number of its parents)+1 and we
> > > prime the recursive this definition by defining the generation
> > > number for the root commit to be one.
> > 
> > Are you really, really sure this is a remedy? I, for one, am quite sure of
> > the opposite. What you propose is just another time scale, only this time,
> > it is not universally true (not even minus local incompetence to keep the
> > clock accurate).
> 
> It works[*] and it does what using the timestamp was trying to do. Namely,
> work from "more recent" (or "closer") commits toward "older" (or "farther")
> commits until you've gone past the point you care about.
> 
> It's a little late to be changing the structure of a commit and you'd have to
> deal with some size/scale issues, but it's do-able. A better idea may be to
> generate and keep the generation number on a per repository basis, and you'd
> be able to work around changing grafts.

Like, inside the cache? I dunno. IMHO it is way too late to change the 
structure of a commit in that particular manner, _plus_ you would get 
overflow issues.

> [*] Grafts do _really_ nasty things to this. Just like clock skew does now.

Grafts can do much nastier things to you, for example having a circular 
history. _But_ they cannot do that nasty thing outside of your repo. Clock 
skews can.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-04 21:15         ` Junio C Hamano
  2006-07-04 22:25           ` Johannes Schindelin
@ 2006-07-05  8:39           ` Josef Weidendorfer
  2006-07-05 14:28             ` A Large Angry SCM
  1 sibling, 1 reply; 26+ messages in thread
From: Josef Weidendorfer @ 2006-07-05  8:39 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Johannes Schindelin, git

On Tuesday 04 July 2006 23:15, Junio C Hamano wrote:
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> 
> > We could introduce a time.maximumSkew variable, and just walk only 
> > that much further when traversing the commits.
> 
> We could have had "commit generation number" in the commit
> object header, and use that instead of commit timestamps for
> these traversal purposes.

Isn't this "commit generation number" information that can be
regenerated on the fly, i.e. a perfect fit for data to be stored
in a persistant cache, e.g. in ".git/tmp/virtual-commit-timestamps"?

Josef

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-05  8:39           ` Josef Weidendorfer
@ 2006-07-05 14:28             ` A Large Angry SCM
  0 siblings, 0 replies; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-05 14:28 UTC (permalink / raw)
  To: Josef Weidendorfer, git; +Cc: Junio C Hamano, Johannes Schindelin

Josef Weidendorfer wrote:
> On Tuesday 04 July 2006 23:15, Junio C Hamano wrote:
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>>> We could introduce a time.maximumSkew variable, and just walk only 
>>> that much further when traversing the commits.
>> We could have had "commit generation number" in the commit
>> object header, and use that instead of commit timestamps for
>> these traversal purposes.
> 
> Isn't this "commit generation number" information that can be
> regenerated on the fly, i.e. a perfect fit for data to be stored
> in a persistant cache, e.g. in ".git/tmp/virtual-commit-timestamps"?

Yes

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-05  7:56               ` Johannes Schindelin
@ 2006-07-05 16:15                 ` A Large Angry SCM
  2006-07-05 17:04                   ` Johannes Schindelin
  0 siblings, 1 reply; 26+ messages in thread
From: A Large Angry SCM @ 2006-07-05 16:15 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano

Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 4 Jul 2006, A Large Angry SCM wrote:
> 
>> Johannes Schindelin wrote:
>>> Hi,
>>>
>>> On Tue, 4 Jul 2006, Junio C Hamano wrote:
>>>
>>>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>>>
>>>>> We could introduce a time.maximumSkew variable, and just walk only that
>>>>> much further when traversing the commits.
>>>> We could have had "commit generation number" in the commit
>>>> object header, and use that instead of commit timestamps for
>>>> these traversal purposes.  The generation number for a commit is
>>>> defined to be max(generation number of its parents)+1 and we
>>>> prime the recursive this definition by defining the generation
>>>> number for the root commit to be one.
>>> Are you really, really sure this is a remedy? I, for one, am quite sure of
>>> the opposite. What you propose is just another time scale, only this time,
>>> it is not universally true (not even minus local incompetence to keep the
>>> clock accurate).
>> It works[*] and it does what using the timestamp was trying to do. Namely,
>> work from "more recent" (or "closer") commits toward "older" (or "farther")
>> commits until you've gone past the point you care about.
>>
>> It's a little late to be changing the structure of a commit and you'd have to
>> deal with some size/scale issues, but it's do-able. A better idea may be to
>> generate and keep the generation number on a per repository basis, and you'd
>> be able to work around changing grafts.
> 
> Like, inside the cache? I dunno. IMHO it is way too late to change the 
> structure of a commit in that particular manner, _plus_ you would get 
> overflow issues.

Your don't need to change the commit object, create some repository 
specific, local, auxiliary information. Overflow should not be a problem 
until a path length to a root commit exceeds the machine word size.

But is it really worth the work? Does it help anything other than 
merge-base?

>> [*] Grafts do _really_ nasty things to this. Just like clock skew does now.
> 
> Grafts can do much nastier things to you, for example having a circular 
> history. _But_ they cannot do that nasty thing outside of your repo. Clock 
> skews can.

If grafts in your repository create a cycle, the misbehavior of 
merge-base should be among the least of your concerns.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH] Additional merge-base tests
  2006-07-05 16:15                 ` A Large Angry SCM
@ 2006-07-05 17:04                   ` Johannes Schindelin
  0 siblings, 0 replies; 26+ messages in thread
From: Johannes Schindelin @ 2006-07-05 17:04 UTC (permalink / raw)
  To: A Large Angry SCM; +Cc: git, Junio C Hamano

Hi,

On Wed, 5 Jul 2006, A Large Angry SCM wrote:

> If grafts in your repository create a cycle, the misbehavior of merge-base
> should be among the least of your concerns.

Right. BUT grafts can be very helpful to connect branches which were 
independently imported into git. And in these cases, the clockSkew really 
is no clockSkew. But in that case, the generation number would have to be 
recalculated also.

Anyway, I think that it should be a configurable, which defaults to off, 
i.e. in the normal case merge-base should behave as it does now. And if we 
have that configurable, we might as well take the safe but dumb approach 
to just traverse _everything_.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2006-07-05 17:05 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-07-04  3:55 [PATCH] Additional merge-base tests A Large Angry SCM
2006-07-04  5:47 ` Junio C Hamano
2006-07-04  6:41   ` A Large Angry SCM
2006-07-04  7:17     ` Junio C Hamano
2006-07-04  7:45     ` Junio C Hamano
2006-07-04  8:23       ` Johannes Schindelin
2006-07-04 10:38         ` Junio C Hamano
2006-07-04 11:16           ` Jakub Narebski
2006-07-04 11:35             ` Johannes Schindelin
2006-07-04 11:40           ` Johannes Schindelin
2006-07-04 20:18         ` A Large Angry SCM
2006-07-04 21:15         ` Junio C Hamano
2006-07-04 22:25           ` Johannes Schindelin
2006-07-04 22:42             ` Junio C Hamano
2006-07-04 23:07             ` A Large Angry SCM
2006-07-04 23:14               ` Jakub Narebski
2006-07-04 23:39                 ` Junio C Hamano
2006-07-05  0:26                 ` A Large Angry SCM
2006-07-05  0:24               ` Junio C Hamano
2006-07-05  7:56               ` Johannes Schindelin
2006-07-05 16:15                 ` A Large Angry SCM
2006-07-05 17:04                   ` Johannes Schindelin
2006-07-05  8:39           ` Josef Weidendorfer
2006-07-05 14:28             ` A Large Angry SCM
2006-07-04 20:08       ` A Large Angry SCM
2006-07-05  0:59       ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).