git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] optimize fast-forward checks
@ 2012-08-27 23:11 Junio C Hamano
  2012-08-27 23:11 ` [PATCH 1/5] in_merge_bases(): support only one "other" commit Junio C Hamano
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-27 23:11 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

This is a follow-up to $gmane/204149; somehow the discussion petered
out, but I think this topic (and also "tag --contains" that was not
really discussed) needs to be looked into.  Here is a first baby
step.

 - The first one is an obvious simplification; we never supported
   more than one "reference" commits and no caller had to invent a
   loop around it to emulate multi-reference in_merge_base().

 - The two patches that follow are the uses of get_merge_bases()
   where in_merge_bases() is sufficient.  These callers are not
   interested in the merge bases between the two points; they only
   want to know if one point is an ancestor of the other.

 - The next one [4/5] should be identical to Thomas's patch.

 - The last one attempts to reduce the cost of postprocessing from
   N*(N-1) to N, but it somehow breaks 6010. I haven't looked into
   why.  They say all bugs are shallow given enough eyeballs, so I
   am sending it out to see if that is true ;-)


Junio C Hamano (5):
  in_merge_bases(): support only one "other" commit
  receive-pack: use in_merge_bases() for fast-forward check
  http-push: use in_merge_bases() for fast-forward check
  in_merge_bases(): omit unnecessary redundant common ancestor
    reduction
  (BROKEN) get_merge_bases_many(): walk from many tips in parallel

 builtin/branch.c                       |  4 +--
 builtin/fetch.c                        |  2 +-
 builtin/receive-pack.c                 |  8 +----
 commit.c                               | 60 ++++++++++++++++++++--------------
 commit.h                               |  2 +-
 contrib/examples/builtin-fetch--tool.c |  2 +-
 fast-import.c                          |  2 +-
 http-push.c                            |  3 +-
 submodule.c                            | 12 +++----
 9 files changed, 49 insertions(+), 46 deletions(-)

-- 
1.7.12.116.g31e0100

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/5] in_merge_bases(): support only one "other" commit
  2012-08-27 23:11 [PATCH 0/5] optimize fast-forward checks Junio C Hamano
@ 2012-08-27 23:11 ` Junio C Hamano
  2012-08-27 23:12 ` [PATCH 2/5] receive-pack: use in_merge_bases() for fast-forward check Junio C Hamano
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-27 23:11 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

In early days of its life, I planned to make it possible to compute
"is a commit contained in all of these other commits?" with this
function, but it turned out that no caller needed it.

Just make it take two commit objects and add a comment to say what
these two functions do.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/branch.c                       |  4 ++--
 builtin/fetch.c                        |  2 +-
 commit.c                               | 15 +++++++++------
 commit.h                               |  2 +-
 contrib/examples/builtin-fetch--tool.c |  2 +-
 fast-import.c                          |  2 +-
 submodule.c                            | 12 ++++++------
 7 files changed, 21 insertions(+), 18 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 0e060f2..0360774 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -129,7 +129,7 @@ static int branch_merged(int kind, const char *name,
 	if (!reference_rev)
 		reference_rev = head_rev;
 
-	merged = in_merge_bases(rev, &reference_rev, 1);
+	merged = in_merge_bases(rev, reference_rev);
 
 	/*
 	 * After the safety valve is fully redefined to "check with
@@ -139,7 +139,7 @@ static int branch_merged(int kind, const char *name,
 	 * a gentle reminder is in order.
 	 */
 	if ((head_rev != reference_rev) &&
-	    in_merge_bases(rev, &head_rev, 1) != merged) {
+	    in_merge_bases(rev, head_rev) != merged) {
 		if (merged)
 			warning(_("deleting branch '%s' that has been merged to\n"
 				"         '%s', but not yet merged to HEAD."),
diff --git a/builtin/fetch.c b/builtin/fetch.c
index bb9a074..723072f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -323,7 +323,7 @@ static int update_local_ref(struct ref *ref,
 		return r;
 	}
 
-	if (in_merge_bases(current, &updated, 1)) {
+	if (in_merge_bases(current, updated)) {
 		char quickref[83];
 		int r;
 		strcpy(quickref, find_unique_abbrev(current->object.sha1, DEFAULT_ABBREV));
diff --git a/commit.c b/commit.c
index 42af4c1..74ec5f1 100644
--- a/commit.c
+++ b/commit.c
@@ -780,6 +780,9 @@ struct commit_list *get_merge_bases(struct commit *one, struct commit *two,
 	return get_merge_bases_many(one, 1, &two, cleanup);
 }
 
+/*
+ * Is "commit" a decendant of one of the elements on the "with_commit" list?
+ */
 int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
 {
 	if (!with_commit)
@@ -789,21 +792,21 @@ int is_descendant_of(struct commit *commit, struct commit_list *with_commit)
 
 		other = with_commit->item;
 		with_commit = with_commit->next;
-		if (in_merge_bases(other, &commit, 1))
+		if (in_merge_bases(other, commit))
 			return 1;
 	}
 	return 0;
 }
 
-int in_merge_bases(struct commit *commit, struct commit **reference, int num)
+/*
+ * Is "commit" an ancestor of (i.e. reachable from) the "reference"?
+ */
+int in_merge_bases(struct commit *commit, struct commit *reference)
 {
 	struct commit_list *bases, *b;
 	int ret = 0;
 
-	if (num == 1)
-		bases = get_merge_bases(commit, *reference, 1);
-	else
-		die("not yet");
+	bases = get_merge_bases(commit, reference, 1);
 	for (b = bases; b; b = b->next) {
 		if (!hashcmp(commit->object.sha1, b->item->object.sha1)) {
 			ret = 1;
diff --git a/commit.h b/commit.h
index d617fa3..6edce87 100644
--- a/commit.h
+++ b/commit.h
@@ -171,7 +171,7 @@ extern struct commit_list *get_shallow_commits(struct object_array *heads,
 		int depth, int shallow_flag, int not_shallow_flag);
 
 int is_descendant_of(struct commit *, struct commit_list *);
-int in_merge_bases(struct commit *, struct commit **, int);
+int in_merge_bases(struct commit *, struct commit *);
 
 extern int interactive_add(int argc, const char **argv, const char *prefix, int patch);
 extern int run_add_interactive(const char *revision, const char *patch_mode,
diff --git a/contrib/examples/builtin-fetch--tool.c b/contrib/examples/builtin-fetch--tool.c
index 0d54aa7..8bc8c75 100644
--- a/contrib/examples/builtin-fetch--tool.c
+++ b/contrib/examples/builtin-fetch--tool.c
@@ -96,7 +96,7 @@ static int update_local_ref(const char *name,
 	strcpy(oldh, find_unique_abbrev(current->object.sha1, DEFAULT_ABBREV));
 	strcpy(newh, find_unique_abbrev(sha1_new, DEFAULT_ABBREV));
 
-	if (in_merge_bases(current, &updated, 1)) {
+	if (in_merge_bases(current, updated)) {
 		fprintf(stderr, "* %s: fast-forward to %s\n",
 			name, note);
 		fprintf(stderr, "  old..new: %s..%s\n", oldh, newh);
diff --git a/fast-import.c b/fast-import.c
index eed97c8..c2a814e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -1691,7 +1691,7 @@ static int update_branch(struct branch *b)
 			return error("Branch %s is missing commits.", b->name);
 		}
 
-		if (!in_merge_bases(old_cmit, &new_cmit, 1)) {
+		if (!in_merge_bases(old_cmit, new_cmit)) {
 			unlock_ref(lock);
 			warning("Not updating %s"
 				" (new tip %s does not contain %s)",
diff --git a/submodule.c b/submodule.c
index 19dc6a6..d133796 100644
--- a/submodule.c
+++ b/submodule.c
@@ -788,7 +788,7 @@ static int find_first_merges(struct object_array *result, const char *path,
 		die("revision walk setup failed");
 	while ((commit = get_revision(&revs)) != NULL) {
 		struct object *o = &(commit->object);
-		if (in_merge_bases(b, &commit, 1))
+		if (in_merge_bases(b, commit))
 			add_object_array(o, NULL, &merges);
 	}
 	reset_revision_walk();
@@ -803,7 +803,7 @@ static int find_first_merges(struct object_array *result, const char *path,
 		contains_another = 0;
 		for (j = 0; j < merges.nr; j++) {
 			struct commit *m2 = (struct commit *) merges.objects[j].item;
-			if (i != j && in_merge_bases(m2, &m1, 1)) {
+			if (i != j && in_merge_bases(m2, m1)) {
 				contains_another = 1;
 				break;
 			}
@@ -865,18 +865,18 @@ int merge_submodule(unsigned char result[20], const char *path,
 	}
 
 	/* check whether both changes are forward */
-	if (!in_merge_bases(commit_base, &commit_a, 1) ||
-	    !in_merge_bases(commit_base, &commit_b, 1)) {
+	if (!in_merge_bases(commit_base, commit_a) ||
+	    !in_merge_bases(commit_base, commit_b)) {
 		MERGE_WARNING(path, "commits don't follow merge-base");
 		return 0;
 	}
 
 	/* Case #1: a is contained in b or vice versa */
-	if (in_merge_bases(commit_a, &commit_b, 1)) {
+	if (in_merge_bases(commit_a, commit_b)) {
 		hashcpy(result, b);
 		return 1;
 	}
-	if (in_merge_bases(commit_b, &commit_a, 1)) {
+	if (in_merge_bases(commit_b, commit_a)) {
 		hashcpy(result, a);
 		return 1;
 	}
-- 
1.7.12.116.g31e0100

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/5] receive-pack: use in_merge_bases() for fast-forward check
  2012-08-27 23:11 [PATCH 0/5] optimize fast-forward checks Junio C Hamano
  2012-08-27 23:11 ` [PATCH 1/5] in_merge_bases(): support only one "other" commit Junio C Hamano
@ 2012-08-27 23:12 ` Junio C Hamano
  2012-08-27 23:12 ` [PATCH 3/5] http-push: " Junio C Hamano
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-27 23:12 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

The original computed merge-base between the old commit and the new
commit and checked if the old commit was a merge base between them,
in order to make sure we are fast-forwarding.

Instead, call in_merge_bases(old, new) which does the same.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 builtin/receive-pack.c | 8 +-------
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 3f05d97..b1abe7b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -478,7 +478,6 @@ static const char *update(struct command *cmd)
 	    !prefixcmp(name, "refs/heads/")) {
 		struct object *old_object, *new_object;
 		struct commit *old_commit, *new_commit;
-		struct commit_list *bases, *ent;
 
 		old_object = parse_object(old_sha1);
 		new_object = parse_object(new_sha1);
@@ -491,12 +490,7 @@ static const char *update(struct command *cmd)
 		}
 		old_commit = (struct commit *)old_object;
 		new_commit = (struct commit *)new_object;
-		bases = get_merge_bases(old_commit, new_commit, 1);
-		for (ent = bases; ent; ent = ent->next)
-			if (!hashcmp(old_sha1, ent->item->object.sha1))
-				break;
-		free_commit_list(bases);
-		if (!ent) {
+		if (!in_merge_bases(old_commit, new_commit)) {
 			rp_error("denying non-fast-forward %s"
 				 " (you should pull first)", name);
 			return "non-fast-forward";
-- 
1.7.12.116.g31e0100

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/5] http-push: use in_merge_bases() for fast-forward check
  2012-08-27 23:11 [PATCH 0/5] optimize fast-forward checks Junio C Hamano
  2012-08-27 23:11 ` [PATCH 1/5] in_merge_bases(): support only one "other" commit Junio C Hamano
  2012-08-27 23:12 ` [PATCH 2/5] receive-pack: use in_merge_bases() for fast-forward check Junio C Hamano
@ 2012-08-27 23:12 ` Junio C Hamano
  2012-08-27 23:12 ` [PATCH 4/5] in_merge_bases(): omit unnecessary redundant common ancestor reduction Junio C Hamano
  2012-08-27 23:12 ` [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel Junio C Hamano
  4 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-27 23:12 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

The original computed merge-base between HEAD and the remote ref and
checked if the remote ref is a merge base between them, in order to
make sure that we are fast-forwarding.

Instead, call in_merge_bases(remote, HEAD) which does the same.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 http-push.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/http-push.c b/http-push.c
index a832ca7..8701c12 100644
--- a/http-push.c
+++ b/http-push.c
@@ -1610,9 +1610,8 @@ static int verify_merge_base(unsigned char *head_sha1, struct ref *remote)
 {
 	struct commit *head = lookup_commit_or_die(head_sha1, "HEAD");
 	struct commit *branch = lookup_commit_or_die(remote->old_sha1, remote->name);
-	struct commit_list *merge_bases = get_merge_bases(head, branch, 1);
 
-	return (merge_bases && !merge_bases->next && merge_bases->item == branch);
+	return in_merge_bases(branch, head);
 }
 
 static int delete_remote_branch(const char *pattern, int force)
-- 
1.7.12.116.g31e0100

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/5] in_merge_bases(): omit unnecessary redundant common ancestor reduction
  2012-08-27 23:11 [PATCH 0/5] optimize fast-forward checks Junio C Hamano
                   ` (2 preceding siblings ...)
  2012-08-27 23:12 ` [PATCH 3/5] http-push: " Junio C Hamano
@ 2012-08-27 23:12 ` Junio C Hamano
  2012-08-27 23:12 ` [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel Junio C Hamano
  4 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-27 23:12 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

The function get_merge_bases() needs to postprocess the result from
merge_bases_many() in order to make sure none of the commit is a
true ancestor of another commit, which is expensive.  However, when
checking if a commit is an ancestor of another commit, we only need
to see if the commit is a common ancestor between the two, and do
not have to care if other common ancestors merge_bases_many() finds
are true merge bases or an ancestor of another merge base.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 74ec5f1..f5211bd 100644
--- a/commit.c
+++ b/commit.c
@@ -806,7 +806,10 @@ int in_merge_bases(struct commit *commit, struct commit *reference)
 	struct commit_list *bases, *b;
 	int ret = 0;
 
-	bases = get_merge_bases(commit, reference, 1);
+	bases = merge_bases_many(commit, 1, &reference);
+	clear_commit_marks(commit, all_flags);
+	clear_commit_marks(reference, all_flags);
+
 	for (b = bases; b; b = b->next) {
 		if (!hashcmp(commit->object.sha1, b->item->object.sha1)) {
 			ret = 1;
-- 
1.7.12.116.g31e0100

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
  2012-08-27 23:11 [PATCH 0/5] optimize fast-forward checks Junio C Hamano
                   ` (3 preceding siblings ...)
  2012-08-27 23:12 ` [PATCH 4/5] in_merge_bases(): omit unnecessary redundant common ancestor reduction Junio C Hamano
@ 2012-08-27 23:12 ` Junio C Hamano
  2012-08-28  1:25   ` Junio C Hamano
  4 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2012-08-27 23:12 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

After merge_bases_many() paints the graph in two colors to find
common ancestors, get_merge_bases_many() reduces the result by
excluding commits that can be reached from other commits in the
result.  We used to do N * (N-1) traversals for this, but we can
check if one commit is reachable from any of the other (N-1) commits
by a single traversal, and repeat it N times to find the answer.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 commit.c | 42 +++++++++++++++++++++++-------------------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/commit.c b/commit.c
index f5211bd..ccf2c5a 100644
--- a/commit.c
+++ b/commit.c
@@ -715,7 +715,7 @@ struct commit_list *get_merge_bases_many(struct commit *one,
 					 int cleanup)
 {
 	struct commit_list *list;
-	struct commit **rslt;
+	struct commit **rslt, **others;
 	struct commit_list *result;
 	int cnt, i, j;
 
@@ -744,33 +744,37 @@ struct commit_list *get_merge_bases_many(struct commit *one,
 	for (list = result, i = 0; list; list = list->next)
 		rslt[i++] = list->item;
 	free_commit_list(result);
+	result = NULL;
 
 	clear_commit_marks(one, all_flags);
 	for (i = 0; i < n; i++)
 		clear_commit_marks(twos[i], all_flags);
-	for (i = 0; i < cnt - 1; i++) {
-		for (j = i+1; j < cnt; j++) {
-			if (!rslt[i] || !rslt[j])
-				continue;
-			result = merge_bases_many(rslt[i], 1, &rslt[j]);
-			clear_commit_marks(rslt[i], all_flags);
-			clear_commit_marks(rslt[j], all_flags);
-			for (list = result; list; list = list->next) {
-				if (rslt[i] == list->item)
-					rslt[i] = NULL;
-				if (rslt[j] == list->item)
-					rslt[j] = NULL;
-			}
-		}
-	}
 
-	/* Surviving ones in rslt[] are the independent results */
-	result = NULL;
+	others = xcalloc(cnt - 1, sizeof(*others));
 	for (i = 0; i < cnt; i++) {
-		if (rslt[i])
+		/*
+		 * Is rslt[i] an ancestor of any of the others?
+		 * then it is not interesting to us.
+		 */
+		for (j = 0; j < i; j++)
+			others[j] = rslt[j];
+		for (j = 1 + 1; j < cnt; j++)
+			others[j - 1] = rslt[j];
+		list = merge_bases_many(rslt[i], cnt - 1, others);
+		clear_commit_marks(rslt[i], all_flags);
+		for (j = 0; j < cnt - 1; j++)
+			clear_commit_marks(others[j], all_flags);
+		while (list) {
+			if (rslt[i] == list->item)
+				break;
+			list = list->next;
+		}
+		if (!list)
 			commit_list_insert_by_date(rslt[i], &result);
+		free_commit_list(list);
 	}
 	free(rslt);
+	free(others);
 	return result;
 }
 
-- 
1.7.12.116.g31e0100

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
  2012-08-27 23:12 ` [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel Junio C Hamano
@ 2012-08-28  1:25   ` Junio C Hamano
  2012-08-28 21:35     ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2012-08-28  1:25 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

>  	for (i = 0; i < cnt; i++) {
> -		if (rslt[i])
> +		/*
> +		 * Is rslt[i] an ancestor of any of the others?
> +		 * then it is not interesting to us.
> +		 */
> +		for (j = 0; j < i; j++)
> +			others[j] = rslt[j];
> +		for (j = 1 + 1; j < cnt; j++)

s/1 + 1/i + 1/;

With that, all tests seem to pass ;-)

> +			others[j - 1] = rslt[j];
> +		list = merge_bases_many(rslt[i], cnt - 1, others);
> +		clear_commit_marks(rslt[i], all_flags);
> +		for (j = 0; j < cnt - 1; j++)
> +			clear_commit_marks(others[j], all_flags);
> +		while (list) {
> +			if (rslt[i] == list->item)
> +				break;
> +			list = list->next;
> +		}
> +		if (!list)
>  			commit_list_insert_by_date(rslt[i], &result);
> +		free_commit_list(list);
>  	}
>  	free(rslt);
> +	free(others);
>  	return result;
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
  2012-08-28  1:25   ` Junio C Hamano
@ 2012-08-28 21:35     ` Junio C Hamano
  2012-08-28 23:39       ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2012-08-28 21:35 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>>  	for (i = 0; i < cnt; i++) {
>> -		if (rslt[i])
>> +		/*
>> +		 * Is rslt[i] an ancestor of any of the others?
>> +		 * then it is not interesting to us.
>> +		 */
>> +		for (j = 0; j < i; j++)
>> +			others[j] = rslt[j];
>> +		for (j = 1 + 1; j < cnt; j++)
>
> s/1 + 1/i + 1/;
>
> With that, all tests seem to pass ;-)

"git merge-base" itself seems to have more room for improvement.
Trying to recompute bases for recent 200 merges in the kernel
history with the attached script does not show any improvement with
or without the series on top of recent "master".  Correctnesswise it
seems to be OK, though---I get byte-for-byte identical output.

-- >8 --
#!/bin/sh

git rev-list --committer=torvalds@linux-foundation.org \
    --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL

cmd='
	while read result parent1 parent2
	do
		$GIT merge-base $parent1 $parent2
	done <RL
'

GIT="rungit master" time sh -c "$cmd" >:stock
GIT=../git.git/git time sh -c "$cmd" >:optim
cmp :stock :optim
-- 8< --

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
  2012-08-28 21:35     ` Junio C Hamano
@ 2012-08-28 23:39       ` Junio C Hamano
  2012-08-29 11:08         ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2012-08-28 23:39 UTC (permalink / raw)
  To: git; +Cc: Thomas Rast, Jeff King

Junio C Hamano <gitster@pobox.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>>  	for (i = 0; i < cnt; i++) {
>>> -		if (rslt[i])
>>> +		/*
>>> +		 * Is rslt[i] an ancestor of any of the others?
>>> +		 * then it is not interesting to us.
>>> +		 */
>>> +		for (j = 0; j < i; j++)
>>> +			others[j] = rslt[j];
>>> +		for (j = 1 + 1; j < cnt; j++)
>>
>> s/1 + 1/i + 1/;
>>
>> With that, all tests seem to pass ;-)
>
> "git merge-base" itself seems to have more room for improvement.
> Trying to recompute bases for recent 200 merges in the kernel
> history with the attached script does not show any improvement with
> or without the series on top of recent "master".  Correctnesswise it
> seems to be OK, though---I get byte-for-byte identical output.
>
> -- >8 --
> #!/bin/sh
>
> git rev-list --committer=torvalds@linux-foundation.org \
>     --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL
>
> cmd='
> 	while read result parent1 parent2
> 	do
> 		$GIT merge-base $parent1 $parent2
> 	done <RL
> '
>
> GIT="rungit master" time sh -c "$cmd" >:stock
> GIT=../git.git/git time sh -c "$cmd" >:optim
> cmp :stock :optim
> -- 8< --

I have this suspicion that it is spending most of its time in
insert-by-date.  Luckily, merge_bases_many() is totally outside of
the usual revision traversal and its use of the commit list is
pretty well isolated.

Peff, can I interest you into resurrecting your $gmane/174007 and
$gmane/174008 (we do not need pop_commit_from_queue() in the latter;
everything can stay as static to commit.c for now) to see how well
priority queue based approach would perform?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel
  2012-08-28 23:39       ` Junio C Hamano
@ 2012-08-29 11:08         ` Jeff King
  2012-08-29 11:10           ` [PATCH 1/2] basic priority queue implementation Jeff King
  2012-08-29 11:11           ` [PATCH 2/2] commit: use a priority queue in merge base functions Jeff King
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2012-08-29 11:08 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Tue, Aug 28, 2012 at 04:39:11PM -0700, Junio C Hamano wrote:

> > git rev-list --committer=torvalds@linux-foundation.org \
> >     --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 >RL
> >
> > cmd='
> > 	while read result parent1 parent2
> > 	do
> > 		$GIT merge-base $parent1 $parent2
> > 	done <RL
> > '
> 
> I have this suspicion that it is spending most of its time in
> insert-by-date.  Luckily, merge_bases_many() is totally outside of
> the usual revision traversal and its use of the commit list is
> pretty well isolated.
> 
> Peff, can I interest you into resurrecting your $gmane/174007 and
> $gmane/174008 (we do not need pop_commit_from_queue() in the latter;
> everything can stay as static to commit.c for now) to see how well
> priority queue based approach would perform?

I did a quick port of merge_bases_many and get_merge_bases_many to use
priority queues, but I didn't see any speedup at all on the above test
case.  According to perf, we spend most of our time in zlib, inflating
commits. Only 1% is spent on commit_list_insert_by_date, which is about
the same amount spent on queue insertion after my patches.

Patches follow, just for reference.

  [1/2]: basic priority queue implementation
  [2/2]: commit: use a priority queue in merge base functions

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/2] basic priority queue implementation
  2012-08-29 11:08         ` Jeff King
@ 2012-08-29 11:10           ` Jeff King
  2012-08-29 11:11           ` [PATCH 2/2] commit: use a priority queue in merge base functions Jeff King
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff King @ 2012-08-29 11:10 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

This can provide better algorithmic complexity for some of
the date-sorted commit list uses, among other things.

The queue is stored as a heap, allowing O(log) insertion and
top-element removal, and O(1) peeking at the top element.
Current commit lists are sorted linked lists, giving O(1)
peeking and top-element removal, but O(n^2) insertion.

Signed-off-by: Jeff King <peff@peff.net>
---
Mostly the same as $gmane/174007, but rebased on top of jc/merge-bases,
and adding queue_clear() to avoid leaking memory.

 .gitignore       |  1 +
 Makefile         |  3 +++
 queue.c          | 75 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 queue.h          | 18 ++++++++++++++
 t/t0007-queue.sh | 50 +++++++++++++++++++++++++++++++++++++
 test-queue.c     | 39 +++++++++++++++++++++++++++++
 6 files changed, 186 insertions(+)
 create mode 100644 queue.c
 create mode 100644 queue.h
 create mode 100755 t/t0007-queue.sh
 create mode 100644 test-queue.c

diff --git a/.gitignore b/.gitignore
index 3b7680e..f4539e8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -184,6 +184,7 @@
 /test-obj-pool
 /test-parse-options
 /test-path-utils
+/test-queue
 /test-run-command
 /test-sha1
 /test-sigchain
diff --git a/Makefile b/Makefile
index e4f8e0e..a37d962 100644
--- a/Makefile
+++ b/Makefile
@@ -476,6 +476,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-queue
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
@@ -597,6 +598,7 @@ LIB_H += prompt.h
 LIB_H += pkt-line.h
 LIB_H += progress.h
 LIB_H += prompt.h
+LIB_H += queue.h
 LIB_H += quote.h
 LIB_H += reflog-walk.h
 LIB_H += refs.h
@@ -709,6 +711,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += pretty.o
 LIB_OBJS += progress.o
 LIB_OBJS += prompt.o
+LIB_OBJS += queue.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
diff --git a/queue.c b/queue.c
new file mode 100644
index 0000000..7be6b86
--- /dev/null
+++ b/queue.c
@@ -0,0 +1,75 @@
+#include "queue.h"
+#include "cache.h"
+
+static inline void queue_swap(struct queue *pq, unsigned i, unsigned j)
+{
+	void *tmp = pq->items[i];
+	pq->items[i] = pq->items[j];
+	pq->items[j] = tmp;
+}
+
+static void queue_heapify_up(struct queue *pq)
+{
+	unsigned i = pq->nr - 1;
+	while (i > 0) {
+		int parent = (i-1)/2;
+
+		if (pq->cmp(pq->items[i], pq->items[parent]) <= 0)
+			return;
+
+		queue_swap(pq, i, parent);
+		i = parent;
+	}
+}
+
+void queue_insert(struct queue *pq, void *item)
+{
+	ALLOC_GROW(pq->items, pq->nr + 1, pq->alloc);
+	pq->items[pq->nr++] = item;
+	queue_heapify_up(pq);
+}
+
+static void queue_heapify_down(struct queue *pq)
+{
+	unsigned i = 0;
+	while (1) {
+		int largest = i, left = 2*i + 1, right = 2*i + 2;
+
+		if (left < pq->nr &&
+		    pq->cmp(pq->items[left], pq->items[largest]) > 0)
+			largest = left;
+		if (right < pq->nr &&
+		    pq->cmp(pq->items[right], pq->items[largest]) > 0)
+			largest = right;
+
+		if (largest == i)
+			return;
+
+		queue_swap(pq, i, largest);
+		i = largest;
+	}
+}
+
+void *queue_peek(struct queue *pq)
+{
+	return pq->nr ? pq->items[0] : NULL;
+}
+
+void *queue_pop(struct queue *pq)
+{
+	void *ret;
+
+	if (!pq->nr)
+		return NULL;
+	ret = pq->items[0];
+
+	pq->items[0] = pq->items[--pq->nr];
+	queue_heapify_down(pq);
+
+	return ret;
+}
+
+void queue_clear(struct queue *pq)
+{
+	free(pq->items);
+}
diff --git a/queue.h b/queue.h
new file mode 100644
index 0000000..cc471b5
--- /dev/null
+++ b/queue.h
@@ -0,0 +1,18 @@
+#ifndef QUEUE_H
+#define QUEUE_H
+
+typedef int (*queue_comparison_func_t)(const void *, const void *);
+
+struct queue {
+	queue_comparison_func_t cmp;
+	void **items;
+	unsigned nr;
+	unsigned alloc;
+};
+
+void queue_insert(struct queue *pq, void *item);
+void *queue_peek(struct queue *pq);
+void *queue_pop(struct queue *pq);
+void queue_clear(struct queue *pq);
+
+#endif /* QUEUE_H */
diff --git a/t/t0007-queue.sh b/t/t0007-queue.sh
new file mode 100755
index 0000000..ee357bb
--- /dev/null
+++ b/t/t0007-queue.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+
+test_description='basic tests for priority queue implementation'
+. ./test-lib.sh
+
+cat >expect <<'EOF'
+10
+9
+8
+7
+6
+5
+5
+4
+3
+2
+1
+EOF
+test_expect_success 'basic ordering' '
+	test-queue 2 6 3 10 9 5 7 4 5 8 1 dump >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+3
+5
+4
+6
+2
+1
+EOF
+test_expect_success 'mixed insert and pop' '
+	test-queue 1 2 3 pop 4 5 pop pop 6 dump >actual &&
+	test_cmp expect actual
+'
+
+cat >expect <<'EOF'
+2
+1
+NULL
+2
+1
+NULL
+EOF
+test_expect_success 'notice empty queue' '
+	test-queue 1 2 pop pop pop 1 2 pop pop pop >actual &&
+	test_cmp expect actual
+'
+
+test_done
diff --git a/test-queue.c b/test-queue.c
new file mode 100644
index 0000000..a1e92e2
--- /dev/null
+++ b/test-queue.c
@@ -0,0 +1,39 @@
+#include "cache.h"
+#include "queue.h"
+
+static int intcmp(const void *va, const void *vb)
+{
+	const int *a = va, *b = vb;
+	return *a - *b;
+}
+
+static void show(int *v)
+{
+	if (!v)
+		printf("NULL\n");
+	else
+		printf("%d\n", *v);
+	free(v);
+}
+
+int main(int argc, char **argv)
+{
+	struct queue pq = { intcmp };
+
+	while (*++argv) {
+		if (!strcmp(*argv, "pop"))
+			show(queue_pop(&pq));
+		else if (!strcmp(*argv, "dump")) {
+			int *v;
+			while ((v = queue_pop(&pq)))
+			       show(v);
+		}
+		else {
+			int *v = malloc(sizeof(*v));
+			*v = atoi(*argv);
+			queue_insert(&pq, v);
+		}
+	}
+
+	return 0;
+}
-- 
1.7.12.35.g38689e3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 11:08         ` Jeff King
  2012-08-29 11:10           ` [PATCH 1/2] basic priority queue implementation Jeff King
@ 2012-08-29 11:11           ` Jeff King
  2012-08-29 16:36             ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff King @ 2012-08-29 11:11 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

The merge-base functions internally keep a commit lists and
insert by date, which causes a linear search of the commit
list for each insertion. Let's use a priority queue instead.

Unfortunately, from my experiments, this didn't actually
cause any speedup.

Signed-off-by: Jeff King <peff@peff.net>
---
I'd probably split this into a few commits if we were really going to
apply it, but since it doesn't actually make anything faster, I doubt
the code churn is worth it.

 commit.c | 78 ++++++++++++++++++++++++++++++++++++++--------------------------
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/commit.c b/commit.c
index 380f4ec..c64ef94 100644
--- a/commit.c
+++ b/commit.c
@@ -7,6 +7,7 @@
 #include "revision.h"
 #include "notes.h"
 #include "gpg-interface.h"
+#include "queue.h"
 
 int save_commit_buffer = 1;
 
@@ -360,6 +361,15 @@ struct commit_list *commit_list_insert(struct commit *item, struct commit_list *
 	return new_list;
 }
 
+static void commit_list_append(struct commit *item, struct commit_list ***tail)
+{
+	struct commit_list *new_list = xmalloc(sizeof(*new_list));
+	new_list->item = item;
+	new_list->next = NULL;
+	**tail = new_list;
+	*tail = &new_list->next;
+}
+
 unsigned commit_list_count(const struct commit_list *l)
 {
 	unsigned c = 0;
@@ -422,6 +432,16 @@ struct commit *pop_most_recent_commit(struct commit_list **list,
 	return ret;
 }
 
+static int commit_datecmp(const void *va, const void *vb)
+{
+	const struct commit *a = va, *b = vb;
+	if (a->date < b->date)
+		return -1;
+	else if (a->date > b->date)
+		return 1;
+	return 0;
+}
+
 void clear_commit_marks(struct commit *commit, unsigned int mark)
 {
 	while (commit) {
@@ -569,22 +589,23 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
 
 static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
 
-static struct commit *interesting(struct commit_list *list)
+static int interesting(struct queue *q)
 {
-	while (list) {
-		struct commit *commit = list->item;
-		list = list->next;
+	int i;
+	for (i = 0; i < q->nr; i++) {
+		struct commit *commit = q->items[i];
 		if (commit->object.flags & STALE)
 			continue;
-		return commit;
+		return 1;
 	}
-	return NULL;
+	return 0;
 }
 
 static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos)
 {
-	struct commit_list *list = NULL;
-	struct commit_list *result = NULL;
+	struct queue result = { commit_datecmp };
+	struct queue list = { commit_datecmp };
+	struct commit_list *ret = NULL, **tail = &ret;
 	int i;
 
 	for (i = 0; i < n; i++) {
@@ -593,7 +614,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
 			 * We do not mark this even with RESULT so we do not
 			 * have to clean it up.
 			 */
-			return commit_list_insert(one, &result);
+			return commit_list_insert(one, &ret);
 	}
 
 	if (parse_commit(one))
@@ -604,28 +625,24 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
 	}
 
 	one->object.flags |= PARENT1;
-	commit_list_insert_by_date(one, &list);
+	queue_insert(&list, one);
 	for (i = 0; i < n; i++) {
 		twos[i]->object.flags |= PARENT2;
-		commit_list_insert_by_date(twos[i], &list);
+		queue_insert(&list, twos[i]);
 	}
 
-	while (interesting(list)) {
+	while (interesting(&list)) {
 		struct commit *commit;
 		struct commit_list *parents;
-		struct commit_list *next;
 		int flags;
 
-		commit = list->item;
-		next = list->next;
-		free(list);
-		list = next;
+		commit = queue_pop(&list);
 
 		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
 		if (flags == (PARENT1 | PARENT2)) {
 			if (!(commit->object.flags & RESULT)) {
 				commit->object.flags |= RESULT;
-				commit_list_insert_by_date(commit, &result);
+				queue_insert(&result, commit);
 			}
 			/* Mark parents of a found merge stale */
 			flags |= STALE;
@@ -639,21 +656,16 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
 			if (parse_commit(p))
 				return NULL;
 			p->object.flags |= flags;
-			commit_list_insert_by_date(p, &list);
+			queue_insert(&list, p);
 		}
 	}
 
 	/* Clean up the result to remove stale ones */
-	free_commit_list(list);
-	list = result; result = NULL;
-	while (list) {
-		struct commit_list *next = list->next;
-		if (!(list->item->object.flags & STALE))
-			commit_list_insert_by_date(list->item, &result);
-		free(list);
-		list = next;
-	}
-	return result;
+	while (result.nr)
+		commit_list_append(queue_pop(&result), &tail);
+	queue_clear(&result);
+	queue_clear(&list);
+	return ret;
 }
 
 struct commit_list *get_octopus_merge_bases(struct commit_list *in)
@@ -690,7 +702,8 @@ struct commit_list *get_merge_bases_many(struct commit *one,
 {
 	struct commit_list *list;
 	struct commit **rslt, **others;
-	struct commit_list *result;
+	struct commit_list *result, **tail = &result;
+	struct queue commit_queue = { commit_datecmp };
 	int cnt, i, j;
 
 	result = merge_bases_many(one, n, twos);
@@ -744,11 +757,14 @@ struct commit_list *get_merge_bases_many(struct commit *one,
 			list = list->next;
 		}
 		if (!list)
-			commit_list_insert_by_date(rslt[i], &result);
+			queue_insert(&commit_queue, rslt[i]);
 		free_commit_list(list);
 	}
 	free(rslt);
 	free(others);
+	while (commit_queue.nr)
+		commit_list_append(queue_pop(&commit_queue), &tail);
+	queue_clear(&commit_queue);
 	return result;
 }
 
-- 
1.7.12.35.g38689e3

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 11:11           ` [PATCH 2/2] commit: use a priority queue in merge base functions Jeff King
@ 2012-08-29 16:36             ` Junio C Hamano
  2012-08-29 20:53               ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2012-08-29 16:36 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Jeff King <peff@peff.net> writes:

> The merge-base functions internally keep a commit lists and
> insert by date, which causes a linear search of the commit
> list for each insertion. Let's use a priority queue instead.
>
> Unfortunately, from my experiments, this didn't actually
> cause any speedup.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
> I'd probably split this into a few commits if we were really going to
> apply it, but since it doesn't actually make anything faster, I doubt
> the code churn is worth it.

Thanks.  This seems to break t6010-merge-base.sh for me, though...



>
>  commit.c | 78 ++++++++++++++++++++++++++++++++++++++--------------------------
>  1 file changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/commit.c b/commit.c
> index 380f4ec..c64ef94 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -7,6 +7,7 @@
>  #include "revision.h"
>  #include "notes.h"
>  #include "gpg-interface.h"
> +#include "queue.h"
>  
>  int save_commit_buffer = 1;
>  
> @@ -360,6 +361,15 @@ struct commit_list *commit_list_insert(struct commit *item, struct commit_list *
>  	return new_list;
>  }
>  
> +static void commit_list_append(struct commit *item, struct commit_list ***tail)
> +{
> +	struct commit_list *new_list = xmalloc(sizeof(*new_list));
> +	new_list->item = item;
> +	new_list->next = NULL;
> +	**tail = new_list;
> +	*tail = &new_list->next;
> +}
> +
>  unsigned commit_list_count(const struct commit_list *l)
>  {
>  	unsigned c = 0;
> @@ -422,6 +432,16 @@ struct commit *pop_most_recent_commit(struct commit_list **list,
>  	return ret;
>  }
>  
> +static int commit_datecmp(const void *va, const void *vb)
> +{
> +	const struct commit *a = va, *b = vb;
> +	if (a->date < b->date)
> +		return -1;
> +	else if (a->date > b->date)
> +		return 1;
> +	return 0;
> +}
> +
>  void clear_commit_marks(struct commit *commit, unsigned int mark)
>  {
>  	while (commit) {
> @@ -569,22 +589,23 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
>  
>  static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
>  
> -static struct commit *interesting(struct commit_list *list)
> +static int interesting(struct queue *q)
>  {
> -	while (list) {
> -		struct commit *commit = list->item;
> -		list = list->next;
> +	int i;
> +	for (i = 0; i < q->nr; i++) {
> +		struct commit *commit = q->items[i];
>  		if (commit->object.flags & STALE)
>  			continue;
> -		return commit;
> +		return 1;
>  	}
> -	return NULL;
> +	return 0;
>  }
>  
>  static struct commit_list *merge_bases_many(struct commit *one, int n, struct commit **twos)
>  {
> -	struct commit_list *list = NULL;
> -	struct commit_list *result = NULL;
> +	struct queue result = { commit_datecmp };
> +	struct queue list = { commit_datecmp };
> +	struct commit_list *ret = NULL, **tail = &ret;
>  	int i;
>  
>  	for (i = 0; i < n; i++) {
> @@ -593,7 +614,7 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
>  			 * We do not mark this even with RESULT so we do not
>  			 * have to clean it up.
>  			 */
> -			return commit_list_insert(one, &result);
> +			return commit_list_insert(one, &ret);
>  	}
>  
>  	if (parse_commit(one))
> @@ -604,28 +625,24 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
>  	}
>  
>  	one->object.flags |= PARENT1;
> -	commit_list_insert_by_date(one, &list);
> +	queue_insert(&list, one);
>  	for (i = 0; i < n; i++) {
>  		twos[i]->object.flags |= PARENT2;
> -		commit_list_insert_by_date(twos[i], &list);
> +		queue_insert(&list, twos[i]);
>  	}
>  
> -	while (interesting(list)) {
> +	while (interesting(&list)) {
>  		struct commit *commit;
>  		struct commit_list *parents;
> -		struct commit_list *next;
>  		int flags;
>  
> -		commit = list->item;
> -		next = list->next;
> -		free(list);
> -		list = next;
> +		commit = queue_pop(&list);
>  
>  		flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
>  		if (flags == (PARENT1 | PARENT2)) {
>  			if (!(commit->object.flags & RESULT)) {
>  				commit->object.flags |= RESULT;
> -				commit_list_insert_by_date(commit, &result);
> +				queue_insert(&result, commit);
>  			}
>  			/* Mark parents of a found merge stale */
>  			flags |= STALE;
> @@ -639,21 +656,16 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
>  			if (parse_commit(p))
>  				return NULL;
>  			p->object.flags |= flags;
> -			commit_list_insert_by_date(p, &list);
> +			queue_insert(&list, p);
>  		}
>  	}
>  
>  	/* Clean up the result to remove stale ones */
> -	free_commit_list(list);
> -	list = result; result = NULL;
> -	while (list) {
> -		struct commit_list *next = list->next;
> -		if (!(list->item->object.flags & STALE))
> -			commit_list_insert_by_date(list->item, &result);
> -		free(list);
> -		list = next;
> -	}
> -	return result;
> +	while (result.nr)
> +		commit_list_append(queue_pop(&result), &tail);
> +	queue_clear(&result);
> +	queue_clear(&list);
> +	return ret;
>  }
>  
>  struct commit_list *get_octopus_merge_bases(struct commit_list *in)
> @@ -690,7 +702,8 @@ struct commit_list *get_merge_bases_many(struct commit *one,
>  {
>  	struct commit_list *list;
>  	struct commit **rslt, **others;
> -	struct commit_list *result;
> +	struct commit_list *result, **tail = &result;
> +	struct queue commit_queue = { commit_datecmp };
>  	int cnt, i, j;
>  
>  	result = merge_bases_many(one, n, twos);
> @@ -744,11 +757,14 @@ struct commit_list *get_merge_bases_many(struct commit *one,
>  			list = list->next;
>  		}
>  		if (!list)
> -			commit_list_insert_by_date(rslt[i], &result);
> +			queue_insert(&commit_queue, rslt[i]);
>  		free_commit_list(list);
>  	}
>  	free(rslt);
>  	free(others);
> +	while (commit_queue.nr)
> +		commit_list_append(queue_pop(&commit_queue), &tail);
> +	queue_clear(&commit_queue);
>  	return result;
>  }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 16:36             ` Junio C Hamano
@ 2012-08-29 20:53               ` Jeff King
  2012-08-29 20:55                 ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2012-08-29 20:53 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Wed, Aug 29, 2012 at 09:36:43AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The merge-base functions internally keep a commit lists and
> > insert by date, which causes a linear search of the commit
> > list for each insertion. Let's use a priority queue instead.
> >
> > Unfortunately, from my experiments, this didn't actually
> > cause any speedup.
> >
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> > I'd probably split this into a few commits if we were really going to
> > apply it, but since it doesn't actually make anything faster, I doubt
> > the code churn is worth it.
> 
> Thanks.  This seems to break t6010-merge-base.sh for me, though...

Interesting. It works fine here, even under --valgrind. Did you apply
the patches directly on top of 1251cc7?

Not that it matters _too_ much if we are just going to scrap it anyway,
but maybe it is an indication that I screwed up something that could
impact the timing (I did check that the timed merge-base calculations on
linux-2.6 yielded the same results though).

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 20:53               ` Jeff King
@ 2012-08-29 20:55                 ` Jeff King
  2012-08-29 21:00                   ` Jeff King
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2012-08-29 20:55 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Wed, Aug 29, 2012 at 04:53:32PM -0400, Jeff King wrote:

> > Thanks.  This seems to break t6010-merge-base.sh for me, though...
> 
> Interesting. It works fine here, even under --valgrind. Did you apply
> the patches directly on top of 1251cc7?

Hmm, this does seem to break t6024 for me, though.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 20:55                 ` Jeff King
@ 2012-08-29 21:00                   ` Jeff King
  2012-08-29 21:05                     ` Jeff King
  2012-08-29 21:18                     ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2012-08-29 21:00 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Wed, Aug 29, 2012 at 04:55:25PM -0400, Jeff King wrote:

> On Wed, Aug 29, 2012 at 04:53:32PM -0400, Jeff King wrote:
> 
> > > Thanks.  This seems to break t6010-merge-base.sh for me, though...
> > 
> > Interesting. It works fine here, even under --valgrind. Did you apply
> > the patches directly on top of 1251cc7?
> 
> Hmm, this does seem to break t6024 for me, though.

Probably because:

>  	/* Clean up the result to remove stale ones */
> -	free_commit_list(list);
> -	list = result; result = NULL;
> -	while (list) {
> -		struct commit_list *next = list->next;
> -		if (!(list->item->object.flags & STALE))
> -			commit_list_insert_by_date(list->item, &result);
> -		free(list);
> -		list = next;
> -	}
> -	return result;
> +	while (result.nr)
> +		commit_list_append(queue_pop(&result), &tail);
> +	queue_clear(&result);
> +	queue_clear(&list);
> +	return ret;

I forgot to to port the STALE flag handling here.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 21:00                   ` Jeff King
@ 2012-08-29 21:05                     ` Jeff King
  2012-08-30 12:54                       ` Jeff King
  2012-08-29 21:18                     ` Junio C Hamano
  1 sibling, 1 reply; 27+ messages in thread
From: Jeff King @ 2012-08-29 21:05 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Wed, Aug 29, 2012 at 05:00:32PM -0400, Jeff King wrote:

> > Hmm, this does seem to break t6024 for me, though.
> 
> Probably because:
> 
> >  	/* Clean up the result to remove stale ones */
> > -	free_commit_list(list);
> > -	list = result; result = NULL;
> > -	while (list) {
> > -		struct commit_list *next = list->next;
> > -		if (!(list->item->object.flags & STALE))
> > -			commit_list_insert_by_date(list->item, &result);
> > -		free(list);
> > -		list = next;
> > -	}
> > -	return result;
> > +	while (result.nr)
> > +		commit_list_append(queue_pop(&result), &tail);
> > +	queue_clear(&result);
> > +	queue_clear(&list);
> > +	return ret;
> 
> I forgot to to port the STALE flag handling here.

You would want this on top:

diff --git a/commit.c b/commit.c
index c64ef94..8259871 100644
--- a/commit.c
+++ b/commit.c
@@ -661,8 +661,11 @@ static struct commit_list *merge_bases_many(struct commit *one, int n, struct co
 	}
 
 	/* Clean up the result to remove stale ones */
-	while (result.nr)
-		commit_list_append(queue_pop(&result), &tail);
+	while (result.nr) {
+		struct commit *commit = queue_pop(&result);
+		if (!(commit->object.flags & STALE))
+			commit_list_append(commit, &tail);
+	}
 	queue_clear(&result);
 	queue_clear(&list);
 	return ret;

but t6024 still fails (it clearly is finding a different merge base than
the test expects).  I'll trace through it, but it will have to be later
tonight.

-Peff

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 21:00                   ` Jeff King
  2012-08-29 21:05                     ` Jeff King
@ 2012-08-29 21:18                     ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-29 21:18 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Jeff King <peff@peff.net> writes:

>> +	while (result.nr)
>> +		commit_list_append(queue_pop(&result), &tail);
>> +	queue_clear(&result);
>> +	queue_clear(&list);
>> +	return ret;
>
> I forgot to to port the STALE flag handling here.

Figures..  Thanks.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-29 21:05                     ` Jeff King
@ 2012-08-30 12:54                       ` Jeff King
  2012-08-30 13:03                         ` Jeff King
  2012-08-30 16:23                         ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Jeff King @ 2012-08-30 12:54 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Wed, Aug 29, 2012 at 05:05:40PM -0400, Jeff King wrote:

> You would want this on top:
> [...]
> but t6024 still fails (it clearly is finding a different merge base than
> the test expects).  I'll trace through it, but it will have to be later
> tonight.

The problem in t6024 is caused by the fact that the commit timestamps
for every commit are identical. The linear commit_list has the property
that we always insert a new commit at the end of a chain of commits with
the same timestamp. So it works as a stable priority queue in the sense
that items with the same priority are returned in insertion order.

But the heap-based priority queue does not. Nor can it do so without
extra storage requirements, as heaps are inherently unstable. The
simplest way to make it stable is to add an insertion counter to the
comparison function. The patch below does this, and it resolves the
issue. It does waste one int per queue element.

I think you could also make a priority queue based on a binary search
tree that would be stable. It's slightly less efficient to create an
initial queue (you can heapify in O(n), but building the tree takes O(n
lg n)).  But we do not care about that, as we always build the queue by
inserting elements, anyway.  The other downside of using a tree is that
you would want a self-balancing tree for good performance (especially
since we tend to insert commits in sorted order), which increases the
code complexity.

Anyway, since this isn't yielding any performance benefit, I'm not going
to go down that route. But stability of the queue is something that we
need to consider if we ever do replace commit_list with a different data
structure.

Here's the patch to make the existing priority queue stable (by wasting
space) in case we ever want to use it.

-Peff

diff --git a/commit.c b/commit.c
index 8259871..a99d909 100644
--- a/commit.c
+++ b/commit.c
@@ -593,7 +593,7 @@ static int interesting(struct queue *q)
 {
 	int i;
 	for (i = 0; i < q->nr; i++) {
-		struct commit *commit = q->items[i];
+		struct commit *commit = q->items[i].item;
 		if (commit->object.flags & STALE)
 			continue;
 		return 1;
diff --git a/queue.c b/queue.c
index 7be6b86..1bdd948 100644
--- a/queue.c
+++ b/queue.c
@@ -3,18 +3,28 @@ static void queue_heapify_up(struct queue *pq)
 
 static inline void queue_swap(struct queue *pq, unsigned i, unsigned j)
 {
-	void *tmp = pq->items[i];
+	struct queue_item tmp = pq->items[i];
 	pq->items[i] = pq->items[j];
 	pq->items[j] = tmp;
 }
 
+static inline int queue_cmp(struct queue *pq, unsigned i, unsigned j)
+{
+	int cmp = pq->cmp(pq->items[i].item, pq->items[j].item);
+	if (cmp)
+		return cmp;
+	if (pq->items[i].counter < pq->items[j].counter)
+		return 1;
+	return -1;
+}
+
 static void queue_heapify_up(struct queue *pq)
 {
 	unsigned i = pq->nr - 1;
 	while (i > 0) {
 		int parent = (i-1)/2;
 
-		if (pq->cmp(pq->items[i], pq->items[parent]) <= 0)
+		if (queue_cmp(pq, i, parent) <= 0)
 			return;
 
 		queue_swap(pq, i, parent);
@@ -25,7 +35,9 @@ void queue_insert(struct queue *pq, void *item)
 void queue_insert(struct queue *pq, void *item)
 {
 	ALLOC_GROW(pq->items, pq->nr + 1, pq->alloc);
-	pq->items[pq->nr++] = item;
+	pq->items[pq->nr].item = item;
+	pq->items[pq->nr].counter = pq->counter++;
+	pq->nr++;
 	queue_heapify_up(pq);
 }
 
@@ -35,11 +47,9 @@ static void queue_heapify_down(struct queue *pq)
 	while (1) {
 		int largest = i, left = 2*i + 1, right = 2*i + 2;
 
-		if (left < pq->nr &&
-		    pq->cmp(pq->items[left], pq->items[largest]) > 0)
+		if (left < pq->nr && queue_cmp(pq, left, largest) > 0)
 			largest = left;
-		if (right < pq->nr &&
-		    pq->cmp(pq->items[right], pq->items[largest]) > 0)
+		if (right < pq->nr && queue_cmp(pq, right, largest) > 0)
 			largest = right;
 
 		if (largest == i)
@@ -52,7 +62,7 @@ void *queue_peek(struct queue *pq)
 
 void *queue_peek(struct queue *pq)
 {
-	return pq->nr ? pq->items[0] : NULL;
+	return pq->nr ? pq->items[0].item : NULL;
 }
 
 void *queue_pop(struct queue *pq)
@@ -61,7 +71,7 @@ void *queue_pop(struct queue *pq)
 
 	if (!pq->nr)
 		return NULL;
-	ret = pq->items[0];
+	ret = pq->items[0].item;
 
 	pq->items[0] = pq->items[--pq->nr];
 	queue_heapify_down(pq);
diff --git a/queue.h b/queue.h
index cc471b5..a70f7d7 100644
--- a/queue.h
+++ b/queue.h
@@ -3,11 +3,17 @@ struct queue {
 
 typedef int (*queue_comparison_func_t)(const void *, const void *);
 
+struct queue_item {
+	void *item;
+	unsigned counter;
+};
+
 struct queue {
 	queue_comparison_func_t cmp;
-	void **items;
+	struct queue_item *items;
 	unsigned nr;
 	unsigned alloc;
+	unsigned counter;
 };
 
 void queue_insert(struct queue *pq, void *item);

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 12:54                       ` Jeff King
@ 2012-08-30 13:03                         ` Jeff King
  2012-08-30 13:24                           ` Jeff King
  2012-08-30 16:33                           ` Junio C Hamano
  2012-08-30 16:23                         ` Junio C Hamano
  1 sibling, 2 replies; 27+ messages in thread
From: Jeff King @ 2012-08-30 13:03 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Thu, Aug 30, 2012 at 08:54:21AM -0400, Jeff King wrote:

> On Wed, Aug 29, 2012 at 05:05:40PM -0400, Jeff King wrote:
> 
> > You would want this on top:
> > [...]
> > but t6024 still fails (it clearly is finding a different merge base than
> > the test expects).  I'll trace through it, but it will have to be later
> > tonight.
> 
> The problem in t6024 is caused by the fact that the commit timestamps
> for every commit are identical.

So I was able to have my queue behave just like commit_list by fixing
the stability issue. But I still have no clue what is going on in t6024.
It does this for each commit it makes:

  [...]
  GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 &&
  [...]
  GIT_AUTHOR_DATE="2006-12-12 23:00:01" git commit -m A a1 &&
  [...]

which is just bizarre. At first I thought it was buggy, and that it
really wanted to be setting COMMITTER_DATE (in which case it should
really just be using test_tick, anyway). But if you do that, the test
fails (even using a regular commit_list)!

So is the test buggy? Or are the identical commit timestamps part of the
intended effect? I can't see how that would be, since:

  1. You would need to set COMMITTER_DATE for that anyway, as you are
     otherwise creating a race condition.

  2. Why would you set AUTHOR_DATE? It's not used by the merge code at
     all.

The script originally comes from here:

  http://thread.gmane.org/gmane.comp.version-control.git/33566/focus=33852

and the discussion implies that the AUTHOR_DATEs were added to avoid a
race condition with the timestamps. But why would that ever have worked?

Confused...

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 13:03                         ` Jeff King
@ 2012-08-30 13:24                           ` Jeff King
  2012-08-30 16:33                           ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Jeff King @ 2012-08-30 13:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Thu, Aug 30, 2012 at 09:03:27AM -0400, Jeff King wrote:

> So I was able to have my queue behave just like commit_list by fixing
> the stability issue. But I still have no clue what is going on in t6024.
> It does this for each commit it makes:
> 
>   [...]
>   GIT_AUTHOR_DATE="2006-12-12 23:00:00" git commit -m 1 a1 &&
>   [...]
>   GIT_AUTHOR_DATE="2006-12-12 23:00:01" git commit -m A a1 &&
>   [...]
> 
> which is just bizarre. At first I thought it was buggy, and that it
> really wanted to be setting COMMITTER_DATE (in which case it should
> really just be using test_tick, anyway). But if you do that, the test
> fails (even using a regular commit_list)!
> 
> So is the test buggy? Or are the identical commit timestamps part of the
> intended effect? I can't see how that would be, since:
> 
>   1. You would need to set COMMITTER_DATE for that anyway, as you are
>      otherwise creating a race condition.
> 
>   2. Why would you set AUTHOR_DATE? It's not used by the merge code at
>      all.
> 
> The script originally comes from here:
> 
>   http://thread.gmane.org/gmane.comp.version-control.git/33566/focus=33852
> 
> and the discussion implies that the AUTHOR_DATEs were added to avoid a
> race condition with the timestamps. But why would that ever have worked?

That thread mentions that this is to fix "Shawn's bug", which I think is
this:

  http://article.gmane.org/gmane.comp.version-control.git/33559

IOW, the real thing that t6024 is trying to test is that we handle the
fake empty tree properly. And the AUTHOR_DATEs probably were just there
to try to fix the race condition, and they should really just be
test_ticks (and I can't see how they ever would have helped anything; I
suspect they were a placebo inserted at the same time as another change,
and got credited with fixing the race).

That still leaves the question of why the test fails when the commits
get distinct timestamps.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 12:54                       ` Jeff King
  2012-08-30 13:03                         ` Jeff King
@ 2012-08-30 16:23                         ` Junio C Hamano
  2012-08-30 21:31                           ` Jeff King
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2012-08-30 16:23 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Jeff King <peff@peff.net> writes:

> Anyway, since this isn't yielding any performance benefit, I'm not going
> to go down that route. But stability of the queue is something that we
> need to consider if we ever do replace commit_list with a different data
> structure.
>
> Here's the patch to make the existing priority queue stable (by wasting
> space) in case we ever want to use it.

Thanks for being thorough and showing a good analysis.

If we want stability, the space to hold insertion counter is not
"wasted", by the way.

I actually think the generic "queue" implementation may want to
handle elements that are not just single pointers.  Just like
qsort() is not about sorting an array of pointers but it is about
sorting an array of elements of caller specified size, perhaps
"queue" can hold elements of size the caller specified (set in stone
when the queue is initialized).

Then, a caller that wants a stable priority queue of commits can
tell the queue to manage "struct { struct commit *c; int gen; }",
use the "gen" field for stability in its comparison callback, etc.,
while a caller that does not need stability can tell the queue to
manage "struct commit *".  That way, the generic queue layer does
not have to worry about wasting the insertion counter space, no?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 13:03                         ` Jeff King
  2012-08-30 13:24                           ` Jeff King
@ 2012-08-30 16:33                           ` Junio C Hamano
  2012-08-30 21:48                             ` Jeff King
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2012-08-30 16:33 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Jeff King <peff@peff.net> writes:

> The script originally comes from here:
>
>   http://thread.gmane.org/gmane.comp.version-control.git/33566/focus=33852
>
> and the discussion implies that the AUTHOR_DATEs were added to avoid a
> race condition with the timestamps. But why would that ever have worked?

I do not see how AUTHOR_DATE would affect anything there, either,
other than to give reprodusible object names.  The test only sets
committer-date upfront, so without setting author-date, you would
still get different object names on commits.  Which suggests me that
there may be something that tiebreaks based on object names?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 16:23                         ` Junio C Hamano
@ 2012-08-30 21:31                           ` Jeff King
  2012-08-30 21:59                             ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2012-08-30 21:31 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Thu, Aug 30, 2012 at 09:23:25AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > Anyway, since this isn't yielding any performance benefit, I'm not going
> > to go down that route. But stability of the queue is something that we
> > need to consider if we ever do replace commit_list with a different data
> > structure.
> >
> > Here's the patch to make the existing priority queue stable (by wasting
> > space) in case we ever want to use it.
> 
> Thanks for being thorough and showing a good analysis.
> 
> If we want stability, the space to hold insertion counter is not
> "wasted", by the way.

It is if there is another data structure that will do the same job with
the same performance characteristics and without using that space.

But I'm not even sure it is an issue in practice. There are ~320K
objects in linux-2.6. Even if we inserted _all_ of them at once into a
queue (which we don't; it's usually more like a couple dozen, with a few
pathological cases being equal to the number of refs we have, which is
usually in the hundreds), then we are talking about an extra 2.5M on a
64-bit system. Compared to dropping an O(n^2) operation to O(lg n),
that's probably not even going to be noticeable.

> I actually think the generic "queue" implementation may want to
> handle elements that are not just single pointers.  Just like
> qsort() is not about sorting an array of pointers but it is about
> sorting an array of elements of caller specified size, perhaps
> "queue" can hold elements of size the caller specified (set in stone
> when the queue is initialized).
> 
> Then, a caller that wants a stable priority queue of commits can
> tell the queue to manage "struct { struct commit *c; int gen; }",
> use the "gen" field for stability in its comparison callback, etc.,
> while a caller that does not need stability can tell the queue to
> manage "struct commit *".  That way, the generic queue layer does
> not have to worry about wasting the insertion counter space, no?

Yeah, that would be more generic. One wonky thing I didn't point out in
my implementation is this:

> +static inline int queue_cmp(struct queue *pq, unsigned i, unsigned j)
> +{
> +       int cmp = pq->cmp(pq->items[i].item, pq->items[j].item);
> +       if (cmp)
> +               return cmp;
> +       if (pq->items[i].counter < pq->items[j].counter)
> +               return 1;
> +       return -1;
> +}

Notice how the counter comparison is "backwards" to the regular
comparison. That's because the queue is actually a max-queue (I did it
that way since we are indexing on timestamp, and want to go in reverse
chronological order). But the counter part wants to prioritize minimums,
so you have to reverse it. You could also implement a min-queue and ask
the caller to reverse their comparison function.

But really, a caller could theoretically want to prioritize in either
direction (e.g., giving a LIFO behavior to elements with the same
priority). Pulling the counter out of the queue would allow that.

The reason I didn't, though, is that it would make the interface a huge
pain if the caller had to do this:

  struct stable_commit_queue_element qe;
  qe.commit = commit;
  qe.counter = counter++; /* who is holding the global counter? */
  queue_push(&q, &qe);

instead of:

  queue_push(&q, commit);

I suspect you could build a "queue" object, and then implement a
"stable queue" on top of it with some clever use of function pointers
for the comparison function.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 16:33                           ` Junio C Hamano
@ 2012-08-30 21:48                             ` Jeff King
  2012-08-30 22:16                               ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Jeff King @ 2012-08-30 21:48 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Thomas Rast

On Thu, Aug 30, 2012 at 09:33:48AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > The script originally comes from here:
> >
> >   http://thread.gmane.org/gmane.comp.version-control.git/33566/focus=33852
> >
> > and the discussion implies that the AUTHOR_DATEs were added to avoid a
> > race condition with the timestamps. But why would that ever have worked?
> 
> I do not see how AUTHOR_DATE would affect anything there, either,
> other than to give reprodusible object names.  The test only sets
> committer-date upfront, so without setting author-date, you would
> still get different object names on commits.  Which suggests me that
> there may be something that tiebreaks based on object names?

Hmm. I wouldn't think so. The order should come from timestamps, with
ties broken by order of insertion, which in turn comes from traversal,
which depends only on parent order. We do check some raw sha1s in the
expected output, but it is only for blobs, not commits.  I guess it
could be some weirdness inside merge-recursive, though, and
not part of the merge-base computation.

So I really don't know how AUTHOR_DATE would change anything. And
indeed, after removing them it still passes on my machine.

However, I think I may understand why it fails if you tweak the
committer dates.

When my unstable-queue was used, I noticed that the merge bases returned
were the same (as you would expect), but that they came in a different
order. Which makes sense, as the order of multiple bases would not
necessarily be deterministic (they do not have an ancestry relationship,
or they would not be merge bases).

So the issue is that when you do a recursive merge with multiple bases,
the order in which you visit the recursive bases is going to impact the
exact conflicts you see. In theory, after the merge is done, you're
going to be at the same state, but the conflicts you see along the way
will be different. And it is this conflicted state that the test is
looking at.

The current test expects a particular order of merge bases based on the
same-second commit timestamps. There is no race condition because of the
setting of GIT_COMMITTER_DATE at the beginning (and _that_ is the real
thing that fixed the race conditions Dscho saw in the thread above; the
AUTHOR_DATE was just a red herring).

So the test is not broken or racy, which is good. It is just testing
something that is somewhat of an implementation detail. We could switch
it to use test_tick, and then adjust the expected output to look for the
expected conflict that git happens to generate in that case. But that is
no better than the current behavior.

But I'm not sure there is a way to test what it wants to test (that we
hit a conflict that involves one of the recursive merge bases) without
relying on the implementation detail. So I'm inclined to just leave it
in place.

-Peff

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 21:31                           ` Jeff King
@ 2012-08-30 21:59                             ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-30 21:59 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Jeff King <peff@peff.net> writes:

> Compared to dropping an O(n^2) operation to O(lg n), that's
> probably not even going to be noticeable.

Yeah, that is something we would want to use when we eventually
update the main revision traverser, not this codepath localized to
the merge-base.

Thanks.  I like (and agree with) everything you wrote in this
message.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/2] commit: use a priority queue in merge base functions
  2012-08-30 21:48                             ` Jeff King
@ 2012-08-30 22:16                               ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2012-08-30 22:16 UTC (permalink / raw)
  To: Jeff King; +Cc: git, Thomas Rast

Jeff King <peff@peff.net> writes:

> So the issue is that when you do a recursive merge with multiple bases,
> the order in which you visit the recursive bases is going to impact the
> exact conflicts you see.

Yeah, that explains it.

> So the test is not broken or racy, which is good. It is just testing
> something that is somewhat of an implementation detail. We could switch
> it to use test_tick, and then adjust the expected output to look for the
> expected conflict that git happens to generate in that case. But that is
> no better than the current behavior.

True.

> But I'm not sure there is a way to test what it wants to test (that we
> hit a conflict that involves one of the recursive merge bases) without
> relying on the implementation detail. So I'm inclined to just leave it
> in place.

OK.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2012-08-30 22:17 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-27 23:11 [PATCH 0/5] optimize fast-forward checks Junio C Hamano
2012-08-27 23:11 ` [PATCH 1/5] in_merge_bases(): support only one "other" commit Junio C Hamano
2012-08-27 23:12 ` [PATCH 2/5] receive-pack: use in_merge_bases() for fast-forward check Junio C Hamano
2012-08-27 23:12 ` [PATCH 3/5] http-push: " Junio C Hamano
2012-08-27 23:12 ` [PATCH 4/5] in_merge_bases(): omit unnecessary redundant common ancestor reduction Junio C Hamano
2012-08-27 23:12 ` [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel Junio C Hamano
2012-08-28  1:25   ` Junio C Hamano
2012-08-28 21:35     ` Junio C Hamano
2012-08-28 23:39       ` Junio C Hamano
2012-08-29 11:08         ` Jeff King
2012-08-29 11:10           ` [PATCH 1/2] basic priority queue implementation Jeff King
2012-08-29 11:11           ` [PATCH 2/2] commit: use a priority queue in merge base functions Jeff King
2012-08-29 16:36             ` Junio C Hamano
2012-08-29 20:53               ` Jeff King
2012-08-29 20:55                 ` Jeff King
2012-08-29 21:00                   ` Jeff King
2012-08-29 21:05                     ` Jeff King
2012-08-30 12:54                       ` Jeff King
2012-08-30 13:03                         ` Jeff King
2012-08-30 13:24                           ` Jeff King
2012-08-30 16:33                           ` Junio C Hamano
2012-08-30 21:48                             ` Jeff King
2012-08-30 22:16                               ` Junio C Hamano
2012-08-30 16:23                         ` Junio C Hamano
2012-08-30 21:31                           ` Jeff King
2012-08-30 21:59                             ` Junio C Hamano
2012-08-29 21:18                     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).