From: Derrick Stolee <dstolee@microsoft.com>
To: "git@vger.kernel.org" <git@vger.kernel.org>
Cc: "peff@peff.net" <peff@peff.net>,
	"sbeller@google.com" <sbeller@google.com>,
	"jnareb@gmail.com" <jnareb@gmail.com>,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [RFC PATCH 11/13] commit-reach: make can_all_from_reach... linear
Date: Fri, 29 Jun 2018 16:13:00 +0000
Message-ID: <20180629161223.229661-12-dstolee@microsoft.com>
In-Reply-To: <20180629161223.229661-1-dstolee@microsoft.com>

The can_all_from_reach_with_flag() algorithm is currently quadratic in
the worst case, because it calls the reachable() method for every 'from'
without tracking which commits have already been walked or which can
already reach a commit in 'to'.

Rewrite the algorithm to walk each commit a constant number of times.

We also add some optimizations that should work for the main consumer of
this method: fetch negotiation (haves/wants).

The first step is a depth-first search (DFS) from each 'from' commit,
visiting the 'from' commits in order of ascending generation number. We
do not walk beyond the minimum generation number or the minimum commit
date. This DFS is likely to be faster than the existing reachable()
method because we expect previous ref values to be along the
first-parent history.
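
As a rough illustration of the sorting step, the sketch below orders an
array of commit pointers by ascending generation number with the
standard qsort(). The gen_commit struct and the gen_* names are
hypothetical stand-ins for this example only, not git's own types or
helpers. The point worth noting is that qsort() hands the comparator
pointers to the array elements, so with an array of pointers each
argument has to be dereferenced once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a commit; only the field used here. */
struct gen_commit {
	uint32_t generation;
};

/*
 * qsort() passes pointers to the array elements. The array holds
 * 'struct gen_commit *', so each argument is really a pointer to a
 * pointer and must be dereferenced once before use.
 */
static int gen_compare(const void *_a, const void *_b)
{
	const struct gen_commit *a = *(const struct gen_commit * const *)_a;
	const struct gen_commit *b = *(const struct gen_commit * const *)_b;

	if (a->generation < b->generation)
		return -1;
	if (a->generation > b->generation)
		return 1;
	return 0;
}

/* Sort an array of commit pointers by ascending generation number. */
void gen_sort(struct gen_commit **list, size_t nr)
{
	assert(list != NULL || nr == 0);
	if (nr)
		qsort(list, nr, sizeof(*list), gen_compare);
}
```

Walking the lowest generation first means the commits closest to the
targets are marked early, so later walks from higher generations can
stop as soon as they meet an already-marked commit.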

If we find a target commit, then we mark everything in the DFS stack as
a RESULT. This expands the set of targets for the other 'from' commits.
We also mark the visited commits using 'assign_flag' to prevent
re-walking the same commits.
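
The marking scheme can be sketched on a toy graph as follows. This is a
simplified illustration under stated assumptions, not the code in this
patch: dfs_commit, the DFS_* flags, the fixed-size stack and the
'pushes' counter are all hypothetical, and the generation and date
cutoffs are left out. DFS_TARGET plays the role of 'with_flag',
DFS_VISITED the role of 'assign_flag', and DFS_RESULT the role of
RESULT.

```c
#include <assert.h>
#include <stddef.h>

#define DFS_TARGET  (1u << 0)	/* commit is in the 'to' set */
#define DFS_VISITED (1u << 1)	/* commit was already put on a stack */
#define DFS_RESULT  (1u << 2)	/* commit can reach a target */
#define DFS_MAX_PARENTS 2
#define DFS_MAX_DEPTH 64	/* assumption: toy graphs are shallow */

struct dfs_commit {
	unsigned flags;
	int nr_parents;
	struct dfs_commit *parents[DFS_MAX_PARENTS];
	int pushes;	/* how often the walk put this commit on a stack */
};

/* Returns 1 if 'from' can reach a DFS_TARGET commit, 0 otherwise. */
int dfs_can_reach(struct dfs_commit *from)
{
	struct dfs_commit *stack[DFS_MAX_DEPTH];
	int depth = 0;

	/* A commit walked by an earlier call already has final flags. */
	if (!(from->flags & DFS_VISITED)) {
		from->flags |= DFS_VISITED;
		from->pushes++;
		stack[depth++] = from;
	}

	while (depth > 0) {
		struct dfs_commit *top = stack[depth - 1];
		int i, descended = 0;

		if (top->flags & DFS_TARGET) {
			depth--;
			continue;
		}

		for (i = 0; i < top->nr_parents; i++) {
			struct dfs_commit *p = top->parents[i];

			/* A reaching parent makes 'top' reach as well. */
			if (p->flags & (DFS_TARGET | DFS_RESULT))
				top->flags |= DFS_RESULT;

			if (!(p->flags & DFS_VISITED)) {
				p->flags |= DFS_VISITED;
				p->pushes++;
				assert(depth < DFS_MAX_DEPTH);
				stack[depth++] = p;
				descended = 1;
				break;
			}
		}

		/* All parents handled: pop; the caller rescans its parents. */
		if (!descended)
			depth--;
	}

	return !!(from->flags & (DFS_TARGET | DFS_RESULT));
}
```

Because the parents of the stack top are rescanned each time a child is
popped, the RESULT mark travels back up the stack one entry at a time.
A later walk that meets a commit already marked RESULT inherits the
answer without pushing that commit again.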

We still need to clear our flags at the end, which is why we will have a
total of three visits to each commit.

Performance was measured on the Linux repository using
'test-tool reach can_all_from_reach'. The input included rows seeded by
tag values. The "small" case included X-rows as v4.[0-9]* and Y-rows as
v3.[0-9]*. This mimics a (very large) fetch that says "I have all major
v3 releases and want all major v4 releases." The "large" case included
X-rows as "v4.*" and Y-rows as "v3.*". This adds all release-candidate
tags to the set, which does not greatly increase the number of objects
that are considered, but does increase the number of 'from' commits,
demonstrating the quadratic nature of the previous code.

Small Case
----------

Before: 1.45 s
 After: 0.34 s

Large Case
----------

Before: 5.83 s
 After: 0.37 s

Note how the running time grows from the small case to the large case
in each version: the old code slows down by roughly 4x, while the new
code is nearly unchanged. The running time of the new code grows with
the number of commits that need to be walked, not directly with the
number of 'from' commits.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 commit-reach.c | 122 ++++++++++++++++++++++++++++++-------------------
 commit-reach.h |   3 +-
 upload-pack.c  |   5 +-
 3 files changed, 82 insertions(+), 48 deletions(-)

diff --git a/commit-reach.c b/commit-reach.c
index 992ad5cdc7..8e24455d9f 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -530,64 +530,88 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
 	return is_descendant_of(commit, list);
 }
 
-static int reachable(struct commit *from, int with_flag, int assign_flag, time_t min_commit_date)
+static int compare_commits_by_gen(const void *_a, const void *_b)
 {
-	struct prio_queue work = { compare_commits_by_commit_date };
+	const struct commit *a = *(const struct commit * const *)_a;
+	const struct commit *b = *(const struct commit * const *)_b;
 
-	prio_queue_put(&work, from);
-	while (work.nr) {
-		struct commit_list *list;
-		struct commit *commit = prio_queue_get(&work);
-
-		if (commit->object.flags & with_flag) {
-			from->object.flags |= assign_flag;
-			break;
-		}
-		if (!commit->object.parsed)
-			parse_object(&commit->object.oid);
-		if (commit->object.flags & TMP_MARK)
-			continue;
-		commit->object.flags |= TMP_MARK;
-		if (commit->date < min_commit_date)
-			continue;
-		for (list = commit->parents; list; list = list->next) {
-			struct commit *parent = list->item;
-			if (!(parent->object.flags & TMP_MARK))
-				prio_queue_put(&work, parent);
-		}
-	}
-	from->object.flags |= TMP_MARK;
-	clear_commit_marks(from, TMP_MARK);
-	clear_prio_queue(&work);
-	return (from->object.flags & assign_flag);
+	if (a->generation < b->generation)
+		return -1;
+	if (a->generation > b->generation)
+		return 1;
+	return 0;
 }
 
 int can_all_from_reach_with_flag(struct object_array *from,
 				 int with_flag, int assign_flag,
-				 time_t min_commit_date)
+				 time_t min_commit_date,
+				 uint32_t min_generation)
 {
+	struct commit **list = NULL;
 	int i;
+	int result = 1;
 
+	ALLOC_ARRAY(list, from->nr);
 	for (i = 0; i < from->nr; i++) {
-		struct object *from_one = from->objects[i].item;
+		list[i] = (struct commit *)from->objects[i].item;
 
-		if (from_one->flags & assign_flag)
-			continue;
-		from_one = deref_tag(from_one, "a from object", 0);
-		if (!from_one || from_one->type != OBJ_COMMIT) {
-			/* no way to tell if this is reachable by
-			 * looking at the ancestry chain alone, so
-			 * leave a note to ourselves not to worry about
-			 * this object anymore.
-			 */
-			from->objects[i].item->flags |= assign_flag;
-			continue;
-		}
-		if (!reachable((struct commit *)from_one, with_flag,
-			       assign_flag, min_commit_date))
+		parse_commit(list[i]);
+
+		if (list[i]->generation < min_generation)
 			return 0;
 	}
-	return 1;
+
+	QSORT(list, from->nr, compare_commits_by_gen);
+
+	for (i = 0; i < from->nr; i++) {
+		/* DFS from list[i] */
+		struct commit_list *stack = NULL;
+
+		list[i]->object.flags |= assign_flag;
+		commit_list_insert(list[i], &stack);
+
+		while (stack) {
+			struct commit_list *parent;
+
+			if (stack->item->object.flags & with_flag) {
+				pop_commit(&stack);
+				continue;
+			}
+
+			for (parent = stack->item->parents; parent; parent = parent->next) {
+				if (parent->item->object.flags & (with_flag | RESULT))
+					stack->item->object.flags |= RESULT;
+
+				if (!(parent->item->object.flags & assign_flag)) {
+					parent->item->object.flags |= assign_flag;
+
+					parse_commit(parent->item);
+
+					if (parent->item->date < min_commit_date ||
+					    parent->item->generation < min_generation)
+						continue;
+
+					commit_list_insert(parent->item, &stack);
+					break;
+				}
+			}
+
+			if (!parent)
+				pop_commit(&stack);
+		}
+
+		if (!(list[i]->object.flags & (with_flag | RESULT))) {
+			result = 0;
+			goto cleanup;
+		}
+	}
+
+cleanup:
+	for (i = 0; i < from->nr; i++) {
+		clear_commit_marks(list[i], RESULT);
+		clear_commit_marks(list[i], assign_flag);
+	}
+	return result;
 }
 
 int can_all_from_reach(struct commit_list *from, struct commit_list *to)
@@ -597,6 +621,7 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to)
 	struct commit_list *from_iter = from;
 	struct commit_list *to_iter = to;
 	int result;
+	uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
 	while (from_iter) {
 		add_object_array(&from_iter->item->object, NULL, &from_objs);
@@ -608,16 +633,21 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to)
 	}
 
 	while (to_iter) {
+		parse_commit(to_iter->item);
+
 		if (to_iter->item->date < min_commit_date)
 			min_commit_date = to_iter->item->date;
 
+		if (to_iter->item->generation < min_generation)
+			min_generation = to_iter->item->generation;
+
 		to_iter->item->object.flags |= PARENT2;
 
 		to_iter = to_iter->next;
 	}
 
 	result = can_all_from_reach_with_flag(&from_objs, PARENT2, PARENT1,
-					      min_commit_date);
+					      min_commit_date, min_generation);
 
 	while (from) {
 		clear_commit_marks(from->item, PARENT1);
diff --git a/commit-reach.h b/commit-reach.h
index 8ab06af2eb..3eb4c057e6 100644
--- a/commit-reach.h
+++ b/commit-reach.h
@@ -28,7 +28,8 @@ int commit_contains(struct ref_filter *filter, struct commit *commit,
  * such commits with 'assign_flag'.
  */
 int can_all_from_reach_with_flag(struct object_array *from, int with_flag,
-				 int assign_flag, time_t min_commit_date);
+				 int assign_flag, time_t min_commit_date,
+				 uint32_t min_generation);
 
 int can_all_from_reach(struct commit_list *from, struct commit_list *to);
 
diff --git a/upload-pack.c b/upload-pack.c
index e59fbca257..95ba077e2b 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -336,11 +336,14 @@ static int got_oid(const char *hex, struct object_id *oid)
 
 static int ok_to_give_up(void)
 {
+	uint32_t min_generation = GENERATION_NUMBER_ZERO;
+
 	if (!have_obj.nr)
 		return 0;
 
 	return can_all_from_reach_with_flag(&want_obj, THEY_HAVE,
-					    COMMON_KNOWN, oldest_have);
+					    COMMON_KNOWN, oldest_have,
+					    min_generation);
 }
 
 static int get_common_commits(void)
-- 
2.18.0.118.gd4f65b8d14


