git.vger.kernel.org archive mirror
From: Derrick Stolee <dstolee@microsoft.com>
To: git@vger.kernel.org
Cc: avarab@gmail.com, sbeller@google.com, larsxschneider@gmail.com,
	Derrick Stolee <dstolee@microsoft.com>
Subject: [PATCH 3/3] commit-graph: lazy-load trees
Date: Tue,  3 Apr 2018 08:00:57 -0400
Message-ID: <20180403120057.173849-4-dstolee@microsoft.com>
In-Reply-To: <20180403120057.173849-1-dstolee@microsoft.com>

The commit-graph file provides quick access to commit data, including
the OID of the root tree for each commit in the graph. When performing
a deep commit-graph walk, we may not need to load most of the trees
for these commits.

Delay loading the tree object for a commit loaded from the graph
until requested via get_commit_tree(). Do not lazy-load trees for
commits not in the graph, since that requires duplicate parsing
and the relative performance improvement when trees are not needed
is small.

On the Linux repository, performance tests were run for the following
command:

	git log --graph --oneline -1000

Before: 0.83s
After:  0.65s
Rel %: -21.6%

Adding '-- kernel/' to the command requires loading the root tree
for every commit that is walked. There was no measurable performance
change as a result of this patch.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
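The lazy-load behavior described in the commit message can be
illustrated with a small standalone sketch. This is not the Git code:
toy_commit, toy_tree, toy_fill_from_graph(), toy_get_tree() and
TOY_NOT_FROM_GRAPH are hypothetical stand-ins, and the "graph" is just
a static array. It only shows the shape of the accessor: the tree
pointer stays NULL after the commit is filled from the graph and is
resolved on the first request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for Git's struct tree and struct commit. */
#define TOY_NOT_FROM_GRAPH UINT32_MAX

struct toy_tree {
	int id;
};

struct toy_commit {
	int parsed;             /* nonzero once the commit has been parsed */
	uint32_t graph_pos;     /* row in the toy graph, or TOY_NOT_FROM_GRAPH */
	struct toy_tree *tree;  /* NULL until first requested */
};

/* Toy "graph": one tree per row, plus a counter of tree lookups. */
static struct toy_tree toy_graph_trees[3] = { {10}, {11}, {12} };
static int toy_lookups;

/* Fill a commit from the graph without touching its tree. */
static void toy_fill_from_graph(struct toy_commit *c, uint32_t pos)
{
	c->parsed = 1;
	c->graph_pos = pos;
	c->tree = NULL;
}

/* Accessor: load the tree on first use, then return the cached pointer. */
static struct toy_tree *toy_get_tree(struct toy_commit *c)
{
	if (c->tree || !c->parsed)
		return c->tree;

	/* A parsed commit with no tree must have come from the graph. */
	assert(c->graph_pos != TOY_NOT_FROM_GRAPH);

	toy_lookups++;
	c->tree = &toy_graph_trees[c->graph_pos];
	return c->tree;
}
```

In this sketch, calling toy_get_tree() twice on the same commit performs
one lookup; a walk that never asks for the tree performs none.
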
 commit-graph.c | 25 ++++++++++++++++++++++---
 commit-graph.h |  7 +++++++
 commit.c       | 10 ++++++++--
 3 files changed, 37 insertions(+), 5 deletions(-)
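
For readers unfamiliar with the commit data chunk, the offset
arithmetic used by load_tree_for_commit() can be sketched on its own.
The patch treats each record as hash_len + 16 bytes and copies the
root tree OID from the start of the record. The helper names and
TOY_HASH_LEN below are hypothetical, and the 20-byte hash length is an
assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_HASH_LEN 20  /* assumption: SHA-1 sized OIDs */

/* Start of the record for the commit at row `pos` in the chunk. */
static const unsigned char *toy_commit_record(const unsigned char *chunk,
					      size_t hash_len, uint32_t pos)
{
	assert(chunk != NULL);
	return chunk + (hash_len + 16) * (size_t)pos;
}

/* Copy the root tree OID, which sits at the start of the record. */
static void toy_read_tree_oid(unsigned char *out,
			      const unsigned char *chunk,
			      size_t hash_len, uint32_t pos)
{
	assert(out != NULL);
	memcpy(out, toy_commit_record(chunk, hash_len, pos), hash_len);
}
```

With a 20-byte hash, each record in this sketch is 36 bytes, so row 2
starts at byte offset 72 of the chunk.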

diff --git a/commit-graph.c b/commit-graph.c
index 3080a87940..a3eeb25f22 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -247,7 +247,6 @@ static struct commit_list **insert_parent_or_die(struct commit_graph *g,
 
 static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t pos)
 {
-	struct object_id oid;
 	uint32_t edge_value;
 	uint32_t *parent_data_ptr;
 	uint64_t date_low, date_high;
@@ -257,8 +256,7 @@ static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, uin
 	item->object.parsed = 1;
 	item->graph_pos = pos;
 
-	hashcpy(oid.hash, commit_data);
-	item->tree = lookup_tree(&oid);
+	item->tree = NULL;
 
 	date_high = get_be32(commit_data + g->hash_len + 8) & 0x3;
 	date_low = get_be32(commit_data + g->hash_len + 12);
@@ -317,6 +315,27 @@ int parse_commit_in_graph(struct commit *item)
 	return 0;
 }
 
+static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit *c)
+{
+	struct object_id oid;
+	const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len + 16) * (c->graph_pos);
+
+	hashcpy(oid.hash, commit_data);
+	c->tree = lookup_tree(&oid);
+
+	return c->tree;
+}
+
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+	if (c->tree)
+		return c->tree;
+	if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
+		BUG("get_commit_tree_in_graph called from non-commit-graph commit");
+
+	return load_tree_for_commit(commit_graph, (struct commit *)c);
+}
+
 static void write_graph_chunk_fanout(struct hashfile *f,
 				     struct commit **commits,
 				     int nr_commits)
diff --git a/commit-graph.h b/commit-graph.h
index e1d8580c98..3ab45818e2 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -17,6 +17,13 @@ char *get_commit_graph_filename(const char *obj_dir);
  */
 int parse_commit_in_graph(struct commit *item);
 
+/*
+ * For performance reasons, a commit loaded from the graph does not
+ * have a tree loaded until trying to consume it for the first time.
+ * Load that tree into the commit and return the object.
+ */
+struct tree *get_commit_tree_in_graph(const struct commit *c);
+
 struct commit_graph {
 	int graph_fd;
 
diff --git a/commit.c b/commit.c
index d65c7b3b47..d4293ae8f6 100644
--- a/commit.c
+++ b/commit.c
@@ -298,12 +298,18 @@ void free_commit_buffer(struct commit *commit)
 
 struct tree *get_commit_tree(const struct commit *commit)
 {
-	return commit->tree;
+	if (commit->tree || !commit->object.parsed)
+		return commit->tree;
+
+	if (commit->graph_pos == COMMIT_NOT_FROM_GRAPH)
+		BUG("commit has NULL tree, but was not loaded from commit-graph");
+
+	return get_commit_tree_in_graph(commit);
 }
 
 struct object_id *get_commit_tree_oid(const struct commit *commit)
 {
-	return &commit->tree->object.oid;
+	return &get_commit_tree(commit)->object.oid;
 }
 
 const void *detach_commit_buffer(struct commit *commit, unsigned long *sizep)
-- 
2.17.0.20.g9f30ba16e1

