git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: git@jeffhostetler.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net,
	Jeff Hostetler <jeffhost@microsoft.com>
Subject: [PATCH v1] unpack-trees: avoid duplicate ODB lookups during checkout
Date: Thu,  6 Apr 2017 20:37:32 +0000	[thread overview]
Message-ID: <20170406203732.47714-2-git@jeffhostetler.com> (raw)
In-Reply-To: <20170406203732.47714-1-git@jeffhostetler.com>

From: Jeff Hostetler <jeffhost@microsoft.com>

Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.

In operations such as read-tree, checkout, and merge when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1.  In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.

TODO This change is a first attempt to test that by comparing
TODO the hashes of name[i] and name[i-i] and simply copying
TODO the tree-descriptor data.  I was thinking of the n=2
TODO case here.  We may want to extend this to the n=3 case.

================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.

(avg) before: 22.699
(avg) after:  21.955
===============

================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:

before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)       11.99(8.12+3.32)
0004.6: switch commit aliases (1003037)    11.95(8.20+3.14)

after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037)       11.17(7.84+2.76)
0004.6: switch commit aliases (1003037)    11.13(7.82+2.72)
================

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
 tree-walk.c    |  8 ++++++++
 tree-walk.h    |  1 +
 unpack-trees.c | 13 +++++++++----
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index ff77605..3b82f0e 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -92,6 +92,14 @@ void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1)
 	return buf;
 }
 
+void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src)
+{
+	void *buf = xmalloc(src->size);
+	memcpy(buf, src->buffer, src->size);
+	init_tree_desc(dest, buf, src->size);
+	return buf;
+}
+
 static void entry_clear(struct name_entry *a)
 {
 	memset(a, 0, sizeof(*a));
diff --git a/tree-walk.h b/tree-walk.h
index 68bb78b..ca4032b 100644
--- a/tree-walk.h
+++ b/tree-walk.h
@@ -43,6 +43,7 @@ int tree_entry(struct tree_desc *, struct name_entry *);
 int tree_entry_gently(struct tree_desc *, struct name_entry *);
 
 void *fill_tree_descriptor(struct tree_desc *desc, const unsigned char *sha1);
+void *copy_tree_descriptor(struct tree_desc *dest, const struct tree_desc *src);
 
 struct traverse_info;
 typedef int (*traverse_callback_t)(int n, unsigned long mask, unsigned long dirmask, struct name_entry *entry, struct traverse_info *);
diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..50aacad 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -554,10 +554,15 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.df_conflicts |= df_conflicts;
 
 	for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && (dirmask & 1) && names[i].oid && names[i-1].oid &&
+			!hashcmp(names[i].oid->hash, names[i-1].oid->hash)) {
+			buf[i] = copy_tree_descriptor(&t[i], &t[i-1]);
+		} else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[i] = fill_tree_descriptor(t+i, sha1);
+		}
 	}
 
 	bottom = switch_cache_bottom(&newinfo);
-- 
2.9.3


  reply	other threads:[~2017-04-06 20:37 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-04-06 20:37 [PATCH v1] WIP unpack-trees: avoid duplicate ODB lookups during checkout git
2017-04-06 20:37 ` git [this message]
2017-04-06 22:48   ` [PATCH v1] " Stefan Beller
2017-04-07  5:19     ` Jeff King
2017-04-07 13:51       ` Jeff Hostetler
2017-04-07 17:35     ` Jeff Hostetler
2017-04-07  0:32   ` René Scharfe
2017-04-07 13:57     ` Jeff Hostetler

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170406203732.47714-2-git@jeffhostetler.com \
    --to=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jeffhost@microsoft.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).