From: git@jeffhostetler.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
Date: Fri, 7 Apr 2017 15:53:06 +0000
Message-ID: <20170407155306.42375-2-git@jeffhostetler.com>
In-Reply-To: <20170407155306.42375-1-git@jeffhostetler.com>
From: Jeff Hostetler <jeffhost@microsoft.com>
Teach traverse_trees_recursive() to skip redundant ODB
lookups when two of the directories being traversed refer
to the same OID.
In operations such as read-tree, checkout, and merge when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1. In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.
This patch handles the n=2 and n=3 cases by copying the tree
descriptor from the matching earlier entry instead of calling
fill_tree_descriptor() again.
================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by about 0.75 seconds when
cycling between 2 commits with a single file difference.
(avg) before: 22.699 seconds
(avg) after: 21.955 seconds
===============
================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:
before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037) 11.99(8.12+3.32)
0004.6: switch commit aliases (1003037) 11.95(8.20+3.14)
after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037) 11.02(7.71+2.78)
0004.6: switch commit aliases (1003037) 10.95(7.57+2.82)
================
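To put the timings above in relative terms, the small helper below
computes the percentage of elapsed time saved. It is only an
illustration: pct_saved() and report_savings() are hypothetical names,
and the inputs are the numbers quoted above. They work out to roughly
3.3% for the Windows repo and roughly 8.1% and 8.4% for the two p0004
tests.

```c
#include <stdio.h>

/* Percentage of elapsed time saved going from `before` to `after`. */
static double pct_saved(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Print one line per measurement quoted in this message. */
static void report_savings(void)
{
	printf("Windows repo: %.2f%%\n", pct_saved(22.699, 21.955));
	printf("p0004.5:      %.2f%%\n", pct_saved(11.99, 11.02));
	printf("p0004.6:      %.2f%%\n", pct_saved(11.95, 10.95));
}
```

Calling report_savings() from a main() prints the three percentages.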
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
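For readers who want to see the borrowing logic in isolation, here is a
minimal standalone sketch. It is not git code: toy_oid, toy_entry,
toy_desc, toy_fill() and toy_fill_all() are hypothetical stand-ins for
struct object_id, struct name_entry, struct tree_desc,
fill_tree_descriptor() and the loop in traverse_trees_recursive(), and
the "ODB read" is just a counter.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for git's types; not the real definitions. */
struct toy_oid { unsigned char hash[20]; };
struct toy_entry { const struct toy_oid *oid; };	/* NULL: no tree on this side */
struct toy_desc { const void *buffer; unsigned long size; };

static int toy_lookups;	/* number of simulated ODB reads */

/* Simulated ODB read; a NULL oid yields an empty descriptor at no cost. */
static void toy_fill(struct toy_desc *d, const struct toy_oid *oid)
{
	if (!oid) {
		d->buffer = NULL;
		d->size = 0;
		return;
	}
	toy_lookups++;
	d->buffer = oid->hash;
	d->size = sizeof(oid->hash);
}

/* Both sides must be present and carry equal hashes. */
static int toy_same_oid(const struct toy_entry *a, const struct toy_entry *b)
{
	return a->oid && b->oid &&
	       !memcmp(a->oid->hash, b->oid->hash, sizeof(a->oid->hash));
}

/*
 * Fill n descriptors, copying from entry i-1 or i-2 when the OID
 * matches instead of reading again. Returns the number of reads.
 */
static int toy_fill_all(int n, const struct toy_entry *names,
			struct toy_desc *t)
{
	int i;

	toy_lookups = 0;
	for (i = 0; i < n; i++) {
		if (i > 0 && toy_same_oid(&names[i], &names[i - 1]))
			t[i] = t[i - 1];	/* borrow the earlier buffer */
		else if (i > 1 && toy_same_oid(&names[i], &names[i - 2]))
			t[i] = t[i - 2];
		else
			toy_fill(&t[i], names[i].oid);
	}
	return toy_lookups;
}
```

With three entries that all carry the same OID, toy_fill_all() performs
one simulated read. With entries A, B, A it performs two, because entry
2 borrows from entry 0. Looking back only one or two slots is enough to
cover every pairing for n=2 and n=3 without a general O(n^2) search.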
unpack-trees.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..143c5d9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
 	return ret;
 }
 
+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+	return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
 static int traverse_trees_recursive(int n, unsigned long dirmask,
 				    unsigned long df_conflicts,
 				    struct name_entry *names,
@@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
 	newinfo.df_conflicts |= df_conflicts;
 
 	for (i = 0; i < n; i++, dirmask >>= 1) {
-		const unsigned char *sha1 = NULL;
-		if (dirmask & 1)
-			sha1 = names[i].oid->hash;
-		buf[i] = fill_tree_descriptor(t+i, sha1);
+		if (i > 0 && are_same_oid(&names[i], &names[i-1])) {
+			/* implicitly borrow buf[i-1] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));
+			buf[i] = NULL;
+		} else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
+			/* implicitly borrow buf[i-2] inside tree_desc[i] */
+			memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
+			buf[i] = NULL;
+		} else {
+			const unsigned char *sha1 = NULL;
+			if (dirmask & 1)
+				sha1 = names[i].oid->hash;
+			buf[i] = fill_tree_descriptor(t+i, sha1);
+		}
 	}
 
 	bottom = switch_cache_bottom(&newinfo);
--
2.9.3