From: git@jeffhostetler.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout
Date: Fri, 7 Apr 2017 15:53:06 +0000 [thread overview]
Message-ID: <20170407155306.42375-2-git@jeffhostetler.com> (raw)
In-Reply-To: <20170407155306.42375-1-git@jeffhostetler.com>
From: Jeff Hostetler <jeffhost@microsoft.com>
Teach traverse_trees_recursive() to not do redundant ODB
lookups when both directories refer to the same OID.
In operations such as read-tree, checkout, and merge when
the differences between the commits are relatively small,
there will likely be many directories that have the same
SHA-1. In these cases we can avoid hitting the ODB multiple
times for the same SHA-1.
This patch handles n=2 and n=3 cases and simply copies the
data rather than repeating the fill_tree_descriptor().
================
On the Windows repo (500K trees, 3.1M files, 450MB index),
this reduced the overall time by 0.75 seconds when cycling
between 2 commits with a single file difference.
(avg) before: 22.699
(avg) after: 21.955
===============
================
Using the p0004-read-tree test (posted earlier this week)
with 1M files on Linux:
before:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037) 11.99(8.12+3.32)
0004.6: switch commit aliases (1003037) 11.95(8.20+3.14)
after:
$ ./p0004-read-tree.sh
0004.5: switch work1 work2 (1003037) 11.02(7.71+2.78)
0004.6: switch commit aliases (1003037) 10.95(7.57+2.82)
================
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
unpack-trees.c | 23 +++++++++++++++++++----
1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19..143c5d9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -531,6 +531,11 @@ static int switch_cache_bottom(struct traverse_info *info)
return ret;
}
+static inline int are_same_oid(struct name_entry *name_j, struct name_entry *name_k)
+{
+ return name_j->oid && name_k->oid && !oidcmp(name_j->oid, name_k->oid);
+}
+
static int traverse_trees_recursive(int n, unsigned long dirmask,
unsigned long df_conflicts,
struct name_entry *names,
@@ -554,10 +559,20 @@ static int traverse_trees_recursive(int n, unsigned long dirmask,
newinfo.df_conflicts |= df_conflicts;
for (i = 0; i < n; i++, dirmask >>= 1) {
- const unsigned char *sha1 = NULL;
- if (dirmask & 1)
- sha1 = names[i].oid->hash;
- buf[i] = fill_tree_descriptor(t+i, sha1);
+ if (i > 0 && are_same_oid(&names[i], &names[i-1])) {
+ /* implicitly borrow buf[i-1] inside tree_desc[i] */
+ memcpy(&t[i], &t[i-1], sizeof(struct tree_desc));
+ buf[i] = NULL;
+ } else if (i > 1 && are_same_oid(&names[i], &names[i-2])) {
+ /* implicitly borrow buf[i-2] inside tree_desc[i] */
+ memcpy(&t[i], &t[i-2], sizeof(struct tree_desc));
+ buf[i] = NULL;
+ } else {
+ const unsigned char *sha1 = NULL;
+ if (dirmask & 1)
+ sha1 = names[i].oid->hash;
+ buf[i] = fill_tree_descriptor(t+i, sha1);
+ }
}
bottom = switch_cache_bottom(&newinfo);
--
2.9.3
next prev parent reply other threads:[~2017-04-07 15:53 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-07 15:53 [PATCH v2] unpack-trees: avoid duplicate ODB lookups during checkout git
2017-04-07 15:53 ` git [this message]
2017-04-08 14:06 ` René Scharfe
2017-04-10 20:55 ` Jeff King
2017-04-10 21:28 ` Jeff Hostetler
2017-04-10 21:26 ` Jeff Hostetler
2017-04-10 23:09 ` René Scharfe
2017-04-11 20:42 ` Jeff Hostetler
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170407155306.42375-2-git@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).