From: git@jeffhostetler.com
To: git@vger.kernel.org
Cc: gitster@pobox.com, peff@peff.net,
Jeff Hostetler <jeffhost@microsoft.com>
Subject: [PATCH v12 4/5] read-cache: speed up has_dir_name (part 1)
Date: Wed, 19 Apr 2017 17:06:17 +0000 [thread overview]
Message-ID: <20170419170618.16535-5-git@jeffhostetler.com> (raw)
In-Reply-To: <20170419170618.16535-1-git@jeffhostetler.com>
From: Jeff Hostetler <jeffhost@microsoft.com>
Teach has_dir_name() to see if the path of the new item
is greater than the last path in the index array before
attempting to search for it.
has_dir_name() is looking for file/directory collisions
in the index and has to consider each sub-directory
prefix in turn. This can cause multiple binary searches
for each path.
During operations like checkout, merge_working_tree()
populates the new index in sorted order, so we expect
to be able to append in many cases.
This commit is part 1 of 2. This commit handles the top
of has_dir_name() and the trivial optimization.
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
---
read-cache.c | 45 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/read-cache.c b/read-cache.c
index 6a27688..9af0bd4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -910,6 +910,9 @@ int strcmp_offset(const char *s1, const char *s2, size_t *first_change)
/*
* Do we have another file with a pathname that is a proper
* subset of the name we're trying to add?
+ *
+ * That is, is there another file in the index with a path
+ * that matches a sub-directory in the given entry?
*/
static int has_dir_name(struct index_state *istate,
const struct cache_entry *ce, int pos, int ok_to_replace)
@@ -918,6 +921,48 @@ static int has_dir_name(struct index_state *istate,
int stage = ce_stage(ce);
const char *name = ce->name;
const char *slash = name + ce_namelen(ce);
+ size_t len_eq_last;
+ int cmp_last = 0;
+
+ /*
+ * We are frequently called during an iteration on a sorted
+ * list of pathnames and while building a new index. Therefore,
+ * there is a high probability that this entry will eventually
+ * be appended to the index, rather than inserted in the middle.
+ * If we can confirm that, we can avoid binary searches on the
+ * components of the pathname.
+ *
+ * Compare the entry's full path with the last path in the index.
+ */
+ if (istate->cache_nr > 0) {
+ cmp_last = strcmp_offset(name,
+ istate->cache[istate->cache_nr - 1]->name,
+ &len_eq_last);
+ if (cmp_last > 0) {
+ if (len_eq_last == 0) {
+ /*
+ * The entry sorts AFTER the last one in the
+ * index and their paths have no common prefix,
+ * so there cannot be a F/D conflict.
+ */
+ return retval;
+ } else {
+ /*
+ * The entry sorts AFTER the last one in the
+ * index, but has a common prefix. Fall through
+ * to the loop below to disect the entry's path
+ * and see where the difference is.
+ */
+ }
+ } else if (cmp_last == 0) {
+ /*
+ * The entry exactly matches the last one in the
+ * index, but because of multiple stage and CE_REMOVE
+ * items, we fall through and let the regular search
+ * code handle it.
+ */
+ }
+ }
for (;;) {
int len;
--
2.9.3
next prev parent reply other threads:[~2017-04-19 17:06 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-04-19 17:06 [PATCH v12 0/5] read-cache: speed up add_index_entry git
2017-04-19 17:06 ` [PATCH v12 1/5] read-cache: add strcmp_offset function git
2017-04-19 17:06 ` [PATCH v12 2/5] p0006-read-tree-checkout: perf test to time read-tree git
2017-04-19 17:06 ` [PATCH v12 3/5] read-cache: speed up add_index_entry during checkout git
2017-04-19 17:06 ` git [this message]
2017-04-19 17:06 ` [PATCH v12 5/5] read-cache: speed up has_dir_name (part 2) git
2020-07-04 17:27 ` SZEDER Gábor
2020-07-05 5:27 ` René Scharfe
2020-07-06 6:39 ` Junio C Hamano
2020-07-08 13:49 ` Jeff Hostetler
2020-07-16 17:11 ` René Scharfe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20170419170618.16535-5-git@jeffhostetler.com \
--to=git@jeffhostetler.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=jeffhost@microsoft.com \
--cc=peff@peff.net \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.