From: Junio C Hamano <gitster@pobox.com>
To: Jeff Hostetler <Jeff.Hostetler@microsoft.com>
Cc: "git@vger.kernel.org" <git@vger.kernel.org>,
Johannes Schindelin <johannes.schindelin@gmx.de>
Subject: Re: [PATCH 3/5] name-hash: precompute hash values during preload-index
Date: Sun, 19 Feb 2017 13:45:20 -0800
Message-ID: <xmqq1sutn9cf.fsf@gitster.mtv.corp.google.com>
In-Reply-To: <MWHPR03MB295845950BB87BA9479E973E8A5F0@MWHPR03MB2958.namprd03.prod.outlook.com> (Jeff Hostetler's message of "Sun, 19 Feb 2017 00:19:58 +0000")
Jeff Hostetler <Jeff.Hostetler@microsoft.com> writes:
> I looked at doing this, but I didn't think the limited gains warranted the
> complexity and overhead of forward-searching for peers at the current level.
It seems that I wasn't clear about what I meant. I didn't mean anything
as complex as what you described.
Just something simple, like this on top of yours, that passes in only
the previous entry and compares against it. I do not know whether that
gives any gain, though ;-).
 cache.h         |  2 +-
 name-hash.c     | 11 +++++++++--
 preload-index.c |  4 +++-
 3 files changed, 13 insertions(+), 4 deletions(-)
diff --git a/cache.h b/cache.h
index 390aa803df..bd2980f6e3 100644
--- a/cache.h
+++ b/cache.h
@@ -233,7 +233,7 @@ struct cache_entry {
 #error "CE_EXTENDED_FLAGS out of range"
 #endif
 
-void precompute_istate_hashes(struct cache_entry *ce);
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev);
 
 /* Forward structure decls */
 struct pathspec;
diff --git a/name-hash.c b/name-hash.c
index f95054f44c..5e09b79170 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -300,7 +300,7 @@ void free_name_hash(struct index_state *istate)
  * non-skip-worktree items (since status should not observe skipped items), but
  * because lazy_init_name_hash() hashes everything, we force it here.
  */
-void precompute_istate_hashes(struct cache_entry *ce)
+void precompute_istate_hashes(struct cache_entry *ce, struct cache_entry *prev)
 {
 	int namelen = ce_namelen(ce);
 
@@ -312,7 +312,14 @@ void precompute_istate_hashes(struct cache_entry *ce)
 		ce->precomputed_hash.root_entry = 1;
 	} else {
 		namelen--;
-		ce->precomputed_hash.dir = memihash(ce->name, namelen);
+
+		if (prev &&
+		    prev->precomputed_hash.initialized &&
+		    namelen <= ce_namelen(prev) &&
+		    !memcmp(ce->name, prev->name, namelen))
+			ce->precomputed_hash.dir = prev->precomputed_hash.dir;
+		else
+			ce->precomputed_hash.dir = memihash(ce->name, namelen);
 		ce->precomputed_hash.name = memihash_continue(
 			ce->precomputed_hash.dir, ce->name + namelen,
 			ce_namelen(ce) - namelen);
diff --git a/preload-index.c b/preload-index.c
index 602737f9d0..784378ffac 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -37,6 +37,7 @@ static void *preload_thread(void *_data)
 	struct thread_data *p = _data;
 	struct index_state *index = p->index;
 	struct cache_entry **cep = index->cache + p->offset;
+	struct cache_entry *previous = NULL;
 	struct cache_def cache = CACHE_DEF_INIT;
 
 	nr = p->nr;
@@ -47,7 +48,8 @@ static void *preload_thread(void *_data)
 		struct cache_entry *ce = *cep++;
 		struct stat st;
 
-		precompute_istate_hashes(ce);
+		precompute_istate_hashes(ce, previous);
+		previous = ce;
 
 		if (ce_stage(ce))
 			continue;
> (I was just looking at clear_ce_flags_1() in unpack-trees.c: how hard it
> has to look to find the end of the current directory, and the effect that
> has on the recursion. It felt like too much work for the potential gain.)
>
> Remembering the previous one, by contrast, was basically free. Granted, it
> only helps us for adjacent files in the index, so it's not perfect, but it
> gives us the best bang for the buck.
>
> Jeff