From: Junio C Hamano <gitster@pobox.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: I'm a total push-over..
Date: Tue, 22 Jan 2008 23:23:51 -0800 [thread overview]
Message-ID: <7vprvtngxk.fsf@gitster.siamese.dyndns.org> (raw)
In-Reply-To: <alpine.LFD.1.00.0801221844570.1741@woody.linux-foundation.org> (Linus Torvalds's message of "Tue, 22 Jan 2008 18:58:53 -0800 (PST)")
Linus Torvalds <torvalds@linux-foundation.org> writes:
> Basically, I dislike having two copies of the same data. If something can
> be computed from something else, then only the original data should exist,
> and the other thing should be recomputed.
Yes, I agree with that in principle. Storing computable values
makes sense only when it is expensive to recompute. We did not
have cache-tree for quite a long time until you noticed that it
was rather expensive and wasteful to recompute tree objects from
unchanged parts of the index every time.
It's the same argument; when the hashing performance starts to
become noticeable, we can think about storing and reusing it,
not before.
> I did consider doing the indexing only on demand, and we can certainly
> simply just "turn it off" when we know it's never going to get used (ie
> "git ls-files"). So in that sense, it's easy to get rid of the overhead,
> but it didn't really seem like the conceptual complexity (even if it's
> just a couple of lines) is really worth it. It's not like git ls-files is
> really performance-critical anyway.
Yes, ls-files is cheap. So is lstat(2) on Linux. It only
matters when you do it many many times.
In any case, the change does not look too bad. The best time
(real) of running git-ls-files in the kernel repository on my
box is 0.010s vs 0.011s (10% improvement, heh!, which is the
same as the master version) and empty commit is both 0.082s (no
change).
-- >8 --
[PATCH] lazy index hashing
This delays the hashing of index names until it becomes necessary for
the first time.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
cache.h | 1 +
read-cache.c | 26 +++++++++++++++++++++++---
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/cache.h b/cache.h
index 409738c..e4aeff0 100644
--- a/cache.h
+++ b/cache.h
@@ -191,6 +191,7 @@ struct index_state {
struct cache_tree *cache_tree;
time_t timestamp;
void *alloc;
+ unsigned name_hash_initialized : 1;
struct hash_table name_hash;
};
diff --git a/read-cache.c b/read-cache.c
index 9477c0b..e45f4b3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -34,12 +34,11 @@ static unsigned int hash_name(const char *name, int namelen)
return hash;
}
-static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
+static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
{
void **pos;
unsigned int hash = hash_name(ce->name, ce_namelen(ce));
- istate->cache[nr] = ce;
pos = insert_hash(hash, ce, &istate->name_hash);
if (pos) {
ce->next = *pos;
@@ -47,6 +46,24 @@ static void set_index_entry(struct index_state *istate, int nr, struct cache_ent
}
}
+static void lazy_init_name_hash(struct index_state *istate)
+{
+ int nr;
+
+ if (istate->name_hash_initialized)
+ return;
+ for (nr = 0; nr < istate->cache_nr; nr++)
+ hash_index_entry(istate, istate->cache[nr]);
+ istate->name_hash_initialized = 1;
+}
+
+static void set_index_entry(struct index_state *istate, int nr, struct cache_entry *ce)
+{
+ istate->cache[nr] = ce;
+ if (istate->name_hash_initialized)
+ hash_index_entry(istate, ce);
+}
+
/*
* We don't actually *remove* it, we can just mark it invalid so that
* we won't find it in lookups.
@@ -75,7 +92,10 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
int index_name_exists(struct index_state *istate, const char *name, int namelen)
{
unsigned int hash = hash_name(name, namelen);
- struct cache_entry *ce = lookup_hash(hash, &istate->name_hash);
+ struct cache_entry *ce;
+
+ lazy_init_name_hash(istate);
+ ce = lookup_hash(hash, &istate->name_hash);
while (ce) {
if (!(ce->ce_flags & CE_UNHASHED)) {
--
1.5.4.rc4.14.g6fc74
next prev parent reply other threads:[~2008-01-23 7:25 UTC|newest]
Thread overview: 51+ messages / expand[flat|nested] mbox.gz Atom feed top
2008-01-22 23:37 I'm a total push-over Linus Torvalds
2008-01-23 1:35 ` Kevin Ballard
2008-01-23 2:23 ` Junio C Hamano
2008-01-23 2:36 ` Junio C Hamano
2008-01-23 12:24 ` Johannes Schindelin
2008-01-23 12:28 ` David Kastrup
2008-01-23 12:56 ` Theodore Tso
2008-01-23 2:58 ` Linus Torvalds
2008-01-23 3:19 ` Linus Torvalds
2008-01-25 6:50 ` Junio C Hamano
2008-01-25 16:24 ` Linus Torvalds
2008-01-23 7:23 ` Junio C Hamano [this message]
2008-01-23 12:25 ` Johannes Schindelin
2008-01-23 16:25 ` Linus Torvalds
2008-01-23 16:34 ` Johannes Schindelin
2008-01-23 17:09 ` Linus Torvalds
2008-01-23 17:29 ` Linus Torvalds
2008-01-25 5:21 ` Jeremy Maitin-Shepard
2008-01-25 12:51 ` Johannes Schindelin
2008-01-25 18:19 ` Jeremy Maitin-Shepard
2008-01-25 18:24 ` Johannes Schindelin
2008-01-25 19:07 ` Junio C Hamano
2008-01-23 8:32 ` Andreas Ericsson
2008-01-23 9:15 ` Dmitry Potapov
2008-01-23 9:31 ` Andreas Ericsson
2008-01-23 14:01 ` Marko Kreen
2008-01-23 14:39 ` Andreas Ericsson
2008-01-24 6:51 ` Luke Lu
2008-01-24 10:24 ` Andreas Ericsson
2008-01-24 13:19 ` Marko Kreen
2008-01-24 16:00 ` Andreas Ericsson
2008-01-24 16:13 ` Marko Kreen
2008-01-24 16:28 ` Dmitry Potapov
2008-01-24 17:15 ` Linus Torvalds
2008-01-24 18:45 ` Dmitry Potapov
2008-01-24 19:08 ` Linus Torvalds
2008-01-25 20:52 ` Marko Kreen
2008-01-25 22:16 ` Linus Torvalds
2008-01-25 22:35 ` Linus Torvalds
2008-01-26 12:16 ` Marko Kreen
2008-01-27 6:51 ` Linus Torvalds
2008-01-27 8:21 ` Dmitry Potapov
2008-01-27 14:07 ` Johannes Schindelin
2008-01-27 14:48 ` Dmitry Potapov
2008-01-27 9:45 ` Marko Kreen
2008-01-27 15:06 ` Dmitry Potapov
2008-01-26 12:37 ` Marko Kreen
2008-01-25 20:08 ` Marko Kreen
2008-01-23 17:10 ` Dmitry Potapov
2008-01-24 10:39 ` Andreas Ericsson
2008-01-23 16:06 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=7vprvtngxk.fsf@gitster.siamese.dyndns.org \
--to=gitster@pobox.com \
--cc=git@vger.kernel.org \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).