From: Karsten Blees <karsten.blees@gmail.com>
To: Jeff King <peff@peff.net>
Cc: Duy Nguyen <pclouds@gmail.com>,
kusmabite@gmail.com, Ramkumar Ramachandra <artagnon@gmail.com>,
Robert Zeh <robert.allan.zeh@gmail.com>,
Junio C Hamano <gitster@pobox.com>,
Git List <git@vger.kernel.org>,
finnag@pvv.org
Subject: Re: inotify to minimize stat() calls
Date: Thu, 14 Feb 2013 01:48:20 +0100 [thread overview]
Message-ID: <511C3454.6080204@gmail.com> (raw)
In-Reply-To: <20130213225529.GA25353@sigill.intra.peff.net>
Am 13.02.2013 23:55, schrieb Jeff King:
> On Wed, Feb 13, 2013 at 09:25:59PM +0100, Karsten Blees wrote:
>
>> Alternatively, we could simply create normal cache_entries for the
>> directories that are linked via ce->next, but have a trailing '/' in
>> their name?
>>
>> Reference counting sounds good to me, at least better than allocating
>> directory entries per cache entry * parent dirs.
>
> I think that is more or less what my patch does, but it splits the
> ref-counted fake cache_entries out into a separate hash of "struct
> dir_hash_entry" rather than storing it in the regular hash. Which IMHO
> is a bit cleaner for two reasons:
>
> 1. You do not have to pay the memory price of storing fake
> cache_entries (the name+refcount struct for each directory is much
> smaller than a real cache_entry).
>
Yes, but considering the small number of directories compared to files, I think this is a relatively small price to pay.
> 2. It makes the code a bit simpler, as you do not have to do any
> "check for trailing /" magic on the result of index_name_exists to
> determine if it is a "real" name or just a fake dir entry.
>
True for dir.c. On the other hand, you need a lot of new find / find_or_create logic in name-hash.c.
Just to illustrate what I mean, here's a quick sketch (there's still a segfault somewhere, but I don't have time to debug right now...).
Note that hash_index_entry_directories works from right to left - if the immediate parent directory is there, there's no need to check the parent's parent.
cache_entry.dir points to the parent directory so that we don't need to lookup all path components for reference counting when adding / removing entries.
As directory entries are 'real' cache_entries, we can reuse the existing index_name_exists and hash_index_entry code.
I feel slightly guilty for abusing ce_size as reference counter...well :-)
---
cache.h | 4 +++-
name-hash.c | 80 ++++++++++++++++++++++++++++---------------------------------
2 files changed, 39 insertions(+), 45 deletions(-)
diff --git a/cache.h b/cache.h
index 665b512..2bc1372 100644
--- a/cache.h
+++ b/cache.h
@@ -131,7 +131,7 @@ struct cache_entry {
unsigned int ce_namelen;
unsigned char sha1[20];
struct cache_entry *next;
- struct cache_entry *dir_next;
+ struct cache_entry *dir;
char name[FLEX_ARRAY]; /* more */
};
@@ -285,6 +285,8 @@ extern void add_name_hash(struct index_state *istate, struct cache_entry *ce);
static inline void remove_name_hash(struct cache_entry *ce)
{
ce->ce_flags |= CE_UNHASHED;
+ if (ce->dir && !(--ce->dir->ce_size))
+ remove_name_hash(ce->dir);
}
diff --git a/name-hash.c b/name-hash.c
index d8d25c2..01e8320 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -32,6 +32,9 @@ static unsigned int hash_name(const char *name, int namelen)
return hash;
}
+static struct cache_entry *lookup_index_entry(struct index_state *istate, const char *name, int namelen, int icase);
+static void hash_index_entry(struct index_state *istate, struct cache_entry *ce);
+
static void hash_index_entry_directories(struct index_state *istate, struct cache_entry *ce)
{
/*
@@ -40,30 +43,25 @@ static void hash_index_entry_directories(struct index_state *istate, struct cach
* closing slash. Despite submodules being a directory, they never
* reach this point, because they are stored without a closing slash
* in the cache.
- *
- * Note that the cache_entry stored with the directory does not
- * represent the directory itself. It is a pointer to an existing
- * filename, and its only purpose is to represent existence of the
- * directory in the cache. It is very possible multiple directory
- * hash entries may point to the same cache_entry.
*/
- unsigned int hash;
- void **pos;
+ int len = ce_namelen(ce);
+ if (len && ce->name[len - 1] == '/')
+ len--;
+ while (len && ce->name[len - 1] != '/')
+ len--;
+ if (!len)
+ return;
- const char *ptr = ce->name;
- while (*ptr) {
- while (*ptr && *ptr != '/')
- ++ptr;
- if (*ptr == '/') {
- ++ptr;
- hash = hash_name(ce->name, ptr - ce->name);
- pos = insert_hash(hash, ce, &istate->name_hash);
- if (pos) {
- ce->dir_next = *pos;
- *pos = ce;
- }
- }
+ ce->dir = lookup_index_entry(istate, ce->name, len, ignore_case);
+ if (!ce->dir) {
+ ce->dir = xcalloc(1, cache_entry_size(len));
+ memcpy(ce->dir->name, ce->name, len);
+ ce->dir->ce_namelen = len;
+ ce->dir->name[len] = 0;
+ hash_index_entry(istate, ce->dir);
}
+ ce->dir->ce_flags &= ~CE_UNHASHED;
+ ce->dir->ce_size++;
}
static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
@@ -74,7 +72,7 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
if (ce->ce_flags & CE_HASHED)
return;
ce->ce_flags |= CE_HASHED;
- ce->next = ce->dir_next = NULL;
+ ce->next = ce->dir = NULL;
hash = hash_name(ce->name, ce_namelen(ce));
pos = insert_hash(hash, ce, &istate->name_hash);
if (pos) {
@@ -137,38 +135,32 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
if (!icase)
return 0;
- /*
- * If the entry we're comparing is a filename (no trailing slash), then compare
- * the lengths exactly.
- */
- if (name[namelen - 1] != '/')
- return slow_same_name(name, namelen, ce->name, len);
-
- /*
- * For a directory, we point to an arbitrary cache_entry filename. Just
- * make sure the directory portion matches.
- */
- return slow_same_name(name, namelen, ce->name, namelen < len ? namelen : len);
+ return slow_same_name(name, namelen, ce->name, len);
}
-struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
+static struct cache_entry *lookup_index_entry(struct index_state *istate, const char *name, int namelen, int icase)
{
unsigned int hash = hash_name(name, namelen);
- struct cache_entry *ce;
-
- lazy_init_name_hash(istate);
- ce = lookup_hash(hash, &istate->name_hash);
+ struct cache_entry *ce = lookup_hash(hash, &istate->name_hash);
while (ce) {
if (!(ce->ce_flags & CE_UNHASHED)) {
if (same_name(ce, name, namelen, icase))
return ce;
}
- if (icase && name[namelen - 1] == '/')
- ce = ce->dir_next;
- else
- ce = ce->next;
+ ce = ce->next;
}
+ return NULL;
+}
+
+struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
+{
+ struct cache_entry *ce;
+
+ lazy_init_name_hash(istate);
+ ce = lookup_index_entry(istate, name, namelen, icase);
+ if (ce)
+ return ce;
/*
* Might be a submodule. Despite submodules being directories,
@@ -182,7 +174,7 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
* true.
*/
if (icase && name[namelen - 1] == '/') {
- ce = index_name_exists(istate, name, namelen - 1, icase);
+ ce = lookup_index_entry(istate, name, namelen - 1, icase);
if (ce && S_ISGITLINK(ce->ce_mode))
return ce;
}
next prev parent reply other threads:[~2013-02-14 0:48 UTC|newest]
Thread overview: 88+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-02-08 21:10 inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-08 22:15 ` Junio C Hamano
2013-02-08 22:45 ` Junio C Hamano
2013-02-09 2:10 ` Duy Nguyen
2013-02-09 2:37 ` Junio C Hamano
2013-02-09 2:56 ` Junio C Hamano
2013-02-09 3:36 ` Robert Zeh
2013-02-09 12:05 ` Ramkumar Ramachandra
2013-02-09 12:11 ` Ramkumar Ramachandra
2013-02-09 12:53 ` Ramkumar Ramachandra
2013-02-09 12:59 ` Duy Nguyen
2013-02-09 17:10 ` Ramkumar Ramachandra
2013-02-09 18:56 ` Ramkumar Ramachandra
2013-02-10 5:24 ` Duy Nguyen
2013-02-10 11:17 ` Duy Nguyen
2013-02-10 11:22 ` Duy Nguyen
2013-02-10 20:16 ` Junio C Hamano
2013-02-11 2:56 ` Duy Nguyen
2013-02-11 11:12 ` Duy Nguyen
2013-03-07 22:16 ` Torsten Bögershausen
2013-03-08 0:04 ` Junio C Hamano
2013-03-08 7:01 ` Torsten Bögershausen
2013-03-08 8:15 ` Junio C Hamano
2013-03-08 9:24 ` Torsten Bögershausen
2013-03-08 10:53 ` Duy Nguyen
2013-03-10 8:23 ` Ramkumar Ramachandra
2013-03-13 12:59 ` [PATCH] status: hint the user about -uno if read_directory takes too long Nguyễn Thái Ngọc Duy
2013-03-13 15:21 ` Torsten Bögershausen
2013-03-13 16:16 ` Junio C Hamano
2013-03-14 10:22 ` Duy Nguyen
2013-03-14 15:05 ` Junio C Hamano
2013-03-15 12:30 ` Duy Nguyen
2013-03-15 15:52 ` Torsten Bögershausen
2013-03-15 15:57 ` Ramkumar Ramachandra
2013-03-15 16:53 ` Junio C Hamano
2013-03-15 17:41 ` Torsten Bögershausen
2013-03-15 20:06 ` Junio C Hamano
2013-03-15 21:14 ` Torsten Bögershausen
2013-03-15 21:59 ` Junio C Hamano
2013-03-16 7:21 ` Torsten Bögershausen
2013-03-17 4:47 ` Junio C Hamano
2013-03-16 1:51 ` Duy Nguyen
2013-02-10 13:26 ` inotify to minimize stat() calls demerphq
2013-02-10 15:35 ` Duy Nguyen
2013-02-14 14:36 ` Magnus Bäck
2013-02-10 16:45 ` Ramkumar Ramachandra
2013-02-11 3:03 ` Duy Nguyen
2013-02-10 16:58 ` Erik Faye-Lund
2013-02-11 3:53 ` Duy Nguyen
2013-02-12 20:48 ` Karsten Blees
2013-02-13 10:06 ` Duy Nguyen
2013-02-13 12:15 ` Duy Nguyen
2013-02-13 18:18 ` Jeff King
2013-02-13 19:47 ` Jeff King
2013-02-13 20:25 ` Karsten Blees
2013-02-13 22:55 ` Jeff King
2013-02-14 0:48 ` Karsten Blees [this message]
2013-02-27 14:45 ` [PATCH] name-hash.c: fix endless loop with core.ignorecase=true Karsten Blees
2013-02-27 16:53 ` Junio C Hamano
2013-02-27 21:52 ` Karsten Blees
2013-02-27 23:57 ` [PATCH v2] " Karsten Blees
2013-02-28 0:27 ` Junio C Hamano
2013-02-19 9:49 ` inotify to minimize stat() calls Ramkumar Ramachandra
2013-02-19 14:25 ` Karsten Blees
2013-02-19 13:16 ` Drew Northup
2013-02-19 13:47 ` Duy Nguyen
2013-02-09 19:35 ` Junio C Hamano
2013-02-10 19:03 ` Robert Zeh
2013-02-10 19:26 ` Martin Fick
2013-02-10 20:18 ` Robert Zeh
2013-02-11 3:21 ` Duy Nguyen
2013-02-11 14:13 ` Robert Zeh
2013-02-19 9:57 ` Ramkumar Ramachandra
2013-04-24 17:20 ` [PATCH] " Robert Zeh
2013-04-24 21:32 ` Duy Nguyen
2013-04-25 19:44 ` Robert Zeh
2013-04-25 21:20 ` Duy Nguyen
2013-04-26 15:35 ` Robert Zeh
2013-04-25 8:18 ` Thomas Rast
2013-04-25 19:37 ` Robert Zeh
2013-04-25 19:59 ` Thomas Rast
2013-04-27 13:51 ` Thomas Rast
2013-04-27 23:56 ` Duy Nguyen
[not found] ` <CAKXa9=r2A7UeBV2s2H3wVGdPkS1zZ9huNJhtvTC-p0S5Ed12xA@mail.gmail.com>
2013-04-30 0:27 ` Duy Nguyen
2013-02-09 11:32 ` Ramkumar Ramachandra
2013-02-14 15:16 ` Ævar Arnfjörð Bjarmason
2013-02-14 16:31 ` Junio C Hamano
2013-02-19 9:40 ` Ramkumar Ramachandra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=511C3454.6080204@gmail.com \
--to=karsten.blees@gmail.com \
--cc=artagnon@gmail.com \
--cc=blees@dcon.de \
--cc=finnag@pvv.org \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
--cc=kusmabite@gmail.com \
--cc=pclouds@gmail.com \
--cc=peff@peff.net \
--cc=robert.allan.zeh@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.