From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH] Preallocate hash tables when the number of inserts are known in advance
Date: Sun, 17 Mar 2013 10:28:06 +0700 [thread overview]
Message-ID: <1363490886-29729-1-git-send-email-pclouds@gmail.com> (raw)
This avoids unnecessary re-allocations and reinsertions. On webkit.git
(i.e. about 182k inserts to the name hash table), this reduces about
100ms out of 3s user time.
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
nd/read-directory-recursive-optim reduces the number of input (from
182k to 11k on webkit) to exclude machinery that all patches in the
exclude optimization series I posted seem insignificant. So I won't
repost them for inclusion unless you think it has cleanup values.
This one is worth doing though. I think keeping "untracked index"
would help avoid looking up in name-hash, where all user-space CPU
cycles are spent. But I have nothing to show about that.
diffcore-rename.c | 1 +
hash.h | 7 +++++++
name-hash.c | 2 ++
3 files changed, 10 insertions(+)
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 512d0ac..8d3d9bb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,6 +389,7 @@ static int find_exact_renames(struct diff_options *options)
struct hash_table file_table;
init_hash(&file_table);
+ preallocate_hash(&file_table, (rename_src_nr + rename_dst_nr) * 2);
for (i = 0; i < rename_src_nr; i++)
insert_file_table(&file_table, -1, i, rename_src[i].p->one);
diff --git a/hash.h b/hash.h
index b875ce6..244d1fe 100644
--- a/hash.h
+++ b/hash.h
@@ -40,4 +40,11 @@ static inline void init_hash(struct hash_table *table)
table->array = NULL;
}
+static inline void preallocate_hash(struct hash_table *table, unsigned int size)
+{
+ assert(table->size == 0 && table->nr == 0 && table->array == NULL);
+ table->size = size;
+ table->array = xcalloc(sizeof(struct hash_table_entry), size);
+}
+
#endif
diff --git a/name-hash.c b/name-hash.c
index 942c459..12364d1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -92,6 +92,8 @@ static void lazy_init_name_hash(struct index_state *istate)
if (istate->name_hash_initialized)
return;
+ if (istate->cache_nr)
+ preallocate_hash(&istate->name_hash, istate->cache_nr * 2);
for (nr = 0; nr < istate->cache_nr; nr++)
hash_index_entry(istate, istate->cache[nr]);
istate->name_hash_initialized = 1;
--
1.8.2.83.gc99314b
next reply other threads:[~2013-03-17 3:29 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-17 3:28 Nguyễn Thái Ngọc Duy [this message]
2013-03-17 5:34 ` [PATCH] Preallocate hash tables when the number of inserts are known in advance Junio C Hamano
2013-03-17 5:39 ` Duy Nguyen
2013-03-17 6:18 ` Junio C Hamano
2013-03-17 8:51 ` Jeff King
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1363490886-29729-1-git-send-email-pclouds@gmail.com \
--to=pclouds@gmail.com \
--cc=git@vger.kernel.org \
--cc=gitster@pobox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.