All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
To: git@vger.kernel.org
Cc: "Junio C Hamano" <gitster@pobox.com>,
	"Nguyễn Thái Ngọc Duy" <pclouds@gmail.com>
Subject: [PATCH] Preallocate hash tables when the number of inserts are known in advance
Date: Sun, 17 Mar 2013 10:28:06 +0700	[thread overview]
Message-ID: <1363490886-29729-1-git-send-email-pclouds@gmail.com> (raw)

This avoids unnecessary re-allocations and reinsertions. On webkit.git
(i.e. about 182k inserts to the name hash table), this reduces about
100ms out of 3s user time.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
---
 nd/read-directory-recursive-optim reduces the number of input (from
 182k to 11k on webkit) to exclude machinery that all patches in the
 exclude optimization series I posted seem insignificant. So I won't
 repost them for inclusion unless you think it has cleanup values.

 This one is worth doing though. I think keeping "untracked index"
 would help avoid looking up in name-hash, where all user-space CPU
 cycles are spent. But I have nothing to show about that.

 diffcore-rename.c | 1 +
 hash.h            | 7 +++++++
 name-hash.c       | 2 ++
 3 files changed, 10 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 512d0ac..8d3d9bb 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -389,6 +389,7 @@ static int find_exact_renames(struct diff_options *options)
 	struct hash_table file_table;
 
 	init_hash(&file_table);
+	preallocate_hash(&file_table, (rename_src_nr + rename_dst_nr) * 2);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, -1, i, rename_src[i].p->one);
 
diff --git a/hash.h b/hash.h
index b875ce6..244d1fe 100644
--- a/hash.h
+++ b/hash.h
@@ -40,4 +40,11 @@ static inline void init_hash(struct hash_table *table)
 	table->array = NULL;
 }
 
+static inline void preallocate_hash(struct hash_table *table, unsigned int size)
+{
+	assert(table->size == 0 && table->nr == 0 && table->array == NULL);
+	table->size = size;
+	table->array = xcalloc(sizeof(struct hash_table_entry), size);
+}
+
 #endif
diff --git a/name-hash.c b/name-hash.c
index 942c459..12364d1 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -92,6 +92,8 @@ static void lazy_init_name_hash(struct index_state *istate)
 
 	if (istate->name_hash_initialized)
 		return;
+	if (istate->cache_nr)
+		preallocate_hash(&istate->name_hash, istate->cache_nr * 2);
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
-- 
1.8.2.83.gc99314b

             reply	other threads:[~2013-03-17  3:29 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-03-17  3:28 Nguyễn Thái Ngọc Duy [this message]
2013-03-17  5:34 ` [PATCH] Preallocate hash tables when the number of inserts are known in advance Junio C Hamano
2013-03-17  5:39   ` Duy Nguyen
2013-03-17  6:18     ` Junio C Hamano
2013-03-17  8:51 ` Jeff King

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1363490886-29729-1-git-send-email-pclouds@gmail.com \
    --to=pclouds@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.