All of lore.kernel.org
 help / color / mirror / Atom feed
From: Karsten Blees <karsten.blees@gmail.com>
To: Git List <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>
Cc: Thomas Rast <tr@thomasrast.ch>,
	Jens Lehmann <Jens.Lehmann@web.de>,
	Karsten Blees <karsten.blees@gmail.com>
Subject: [PATCH v5 00/14] New hash table implementation
Date: Thu, 14 Nov 2013 20:08:37 +0100	[thread overview]
Message-ID: <52851FB5.4050406@gmail.com> (raw)

Also here:
https://github.com/kblees/git/commits/kb/hashmap-v5

Changes since v3:
- changed table resizing and grow / shrink thresholds (#02)
- hashmap_free: changed free_function to boolean (#02, #06, #07, #09)
- swapped xcalloc args (num elements, element size) (#02)
- fixed pointer cast formatting "(type*)" -> "(type *)" (#02)
- removed function pointer casts "(*fn)(...)" -> "fn(...)" (#02)
- api-hashmap.txt: clarified removal / replacement of entries (#02)
- update-index.c: fixed const-casts (#13, #14)

Karsten


Jens Lehmann (1):
  submodule: don't access the .gitmodules cache entry after removing it

Karsten Blees (13):
  add a hashtable implementation that supports O(1) removal
  buitin/describe.c: use new hash map implementation
  diffcore-rename.c: move code around to prepare for the next patch
  diffcore-rename.c: simplify finding exact renames
  diffcore-rename.c: use new hash map implementation
  name-hash.c: use new hash map implementation for directories
  name-hash.c: remove unreferenced directory entries
  name-hash.c: use new hash map implementation for cache entries
  name-hash.c: remove cache entries instead of marking them CE_UNHASHED
  remove old hash.[ch] implementation
  fix 'git update-index --verbose --again' output
  builtin/update-index.c: cleanup update_one
  read-cache.c: fix memory leaks caused by removed cache entries

 Documentation/technical/api-hash.txt    |  52 -------
 Documentation/technical/api-hashmap.txt | 235 +++++++++++++++++++++++++++++
 Makefile                                |   5 +-
 builtin/describe.c                      |  53 +++----
 builtin/rm.c                            |   2 +-
 builtin/update-index.c                  |  39 +++--
 cache.h                                 |  18 +--
 diffcore-rename.c                       | 185 ++++++++---------------
 hash.c                                  | 110 --------------
 hash.h                                  |  50 -------
 hashmap.c                               | 228 ++++++++++++++++++++++++++++
 hashmap.h                               |  71 +++++++++
 name-hash.c                             | 156 +++++++------------
 read-cache.c                            |  10 +-
 resolve-undo.c                          |   7 +-
 submodule.c                             |  25 +---
 t/t0011-hashmap.sh                      | 240 ++++++++++++++++++++++++++++++
 test-hashmap.c                          | 256 ++++++++++++++++++++++++++++++++
 unpack-trees.c                          |   3 +-
 19 files changed, 1216 insertions(+), 529 deletions(-)
 delete mode 100644 Documentation/technical/api-hash.txt
 create mode 100644 Documentation/technical/api-hashmap.txt
 delete mode 100644 hash.c
 delete mode 100644 hash.h
 create mode 100644 hashmap.c
 create mode 100644 hashmap.h
 create mode 100755 t/t0011-hashmap.sh
 create mode 100644 test-hashmap.c

--

git diff kb/hashmap-v4:

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
index fc46e56..b2280f1 100644
--- a/Documentation/technical/api-hashmap.txt
+++ b/Documentation/technical/api-hashmap.txt
@@ -42,11 +42,6 @@ parameters that have the same hash code. When looking up an entry, the `key`
 and `keydata` parameters to hashmap_get and hashmap_remove are always passed
 as second and third argument, respectively. Otherwise, `keydata` is NULL.
 
-`void (*hashmap_free_fn)(void *entry)`::
-
-	User-supplied function to free a hashmap_entry. If entries are simply
-	malloc'ed memory, use stdlib's free() function.
-
 Functions
 ---------
 
@@ -76,14 +71,14 @@ If the total number of entries is known in advance, the `initial_size`
 parameter may be used to preallocate a sufficiently large table and thus
 prevent expensive resizing. If 0, the table is dynamically resized.
 
-`void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)`::
+`void hashmap_free(struct hashmap *map, int free_entries)`::
 
 	Frees a hashmap structure and allocated memory.
 +
 `map` is the hashmap to free.
 +
-If `free_function` is specified (not NULL), it is called to free each
-hashmap_entry in the map. If entries are simply malloc'ed, use stdlib's free().
+If `free_entries` is true, each hashmap_entry in the map is freed as well
+(using stdlib's free()).
 
 `void hashmap_entry_init(void *entry, int hash)`::
 
@@ -127,17 +122,20 @@ call to `hashmap_get` or `hashmap_get_next`.
 
 `void *hashmap_put(struct hashmap *map, void *entry)`::
 
-	Adds or replaces a hashmap entry.
+	Adds or replaces a hashmap entry. If the hashmap contains duplicate
+	entries equal to the specified entry, only one of them will be replaced.
 +
 `map` is the hashmap structure.
 +
-`entry` is the entry to add or update.
+`entry` is the entry to add or replace.
 +
-Returns the previous entry, or NULL if not found (i.e. the entry was added).
+Returns the replaced entry, or NULL if not found (i.e. the entry was added).
 
 `void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
 
-	Removes a hashmap entry matching the specified key.
+	Removes a hashmap entry matching the specified key. If the hashmap
+	contains duplicate entries equal to the specified key, only one of
+	them will be removed.
 +
 `map` is the hashmap structure.
 +
@@ -183,14 +181,14 @@ static int long2double_cmp(const struct long2double *e1, const struct long2doubl
 	return !(e1->key == e2->key);
 }
 
-void long2double_init()
+void long2double_init(void)
 {
 	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
 }
 
-void long2double_free()
+void long2double_free(void)
 {
-	hashmap_free(&map, free);
+	hashmap_free(&map, 1);
 }
 
 static struct long2double *find_entry(long key)
diff --git a/builtin/update-index.c b/builtin/update-index.c
index acd992d..00313f3 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -559,7 +559,7 @@ static int do_reupdate(int ac, const char **av,
 		const struct cache_entry *ce = active_cache[pos];
 		struct cache_entry *old = NULL;
 		int save_nr;
-		const char *path;
+		char *path;
 
 		if (ce_stage(ce) || !ce_path_match(ce, &pathspec))
 			continue;
@@ -838,7 +838,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			update_one(p);
 			if (set_executable_bit)
 				chmod_path(set_executable_bit, p);
-			free(p);
+			free((char *)p);
 			ctx.argc--;
 			ctx.argv++;
 			break;
@@ -880,7 +880,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix)
 			update_one(p);
 			if (set_executable_bit)
 				chmod_path(set_executable_bit, p);
-			free(p);
+			free((char *)p);
 		}
 		strbuf_release(&nbuf);
 		strbuf_release(&buf);
diff --git a/diffcore-rename.c b/diffcore-rename.c
index d996c6a..9b4f068 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -343,7 +343,7 @@ static int find_exact_renames(struct diff_options *options)
 		renames += find_identical_files(&file_table, i, options);
 
 	/* Free the hash data structure and entries */
-	hashmap_free(&file_table, free);
+	hashmap_free(&file_table, 1);
 
 	return renames;
 }
diff --git a/hashmap.c b/hashmap.c
index 3a9f8c1..d1b8056 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -29,7 +29,7 @@ unsigned int strihash(const char *str)
 unsigned int memhash(const void *buf, size_t len)
 {
 	unsigned int hash = FNV32_BASE;
-	unsigned char *ucbuf = (unsigned char*) buf;
+	unsigned char *ucbuf = (unsigned char *) buf;
 	while (len--) {
 		unsigned int c = *ucbuf++;
 		hash = (hash * FNV32_PRIME) ^ c;
@@ -40,7 +40,7 @@ unsigned int memhash(const void *buf, size_t len)
 unsigned int memihash(const void *buf, size_t len)
 {
 	unsigned int hash = FNV32_BASE;
-	unsigned char *ucbuf = (unsigned char*) buf;
+	unsigned char *ucbuf = (unsigned char *) buf;
 	while (len--) {
 		unsigned int c = *ucbuf++;
 		if (c >= 'a' && c <= 'z')
@@ -52,17 +52,33 @@ unsigned int memihash(const void *buf, size_t len)
 
 #define HASHMAP_INITIAL_SIZE 64
 /* grow / shrink by 2^2 */
-#define HASHMAP_GROW 2
-/* grow if > 80% full (to 20%) */
-#define HASHMAP_GROW_AT 1.25
-/* shrink if < 16.6% full (to 66.6%) */
-#define HASHMAP_SHRINK_AT 6
+#define HASHMAP_RESIZE_BITS 2
+/* load factor in percent */
+#define HASHMAP_LOAD_FACTOR 80
+
+static void alloc_table(struct hashmap *map, unsigned int size)
+{
+	map->tablesize = size;
+	map->table = xcalloc(size, sizeof(struct hashmap_entry *));
+
+	/* calculate resize thresholds for new size */
+	map->grow_at = (unsigned int) ((uint64_t) size * HASHMAP_LOAD_FACTOR / 100);
+	if (size <= HASHMAP_INITIAL_SIZE)
+		map->shrink_at = 0;
+	else
+		/*
+		 * The shrink-threshold must be slightly smaller than
+		 * (grow-threshold / resize-factor) to prevent erratic resizing,
+		 * thus we divide by (resize-factor + 1).
+		 */
+		map->shrink_at = map->grow_at / ((1 << HASHMAP_RESIZE_BITS) + 1);
+}
 
 static inline int entry_equals(const struct hashmap *map,
 		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
 		const void *keydata)
 {
-	return (e1 == e2) || (e1->hash == e2->hash && !(*map->cmpfn)(e1, e2, keydata));
+	return (e1 == e2) || (e1->hash == e2->hash && !map->cmpfn(e1, e2, keydata));
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
@@ -76,8 +92,7 @@ static void rehash(struct hashmap *map, unsigned int newsize)
 	unsigned int i, oldsize = map->tablesize;
 	struct hashmap_entry **oldtable = map->table;
 
-	map->tablesize = newsize;
-	map->table = xcalloc(sizeof(struct hashmap_entry*), map->tablesize);
+	alloc_table(map, newsize);
 	for (i = 0; i < oldsize; i++) {
 		struct hashmap_entry *e = oldtable[i];
 		while (e) {
@@ -108,26 +123,28 @@ static int always_equal(const void *unused1, const void *unused2, const void *un
 void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 		size_t initial_size)
 {
+	unsigned int size = HASHMAP_INITIAL_SIZE;
 	map->size = 0;
 	map->cmpfn = equals_function ? equals_function : always_equal;
+
 	/* calculate initial table size and allocate the table */
-	map->tablesize = HASHMAP_INITIAL_SIZE;
-	initial_size *= HASHMAP_GROW_AT;
-	while (initial_size > map->tablesize)
-		map->tablesize <<= HASHMAP_GROW;
-	map->table = xcalloc(sizeof(struct hashmap_entry*), map->tablesize);
+	initial_size = (unsigned int) ((uint64_t) initial_size * 100
+			/ HASHMAP_LOAD_FACTOR);
+	while (initial_size > size)
+		size <<= HASHMAP_RESIZE_BITS;
+	alloc_table(map, size);
 }
 
-void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)
+void hashmap_free(struct hashmap *map, int free_entries)
 {
 	if (!map || !map->table)
 		return;
-	if (free_function) {
+	if (free_entries) {
 		struct hashmap_iter iter;
 		struct hashmap_entry *e;
 		hashmap_iter_init(map, &iter);
 		while ((e = hashmap_iter_next(&iter)))
-			(*free_function)(e);
+			free(e);
 	}
 	free(map->table);
 	memset(map, 0, sizeof(*map));
@@ -140,7 +157,7 @@ void *hashmap_get(const struct hashmap *map, const void *key, const void *keydat
 
 void *hashmap_get_next(const struct hashmap *map, const void *entry)
 {
-	struct hashmap_entry *e = ((struct hashmap_entry*) entry)->next;
+	struct hashmap_entry *e = ((struct hashmap_entry *) entry)->next;
 	for (; e; e = e->next)
 		if (entry_equals(map, entry, e, NULL))
 			return e;
@@ -152,13 +169,13 @@ void hashmap_add(struct hashmap *map, void *entry)
 	unsigned int b = bucket(map, entry);
 
 	/* add entry */
-	((struct hashmap_entry*) entry)->next = map->table[b];
+	((struct hashmap_entry *) entry)->next = map->table[b];
 	map->table[b] = entry;
 
 	/* fix size and rehash if appropriate */
 	map->size++;
-	if (map->size * HASHMAP_GROW_AT > map->tablesize)
-		rehash(map, map->tablesize << HASHMAP_GROW);
+	if (map->size > map->grow_at)
+		rehash(map, map->tablesize << HASHMAP_RESIZE_BITS);
 }
 
 void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)
@@ -175,9 +192,8 @@ void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)
 
 	/* fix size and rehash if appropriate */
 	map->size--;
-	if (map->tablesize > HASHMAP_INITIAL_SIZE &&
-	    map->size * HASHMAP_SHRINK_AT < map->tablesize)
-		rehash(map, map->tablesize >> HASHMAP_GROW);
+	if (map->size < map->shrink_at)
+		rehash(map, map->tablesize >> HASHMAP_RESIZE_BITS);
 	return old;
 }
 
diff --git a/hashmap.h b/hashmap.h
index eea003a..f5b3b61 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -22,12 +22,11 @@ struct hashmap_entry {
 
 typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
 		const void *keydata);
-typedef void (*hashmap_free_fn)(void *entry);
 
 struct hashmap {
 	struct hashmap_entry **table;
 	hashmap_cmp_fn cmpfn;
-	unsigned int size, tablesize;
+	unsigned int size, tablesize, grow_at, shrink_at;
 };
 
 struct hashmap_iter {
@@ -40,7 +39,7 @@ struct hashmap_iter {
 
 extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 		size_t initial_size);
-extern void hashmap_free(struct hashmap *map, hashmap_free_fn free_function);
+extern void hashmap_free(struct hashmap *map, int free_entries);
 
 /* hashmap_entry functions */
 
diff --git a/name-hash.c b/name-hash.c
index 8871d8e..9a3bd3f 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -240,6 +240,6 @@ void free_name_hash(struct index_state *istate)
 		return;
 	istate->name_hash_initialized = 0;
 
-	hashmap_free(&istate->name_hash, NULL);
-	hashmap_free(&istate->dir_hash, free);
+	hashmap_free(&istate->name_hash, 0);
+	hashmap_free(&istate->dir_hash, 1);
 }
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
index 6c699d5..391e2b6 100755
--- a/t/t0011-hashmap.sh
+++ b/t/t0011-hashmap.sh
@@ -221,13 +221,17 @@ test_expect_success 'grow / shrink' '
 	echo NULL >> expect
 	echo size >> in &&
 	echo 256 52 >> expect &&
-	for n in $(test_seq 10)
+	for n in $(test_seq 12)
 	do
 		echo remove key$n >> in &&
 		echo value$n >> expect
 	done &&
 	echo size >> in &&
-	echo 64 42 >> expect &&
+	echo 256 40 >> expect &&
+	echo remove key40 >> in &&
+	echo value40 >> expect &&
+	echo size >> in &&
+	echo 64 39 >> expect &&
 	cat in | test-hashmap > out &&
 	test_cmp expect out
 
diff --git a/test-hashmap.c b/test-hashmap.c
index 333b7b3..7e86f88 100644
--- a/test-hashmap.c
+++ b/test-hashmap.c
@@ -71,7 +71,7 @@ static unsigned int hash(unsigned int method, unsigned int i, const char *key)
 }
 
 /*
- * Test insert performance of hashmap.[ch]
+ * Test performance of hashmap.[ch]
  * Usage: time echo "perfhashmap method rounds" | test-hashmap
  */
 static void perf_hashmap(unsigned int method, unsigned int rounds)
@@ -82,7 +82,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 	unsigned int *hashes;
 	unsigned int i, j;
 
-	entries = malloc(TEST_SIZE * sizeof(struct test_entry*));
+	entries = malloc(TEST_SIZE * sizeof(struct test_entry *));
 	hashes = malloc(TEST_SIZE * sizeof(int));
 	for (i = 0; i < TEST_SIZE; i++) {
 		snprintf(buf, sizeof(buf), "%i", i);
@@ -101,7 +101,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 				hashmap_add(&map, entries[i]);
 			}
 
-			hashmap_free(&map, NULL);
+			hashmap_free(&map, 0);
 		}
 	} else {
 		/* test map lookups */
@@ -122,7 +122,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 			}
 		}
 
-		hashmap_free(&map, NULL);
+		hashmap_free(&map, 0);
 	}
 }
 
@@ -251,6 +251,6 @@ int main(int argc, char *argv[])
 		}
 	}
 
-	hashmap_free(&map, free);
+	hashmap_free(&map, 1);
 	return 0;
 }

             reply	other threads:[~2013-11-14 19:08 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-11-14 19:08 Karsten Blees [this message]
2013-11-14 19:17 ` [PATCH v5 01/14] submodule: don't access the .gitmodules cache entry after removing it Karsten Blees
2013-11-14 19:17 ` [PATCH v5 02/14] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-12-14  2:04   ` Jonathan Nieder
2013-12-14  2:05     ` [PATCH 1/2] Add test-hashmap to .gitignore Jonathan Nieder
2013-12-14  2:06     ` [PATCH 2/2] Drop unnecessary #includes from test-hashmap Jonathan Nieder
2013-12-18 13:11     ` [PATCH v5 02/14] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-11-14 19:18 ` [PATCH v5 03/14] buitin/describe.c: use new hash map implementation Karsten Blees
2013-11-14 19:19 ` [PATCH v5 04/14] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-11-14 19:19 ` [PATCH v5 05/14] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-11-14 19:20 ` [PATCH v5 06/14] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-11-14 19:20 ` [PATCH v5 07/14] name-hash.c: use new hash map implementation for directories Karsten Blees
2013-11-14 19:21 ` [PATCH v5 08/14] name-hash.c: remove unreferenced directory entries Karsten Blees
2013-11-14 19:21 ` [PATCH v5 09/14] name-hash.c: use new hash map implementation for cache entries Karsten Blees
2013-11-14 19:22 ` [PATCH v5 10/14] name-hash.c: remove cache entries instead of marking them CE_UNHASHED Karsten Blees
2013-11-14 19:23 ` [PATCH v5 11/14] remove old hash.[ch] implementation Karsten Blees
2013-11-14 19:23 ` [PATCH v5 12/14] fix 'git update-index --verbose --again' output Karsten Blees
2013-11-14 19:24 ` [PATCH v5 13/14] builtin/update-index.c: cleanup update_one Karsten Blees
2013-11-14 19:24 ` [PATCH v5 14/14] read-cache.c: fix memory leaks caused by removed cache entries Karsten Blees
2013-11-14 22:07 ` [PATCH v5 00/14] New hash table implementation Karsten Blees

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52851FB5.4050406@gmail.com \
    --to=karsten.blees@gmail.com \
    --cc=Jens.Lehmann@web.de \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=tr@thomasrast.ch \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.