[PATCH v3 00/11] New hash table implementation

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v3 00/11] New hash table implementation
@ 2013-10-01  9:33 Karsten Blees
  2013-10-01  9:34 ` [PATCH v3 01/11] add a hashtable implementation that supports O(1) removal Karsten Blees
                   ` (10 more replies)
  0 siblings, 11 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:33 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

Also here:
https://github.com/kblees/git/tree/kb/hashmap-v3

Previous discussion:
http://thread.gmane.org/gmane.comp.version-control.git/234508


Changes since v2:
- move documentation from hashmap.h to Documentation/technical/api-hashmap.txt
- simpler way to deal with FLEX_ARRAY keys: instead of requiring a special
  key structure (and function hashmap_entry_is_key), hashmap_get and
  hashmap_remove now take an additional parameter that is passed through to
  the comparison function

Added since v2 (patches 06 - 11):
- convert name-hash.c to new hash map implementation
- fix memory leaks caused by not free()ing removed cache_entries
- remove now unused hash.[ch] implementation


Performance:
While the main motivation for the patch series wasn't performance, _actual_ git performance will most likely benefit from the name-hash.c changes. So here's the numbers for 10 git-status runs on the WebKit repo (~200k files, with core.preloadindex=true). Not exactly a quantum leap, but its measurable.

git status -s | hash  | hashmap |    gain
--------------+-------+---------+-------------
real avg      | 0.969 |   0.959 | 0.010 / 1.0%
real min      | 0.910 |   0.895 | 0.015 / 1.6%
real max      | 1.079 |   0.993 | 0.086 / 8.0%


Cheers,
Karsten

Karsten Blees (11):
  add a hashtable implementation that supports O(1) removal
  buitin/describe.c: use new hash map implementation
  diffcore-rename.c: move code around to prepare for the next patch
  diffcore-rename.c: simplify finding exact renames
  diffcore-rename.c: use new hash map implementation
  name-hash.c: use new hash map implementation for directories
  name-hash.c: remove unreferenced directory entries
  name-hash.c: use new hash map implementation for cache entries
  name-hash.c: remove cache entries instead of marking them CE_UNHASHED
  read-cache.c: fix memory leaks caused by removed cache entries
  remove old hash.[ch] implementation

 Documentation/technical/api-hash.txt    |  52 -------
 Documentation/technical/api-hashmap.txt | 237 +++++++++++++++++++++++++++++
 Makefile                                |   5 +-
 builtin/describe.c                      |  53 +++----
 builtin/rm.c                            |   2 +-
 cache.h                                 |  18 +--
 diffcore-rename.c                       | 185 ++++++++---------------
 hash.c                                  | 110 --------------
 hash.h                                  |  50 -------
 hashmap.c                               | 212 ++++++++++++++++++++++++++
 hashmap.h                               |  72 +++++++++
 name-hash.c                             | 156 +++++++------------
 read-cache.c                            |   8 +-
 resolve-undo.c                          |   7 +-
 t/t0011-hashmap.sh                      | 236 +++++++++++++++++++++++++++++
 test-hashmap.c                          | 256 ++++++++++++++++++++++++++++++++
 unpack-trees.c                          |   3 +-
 17 files changed, 1179 insertions(+), 483 deletions(-)
 delete mode 100644 Documentation/technical/api-hash.txt
 create mode 100644 Documentation/technical/api-hashmap.txt
 delete mode 100644 hash.c
 delete mode 100644 hash.h
 create mode 100644 hashmap.c
 create mode 100644 hashmap.h
 create mode 100755 t/t0011-hashmap.sh
 create mode 100644 test-hashmap.c


---
Note: interdiff for patches 01 - 05 only (which have been part of v2):

git diff -p --stat kb/hashmap-v2..kb/hashmap-v3~6
 Documentation/technical/api-hashmap.txt | 237 ++++++++++++++++++++++++++++++++
 builtin/describe.c                      |  12 +-
 diffcore-rename.c                       |   6 +-
 hashmap.c                               |  23 ++--
 hashmap.h                               | 170 +++--------------------
 test-hashmap.c                          |  48 +++----
 6 files changed, 296 insertions(+), 200 deletions(-)

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
new file mode 100644
index 0000000..fc46e56
--- /dev/null
+++ b/Documentation/technical/api-hashmap.txt
@@ -0,0 +1,237 @@
+hashmap API
+===========
+
+The hashmap API is a generic implementation of hash-based key-value mappings.
+
+Data Structures
+---------------
+
+`struct hashmap`::
+
+	The hash table structure.
++
+The `size` member keeps track of the total number of entries. The `cmpfn`
+member is a function used to compare two entries for equality. The `table` and
+`tablesize` members store the hash table and its size, respectively.
+
+`struct hashmap_entry`::
+
+	An opaque structure representing an entry in the hash table, which must
+	be used as first member of user data structures. Ideally it should be
+	followed by an int-sized member to prevent unused memory on 64-bit
+	systems due to alignment.
++
+The `hash` member is the entry's hash code and the `next` member points to the
+next entry in case of collisions (i.e. if multiple entries map to the same
+bucket).
+
+`struct hashmap_iter`::
+
+	An iterator structure, to be used with hashmap_iter_* functions.
+
+Types
+-----
+
+`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
+
+	User-supplied function to test two hashmap entries for equality. Shall
+	return 0 if the entries are equal.
++
+This function is always called with non-NULL `entry` / `entry_or_key`
+parameters that have the same hash code. When looking up an entry, the `key`
+and `keydata` parameters to hashmap_get and hashmap_remove are always passed
+as second and third argument, respectively. Otherwise, `keydata` is NULL.
+
+`void (*hashmap_free_fn)(void *entry)`::
+
+	User-supplied function to free a hashmap_entry. If entries are simply
+	malloc'ed memory, use stdlib's free() function.
+
+Functions
+---------
+
+`unsigned int strhash(const char *buf)`::
+`unsigned int strihash(const char *buf)`::
+`unsigned int memhash(const void *buf, size_t len)`::
+`unsigned int memihash(const void *buf, size_t len)`::
+
+	Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+	http://www.isthe.com/chongo/tech/comp/fnv).
++
+`strhash` and `strihash` take 0-terminated strings, while `memhash` and
+`memihash` operate on arbitrary-length memory.
++
+`strihash` and `memihash` are case insensitive versions.
+
+`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
+
+	Initializes a hashmap structure.
++
+`map` is the hashmap to initialize.
++
+The `equals_function` can be specified to compare two entries for equality.
+If NULL, entries are considered equal if their hash codes are equal.
++
+If the total number of entries is known in advance, the `initial_size`
+parameter may be used to preallocate a sufficiently large table and thus
+prevent expensive resizing. If 0, the table is dynamically resized.
+
+`void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)`::
+
+	Frees a hashmap structure and allocated memory.
++
+`map` is the hashmap to free.
++
+If `free_function` is specified (not NULL), it is called to free each
+hashmap_entry in the map. If entries are simply malloc'ed, use stdlib's free().
+
+`void hashmap_entry_init(void *entry, int hash)`::
+
+	Initializes a hashmap_entry structure.
++
+`entry` points to the entry to initialize.
++
+`hash` is the hash code of the entry.
+
+`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
+
+	Returns the hashmap entry for the specified key, or NULL if not found.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are passed
+to `hashmap_cmp_fn` to decide whether the entry matches the key.
+
+`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
+
+	Returns the next equal hashmap entry, or NULL if not found. This can be
+	used to iterate over duplicate entries (see `hashmap_add`).
++
+`map` is the hashmap structure.
++
+`entry` is the hashmap_entry to start the search from, obtained via a previous
+call to `hashmap_get` or `hashmap_get_next`.
+
+`void hashmap_add(struct hashmap *map, void *entry)`::
+
+	Adds a hashmap entry. This allows to add duplicate entries (i.e.
+	separate values with the same key according to hashmap_cmp_fn).
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add.
+
+`void *hashmap_put(struct hashmap *map, void *entry)`::
+
+	Adds or replaces a hashmap entry.
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add or update.
++
+Returns the previous entry, or NULL if not found (i.e. the entry was added).
+
+`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
+
+	Removes a hashmap entry matching the specified key.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are
+passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
++
+Returns the removed entry, or NULL if not found.
+
+`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
+`void *hashmap_iter_next(struct hashmap_iter *iter)`::
+`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
+
+	Used to iterate over all entries of a hashmap.
++
+`hashmap_iter_init` initializes a `hashmap_iter` structure.
++
+`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
+more entries.
++
+`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
+and returns the first entry, if any).
+
+Usage example
+-------------
+
+Here's a simple usage example that maps long keys to double values.
+[source,c]
+------------
+struct hashmap map;
+
+struct long2double {
+	struct hashmap_entry ent; /* must be the first member! */
+	long key;
+	double value;
+};
+
+static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
+{
+	return !(e1->key == e2->key);
+}
+
+void long2double_init()
+{
+	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
+}
+
+void long2double_free()
+{
+	hashmap_free(&map, free);
+}
+
+static struct long2double *find_entry(long key)
+{
+	struct long2double k;
+	hashmap_entry_init(&k, memhash(&key, sizeof(long)));
+	k.key = key;
+	return hashmap_get(&map, &k, NULL);
+}
+
+double get_value(long key)
+{
+	struct long2double *e = find_entry(key);
+	return e ? e->value : 0;
+}
+
+void set_value(long key, double value)
+{
+	struct long2double *e = find_entry(key);
+	if (!e) {
+		e = malloc(sizeof(struct long2double));
+		hashmap_entry_init(e, memhash(&key, sizeof(long)));
+		e->key = key;
+		hashmap_add(&map, e);
+	}
+	e->value = value;
+}
+------------
+
+Using variable-sized keys
+-------------------------
+
+The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
+`hashmap_entry` structure as key to find the correct entry. If the key data is
+variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
+to create a full-fledged entry structure on the heap and copy all the key data
+into the structure.
+
+In this case, the `keydata` parameter can be used to pass
+variable-sized key data directly to the comparison function, and the `key`
+parameter can be a stripped-down, fixed size entry structure allocated on the
+stack.
+
+See test-hashmap.c for an example using arbitrary-length strings as keys.
diff --git a/builtin/describe.c b/builtin/describe.c
index 5db5d89..ae8d32c 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -50,9 +50,10 @@ static const char *prio_names[] = {
 	"head", "lightweight", "annotated",
 };
 
-static int commit_name_cmp(struct commit_name *cn1, struct commit_name *cn2)
+static int commit_name_cmp(const struct commit_name *cn1,
+		const struct commit_name *cn2, const void *peeled)
 {
-	return hashcmp(cn1->peeled, cn2->peeled);
+	return hashcmp(cn1->peeled, peeled ? peeled : cn2->peeled);
 }
 
 static inline unsigned int hash_sha1(const unsigned char *sha1)
@@ -65,9 +66,8 @@ static inline unsigned int hash_sha1(const unsigned char *sha1)
 static inline struct commit_name *find_commit_name(const unsigned char *peeled)
 {
 	struct commit_name key;
-	hashmap_entry_init(&key, hash_sha1(peeled), 0);
-	hashcpy(key.peeled, peeled);
-	return hashmap_get(&names, &key);
+	hashmap_entry_init(&key, hash_sha1(peeled));
+	return hashmap_get(&names, &key, peeled);
 }
 
 static int replace_name(struct commit_name *e,
@@ -114,7 +114,7 @@ static void add_to_known_names(const char *path,
 		if (!e) {
 			e = xmalloc(sizeof(struct commit_name));
 			hashcpy(e->peeled, peeled);
-			hashmap_entry_init(e, hash_sha1(peeled), 0);
+			hashmap_entry_init(e, hash_sha1(peeled));
 			hashmap_add(&names, e);
 			e->path = NULL;
 		}
diff --git a/diffcore-rename.c b/diffcore-rename.c
index 2e70d31..c6bd776 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -274,8 +274,8 @@ static int find_identical_files(struct hashmap *srcs,
 	 * Find the best source match for specified destination.
 	 */
 	best = NULL;
-	hashmap_entry_init(&dst, hash_filespec(target), 0);
-	for (p = hashmap_get(srcs, &dst); p; p = hashmap_get_next(srcs, p)) {
+	hashmap_entry_init(&dst, hash_filespec(target));
+	for (p = hashmap_get(srcs, &dst, NULL); p; p = hashmap_get_next(srcs, p)) {
 		int score;
 		struct diff_filespec *source = p->filespec;
 
@@ -317,7 +317,7 @@ static void insert_file_table(struct hashmap *table, int index, struct diff_file
 	entry->index = index;
 	entry->filespec = filespec;
 
-	hashmap_entry_init(entry, hash_filespec(filespec), 0);
+	hashmap_entry_init(entry, hash_filespec(filespec));
 	hashmap_add(table, entry);
 }
 
diff --git a/hashmap.c b/hashmap.c
index 75a8578..3a9f8c1 100644
--- a/hashmap.c
+++ b/hashmap.c
@@ -59,9 +59,10 @@ unsigned int memihash(const void *buf, size_t len)
 #define HASHMAP_SHRINK_AT 6
 
 static inline int entry_equals(const struct hashmap *map,
-		const struct hashmap_entry *e1, const struct hashmap_entry *e2)
+		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
+		const void *keydata)
 {
-	return (e1 == e2) || (e1->hash == e2->hash && !(*map->cmpfn)(e1, e2));
+	return (e1 == e2) || (e1->hash == e2->hash && !(*map->cmpfn)(e1, e2, keydata));
 }
 
 static inline unsigned int bucket(const struct hashmap *map,
@@ -91,15 +92,15 @@ static void rehash(struct hashmap *map, unsigned int newsize)
 }
 
 static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
-		const struct hashmap_entry *key)
+		const struct hashmap_entry *key, const void *keydata)
 {
 	struct hashmap_entry **e = &map->table[bucket(map, key)];
-	while (*e && !entry_equals(map, *e, key))
+	while (*e && !entry_equals(map, *e, key, keydata))
 		e = &(*e)->next;
 	return e;
 }
 
-static int always_equal(const void *unused1, const void *unused2)
+static int always_equal(const void *unused1, const void *unused2, const void *unused3)
 {
 	return 0;
 }
@@ -132,16 +133,16 @@ void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)
 	memset(map, 0, sizeof(*map));
 }
 
-void *hashmap_get(const struct hashmap *map, const void *key)
+void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)
 {
-	return *find_entry_ptr(map, key);
+	return *find_entry_ptr(map, key, keydata);
 }
 
 void *hashmap_get_next(const struct hashmap *map, const void *entry)
 {
 	struct hashmap_entry *e = ((struct hashmap_entry*) entry)->next;
 	for (; e; e = e->next)
-		if (entry_equals(map, entry, e))
+		if (entry_equals(map, entry, e, NULL))
 			return e;
 	return NULL;
 }
@@ -160,10 +161,10 @@ void hashmap_add(struct hashmap *map, void *entry)
 		rehash(map, map->tablesize << HASHMAP_GROW);
 }
 
-void *hashmap_remove(struct hashmap *map, const void *key)
+void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)
 {
 	struct hashmap_entry *old;
-	struct hashmap_entry **e = find_entry_ptr(map, key);
+	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
 	if (!*e)
 		return NULL;
 
@@ -182,7 +183,7 @@ void *hashmap_remove(struct hashmap *map, const void *key)
 
 void *hashmap_put(struct hashmap *map, void *entry)
 {
-	struct hashmap_entry *old = hashmap_remove(map, entry);
+	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
 	hashmap_add(map, entry);
 	return old;
 }
diff --git a/hashmap.h b/hashmap.h
index 98c4ebc..eea003a 100644
--- a/hashmap.h
+++ b/hashmap.h
@@ -2,194 +2,66 @@
 #define HASHMAP_H
 
 /*
- * Generic implementation of hash-based key value mappings.
- * Supports basic operations get, add/put, remove and iteration.
- *
- * Also contains a set of ready-to-use hash functions for strings, using the
- * FNV-1 algorithm (see http://www.isthe.com/chongo/tech/comp/fnv).
+ * Generic implementation of hash-based key-value mappings.
+ * See Documentation/technical/api-hashmap.txt.
  */
 
-/*
- * Case-sensitive FNV-1 hash of 0-terminated string.
- * str: the string
- * returns hash code
- */
-extern unsigned int strhash(const char *buf);
+/* FNV-1 functions */
 
-/*
- * Case-insensitive FNV-1 hash of 0-terminated string.
- * str: the string
- * returns hash code
- */
+extern unsigned int strhash(const char *buf);
 extern unsigned int strihash(const char *buf);
-
-/*
- * Case-sensitive FNV-1 hash of a memory block.
- * buf: start of the memory block
- * len: length of the memory block
- * returns hash code
- */
 extern unsigned int memhash(const void *buf, size_t len);
-
-/*
- * Case-insensitive FNV-1 hash of a memory block.
- * buf: start of the memory block
- * len: length of the memory block
- * returns hash code
- */
 extern unsigned int memihash(const void *buf, size_t len);
 
-/*
- * Hashmap entry data structure, must be used as first member of user data
- * structures. Consists of a pointer and an int. Ideally it should be followed
- * by an int-sized member to prevent unused memory on 64-bit systems due to
- * alignment.
- */
+/* data structures */
+
 struct hashmap_entry {
 	struct hashmap_entry *next;
 	unsigned int hash;
 };
 
-/*
- * User-supplied function to test two hashmap entries for equality, shall
- * return 0 if the entries are equal. This function is always called with
- * non-NULL parameters that have the same hash code. When looking up an entry,
- * the key parameter to hashmap_get and hashmap_remove is always passed as
- * second argument.
- */
-typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key);
-
-/*
- * User-supplied function to free a hashmap entry.
- */
+typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
+		const void *keydata);
 typedef void (*hashmap_free_fn)(void *entry);
 
-/*
- * Hashmap data structure, use with hashmap_* functions.
- */
 struct hashmap {
 	struct hashmap_entry **table;
 	hashmap_cmp_fn cmpfn;
 	unsigned int size, tablesize;
 };
 
-/*
- * Hashmap iterator data structure, use with hasmap_iter_* functions.
- */
 struct hashmap_iter {
 	struct hashmap *map;
 	struct hashmap_entry *next;
 	unsigned int tablepos;
 };
 
-/*
- * Initializes a hashmap_entry structure.
- * entry: pointer to the entry to initialize
- * hash: hash code of the entry
- * key_only: true if entry is a key-only structure, see hashmap_entry_is_key
- */
-static inline void hashmap_entry_init(void *entry, int hash, int key_only)
-{
-	struct hashmap_entry *e = entry;
-	e->hash = hash;
-	e->next = key_only ? (struct hashmap_entry*) -1 : NULL;
-}
-
-/*
- * Checks if hashmap_entry was initialized with the key_only flag. This is
- * useful if the entry structure is variable-sized (e.g. ending in a FLEX_ARRAY)
- * and the key is part of the variable portion. To prevent dynamic allocation of
- * a full-fledged entry structure for each lookup, a smaller, statically sized
- * structure can be used as key (i.e. replacing the FLEX_ARRAY member with a
- * char pointer). The hashmap_cmp_fn comparison function can then check whether
- * entry_or_key is a full-fledged entry or a key-only structure.
- * entry: pointer to the entry to check
- * returns 0 for key-value entries and non-0 for key-only entries
- */
-static inline int hashmap_entry_is_key(const void *entry)
-{
-	const struct hashmap_entry *e = entry;
-	return e->next == (struct hashmap_entry*) -1;
-}
+/* hashmap functions */
 
-/*
- * Initializes a hashmap structure.
- * map: hashmap to initialize
- * equals_function: optional function to test equality of hashmap entries. If
- *  NULL, entries are considered equal if their hash codes are equal.
- * initial_size: optional number of initial entries, 0 if unknown
- */
 extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
 		size_t initial_size);
-
-/*
- * Frees a hashmap structure and allocated memory.
- * map: hashmap to free
- * free_function: optional function to free the hashmap entries
- */
 extern void hashmap_free(struct hashmap *map, hashmap_free_fn free_function);
 
-/*
- * Returns the hashmap entry for the specified key, or NULL if not found.
- * map: the hashmap
- * key: key of the entry to look up
- * returns matching hashmap entry, or NULL if not found
- */
-extern void *hashmap_get(const struct hashmap *map, const void *key);
+/* hashmap_entry functions */
 
-/*
- * Returns the next equal hashmap entry if the map contains duplicates (see
- * hashmap_add).
- * map: the hashmap
- * entry: current entry, obtained via hashmap_get or hashmap_get_next
- * returns next equal hashmap entry, or NULL if not found
- */
+static inline void hashmap_entry_init(void *entry, int hash)
+{
+	struct hashmap_entry *e = entry;
+	e->hash = hash;
+	e->next = NULL;
+}
+extern void *hashmap_get(const struct hashmap *map, const void *key,
+		const void *keydata);
 extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
-
-/*
- * Adds a hashmap entry. This allows to add duplicate entries (i.e. separate
- * values with the same key according to hashmap_cmp_fn).
- * map: the hashmap
- * entry: the entry to add
- */
 extern void hashmap_add(struct hashmap *map, void *entry);
-
-/*
- * Adds or replaces a hashmap entry.
- * map: the hashmap
- * entry: the entry to add or replace
- * returns previous entry, or NULL if the entry is new
- */
 extern void *hashmap_put(struct hashmap *map, void *entry);
+extern void *hashmap_remove(struct hashmap *map, const void *key,
+		const void *keydata);
 
-/*
- * Removes a hashmap entry matching the specified key.
- * map: the hashmap
- * key: key of the entry to remove
- * returns removed entry, or NULL if not found
- */
-extern void *hashmap_remove(struct hashmap *map, const void *key);
+/* hashmap_iter functions */
 
-/*
- * Initializes a hashmap iterator structure.
- * map: the hashmap
- * iter: hashmap iterator structure
- */
 extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
-
-/**
- * Returns the next hashmap entry.
- * iter: hashmap iterator
- * returns next entry, or NULL if there are no more entries
- */
 extern void *hashmap_iter_next(struct hashmap_iter *iter);
-
-/**
- * Initializes a hashmap iterator and returns the first hashmap entry.
- * map: the hashmap
- * iter: hashmap iterator
- * returns first entry, or NULL if there are no entries
- */
 static inline void *hashmap_iter_first(struct hashmap *map,
 		struct hashmap_iter *iter)
 {
diff --git a/test-hashmap.c b/test-hashmap.c
index de94c6d..2d2b834 100644
--- a/test-hashmap.c
+++ b/test-hashmap.c
@@ -9,32 +9,21 @@ struct test_entry
 	char key[FLEX_ARRAY];
 };
 
-struct test_key
-{
-	struct hashmap_entry ent;
-	char *key;
-};
-
-static const char *get_key(const struct test_entry *e)
-{
-	return hashmap_entry_is_key(e) ? ((struct test_key*) e)->key : e->key;
-}
-
 static const char *get_value(const struct test_entry *e)
 {
 	return e->key + strlen(e->key) + 1;
 }
 
 static int test_entry_cmp(const struct test_entry *e1,
-		const struct test_entry *e2)
+		const struct test_entry *e2, const char* key)
 {
-	return strcmp(e1->key, get_key(e2));
+	return strcmp(e1->key, key ? key : e2->key);
 }
 
 static int test_entry_cmp_icase(const struct test_entry *e1,
-		const struct test_entry *e2)
+		const struct test_entry *e2, const char* key)
 {
-	return strcasecmp(e1->key, get_key(e2));
+	return strcasecmp(e1->key, key ? key : e2->key);
 }
 
 static struct test_entry *alloc_test_entry(int hash, char *key, int klen,
@@ -42,7 +31,7 @@ static struct test_entry *alloc_test_entry(int hash, char *key, int klen,
 {
 	struct test_entry *entry = malloc(sizeof(struct test_entry) + klen
 			+ vlen + 2);
-	hashmap_entry_init(entry, hash, 0);
+	hashmap_entry_init(entry, hash);
 	memcpy(entry->key, key, klen + 1);
 	memcpy(entry->key + klen + 1, value, vlen + 1);
 	return entry;
@@ -108,7 +97,7 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 
 			/* add entries */
 			for (i = 0; i < TEST_SIZE; i++) {
-				hashmap_entry_init(entries[i], hashes[i], 0);
+				hashmap_entry_init(entries[i], hashes[i]);
 				hashmap_add(&map, entries[i]);
 			}
 
@@ -121,16 +110,15 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 		/* fill the map (sparsely if specified) */
 		j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
 		for (i = 0; i < j; i++) {
-			hashmap_entry_init(entries[i], hashes[i], 0);
+			hashmap_entry_init(entries[i], hashes[i]);
 			hashmap_add(&map, entries[i]);
 		}
 
 		for (j = 0; j < rounds; j++) {
 			for (i = 0; i < TEST_SIZE; i++) {
-				struct test_key k;
-				hashmap_entry_init(&k, hashes[i], 1);
-				k.key = entries[i]->key;
-				hashmap_get(&map, &k);
+				struct hashmap_entry key;
+				hashmap_entry_init(&key, hashes[i]);
+				hashmap_get(&map, &key, entries[i]->key);
 			}
 		}
 
@@ -293,12 +281,11 @@ int main(int argc, char *argv[])
 		} else if (!strcmp("get", cmd) && l1) {
 
 			/* setup static key */
-			struct test_key key;
-			hashmap_entry_init(&key, hash, 1);
-			key.key = p1;
+			struct hashmap_entry key;
+			hashmap_entry_init(&key, hash);
 
 			/* lookup entry in hashmap */
-			entry = hashmap_get(&map, &key);
+			entry = hashmap_get(&map, &key, p1);
 
 			/* print result */
 			if (!entry)
@@ -311,12 +298,11 @@ int main(int argc, char *argv[])
 		} else if (!strcmp("remove", cmd) && l1) {
 
 			/* setup static key */
-			struct test_key key;
-			hashmap_entry_init(&key, hash, 1);
-			key.key = p1;
+			struct hashmap_entry key;
+			hashmap_entry_init(&key, hash);
 
 			/* remove entry from hashmap */
-			entry = hashmap_remove(&map, &key);
+			entry = hashmap_remove(&map, &key, p1);
 
 			/* print result and free entry*/
 			puts(entry ? get_value(entry) : "NULL");
@@ -327,7 +313,7 @@ int main(int argc, char *argv[])
 			struct hashmap_iter iter;
 			hashmap_iter_init(&map, &iter);
 			while ((entry = hashmap_iter_next(&iter)))
-				printf("%s %s\n", get_key(entry), get_value(entry));
+				printf("%s %s\n", entry->key, get_value(entry));
 
 		} else if (!strcmp("size", cmd)) {
 

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 01/11] add a hashtable implementation that supports O(1) removal
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
@ 2013-10-01  9:34 ` Karsten Blees
  2013-10-01  9:35 ` [PATCH v3 02/11] buitin/describe.c: use new hash map implementation Karsten Blees
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:34 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Git List

The existing hashtable implementation (in hash.[ch]) uses open addressing
(i.e. resolve hash collisions by distributing entries across the table).
Thus, removal is difficult to implement with less than O(n) complexity.
Resolving collisions of entries with identical hashes (e.g. via chaining)
is left to the client code.

Add a hashtable implementation that supports O(1) removal and is slightly
easier to use due to builtin entry chaining.

Supports all basic operations init, free, get, add, remove and iteration.

Also includes ready-to-use hash functions based on the public domain FNV-1
algorithm (http://www.isthe.com/chongo/tech/comp/fnv).

The per-entry data structure (hashmap_entry) is piggybacked in front of
the client's data structure to save memory. See test-hashmap.c for usage
examples.

The hashtable is resized by a factor of four when 80% full. With these
settings, average memory consumption is about 2/3 of hash.[ch], and
insertion is about twice as fast due to less frequent resizing.

Lookups are also slightly faster, because entries are strictly confined to
their bucket (i.e. no data of other buckets needs to be traversed).

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 Documentation/technical/api-hashmap.txt | 237 ++++++++++++++++++++++
 Makefile                                |   3 +
 hashmap.c                               | 212 ++++++++++++++++++++
 hashmap.h                               |  72 +++++++
 t/t0011-hashmap.sh                      | 236 ++++++++++++++++++++++
 test-hashmap.c                          | 340 ++++++++++++++++++++++++++++++++
 6 files changed, 1100 insertions(+)
 create mode 100644 Documentation/technical/api-hashmap.txt
 create mode 100644 hashmap.c
 create mode 100644 hashmap.h
 create mode 100755 t/t0011-hashmap.sh
 create mode 100644 test-hashmap.c

diff --git a/Documentation/technical/api-hashmap.txt b/Documentation/technical/api-hashmap.txt
new file mode 100644
index 0000000..fc46e56
--- /dev/null
+++ b/Documentation/technical/api-hashmap.txt
@@ -0,0 +1,237 @@
+hashmap API
+===========
+
+The hashmap API is a generic implementation of hash-based key-value mappings.
+
+Data Structures
+---------------
+
+`struct hashmap`::
+
+	The hash table structure.
++
+The `size` member keeps track of the total number of entries. The `cmpfn`
+member is a function used to compare two entries for equality. The `table` and
+`tablesize` members store the hash table and its size, respectively.
+
+`struct hashmap_entry`::
+
+	An opaque structure representing an entry in the hash table, which must
+	be used as first member of user data structures. Ideally it should be
+	followed by an int-sized member to prevent unused memory on 64-bit
+	systems due to alignment.
++
+The `hash` member is the entry's hash code and the `next` member points to the
+next entry in case of collisions (i.e. if multiple entries map to the same
+bucket).
+
+`struct hashmap_iter`::
+
+	An iterator structure, to be used with hashmap_iter_* functions.
+
+Types
+-----
+
+`int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key, const void *keydata)`::
+
+	User-supplied function to test two hashmap entries for equality. Shall
+	return 0 if the entries are equal.
++
+This function is always called with non-NULL `entry` / `entry_or_key`
+parameters that have the same hash code. When looking up an entry, the `key`
+and `keydata` parameters to hashmap_get and hashmap_remove are always passed
+as second and third argument, respectively. Otherwise, `keydata` is NULL.
+
+`void (*hashmap_free_fn)(void *entry)`::
+
+	User-supplied function to free a hashmap_entry. If entries are simply
+	malloc'ed memory, use stdlib's free() function.
+
+Functions
+---------
+
+`unsigned int strhash(const char *buf)`::
+`unsigned int strihash(const char *buf)`::
+`unsigned int memhash(const void *buf, size_t len)`::
+`unsigned int memihash(const void *buf, size_t len)`::
+
+	Ready-to-use hash functions for strings, using the FNV-1 algorithm (see
+	http://www.isthe.com/chongo/tech/comp/fnv).
++
+`strhash` and `strihash` take 0-terminated strings, while `memhash` and
+`memihash` operate on arbitrary-length memory.
++
+`strihash` and `memihash` are case insensitive versions.
+
+`void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function, size_t initial_size)`::
+
+	Initializes a hashmap structure.
++
+`map` is the hashmap to initialize.
++
+The `equals_function` can be specified to compare two entries for equality.
+If NULL, entries are considered equal if their hash codes are equal.
++
+If the total number of entries is known in advance, the `initial_size`
+parameter may be used to preallocate a sufficiently large table and thus
+prevent expensive resizing. If 0, the table is dynamically resized.
+
+`void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)`::
+
+	Frees a hashmap structure and allocated memory.
++
+`map` is the hashmap to free.
++
+If `free_function` is specified (not NULL), it is called to free each
+hashmap_entry in the map. If entries are simply malloc'ed, use stdlib's free().
+
+`void hashmap_entry_init(void *entry, int hash)`::
+
+	Initializes a hashmap_entry structure.
++
+`entry` points to the entry to initialize.
++
+`hash` is the hash code of the entry.
+
+`void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)`::
+
+	Returns the hashmap entry for the specified key, or NULL if not found.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are passed
+to `hashmap_cmp_fn` to decide whether the entry matches the key.
+
+`void *hashmap_get_next(const struct hashmap *map, const void *entry)`::
+
+	Returns the next equal hashmap entry, or NULL if not found. This can be
+	used to iterate over duplicate entries (see `hashmap_add`).
++
+`map` is the hashmap structure.
++
+`entry` is the hashmap_entry to start the search from, obtained via a previous
+call to `hashmap_get` or `hashmap_get_next`.
+
+`void hashmap_add(struct hashmap *map, void *entry)`::
+
+	Adds a hashmap entry. This allows to add duplicate entries (i.e.
+	separate values with the same key according to hashmap_cmp_fn).
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add.
+
+`void *hashmap_put(struct hashmap *map, void *entry)`::
+
+	Adds or replaces a hashmap entry.
++
+`map` is the hashmap structure.
++
+`entry` is the entry to add or update.
++
+Returns the previous entry, or NULL if not found (i.e. the entry was added).
+
+`void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)`::
+
+	Removes a hashmap entry matching the specified key.
++
+`map` is the hashmap structure.
++
+`key` is a hashmap_entry structure (or user data structure that starts with
+hashmap_entry) that has at least been initialized with the proper hash code
+(via `hashmap_entry_init`).
++
+If an entry with matching hash code is found, `key` and `keydata` are
+passed to `hashmap_cmp_fn` to decide whether the entry matches the key.
++
+Returns the removed entry, or NULL if not found.
+
+`void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)`::
+`void *hashmap_iter_next(struct hashmap_iter *iter)`::
+`void *hashmap_iter_first(struct hashmap *map, struct hashmap_iter *iter)`::
+
+	Used to iterate over all entries of a hashmap.
++
+`hashmap_iter_init` initializes a `hashmap_iter` structure.
++
+`hashmap_iter_next` returns the next hashmap_entry, or NULL if there are no
+more entries.
++
+`hashmap_iter_first` is a combination of both (i.e. initializes the iterator
+and returns the first entry, if any).
+
+Usage example
+-------------
+
+Here's a simple usage example that maps long keys to double values.
+[source,c]
+------------
+struct hashmap map;
+
+struct long2double {
+	struct hashmap_entry ent; /* must be the first member! */
+	long key;
+	double value;
+};
+
+static int long2double_cmp(const struct long2double *e1, const struct long2double *e2, const void *unused)
+{
+	return !(e1->key == e2->key);
+}
+
+void long2double_init()
+{
+	hashmap_init(&map, (hashmap_cmp_fn) long2double_cmp, 0);
+}
+
+void long2double_free()
+{
+	hashmap_free(&map, free);
+}
+
+static struct long2double *find_entry(long key)
+{
+	struct long2double k;
+	hashmap_entry_init(&k, memhash(&key, sizeof(long)));
+	k.key = key;
+	return hashmap_get(&map, &k, NULL);
+}
+
+double get_value(long key)
+{
+	struct long2double *e = find_entry(key);
+	return e ? e->value : 0;
+}
+
+void set_value(long key, double value)
+{
+	struct long2double *e = find_entry(key);
+	if (!e) {
+		e = malloc(sizeof(struct long2double));
+		hashmap_entry_init(e, memhash(&key, sizeof(long)));
+		e->key = key;
+		hashmap_add(&map, e);
+	}
+	e->value = value;
+}
+------------
+
+Using variable-sized keys
+-------------------------
+
+The `hashmap_entry_get` and `hashmap_entry_remove` functions expect an ordinary
+`hashmap_entry` structure as key to find the correct entry. If the key data is
+variable-sized (e.g. a FLEX_ARRAY string) or quite large, it is undesirable
+to create a full-fledged entry structure on the heap and copy all the key data
+into the structure.
+
+In this case, the `keydata` parameter can be used to pass
+variable-sized key data directly to the comparison function, and the `key`
+parameter can be a stripped-down, fixed size entry structure allocated on the
+stack.
+
+See test-hashmap.c for an example using arbitrary-length strings as keys.
diff --git a/Makefile b/Makefile
index 3588ca1..e6ad011 100644
--- a/Makefile
+++ b/Makefile
@@ -562,6 +562,7 @@ TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-genrandom
+TEST_PROGRAMS_NEED_X += test-hashmap
 TEST_PROGRAMS_NEED_X += test-index-version
 TEST_PROGRAMS_NEED_X += test-line-buffer
 TEST_PROGRAMS_NEED_X += test-match-trees
@@ -681,6 +682,7 @@ LIB_H += gpg-interface.h
 LIB_H += graph.h
 LIB_H += grep.h
 LIB_H += hash.h
+LIB_H += hashmap.h
 LIB_H += help.h
 LIB_H += http.h
 LIB_H += kwset.h
@@ -811,6 +813,7 @@ LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
 LIB_OBJS += grep.o
 LIB_OBJS += hash.o
+LIB_OBJS += hashmap.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
 LIB_OBJS += ident.o
diff --git a/hashmap.c b/hashmap.c
new file mode 100644
index 0000000..3a9f8c1
--- /dev/null
+++ b/hashmap.c
@@ -0,0 +1,212 @@
+/*
+ * Generic implementation of hash-based key value mappings.
+ */
+#include "cache.h"
+#include "hashmap.h"
+
+#define FNV32_BASE ((unsigned int) 0x811c9dc5)
+#define FNV32_PRIME ((unsigned int) 0x01000193)
+
+unsigned int strhash(const char *str)
+{
+	unsigned int c, hash = FNV32_BASE;
+	while ((c = (unsigned char) *str++))
+		hash = (hash * FNV32_PRIME) ^ c;
+	return hash;
+}
+
+unsigned int strihash(const char *str)
+{
+	unsigned int c, hash = FNV32_BASE;
+	while ((c = (unsigned char) *str++)) {
+		if (c >= 'a' && c <= 'z')
+			c -= 'a' - 'A';
+		hash = (hash * FNV32_PRIME) ^ c;
+	}
+	return hash;
+}
+
+unsigned int memhash(const void *buf, size_t len)
+{
+	unsigned int hash = FNV32_BASE;
+	unsigned char *ucbuf = (unsigned char*) buf;
+	while (len--) {
+		unsigned int c = *ucbuf++;
+		hash = (hash * FNV32_PRIME) ^ c;
+	}
+	return hash;
+}
+
+unsigned int memihash(const void *buf, size_t len)
+{
+	unsigned int hash = FNV32_BASE;
+	unsigned char *ucbuf = (unsigned char*) buf;
+	while (len--) {
+		unsigned int c = *ucbuf++;
+		if (c >= 'a' && c <= 'z')
+			c -= 'a' - 'A';
+		hash = (hash * FNV32_PRIME) ^ c;
+	}
+	return hash;
+}
+
+#define HASHMAP_INITIAL_SIZE 64
+/* grow / shrink by 2^2 */
+#define HASHMAP_GROW 2
+/* grow if > 80% full (to 20%) */
+#define HASHMAP_GROW_AT 1.25
+/* shrink if < 16.6% full (to 66.6%) */
+#define HASHMAP_SHRINK_AT 6
+
+static inline int entry_equals(const struct hashmap *map,
+		const struct hashmap_entry *e1, const struct hashmap_entry *e2,
+		const void *keydata)
+{
+	return (e1 == e2) || (e1->hash == e2->hash && !(*map->cmpfn)(e1, e2, keydata));
+}
+
+static inline unsigned int bucket(const struct hashmap *map,
+		const struct hashmap_entry *key)
+{
+	return key->hash & (map->tablesize - 1);
+}
+
+static void rehash(struct hashmap *map, unsigned int newsize)
+{
+	unsigned int i, oldsize = map->tablesize;
+	struct hashmap_entry **oldtable = map->table;
+
+	map->tablesize = newsize;
+	map->table = xcalloc(sizeof(struct hashmap_entry*), map->tablesize);
+	for (i = 0; i < oldsize; i++) {
+		struct hashmap_entry *e = oldtable[i];
+		while (e) {
+			struct hashmap_entry *next = e->next;
+			unsigned int b = bucket(map, e);
+			e->next = map->table[b];
+			map->table[b] = e;
+			e = next;
+		}
+	}
+	free(oldtable);
+}
+
+static inline struct hashmap_entry **find_entry_ptr(const struct hashmap *map,
+		const struct hashmap_entry *key, const void *keydata)
+{
+	struct hashmap_entry **e = &map->table[bucket(map, key)];
+	while (*e && !entry_equals(map, *e, key, keydata))
+		e = &(*e)->next;
+	return e;
+}
+
+static int always_equal(const void *unused1, const void *unused2, const void *unused3)
+{
+	return 0;
+}
+
+void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
+		size_t initial_size)
+{
+	map->size = 0;
+	map->cmpfn = equals_function ? equals_function : always_equal;
+	/* calculate initial table size and allocate the table */
+	map->tablesize = HASHMAP_INITIAL_SIZE;
+	initial_size *= HASHMAP_GROW_AT;
+	while (initial_size > map->tablesize)
+		map->tablesize <<= HASHMAP_GROW;
+	map->table = xcalloc(sizeof(struct hashmap_entry*), map->tablesize);
+}
+
+void hashmap_free(struct hashmap *map, hashmap_free_fn free_function)
+{
+	if (!map || !map->table)
+		return;
+	if (free_function) {
+		struct hashmap_iter iter;
+		struct hashmap_entry *e;
+		hashmap_iter_init(map, &iter);
+		while ((e = hashmap_iter_next(&iter)))
+			(*free_function)(e);
+	}
+	free(map->table);
+	memset(map, 0, sizeof(*map));
+}
+
+void *hashmap_get(const struct hashmap *map, const void *key, const void *keydata)
+{
+	return *find_entry_ptr(map, key, keydata);
+}
+
+void *hashmap_get_next(const struct hashmap *map, const void *entry)
+{
+	struct hashmap_entry *e = ((struct hashmap_entry*) entry)->next;
+	for (; e; e = e->next)
+		if (entry_equals(map, entry, e, NULL))
+			return e;
+	return NULL;
+}
+
+void hashmap_add(struct hashmap *map, void *entry)
+{
+	unsigned int b = bucket(map, entry);
+
+	/* add entry */
+	((struct hashmap_entry*) entry)->next = map->table[b];
+	map->table[b] = entry;
+
+	/* fix size and rehash if appropriate */
+	map->size++;
+	if (map->size * HASHMAP_GROW_AT > map->tablesize)
+		rehash(map, map->tablesize << HASHMAP_GROW);
+}
+
+void *hashmap_remove(struct hashmap *map, const void *key, const void *keydata)
+{
+	struct hashmap_entry *old;
+	struct hashmap_entry **e = find_entry_ptr(map, key, keydata);
+	if (!*e)
+		return NULL;
+
+	/* remove existing entry */
+	old = *e;
+	*e = old->next;
+	old->next = NULL;
+
+	/* fix size and rehash if appropriate */
+	map->size--;
+	if (map->tablesize > HASHMAP_INITIAL_SIZE &&
+	    map->size * HASHMAP_SHRINK_AT < map->tablesize)
+		rehash(map, map->tablesize >> HASHMAP_GROW);
+	return old;
+}
+
+void *hashmap_put(struct hashmap *map, void *entry)
+{
+	struct hashmap_entry *old = hashmap_remove(map, entry, NULL);
+	hashmap_add(map, entry);
+	return old;
+}
+
+void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter)
+{
+	iter->map = map;
+	iter->tablepos = 0;
+	iter->next = NULL;
+}
+
+void *hashmap_iter_next(struct hashmap_iter *iter)
+{
+	struct hashmap_entry *current = iter->next;
+	for (;;) {
+		if (current) {
+			iter->next = current->next;
+			return current;
+		}
+
+		if (iter->tablepos >= iter->map->tablesize)
+			return NULL;
+
+		current = iter->map->table[iter->tablepos++];
+	}
+}
diff --git a/hashmap.h b/hashmap.h
new file mode 100644
index 0000000..eea003a
--- /dev/null
+++ b/hashmap.h
@@ -0,0 +1,72 @@
+#ifndef HASHMAP_H
+#define HASHMAP_H
+
+/*
+ * Generic implementation of hash-based key-value mappings.
+ * See Documentation/technical/api-hashmap.txt.
+ */
+
+/* FNV-1 functions */
+
+extern unsigned int strhash(const char *buf);
+extern unsigned int strihash(const char *buf);
+extern unsigned int memhash(const void *buf, size_t len);
+extern unsigned int memihash(const void *buf, size_t len);
+
+/* data structures */
+
+struct hashmap_entry {
+	struct hashmap_entry *next;
+	unsigned int hash;
+};
+
+typedef int (*hashmap_cmp_fn)(const void *entry, const void *entry_or_key,
+		const void *keydata);
+typedef void (*hashmap_free_fn)(void *entry);
+
+struct hashmap {
+	struct hashmap_entry **table;
+	hashmap_cmp_fn cmpfn;
+	unsigned int size, tablesize;
+};
+
+struct hashmap_iter {
+	struct hashmap *map;
+	struct hashmap_entry *next;
+	unsigned int tablepos;
+};
+
+/* hashmap functions */
+
+extern void hashmap_init(struct hashmap *map, hashmap_cmp_fn equals_function,
+		size_t initial_size);
+extern void hashmap_free(struct hashmap *map, hashmap_free_fn free_function);
+
+/* hashmap_entry functions */
+
+static inline void hashmap_entry_init(void *entry, int hash)
+{
+	struct hashmap_entry *e = entry;
+	e->hash = hash;
+	e->next = NULL;
+}
+extern void *hashmap_get(const struct hashmap *map, const void *key,
+		const void *keydata);
+extern void *hashmap_get_next(const struct hashmap *map, const void *entry);
+extern void hashmap_add(struct hashmap *map, void *entry);
+extern void *hashmap_put(struct hashmap *map, void *entry);
+extern void *hashmap_remove(struct hashmap *map, const void *key,
+		const void *keydata);
+
+/* hashmap_iter functions */
+
+extern void hashmap_iter_init(struct hashmap *map, struct hashmap_iter *iter);
+extern void *hashmap_iter_next(struct hashmap_iter *iter);
+static inline void *hashmap_iter_first(struct hashmap *map,
+		struct hashmap_iter *iter)
+{
+	hashmap_iter_init(map, iter);
+	return hashmap_iter_next(iter);
+}
+
+#endif
diff --git a/t/t0011-hashmap.sh b/t/t0011-hashmap.sh
new file mode 100755
index 0000000..6c699d5
--- /dev/null
+++ b/t/t0011-hashmap.sh
@@ -0,0 +1,236 @@
+#!/bin/sh
+
+test_description='test hashmap and string hash functions'
+. ./test-lib.sh
+
+test_hashmap() {
+	echo "$1" | test-hashmap $3 > actual &&
+	echo "$2" > expect &&
+	test_cmp expect actual
+}
+
+test_expect_success 'hash functions' '
+
+test_hashmap "hash key1" "2215982743 2215982743 116372151 116372151" &&
+test_hashmap "hash key2" "2215982740 2215982740 116372148 116372148" &&
+test_hashmap "hash fooBarFrotz" "1383912807 1383912807 3189766727 3189766727" &&
+test_hashmap "hash foobarfrotz" "2862305959 2862305959 3189766727 3189766727"
+
+'
+
+test_expect_success 'put' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+put foobarfrotz value4
+size" "NULL
+NULL
+NULL
+NULL
+64 4"
+
+'
+
+test_expect_success 'put (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+size" "NULL
+NULL
+NULL
+64 3" ignorecase
+
+'
+
+test_expect_success 'replace' '
+
+test_hashmap "put key1 value1
+put key1 value2
+put fooBarFrotz value3
+put fooBarFrotz value4
+size" "NULL
+value1
+NULL
+value3
+64 2"
+
+'
+
+test_expect_success 'replace (case insensitive)' '
+
+test_hashmap "put key1 value1
+put Key1 value2
+put fooBarFrotz value3
+put foobarfrotz value4
+size" "NULL
+value1
+NULL
+value3
+64 2" ignorecase
+
+'
+
+test_expect_success 'get' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+put foobarfrotz value4
+get key1
+get key2
+get fooBarFrotz
+get notInMap" "NULL
+NULL
+NULL
+NULL
+value1
+value2
+value3
+NULL"
+
+'
+
+test_expect_success 'get (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+get Key1
+get keY2
+get foobarfrotz
+get notInMap" "NULL
+NULL
+NULL
+value1
+value2
+value3
+NULL" ignorecase
+
+'
+
+test_expect_success 'add' '
+
+test_hashmap "add key1 value1
+add key1 value2
+add fooBarFrotz value3
+add fooBarFrotz value4
+get key1
+get fooBarFrotz
+get notInMap" "value2
+value1
+value4
+value3
+NULL"
+
+'
+
+test_expect_success 'add (case insensitive)' '
+
+test_hashmap "add key1 value1
+add Key1 value2
+add fooBarFrotz value3
+add foobarfrotz value4
+get key1
+get Foobarfrotz
+get notInMap" "value2
+value1
+value4
+value3
+NULL" ignorecase
+
+'
+
+test_expect_success 'remove' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+remove key1
+remove key2
+remove notInMap
+size" "NULL
+NULL
+NULL
+value1
+value2
+NULL
+64 1"
+
+'
+
+test_expect_success 'remove (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+remove Key1
+remove keY2
+remove notInMap
+size" "NULL
+NULL
+NULL
+value1
+value2
+NULL
+64 1" ignorecase
+
+'
+
+test_expect_success 'iterate' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+iterate" "NULL
+NULL
+NULL
+key2 value2
+key1 value1
+fooBarFrotz value3"
+
+'
+
+test_expect_success 'iterate (case insensitive)' '
+
+test_hashmap "put key1 value1
+put key2 value2
+put fooBarFrotz value3
+iterate" "NULL
+NULL
+NULL
+fooBarFrotz value3
+key2 value2
+key1 value1" ignorecase
+
+'
+
+test_expect_success 'grow / shrink' '
+
+	rm -f in &&
+	rm -f expect &&
+	for n in $(test_seq 51)
+	do
+		echo put key$n value$n >> in &&
+		echo NULL >> expect
+	done &&
+	echo size >> in &&
+	echo 64 51 >> expect &&
+	echo put key52 value52 >> in &&
+	echo NULL >> expect
+	echo size >> in &&
+	echo 256 52 >> expect &&
+	for n in $(test_seq 10)
+	do
+		echo remove key$n >> in &&
+		echo value$n >> expect
+	done &&
+	echo size >> in &&
+	echo 64 42 >> expect &&
+	cat in | test-hashmap > out &&
+	test_cmp expect out
+
+'
+
+test_done
diff --git a/test-hashmap.c b/test-hashmap.c
new file mode 100644
index 0000000..2d2b834
--- /dev/null
+++ b/test-hashmap.c
@@ -0,0 +1,340 @@
+#include "cache.h"
+#include "hashmap.h"
+#include <stdio.h>
+
+struct test_entry
+{
+	struct hashmap_entry ent;
+	/* key and value as two \0-terminated strings */
+	char key[FLEX_ARRAY];
+};
+
+static const char *get_value(const struct test_entry *e)
+{
+	return e->key + strlen(e->key) + 1;
+}
+
+static int test_entry_cmp(const struct test_entry *e1,
+		const struct test_entry *e2, const char* key)
+{
+	return strcmp(e1->key, key ? key : e2->key);
+}
+
+static int test_entry_cmp_icase(const struct test_entry *e1,
+		const struct test_entry *e2, const char* key)
+{
+	return strcasecmp(e1->key, key ? key : e2->key);
+}
+
+static struct test_entry *alloc_test_entry(int hash, char *key, int klen,
+		char *value, int vlen)
+{
+	struct test_entry *entry = malloc(sizeof(struct test_entry) + klen
+			+ vlen + 2);
+	hashmap_entry_init(entry, hash);
+	memcpy(entry->key, key, klen + 1);
+	memcpy(entry->key + klen + 1, value, vlen + 1);
+	return entry;
+}
+
+#define HASH_METHOD_FNV 0
+#define HASH_METHOD_I 1
+#define HASH_METHOD_IDIV10 2
+#define HASH_METHOD_0 3
+#define HASH_METHOD_X2 4
+#define TEST_SPARSE 8
+#define TEST_ADD 16
+#define TEST_SIZE 100000
+
+static unsigned int hash(unsigned int method, unsigned int i, const char *key)
+{
+	unsigned int hash;
+	switch (method & 3)
+	{
+	case HASH_METHOD_FNV:
+		hash = strhash(key);
+		break;
+	case HASH_METHOD_I:
+		hash = i;
+		break;
+	case HASH_METHOD_IDIV10:
+		hash = i / 10;
+		break;
+	case HASH_METHOD_0:
+		hash = 0;
+		break;
+	}
+
+	if (method & HASH_METHOD_X2)
+		hash = 2 * hash;
+	return hash;
+}
+
+/*
+ * Test insert performance of hashmap.[ch]
+ * Usage: time echo "perfhashmap method rounds" | test-hashmap
+ */
+static void perf_hashmap(unsigned int method, unsigned int rounds)
+{
+	struct hashmap map;
+	char buf[16];
+	struct test_entry **entries;
+	unsigned int *hashes;
+	unsigned int i, j;
+
+	entries = malloc(TEST_SIZE * sizeof(struct test_entry*));
+	hashes = malloc(TEST_SIZE * sizeof(int));
+	for (i = 0; i < TEST_SIZE; i++) {
+		snprintf(buf, sizeof(buf), "%i", i);
+		entries[i] = alloc_test_entry(0, buf, strlen(buf), "", 0);
+		hashes[i] = hash(method, i, entries[i]->key);
+	}
+
+	if (method & TEST_ADD) {
+		/* test adding to the map */
+		for (j = 0; j < rounds; j++) {
+			hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+
+			/* add entries */
+			for (i = 0; i < TEST_SIZE; i++) {
+				hashmap_entry_init(entries[i], hashes[i]);
+				hashmap_add(&map, entries[i]);
+			}
+
+			hashmap_free(&map, NULL);
+		}
+	} else {
+		/* test map lookups */
+		hashmap_init(&map, (hashmap_cmp_fn) test_entry_cmp, 0);
+
+		/* fill the map (sparsely if specified) */
+		j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
+		for (i = 0; i < j; i++) {
+			hashmap_entry_init(entries[i], hashes[i]);
+			hashmap_add(&map, entries[i]);
+		}
+
+		for (j = 0; j < rounds; j++) {
+			for (i = 0; i < TEST_SIZE; i++) {
+				struct hashmap_entry key;
+				hashmap_entry_init(&key, hashes[i]);
+				hashmap_get(&map, &key, entries[i]->key);
+			}
+		}
+
+		hashmap_free(&map, NULL);
+	}
+}
+
+struct hash_entry
+{
+	struct hash_entry *next;
+	char key[FLEX_ARRAY];
+};
+
+/*
+ * Test insert performance of hash.[ch]
+ * Usage: time echo "perfhash method rounds" | test-hashmap
+ */
+static void perf_hash(unsigned int method, unsigned int rounds)
+{
+	struct hash_table map;
+	char buf[16];
+	struct hash_entry **entries, **res, *entry;
+	unsigned int *hashes;
+	unsigned int i, j;
+
+	entries = malloc(TEST_SIZE * sizeof(struct hash_entry*));
+	hashes = malloc(TEST_SIZE * sizeof(int));
+	for (i = 0; i < TEST_SIZE; i++) {
+		snprintf(buf, sizeof(buf), "%i", i);
+		entries[i] = malloc(sizeof(struct hash_entry) + strlen(buf) + 1);
+		strcpy(entries[i]->key, buf);
+		hashes[i] = hash(method, i, entries[i]->key);
+	}
+
+	if (method & TEST_ADD) {
+		/* test adding to the map */
+		for (j = 0; j < rounds; j++) {
+			init_hash(&map);
+
+			/* add entries */
+			for (i = 0; i < TEST_SIZE; i++) {
+				res = (struct hash_entry**) insert_hash(
+						hashes[i], entries[i], &map);
+				if (res) {
+					entries[i]->next = *res;
+					*res = entries[i];
+				} else {
+					entries[i]->next = NULL;
+				}
+			}
+
+			free_hash(&map);
+		}
+	} else {
+		/* test map lookups */
+		init_hash(&map);
+
+		/* fill the map (sparsely if specified) */
+		j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
+		for (i = 0; i < j; i++) {
+			res = (struct hash_entry**) insert_hash(hashes[i],
+					entries[i], &map);
+			if (res) {
+				entries[i]->next = *res;
+				*res = entries[i];
+			} else {
+				entries[i]->next = NULL;
+			}
+		}
+
+		for (j = 0; j < rounds; j++) {
+			for (i = 0; i < TEST_SIZE; i++) {
+				entry = lookup_hash(hashes[i], &map);
+				while (entry) {
+					if (!strcmp(entries[i]->key, entry->key))
+						break;
+					entry = entry->next;
+				}
+			}
+		}
+
+		free_hash(&map);
+
+	}
+}
+
+#define DELIM " \t\r\n"
+
+/*
+ * Read stdin line by line and print result of commands to stdout:
+ *
+ * hash key -> strhash(key) memhash(key) strihash(key) memihash(key)
+ * put key value -> NULL / old value
+ * get key -> NULL / value
+ * remove key -> NULL / old value
+ * iterate -> key1 value1\nkey2 value2\n...
+ * size -> tablesize numentries
+ *
+ * perfhashmap method rounds -> test hashmap.[ch] performance
+ * perfhash method rounds -> test hash.[ch] performance
+ */
+int main(int argc, char *argv[])
+{
+	char line[1024];
+	struct hashmap map;
+	int icase;
+
+	/* init hash map */
+	icase = argc > 1 && !strcmp("ignorecase", argv[1]);
+	hashmap_init(&map, (hashmap_cmp_fn) (icase ? test_entry_cmp_icase
+			: test_entry_cmp), 0);
+
+	/* process commands from stdin */
+	while (fgets(line, sizeof(line), stdin)) {
+		char *cmd, *p1 = NULL, *p2 = NULL;
+		int l1 = 0, l2 = 0, hash = 0;
+		struct test_entry *entry;
+
+		/* break line into command and up to two parameters */
+		cmd = strtok(line, DELIM);
+		/* ignore empty lines */
+		if (!cmd || *cmd == '#')
+			continue;
+
+		p1 = strtok(NULL, DELIM);
+		if (p1) {
+			l1 = strlen(p1);
+			hash = icase ? strihash(p1) : strhash(p1);
+			p2 = strtok(NULL, DELIM);
+			if (p2)
+				l2 = strlen(p2);
+		}
+
+		if (!strcmp("hash", cmd) && l1) {
+
+			/* print results of different hash functions */
+			printf("%u %u %u %u\n", strhash(p1), memhash(p1, l1),
+					strihash(p1), memihash(p1, l1));
+
+		} else if (!strcmp("add", cmd) && l1 && l2) {
+
+			/* create entry with key = p1, value = p2 */
+			entry = alloc_test_entry(hash, p1, l1, p2, l2);
+
+			/* add to hashmap */
+			hashmap_add(&map, entry);
+
+		} else if (!strcmp("put", cmd) && l1 && l2) {
+
+			/* create entry with key = p1, value = p2 */
+			entry = alloc_test_entry(hash, p1, l1, p2, l2);
+
+			/* add / replace entry */
+			entry = hashmap_put(&map, entry);
+
+			/* print and free replaced entry, if any */
+			puts(entry ? get_value(entry) : "NULL");
+			free(entry);
+
+		} else if (!strcmp("get", cmd) && l1) {
+
+			/* setup static key */
+			struct hashmap_entry key;
+			hashmap_entry_init(&key, hash);
+
+			/* lookup entry in hashmap */
+			entry = hashmap_get(&map, &key, p1);
+
+			/* print result */
+			if (!entry)
+				puts("NULL");
+			while (entry) {
+				puts(get_value(entry));
+				entry = hashmap_get_next(&map, entry);
+			}
+
+		} else if (!strcmp("remove", cmd) && l1) {
+
+			/* setup static key */
+			struct hashmap_entry key;
+			hashmap_entry_init(&key, hash);
+
+			/* remove entry from hashmap */
+			entry = hashmap_remove(&map, &key, p1);
+
+			/* print result and free entry*/
+			puts(entry ? get_value(entry) : "NULL");
+			free(entry);
+
+		} else if (!strcmp("iterate", cmd)) {
+
+			struct hashmap_iter iter;
+			hashmap_iter_init(&map, &iter);
+			while ((entry = hashmap_iter_next(&iter)))
+				printf("%s %s\n", entry->key, get_value(entry));
+
+		} else if (!strcmp("size", cmd)) {
+
+			/* print table sizes */
+			printf("%u %u\n", map.tablesize, map.size);
+
+		} else if (!strcmp("perfhashmap", cmd) && l1 && l2) {
+
+			perf_hashmap(atoi(p1), atoi(p2));
+
+		} else if (!strcmp("perfhash", cmd) && l1 && l2) {
+
+			perf_hash(atoi(p1), atoi(p2));
+
+		} else {
+
+			printf("Unknown command %s\n", cmd);
+
+		}
+	}
+
+	hashmap_free(&map, free);
+	return 0;
+}
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 02/11] buitin/describe.c: use new hash map implementation
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
  2013-10-01  9:34 ` [PATCH v3 01/11] add a hashtable implementation that supports O(1) removal Karsten Blees
@ 2013-10-01  9:35 ` Karsten Blees
  2013-10-01  9:36 ` [PATCH v3 03/11] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:35 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 builtin/describe.c | 53 ++++++++++++++++++++++++-----------------------------
 1 file changed, 24 insertions(+), 29 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 7d73722..ae8d32c 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -6,7 +6,7 @@
 #include "exec_cmd.h"
 #include "parse-options.h"
 #include "diff.h"
-#include "hash.h"
+#include "hashmap.h"
 #include "argv-array.h"
 
 #define SEEN		(1u<<0)
@@ -25,7 +25,7 @@ static int longformat;
 static int first_parent;
 static int abbrev = -1; /* unspecified */
 static int max_candidates = 10;
-static struct hash_table names;
+static struct hashmap names;
 static int have_util;
 static const char *pattern;
 static int always;
@@ -38,7 +38,7 @@ static const char *diff_index_args[] = {
 
 
 struct commit_name {
-	struct commit_name *next;
+	struct hashmap_entry entry;
 	unsigned char peeled[20];
 	struct tag *tag;
 	unsigned prio:2; /* annotated tag = 2, tag = 1, head = 0 */
@@ -50,6 +50,12 @@ static const char *prio_names[] = {
 	"head", "lightweight", "annotated",
 };
 
+static int commit_name_cmp(const struct commit_name *cn1,
+		const struct commit_name *cn2, const void *peeled)
+{
+	return hashcmp(cn1->peeled, peeled ? peeled : cn2->peeled);
+}
+
 static inline unsigned int hash_sha1(const unsigned char *sha1)
 {
 	unsigned int hash;
@@ -59,21 +65,9 @@ static inline unsigned int hash_sha1(const unsigned char *sha1)
 
 static inline struct commit_name *find_commit_name(const unsigned char *peeled)
 {
-	struct commit_name *n = lookup_hash(hash_sha1(peeled), &names);
-	while (n && !!hashcmp(peeled, n->peeled))
-		n = n->next;
-	return n;
-}
-
-static int set_util(void *chain, void *data)
-{
-	struct commit_name *n;
-	for (n = chain; n; n = n->next) {
-		struct commit *c = lookup_commit_reference_gently(n->peeled, 1);
-		if (c)
-			c->util = n;
-	}
-	return 0;
+	struct commit_name key;
+	hashmap_entry_init(&key, hash_sha1(peeled));
+	return hashmap_get(&names, &key, peeled);
 }
 
 static int replace_name(struct commit_name *e,
@@ -118,16 +112,10 @@ static void add_to_known_names(const char *path,
 	struct tag *tag = NULL;
 	if (replace_name(e, prio, sha1, &tag)) {
 		if (!e) {
-			void **pos;
 			e = xmalloc(sizeof(struct commit_name));
 			hashcpy(e->peeled, peeled);
-			pos = insert_hash(hash_sha1(peeled), e, &names);
-			if (pos) {
-				e->next = *pos;
-				*pos = e;
-			} else {
-				e->next = NULL;
-			}
+			hashmap_entry_init(e, hash_sha1(peeled));
+			hashmap_add(&names, e);
 			e->path = NULL;
 		}
 		e->tag = tag;
@@ -292,7 +280,14 @@ static void describe(const char *arg, int last_one)
 		fprintf(stderr, _("searching to describe %s\n"), arg);
 
 	if (!have_util) {
-		for_each_hash(&names, set_util, NULL);
+		struct hashmap_iter iter;
+		struct commit *c;
+		struct commit_name *n = hashmap_iter_first(&names, &iter);
+		for (; n; n = hashmap_iter_next(&iter)) {
+			c = lookup_commit_reference_gently(n->peeled, 1);
+			if (c)
+				c->util = n;
+		}
 		have_util = 1;
 	}
 
@@ -463,9 +458,9 @@ int cmd_describe(int argc, const char **argv, const char *prefix)
 		return cmd_name_rev(args.argc, args.argv, prefix);
 	}
 
-	init_hash(&names);
+	hashmap_init(&names, (hashmap_cmp_fn) commit_name_cmp, 0);
 	for_each_rawref(get_name, NULL);
-	if (!names.nr && !always)
+	if (!names.size && !always)
 		die(_("No names found, cannot describe anything."));
 
 	if (argc == 0) {
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 03/11] diffcore-rename.c: move code around to prepare for the next patch
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
  2013-10-01  9:34 ` [PATCH v3 01/11] add a hashtable implementation that supports O(1) removal Karsten Blees
  2013-10-01  9:35 ` [PATCH v3 02/11] buitin/describe.c: use new hash map implementation Karsten Blees
@ 2013-10-01  9:36 ` Karsten Blees
  2013-10-01  9:36 ` [PATCH v3 04/11] diffcore-rename.c: simplify finding exact renames Karsten Blees
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:36 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

No actual code changes, just move hash_filespec up and outdent part of
find_identical_files.

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 diffcore-rename.c | 98 +++++++++++++++++++++++++++----------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 6c7a72f..008a60c 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -248,6 +248,18 @@ struct file_similarity {
 	struct file_similarity *next;
 };
 
+static unsigned int hash_filespec(struct diff_filespec *filespec)
+{
+	unsigned int hash;
+	if (!filespec->sha1_valid) {
+		if (diff_populate_filespec(filespec, 0))
+			return 0;
+		hash_sha1_file(filespec->data, filespec->size, "blob", filespec->sha1);
+	}
+	memcpy(&hash, filespec->sha1, sizeof(hash));
+	return hash;
+}
+
 static int find_identical_files(struct file_similarity *src,
 				struct file_similarity *dst,
 				struct diff_options *options)
@@ -258,46 +270,46 @@ static int find_identical_files(struct file_similarity *src,
 	 * Walk over all the destinations ...
 	 */
 	do {
-		struct diff_filespec *target = dst->filespec;
-		struct file_similarity *p, *best;
-		int i = 100, best_score = -1;
-
-		/*
-		 * .. to find the best source match
-		 */
-		best = NULL;
-		for (p = src; p; p = p->next) {
-			int score;
-			struct diff_filespec *source = p->filespec;
-
-			/* False hash collision? */
-			if (hashcmp(source->sha1, target->sha1))
-				continue;
-			/* Non-regular files? If so, the modes must match! */
-			if (!S_ISREG(source->mode) || !S_ISREG(target->mode)) {
-				if (source->mode != target->mode)
-					continue;
-			}
-			/* Give higher scores to sources that haven't been used already */
-			score = !source->rename_used;
-			if (source->rename_used && options->detect_rename != DIFF_DETECT_COPY)
-				continue;
-			score += basename_same(source, target);
-			if (score > best_score) {
-				best = p;
-				best_score = score;
-				if (score == 2)
-					break;
-			}
+	struct diff_filespec *target = dst->filespec;
+	struct file_similarity *p, *best;
+	int i = 100, best_score = -1;
 
-			/* Too many identical alternatives? Pick one */
-			if (!--i)
-				break;
+	/*
+	 * .. to find the best source match
+	 */
+	best = NULL;
+	for (p = src; p; p = p->next) {
+		int score;
+		struct diff_filespec *source = p->filespec;
+
+		/* False hash collision? */
+		if (hashcmp(source->sha1, target->sha1))
+			continue;
+		/* Non-regular files? If so, the modes must match! */
+		if (!S_ISREG(source->mode) || !S_ISREG(target->mode)) {
+			if (source->mode != target->mode)
+				continue;
 		}
-		if (best) {
-			record_rename_pair(dst->index, best->index, MAX_SCORE);
-			renames++;
+		/* Give higher scores to sources that haven't been used already */
+		score = !source->rename_used;
+		if (source->rename_used && options->detect_rename != DIFF_DETECT_COPY)
+			continue;
+		score += basename_same(source, target);
+		if (score > best_score) {
+			best = p;
+			best_score = score;
+			if (score == 2)
+				break;
 		}
+
+		/* Too many identical alternatives? Pick one */
+		if (!--i)
+			break;
+	}
+	if (best) {
+		record_rename_pair(dst->index, best->index, MAX_SCORE);
+		renames++;
+	}
 	} while ((dst = dst->next) != NULL);
 	return renames;
 }
@@ -343,18 +355,6 @@ static int find_same_files(void *ptr, void *data)
 	return ret;
 }
 
-static unsigned int hash_filespec(struct diff_filespec *filespec)
-{
-	unsigned int hash;
-	if (!filespec->sha1_valid) {
-		if (diff_populate_filespec(filespec, 0))
-			return 0;
-		hash_sha1_file(filespec->data, filespec->size, "blob", filespec->sha1);
-	}
-	memcpy(&hash, filespec->sha1, sizeof(hash));
-	return hash;
-}
-
 static void insert_file_table(struct hash_table *table, int src_dst, int index, struct diff_filespec *filespec)
 {
 	void **pos;
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 04/11] diffcore-rename.c: simplify finding exact renames
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (2 preceding siblings ...)
  2013-10-01  9:36 ` [PATCH v3 03/11] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
@ 2013-10-01  9:36 ` Karsten Blees
  2013-10-01  9:37 ` [PATCH v3 05/11] diffcore-rename.c: use new hash map implementation Karsten Blees
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:36 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

The find_exact_renames function currently only uses the hash table for
grouping, i.e.:

1. add sources
2. add destinations
3. iterate all buckets, per bucket:
4. split sources from destinations
5. iterate destinations, per destination:
6. iterate sources to find best match

This can be simplified by utilizing the lookup functionality of the hash
table, i.e.:

1. add sources
2. iterate destinations, per destination:
3. lookup sources matching the current destination
4. iterate sources to find best match

This saves several iterations and file_similarity allocations for the
destinations.

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 diffcore-rename.c | 75 +++++++++++++++----------------------------------------
 1 file changed, 20 insertions(+), 55 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 008a60c..82b7975 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -243,7 +243,7 @@ static int score_compare(const void *a_, const void *b_)
 }
 
 struct file_similarity {
-	int src_dst, index;
+	int index;
 	struct diff_filespec *filespec;
 	struct file_similarity *next;
 };
@@ -260,25 +260,21 @@ static unsigned int hash_filespec(struct diff_filespec *filespec)
 	return hash;
 }
 
-static int find_identical_files(struct file_similarity *src,
-				struct file_similarity *dst,
+static int find_identical_files(struct hash_table *srcs,
+				int dst_index,
 				struct diff_options *options)
 {
 	int renames = 0;
 
-	/*
-	 * Walk over all the destinations ...
-	 */
-	do {
-	struct diff_filespec *target = dst->filespec;
+	struct diff_filespec *target = rename_dst[dst_index].two;
 	struct file_similarity *p, *best;
 	int i = 100, best_score = -1;
 
 	/*
-	 * .. to find the best source match
+	 * Find the best source match for specified destination.
 	 */
 	best = NULL;
-	for (p = src; p; p = p->next) {
+	for (p = lookup_hash(hash_filespec(target), srcs); p; p = p->next) {
 		int score;
 		struct diff_filespec *source = p->filespec;
 
@@ -307,61 +303,28 @@ static int find_identical_files(struct file_similarity *src,
 			break;
 	}
 	if (best) {
-		record_rename_pair(dst->index, best->index, MAX_SCORE);
+		record_rename_pair(dst_index, best->index, MAX_SCORE);
 		renames++;
 	}
-	} while ((dst = dst->next) != NULL);
 	return renames;
 }
 
-static void free_similarity_list(struct file_similarity *p)
+static int free_similarity_list(void *p, void *unused)
 {
 	while (p) {
 		struct file_similarity *entry = p;
-		p = p->next;
+		p = entry->next;
 		free(entry);
 	}
+	return 0;
 }
 
-static int find_same_files(void *ptr, void *data)
-{
-	int ret;
-	struct file_similarity *p = ptr;
-	struct file_similarity *src = NULL, *dst = NULL;
-	struct diff_options *options = data;
-
-	/* Split the hash list up into sources and destinations */
-	do {
-		struct file_similarity *entry = p;
-		p = p->next;
-		if (entry->src_dst < 0) {
-			entry->next = src;
-			src = entry;
-		} else {
-			entry->next = dst;
-			dst = entry;
-		}
-	} while (p);
-
-	/*
-	 * If we have both sources *and* destinations, see if
-	 * we can match them up
-	 */
-	ret = (src && dst) ? find_identical_files(src, dst, options) : 0;
-
-	/* Free the hashes and return the number of renames found */
-	free_similarity_list(src);
-	free_similarity_list(dst);
-	return ret;
-}
-
-static void insert_file_table(struct hash_table *table, int src_dst, int index, struct diff_filespec *filespec)
+static void insert_file_table(struct hash_table *table, int index, struct diff_filespec *filespec)
 {
 	void **pos;
 	unsigned int hash;
 	struct file_similarity *entry = xmalloc(sizeof(*entry));
 
-	entry->src_dst = src_dst;
 	entry->index = index;
 	entry->filespec = filespec;
 	entry->next = NULL;
@@ -385,24 +348,26 @@ static void insert_file_table(struct hash_table *table, int src_dst, int index,
  */
 static int find_exact_renames(struct diff_options *options)
 {
-	int i;
+	int i, renames;
 	struct hash_table file_table;
 
+	/* Add all sources to the hash table */
 	init_hash(&file_table);
-	preallocate_hash(&file_table, rename_src_nr + rename_dst_nr);
+	preallocate_hash(&file_table, rename_src_nr);
 	for (i = 0; i < rename_src_nr; i++)
-		insert_file_table(&file_table, -1, i, rename_src[i].p->one);
+		insert_file_table(&file_table, i, rename_src[i].p->one);
 
+	/* Walk the destinations and find best source match */
 	for (i = 0; i < rename_dst_nr; i++)
-		insert_file_table(&file_table, 1, i, rename_dst[i].two);
+		renames += find_identical_files(&file_table, i, options);
 
-	/* Find the renames */
-	i = for_each_hash(&file_table, find_same_files, options);
+	/* Free source file_similarity chains */
+	for_each_hash(&file_table, free_similarity_list, options);
 
 	/* .. and free the hash data structure */
 	free_hash(&file_table);
 
-	return i;
+	return renames;
 }
 
 #define NUM_CANDIDATE_PER_DST 4
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 05/11] diffcore-rename.c: use new hash map implementation
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (3 preceding siblings ...)
  2013-10-01  9:36 ` [PATCH v3 04/11] diffcore-rename.c: simplify finding exact renames Karsten Blees
@ 2013-10-01  9:37 ` Karsten Blees
  2013-10-01  9:37 ` [PATCH v3 06/11] name-hash.c: use new hash map implementation for directories Karsten Blees
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:37 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 diffcore-rename.c | 48 +++++++++++++-----------------------------------
 1 file changed, 13 insertions(+), 35 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index 82b7975..c6bd776 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -4,7 +4,7 @@
 #include "cache.h"
 #include "diff.h"
 #include "diffcore.h"
-#include "hash.h"
+#include "hashmap.h"
 #include "progress.h"
 
 /* Table of rename/copy destinations */
@@ -243,9 +243,9 @@ static int score_compare(const void *a_, const void *b_)
 }
 
 struct file_similarity {
+	struct hashmap_entry entry;
 	int index;
 	struct diff_filespec *filespec;
-	struct file_similarity *next;
 };
 
 static unsigned int hash_filespec(struct diff_filespec *filespec)
@@ -260,21 +260,22 @@ static unsigned int hash_filespec(struct diff_filespec *filespec)
 	return hash;
 }
 
-static int find_identical_files(struct hash_table *srcs,
+static int find_identical_files(struct hashmap *srcs,
 				int dst_index,
 				struct diff_options *options)
 {
 	int renames = 0;
 
 	struct diff_filespec *target = rename_dst[dst_index].two;
-	struct file_similarity *p, *best;
+	struct file_similarity *p, *best, dst;
 	int i = 100, best_score = -1;
 
 	/*
 	 * Find the best source match for specified destination.
 	 */
 	best = NULL;
-	for (p = lookup_hash(hash_filespec(target), srcs); p; p = p->next) {
+	hashmap_entry_init(&dst, hash_filespec(target));
+	for (p = hashmap_get(srcs, &dst, NULL); p; p = hashmap_get_next(srcs, p)) {
 		int score;
 		struct diff_filespec *source = p->filespec;
 
@@ -309,34 +310,15 @@ static int find_identical_files(struct hash_table *srcs,
 	return renames;
 }
 
-static int free_similarity_list(void *p, void *unused)
+static void insert_file_table(struct hashmap *table, int index, struct diff_filespec *filespec)
 {
-	while (p) {
-		struct file_similarity *entry = p;
-		p = entry->next;
-		free(entry);
-	}
-	return 0;
-}
-
-static void insert_file_table(struct hash_table *table, int index, struct diff_filespec *filespec)
-{
-	void **pos;
-	unsigned int hash;
 	struct file_similarity *entry = xmalloc(sizeof(*entry));
 
 	entry->index = index;
 	entry->filespec = filespec;
-	entry->next = NULL;
-
-	hash = hash_filespec(filespec);
-	pos = insert_hash(hash, entry, table);
 
-	/* We already had an entry there? */
-	if (pos) {
-		entry->next = *pos;
-		*pos = entry;
-	}
+	hashmap_entry_init(entry, hash_filespec(filespec));
+	hashmap_add(table, entry);
 }
 
 /*
@@ -349,11 +331,10 @@ static void insert_file_table(struct hash_table *table, int index, struct diff_f
 static int find_exact_renames(struct diff_options *options)
 {
 	int i, renames;
-	struct hash_table file_table;
+	struct hashmap file_table;
 
 	/* Add all sources to the hash table */
-	init_hash(&file_table);
-	preallocate_hash(&file_table, rename_src_nr);
+	hashmap_init(&file_table, NULL, rename_src_nr);
 	for (i = 0; i < rename_src_nr; i++)
 		insert_file_table(&file_table, i, rename_src[i].p->one);
 
@@ -361,11 +342,8 @@ static int find_exact_renames(struct diff_options *options)
 	for (i = 0; i < rename_dst_nr; i++)
 		renames += find_identical_files(&file_table, i, options);
 
-	/* Free source file_similarity chains */
-	for_each_hash(&file_table, free_similarity_list, options);
-
-	/* .. and free the hash data structure */
-	free_hash(&file_table);
+	/* Free the hash data structure and entries */
+	hashmap_free(&file_table, free);
 
 	return renames;
 }
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 06/11] name-hash.c: use new hash map implementation for directories
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (4 preceding siblings ...)
  2013-10-01  9:37 ` [PATCH v3 05/11] diffcore-rename.c: use new hash map implementation Karsten Blees
@ 2013-10-01  9:37 ` Karsten Blees
  2013-10-01  9:38 ` [PATCH v3 07/11] name-hash.c: remove unreferenced directory entries Karsten Blees
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:37 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 cache.h     |  3 ++-
 name-hash.c | 77 +++++++++++++++----------------------------------------------
 2 files changed, 20 insertions(+), 60 deletions(-)

diff --git a/cache.h b/cache.h
index 85b544f..bf6e2f0 100644
--- a/cache.h
+++ b/cache.h
@@ -4,6 +4,7 @@
 #include "git-compat-util.h"
 #include "strbuf.h"
 #include "hash.h"
+#include "hashmap.h"
 #include "advice.h"
 #include "gettext.h"
 #include "convert.h"
@@ -276,7 +277,7 @@ struct index_state {
 	unsigned name_hash_initialized : 1,
 		 initialized : 1;
 	struct hash_table name_hash;
-	struct hash_table dir_hash;
+	struct hashmap dir_hash;
 };
 
 extern struct index_state the_index;
diff --git a/name-hash.c b/name-hash.c
index 617c86c..aa57666 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -8,49 +8,28 @@
 #define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
 
-/*
- * This removes bit 5 if bit 6 is set.
- *
- * That will make US-ASCII characters hash to their upper-case
- * equivalent. We could easily do this one whole word at a time,
- * but that's for future worries.
- */
-static inline unsigned char icase_hash(unsigned char c)
-{
-	return c & ~((c & 0x40) >> 1);
-}
-
-static unsigned int hash_name(const char *name, int namelen)
-{
-	unsigned int hash = 0x123;
-
-	while (namelen--) {
-		unsigned char c = *name++;
-		c = icase_hash(c);
-		hash = hash*101 + c;
-	}
-	return hash;
-}
-
 struct dir_entry {
-	struct dir_entry *next;
+	struct hashmap_entry ent;
 	struct dir_entry *parent;
 	struct cache_entry *ce;
 	int nr;
 	unsigned int namelen;
 };
 
+static int dir_entry_cmp(const struct dir_entry *e1,
+		const struct dir_entry *e2, const char *name)
+{
+	return e1->namelen != e2->namelen || strncasecmp(e1->ce->name,
+			name ? name : e2->ce->name, e1->namelen);
+}
+
 static struct dir_entry *find_dir_entry(struct index_state *istate,
 		const char *name, unsigned int namelen)
 {
-	unsigned int hash = hash_name(name, namelen);
-	struct dir_entry *dir;
-
-	for (dir = lookup_hash(hash, &istate->dir_hash); dir; dir = dir->next)
-		if (dir->namelen == namelen &&
-		    !strncasecmp(dir->ce->name, name, namelen))
-			return dir;
-	return NULL;
+	struct dir_entry key;
+	hashmap_entry_init(&key, memihash(name, namelen));
+	key.namelen = namelen;
+	return hashmap_get(&istate->dir_hash, &key, name);
 }
 
 static struct dir_entry *hash_dir_entry(struct index_state *istate,
@@ -83,18 +62,11 @@ static struct dir_entry *hash_dir_entry(struct index_state *istate,
 	dir = find_dir_entry(istate, ce->name, namelen);
 	if (!dir) {
 		/* not found, create it and add to hash table */
-		void **pdir;
-		unsigned int hash = hash_name(ce->name, namelen);
-
 		dir = xcalloc(1, sizeof(struct dir_entry));
+		hashmap_entry_init(dir, memihash(ce->name, namelen));
 		dir->namelen = namelen;
 		dir->ce = ce;
-
-		pdir = insert_hash(hash, dir, &istate->dir_hash);
-		if (pdir) {
-			dir->next = *pdir;
-			*pdir = dir;
-		}
+		hashmap_add(&istate->dir_hash, dir);
 
 		/* recursively add missing parent directories */
 		dir->parent = hash_dir_entry(istate, ce, namelen - 1);
@@ -133,7 +105,7 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 		return;
 	ce->ce_flags |= CE_HASHED;
 	ce->next = NULL;
-	hash = hash_name(ce->name, ce_namelen(ce));
+	hash = memihash(ce->name, ce_namelen(ce));
 	pos = insert_hash(hash, ce, &istate->name_hash);
 	if (pos) {
 		ce->next = *pos;
@@ -152,6 +124,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 		return;
 	if (istate->cache_nr)
 		preallocate_hash(&istate->name_hash, istate->cache_nr);
+	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
 	istate->name_hash_initialized = 1;
@@ -224,7 +197,7 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
 
 struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
 {
-	unsigned int hash = hash_name(name, namelen);
+	unsigned int hash = memihash(name, namelen);
 	struct cache_entry *ce;
 
 	lazy_init_name_hash(istate);
@@ -263,26 +236,12 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
 	return NULL;
 }
 
-static int free_dir_entry(void *entry, void *unused)
-{
-	struct dir_entry *dir = entry;
-	while (dir) {
-		struct dir_entry *next = dir->next;
-		free(dir);
-		dir = next;
-	}
-	return 0;
-}
-
 void free_name_hash(struct index_state *istate)
 {
 	if (!istate->name_hash_initialized)
 		return;
 	istate->name_hash_initialized = 0;
-	if (ignore_case)
-		/* free directory entries */
-		for_each_hash(&istate->dir_hash, free_dir_entry, NULL);
 
 	free_hash(&istate->name_hash);
-	free_hash(&istate->dir_hash);
+	hashmap_free(&istate->dir_hash, free);
 }
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 07/11] name-hash.c: remove unreferenced directory entries
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (5 preceding siblings ...)
  2013-10-01  9:37 ` [PATCH v3 06/11] name-hash.c: use new hash map implementation for directories Karsten Blees
@ 2013-10-01  9:38 ` Karsten Blees
  2013-10-01  9:39 ` [PATCH v3 08/11] name-hash.c: use new hash map implementation for cache entries Karsten Blees
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:38 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

The new hashmap implementation supports remove, so remove and free
directory entries that are no longer referenced by active cache entries.

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 name-hash.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/name-hash.c b/name-hash.c
index aa57666..ce54fb3 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -85,15 +85,16 @@ static void add_dir_entry(struct index_state *istate, struct cache_entry *ce)
 static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)
 {
 	/*
-	 * Release reference to the directory entry (and parents if 0).
-	 *
-	 * Note: we do not remove / free the entry because there's no
-	 * hash.[ch]::remove_hash and dir->next may point to other entries
-	 * that are still valid, so we must not free the memory.
+	 * Release reference to the directory entry. If 0, remove and continue
+	 * with parent directory.
 	 */
 	struct dir_entry *dir = hash_dir_entry(istate, ce, ce_namelen(ce));
-	while (dir && dir->nr && !(--dir->nr))
-		dir = dir->parent;
+	while (dir && !(--dir->nr)) {
+		struct dir_entry *parent = dir->parent;
+		hashmap_remove(&istate->dir_hash, dir, NULL);
+		free(dir);
+		dir = parent;
+	}
 }
 
 static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 08/11] name-hash.c: use new hash map implementation for cache entries
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (6 preceding siblings ...)
  2013-10-01  9:38 ` [PATCH v3 07/11] name-hash.c: remove unreferenced directory entries Karsten Blees
@ 2013-10-01  9:39 ` Karsten Blees
  2013-10-01  9:39 ` [PATCH v3 09/11] name-hash.c: remove cache entries instead of marking them CE_UNHASHED Karsten Blees
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:39 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

Note: the "ce->next = NULL;" in unpack-trees.c::do_add_entry can safely be
removed, as ce->next (now ce->ent.next) is always properly initialized in
name-hash.c::hash_index_entry.

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 cache.h        |  8 +++++---
 name-hash.c    | 24 ++++++++----------------
 unpack-trees.c |  1 -
 3 files changed, 13 insertions(+), 20 deletions(-)

diff --git a/cache.h b/cache.h
index bf6e2f0..46309c6 100644
--- a/cache.h
+++ b/cache.h
@@ -131,12 +131,12 @@ struct stat_data {
 };
 
 struct cache_entry {
+	struct hashmap_entry ent;
 	struct stat_data ce_stat_data;
 	unsigned int ce_mode;
 	unsigned int ce_flags;
 	unsigned int ce_namelen;
 	unsigned char sha1[20];
-	struct cache_entry *next;
 	char name[FLEX_ARRAY]; /* more */
 };
 
@@ -201,7 +201,9 @@ static inline void copy_cache_entry(struct cache_entry *dst,
 	unsigned int state = dst->ce_flags & CE_STATE_MASK;
 
 	/* Don't copy hash chain and name */
-	memcpy(dst, src, offsetof(struct cache_entry, next));
+	memcpy(&dst->ce_stat_data, &src->ce_stat_data,
+			offsetof(struct cache_entry, name) -
+			offsetof(struct cache_entry, ce_stat_data));
 
 	/* Restore the hash state */
 	dst->ce_flags = (dst->ce_flags & ~CE_STATE_MASK) | state;
@@ -276,7 +278,7 @@ struct index_state {
 	struct cache_time timestamp;
 	unsigned name_hash_initialized : 1,
 		 initialized : 1;
-	struct hash_table name_hash;
+	struct hashmap name_hash;
 	struct hashmap dir_hash;
 };
 
diff --git a/name-hash.c b/name-hash.c
index ce54fb3..4879ce4 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -99,19 +99,11 @@ static void remove_dir_entry(struct index_state *istate, struct cache_entry *ce)
 
 static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 {
-	void **pos;
-	unsigned int hash;
-
 	if (ce->ce_flags & CE_HASHED)
 		return;
 	ce->ce_flags |= CE_HASHED;
-	ce->next = NULL;
-	hash = memihash(ce->name, ce_namelen(ce));
-	pos = insert_hash(hash, ce, &istate->name_hash);
-	if (pos) {
-		ce->next = *pos;
-		*pos = ce;
-	}
+	hashmap_entry_init(ce, memihash(ce->name, ce_namelen(ce)));
+	hashmap_add(&istate->name_hash, ce);
 
 	if (ignore_case && !(ce->ce_flags & CE_UNHASHED))
 		add_dir_entry(istate, ce);
@@ -123,8 +115,7 @@ static void lazy_init_name_hash(struct index_state *istate)
 
 	if (istate->name_hash_initialized)
 		return;
-	if (istate->cache_nr)
-		preallocate_hash(&istate->name_hash, istate->cache_nr);
+	hashmap_init(&istate->name_hash, NULL, istate->cache_nr);
 	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
@@ -198,18 +189,19 @@ static int same_name(const struct cache_entry *ce, const char *name, int namelen
 
 struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int icase)
 {
-	unsigned int hash = memihash(name, namelen);
 	struct cache_entry *ce;
+	struct hashmap_entry key;
 
 	lazy_init_name_hash(istate);
-	ce = lookup_hash(hash, &istate->name_hash);
 
+	hashmap_entry_init(&key, memihash(name, namelen));
+	ce = hashmap_get(&istate->name_hash, &key, NULL);
 	while (ce) {
 		if (!(ce->ce_flags & CE_UNHASHED)) {
 			if (same_name(ce, name, namelen, icase))
 				return ce;
 		}
-		ce = ce->next;
+		ce = hashmap_get_next(&istate->name_hash, ce);
 	}
 
 	/*
@@ -243,6 +235,6 @@ void free_name_hash(struct index_state *istate)
 		return;
 	istate->name_hash_initialized = 0;
 
-	free_hash(&istate->name_hash);
+	hashmap_free(&istate->name_hash, NULL);
 	hashmap_free(&istate->dir_hash, free);
 }
diff --git a/unpack-trees.c b/unpack-trees.c
index bf01717..9c108a2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -110,7 +110,6 @@ static void do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
 	if (set & CE_REMOVE)
 		set |= CE_WT_REMOVE;
 
-	ce->next = NULL;
 	ce->ce_flags = (ce->ce_flags & ~clear) | set;
 	add_index_entry(&o->result, ce,
 			ADD_CACHE_OK_TO_ADD | ADD_CACHE_OK_TO_REPLACE);
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 09/11] name-hash.c: remove cache entries instead of marking them CE_UNHASHED
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (7 preceding siblings ...)
  2013-10-01  9:39 ` [PATCH v3 08/11] name-hash.c: use new hash map implementation for cache entries Karsten Blees
@ 2013-10-01  9:39 ` Karsten Blees
  2013-10-01  9:40 ` [PATCH v3 10/11] read-cache.c: fix memory leaks caused by removed cache entries Karsten Blees
  2013-10-01  9:40 ` [PATCH v3 11/11] remove old hash.[ch] implementation Karsten Blees
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:39 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

The new hashmap implementation supports remove, so really remove unused
cache entries from the name hashmap instead of just marking them.

The CE_UNHASHED flag and CE_STATE_MASK are no longer needed.

Keep the CE_HASHED flag to prevent adding entries twice.

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 cache.h        |  6 ++----
 name-hash.c    | 46 ++++++++++++++++++++++------------------------
 read-cache.c   |  2 +-
 unpack-trees.c |  2 +-
 4 files changed, 26 insertions(+), 30 deletions(-)

diff --git a/cache.h b/cache.h
index 46309c6..8833944 100644
--- a/cache.h
+++ b/cache.h
@@ -160,7 +160,6 @@ struct cache_entry {
 #define CE_ADDED             (1 << 19)
 
 #define CE_HASHED            (1 << 20)
-#define CE_UNHASHED          (1 << 21)
 #define CE_WT_REMOVE         (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED        (1 << 23)
 
@@ -194,11 +193,10 @@ struct cache_entry {
  * Copy the sha1 and stat state of a cache entry from one to
  * another. But we never change the name, or the hash state!
  */
-#define CE_STATE_MASK (CE_HASHED | CE_UNHASHED)
 static inline void copy_cache_entry(struct cache_entry *dst,
 				    const struct cache_entry *src)
 {
-	unsigned int state = dst->ce_flags & CE_STATE_MASK;
+	unsigned int state = dst->ce_flags & CE_HASHED;
 
 	/* Don't copy hash chain and name */
 	memcpy(&dst->ce_stat_data, &src->ce_stat_data,
@@ -206,7 +204,7 @@ static inline void copy_cache_entry(struct cache_entry *dst,
 			offsetof(struct cache_entry, ce_stat_data));
 
 	/* Restore the hash state */
-	dst->ce_flags = (dst->ce_flags & ~CE_STATE_MASK) | state;
+	dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state;
 }
 
 static inline unsigned create_ce_flags(unsigned stage)
diff --git a/name-hash.c b/name-hash.c
index 4879ce4..4aac33f 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -105,17 +105,29 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce)
 	hashmap_entry_init(ce, memihash(ce->name, ce_namelen(ce)));
 	hashmap_add(&istate->name_hash, ce);
 
-	if (ignore_case && !(ce->ce_flags & CE_UNHASHED))
+	if (ignore_case)
 		add_dir_entry(istate, ce);
 }
 
+static int cache_entry_cmp(const struct cache_entry *ce1,
+		const struct cache_entry *ce2, const void *remove)
+{
+	/*
+	 * For remove_name_hash, find the exact entry (pointer equality); for
+	 * index_name_exists, find all entries with matching hash code and
+	 * decide whether the entry matches in same_name.
+	 */
+	return remove ? !(ce1 == ce2) : 0;
+}
+
 static void lazy_init_name_hash(struct index_state *istate)
 {
 	int nr;
 
 	if (istate->name_hash_initialized)
 		return;
-	hashmap_init(&istate->name_hash, NULL, istate->cache_nr);
+	hashmap_init(&istate->name_hash, (hashmap_cmp_fn) cache_entry_cmp,
+			istate->cache_nr);
 	hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
 	for (nr = 0; nr < istate->cache_nr; nr++)
 		hash_index_entry(istate, istate->cache[nr]);
@@ -124,31 +136,19 @@ static void lazy_init_name_hash(struct index_state *istate)
 
 void add_name_hash(struct index_state *istate, struct cache_entry *ce)
 {
-	/* if already hashed, add reference to directory entries */
-	if (ignore_case && (ce->ce_flags & CE_STATE_MASK) == CE_STATE_MASK)
-		add_dir_entry(istate, ce);
-
-	ce->ce_flags &= ~CE_UNHASHED;
 	if (istate->name_hash_initialized)
 		hash_index_entry(istate, ce);
 }
 
-/*
- * We don't actually *remove* it, we can just mark it invalid so that
- * we won't find it in lookups.
- *
- * Not only would we have to search the lists (simple enough), but
- * we'd also have to rehash other hash buckets in case this makes the
- * hash bucket empty (common). So it's much better to just mark
- * it.
- */
 void remove_name_hash(struct index_state *istate, struct cache_entry *ce)
 {
-	/* if already hashed, release reference to directory entries */
-	if (ignore_case && (ce->ce_flags & CE_STATE_MASK) == CE_HASHED)
-		remove_dir_entry(istate, ce);
+	if (!istate->name_hash_initialized || !(ce->ce_flags & CE_HASHED))
+		return;
+	ce->ce_flags &= ~CE_HASHED;
+	hashmap_remove(&istate->name_hash, ce, ce);
 
-	ce->ce_flags |= CE_UNHASHED;
+	if (ignore_case)
+		remove_dir_entry(istate, ce);
 }
 
 static int slow_same_name(const char *name1, int len1, const char *name2, int len2)
@@ -197,10 +197,8 @@ struct cache_entry *index_name_exists(struct index_state *istate, const char *na
 	hashmap_entry_init(&key, memihash(name, namelen));
 	ce = hashmap_get(&istate->name_hash, &key, NULL);
 	while (ce) {
-		if (!(ce->ce_flags & CE_UNHASHED)) {
-			if (same_name(ce, name, namelen, icase))
-				return ce;
-		}
+		if (same_name(ce, name, namelen, icase))
+			return ce;
 		ce = hashmap_get_next(&istate->name_hash, ce);
 	}
 
diff --git a/read-cache.c b/read-cache.c
index c3d5e35..0b5c44b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -58,7 +58,7 @@ void rename_index_entry_at(struct index_state *istate, int nr, const char *new_n
 
 	new = xmalloc(cache_entry_size(namelen));
 	copy_cache_entry(new, old);
-	new->ce_flags &= ~CE_STATE_MASK;
+	new->ce_flags &= ~CE_HASHED;
 	new->ce_namelen = namelen;
 	memcpy(new->name, new_name, namelen + 1);
 
diff --git a/unpack-trees.c b/unpack-trees.c
index 9c108a2..92c7bc4 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -105,7 +105,7 @@ void setup_unpack_trees_porcelain(struct unpack_trees_options *opts,
 static void do_add_entry(struct unpack_trees_options *o, struct cache_entry *ce,
 			 unsigned int set, unsigned int clear)
 {
-	clear |= CE_HASHED | CE_UNHASHED;
+	clear |= CE_HASHED;
 
 	if (set & CE_REMOVE)
 		set |= CE_WT_REMOVE;
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 10/11] read-cache.c: fix memory leaks caused by removed cache entries
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (8 preceding siblings ...)
  2013-10-01  9:39 ` [PATCH v3 09/11] name-hash.c: remove cache entries instead of marking them CE_UNHASHED Karsten Blees
@ 2013-10-01  9:40 ` Karsten Blees
  2013-10-19 19:28   ` Thomas Rast
  2013-10-01  9:40 ` [PATCH v3 11/11] remove old hash.[ch] implementation Karsten Blees
  10 siblings, 1 reply; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:40 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

When cache_entry structs are removed from index_state.cache, they are not
properly freed. Freeing those entries wasn't possible before because we
couldn't remove them from index_state.name_hash.

Now that we _do_ remove the entries from name_hash, we can also free them.
Add free(cache_entry) to all call sites of name-hash.c::remove_name_hash in
read-cache.c, as name-hash.c isn't concerned with cache_entry allocation.

cmd_rm and unmerge_index_entry_at use cache_entry.name after removing the
entry. Copy the name so that we don't access memory that has been freed.

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 builtin/rm.c   | 2 +-
 read-cache.c   | 6 +++++-
 resolve-undo.c | 7 +++++--
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/builtin/rm.c b/builtin/rm.c
index 0df0b4d..678da8a 100644
--- a/builtin/rm.c
+++ b/builtin/rm.c
@@ -324,7 +324,7 @@ int cmd_rm(int argc, const char **argv, const char *prefix)
 		if (!match_pathspec(pathspec, ce->name, ce_namelen(ce), 0, seen))
 			continue;
 		ALLOC_GROW(list.entry, list.nr + 1, list.alloc);
-		list.entry[list.nr].name = ce->name;
+		list.entry[list.nr].name = xstrdup(ce->name);
 		list.entry[list.nr++].is_submodule = S_ISGITLINK(ce->ce_mode);
 	}
 
diff --git a/read-cache.c b/read-cache.c
index 0b5c44b..4c570a1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -47,6 +47,7 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache
 	struct cache_entry *old = istate->cache[nr];
 
 	remove_name_hash(istate, old);
+	free(old);
 	set_index_entry(istate, nr, ce);
 	istate->cache_changed = 1;
 }
@@ -478,6 +479,7 @@ int remove_index_entry_at(struct index_state *istate, int pos)
 
 	record_resolve_undo(istate, ce);
 	remove_name_hash(istate, ce);
+	free(ce);
 	istate->cache_changed = 1;
 	istate->cache_nr--;
 	if (pos >= istate->cache_nr)
@@ -499,8 +501,10 @@ void remove_marked_cache_entries(struct index_state *istate)
 	unsigned int i, j;
 
 	for (i = j = 0; i < istate->cache_nr; i++) {
-		if (ce_array[i]->ce_flags & CE_REMOVE)
+		if (ce_array[i]->ce_flags & CE_REMOVE) {
 			remove_name_hash(istate, ce_array[i]);
+			free(ce_array[i]);
+		}
 		else
 			ce_array[j++] = ce_array[i];
 	}
diff --git a/resolve-undo.c b/resolve-undo.c
index 77101f5..64793db 100644
--- a/resolve-undo.c
+++ b/resolve-undo.c
@@ -119,6 +119,7 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
 	struct string_list_item *item;
 	struct resolve_undo_info *ru;
 	int i, err = 0, matched;
+	char *name;
 
 	if (!istate->resolve_undo)
 		return pos;
@@ -138,20 +139,22 @@ int unmerge_index_entry_at(struct index_state *istate, int pos)
 	if (!ru)
 		return pos;
 	matched = ce->ce_flags & CE_MATCHED;
+	name = xstrdup(ce->name);
 	remove_index_entry_at(istate, pos);
 	for (i = 0; i < 3; i++) {
 		struct cache_entry *nce;
 		if (!ru->mode[i])
 			continue;
 		nce = make_cache_entry(ru->mode[i], ru->sha1[i],
-				       ce->name, i + 1, 0);
+				       name, i + 1, 0);
 		if (matched)
 			nce->ce_flags |= CE_MATCHED;
 		if (add_index_entry(istate, nce, ADD_CACHE_OK_TO_ADD)) {
 			err = 1;
-			error("cannot unmerge '%s'", ce->name);
+			error("cannot unmerge '%s'", name);
 		}
 	}
+	free(name);
 	if (err)
 		return pos;
 	free(ru);
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v3 11/11] remove old hash.[ch] implementation
  2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
                   ` (9 preceding siblings ...)
  2013-10-01  9:40 ` [PATCH v3 10/11] read-cache.c: fix memory leaks caused by removed cache entries Karsten Blees
@ 2013-10-01  9:40 ` Karsten Blees
  10 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-01  9:40 UTC (permalink / raw)
  To: Git List; +Cc: Karsten Blees

Signed-off-by: Karsten Blees <blees@dcon.de>
---
 Documentation/technical/api-hash.txt |  52 -----------------
 Makefile                             |   2 -
 cache.h                              |   1 -
 hash.c                               | 110 -----------------------------------
 hash.h                               |  50 ----------------
 test-hashmap.c                       |  84 --------------------------
 6 files changed, 299 deletions(-)
 delete mode 100644 Documentation/technical/api-hash.txt
 delete mode 100644 hash.c
 delete mode 100644 hash.h

diff --git a/Documentation/technical/api-hash.txt b/Documentation/technical/api-hash.txt
deleted file mode 100644
index e5061e0..0000000
--- a/Documentation/technical/api-hash.txt
+++ /dev/null
@@ -1,52 +0,0 @@
-hash API
-========
-
-The hash API is a collection of simple hash table functions. Users are expected
-to implement their own hashing.
-
-Data Structures
----------------
-
-`struct hash_table`::
-
-	The hash table structure. The `array` member points to the hash table
-	entries. The `size` member counts the total number of valid and invalid
-	entries in the table. The `nr` member keeps track of the number of
-	valid entries.
-
-`struct hash_table_entry`::
-
-	An opaque structure representing an entry in the hash table. The `hash`
-	member is the entry's hash key and the `ptr` member is the entry's
-	value.
-
-Functions
----------
-
-`init_hash`::
-
-	Initialize the hash table.
-
-`free_hash`::
-
-	Release memory associated with the hash table.
-
-`insert_hash`::
-
-	Insert a pointer into the hash table. If an entry with that hash
-	already exists, a pointer to the existing entry's value is returned.
-	Otherwise NULL is returned.  This allows callers to implement
-	chaining, etc.
-
-`lookup_hash`::
-
-	Lookup an entry in the hash table. If an entry with that hash exists
-	the entry's value is returned. Otherwise NULL is returned.
-
-`for_each_hash`::
-
-	Call a function for each entry in the hash table. The function is
-	expected to take the entry's value as its only argument and return an
-	int. If the function returns a negative int the loop is aborted
-	immediately.  Otherwise, the return value is accumulated and the sum
-	returned upon completion of the loop.
diff --git a/Makefile b/Makefile
index e6ad011..7d565c8 100644
--- a/Makefile
+++ b/Makefile
@@ -681,7 +681,6 @@ LIB_H += git-compat-util.h
 LIB_H += gpg-interface.h
 LIB_H += graph.h
 LIB_H += grep.h
-LIB_H += hash.h
 LIB_H += hashmap.h
 LIB_H += help.h
 LIB_H += http.h
@@ -812,7 +811,6 @@ LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
 LIB_OBJS += grep.o
-LIB_OBJS += hash.o
 LIB_OBJS += hashmap.o
 LIB_OBJS += help.o
 LIB_OBJS += hex.o
diff --git a/cache.h b/cache.h
index 8833944..ca5ffb6 100644
--- a/cache.h
+++ b/cache.h
@@ -3,7 +3,6 @@
 
 #include "git-compat-util.h"
 #include "strbuf.h"
-#include "hash.h"
 #include "hashmap.h"
 #include "advice.h"
 #include "gettext.h"
diff --git a/hash.c b/hash.c
deleted file mode 100644
index 749ecfe..0000000
--- a/hash.c
+++ /dev/null
@@ -1,110 +0,0 @@
-/*
- * Some generic hashing helpers.
- */
-#include "cache.h"
-#include "hash.h"
-
-/*
- * Look up a hash entry in the hash table. Return the pointer to
- * the existing entry, or the empty slot if none existed. The caller
- * can then look at the (*ptr) to see whether it existed or not.
- */
-static struct hash_table_entry *lookup_hash_entry(unsigned int hash, const struct hash_table *table)
-{
-	unsigned int size = table->size, nr = hash % size;
-	struct hash_table_entry *array = table->array;
-
-	while (array[nr].ptr) {
-		if (array[nr].hash == hash)
-			break;
-		nr++;
-		if (nr >= size)
-			nr = 0;
-	}
-	return array + nr;
-}
-
-
-/*
- * Insert a new hash entry pointer into the table.
- *
- * If that hash entry already existed, return the pointer to
- * the existing entry (and the caller can create a list of the
- * pointers or do anything else). If it didn't exist, return
- * NULL (and the caller knows the pointer has been inserted).
- */
-static void **insert_hash_entry(unsigned int hash, void *ptr, struct hash_table *table)
-{
-	struct hash_table_entry *entry = lookup_hash_entry(hash, table);
-
-	if (!entry->ptr) {
-		entry->ptr = ptr;
-		entry->hash = hash;
-		table->nr++;
-		return NULL;
-	}
-	return &entry->ptr;
-}
-
-static void grow_hash_table(struct hash_table *table)
-{
-	unsigned int i;
-	unsigned int old_size = table->size, new_size;
-	struct hash_table_entry *old_array = table->array, *new_array;
-
-	new_size = alloc_nr(old_size);
-	new_array = xcalloc(sizeof(struct hash_table_entry), new_size);
-	table->size = new_size;
-	table->array = new_array;
-	table->nr = 0;
-	for (i = 0; i < old_size; i++) {
-		unsigned int hash = old_array[i].hash;
-		void *ptr = old_array[i].ptr;
-		if (ptr)
-			insert_hash_entry(hash, ptr, table);
-	}
-	free(old_array);
-}
-
-void *lookup_hash(unsigned int hash, const struct hash_table *table)
-{
-	if (!table->array)
-		return NULL;
-	return lookup_hash_entry(hash, table)->ptr;
-}
-
-void **insert_hash(unsigned int hash, void *ptr, struct hash_table *table)
-{
-	unsigned int nr = table->nr;
-	if (nr >= table->size/2)
-		grow_hash_table(table);
-	return insert_hash_entry(hash, ptr, table);
-}
-
-int for_each_hash(const struct hash_table *table, int (*fn)(void *, void *), void *data)
-{
-	int sum = 0;
-	unsigned int i;
-	unsigned int size = table->size;
-	struct hash_table_entry *array = table->array;
-
-	for (i = 0; i < size; i++) {
-		void *ptr = array->ptr;
-		array++;
-		if (ptr) {
-			int val = fn(ptr, data);
-			if (val < 0)
-				return val;
-			sum += val;
-		}
-	}
-	return sum;
-}
-
-void free_hash(struct hash_table *table)
-{
-	free(table->array);
-	table->array = NULL;
-	table->size = 0;
-	table->nr = 0;
-}
diff --git a/hash.h b/hash.h
deleted file mode 100644
index 1d43ac0..0000000
--- a/hash.h
+++ /dev/null
@@ -1,50 +0,0 @@
-#ifndef HASH_H
-#define HASH_H
-
-/*
- * These are some simple generic hash table helper functions.
- * Not necessarily suitable for all users, but good for things
- * where you want to just keep track of a list of things, and
- * have a good hash to use on them.
- *
- * It keeps the hash table at roughly 50-75% free, so the memory
- * cost of the hash table itself is roughly
- *
- *	3 * 2*sizeof(void *) * nr_of_objects
- *
- * bytes.
- *
- * FIXME: on 64-bit architectures, we waste memory. It would be
- * good to have just 32-bit pointers, requiring a special allocator
- * for hashed entries or something.
- */
-struct hash_table_entry {
-	unsigned int hash;
-	void *ptr;
-};
-
-struct hash_table {
-	unsigned int size, nr;
-	struct hash_table_entry *array;
-};
-
-extern void *lookup_hash(unsigned int hash, const struct hash_table *table);
-extern void **insert_hash(unsigned int hash, void *ptr, struct hash_table *table);
-extern int for_each_hash(const struct hash_table *table, int (*fn)(void *, void *), void *data);
-extern void free_hash(struct hash_table *table);
-
-static inline void init_hash(struct hash_table *table)
-{
-	table->size = 0;
-	table->nr = 0;
-	table->array = NULL;
-}
-
-static inline void preallocate_hash(struct hash_table *table, unsigned int elts)
-{
-	assert(table->size == 0 && table->nr == 0 && table->array == NULL);
-	table->size = elts * 2;
-	table->array = xcalloc(sizeof(struct hash_table_entry), table->size);
-}
-
-#endif
diff --git a/test-hashmap.c b/test-hashmap.c
index 2d2b834..333b7b3 100644
--- a/test-hashmap.c
+++ b/test-hashmap.c
@@ -126,85 +126,6 @@ static void perf_hashmap(unsigned int method, unsigned int rounds)
 	}
 }
 
-struct hash_entry
-{
-	struct hash_entry *next;
-	char key[FLEX_ARRAY];
-};
-
-/*
- * Test insert performance of hash.[ch]
- * Usage: time echo "perfhash method rounds" | test-hashmap
- */
-static void perf_hash(unsigned int method, unsigned int rounds)
-{
-	struct hash_table map;
-	char buf[16];
-	struct hash_entry **entries, **res, *entry;
-	unsigned int *hashes;
-	unsigned int i, j;
-
-	entries = malloc(TEST_SIZE * sizeof(struct hash_entry*));
-	hashes = malloc(TEST_SIZE * sizeof(int));
-	for (i = 0; i < TEST_SIZE; i++) {
-		snprintf(buf, sizeof(buf), "%i", i);
-		entries[i] = malloc(sizeof(struct hash_entry) + strlen(buf) + 1);
-		strcpy(entries[i]->key, buf);
-		hashes[i] = hash(method, i, entries[i]->key);
-	}
-
-	if (method & TEST_ADD) {
-		/* test adding to the map */
-		for (j = 0; j < rounds; j++) {
-			init_hash(&map);
-
-			/* add entries */
-			for (i = 0; i < TEST_SIZE; i++) {
-				res = (struct hash_entry**) insert_hash(
-						hashes[i], entries[i], &map);
-				if (res) {
-					entries[i]->next = *res;
-					*res = entries[i];
-				} else {
-					entries[i]->next = NULL;
-				}
-			}
-
-			free_hash(&map);
-		}
-	} else {
-		/* test map lookups */
-		init_hash(&map);
-
-		/* fill the map (sparsely if specified) */
-		j = (method & TEST_SPARSE) ? TEST_SIZE / 10 : TEST_SIZE;
-		for (i = 0; i < j; i++) {
-			res = (struct hash_entry**) insert_hash(hashes[i],
-					entries[i], &map);
-			if (res) {
-				entries[i]->next = *res;
-				*res = entries[i];
-			} else {
-				entries[i]->next = NULL;
-			}
-		}
-
-		for (j = 0; j < rounds; j++) {
-			for (i = 0; i < TEST_SIZE; i++) {
-				entry = lookup_hash(hashes[i], &map);
-				while (entry) {
-					if (!strcmp(entries[i]->key, entry->key))
-						break;
-					entry = entry->next;
-				}
-			}
-		}
-
-		free_hash(&map);
-
-	}
-}
-
 #define DELIM " \t\r\n"
 
 /*
@@ -218,7 +139,6 @@ static void perf_hash(unsigned int method, unsigned int rounds)
  * size -> tablesize numentries
  *
  * perfhashmap method rounds -> test hashmap.[ch] performance
- * perfhash method rounds -> test hash.[ch] performance
  */
 int main(int argc, char *argv[])
 {
@@ -324,10 +244,6 @@ int main(int argc, char *argv[])
 
 			perf_hashmap(atoi(p1), atoi(p2));
 
-		} else if (!strcmp("perfhash", cmd) && l1 && l2) {
-
-			perf_hash(atoi(p1), atoi(p2));
-
 		} else {
 
 			printf("Unknown command %s\n", cmd);
-- 
1.8.4.11.g4f52745.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v3 10/11] read-cache.c: fix memory leaks caused by removed cache entries
  2013-10-01  9:40 ` [PATCH v3 10/11] read-cache.c: fix memory leaks caused by removed cache entries Karsten Blees
@ 2013-10-19 19:28   ` Thomas Rast
  2013-10-22 13:13     ` [PATCH] fixup! read-cache.c: fix memory leaks caused by removed cache, entries Karsten Blees
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Rast @ 2013-10-19 19:28 UTC (permalink / raw)
  To: Karsten Blees; +Cc: Git List

Karsten Blees <karsten.blees@gmail.com> writes:

> When cache_entry structs are removed from index_state.cache, they are not
> properly freed. Freeing those entries wasn't possible before because we
> couldn't remove them from index_state.name_hash.
>
> Now that we _do_ remove the entries from name_hash, we can also free them.
> Add free(cache_entry) to all call sites of name-hash.c::remove_name_hash in
> read-cache.c, as name-hash.c isn't concerned with cache_entry allocation.
>
> cmd_rm and unmerge_index_entry_at use cache_entry.name after removing the
> entry. Copy the name so that we don't access memory that has been freed.

Is this the version that is currently in pu?  There's a valgrind test
failure in current pu that bisects to 36850edb, which would seem to be
from this email but doesn't have the right author date.  Worse, I cannot
apply this on top of 36850edb^ because I don't have the 'index' blobs
for this patch.  Confusing.

In any case 36850edb currently breaks several valgrind tests.  You can
valgrind only t6022.16 like so (that one test is sufficient to track it
down and it's much faster that way):

  cd t  
  ./t6022-merge-rename.sh --valgrind-only=16

The valgrind error in t6022.16 looks like this:

  ==4959== Invalid read of size 1
  ==4959==    at 0x5682A38: vfprintf (vfprintf.c:1629)
  ==4959==    by 0x56AC564: vsnprintf (vsnprintf.c:119)
  ==4959==    by 0x542005: vreportf (usage.c:12)
  ==4959==    by 0x54216C: error_builtin (usage.c:42)
  ==4959==    by 0x54261B: error (usage.c:147)
  ==4959==    by 0x4FC681: read_index_unmerged (read-cache.c:1900)
  ==4959==    by 0x475CF1: reset_index (reset.c:68)
  ==4959==    by 0x476A72: cmd_reset (reset.c:346)
  ==4959==    by 0x405999: run_builtin (git.c:314)
  ==4959==    by 0x405B2C: handle_internal_command (git.c:477)
  ==4959==    by 0x405C46: run_argv (git.c:523)
  ==4959==    by 0x405DE2: main (git.c:606)
  ==4959==  Address 0x5bedb54 is 84 bytes inside a block of size 104 free'd
  ==4959==    at 0x4C2ACDA: free (vg_replace_malloc.c:468)
  ==4959==    by 0x4F9360: remove_index_entry_at (read-cache.c:482)
  ==4959==    by 0x4FA469: add_index_entry_with_check (read-cache.c:964)
  ==4959==    by 0x4FA5A4: add_index_entry (read-cache.c:993)
  ==4959==    by 0x4FC663: read_index_unmerged (read-cache.c:1899)
  ==4959==    by 0x475CF1: reset_index (reset.c:68)
  ==4959==    by 0x476A72: cmd_reset (reset.c:346)
  ==4959==    by 0x405999: run_builtin (git.c:314)
  ==4959==    by 0x405B2C: handle_internal_command (git.c:477)
  ==4959==    by 0x405C46: run_argv (git.c:523)
  ==4959==    by 0x405DE2: main (git.c:606)

If you need any more information/help, just ask :-)

-- 
Thomas Rast
tr@thomasrast.ch

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] fixup! read-cache.c: fix memory leaks caused by removed cache, entries
  2013-10-19 19:28   ` Thomas Rast
@ 2013-10-22 13:13     ` Karsten Blees
  0 siblings, 0 replies; 14+ messages in thread
From: Karsten Blees @ 2013-10-22 13:13 UTC (permalink / raw)
  To: Thomas Rast; +Cc: Git List, Junio C Hamano

...another instance where we access a free()d ce.

Signed-off-by: Karsten Blees <blees@dcon.de>
---

Am 19.10.2013 21:28, schrieb Thomas Rast:
> 
> Is this the version that is currently in pu?

No, its from a rebased-to-next version [1]. I didn't resend the patches because nothing actually changed (except diff contexts). Sorry for the inconvenience.

> In any case 36850edb currently breaks several valgrind tests.  You can
> valgrind only t6022.16 like so (that one test is sufficient to track it
> down and it's much faster that way):
> 
>   cd t  
>   ./t6022-merge-rename.sh --valgrind-only=16
> 

Thanks, here's the fix.

Karsten

[1] https://github.com/kblees/git/commits/kb/hashmap-v3-next


 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index fe90c3b..3f735f3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1898,7 +1898,7 @@ int read_index_unmerged(struct index_state *istate)
 		new_ce->ce_mode = ce->ce_mode;
 		if (add_index_entry(istate, new_ce, 0))
 			return error("%s: cannot drop to stage #0",
-				     ce->name);
+				     new_ce->name);
 		i = index_name_pos(istate, new_ce->name, len);
 	}
 	return unmerged;
-- 
1.8.4.msysgit.0.12.g88f5ed0.dirty

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2013-10-22 13:13 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-10-01  9:33 [PATCH v3 00/11] New hash table implementation Karsten Blees
2013-10-01  9:34 ` [PATCH v3 01/11] add a hashtable implementation that supports O(1) removal Karsten Blees
2013-10-01  9:35 ` [PATCH v3 02/11] buitin/describe.c: use new hash map implementation Karsten Blees
2013-10-01  9:36 ` [PATCH v3 03/11] diffcore-rename.c: move code around to prepare for the next patch Karsten Blees
2013-10-01  9:36 ` [PATCH v3 04/11] diffcore-rename.c: simplify finding exact renames Karsten Blees
2013-10-01  9:37 ` [PATCH v3 05/11] diffcore-rename.c: use new hash map implementation Karsten Blees
2013-10-01  9:37 ` [PATCH v3 06/11] name-hash.c: use new hash map implementation for directories Karsten Blees
2013-10-01  9:38 ` [PATCH v3 07/11] name-hash.c: remove unreferenced directory entries Karsten Blees
2013-10-01  9:39 ` [PATCH v3 08/11] name-hash.c: use new hash map implementation for cache entries Karsten Blees
2013-10-01  9:39 ` [PATCH v3 09/11] name-hash.c: remove cache entries instead of marking them CE_UNHASHED Karsten Blees
2013-10-01  9:40 ` [PATCH v3 10/11] read-cache.c: fix memory leaks caused by removed cache entries Karsten Blees
2013-10-19 19:28   ` Thomas Rast
2013-10-22 13:13     ` [PATCH] fixup! read-cache.c: fix memory leaks caused by removed cache, entries Karsten Blees
2013-10-01  9:40 ` [PATCH v3 11/11] remove old hash.[ch] implementation Karsten Blees

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).