* [PATCH v3 0/3] merge-recursive: replace string_list with hashmap
@ 2017-09-07 16:25 Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
Code was using two string_lists, one for the directories and
one for the files. The code never checks the lists independently
so we should be able to only use one list. The string_list also
is a O(log n) for lookup and insertion. Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.
Also cleaned up a memory leak and method where the return was not
being used.
Changes since last version:
1. Removed the function pointers and just check the ignore_case in the
compare and hash methods.
2. Added a comment about the hashmap and why it is getting initialized
and freed but not a local.
3. Use hashmap_get_from_hash and remove the dummy entry
Kevin Willford (3):
merge-recursive: fix memory leak
merge-recursive: remove return value from get_files_dirs
merge-recursive: change current file dir string_lists to hashmap
merge-recursive.c | 76 ++++++++++++++++++++++++++++++++++++++++---------------
merge-recursive.h | 3 +--
2 files changed, 57 insertions(+), 22 deletions(-)
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH v3 1/3] merge-recursive: fix memory leak
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
@ 2017-09-07 16:25 ` Kevin Willford
2017-09-07 16:25 ` [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
2017-09-07 16:25 ` [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
2 siblings, 0 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
In merge_trees if process_renames or process_entry returns less
than zero, the method will just return and not free re_merge,
re_head, or entries.
This change cleans up the allocated variables before returning
to the caller.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
merge-recursive.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb82..033d7cd406 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1956,7 +1956,7 @@ int merge_trees(struct merge_options *o,
re_merge = get_renames(o, merge, common, head, merge, entries);
clean = process_renames(o, re_head, re_merge);
if (clean < 0)
- return clean;
+ goto cleanup;
for (i = entries->nr-1; 0 <= i; i--) {
const char *path = entries->items[i].string;
struct stage_data *e = entries->items[i].util;
@@ -1964,8 +1964,10 @@ int merge_trees(struct merge_options *o,
int ret = process_entry(o, path, e);
if (!ret)
clean = 0;
- else if (ret < 0)
- return ret;
+ else if (ret < 0) {
+ clean = ret;
+ goto cleanup;
+ }
}
}
for (i = 0; i < entries->nr; i++) {
@@ -1975,6 +1977,7 @@ int merge_trees(struct merge_options *o,
entries->items[i].string);
}
+cleanup:
string_list_clear(re_merge, 0);
string_list_clear(re_head, 0);
string_list_clear(entries, 1);
@@ -1982,6 +1985,9 @@ int merge_trees(struct merge_options *o,
free(re_merge);
free(re_head);
free(entries);
+
+ if (clean < 0)
+ return clean;
}
else
clean = 1;
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
@ 2017-09-07 16:25 ` Kevin Willford
2017-09-07 16:25 ` [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
2 siblings, 0 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
The return value of the get_files_dirs call is never being used.
Looking at the history of the file and it was originally only
being used for debug output statements. Also when
read_tree_recursive return value is non zero it is changed to
zero. This leads me to believe that it doesn't matter if
read_tree_recursive gets an error.
Since the debug output has been removed and the caller isn't
checking the return value there is no reason to keep calculating
and returning a value.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
merge-recursive.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index 033d7cd406..d47f40ea81 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -328,15 +328,11 @@ static int save_files_dirs(const unsigned char *sha1,
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
}
-static int get_files_dirs(struct merge_options *o, struct tree *tree)
+static void get_files_dirs(struct merge_options *o, struct tree *tree)
{
- int n;
struct pathspec match_all;
memset(&match_all, 0, sizeof(match_all));
- if (read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o))
- return 0;
- n = o->current_file_set.nr + o->current_directory_set.nr;
- return n;
+ read_tree_recursive(tree, "", 0, 0, &match_all, save_files_dirs, o);
}
/*
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
2017-09-07 16:25 ` [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
@ 2017-09-07 16:25 ` Kevin Willford
2 siblings, 0 replies; 4+ messages in thread
From: Kevin Willford @ 2017-09-07 16:25 UTC (permalink / raw)
To: git; +Cc: gitster, peff, Kevin Willford
The code was using two string_lists, one for the directories and
one for the files. The code never checks the lists independently
so we should be able to only use one list. The string_list also
is a O(log n) for lookup and insertion. Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
---
merge-recursive.c | 56 ++++++++++++++++++++++++++++++++++++++++++++-----------
merge-recursive.h | 3 +--
2 files changed, 46 insertions(+), 13 deletions(-)
diff --git a/merge-recursive.c b/merge-recursive.c
index d47f40ea81..35af3761ba 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -24,6 +24,31 @@
#include "dir.h"
#include "submodule.h"
+struct path_hashmap_entry {
+ struct hashmap_entry e;
+ char path[FLEX_ARRAY];
+};
+
+static int path_hashmap_cmp(const void *cmp_data,
+ const void *entry,
+ const void *entry_or_key,
+ const void *keydata)
+{
+ const struct path_hashmap_entry *a = entry;
+ const struct path_hashmap_entry *b = entry_or_key;
+ const char *key = keydata;
+
+ if (ignore_case)
+ return strcasecmp(a->path, key ? key : b->path);
+ else
+ return strcmp(a->path, key ? key : b->path);
+}
+
+static unsigned int path_hash(const char *path)
+{
+ return ignore_case ? strihash(path) : strhash(path);
+}
+
static void flush_output(struct merge_options *o)
{
if (o->buffer_output < 2 && o->obuf.len) {
@@ -314,15 +339,15 @@ static int save_files_dirs(const unsigned char *sha1,
struct strbuf *base, const char *path,
unsigned int mode, int stage, void *context)
{
+ struct path_hashmap_entry *entry;
int baselen = base->len;
struct merge_options *o = context;
strbuf_addstr(base, path);
- if (S_ISDIR(mode))
- string_list_insert(&o->current_directory_set, base->buf);
- else
- string_list_insert(&o->current_file_set, base->buf);
+ FLEX_ALLOC_MEM(entry, path, base->buf, base->len);
+ hashmap_entry_init(entry, path_hash(entry->path));
+ hashmap_add(&o->current_file_dir_set, entry);
strbuf_setlen(base, baselen);
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
@@ -642,6 +667,7 @@ static void add_flattened_path(struct strbuf *out, const char *s)
static char *unique_path(struct merge_options *o, const char *path, const char *branch)
{
+ struct path_hashmap_entry *entry;
struct strbuf newpath = STRBUF_INIT;
int suffix = 0;
size_t base_len;
@@ -650,14 +676,16 @@ static char *unique_path(struct merge_options *o, const char *path, const char *
add_flattened_path(&newpath, branch);
base_len = newpath.len;
- while (string_list_has_string(&o->current_file_set, newpath.buf) ||
- string_list_has_string(&o->current_directory_set, newpath.buf) ||
+ while (hashmap_get_from_hash(&o->current_file_dir_set,
+ path_hash(newpath.buf), newpath.buf) ||
(!o->call_depth && file_exists(newpath.buf))) {
strbuf_setlen(&newpath, base_len);
strbuf_addf(&newpath, "_%d", suffix++);
}
- string_list_insert(&o->current_file_set, newpath.buf);
+ FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len);
+ hashmap_entry_init(entry, path_hash(entry->path));
+ hashmap_add(&o->current_file_dir_set, entry);
return strbuf_detach(&newpath, NULL);
}
@@ -1941,8 +1969,14 @@ int merge_trees(struct merge_options *o,
if (unmerged_cache()) {
struct string_list *entries, *re_head, *re_merge;
int i;
- string_list_clear(&o->current_file_set, 1);
- string_list_clear(&o->current_directory_set, 1);
+ /*
+ * Only need the hashmap while processing entries, so
+ * initialize it here and free it when we are done running
+ * through the entries. Keeping it in the merge_options as
+ * opposed to decaring a local hashmap is for convenience
+ * so that we don't have to pass it to around.
+ */
+ hashmap_init(&o->current_file_dir_set, path_hashmap_cmp, NULL, 512);
get_files_dirs(o, head);
get_files_dirs(o, merge);
@@ -1978,6 +2012,8 @@ int merge_trees(struct merge_options *o,
string_list_clear(re_head, 0);
string_list_clear(entries, 1);
+ hashmap_free(&o->current_file_dir_set, 1);
+
free(re_merge);
free(re_head);
free(entries);
@@ -2179,8 +2215,6 @@ void init_merge_options(struct merge_options *o)
if (o->verbosity >= 5)
o->buffer_output = 0;
strbuf_init(&o->obuf, 0);
- string_list_init(&o->current_file_set, 1);
- string_list_init(&o->current_directory_set, 1);
string_list_init(&o->df_conflict_file_set, 1);
}
diff --git a/merge-recursive.h b/merge-recursive.h
index 735343b413..80d69d1401 100644
--- a/merge-recursive.h
+++ b/merge-recursive.h
@@ -25,8 +25,7 @@ struct merge_options {
int show_rename_progress;
int call_depth;
struct strbuf obuf;
- struct string_list current_file_set;
- struct string_list current_directory_set;
+ struct hashmap current_file_dir_set;
struct string_list df_conflict_file_set;
};
--
2.14.1.329.gcdd497e120.dirty
^ permalink raw reply related [flat|nested] 4+ messages in thread
end of thread, other threads:[~2017-09-07 16:26 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-09-07 16:25 [PATCH v3 0/3] merge-recursive: replace string_list with hashmap Kevin Willford
2017-09-07 16:25 ` [PATCH v3 1/3] merge-recursive: fix memory leak Kevin Willford
2017-09-07 16:25 ` [PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs Kevin Willford
2017-09-07 16:25 ` [PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap Kevin Willford
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.