git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 6/6] [WIP] subvert sorted-array to replace binary-search in unpack-objects.
  2010-11-29 22:57 [PATCH v4] " Yann Dirson
@ 2010-11-29 22:57 ` Yann Dirson
  0 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-11-29 22:57 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 builtin/unpack-objects.c |   41 ++++++++++++++++++++++++++---------------
 1 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index f63973c..b0c15e6 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -11,6 +11,7 @@
 #include "progress.h"
 #include "decorate.h"
 #include "fsck.h"
+#include "sorted-array.h"
 
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict] < pack-file";
@@ -157,7 +158,25 @@ struct obj_info {
 #define FLAG_OPEN (1u<<20)
 #define FLAG_WRITTEN (1u<<21)
 
-static struct obj_info *obj_list;
+/*
+ * FIXME: obj_info is a sorted array, but we read it as a whole, we
+ * don't need insertion features.  This allows us to abuse unused
+ * obj_info_nr later as a means of specifying an upper bound for
+ * binary search.  obj_info_alloc shall be eliminated by the compiler
+ * as unused.
+ */
+static int obj_info_cmp(off_t ref, struct obj_info *elem)
+{
+	if (ref == elem->offset)
+		return 0;
+	if (ref < elem->offset)
+		return -1;
+	return 1;
+}
+declare_sorted_array(static, struct obj_info, obj_list);
+declare_gen_binsearch(static, struct obj_info, _obj_list_check, off_t);
+declare_sorted_array_search_check(static, obj_list_check, off_t, _obj_list_check,
+				  obj_list, obj_info_cmp);
 static unsigned nr_objects;
 
 /*
@@ -356,7 +375,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 		unsigned base_found = 0;
 		unsigned char *pack, c;
 		off_t base_offset;
-		unsigned lo, mid, hi;
+		int pos;
 
 		pack = fill(1);
 		c = *pack;
@@ -380,19 +399,11 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 			free(delta_data);
 			return;
 		}
-		lo = 0;
-		hi = nr;
-		while (lo < hi) {
-			mid = (lo + hi)/2;
-			if (base_offset < obj_list[mid].offset) {
-				hi = mid;
-			} else if (base_offset > obj_list[mid].offset) {
-				lo = mid + 1;
-			} else {
-				hashcpy(base_sha1, obj_list[mid].sha1);
-				base_found = !is_null_sha1(base_sha1);
-				break;
-			}
+		obj_list_nr = nr; /* kludge to bound the search */
+		pos = obj_list_check(base_offset);
+		if (pos >= 0) {
+			hashcpy(base_sha1, obj_list[pos].sha1);
+			base_found = !is_null_sha1(base_sha1);
 		}
 		if (!base_found) {
 			/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v5] generalizing sorted-array handling
@ 2010-12-05 10:34 Yann Dirson
  2010-12-05 10:34 ` [PATCH 1/6] Introduce sorted-array binary-search function Yann Dirson
                   ` (6 more replies)
  0 siblings, 7 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 10:34 UTC (permalink / raw)
  To: git

Changes from v4:

* better API documentation (was previously lacking or plain obsolete)
* added one more wrapper (used by yet-to-be-resent bulk-* series

Notes on current API:

* The macro names are a bit heavy-weight.  Better ideas welcome.

* This API is very verbose, and I'm not happy with that aspect.

It could be made less so, eg. causing insert wrappers to auto-declare
the required generic insert func, and causing the latter auto-declare
the required generic search func.  That would cause duplication of the
generic search func in many cases.

The duplication problem would not be an issue if we add an automatic
call to declare_gen_sorted_insert() in declare_sorted_array_insert_*,
but we would loose the symetry with the search API.

Adding "simple" API variants that would call all the necessary stuff
would help code readability, but adding yet more entry points seems a
dubious approach.

Or is that just the "use cpp for templating" just inadequate here ?

* could gain a dealloc API, to minimize the explicit use of the _nr
  and _alloc vars


The following binary-search occurences were not converted:

* read-cache.c::index_name_pos has widely-used API with 2 low-coupled
  cmp/init params: sorted-array could be generalized at the cost of
  using stdarg, but is it worth it ?

* pack-revindex.c::find_pack_revindex is a bit special and needs more
  thought

* cache-tree.c::subtree_pos and sha1_file::find_pack_entry_one too

* sha1_lookup.c stuff probably too special

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/6] Introduce sorted-array binary-search function.
  2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
@ 2010-12-05 10:34 ` Yann Dirson
  2010-12-05 10:34 ` [PATCH 2/6] Convert diffcore-rename's rename_dst to the new sorted-array API Yann Dirson
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 10:34 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

We use a cpp-based template mechanism to declare the array and its
management data, as well as a search function.
Thanks to Jonathan Nieder for this design idea.

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 Makefile       |    1 +
 sorted-array.h |  153 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 154 insertions(+), 0 deletions(-)
 create mode 100644 sorted-array.h

diff --git a/Makefile b/Makefile
index 1d42413..ced07df 100644
--- a/Makefile
+++ b/Makefile
@@ -539,6 +539,7 @@ LIB_H += run-command.h
 LIB_H += sha1-lookup.h
 LIB_H += sideband.h
 LIB_H += sigchain.h
+LIB_H += sorted-array.h
 LIB_H += strbuf.h
 LIB_H += string-list.h
 LIB_H += submodule.h
diff --git a/sorted-array.h b/sorted-array.h
new file mode 100644
index 0000000..dc4be87
--- /dev/null
+++ b/sorted-array.h
@@ -0,0 +1,153 @@
+#ifndef SORTED_ARRAY_H_
+#define SORTED_ARRAY_H_
+
+/*
+ * Declare an array of given type, together with its management
+ * variable holding currently-allocated number of elements and number
+ * of elements effectively used.
+ */
+#define declare_sorted_array(MAYBESTATIC,ELEMTYPE,LIST)			\
+MAYBESTATIC ELEMTYPE *LIST;						\
+MAYBESTATIC int LIST##_nr, LIST##_alloc;
+
+/*
+ * Declare FUNCNAME as a binary-search function on sorted-arrays of
+ * ELEMTYPE elements, search term being be of type INITTYPE.
+ *
+ * The resulting function can act on any ELEMTYPE* list, using any
+ * suitable comparison function taking an INITTYPE argument and a
+ * pointer to an ELEMTYPE argument, and returning an int with the same
+ * meaning as strcmp.  If the element is found, it returns the index
+ * in the list where it was found; if it is not found, it returns
+ * (-pos - 1), where "pos" is the index in the list where the element
+ * would be inserted.
+ *
+ * See below for macros to define more specific functions tailored to
+ * a given list, and with output suitable to various usages.
+ */
+#define declare_gen_binsearch(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE)	\
+MAYBESTATIC int FUNCNAME(						\
+	ELEMTYPE *list, int list_nr,					\
+	int(*cmp_func)(INITTYPE ref, ELEMTYPE *elem),			\
+	INITTYPE data)							\
+{									\
+	int lo, hi;							\
+									\
+	lo = 0;								\
+	hi = list_nr;							\
+	while (hi > lo) {						\
+		int mid = (hi + lo) >> 1;				\
+		int cmp = cmp_func(data, list + mid);			\
+		if (!cmp)						\
+			return mid;					\
+		if (cmp < 0)						\
+			hi = mid;					\
+		else							\
+			lo = mid + 1;					\
+	}								\
+	return -lo - 1;							\
+}
+
+/*
+ * Declare FUNCNAME as a function to search for an element in
+ * sorted-arrays of ELEMTYPE elements, inserting it if it was not
+ * found, search term being be of type INITTYPE.  The position where
+ * to insert will be given found by SEARCHFUNC, which must be
+ * compatible with the search functions defined by
+ * declare_gen_binsearch().
+ *
+ * The resulting function takes the same arguments as similar search
+ * functions, with the addition of a function to initialize the
+ * newly-allocated element from the search term.
+ */
+#define declare_gen_sorted_insert(MAYBESTATIC,ELEMTYPE,FUNCNAME,SEARCHFUNC,INITTYPE) \
+MAYBESTATIC int FUNCNAME(						\
+	ELEMTYPE **list_p, int *list_nr_p, int *list_alloc_p,		\
+	int(*cmp_func)(INITTYPE ref, ELEMTYPE *elem),			\
+	void(*init_func)(ELEMTYPE *elem, INITTYPE init),		\
+	INITTYPE data)							\
+{									\
+	int pos = SEARCHFUNC(*list_p, *list_nr_p, cmp_func, data);	\
+	if (pos >= 0) 							\
+		return pos;						\
+	/* not found */							\
+	pos = -pos - 1;							\
+	/* insert to make it at "pos" */				\
+	if (*list_alloc_p <= *list_nr_p) {				\
+		(*list_alloc_p) = alloc_nr((*list_alloc_p));		\
+		*list_p = xrealloc(*list_p,				\
+				   (*list_alloc_p) * sizeof(**list_p)); \
+	}								\
+	(*list_nr_p)++;							\
+	if (pos < *list_nr_p)						\
+		memmove(*list_p + pos + 1, *list_p + pos,		\
+			(*list_nr_p - pos - 1) * sizeof(**list_p));	\
+	init_func(&(*list_p)[pos], data);				\
+	return -pos - 1;						\
+}
+
+/*
+ * Returns the position of the element if found pre-existing in the
+ * list, or if not found, -pos-1 where pos is where the element would
+ * have been inserted.
+ */
+#define declare_sorted_array_search_check(MAYBESTATIC,FUNCNAME,INITTYPE,GENSEARCH,LIST,CMP) \
+MAYBESTATIC int FUNCNAME(INITTYPE data)					\
+{									\
+	return GENSEARCH(LIST, LIST##_nr, CMP, data);			\
+}
+
+/*
+ * Returns the position of the element if found pre-existing in the
+ * list, or if not found inserts it, and returns -pos-1 where pos is
+ * where the element was inserted.
+ */
+#define declare_sorted_array_insert_check(MAYBESTATIC,FUNCNAME,INITTYPE,GENINSERT,LIST,CMP,INIT) \
+MAYBESTATIC int FUNCNAME(INITTYPE data)					\
+{									\
+	return GENINSERT(&LIST, &LIST##_nr, &LIST##_alloc,		\
+			 CMP, INIT, data);				\
+}
+
+/*
+ * Insert, and just tell whether the searched element was pre-existing
+ * in the list or not.
+ */
+#define declare_sorted_array_insert_checkbool(MAYBESTATIC,FUNCNAME,INITTYPE,GENINSERT,LIST,CMP,INIT) \
+MAYBESTATIC int FUNCNAME(INITTYPE data)					\
+{									\
+	int idx = GENINSERT(&LIST, &LIST##_nr, &LIST##_alloc,		\
+			    CMP, INIT, data);				\
+	if (idx < 0)							\
+		return 0;						\
+	return 1;							\
+}
+
+/*
+ * Search for element.  Returns address of the element found, or NULL
+ * if not found.
+ */
+#define declare_sorted_array_search_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENSEARCH,LIST,CMP) \
+MAYBESTATIC ELEMTYPE *FUNCNAME(INITTYPE data)				\
+{									\
+	int idx = GENSEARCH(LIST, LIST##_nr, CMP, data);		\
+	if (idx < 0)							\
+		return NULL;						\
+	return &(LIST[idx]);						\
+}
+
+/*
+ * Insert element if not there already.  Returns address of the
+ * element found or newly-inserted.
+ */
+#define declare_sorted_array_insert_elem(MAYBESTATIC,ELEMTYPE,FUNCNAME,INITTYPE,GENINSERT,LIST,CMP,INIT) \
+MAYBESTATIC ELEMTYPE *FUNCNAME(INITTYPE data)				\
+{									\
+	int idx = GENINSERT(&LIST, &LIST##_nr, &LIST##_alloc,		\
+			    CMP, INIT, data);				\
+	if (idx < 0)							\
+		idx = -idx - 1;						\
+	return &(LIST[idx]);						\
+}
+
+#endif
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/6] Convert diffcore-rename's rename_dst to the new sorted-array API.
  2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
  2010-12-05 10:34 ` [PATCH 1/6] Introduce sorted-array binary-search function Yann Dirson
@ 2010-12-05 10:34 ` Yann Dirson
  2010-12-05 10:34 ` [PATCH 3/6] Convert diffcore-rename's rename_src " Yann Dirson
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 10:34 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

The sorted-array API splits search and insert into two separated
functions, which makes the caller code more clear.

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 diffcore-rename.c |   66 ++++++++++++++++++++---------------------------------
 1 files changed, 25 insertions(+), 41 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index df41be5..a655017 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -5,52 +5,36 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "hash.h"
+#include "sorted-array.h"
 
 /* Table of rename/copy destinations */
 
-static struct diff_rename_dst {
+struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
-} *rename_dst;
-static int rename_dst_nr, rename_dst_alloc;
+};
 
-static struct diff_rename_dst *locate_rename_dst(struct diff_filespec *two,
-						 int insert_ok)
+static int rename_dst_cmp(struct diff_filespec *ref_spec, struct diff_rename_dst *elem)
 {
-	int first, last;
-
-	first = 0;
-	last = rename_dst_nr;
-	while (last > first) {
-		int next = (last + first) >> 1;
-		struct diff_rename_dst *dst = &(rename_dst[next]);
-		int cmp = strcmp(two->path, dst->two->path);
-		if (!cmp)
-			return dst;
-		if (cmp < 0) {
-			last = next;
-			continue;
-		}
-		first = next+1;
-	}
-	/* not found */
-	if (!insert_ok)
-		return NULL;
-	/* insert to make it at "first" */
-	if (rename_dst_alloc <= rename_dst_nr) {
-		rename_dst_alloc = alloc_nr(rename_dst_alloc);
-		rename_dst = xrealloc(rename_dst,
-				      rename_dst_alloc * sizeof(*rename_dst));
-	}
-	rename_dst_nr++;
-	if (first < rename_dst_nr)
-		memmove(rename_dst + first + 1, rename_dst + first,
-			(rename_dst_nr - first - 1) * sizeof(*rename_dst));
-	rename_dst[first].two = alloc_filespec(two->path);
-	fill_filespec(rename_dst[first].two, two->sha1, two->mode);
-	rename_dst[first].pair = NULL;
-	return &(rename_dst[first]);
+	return strcmp(ref_spec->path, elem->two->path);
+}
+static void rename_dst_init(struct diff_rename_dst *elem, struct diff_filespec *ref_spec)
+{
+	elem->two = alloc_filespec(ref_spec->path);
+	fill_filespec(elem->two, ref_spec->sha1, ref_spec->mode);
+	elem->pair = NULL;
 }
+declare_sorted_array(static, struct diff_rename_dst, rename_dst);
+declare_gen_binsearch(static, struct diff_rename_dst, _locate_rename_dst,
+		      struct diff_filespec *);
+declare_sorted_array_search_elem(static, struct diff_rename_dst, locate_rename_dst,
+				 struct diff_filespec *, _locate_rename_dst,
+				 rename_dst, rename_dst_cmp);
+declare_gen_sorted_insert(static, struct diff_rename_dst, _register_rename_dst,
+			  _locate_rename_dst, struct diff_filespec *);
+declare_sorted_array_insert_checkbool(static, register_rename_dst, struct diff_filespec *,
+				      _register_rename_dst,
+				      rename_dst, rename_dst_cmp, rename_dst_init);
 
 /* Table of rename/copy src files */
 static struct diff_rename_src {
@@ -437,7 +421,7 @@ void diffcore_rename(struct diff_options *options)
 				 strcmp(options->single_follow, p->two->path))
 				continue; /* not interested */
 			else
-				locate_rename_dst(p->two, 1);
+				register_rename_dst(p->two);
 		}
 		else if (!DIFF_FILE_VALID(p->two)) {
 			/*
@@ -582,7 +566,7 @@ void diffcore_rename(struct diff_options *options)
 			 * not been turned into a rename/copy already.
 			 */
 			struct diff_rename_dst *dst =
-				locate_rename_dst(p->two, 0);
+				locate_rename_dst(p->two);
 			if (dst && dst->pair) {
 				diff_q(&outq, dst->pair);
 				pair_to_free = p;
@@ -613,7 +597,7 @@ void diffcore_rename(struct diff_options *options)
 			if (DIFF_PAIR_BROKEN(p)) {
 				/* broken delete */
 				struct diff_rename_dst *dst =
-					locate_rename_dst(p->one, 0);
+					locate_rename_dst(p->one);
 				if (dst && dst->pair)
 					/* counterpart is now rename/copy */
 					pair_to_free = p;
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/6] Convert diffcore-rename's rename_src to the new sorted-array API.
  2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
  2010-12-05 10:34 ` [PATCH 1/6] Introduce sorted-array binary-search function Yann Dirson
  2010-12-05 10:34 ` [PATCH 2/6] Convert diffcore-rename's rename_dst to the new sorted-array API Yann Dirson
@ 2010-12-05 10:34 ` Yann Dirson
  2010-12-05 10:34 ` [PATCH 4/6] Convert pack-objects.c " Yann Dirson
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 10:34 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

There was no compelling reason to pass separately two members of a
single struct to the insert function.  That's a happy coincidence.

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 diffcore-rename.c |   57 ++++++++++++++++++----------------------------------
 1 files changed, 20 insertions(+), 37 deletions(-)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index a655017..7e35a82 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -37,46 +37,29 @@ declare_sorted_array_insert_checkbool(static, register_rename_dst, struct diff_f
 				      rename_dst, rename_dst_cmp, rename_dst_init);
 
 /* Table of rename/copy src files */
-static struct diff_rename_src {
+
+struct diff_rename_src {
 	struct diff_filespec *one;
 	unsigned short score; /* to remember the break score */
-} *rename_src;
-static int rename_src_nr, rename_src_alloc;
+};
 
-static struct diff_rename_src *register_rename_src(struct diff_filespec *one,
-						   unsigned short score)
+static int rename_src_cmp(struct diff_filepair *ref_pair, struct diff_rename_src *elem)
 {
-	int first, last;
-
-	first = 0;
-	last = rename_src_nr;
-	while (last > first) {
-		int next = (last + first) >> 1;
-		struct diff_rename_src *src = &(rename_src[next]);
-		int cmp = strcmp(one->path, src->one->path);
-		if (!cmp)
-			return src;
-		if (cmp < 0) {
-			last = next;
-			continue;
-		}
-		first = next+1;
-	}
-
-	/* insert to make it at "first" */
-	if (rename_src_alloc <= rename_src_nr) {
-		rename_src_alloc = alloc_nr(rename_src_alloc);
-		rename_src = xrealloc(rename_src,
-				      rename_src_alloc * sizeof(*rename_src));
-	}
-	rename_src_nr++;
-	if (first < rename_src_nr)
-		memmove(rename_src + first + 1, rename_src + first,
-			(rename_src_nr - first - 1) * sizeof(*rename_src));
-	rename_src[first].one = one;
-	rename_src[first].score = score;
-	return &(rename_src[first]);
+	return strcmp(ref_pair->one->path, elem->one->path);
+}
+static void rename_src_init(struct diff_rename_src *elem, struct diff_filepair *ref_pair)
+{
+	elem->one = ref_pair->one;
+	elem->score = ref_pair->score;
 }
+declare_sorted_array(static, struct diff_rename_src, rename_src);
+declare_gen_binsearch(static, struct diff_rename_src, _locate_rename_src,
+		      struct diff_filepair *);
+declare_gen_sorted_insert(static, struct diff_rename_src, _register_rename_src,
+			  _locate_rename_src, struct diff_filepair *);
+declare_sorted_array_insert_checkbool(static, register_rename_src, struct diff_filepair *,
+				      _register_rename_src,
+				      rename_src, rename_src_cmp, rename_src_init);
 
 static int basename_same(struct diff_filespec *src, struct diff_filespec *dst)
 {
@@ -433,7 +416,7 @@ void diffcore_rename(struct diff_options *options)
 			 */
 			if (p->broken_pair && !p->score)
 				p->one->rename_used++;
-			register_rename_src(p->one, p->score);
+			register_rename_src(p);
 		}
 		else if (detect_rename == DIFF_DETECT_COPY) {
 			/*
@@ -441,7 +424,7 @@ void diffcore_rename(struct diff_options *options)
 			 * one, to indicate ourselves as a user.
 			 */
 			p->one->rename_used++;
-			register_rename_src(p->one, p->score);
+			register_rename_src(p);
 		}
 	}
 	if (rename_dst_nr == 0 || rename_src_nr == 0)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/6] Convert pack-objects.c to the new sorted-array API.
  2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
                   ` (2 preceding siblings ...)
  2010-12-05 10:34 ` [PATCH 3/6] Convert diffcore-rename's rename_src " Yann Dirson
@ 2010-12-05 10:34 ` Yann Dirson
  2010-12-05 10:34 ` [PATCH 5/6] Use sorted-array API for commit.c's commit_graft Yann Dirson
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 10:34 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

In this file the "list size" variable was named in a non-standard way.
The new API forces to use a more common convention.

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 builtin/pack-objects.c |   51 ++++++++++++++---------------------------------
 1 files changed, 15 insertions(+), 36 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 3cbeb29..887a55c 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -16,6 +16,7 @@
 #include "list-objects.h"
 #include "progress.h"
 #include "refs.h"
+#include "sorted-array.h"
 
 #ifndef NO_PTHREADS
 #include <pthread.h>
@@ -871,45 +872,23 @@ static void add_pbase_object(struct tree_desc *tree,
 	}
 }
 
-static unsigned *done_pbase_paths;
-static int done_pbase_paths_num;
-static int done_pbase_paths_alloc;
-static int done_pbase_path_pos(unsigned hash)
+static int unsigned_cmp(unsigned ref, unsigned *elem)
 {
-	int lo = 0;
-	int hi = done_pbase_paths_num;
-	while (lo < hi) {
-		int mi = (hi + lo) / 2;
-		if (done_pbase_paths[mi] == hash)
-			return mi;
-		if (done_pbase_paths[mi] < hash)
-			hi = mi;
-		else
-			lo = mi + 1;
-	}
-	return -lo-1;
+	if (ref == *elem)
+		return 0;
+	if (ref < *elem)
+		return -1;
+	return 1;
 }
-
-static int check_pbase_path(unsigned hash)
+static void unsigned_init(unsigned *elem, unsigned ref)
 {
-	int pos = (!done_pbase_paths) ? -1 : done_pbase_path_pos(hash);
-	if (0 <= pos)
-		return 1;
-	pos = -pos - 1;
-	if (done_pbase_paths_alloc <= done_pbase_paths_num) {
-		done_pbase_paths_alloc = alloc_nr(done_pbase_paths_alloc);
-		done_pbase_paths = xrealloc(done_pbase_paths,
-					    done_pbase_paths_alloc *
-					    sizeof(unsigned));
-	}
-	done_pbase_paths_num++;
-	if (pos < done_pbase_paths_num)
-		memmove(done_pbase_paths + pos + 1,
-			done_pbase_paths + pos,
-			(done_pbase_paths_num - pos - 1) * sizeof(unsigned));
-	done_pbase_paths[pos] = hash;
-	return 0;
+	*elem = ref;
 }
+declare_sorted_array(static, unsigned, done_pbase_paths);
+declare_gen_binsearch(static, unsigned, done_pbase_path_pos, unsigned);
+declare_gen_sorted_insert(static, unsigned, _check_pbase_path, done_pbase_path_pos, unsigned);
+declare_sorted_array_insert_checkbool(static, check_pbase_path, unsigned, _check_pbase_path,
+				      done_pbase_paths, unsigned_cmp, unsigned_init);
 
 static void add_preferred_base_object(const char *name)
 {
@@ -987,7 +966,7 @@ static void cleanup_preferred_base(void)
 
 	free(done_pbase_paths);
 	done_pbase_paths = NULL;
-	done_pbase_paths_num = done_pbase_paths_alloc = 0;
+	done_pbase_paths_nr = done_pbase_paths_alloc = 0;
 }
 
 static void check_object(struct object_entry *entry)
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/6] Use sorted-array API for commit.c's commit_graft.
  2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
                   ` (3 preceding siblings ...)
  2010-12-05 10:34 ` [PATCH 4/6] Convert pack-objects.c " Yann Dirson
@ 2010-12-05 10:34 ` Yann Dirson
  2010-12-05 10:34 ` [PATCH 6/6] [WIP] subvert sorted-array to replace binary-search in unpack-objects Yann Dirson
  2010-12-05 10:44 ` [PATCH v5] generalizing sorted-array handling Jonathan Nieder
  6 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 10:34 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

Factorizing code fixes off-by-one error in the duplicated code (caused
mostly harmless anticipated growing of the array).

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 commit.c |   62 +++++++++++++++++++++++++++-----------------------------------
 1 files changed, 27 insertions(+), 35 deletions(-)

diff --git a/commit.c b/commit.c
index 2d9265d..b7aeee4 100644
--- a/commit.c
+++ b/commit.c
@@ -6,6 +6,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "notes.h"
+#include "sorted-array.h"
 
 int save_commit_buffer = 1;
 
@@ -76,33 +77,37 @@ static unsigned long parse_commit_date(const char *buf, const char *tail)
 	return strtoul(dateptr, NULL, 10);
 }
 
-static struct commit_graft **commit_graft;
-static int commit_graft_alloc, commit_graft_nr;
-
-static int commit_graft_pos(const unsigned char *sha1)
+static int commit_graft_cmp(const unsigned char *ref_sha1, struct commit_graft **elem)
 {
-	int lo, hi;
-	lo = 0;
-	hi = commit_graft_nr;
-	while (lo < hi) {
-		int mi = (lo + hi) / 2;
-		struct commit_graft *graft = commit_graft[mi];
-		int cmp = hashcmp(sha1, graft->sha1);
-		if (!cmp)
-			return mi;
-		if (cmp < 0)
-			hi = mi;
-		else
-			lo = mi + 1;
-	}
-	return -lo - 1;
+	return hashcmp(ref_sha1, (*elem)->sha1);
 }
+declare_sorted_array(static, struct commit_graft *, commit_graft);
+declare_gen_binsearch(static, struct commit_graft *, _commit_graft_pos,
+		      const unsigned char *);
+declare_sorted_array_search_check(static, commit_graft_pos, const unsigned char *,
+				  _commit_graft_pos, commit_graft, commit_graft_cmp);
 
+// FIXME: do we want to/can we remove INITTYPE from gen_binsearch ?
+static int commit_graft_cmp2(struct commit_graft *ref_graft, struct commit_graft **elem)
+{
+	return commit_graft_cmp(ref_graft->sha1, elem);
+}
+declare_gen_binsearch(static, struct commit_graft *, _commit_graft_pos2,
+		      struct commit_graft *);
+static void commit_graft_init(struct commit_graft **elem, struct commit_graft *ref_graft)
+{
+	*elem = ref_graft;
+}
+declare_gen_sorted_insert(static, struct commit_graft *, _register_commit_graft0,
+			  _commit_graft_pos2, struct commit_graft *)
+declare_sorted_array_insert_check(static, register_commit_graft0, struct commit_graft *,
+				  _register_commit_graft0, commit_graft,
+				  commit_graft_cmp2, commit_graft_init);
 int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 {
-	int pos = commit_graft_pos(graft->sha1);
+	int pos = register_commit_graft0(graft);
 
-	if (0 <= pos) {
+	if (pos >= 0) {
 		if (ignore_dups)
 			free(graft);
 		else {
@@ -111,19 +116,6 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups)
 		}
 		return 1;
 	}
-	pos = -pos - 1;
-	if (commit_graft_alloc <= ++commit_graft_nr) {
-		commit_graft_alloc = alloc_nr(commit_graft_alloc);
-		commit_graft = xrealloc(commit_graft,
-					sizeof(*commit_graft) *
-					commit_graft_alloc);
-	}
-	if (pos < commit_graft_nr)
-		memmove(commit_graft + pos + 1,
-			commit_graft + pos,
-			(commit_graft_nr - pos - 1) *
-			sizeof(*commit_graft));
-	commit_graft[pos] = graft;
 	return 0;
 }
 
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/6] [WIP] subvert sorted-array to replace binary-search in unpack-objects.
  2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
                   ` (4 preceding siblings ...)
  2010-12-05 10:34 ` [PATCH 5/6] Use sorted-array API for commit.c's commit_graft Yann Dirson
@ 2010-12-05 10:34 ` Yann Dirson
  2010-12-05 10:44 ` [PATCH v5] generalizing sorted-array handling Jonathan Nieder
  6 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 10:34 UTC (permalink / raw)
  To: git; +Cc: Yann Dirson

Signed-off-by: Yann Dirson <ydirson@altern.org>
---
 builtin/unpack-objects.c |   41 ++++++++++++++++++++++++++---------------
 1 files changed, 26 insertions(+), 15 deletions(-)

diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index f63973c..b0c15e6 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -11,6 +11,7 @@
 #include "progress.h"
 #include "decorate.h"
 #include "fsck.h"
+#include "sorted-array.h"
 
 static int dry_run, quiet, recover, has_errors, strict;
 static const char unpack_usage[] = "git unpack-objects [-n] [-q] [-r] [--strict] < pack-file";
@@ -157,7 +158,25 @@ struct obj_info {
 #define FLAG_OPEN (1u<<20)
 #define FLAG_WRITTEN (1u<<21)
 
-static struct obj_info *obj_list;
+/*
+ * FIXME: obj_info is a sorted array, but we read it as a whole, we
+ * don't need insertion features.  This allows us to abuse unused
+ * obj_info_nr later as a means of specifying an upper bound for
+ * binary search.  obj_info_alloc shall be eliminated by the compiler
+ * as unused.
+ */
+static int obj_info_cmp(off_t ref, struct obj_info *elem)
+{
+	if (ref == elem->offset)
+		return 0;
+	if (ref < elem->offset)
+		return -1;
+	return 1;
+}
+declare_sorted_array(static, struct obj_info, obj_list);
+declare_gen_binsearch(static, struct obj_info, _obj_list_check, off_t);
+declare_sorted_array_search_check(static, obj_list_check, off_t, _obj_list_check,
+				  obj_list, obj_info_cmp);
 static unsigned nr_objects;
 
 /*
@@ -356,7 +375,7 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 		unsigned base_found = 0;
 		unsigned char *pack, c;
 		off_t base_offset;
-		unsigned lo, mid, hi;
+		int pos;
 
 		pack = fill(1);
 		c = *pack;
@@ -380,19 +399,11 @@ static void unpack_delta_entry(enum object_type type, unsigned long delta_size,
 			free(delta_data);
 			return;
 		}
-		lo = 0;
-		hi = nr;
-		while (lo < hi) {
-			mid = (lo + hi)/2;
-			if (base_offset < obj_list[mid].offset) {
-				hi = mid;
-			} else if (base_offset > obj_list[mid].offset) {
-				lo = mid + 1;
-			} else {
-				hashcpy(base_sha1, obj_list[mid].sha1);
-				base_found = !is_null_sha1(base_sha1);
-				break;
-			}
+		obj_list_nr = nr; /* kludge to bound the search */
+		pos = obj_list_check(base_offset);
+		if (pos >= 0) {
+			hashcpy(base_sha1, obj_list[pos].sha1);
+			base_found = !is_null_sha1(base_sha1);
 		}
 		if (!base_found) {
 			/*
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] generalizing sorted-array handling
  2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
                   ` (5 preceding siblings ...)
  2010-12-05 10:34 ` [PATCH 6/6] [WIP] subvert sorted-array to replace binary-search in unpack-objects Yann Dirson
@ 2010-12-05 10:44 ` Jonathan Nieder
  2010-12-05 12:02   ` Yann Dirson
  6 siblings, 1 reply; 10+ messages in thread
From: Jonathan Nieder @ 2010-12-05 10:44 UTC (permalink / raw)
  To: Yann Dirson; +Cc: git

Yann Dirson wrote:

> * better API documentation (was previously lacking or plain obsolete)

Thanks!  In general I find it is easiest to read and write
documentation out of line for this sort of thing.  That way, even
after the documentation grows obsolete it doesn't seem so out of
place.

See Documentation/technical/api-strbuf.txt, api-sigchain, and
api-allocation-growing for some nice (up-to-date) examples.

In particular:

> * This API is very verbose, and I'm not happy with that aspect.

Could you give a quick stripped-down usage example?

[...]
> Adding "simple" API variants that would call all the necessary stuff
> would help code readability, but adding yet more entry points seems a
> dubious approach.

On the contrary, simple API variants don't sound so bad to me,
once the fundamentals are in good shape.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v5] generalizing sorted-array handling
  2010-12-05 10:44 ` [PATCH v5] generalizing sorted-array handling Jonathan Nieder
@ 2010-12-05 12:02   ` Yann Dirson
  0 siblings, 0 replies; 10+ messages in thread
From: Yann Dirson @ 2010-12-05 12:02 UTC (permalink / raw)
  To: Jonathan Nieder; +Cc: git

On Sun, Dec 05, 2010 at 04:44:26AM -0600, Jonathan Nieder wrote:
> Yann Dirson wrote:
> 
> > * better API documentation (was previously lacking or plain obsolete)
> 
> Thanks!  In general I find it is easiest to read and write
> documentation out of line for this sort of thing.  That way, even
> after the documentation grows obsolete it doesn't seem so out of
> place.
> 
> See Documentation/technical/api-strbuf.txt, api-sigchain, and
> api-allocation-growing for some nice (up-to-date) examples.

OK, can do that.

> In particular:
> 
> > * This API is very verbose, and I'm not happy with that aspect.
> 
> Could you give a quick stripped-down usage example?

Well, patches 2-5 in the series provide good examples - probably
better seen with the "New version" checkbos in gitk, did not find a
commandline flag equivalent (is there one ?).

Patch 4 is probably the simplest example: we use the new macros to
define the same insert API (except for the "number of element" var,
which used a non-standard naming scheme here).  Since the lookup API
was only used inside the insert func, there was no need for a lookup
wrapper here, so we just declare the generic search+ insert funcs, and
an insert wrapper.

---------------------------- builtin/pack-objects.c ----------------------------
index 3cbeb29..887a55c 100644
@@ -16,6 +16,7 @@
 #include "list-objects.h"
 #include "progress.h"
 #include "refs.h"
+#include "sorted-array.h"
 
 #ifndef NO_PTHREADS
 #include <pthread.h>
@@ -871,45 +872,23 @@ static void add_pbase_object(struct tree_desc *tree,
 	}
 }
 
+static int unsigned_cmp(unsigned ref, unsigned *elem)
 {
+	if (ref == *elem)
+		return 0;
+	if (ref < *elem)
+		return -1;
+	return 1;
 }
+static void unsigned_init(unsigned *elem, unsigned ref)
 {
+	*elem = ref;
 }
+declare_sorted_array(static, unsigned, done_pbase_paths);
+declare_gen_binsearch(static, unsigned, done_pbase_path_pos, unsigned);
+declare_gen_sorted_insert(static, unsigned, _check_pbase_path, done_pbase_path_pos, unsigned);
+declare_sorted_array_insert_checkbool(static, check_pbase_path, unsigned, _check_pbase_path,
+				      done_pbase_paths, unsigned_cmp, unsigned_init);
 
 static void add_preferred_base_object(const char *name)
 {
@@ -987,7 +966,7 @@ static void cleanup_preferred_base(void)
 
 	free(done_pbase_paths);
 	done_pbase_paths = NULL;
+	done_pbase_paths_nr = done_pbase_paths_alloc = 0;
 }
 
 static void check_object(struct object_entry *entry)
----------------------------


Patch 2 is a more complete example, where the oiginal API used a
single function with an additional boolean arg to select the
behaviour.  So here we also define a search wrapper, and this make the
callsites more explicit.

------------------------------ diffcore-rename.c ------------------------------
index df41be5..a655017 100644
@@ -5,52 +5,36 @@
 #include "diff.h"
 #include "diffcore.h"
 #include "hash.h"
+#include "sorted-array.h"
 
 /* Table of rename/copy destinations */
 
+struct diff_rename_dst {
 	struct diff_filespec *two;
 	struct diff_filepair *pair;
+};
 
+static int rename_dst_cmp(struct diff_filespec *ref_spec, struct diff_rename_dst *elem)
 {
+	return strcmp(ref_spec->path, elem->two->path);
+}
+static void rename_dst_init(struct diff_rename_dst *elem, struct diff_filespec *ref_spec)
+{
+	elem->two = alloc_filespec(ref_spec->path);
+	fill_filespec(elem->two, ref_spec->sha1, ref_spec->mode);
+	elem->pair = NULL;
 }
+declare_sorted_array(static, struct diff_rename_dst, rename_dst);
+declare_gen_binsearch(static, struct diff_rename_dst, _locate_rename_dst,
+		      struct diff_filespec *);
+declare_sorted_array_search_elem(static, struct diff_rename_dst, locate_rename_dst,
+				 struct diff_filespec *, _locate_rename_dst,
+				 rename_dst, rename_dst_cmp);
+declare_gen_sorted_insert(static, struct diff_rename_dst, _register_rename_dst,
+			  _locate_rename_dst, struct diff_filespec *);
+declare_sorted_array_insert_checkbool(static, register_rename_dst, struct diff_filespec *,
+				      _register_rename_dst,
+				      rename_dst, rename_dst_cmp, rename_dst_init);
 
 /* Table of rename/copy src files */
 static struct diff_rename_src {
@@ -437,7 +421,7 @@ void diffcore_rename(struct diff_options *options)
 				 strcmp(options->single_follow, p->two->path))
 				continue; /* not interested */
 			else
+				register_rename_dst(p->two);
 		}
 		else if (!DIFF_FILE_VALID(p->two)) {
 			/*
@@ -582,7 +566,7 @@ void diffcore_rename(struct diff_options *options)
 			 * not been turned into a rename/copy already.
 			 */
 			struct diff_rename_dst *dst =
+				locate_rename_dst(p->two);
 			if (dst && dst->pair) {
 				diff_q(&outq, dst->pair);
 				pair_to_free = p;
@@ -613,7 +597,7 @@ void diffcore_rename(struct diff_options *options)
 			if (DIFF_PAIR_BROKEN(p)) {
 				/* broken delete */
 				struct diff_rename_dst *dst =
+					locate_rename_dst(p->one);
 				if (dst && dst->pair)
 					/* counterpart is now rename/copy */
 					pair_to_free = p;
------------------------------

> [...]
> > Adding "simple" API variants that would call all the necessary stuff
> > would help code readability, but adding yet more entry points seems a
> > dubious approach.
> 
> On the contrary, simple API variants don't sound so bad to me,
> once the fundamentals are in good shape.

The problem is with the number of combinations.  We already have
potentially 6 wrappers (not all of which are defined yet), with:

* operation: search | insert
* return-value semantic: check | checkbool | elem

If we add to these basic building-blocks:

* wrapper variants that declare the generic func: we double the count
* insert-wrapper variants that declare the generic search: *1.5

... which gives something like 18 wrappers.  And this number will
still raise by 6 each time we feel the need for a new return-value
semantic.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-12-05 12:02 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-12-05 10:34 [PATCH v5] generalizing sorted-array handling Yann Dirson
2010-12-05 10:34 ` [PATCH 1/6] Introduce sorted-array binary-search function Yann Dirson
2010-12-05 10:34 ` [PATCH 2/6] Convert diffcore-rename's rename_dst to the new sorted-array API Yann Dirson
2010-12-05 10:34 ` [PATCH 3/6] Convert diffcore-rename's rename_src " Yann Dirson
2010-12-05 10:34 ` [PATCH 4/6] Convert pack-objects.c " Yann Dirson
2010-12-05 10:34 ` [PATCH 5/6] Use sorted-array API for commit.c's commit_graft Yann Dirson
2010-12-05 10:34 ` [PATCH 6/6] [WIP] subvert sorted-array to replace binary-search in unpack-objects Yann Dirson
2010-12-05 10:44 ` [PATCH v5] generalizing sorted-array handling Jonathan Nieder
2010-12-05 12:02   ` Yann Dirson
  -- strict thread matches above, loose matches on Subject: below --
2010-11-29 22:57 [PATCH v4] " Yann Dirson
2010-11-29 22:57 ` [PATCH 6/6] [WIP] subvert sorted-array to replace binary-search in unpack-objects Yann Dirson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).