[PATCH 00/22] cache cursors: an introduction

git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 00/22] cache cursors: an introduction
@ 2005-09-12 14:55 Chuck Lever
  2005-09-12 14:55 ` [PATCH 01/22] introduce facility to walk through the active cache Chuck Lever
                   ` (23 more replies)
  0 siblings, 24 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git, git

[ This series is posted for review and comments. ]

The following patch series introduces an abstraction called a "cache
cursor" that will eventually allow us to replace the current
active_cache array with something else.

A cache cursor represents a position inside the cache.  This position
has a cache_entry associated with it, of course, but since the cache
is ordered, a cache cursor also has the concept of next, previous,
and end-of-cache.

With a cache cursor we can build a simple iterator mechanism that
calls a particular function for every entry in the cache, in order.
This allows us to hide further the specifics of the active cache
implementation -- the function gets to see the cache cursor and
an element, but does not have direct access to the cache and cannot
assume it has a particular structure.

Currently the cache cursor type is just a structure with an integer
in it, so it largely mimics the existing implementation.

This patch series is against the "proposed updates" branch, as of
a couple of days ago.  It has been tested via "make test" and I'm
currently using it for my own work without issue.

Signed-off-by: Chuck Lever <cel@netapp.com>
--

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 01/22] introduce facility to walk through the active cache
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
@ 2005-09-12 14:55 ` Chuck Lever
  2005-09-12 14:55 ` [PATCH 02/22] use cache iterator in checkout-index.c Chuck Lever
                   ` (22 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git

Introduce a mechanism that allows functions to walk through entries
in the active cache in order and execute an function on each entry.

We also introduce the concept of "cache cursor".  A cursor is simply
a type-independent way of referring to a unique position in the cache.
The cache is strongly ordered, so cursors also provide a type-
independent way of exposing the ordering of the cache positions:
ie next, previous, and eof?

This facility makes no changes to struct cache_entry, which also
happens to be the on-disk format of a cache entry.  By mmapping the
file that contains the cache entries, no data copying is required
to read in the cache.

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 cache.h |   75 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 75 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -130,6 +130,12 @@ static inline unsigned int create_ce_mod
 extern struct cache_entry **active_cache;
 extern unsigned int active_nr, active_alloc, active_cache_changed;
 
+struct cache_cursor {
+	int pos;
+};
+
+typedef int (*cache_iterator_fn_t) (struct cache_cursor *cc, struct cache_entry *ce);
+
 #define GIT_DIR_ENVIRONMENT "GIT_DIR"
 #define DEFAULT_GIT_DIR_ENVIRONMENT ".git"
 #define DB_ENVIRONMENT "GIT_OBJECT_DIRECTORY"
@@ -273,6 +279,75 @@ static inline void *xcalloc(size_t nmemb
 	return ret;
 }
 
+static inline void init_cc(struct cache_cursor *cc)
+{
+	cc->pos = 0;
+}
+
+static inline void next_cc(struct cache_cursor *cc)
+{
+	cc->pos++;
+}
+
+static inline void prev_cc(struct cache_cursor *cc)
+{
+	cc->pos--;
+}
+
+static inline struct cache_entry *cc_to_ce(struct cache_cursor *cc)
+{
+	return active_cache[cc->pos];
+}
+
+static inline void set_ce_at_cursor(struct cache_cursor *cc, struct cache_entry *new)
+{
+	active_cache[cc->pos] = new;
+	active_cache_changed = 1;
+}
+
+static inline int cache_empty(void)
+{
+	return active_cache == NULL;
+}
+
+static inline int cache_eof(struct cache_cursor *cc)
+{
+	if (cc->pos < active_nr)
+		return 0;
+	return -1;
+}
+
+static inline int read_cache_needed(void)
+{
+	return cache_empty();
+}
+
+static inline void next_name(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	do {
+		next_cc(cc);
+	} while (!cache_eof(cc) && ce_same_name(ce, cc_to_ce(cc)));
+}
+
+/*
+ * Walk the entire active cache, invoking "func" on each entry
+ *
+ * "func" is responsible for updating the cache_cursor.  To break
+ * out of the loop, "func" can return a negative result.
+ */
+static inline int walk_cache(cache_iterator_fn_t func)
+{
+	struct cache_cursor cc;
+
+	init_cc(&cc);
+	while (!cache_eof(&cc)) {
+		int status = func(&cc, cc_to_ce(&cc));
+		if (status < 0)
+			return status;
+	}
+	return 0;
+}
+
 struct checkout {
 	const char *base_dir;
 	int base_dir_len;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 02/22] use cache iterator in checkout-index.c
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
  2005-09-12 14:55 ` [PATCH 01/22] introduce facility to walk through the active cache Chuck Lever
@ 2005-09-12 14:55 ` Chuck Lever
  2005-09-12 14:55 ` [PATCH 03/22] teach diff.c about cache iterators Chuck Lever
                   ` (21 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 checkout-index.c |   16 +++++-----------
 1 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/checkout-index.c b/checkout-index.c
--- a/checkout-index.c
+++ b/checkout-index.c
@@ -61,17 +61,12 @@ static int checkout_file(const char *nam
 	return checkout_entry(active_cache[pos], &state);
 }
 
-static int checkout_all(void)
+static int checkout_one(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	int i;
-
-	for (i = 0; i < active_nr ; i++) {
-		struct cache_entry *ce = active_cache[i];
-		if (ce_stage(ce))
-			continue;
+	if (!ce_stage(ce))
 		if (checkout_entry(ce, &state) < 0)
 			return -1;
-	}
+	next_cc(cc);
 	return 0;
 }
 
@@ -85,15 +80,14 @@ int main(int argc, char **argv)
 	int i, force_filename = 0;
 	int newfd = -1;
 
-	if (read_cache() < 0) {
+	if (read_cache() < 0)
 		die("invalid cache");
-	}
 
 	for (i = 1; i < argc; i++) {
 		const char *arg = argv[i];
 		if (!force_filename) {
 			if (!strcmp(arg, "-a")) {
-				checkout_all();
+				walk_cache(checkout_one);
 				continue;
 			}
 			if (!strcmp(arg, "--")) {

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 03/22] teach diff.c about cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
  2005-09-12 14:55 ` [PATCH 01/22] introduce facility to walk through the active cache Chuck Lever
  2005-09-12 14:55 ` [PATCH 02/22] use cache iterator in checkout-index.c Chuck Lever
@ 2005-09-12 14:55 ` Chuck Lever
  2005-09-12 14:55 ` [PATCH 04/22] teach diff-index.c " Chuck Lever
                   ` (20 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 diff.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -258,7 +258,7 @@ static int work_tree_matches(const char 
 	 * by diff-cache --cached, which does read the cache before
 	 * calling us.
 	 */
-	if (!active_cache)
+	if (read_cache_needed())
 		return 0;
 
 	len = strlen(name);
@@ -677,12 +677,12 @@ void diff_setup(int flags)
 	if (flags & DIFF_SETUP_REVERSE)
 		reverse_diff = 1;
 	if (flags & DIFF_SETUP_USE_CACHE) {
-		if (!active_cache)
+		if (read_cache_needed())
 			/* read-cache does not die even when it fails
 			 * so it is safe for us to do this here.  Also
 			 * it does not smudge active_cache or active_nr
 			 * when it fails, so we do not have to worry about
-			 * cleaning it up oufselves either.
+			 * cleaning it up ourselves either.
 			 */
 			read_cache();
 	}

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 04/22] teach diff-index.c about cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (2 preceding siblings ...)
  2005-09-12 14:55 ` [PATCH 03/22] teach diff.c about cache iterators Chuck Lever
@ 2005-09-12 14:55 ` Chuck Lever
  2005-09-12 14:55 ` [PATCH 05/22] teach diff-files.c " Chuck Lever
                   ` (19 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 diff-index.c |  126 +++++++++++++++++++++++++++++-----------------------------
 1 files changed, 62 insertions(+), 64 deletions(-)

diff --git a/diff-index.c b/diff-index.c
--- a/diff-index.c
+++ b/diff-index.c
@@ -14,9 +14,10 @@ static int pickaxe_opts = 0;
 static int diff_break_opt = -1;
 static const char *orderfile = NULL;
 static const char *diff_filter = NULL;
+static const char **pathspec = NULL;
 
 /* A file entry went away or appeared */
-static void show_file(const char *prefix, struct cache_entry *ce, unsigned char *sha1, unsigned int mode)
+static inline void show_file(const char *prefix, struct cache_entry *ce, unsigned char *sha1, unsigned int mode)
 {
 	diff_addremove(prefix[0], ntohl(mode), sha1, ce->name, NULL);
 }
@@ -88,62 +89,63 @@ static int show_modified(struct cache_en
 	return 0;
 }
 
-static int diff_cache(struct cache_entry **ac, int entries, const char **pathspec)
+static int diff_one(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	while (entries) {
-		struct cache_entry *ce = *ac;
-		int same = (entries > 1) && ce_same_name(ce, ac[1]);
-
-		if (!ce_path_match(ce, pathspec))
-			goto skip_entry;
-
-		switch (ce_stage(ce)) {
-		case 0:
-			/* No stage 1 entry? That means it's a new file */
-			if (!same) {
-				show_new_file(ce);
-				break;
-			}
-			/* Show difference between old and new */
-			show_modified(ac[1], ce, 1);
+	struct cache_entry *next;
+	int same;
+
+	if (!ce_path_match(ce, pathspec))
+		goto skip_entry;
+
+	next_cc(cc);
+	next = cc_to_ce(cc);
+	/* check eof here to skip the last entry in the cache */
+	same = (!cache_eof(cc) && ce_same_name(ce, next));
+	prev_cc(cc);
+
+	switch (ce_stage(ce)) {
+	case 0:
+		/* No stage 1 entry? That means it's a new file */
+		if (!same) {
+			show_new_file(ce);
 			break;
-		case 1:
-			/* No stage 3 (merge) entry? That means it's been deleted */
-			if (!same) {
-				show_file("-", ce, ce->sha1, ce->ce_mode);
-				break;
-			}
-			/* We come here with ce pointing at stage 1
-			 * (original tree) and ac[1] pointing at stage
-			 * 3 (unmerged).  show-modified with
-			 * report-mising set to false does not say the
-			 * file is deleted but reports true if work
-			 * tree does not have it, in which case we
-			 * fall through to report the unmerged state.
-			 * Otherwise, we show the differences between
-			 * the original tree and the work tree.
-			 */
-			if (!cached_only && !show_modified(ce, ac[1], 0))
-				break;
-			/* fallthru */
-		case 3:
-			diff_unmerge(ce->name);
+		}
+		/* Show difference between old and new */
+		show_modified(next, ce, 1);
+		break;
+	case 1:
+		/* No stage 3 (merge) entry? That means it's been deleted */
+		if (!same) {
+			show_file("-", ce, ce->sha1, ce->ce_mode);
 			break;
-
-		default:
-			die("impossible cache entry stage");
 		}
-
-skip_entry:
-		/*
-		 * Ignore all the different stages for this file,
-		 * we've handled the relevant cases now.
+		/* We come here with ce pointing at stage 1
+		 * (original tree) and next pointing at stage
+		 * 3 (unmerged).  show-modified with
+		 * report-mising set to false does not say the
+		 * file is deleted but reports true if work
+		 * tree does not have it, in which case we
+		 * fall through to report the unmerged state.
+		 * Otherwise, we show the differences between
+		 * the original tree and the work tree.
 		 */
-		do {
-			ac++;
-			entries--;
-		} while (entries && ce_same_name(ce, ac[0]));
+		if (!cached_only && !show_modified(ce, next, 0))
+			break;
+		/* fallthru */
+	case 3:
+		diff_unmerge(ce->name);
+		break;
+	default:
+		die("impossible cache entry stage");
+		break;
 	}
+
+skip_entry:
+	/*
+	 * Ignore all the different stages for this file,
+	 * we've handled the relevant cases now.
+	 */
+	next_name(cc, ce);
 	return 0;
 }
 
@@ -152,15 +154,12 @@ skip_entry:
  * when we read in the new tree (into "stage 1"), we won't lose sight
  * of the fact that we had unmerged entries.
  */
-static void mark_merge_entries(void)
+static int mark_one_entry(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	int i;
-	for (i = 0; i < active_nr; i++) {
-		struct cache_entry *ce = active_cache[i];
-		if (!ce_stage(ce))
-			continue;
+	if (ce_stage(ce))
 		ce->ce_flags |= htons(CE_STAGEMASK);
-	}
+	next_cc(cc);
+	return 0;
 }
 
 static const char diff_cache_usage[] =
@@ -173,10 +172,8 @@ int main(int argc, char **argv)
 	const char *tree_name = NULL;
 	unsigned char sha1[20];
 	const char *prefix = setup_git_directory();
-	const char **pathspec = NULL;
 	void *tree;
 	unsigned long size;
-	int ret;
 	int allow_options = 1;
 	int i;
 
@@ -271,12 +268,13 @@ int main(int argc, char **argv)
 	if (!tree_name || get_sha1(tree_name, sha1))
 		usage(diff_cache_usage);
 
-	read_cache();
+	if (read_cache() < 0)
+		die("unable to read index file");
 
 	/* The rest is for paths restriction. */
 	diff_setup(diff_setup_opt);
 
-	mark_merge_entries();
+	walk_cache(mark_one_entry);
 
 	tree = read_object_with_reference(sha1, "tree", &size, NULL);
 	if (!tree)
@@ -284,7 +282,7 @@ int main(int argc, char **argv)
 	if (read_tree(tree, size, 1, pathspec))
 		die("unable to read tree object %s", tree_name);
 
-	ret = diff_cache(active_cache, active_nr, pathspec);
+	walk_cache(diff_one);
 
 	diffcore_std(pathspec,
 		     detect_rename, diff_score_opt,
@@ -292,5 +290,5 @@ int main(int argc, char **argv)
 		     diff_break_opt,
 		     orderfile, diff_filter);
 	diff_flush(diff_output_format, diff_line_termination);
-	return ret;
+	return 0;
 }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 05/22] teach diff-files.c about cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (3 preceding siblings ...)
  2005-09-12 14:55 ` [PATCH 04/22] teach diff-index.c " Chuck Lever
@ 2005-09-12 14:55 ` Chuck Lever
  2005-09-12 14:55 ` [PATCH 06/22] teach diff-stages.c " Chuck Lever
                   ` (18 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 diff-files.c |   79 ++++++++++++++++++++++++++++++----------------------------
 1 files changed, 41 insertions(+), 38 deletions(-)

diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -11,6 +11,7 @@ static const char diff_files_usage[] =
 "[<common diff options>] [<path>...]"
 COMMON_DIFF_OPTIONS_HELP;
 
+static const unsigned char null_sha1[20] = { 0, };
 static int diff_output_format = DIFF_FORMAT_RAW;
 static int diff_line_termination = '\n';
 static int detect_rename = 0;
@@ -22,6 +23,7 @@ static int pickaxe_opts = 0;
 static int diff_break_opt = -1;
 static const char *orderfile = NULL;
 static const char *diff_filter = NULL;
+static const char **pathspec;
 static int silent = 0;
 
 static void show_unmerge(const char *path)
@@ -41,12 +43,47 @@ static void show_modified(int oldmode, i
 	diff_change(oldmode, mode, old_sha1, sha1, path, NULL);
 }
 
+static int diff_one_file(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	struct stat st;
+	unsigned int oldmode;
+	int changed;
+
+	if (!ce_path_match(ce, pathspec))
+		goto out;
+
+	if (ce_stage(ce)) {
+		show_unmerge(ce->name);
+		next_name(cc, ce);
+		return 0;
+	}
+
+	if (lstat(ce->name, &st) < 0) {
+		if (errno != ENOENT && errno != ENOTDIR) {
+			perror(ce->name);
+			goto out;
+		}
+		if (silent)
+			goto out;
+		show_file('-', ce);
+		goto out;
+	}
+	changed = ce_match_stat(ce, &st);
+	if (!changed && !find_copies_harder)
+		goto out;
+	oldmode = ntohl(ce->ce_mode);
+	show_modified(oldmode, DIFF_FILE_CANON_MODE(st.st_mode),
+		      ce->sha1, (changed ? null_sha1 : ce->sha1),
+		      ce->name);
+out:
+	next_cc(cc);
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
-	static const unsigned char null_sha1[20] = { 0, };
-	const char **pathspec;
 	const char *prefix = setup_git_directory();
-	int entries, i;
+	int entries;
 
 	while (1 < argc && argv[1][0] == '-') {
 		if (!strcmp(argv[1], "-p") || !strcmp(argv[1], "-u"))
@@ -112,42 +149,8 @@ int main(int argc, char **argv)
 
 	diff_setup(diff_setup_opt);
 
-	for (i = 0; i < entries; i++) {
-		struct stat st;
-		unsigned int oldmode;
-		struct cache_entry *ce = active_cache[i];
-		int changed;
-
-		if (!ce_path_match(ce, pathspec))
-			continue;
-
-		if (ce_stage(ce)) {
-			show_unmerge(ce->name);
-			while (i < entries &&
-			       !strcmp(ce->name, active_cache[i]->name))
-				i++;
-			i--; /* compensate for loop control increments */
-			continue;
-		}
+	walk_cache(diff_one_file);
 
-		if (lstat(ce->name, &st) < 0) {
-			if (errno != ENOENT && errno != ENOTDIR) {
-				perror(ce->name);
-				continue;
-			}
-			if (silent)
-				continue;
-			show_file('-', ce);
-			continue;
-		}
-		changed = ce_match_stat(ce, &st);
-		if (!changed && !find_copies_harder)
-			continue;
-		oldmode = ntohl(ce->ce_mode);
-		show_modified(oldmode, DIFF_FILE_CANON_MODE(st.st_mode),
-			      ce->sha1, (changed ? null_sha1 : ce->sha1),
-			      ce->name);
-	}
 	diffcore_std(pathspec, 
 		     detect_rename, diff_score_opt,
 		     pickaxe, pickaxe_opts,

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 06/22] teach diff-stages.c about cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (4 preceding siblings ...)
  2005-09-12 14:55 ` [PATCH 05/22] teach diff-files.c " Chuck Lever
@ 2005-09-12 14:55 ` Chuck Lever
  2005-09-12 14:55 ` [PATCH 07/22] teach fsck-objects.c to use " Chuck Lever
                   ` (17 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 diff-stages.c |   73 +++++++++++++++++++++++++++++----------------------------
 1 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/diff-stages.c b/diff-stages.c
--- a/diff-stages.c
+++ b/diff-stages.c
@@ -17,54 +17,55 @@ static int diff_break_opt = -1;
 static const char *orderfile = NULL;
 static const char *diff_filter = NULL;
 
+static int stage1, stage2;
+
 static const char diff_stages_usage[] =
 "git-diff-stages [<common diff options>] <stage1> <stage2> [<path>...]"
 COMMON_DIFF_OPTIONS_HELP;
 
-static void diff_stages(int stage1, int stage2)
+static int diff_one(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	int i = 0;
-	while (i < active_nr) {
-		struct cache_entry *ce, *stages[4] = { NULL, };
-		struct cache_entry *one, *two;
-		const char *name;
-		int len;
-		ce = active_cache[i];
-		len = ce_namelen(ce);
-		name = ce->name;
-		for (;;) {
-			int stage = ce_stage(ce);
-			stages[stage] = ce;
-			if (active_nr <= ++i)
-				break;
-			ce = active_cache[i];
-			if (ce_namelen(ce) != len ||
-			    memcmp(name, ce->name, len))
-				break;
-		}
-		one = stages[stage1];
-		two = stages[stage2];
-		if (!one && !two)
-			continue;
-		if (!one)
-			diff_addremove('+', ntohl(two->ce_mode),
-				       two->sha1, name, NULL);
-		else if (!two)
-			diff_addremove('-', ntohl(one->ce_mode),
-				       one->sha1, name, NULL);
+	struct cache_entry *one, *two, *stages[4] = { NULL, };
+	const char *name = ce->name;
+	int len = ce_namelen(ce);
+
+	for (;;) {
+		int stage = ce_stage(ce);
+		stages[stage] = ce;
+		next_cc(cc);
+		if (cache_eof(cc))
+			break;
+		ce = cc_to_ce(cc);
+		if (ce_namelen(ce) != len ||
+			memcmp(name, ce->name, len))
+			break;
+	}
+
+	one = stages[stage1];
+	two = stages[stage2];
+	if (!one && !two)
+		goto out;
+	if (!one)
+		diff_addremove('+', ntohl(two->ce_mode),
+					two->sha1, name, NULL);
+	else if (!two)
+		diff_addremove('-', ntohl(one->ce_mode),
+					one->sha1, name, NULL);
 		else if (memcmp(one->sha1, two->sha1, 20) ||
 			 (one->ce_mode != two->ce_mode) ||
-			 find_copies_harder)
+			find_copies_harder)
 			diff_change(ntohl(one->ce_mode), ntohl(two->ce_mode),
-				    one->sha1, two->sha1, name, NULL);
-	}
+					one->sha1, two->sha1, name, NULL);
+out:
+	next_cc(cc);
+	return 0;
 }
 
 int main(int ac, const char **av)
 {
-	int stage1, stage2;
+	if (read_cache() < 0)
+		die("unable to read index file");
 
-	read_cache();
 	while (1 < ac && av[1][0] == '-') {
 		const char *arg = av[1];
 		if (!strcmp(arg, "-r"))
@@ -117,7 +118,7 @@ int main(int ac, const char **av)
 	av += 3; /* The rest from av[0] are for paths restriction. */
 	diff_setup(diff_setup_opt);
 
-	diff_stages(stage1, stage2);
+	walk_cache(diff_one);
 
 	diffcore_std(av,
 		     detect_rename, diff_score_opt,

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 07/22] teach fsck-objects.c to use cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (5 preceding siblings ...)
  2005-09-12 14:55 ` [PATCH 06/22] teach diff-stages.c " Chuck Lever
@ 2005-09-12 14:55 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 08/22] teach ls-files.c " Chuck Lever
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:55 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 fsck-objects.c |   27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/fsck-objects.c b/fsck-objects.c
--- a/fsck-objects.c
+++ b/fsck-objects.c
@@ -412,6 +412,20 @@ static int fsck_head_link(void)
 	return 0;
 }
 
+static int mark_one_blob_reachable(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	struct blob *blob = lookup_blob(ce->sha1);
+
+	if (blob) {
+		struct object *obj = &blob->object;
+		obj->used = 1;
+		mark_reachable(obj, REACHABLE);
+	}
+
+	next_cc(cc);
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	int i, heads;
@@ -519,17 +533,8 @@ int main(int argc, char **argv)
 	}
 
 	if (keep_cache_objects) {
-		int i;
-		read_cache();
-		for (i = 0; i < active_nr; i++) {
-			struct blob *blob = lookup_blob(active_cache[i]->sha1);
-			struct object *obj;
-			if (!blob)
-				continue;
-			obj = &blob->object;
-			obj->used = 1;
-			mark_reachable(obj, REACHABLE);
-		}
+		if (read_cache() > 0)
+			walk_cache(mark_one_blob_reachable);
 	}
 
 	check_connectivity();

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 08/22] teach ls-files.c to use cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (6 preceding siblings ...)
  2005-09-12 14:55 ` [PATCH 07/22] teach fsck-objects.c to use " Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 09/22] teach read-tree.c " Chuck Lever
                   ` (15 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 ls-files.c |   58 +++++++++++++++++++++++++++++++++-------------------------
 1 files changed, 33 insertions(+), 25 deletions(-)

diff --git a/ls-files.c b/ls-files.c
--- a/ls-files.c
+++ b/ls-files.c
@@ -414,14 +414,38 @@ static void show_ce_entry(const char *ta
 		       ce->name + offset, line_terminator); 
 }
 
-static void show_files(void)
+static int show_one_cached(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	int i;
+	if (excluded(ce->name) != show_ignored)
+		goto out;
+	if (show_unmerged && !ce_stage(ce))
+		goto out;
+	show_ce_entry(ce_stage(ce) ? tag_unmerged : tag_cached, ce);
+out:
+	next_cc(cc);
+	return 0;
+}
 
+static int show_one_deleted(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	struct stat st;
+
+	if (excluded(ce->name) != show_ignored)
+		goto out;
+	if (!lstat(ce->name, &st))
+		goto out;
+	show_ce_entry(tag_removed, ce);
+out:
+	next_cc(cc);
+	return 0;
+}
+
+static void show_files(void)
+{
 	/* For cached/deleted files we don't need to even do the readdir */
 	if (show_others || show_killed) {
 		const char *path = ".", *base = "";
-		int baselen = prefix_len;
+		int i, baselen = prefix_len;
 
 		if (baselen)
 			path = base = prefix;
@@ -433,27 +457,10 @@ static void show_files(void)
 		if (show_killed)
 			show_killed_files();
 	}
-	if (show_cached | show_stage) {
-		for (i = 0; i < active_nr; i++) {
-			struct cache_entry *ce = active_cache[i];
-			if (excluded(ce->name) != show_ignored)
-				continue;
-			if (show_unmerged && !ce_stage(ce))
-				continue;
-			show_ce_entry(ce_stage(ce) ? tag_unmerged : tag_cached, ce);
-		}
-	}
-	if (show_deleted) {
-		for (i = 0; i < active_nr; i++) {
-			struct cache_entry *ce = active_cache[i];
-			struct stat st;
-			if (excluded(ce->name) != show_ignored)
-				continue;
-			if (!lstat(ce->name, &st))
-				continue;
-			show_ce_entry(tag_removed, ce);
-		}
-	}
+	if (show_cached | show_stage)
+		walk_cache(show_one_cached);
+	if (show_deleted)
+		walk_cache(show_one_deleted);
 }
 
 /*
@@ -633,7 +640,8 @@ int main(int argc, char **argv)
 	if (!(show_stage | show_deleted | show_others | show_unmerged | show_killed))
 		show_cached = 1;
 
-	read_cache();
+	if (read_cache() < 0)
+		die("unable to read index file");
 	if (prefix)
 		prune_cache();
 	show_files();

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 09/22] teach read-tree.c to use cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (7 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 08/22] teach ls-files.c " Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 10/22] teach update-index.c about cache cursors Chuck Lever
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Merging will wait for another patch.

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 read-tree.c |   66 +++++++++++++++++++++++++++++++++--------------------------
 1 files changed, 37 insertions(+), 29 deletions(-)

diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -233,7 +233,7 @@ static void reject_merge(struct cache_en
 	    ce->name);
 }
 
-static void check_updates(struct cache_entry **src, int nr)
+static int check_one_out(struct cache_cursor *cc, struct cache_entry *ce)
 {
 	static struct checkout state = {
 		.base_dir = "",
@@ -242,19 +242,21 @@ static void check_updates(struct cache_e
 		.refresh_cache = 1,
 	};
 	unsigned short mask = htons(CE_UPDATE);
-	while (nr--) {
-		struct cache_entry *ce = *src++;
-		if (!ce->ce_mode) {
-			if (update)
-				unlink(ce->name);
-			continue;
-		}
-		if (ce->ce_flags & mask) {
-			ce->ce_flags &= ~mask;
-			if (update)
-				checkout_entry(ce, &state);
-		}
+
+	if (!ce->ce_mode) {
+		if (update)
+			unlink(ce->name);
+		goto out;
 	}
+
+	if (ce->ce_flags & mask) {
+		ce->ce_flags &= ~mask;
+		if (update)
+			checkout_entry(ce, &state);
+	}
+out:
+	next_cc(cc);
+	return 0;
 }
 
 static int unpack_trees(merge_fn_t fn)
@@ -273,7 +275,7 @@ static int unpack_trees(merge_fn_t fn)
 	if (unpack_trees_rec(posns, len, "", fn, &indpos))
 		return -1;
 
-	check_updates(active_cache, active_nr);
+	walk_cache(check_one_out);
 	return 0;
 }
 
@@ -553,24 +555,30 @@ static int oneway_merge(struct cache_ent
 	return merged_entry(a, NULL);
 }
 
-static int read_cache_unmerged(void)
+static int deleted = 0;
+static struct cache_cursor dst;
+
+static int read_one_unmerged(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	int i, deleted;
-	struct cache_entry **dst;
+	if (ce_stage(ce)) {
+		deleted++;
+		goto out;
+	}
+	if (deleted)
+		set_ce_at_cursor(&dst, ce);
+	next_cc(&dst);
+out:
+	next_cc(cc);
+	return 0;
+}
 
-	read_cache();
-	dst = active_cache;
-	deleted = 0;
-	for (i = 0; i < active_nr; i++) {
-		struct cache_entry *ce = active_cache[i];
-		if (ce_stage(ce)) {
-			deleted++;
-			continue;
-		}
-		if (deleted)
-			*dst = ce;
-		dst++;
+static int read_cache_unmerged(void)
+{
+	if (read_cache() > 0) {
+		init_cc(&dst);
+		walk_cache(read_one_unmerged);
 	}
+
 	active_nr -= deleted;
 	return deleted;
 }

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 10/22] teach update-index.c about cache cursors
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (8 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 09/22] teach read-tree.c " Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 11/22] teach write-tree.c to use cache iterators Chuck Lever
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 update-index.c |   62 +++++++++++++++++++++++++-------------------------------
 1 files changed, 28 insertions(+), 34 deletions(-)

diff --git a/update-index.c b/update-index.c
--- a/update-index.c
+++ b/update-index.c
@@ -13,7 +13,7 @@
  * files be revision controlled.
  */
 static int allow_add = 0, allow_remove = 0, allow_replace = 0, not_new = 0, quiet = 0, info_only = 0;
-static int force_remove;
+static int force_remove, has_errors = 0;
 
 /* Three functions to allow overloaded pointer return; see linux/err.h */
 static inline void *ERR_PTR(long error)
@@ -190,41 +190,35 @@ static struct cache_entry *refresh_entry
 	return updated;
 }
 
-static int refresh_cache(void)
+static int refresh_one(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	int i;
-	int has_errors = 0;
+	struct cache_entry *new;
 
-	for (i = 0; i < active_nr; i++) {
-		struct cache_entry *ce, *new;
-		ce = active_cache[i];
-		if (ce_stage(ce)) {
-			printf("%s: needs merge\n", ce->name);
-			has_errors = 1;
-			while ((i < active_nr) &&
-			       ! strcmp(active_cache[i]->name, ce->name))
-				i++;
-			i--;
-			continue;
-		}
+	if (ce_stage(ce)) {
+		printf("%s: needs merge\n", ce->name);
+		has_errors = 1;
+		next_name(cc, ce);
+		return 0;
+	}
 
-		new = refresh_entry(ce);
-		if (IS_ERR(new)) {
-			if (not_new && PTR_ERR(new) == -ENOENT)
-				continue;
-			if (quiet)
-				continue;
-			printf("%s: needs update\n", ce->name);
-			has_errors = 1;
-			continue;
-		}
-		active_cache_changed = 1;
-		/* You can NOT just free active_cache[i] here, since it
-		 * might not be necessarily malloc()ed but can also come
-		 * from mmap(). */
-		active_cache[i] = new;
+	new = refresh_entry(ce);
+	if (IS_ERR(new)) {
+		if (not_new && PTR_ERR(new) == -ENOENT)
+			return 0;
+		if (quiet)
+			return 0;
+		printf("%s: needs update\n", ce->name);
+		has_errors = 1;
+		next_cc(cc);
+		return 0;
 	}
-	return has_errors;
+
+	/* You can NOT just free active_cache[i] here, since it
+	 * might not be necessarily malloc()ed but can also come
+	 * from mmap(). */
+	set_ce_at_cursor(cc, new);
+	next_cc(cc);
+	return 0;
 }
 
 /*
@@ -323,7 +317,7 @@ static struct cache_file cache_file;
 
 int main(int argc, char **argv)
 {
-	int i, newfd, entries, has_errors = 0;
+	int i, newfd, entries;
 	int allow_options = 1;
 	const char *prefix = setup_git_directory();
 
@@ -360,7 +354,7 @@ int main(int argc, char **argv)
 				continue;
 			}
 			if (!strcmp(path, "--refresh")) {
-				has_errors |= refresh_cache();
+				walk_cache(refresh_one);
 				continue;
 			}
 			if (!strcmp(path, "--cacheinfo")) {

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 11/22] teach write-tree.c to use cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (9 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 10/22] teach update-index.c about cache cursors Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 12/22] simplify write_cache() calling sequence Chuck Lever
                   ` (12 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 write-tree.c |  110 ++++++++++++++++++++++++++++++++++++----------------------
 1 files changed, 69 insertions(+), 41 deletions(-)

diff --git a/write-tree.c b/write-tree.c
--- a/write-tree.c
+++ b/write-tree.c
@@ -5,7 +5,7 @@
  */
 #include "cache.h"
 
-static int missing_ok = 0;
+static int funny, missing_ok = 0;
 
 static int check_valid_sha1(unsigned char *sha1)
 {
@@ -18,8 +18,9 @@ static int check_valid_sha1(unsigned cha
 	return ret ? 0 : -1;
 }
 
-static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
+static int write_tree(struct cache_cursor *cc, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
 {
+	struct cache_cursor next = *cc;
 	unsigned char subdir_sha1[20];
 	unsigned long size, offset;
 	char *buffer;
@@ -32,7 +33,7 @@ static int write_tree(struct cache_entry
 
 	nr = 0;
 	while (nr < maxentries) {
-		struct cache_entry *ce = cachep[nr];
+		struct cache_entry *ce = cc_to_ce(&next);
 		const char *pathname = ce->name, *filename, *dirname;
 		int pathlen = ce_namelen(ce), entrylen;
 		unsigned char *sha1;
@@ -51,15 +52,20 @@ static int write_tree(struct cache_entry
 		if (dirname) {
 			int subdir_written;
 
-			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
-			nr += subdir_written;
-
+			subdir_written = write_tree(&next,
+						    maxentries - nr,
+						    pathname,
+						    dirname - pathname + 1,
+						    subdir_sha1);
+			
 			/* Now we need to write out the directory entry into this tree.. */
 			mode = S_IFDIR;
 			pathlen = dirname - pathname;
 
 			/* ..but the directory entry doesn't count towards the total count */
-			nr--;
+			nr += subdir_written - 1;
+			while (--subdir_written)
+				next_cc(&next);
 			sha1 = subdir_sha1;
 		}
 
@@ -76,6 +82,7 @@ static int write_tree(struct cache_entry
 		memcpy(buffer + offset, sha1, 20);
 		offset += 20;
 		nr++;
+		next_cc(&next);
 	}
 
 	write_sha1_file(buffer, offset, "tree", returnsha1);
@@ -83,11 +90,57 @@ static int write_tree(struct cache_entry
 	return nr;
 }
 
+static int verify_merged(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	if (ntohs(ce->ce_flags) & ~CE_NAMEMASK) {
+		if (10 < ++funny) {
+			fprintf(stderr, "...\n");
+			return -1;
+		}
+		fprintf(stderr, "%s: unmerged (%s)\n", ce->name, sha1_to_hex(ce->sha1));
+	}
+
+	next_cc(cc);
+	return 0;
+}
+
+/*
+ * path/file always comes after path because of the way
+ * the cache is sorted.  Also path can appear only once,
+ * which means conflicting one would immediately follow.
+ */
+static int verify_path(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	struct cache_entry *next;
+	const char *next_name, *this_name = ce->name;
+	int this_len = strlen(this_name);
+
+	/* don't check the last cache entry */
+	next_cc(cc);
+	if (cache_eof(cc))
+		return -1;
+	next = cc_to_ce(cc);
+	next_name = next->name;
+
+	if (this_len < strlen(next_name) &&
+	    strncmp(this_name, next_name, this_len) == 0 &&
+	    next_name[this_len] == '/') {
+		if (10 < ++funny) {
+			fprintf(stderr, "...\n");
+			return -1;
+		}
+		fprintf(stderr, "You have both %s and %s\n",
+			this_name, next_name);
+	}
+
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
-	int i, funny;
-	int entries = read_cache();
+	struct cache_cursor cc;
 	unsigned char sha1[20];
+	int entries;
 	
 	if (argc == 2) {
 		if (!strcmp(argv[1], "--missing-ok"))
@@ -99,53 +152,28 @@ int main(int argc, char **argv)
 	if (argc > 2)
 		die("too many options");
 
+	entries = read_cache();
 	if (entries < 0)
 		die("git-write-tree: error reading cache");
 
 	/* Verify that the tree is merged */
 	funny = 0;
-	for (i = 0; i < entries; i++) {
-		struct cache_entry *ce = active_cache[i];
-		if (ntohs(ce->ce_flags) & ~CE_NAMEMASK) {
-			if (10 < ++funny) {
-				fprintf(stderr, "...\n");
-				break;
-			}
-			fprintf(stderr, "%s: unmerged (%s)\n", ce->name, sha1_to_hex(ce->sha1));
-		}
-	}
+	walk_cache(verify_merged);
 	if (funny)
-		die("git-write-tree: not able to write tree");
+		die("git-write-tree: verify_merged: not able to write tree");
 
 	/* Also verify that the cache does not have path and path/file
 	 * at the same time.  At this point we know the cache has only
 	 * stage 0 entries.
 	 */
 	funny = 0;
-	for (i = 0; i < entries - 1; i++) {
-		/* path/file always comes after path because of the way
-		 * the cache is sorted.  Also path can appear only once,
-		 * which means conflicting one would immediately follow.
-		 */
-		const char *this_name = active_cache[i]->name;
-		const char *next_name = active_cache[i+1]->name;
-		int this_len = strlen(this_name);
-		if (this_len < strlen(next_name) &&
-		    strncmp(this_name, next_name, this_len) == 0 &&
-		    next_name[this_len] == '/') {
-			if (10 < ++funny) {
-				fprintf(stderr, "...\n");
-				break;
-			}
-			fprintf(stderr, "You have both %s and %s\n",
-				this_name, next_name);
-		}
-	}
+	walk_cache(verify_path);
 	if (funny)
-		die("git-write-tree: not able to write tree");
+		die("git-write-tree: verify_path: not able to write tree");
 
 	/* Ok, write it out */
-	if (write_tree(active_cache, entries, "", 0, sha1) != entries)
+	init_cc(&cc);
+	if (write_tree(&cc, entries, "", 0, sha1) != entries)
 		die("git-write-tree: internal error");
 	printf("%s\n", sha1_to_hex(sha1));
 	return 0;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 12/22] simplify write_cache() calling sequence
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (10 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 11/22] teach write-tree.c to use cache iterators Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 13/22] move purge_cache() to read-cache.c Chuck Lever
                   ` (11 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Clean-up:  Hide some external references to "active_cache" and "active_nr"
by simplifying the calling sequence of write_cache().

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 apply.c          |    3 +--
 cache.h          |    2 +-
 checkout-index.c |    3 +--
 read-cache.c     |   41 +++++++++++++++++++++++++++--------------
 read-tree.c      |    3 +--
 update-index.c   |    3 +--
 6 files changed, 32 insertions(+), 23 deletions(-)

diff --git a/apply.c b/apply.c
--- a/apply.c
+++ b/apply.c
@@ -1440,8 +1440,7 @@ static int apply_patch(int fd)
 		write_out_results(list, skipped_patch);
 
 	if (write_index) {
-		if (write_cache(newfd, active_cache, active_nr) ||
-		    commit_index_file(&cache_file))
+		if (write_cache(newfd) || commit_index_file(&cache_file))
 			die("Unable to write new cachefile");
 	}
 
diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -157,7 +157,7 @@ extern char *prefix_path(const char *pre
 
 /* Initialize and use the cache information */
 extern int read_cache(void);
-extern int write_cache(int newfd, struct cache_entry **cache, int entries);
+extern int write_cache(int newfd);
 extern int cache_name_pos(const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
diff --git a/checkout-index.c b/checkout-index.c
--- a/checkout-index.c
+++ b/checkout-index.c
@@ -138,8 +138,7 @@ int main(int argc, char **argv)
 	}
 
 	if (0 <= newfd &&
-	    (write_cache(newfd, active_cache, active_nr) ||
-	     commit_index_file(&cache_file)))
+	    (write_cache(newfd) || commit_index_file(&cache_file)))
 		die("Unable to write new cachefile");
 	return 0;
 }
diff --git a/read-cache.c b/read-cache.c
--- a/read-cache.c
+++ b/read-cache.c
@@ -470,30 +470,43 @@ static int ce_flush(SHA_CTX *context, in
 	return 0;
 }
 
-int write_cache(int newfd, struct cache_entry **cache, int entries)
+static int fd, removed = 0;
+static SHA_CTX c;
+
+static int count_removed(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	if (ce->ce_mode == 0)
+		removed++;
+	next_cc(cc);
+	return 0;
+}
+
+static int write_one_cache_entry(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	if (ce->ce_mode != 0)
+		if (ce_write(&c, fd, ce, ce_size(ce)) < 0)
+			return -1;
+	next_cc(cc);
+	return 0;
+}
+
+int write_cache(int newfd)
 {
-	SHA_CTX c;
 	struct cache_header hdr;
-	int i, removed;
 
-	for (i = removed = 0; i < entries; i++)
-		if (!cache[i]->ce_mode)
-			removed++;
+	walk_cache(count_removed);
 
 	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
 	hdr.hdr_version = htonl(2);
-	hdr.hdr_entries = htonl(entries - removed);
+	hdr.hdr_entries = htonl(active_nr - removed);
 
 	SHA1_Init(&c);
 	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
 		return -1;
 
-	for (i = 0; i < entries; i++) {
-		struct cache_entry *ce = cache[i];
-		if (!ce->ce_mode)
-			continue;
-		if (ce_write(&c, newfd, ce, ce_size(ce)) < 0)
-			return -1;
-	}
+	fd = newfd;
+	if (walk_cache(write_one_cache_entry))
+		return -1;
+
 	return ce_flush(&c, newfd);
 }
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -670,8 +670,7 @@ int main(int argc, char **argv)
 	}
 
 	unpack_trees(fn);
-	if (write_cache(newfd, active_cache, active_nr) ||
-	    commit_index_file(&cache_file))
+	if (write_cache(newfd) || commit_index_file(&cache_file))
 		die("unable to write new index file");
 	return 0;
 }
diff --git a/update-index.c b/update-index.c
--- a/update-index.c
+++ b/update-index.c
@@ -393,8 +393,7 @@ int main(int argc, char **argv)
 		if (add_file_to_cache(path))
 			die("Unable to add %s to database; maybe you want to use --add option?", path);
 	}
-	if (write_cache(newfd, active_cache, active_nr) ||
-	    commit_index_file(&cache_file))
+	if (write_cache(newfd) || commit_index_file(&cache_file))
 		die("Unable to write new cachefile");
 
 	return has_errors ? 1 : 0;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 13/22] move purge_cache() to read-cache.c
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (11 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 12/22] simplify write_cache() calling sequence Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 14/22] move read_cache_unmerged into read-cache.c Chuck Lever
                   ` (10 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Functions that manipulate active_cache and active_nr should be in one place.

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 cache.h      |    1 +
 ls-files.c   |   28 +---------------------------
 read-cache.c |   26 ++++++++++++++++++++++++++
 3 files changed, 28 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -165,6 +165,7 @@ extern int cache_name_pos(const char *na
 extern int add_cache_entry(struct cache_entry *ce, int option);
 extern int remove_cache_entry_at(int pos);
 extern int remove_file_from_cache(char *path);
+extern void prune_cache(const char *prefix, int prefix_len);
 extern int ce_same_name(struct cache_entry *a, struct cache_entry *b);
 extern int ce_match_stat(struct cache_entry *ce, struct stat *st);
 extern int ce_path_match(const struct cache_entry *ce, const char **pathspec);
diff --git a/ls-files.c b/ls-files.c
--- a/ls-files.c
+++ b/ls-files.c
@@ -463,32 +463,6 @@ static void show_files(void)
 		walk_cache(show_one_deleted);
 }
 
-/*
- * Prune the index to only contain stuff starting with "prefix"
- */
-static void prune_cache(void)
-{
-	int pos = cache_name_pos(prefix, prefix_len);
-	unsigned int first, last;
-
-	if (pos < 0)
-		pos = -pos-1;
-	active_cache += pos;
-	active_nr -= pos;
-	first = 0;
-	last = active_nr;
-	while (last > first) {
-		int next = (last + first) >> 1;
-		struct cache_entry *ce = active_cache[next];
-		if (!strncmp(ce->name, prefix, prefix_len)) {
-			first = next+1;
-			continue;
-		}
-		last = next;
-	}
-	active_nr = last;
-}
-
 static void verify_pathspec(void)
 {
 	const char **p, *n, *prev;
@@ -643,7 +617,7 @@ int main(int argc, char **argv)
 	if (read_cache() < 0)
 		die("unable to read index file");
 	if (prefix)
-		prune_cache();
+		prune_cache(prefix, prefix_len);
 	show_files();
 	return 0;
 }
diff --git a/read-cache.c b/read-cache.c
--- a/read-cache.c
+++ b/read-cache.c
@@ -165,6 +165,32 @@ int remove_file_from_cache(char *path)
 	return 0;
 }
 
+/*
+ * Prune the index to only contain stuff starting with "prefix"
+ */
+void prune_cache(const char *prefix, int prefix_len)
+{
+	int pos = cache_name_pos(prefix, prefix_len);
+	unsigned int first, last;
+
+	if (pos < 0)
+		pos = -pos-1;
+	active_cache += pos;
+	active_nr -= pos;
+	first = 0;
+	last = active_nr;
+	while (last > first) {
+		int next = (last + first) >> 1;
+		struct cache_entry *ce = active_cache[next];
+		if (!strncmp(ce->name, prefix, prefix_len)) {
+			first = next+1;
+			continue;
+		}
+		last = next;
+	}
+	active_nr = last;
+}
+
 int ce_same_name(struct cache_entry *a, struct cache_entry *b)
 {
 	int len = ce_namelen(a);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 14/22] move read_cache_unmerged into read-cache.c
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (12 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 13/22] move purge_cache() to read-cache.c Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 15/22] replace cache_name_pos Chuck Lever
                   ` (9 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Clean-up: put read_cache_unmerged() right next to read_cache().
read_cache_unmerged() will likely need to be reconstructed if
the active cache data type changes.

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 cache.h      |    1 +
 read-cache.c |   31 +++++++++++++++++++++++++++++++
 read-tree.c  |   28 ----------------------------
 3 files changed, 32 insertions(+), 28 deletions(-)

diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -157,6 +157,7 @@ extern char *prefix_path(const char *pre
 
 /* Initialize and use the cache information */
 extern int read_cache(void);
+extern int read_cache_unmerged(void);
 extern int write_cache(int newfd);
 extern int cache_name_pos(const char *name, int namelen);
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
diff --git a/read-cache.c b/read-cache.c
--- a/read-cache.c
+++ b/read-cache.c
@@ -453,6 +453,37 @@ unmap:
 	return error("verify header failed");
 }
 
+static int deleted = 0;
+static struct cache_cursor dst;
+
+static int read_one_unmerged(struct cache_cursor *cc, struct cache_entry *ce)
+{
+	if (ce_stage(ce)) {
+		deleted++;
+		goto out;
+	}
+	if (deleted)
+		set_ce_at_cursor(&dst, ce);
+	next_cc(&dst);
+out:
+	next_cc(cc);
+	return 0;
+}
+
+/*
+ * Read in the cache, then throw away the unmerged entries
+ */
+int read_cache_unmerged(void)
+{
+	if (read_cache() > 0) {
+		init_cc(&dst);
+		walk_cache(read_one_unmerged);
+	}
+
+	active_nr -= deleted;
+	return deleted;
+}
+
 #define WRITE_BUFFER_SIZE 8192
 static unsigned char write_buffer[WRITE_BUFFER_SIZE];
 static unsigned long write_buffer_len;
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -555,34 +555,6 @@ static int oneway_merge(struct cache_ent
 	return merged_entry(a, NULL);
 }
 
-static int deleted = 0;
-static struct cache_cursor dst;
-
-static int read_one_unmerged(struct cache_cursor *cc, struct cache_entry *ce)
-{
-	if (ce_stage(ce)) {
-		deleted++;
-		goto out;
-	}
-	if (deleted)
-		set_ce_at_cursor(&dst, ce);
-	next_cc(&dst);
-out:
-	next_cc(cc);
-	return 0;
-}
-
-static int read_cache_unmerged(void)
-{
-	if (read_cache() > 0) {
-		init_cc(&dst);
-		walk_cache(read_one_unmerged);
-	}
-
-	active_nr -= deleted;
-	return deleted;
-}
-
 static const char read_tree_usage[] = "git-read-tree (<sha> | -m [-u] <sha1> [<sha2> [<sha3>]])";
 
 static struct cache_file cache_file;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 15/22] replace cache_name_pos
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (13 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 14/22] move read_cache_unmerged into read-cache.c Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 16/22] teach apply.c to use cache_find_name() Chuck Lever
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Clean up: Introduce an interface to return a cache_cursor instead of
an integer.  Note we can also eliminate the need to overload the
return value of cache_name_pos to return a negative "pos" value to
signal an insertion point rather than a found entry.

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 cache.h      |   13 +++++++++++++
 read-cache.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -160,6 +160,8 @@ extern int read_cache(void);
 extern int read_cache_unmerged(void);
 extern int write_cache(int newfd);
 extern int cache_name_pos(const char *name, int namelen);
+extern int cache_find_name(const char *name, int namelen, struct cache_cursor *cc);
+
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
 #define ADD_CACHE_SKIP_DFCHECK 4	/* Ok to skip DF conflict checks */
@@ -350,6 +352,17 @@ static inline int walk_cache(cache_itera
 	return 0;
 }
 
+static inline int cache_find_entry(const char *name, int namelen, struct cache_entry **ce)
+{
+	struct cache_cursor cc;
+	int result;
+
+	result = cache_find_name(name, namelen, &cc);
+	if (ce)
+		*ce = active_cache[cc.pos];
+	return result;
+}
+
 struct checkout {
 	const char *base_dir;
 	int base_dir_len;
diff --git a/read-cache.c b/read-cache.c
--- a/read-cache.c
+++ b/read-cache.c
@@ -144,6 +144,50 @@ int cache_name_pos(const char *name, int
 	return -first-1;
 }
 
+/*
+ * Given a name, find the first cache entry that matches.  Returning 1
+ * means the cursor points to the cache entry with a matching name.
+ * Returning 0 means the name wasn't found, but the cursor points to an
+ * appropriate insertion point.
+ */
+int cache_find_name(const char *name, int namelen, struct cache_cursor *cc)
+{
+	int first, last;
+
+	/*
+	 * Look for the right name
+	 */
+	cc->pos = first = 0;
+	last = active_nr;
+	while (last > first) {
+		struct cache_entry *ce;
+		int cmp;
+		cc->pos = (last + first) >> 1;
+		ce = active_cache[cc->pos];
+		cmp = cache_name_compare(name, namelen, ce->name, ntohs(ce->ce_flags));
+		if (!cmp) {
+			/* found it */
+			return 1;
+		}
+		if (cmp < 0) {
+			/* next: search [first, cc->pos] */
+			last = cc->pos;
+			continue;
+		}
+		/* next: search [cc->pos + 1, last] */
+		first = cc->pos + 1;
+	}
+
+	/*
+	 * Name not found, so return an insertion point.
+	 *
+	 * On return, callers insert *before* the insertion point,
+	 * not after it, to maintain proper list order.
+	 */
+	cc->pos = first;
+	return 0;
+}
+
 /* Remove entry, return true if there are more entries to go.. */
 int remove_cache_entry_at(int pos)
 {

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 16/22] teach apply.c to use cache_find_name()
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (14 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 15/22] replace cache_name_pos Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 17/22] teach checkout-index.c " Chuck Lever
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 apply.c |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/apply.c b/apply.c
--- a/apply.c
+++ b/apply.c
@@ -1021,10 +1021,10 @@ static int check_patch(struct patch *pat
 		if (lstat(old_name, &st) < 0)
 			return error("%s: %s", old_name, strerror(errno));
 		if (check_index) {
-			int pos = cache_name_pos(old_name, strlen(old_name));
-			if (pos < 0)
+			struct cache_entry *ce;
+			if (!cache_find_entry(old_name, strlen(old_name), &ce))
 				return error("%s: does not exist in index", old_name);
-			changed = ce_match_stat(active_cache[pos], &st);
+			changed = ce_match_stat(ce, &st);
 			if (changed)
 				return error("%s: does not match index", old_name);
 		}
@@ -1041,7 +1041,7 @@ static int check_patch(struct patch *pat
 	}
 
 	if (new_name && (patch->is_new | patch->is_rename | patch->is_copy)) {
-		if (check_index && cache_name_pos(new_name, strlen(new_name)) >= 0)
+		if (check_index && cache_find_entry(new_name, strlen(new_name), NULL))
 			return error("%s: already exists in index", new_name);
 		if (!lstat(new_name, &st))
 			return error("%s: already exists in working directory", new_name);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 17/22] teach checkout-index.c to use cache_find_name()
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (15 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 16/22] teach apply.c to use cache_find_name() Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 18/22] teach diff.c " Chuck Lever
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 checkout-index.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/checkout-index.c b/checkout-index.c
--- a/checkout-index.c
+++ b/checkout-index.c
@@ -45,20 +45,19 @@ static struct checkout state = {
 
 static int checkout_file(const char *name)
 {
-	int pos = cache_name_pos(name, strlen(name));
-	if (pos < 0) {
+	struct cache_entry *ce;
+
+	if (!cache_find_entry(name, strlen(name), &ce)) {
 		if (!state.quiet) {
-			pos = -pos - 1;
 			fprintf(stderr,
 				"git-checkout-index: %s is %s.\n",
 				name,
-				(pos < active_nr &&
-				 !strcmp(active_cache[pos]->name, name)) ?
+				!strcmp(ce->name, name) ?
 				"unmerged" : "not in the cache");
 		}
 		return -1;
 	}
-	return checkout_entry(active_cache[pos], &state);
+	return checkout_entry(ce, &state);
 }
 
 static int checkout_one(struct cache_cursor *cc, struct cache_entry *ce)

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 18/22] teach diff.c to use cache_find_name()
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (16 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 17/22] teach checkout-index.c " Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 19/22] teach ls-files.c " Chuck Lever
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 diff.c |    6 +-----
 1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -244,7 +244,6 @@ static int work_tree_matches(const char 
 {
 	struct cache_entry *ce;
 	struct stat st;
-	int pos, len;
 
 	/* We do not read the cache ourselves here, because the
 	 * benchmark with my previous version that always reads cache
@@ -261,11 +260,8 @@ static int work_tree_matches(const char 
 	if (read_cache_needed())
 		return 0;
 
-	len = strlen(name);
-	pos = cache_name_pos(name, len);
-	if (pos < 0)
+	if (!cache_find_entry(name, strlen(name), &ce))
 		return 0;
-	ce = active_cache[pos];
 	if ((lstat(name, &st) < 0) ||
 	    !S_ISREG(st.st_mode) || /* careful! */
 	    ce_match_stat(ce, &st) ||

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 19/22] teach ls-files.c to use cache_find_name()
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (17 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 18/22] teach diff.c " Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 20/22] teach merge-index.c " Chuck Lever
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 ls-files.c |   30 ++++++++++++++----------------
 1 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/ls-files.c b/ls-files.c
--- a/ls-files.c
+++ b/ls-files.c
@@ -216,7 +216,7 @@ static void add_name(const char *pathnam
 {
 	struct nond_on_fs *ent;
 
-	if (cache_name_pos(pathname, len) >= 0)
+	if (cache_find_entry(pathname, len, NULL))
 		return;
 
 	if (nr_dir == dir_alloc) {
@@ -349,36 +349,34 @@ static void show_killed_files(void)
 	for (i = 0; i < nr_dir; i++) {
 		struct nond_on_fs *ent = dir[i];
 		char *cp, *sp;
-		int pos, len, killed = 0;
+		int killed = 0;
 
 		for (cp = ent->name; cp - ent->name < ent->len; cp = sp + 1) {
 			sp = strchr(cp, '/');
 			if (!sp) {
+				struct cache_cursor cc;
+				struct cache_entry *ce;
 				/* If ent->name is prefix of an entry in the
 				 * cache, it will be killed.
 				 */
-				pos = cache_name_pos(ent->name, ent->len);
-				if (0 <= pos)
+				if (cache_find_name(ent->name, ent->len, &cc))
 					die("bug in show-killed-files");
-				pos = -pos - 1;
-				while (pos < active_nr &&
-				       ce_stage(active_cache[pos]))
-					pos++; /* skip unmerged */
-				if (active_nr <= pos)
+				while (!cache_eof(&cc) && ce_stage(cc_to_ce(&cc)))
+					next_cc(&cc); /* skip unmerged */
+				if (cache_eof(&cc))
 					break;
-				/* pos points at a name immediately after
+				/* cc points at a name immediately after
 				 * ent->name in the cache.  Does it expect
 				 * ent->name to be a directory?
 				 */
-				len = ce_namelen(active_cache[pos]);
-				if ((ent->len < len) &&
-				    !strncmp(active_cache[pos]->name,
-					     ent->name, ent->len) &&
-				    active_cache[pos]->name[ent->len] == '/')
+				ce = cc_to_ce(&cc);
+				if ((ent->len < ce_namelen(ce)) &&
+				    !strncmp(ce->name, ent->name, ent->len) &&
+				    ce->name[ent->len] == '/')
 					killed = 1;
 				break;
 			}
-			if (0 <= cache_name_pos(ent->name, sp - ent->name)) {
+			if (cache_find_entry(ent->name, sp - ent->name, NULL)) {
 				/* If any of the leading directories in
 				 * ent->name is registered in the cache,
 				 * ent->name will be killed.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 20/22] teach merge-index.c to use cache_find_name()
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (18 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 19/22] teach ls-files.c " Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 14:56 ` [PATCH 21/22] teach the merge algorithm about cache iterators Chuck Lever
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 merge-index.c |   44 +++++++++++++++++++++-----------------------
 1 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/merge-index.c b/merge-index.c
--- a/merge-index.c
+++ b/merge-index.c
@@ -37,11 +37,11 @@ static void run_program(void)
 	}
 }
 
-static int merge_entry(int pos, const char *path)
+static void merge_entry(struct cache_cursor *cc, const char *path)
 {
 	int found;
 	
-	if (pos >= active_nr)
+	if (cache_eof(cc))
 		die("git-merge-index: %s not in the cache", path);
 	arguments[0] = pgm;
 	arguments[1] = "";
@@ -55,7 +55,7 @@ static int merge_entry(int pos, const ch
 	do {
 		static char hexbuf[4][60];
 		static char ownbuf[4][60];
-		struct cache_entry *ce = active_cache[pos];
+		struct cache_entry *ce = cc_to_ce(cc);
 		int stage = ce_stage(ce);
 
 		if (strcmp(ce->name, path))
@@ -65,34 +65,31 @@ static int merge_entry(int pos, const ch
 		sprintf(ownbuf[stage], "%o", ntohl(ce->ce_mode) & (~S_IFMT));
 		arguments[stage] = hexbuf[stage];
 		arguments[stage + 4] = ownbuf[stage];
-	} while (++pos < active_nr);
+		next_cc(cc);
+	} while (!cache_eof(cc));
 	if (!found)
 		die("git-merge-index: %s not in the cache", path);
 	run_program();
-	return found;
 }
 
+/*
+ * If it already exists in the cache as stage0, it's
+ * already merged and there is nothing to do.
+ */
 static void merge_file(const char *path)
 {
-	int pos = cache_name_pos(path, strlen(path));
-
-	/*
-	 * If it already exists in the cache as stage0, it's
-	 * already merged and there is nothing to do.
-	 */
-	if (pos < 0)
-		merge_entry(-pos-1, path);
+	struct cache_cursor cc;
+	if (!cache_find_name(path, strlen(path), &cc))
+		merge_entry(&cc, path);
 }
 
-static void merge_all(void)
+static int merge_one(struct cache_cursor *cc, struct cache_entry *ce)
 {
-	int i;
-	for (i = 0; i < active_nr; i++) {
-		struct cache_entry *ce = active_cache[i];
-		if (!ce_stage(ce))
-			continue;
-		i += merge_entry(i, ce->name)-1;
-	}
+	if (ce_stage(ce))
+		merge_entry(cc, ce->name);
+	else
+		next_cc(cc);
+	return 0;
 }
 
 int main(int argc, char **argv)
@@ -102,7 +99,8 @@ int main(int argc, char **argv)
 	if (argc < 3)
 		usage("git-merge-index [-o] [-q] <merge-program> (-a | <filename>*)");
 
-	read_cache();
+	if (read_cache() < 0)
+		die("unable to read index file");
 
 	i = 1;
 	if (!strcmp(argv[i], "-o")) {
@@ -122,7 +120,7 @@ int main(int argc, char **argv)
 				continue;
 			}
 			if (!strcmp(arg, "-a")) {
-				merge_all();
+				walk_cache(merge_one);
 				continue;
 			}
 			die("git-merge-index: unknown option %s", arg);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (19 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 20/22] teach merge-index.c " Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 20:43   ` Daniel Barkalow
  2005-09-12 14:56 ` [PATCH 22/22] teach read-cache.c to use cache_find_name() Chuck Lever
                   ` (2 subsequent siblings)
  23 siblings, 1 reply; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

For now, we simply replace indpos with a cache cursor.  Likely more
changes will be needed after we successfully replace the cache array
with an abstract data type.

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 read-tree.c |   23 ++++++++++++++---------
 1 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -50,7 +50,8 @@ static int entcmp(char *name1, int dir1,
 }
 
 static int unpack_trees_rec(struct tree_entry_list **posns, int len,
-			    const char *base, merge_fn_t fn, int *indpos)
+			    const char *base, merge_fn_t fn,
+			    struct cache_cursor *cc)
 {
 	int baselen = strlen(base);
 	int src_size = len + 1;
@@ -73,7 +74,7 @@ static int unpack_trees_rec(struct tree_
 		cache_name = NULL;
 
 		/* Check the cache */
-		if (merge && *indpos < active_nr) {
+		if (merge && !cache_eof(cc)) {
 			/* This is a bit tricky: */
 			/* If the index has a subdirectory (with
 			 * contents) as the first name, it'll get a
@@ -91,7 +92,7 @@ static int unpack_trees_rec(struct tree_
 			 * file case.
 			 */
 
-			cache_name = active_cache[*indpos]->name;
+			cache_name = cc_to_ce(cc)->name;
 			if (strlen(cache_name) > baselen &&
 			    !memcmp(cache_name, base, baselen)) {
 				cache_name += baselen;
@@ -133,8 +134,8 @@ static int unpack_trees_rec(struct tree_
 
 		if (cache_name && !strcmp(cache_name, first)) {
 			any_files = 1;
-			src[0] = active_cache[*indpos];
-			remove_cache_entry_at(*indpos);
+			src[0] = cc_to_ce(cc);
+			remove_cache_entry_at(cc->pos);
 		}
 
 		for (i = 0; i < len; i++) {
@@ -203,7 +204,8 @@ static int unpack_trees_rec(struct tree_
 #if DBRT_DEBUG > 1
 				printf("Added %d entries\n", ret);
 #endif
-				*indpos += ret;
+				while (ret--)
+					next_cc(cc);
 			} else {
 				for (i = 0; i < src_size; i++) {
 					if (src[i]) {
@@ -219,7 +221,7 @@ static int unpack_trees_rec(struct tree_
 			newbase[baselen + pathlen] = '/';
 			newbase[baselen + pathlen + 1] = '\0';
 			if (unpack_trees_rec(subposns, len, newbase, fn,
-					     indpos))
+					     cc))
 				return -1;
 		}
 		free(subposns);
@@ -261,18 +263,21 @@ out:
 
 static int unpack_trees(merge_fn_t fn)
 {
-	int indpos = 0;
+	struct cache_cursor cc;
 	unsigned len = object_list_length(trees);
 	struct tree_entry_list **posns = 
 		xmalloc(len * sizeof(struct tree_entry_list *));
 	int i;
 	struct object_list *posn = trees;
+
 	merge_size = len;
 	for (i = 0; i < len; i++) {
 		posns[i] = ((struct tree *) posn->item)->entries;
 		posn = posn->next;
 	}
-	if (unpack_trees_rec(posns, len, "", fn, &indpos))
+
+	init_cc(&cc);
+	if (unpack_trees_rec(posns, len, "", fn, &cc))
 		return -1;
 
 	walk_cache(check_one_out);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 22/22] teach read-cache.c to use cache_find_name()
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (20 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 21/22] teach the merge algorithm about cache iterators Chuck Lever
@ 2005-09-12 14:56 ` Chuck Lever
  2005-09-12 15:38 ` [PATCH 00/22] cache cursors: an introduction A Large Angry SCM
  2005-09-12 19:53 ` Junio C Hamano
  23 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 14:56 UTC (permalink / raw)
  To: git

Fix up the functions in read-cache.c to use cache cursors and
cache_find_name().  As a bonus, cache_name_pos() is no longer
needed.

We've now replaced all logic that depends on a negative result
from cache_name_pos to detect cache insertion points.

Signed-off-by: Chuck Lever <cel@netapp.com>
---

 cache.h      |    3 -
 read-cache.c |  118 ++++++++++++++++++++++++----------------------------------
 read-tree.c  |    2 -
 3 files changed, 51 insertions(+), 72 deletions(-)

diff --git a/cache.h b/cache.h
--- a/cache.h
+++ b/cache.h
@@ -159,14 +159,13 @@ extern char *prefix_path(const char *pre
 extern int read_cache(void);
 extern int read_cache_unmerged(void);
 extern int write_cache(int newfd);
-extern int cache_name_pos(const char *name, int namelen);
 extern int cache_find_name(const char *name, int namelen, struct cache_cursor *cc);
 
 #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */
 #define ADD_CACHE_SKIP_DFCHECK 4	/* Ok to skip DF conflict checks */
 extern int add_cache_entry(struct cache_entry *ce, int option);
-extern int remove_cache_entry_at(int pos);
+extern int remove_cache_entry_at(struct cache_cursor *cc);
 extern int remove_file_from_cache(char *path);
 extern void prune_cache(const char *prefix, int prefix_len);
 extern int ce_same_name(struct cache_entry *a, struct cache_entry *b);
diff --git a/read-cache.c b/read-cache.c
--- a/read-cache.c
+++ b/read-cache.c
@@ -123,27 +123,6 @@ int cache_name_compare(const char *name1
 	return 0;
 }
 
-int cache_name_pos(const char *name, int namelen)
-{
-	int first, last;
-
-	first = 0;
-	last = active_nr;
-	while (last > first) {
-		int next = (last + first) >> 1;
-		struct cache_entry *ce = active_cache[next];
-		int cmp = cache_name_compare(name, namelen, ce->name, ntohs(ce->ce_flags));
-		if (!cmp)
-			return next;
-		if (cmp < 0) {
-			last = next;
-			continue;
-		}
-		first = next+1;
-	}
-	return -first-1;
-}
-
 /*
  * Given a name, find the first cache entry that matches.  Returning 1
  * means the cursor points to the cache entry with a matching name.
@@ -189,11 +168,13 @@ int cache_find_name(const char *name, in
 }
 
 /* Remove entry, return true if there are more entries to go.. */
-int remove_cache_entry_at(int pos)
+int remove_cache_entry_at(struct cache_cursor *cc)
 {
+	int pos = cc->pos;
+
 	active_cache_changed = 1;
 	active_nr--;
-	if (pos >= active_nr)
+	if (cache_eof(cc))
 		return 0;
 	memmove(active_cache + pos, active_cache + pos + 1, (active_nr - pos) * sizeof(struct cache_entry *));
 	return 1;
@@ -201,11 +182,11 @@ int remove_cache_entry_at(int pos)
 
 int remove_file_from_cache(char *path)
 {
-	int pos = cache_name_pos(path, strlen(path));
-	if (pos < 0)
-		pos = -pos-1;
-	while (pos < active_nr && !strcmp(active_cache[pos]->name, path))
-		remove_cache_entry_at(pos);
+	struct cache_cursor cc;
+
+	cache_find_name(path, strlen(path), &cc);
+	while (!cache_eof(&cc) && !strcmp(cc_to_ce(&cc)->name, path))
+		remove_cache_entry_at(&cc);
 	return 0;
 }
 
@@ -214,13 +195,12 @@ int remove_file_from_cache(char *path)
  */
 void prune_cache(const char *prefix, int prefix_len)
 {
-	int pos = cache_name_pos(prefix, prefix_len);
+	struct cache_cursor cc;
 	unsigned int first, last;
 
-	if (pos < 0)
-		pos = -pos-1;
-	active_cache += pos;
-	active_nr -= pos;
+	cache_find_name(prefix, prefix_len, &cc);
+	active_cache += cc.pos;
+	active_nr -= cc.pos;
 	first = 0;
 	last = active_nr;
 	while (last > first) {
@@ -237,7 +217,8 @@ void prune_cache(const char *prefix, int
 
 int ce_same_name(struct cache_entry *a, struct cache_entry *b)
 {
-	int len = ce_namelen(a);
+	int len;
+	len = ce_namelen(a);
 	return ce_namelen(b) == len && !memcmp(a->name, b->name, len);
 }
 
@@ -271,28 +252,30 @@ int ce_path_match(const struct cache_ent
  * Do we have another file that has the beginning components being a
  * proper superset of the name we're trying to add?
  */
-static int has_file_name(const struct cache_entry *ce, int pos, int ok_to_replace)
+static int has_file_name(const struct cache_entry *ce, struct cache_cursor *cc, int ok_to_replace)
 {
 	int retval = 0;
 	int len = ce_namelen(ce);
 	int stage = ce_stage(ce);
 	const char *name = ce->name;
 
-	while (pos < active_nr) {
-		struct cache_entry *p = active_cache[pos++];
+	while (!cache_eof(cc)) {
+		struct cache_entry *p = cc_to_ce(cc);
 
 		if (len >= ce_namelen(p))
 			break;
 		if (memcmp(name, p->name, len))
 			break;
 		if (ce_stage(p) != stage)
-			continue;
+			goto next;
 		if (p->name[len] != '/')
-			continue;
+			goto next;
 		retval = -1;
 		if (!ok_to_replace)
 			break;
-		remove_cache_entry_at(--pos);
+		remove_cache_entry_at(cc);
+next:
+		next_cc(cc);
 	}
 	return retval;
 }
@@ -301,8 +284,9 @@ static int has_file_name(const struct ca
  * Do we have another file with a pathname that is a proper
  * subset of the name we're trying to add?
  */
-static int has_dir_name(const struct cache_entry *ce, int pos, int ok_to_replace)
+static int has_dir_name(const struct cache_entry *ce, int ok_to_replace)
 {
+	struct cache_cursor cc;
 	int retval = 0;
 	int stage = ce_stage(ce);
 	const char *name = ce->name;
@@ -319,12 +303,11 @@ static int has_dir_name(const struct cac
 		}
 		len = slash - name;
 
-		pos = cache_name_pos(name, ntohs(create_ce_flags(len, stage)));
-		if (pos >= 0) {
+		if (cache_find_name(name, ntohs(create_ce_flags(len, stage)), &cc)) {
 			retval = -1;
 			if (ok_to_replace)
 				break;
-			remove_cache_entry_at(pos);
+			remove_cache_entry_at(&cc);
 			continue;
 		}
 
@@ -333,9 +316,8 @@ static int has_dir_name(const struct cac
 		 * already matches the sub-directory, then we know
 		 * we're ok, and we can exit.
 		 */
-		pos = -pos-1;
-		while (pos < active_nr) {
-			struct cache_entry *p = active_cache[pos];
+		while (!cache_eof(&cc)) {
+			struct cache_entry *p = cc_to_ce(&cc);
 			if ((ce_namelen(p) <= len) ||
 			    (p->name[len] != '/') ||
 			    memcmp(p->name, name, len))
@@ -347,7 +329,7 @@ static int has_dir_name(const struct cac
 				 * level or anything shorter.
 				 */
 				return retval;
-			pos++;
+			next_cc(&cc);
 		}
 	}
 	return retval;
@@ -362,45 +344,41 @@ static int has_dir_name(const struct cac
  * from the cache so the caller should recompute the insert position.
  * When this happens, we return non-zero.
  */
-static int check_file_directory_conflict(const struct cache_entry *ce, int pos, int ok_to_replace)
+static int check_file_directory_conflict(const struct cache_entry *ce, struct cache_cursor *cc, int ok_to_replace)
 {
+	struct cache_cursor cd = *cc;
 	/*
 	 * We check if the path is a sub-path of a subsequent pathname
 	 * first, since removing those will not change the position
 	 * in the array
 	 */
-	int retval = has_file_name(ce, pos, ok_to_replace);
+	int retval = has_file_name(ce, &cd, ok_to_replace);
 	/*
 	 * Then check if the path might have a clashing sub-directory
 	 * before it.
 	 */
-	return retval + has_dir_name(ce, pos, ok_to_replace);
+	return retval + has_dir_name(ce, ok_to_replace);
 }
 
 int add_cache_entry(struct cache_entry *ce, int option)
 {
-	int pos;
+	struct cache_cursor cc;
 	int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
 	int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
 	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
-	pos = cache_name_pos(ce->name, ntohs(ce->ce_flags));
 
 	/* existing match? Just replace it */
-	if (pos >= 0) {
-		active_cache_changed = 1;
-		active_cache[pos] = ce;
-		return 0;
-	}
-	pos = -pos-1;
+	if (cache_find_name(ce->name, ntohs(ce->ce_flags), &cc))
+		goto out;
 
 	/*
 	 * Inserting a merged entry ("stage 0") into the index
 	 * will always replace all non-merged entries..
 	 */
-	if (pos < active_nr && ce_stage(ce) == 0) {
-		while (ce_same_name(active_cache[pos], ce)) {
+	if (!cache_eof(&cc) && ce_stage(ce) == 0) {
+		while (ce_same_name(cc_to_ce(&cc), ce)) {
 			ok_to_add = 1;
-			if (!remove_cache_entry_at(pos))
+			if (!remove_cache_entry_at(&cc))
 				break;
 		}
 	}
@@ -408,11 +386,10 @@ int add_cache_entry(struct cache_entry *
 	if (!ok_to_add)
 		return -1;
 
-	if (!skip_df_check && check_file_directory_conflict(ce, pos, ok_to_replace)) {
+	if (!skip_df_check && check_file_directory_conflict(ce, &cc, ok_to_replace)) {
 		if (!ok_to_replace)
 			return -1;
-		pos = cache_name_pos(ce->name, ntohs(ce->ce_flags));
-		pos = -pos-1;
+		cache_find_name(ce->name, ntohs(ce->ce_flags), &cc);
 	}
 
 	/* Make sure the array is big enough .. */
@@ -423,10 +400,13 @@ int add_cache_entry(struct cache_entry *
 
 	/* Add it in.. */
 	active_nr++;
-	if (active_nr > pos)
+	if (!cache_eof(&cc)) {
+		int pos = cc.pos;
 		memmove(active_cache + pos + 1, active_cache + pos, (active_nr - pos - 1) * sizeof(ce));
-	active_cache[pos] = ce;
-	active_cache_changed = 1;
+	}
+
+out:
+	set_ce_at_cursor(&cc, ce);
 	return 0;
 }
 
@@ -456,7 +436,7 @@ int read_cache(void)
 	struct cache_header *hdr;
 
 	errno = EBUSY;
-	if (active_cache)
+	if (!read_cache_needed())
 		return error("more than one cachefile");
 	errno = ENOENT;
 	fd = open(get_index_file(), O_RDONLY);
diff --git a/read-tree.c b/read-tree.c
--- a/read-tree.c
+++ b/read-tree.c
@@ -135,7 +135,7 @@ static int unpack_trees_rec(struct tree_
 		if (cache_name && !strcmp(cache_name, first)) {
 			any_files = 1;
 			src[0] = cc_to_ce(cc);
-			remove_cache_entry_at(cc->pos);
+			remove_cache_entry_at(cc);
 		}
 
 		for (i = 0; i < len; i++) {

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (21 preceding siblings ...)
  2005-09-12 14:56 ` [PATCH 22/22] teach read-cache.c to use cache_find_name() Chuck Lever
@ 2005-09-12 15:38 ` A Large Angry SCM
  2005-09-12 16:37   ` Chuck Lever
  2005-09-12 19:53 ` Junio C Hamano
  23 siblings, 1 reply; 49+ messages in thread
From: A Large Angry SCM @ 2005-09-12 15:38 UTC (permalink / raw)
  To: Chuck Lever; +Cc: git

Chuck Lever wrote:
> [ This series is posted for review and comments. ]
> 
> The following patch series introduces an abstraction called a "cache
> cursor" that will eventually allow us to replace the current
> active_cache array with something else.
> 
> A cache cursor represents a position inside the cache.  This position
> has a cache_entry associated with it, of course, but since the cache
> is ordered, a cache cursor also has the concept of next, previous,
> and end-of-cache.
> 
> With a cache cursor we can build a simple iterator mechanism that
> calls a particular function for every entry in the cache, in order.
> This allows us to hide further the specifics of the active cache
> implementation -- the function gets to see the cache cursor and
> an element, but does not have direct access to the cache and cannot
> assume it has a particular structure.
> 
> Currently the cache cursor type is just a structure with an integer
> in it, so it largely mimics the existing implementation.
> 
> This patch series is against the "proposed updates" branch, as of
> a couple of days ago.  It has been tested via "make test" and I'm
> currently using it for my own work without issue.

I'll let others comment on the need for this type of facility and it's 
proposed implementation.

Since you are proposing an API, some basic documentation about how to 
use the API would be nice. Comments in cache.h seems the best place, for 
now.

The sentence "This patch series is against the "proposed updates" 
branch, as of a couple of days ago." should have also included a commit 
ID. That way we would know where/when the patches would apply cleanly 
for testing and dissection.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 15:38 ` [PATCH 00/22] cache cursors: an introduction A Large Angry SCM
@ 2005-09-12 16:37   ` Chuck Lever
  2005-09-12 20:26     ` Daniel Barkalow
                       ` (2 more replies)
  0 siblings, 3 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 16:37 UTC (permalink / raw)
  To: gitzilla; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1931 bytes --]

A Large Angry SCM wrote:
> Since you are proposing an API, some basic documentation about how to 
> use the API would be nice. Comments in cache.h seems the best place, for 
> now.

actually it might be best to have a list discussion first.  if i can 
answer questions and provide a few explanations here, the list archive 
might be a reasonable resting place for documentation.

the API is fairly simple and is documented via the function names. 
there isn't a whole lot of function-level documentation in the git code 
base that i have seen, so i erred on the side of less is more.

the first patch "introduce-cache-cursors provides most of the interface, 
and the subsequent patches demonstrate how to use it by changing parts 
of the git C code base to use the interface instead.

the main pieces are:

+  struct cache_cursor

which describes a position in the cache.

+  init_cc

which sets the given cursor to the top of the cache.

+  cc_to_ce

which extracts the cache entry that a cursor refers to.

+  next_cc, prev_cc, cache_eof

which allow cursor movement.

+  next_name

which skips to the next unique name in the cache.

+  walk_cache

which calls a function for each entry in the cache, in order.  the given 
function is responsible for moving the cursor to the next position 
before returning.

+  cache_find_name

which returns a cache cursor pointing to the found entry or to the entry 
which can be used as an insertion point if nothing matches.

+  cache_find_entry

which is a wrapper around cache_find_name that returns an entry instead 
of a position.

> The sentence "This patch series is against the "proposed updates" 
> branch, as of a couple of days ago." should have also included a commit 
> ID. That way we would know where/when the patches would apply cleanly 
> for testing and dissection.

i'm a dork.

6ae3d6e6d0f87cfa75b4bf213a485ff687defce8

i will include the base ref in my future postings.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763 4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668 1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
                   ` (22 preceding siblings ...)
  2005-09-12 15:38 ` [PATCH 00/22] cache cursors: an introduction A Large Angry SCM
@ 2005-09-12 19:53 ` Junio C Hamano
  2005-09-12 20:22   ` Daniel Barkalow
  2005-09-12 23:59   ` Chuck Lever
  23 siblings, 2 replies; 49+ messages in thread
From: Junio C Hamano @ 2005-09-12 19:53 UTC (permalink / raw)
  To: Chuck Lever; +Cc: git

I've only skimmed the surface of your patchset and cannot
comment on the correctness of all the conversion of active_cache
users; today is my day-job day not a GIT day.

I have to say you did quite a lot of work, and I am pleasantly
surprised to see the massive clean-up this change brings us.  It
seems like this makes the active_cache users a lot easier to
read.

I have a couple of comments on the API, though.

* Doesn't function to be applied usually want to have its own
  data when passed to walk, maybe something like this?

  typedef int (*cache_iterator_fn_t) (struct cache_cursor *cc,
			 struct cache_entry *ce, void *udata);
  static inline int walk_cache(cache_iterator_fn_t func, void *udata)
  {
          struct cache_cursor cc;

          init_cc(&cc);
          while (!cache_eof(&cc)) {
                  int status = func(&cc, cc_to_ce(&cc), udata);
                  if (status < 0)
                          return status;
          }
          return 0;
  }

  This was a question I had when I read [PATCH 01/22] before
  reading the rest of the patches, but the actual conversion
  does not seem to find much need for it.  A new global variable
  pathspec is introduced to pass information the API is unable
  to pass to diff_one() in diff-index.c, which may be a sign
  that an extra "user data" parameter might help.  Your call.

* It may make sense to give another param to describe which
  cache the caller is talking about so that we can later have
  more than one cache at the same time:

  struct cache {
      struct cache_entry **cache_array;
      unsigned int nr;
      unsigned int alloc;
      unsigned int cache_changed;
  };
  struct cache active_cache;

  and use it like this:

  static inline struct cache_entry *cc_to_ce(struct cache_cursor *cc,
                                             struct cache *cache)
  {
          return cache->cache_array[cc->pos];
  }

  We could argue that this should be left for later rounds.  On
  the other hand, we will be changing all the cc_* function call
  sites during that round, which is by definition the places you
  are touching in this round anyway.  Also I suspect that the
  "later job" is made larger if we do something like this during
  this round:

  diff --git a/cache.h b/cache.h
  --- a/cache.h
  +++ b/cache.h
  @@ -157,7 +157,7 @@ extern char *prefix_path(const char *pre

   /* Initialize and use the cache information */
   extern int read_cache(void);
  -extern int write_cache(int newfd, struct cache_entry **cache, int entries);
  +extern int write_cache(int newfd);
   extern int cache_name_pos(const char *name, int namelen);
   #define ADD_CACHE_OK_TO_ADD 1		/* Ok to add */
   #define ADD_CACHE_OK_TO_REPLACE 2	/* Ok to replace file/directory */

  This function could already act on more than one active_cache,
  although nobody uses it like that.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 19:53 ` Junio C Hamano
@ 2005-09-12 20:22   ` Daniel Barkalow
  2005-09-12 20:30     ` Junio C Hamano
  2005-09-12 23:59   ` Chuck Lever
  1 sibling, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-09-12 20:22 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Chuck Lever, git

On Mon, 12 Sep 2005, Junio C Hamano wrote:

> I've only skimmed the surface of your patchset and cannot
> comment on the correctness of all the conversion of active_cache
> users; today is my day-job day not a GIT day.
> 
> I have to say you did quite a lot of work, and I am pleasantly
> surprised to see the massive clean-up this change brings us.  It
> seems like this makes the active_cache users a lot easier to
> read.
> 
> I have a couple of comments on the API, though.
> 
> * Doesn't function to be applied usually want to have its own
>   data when passed to walk, maybe something like this?
> 
>   This was a question I had when I read [PATCH 01/22] before
>   reading the rest of the patches, but the actual conversion
>   does not seem to find much need for it.  A new global variable
>   pathspec is introduced to pass information the API is unable
>   to pass to diff_one() in diff-index.c, which may be a sign
>   that an extra "user data" parameter might help.  Your call.

I agree that it only works for the current conversion, due to there only 
being a limited amount you might try to do in a single git executable 
currently. Long-term, that should be fixed.

> * It may make sense to give another param to describe which
>   cache the caller is talking about so that we can later have
>   more than one cache at the same time:
> 
>   We could argue that this should be left for later rounds.  On
>   the other hand, we will be changing all the cc_* function call
>   sites during that round, which is by definition the places you
>   are touching in this round anyway. 

Wouldn't it be better to only take it in cc_init(), and have the cursor 
remember what it's iterating through?

I'm actually particularly interested in having a pair of caches for 
read-tree, because it would actually like to keep the old index separate 
from the index it's building.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 16:37   ` Chuck Lever
@ 2005-09-12 20:26     ` Daniel Barkalow
  2005-09-12 20:47     ` Junio C Hamano
  2005-09-13 19:06     ` Tim Ottinger
  2 siblings, 0 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-09-12 20:26 UTC (permalink / raw)
  To: Chuck Lever; +Cc: gitzilla, git

On Mon, 12 Sep 2005, Chuck Lever wrote:

> A Large Angry SCM wrote:
> > Since you are proposing an API, some basic documentation about how to use
> > the API would be nice. Comments in cache.h seems the best place, for now.
> 
> actually it might be best to have a list discussion first.  if i can answer
> questions and provide a few explanations here, the list archive might be a
> reasonable resting place for documentation.
> 
> the API is fairly simple and is documented via the function names. there isn't
> a whole lot of function-level documentation in the git code base that i have
> seen, so i erred on the side of less is more.
> 
> the main pieces are:
> 
> +  next_name
> 
> which skips to the next unique name in the cache.

Something with "cc" in it?

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 20:22   ` Daniel Barkalow
@ 2005-09-12 20:30     ` Junio C Hamano
  0 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2005-09-12 20:30 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Chuck Lever, git

Daniel Barkalow <barkalow@iabervon.org> writes:

>> * It may make sense to give another param to describe which
>>   cache the caller is talking about so that we can later have
>>   more than one cache at the same time:
>> 
> Wouldn't it be better to only take it in cc_init(), and have the cursor 
> remember what it's iterating through?

Yes.

> I'm actually particularly interested in having a pair of caches for 
> read-tree, because it would actually like to keep the old index separate 
> from the index it's building.

Yes.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-12 14:56 ` [PATCH 21/22] teach the merge algorithm about cache iterators Chuck Lever
@ 2005-09-12 20:43   ` Daniel Barkalow
  2005-09-13  0:02     ` Chuck Lever
  2005-09-14 15:36     ` Chuck Lever
  0 siblings, 2 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-09-12 20:43 UTC (permalink / raw)
  To: Chuck Lever; +Cc: git

On Mon, 12 Sep 2005, Chuck Lever wrote:

> For now, we simply replace indpos with a cache cursor.  Likely more
> changes will be needed after we successfully replace the cache array
> with an abstract data type.

The right order is probably to add the concept of a cache that isn't the 
one that normal functions deal with, have read_cache_unmerged return such 
a thing, call cc_init with that, and rip out all of the removal and 
position adjustment code. Then read_tree won't care at all about the 
internal structure of the cache type, and it can be replaced without any 
problem.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 16:37   ` Chuck Lever
  2005-09-12 20:26     ` Daniel Barkalow
@ 2005-09-12 20:47     ` Junio C Hamano
  2005-09-13 19:06     ` Tim Ottinger
  2 siblings, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2005-09-12 20:47 UTC (permalink / raw)
  To: Chuck Lever; +Cc: git

Chuck Lever <cel@citi.umich.edu> writes:

>> The sentence "This patch series is against the "proposed updates"
>> branch, as of a couple of days ago." should have also included a
>> commit ID. That way we would know where/when the patches would apply
>> cleanly for testing and dissection.
>
> i'm a dork.
>
> 6ae3d6e6d0f87cfa75b4bf213a485ff687defce8
>
> i will include the base ref in my future postings.

No need for any of that.  All the necessary bits are already in
the "master" branch.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 19:53 ` Junio C Hamano
  2005-09-12 20:22   ` Daniel Barkalow
@ 2005-09-12 23:59   ` Chuck Lever
  2005-09-13  0:14     ` Junio C Hamano
  2005-09-13  0:17     ` Linus Torvalds
  1 sibling, 2 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-12 23:59 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 2277 bytes --]

Junio C Hamano wrote:
> I have a couple of comments on the API, though.
> 
> * Doesn't function to be applied usually want to have its own
>   data when passed to walk, maybe something like this?
> 
>   typedef int (*cache_iterator_fn_t) (struct cache_cursor *cc,
> 			 struct cache_entry *ce, void *udata);
>   static inline int walk_cache(cache_iterator_fn_t func, void *udata)
>   {
>           struct cache_cursor cc;
> 
>           init_cc(&cc);
>           while (!cache_eof(&cc)) {
>                   int status = func(&cc, cc_to_ce(&cc), udata);
>                   if (status < 0)
>                           return status;
>           }
>           return 0;
>   }
> 
>   This was a question I had when I read [PATCH 01/22] before
>   reading the rest of the patches, but the actual conversion
>   does not seem to find much need for it.  A new global variable
>   pathspec is introduced to pass information the API is unable
>   to pass to diff_one() in diff-index.c, which may be a sign
>   that an extra "user data" parameter might help.  Your call.

heh.  well, i had something like this earlier, but i know linus doesn't 
like void *, and it was really kind of ugly.  and as you observed, it's 
used so rarely.  so i just decided to drop it.

> * It may make sense to give another param to describe which
>   cache the caller is talking about so that we can later have
>   more than one cache at the same time:
> 
>   struct cache {
>       struct cache_entry **cache_array;
>       unsigned int nr;
>       unsigned int alloc;
>       unsigned int cache_changed;
>   };
>   struct cache active_cache;
> 
>   and use it like this:
> 
>   static inline struct cache_entry *cc_to_ce(struct cache_cursor *cc,
>                                              struct cache *cache)
>   {
>           return cache->cache_array[cc->pos];
>   }
> 
>   We could argue that this should be left for later rounds.  On
>   the other hand, we will be changing all the cc_* function call
>   sites during that round, which is by definition the places you
>   are touching in this round anyway.

actually this is simple to add now.  i'll give it a shot (and fix up 
write_cache to use it).

btw, with daniel's changes i don't see where we're using 
active_cache_changed any more.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-12 20:43   ` Daniel Barkalow
@ 2005-09-13  0:02     ` Chuck Lever
  2005-09-14 15:36     ` Chuck Lever
  1 sibling, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-13  0:02 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 707 bytes --]

Daniel Barkalow wrote:
> On Mon, 12 Sep 2005, Chuck Lever wrote:
>>For now, we simply replace indpos with a cache cursor.  Likely more
>>changes will be needed after we successfully replace the cache array
>>with an abstract data type.
> 
> The right order is probably to add the concept of a cache that isn't the 
> one that normal functions deal with, have read_cache_unmerged return such 
> a thing, call cc_init with that, and rip out all of the removal and 
> position adjustment code. Then read_tree won't care at all about the 
> internal structure of the cache type, and it can be replaced without any 
> problem.

yeah, i've come to the same conclusion, and started screwing with this 
idea today.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 23:59   ` Chuck Lever
@ 2005-09-13  0:14     ` Junio C Hamano
  2005-09-13  0:17     ` Linus Torvalds
  1 sibling, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2005-09-13  0:14 UTC (permalink / raw)
  To: cel; +Cc: git

Chuck Lever <cel@citi.umich.edu> writes:

> btw, with daniel's changes i don't see where we're using 
> active_cache_changed any more.

That came from an earlier botched attempt of mine to optimize
out writing of cache (eh, index file these days it is called but
back then it was "cache") when read-tree ended up not modifying
the cache contents (e.g. reading HEAD immediately after checking
it out).  My implementation was quite buggy and Linus fixed it
in the ee267527aa80807f37caf1d00bcf1b5263945adb commit by adding
the variable while disabling the optimization for safety.  And
the optimization has not been re-enabled ever since.  I think
you can remove the variable now.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 23:59   ` Chuck Lever
  2005-09-13  0:14     ` Junio C Hamano
@ 2005-09-13  0:17     ` Linus Torvalds
  1 sibling, 0 replies; 49+ messages in thread
From: Linus Torvalds @ 2005-09-13  0:17 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Junio C Hamano, git

On Mon, 12 Sep 2005, Chuck Lever wrote:
> 
> heh.  well, i had something like this earlier, but i know linus doesn't 
> like void *, and it was really kind of ugly.  and as you observed, it's 
> used so rarely.  so i just decided to drop it.

Iterators are _much_ nicer if you can use them in-line instead of with a 
function pointer. It makes them a hundred times more powerful, and avoids 
all the crap with trying to pass a magic argument around.

That was why I mentioned the sparse "ptrlist" implementation: the data 
structure itself may not be all that exciting, but the syntax for _using_ 
it is incredibly powerful. Much better than any other list handling 
package I've ever seen, if I do say so myself.

So check out

	kernel.org:/pub/scm/devel/sparse/sparse.git

and look at how easy it is to iterate over a list. The _true_ power is how 
you can return out of a function in the middle of the iterator, ie

	int pseudo_in_list(struct pseudo_list *list, pseudo_t pseudo)
	{
	        pseudo_t old;
	        FOR_EACH_PTR(list,old) {
	                if (old == pseudo)
	                        return 1;
	        } END_FOR_EACH_PTR(old);
	        return 0;
	}

(and it's type-safe too!)

Now, nested iterators may sound easy, but they aren't easy if you want to 
actually break out of them in a nested way. Try to do _this_ with a 
function pointer interface without going crazy:

	        /* Remove the pseudos from the "defines" list that are used internally */
	        FOR_EACH_PTR(ep->bbs, bb) {
	                pseudo_t def;
	                FOR_EACH_PTR(bb->defines, def) {
	                        struct basic_block *child;
	                        FOR_EACH_PTR(bb->children, child) {
	                                if (pseudo_in_list(child->needs, def))
	                                        goto is_used;
	                        } END_FOR_EACH_PTR(child);
	                        DELETE_CURRENT_PTR(def);
	is_used:
	                ;
	                } END_FOR_EACH_PTR(def);
	                PACK_PTR_LIST(&bb->defines);
	        } END_FOR_EACH_PTR(bb);

all real examples from real code that implements an almost-real compiler.

Very dense code too, btw. It's incredible how expressive these iterators 
are: I started out with a function pointer interface, and it was painful 
as _hell_ to see what was going on.

		Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-12 16:37   ` Chuck Lever
  2005-09-12 20:26     ` Daniel Barkalow
  2005-09-12 20:47     ` Junio C Hamano
@ 2005-09-13 19:06     ` Tim Ottinger
  2005-09-13 19:46       ` Junio C Hamano
  2005-09-14  8:28       ` Catalin Marinas
  2 siblings, 2 replies; 49+ messages in thread
From: Tim Ottinger @ 2005-09-13 19:06 UTC (permalink / raw)
  To: cel; +Cc: gitzilla, git

I know it's a little picky, but can we be consistently noun-verb-noun 
with these?
I'm not meaning to be a jerk (it just happens unbidden).

My reasons
1) My editor has word completion, which is handy.
2) However unimportant, I'm an old OO guy and object_cmd looks like 
object.command to me.
3) I'm so stupid I need consistent naming to help me learn the git guts.
4) I'm generally a butt about naming anyway 
(http://tottinge.blogsome.com/naming-rules/)
5) I work with a guy who is a stickler for lexicon, and he revived the 
passion for consistency.

> which describes a position in the cache.
>
> +  init_cc

cc_init?

>
> +  next_cc, prev_cc, cache_eof

cc_next, cc_previous

Which is that, cc_to_end or cc_at_end?

> +  next_name

cc_next_name?

>
> +  walk_cache

cache_walk?

The rest of those looked great.  I only listed the ones I was unsure about.

-- 
                             ><>
... either 'way ahead of the game, or 'way out in left field.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-13 19:06     ` Tim Ottinger
@ 2005-09-13 19:46       ` Junio C Hamano
  2005-09-13 20:06         ` Tim Ottinger
  2005-09-14  8:28       ` Catalin Marinas
  1 sibling, 1 reply; 49+ messages in thread
From: Junio C Hamano @ 2005-09-13 19:46 UTC (permalink / raw)
  To: Tim Ottinger; +Cc: gitzilla, git, cel

Tim Ottinger <tottinge@progeny.com> writes:

> 2) However unimportant, I'm an old OO guy and object_cmd looks like 
> object.command to me.

If you are OO then would not object_method remind you of object->method ??

>> +  init_cc
>> +  next_cc, prev_cc
>
> cc_init?
> cc_next, cc_previous

Nah, either set is fine as long as it is internally consistent.
I tend to prefer "do-this-to-that" so init_cc and next_cc are
fine by me (just one person's opinion, not a dictator's ruling).

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-13 19:46       ` Junio C Hamano
@ 2005-09-13 20:06         ` Tim Ottinger
  0 siblings, 0 replies; 49+ messages in thread
From: Tim Ottinger @ 2005-09-13 20:06 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: gitzilla, git, cel

Junio C Hamano wrote:

>Tim Ottinger <tottinge@progeny.com> writes:
>
>  
>
>>2) However unimportant, I'm an old OO guy and object_cmd looks like 
>>object.command to me.
>>    
>>
>
>If you are OO then would not object_method remind you of object->method ??
>
>  
>
>>>+  init_cc
>>>+  next_cc, prev_cc
>>>      
>>>
>>cc_init?
>>cc_next, cc_previous
>>    
>>
>
>Nah, either set is fine as long as it is internally consistent.
>I tend to prefer "do-this-to-that" so init_cc and next_cc are
>fine by me (just one person's opinion, not a dictator's ruling).
>
>  
>
I guess it depends on whether you're looking at command completion or
not. Most the time I have a thing, and want to do something to it.  Then
starting with cc_ helps, but starting with init_ only tells me what I can
init -- more filtering on my part.

Of course, i can just open the darned file and read it. ;-)  So it's a 
matter
of what you and your tools like best.  Starting with the subject does sort
better, though. 


-- 
                             ><>
... either 'way ahead of the game, or 'way out in left field.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-13 19:06     ` Tim Ottinger
  2005-09-13 19:46       ` Junio C Hamano
@ 2005-09-14  8:28       ` Catalin Marinas
  2005-09-14 14:49         ` Chuck Lever
  1 sibling, 1 reply; 49+ messages in thread
From: Catalin Marinas @ 2005-09-14  8:28 UTC (permalink / raw)
  To: Tim Ottinger; +Cc: cel, gitzilla, git

Tim Ottinger <tottinge@progeny.com> wrote:
> 2) However unimportant, I'm an old OO guy and object_cmd looks like
> object.command to me.

Well, it depends on the language. If you only used LISP/CLOS, it would
look more like (cmd object) :-)

-- 
Catalin

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 00/22] cache cursors: an introduction
  2005-09-14  8:28       ` Catalin Marinas
@ 2005-09-14 14:49         ` Chuck Lever
  0 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-14 14:49 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Tim Ottinger, gitzilla, git

[-- Attachment #1: Type: text/plain, Size: 494 bytes --]

Catalin Marinas wrote:
> Tim Ottinger <tottinge@progeny.com> wrote:
> 
>>2) However unimportant, I'm an old OO guy and object_cmd looks like
>>object.command to me.
> 
> 
> Well, it depends on the language. If you only used LISP/CLOS, it would
> look more like (cmd object) :-)
> 

just a note on function naming:  i followed the pre-existing function 
naming convention.  to wit:

existing:

read_cache
write_cache
add_cache_entry
remove_cache_entry_at

new:

init_cc
next_cc
walk_cache

etc.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763 4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668 1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-12 20:43   ` Daniel Barkalow
  2005-09-13  0:02     ` Chuck Lever
@ 2005-09-14 15:36     ` Chuck Lever
  2005-09-14 16:41       ` Daniel Barkalow
  1 sibling, 1 reply; 49+ messages in thread
From: Chuck Lever @ 2005-09-14 15:36 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1749 bytes --]

Daniel Barkalow wrote:
> On Mon, 12 Sep 2005, Chuck Lever wrote:
> 
> 
>>For now, we simply replace indpos with a cache cursor.  Likely more
>>changes will be needed after we successfully replace the cache array
>>with an abstract data type.
> 
> 
> The right order is probably to add the concept of a cache that isn't the 
> one that normal functions deal with, have read_cache_unmerged return such 
> a thing, call cc_init with that, and rip out all of the removal and 
> position adjustment code. Then read_tree won't care at all about the 
> internal structure of the cache type, and it can be replaced without any 
> problem.

ok, i've done this.  read_cache_unmerged now reads into a separate 
cache, and read-tree.c does the merge by moving the appropriate cache 
entries into the active cache.

the linked list prototype is done, and works correctly.  this validates 
the new cache cursor API.  unfortunately because finding a name is now 
O(n), many things are slower than before (but i expected this would be 
the case for lists).

the next step is to try out more sophisticated data types.  we have 
three on the table so far:

1.  linus' sparse hyperlist implementation.  i suspect this will have 
the same bad performance characteristics as a standard linked list.

2.  self-balancing trees.  a splay tree is a good example.  we can 
reduce the size of the tree by storing all stages of a name in each 
node.  kernel source is about 18K files, which means we can find names 
in about 15 steps, on average.

3.  hash table, hashing on ce->name.  similar to a Python dictionary. 
with an 8 kilobucket hash table and a good hash function, we can store 
the kernel source, finding names in two or three steps on average.

are there others?

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763 4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668 1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 15:36     ` Chuck Lever
@ 2005-09-14 16:41       ` Daniel Barkalow
  2005-09-14 17:50         ` Junio C Hamano
  2005-09-14 19:49         ` Chuck Lever
  0 siblings, 2 replies; 49+ messages in thread
From: Daniel Barkalow @ 2005-09-14 16:41 UTC (permalink / raw)
  To: Chuck Lever; +Cc: git

On Wed, 14 Sep 2005, Chuck Lever wrote:

> Daniel Barkalow wrote:
> > On Mon, 12 Sep 2005, Chuck Lever wrote:
> > 
> > 
> > >For now, we simply replace indpos with a cache cursor.  Likely more
> > >changes will be needed after we successfully replace the cache array
> > >with an abstract data type.
> > 
> > 
> > The right order is probably to add the concept of a cache that isn't the one
> > that normal functions deal with, have read_cache_unmerged return such a
> > thing, call cc_init with that, and rip out all of the removal and position
> > adjustment code. Then read_tree won't care at all about the internal
> > structure of the cache type, and it can be replaced without any problem.
> 
> ok, i've done this.  read_cache_unmerged now reads into a separate cache, and
> read-tree.c does the merge by moving the appropriate cache entries into the
> active cache.
> 
> the linked list prototype is done, and works correctly.  this validates the
> new cache cursor API.  unfortunately because finding a name is now O(n), many
> things are slower than before (but i expected this would be the case for
> lists).

The really exciting thing to do would be to have different programs use 
different implementations, by way of linker magic.

My guess for the ideal is to have a linked list with a hashtable for 
finding entries by looking up names, because we don't look things up by 
index. This combination gives O(1) in-order iteration, O(1) lookup by 
name, O(1) append, O(n) insert, and O(1) remove. This means that 
git-update-cache --add would be slow, but everything else would be fast. 
(Except, of course, for the overhead of actually reading and writing the 
index file, rather than mmaping it.)

Another thing to try would be the original dynamic table implementation, 
plus a hashtable for name lookups, generated the first time a lookup is 
attempted (since some programs don't do any lookups by name). This has the 
advantage of skipping the O(n) startup.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 16:41       ` Daniel Barkalow
@ 2005-09-14 17:50         ` Junio C Hamano
  2005-09-14 19:49         ` Chuck Lever
  1 sibling, 0 replies; 49+ messages in thread
From: Junio C Hamano @ 2005-09-14 17:50 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

Daniel Barkalow <barkalow@iabervon.org> writes:

> Another thing to try would be the original dynamic table implementation, 
> plus a hashtable for name lookups, generated the first time a lookup is 
> attempted (since some programs don't do any lookups by name). This has the 
> advantage of skipping the O(n) startup.

How about just the original dynamic table implementation with
the original binary search name lookups?  Am I missing
something?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 16:41       ` Daniel Barkalow
  2005-09-14 17:50         ` Junio C Hamano
@ 2005-09-14 19:49         ` Chuck Lever
  2005-09-14 20:40           ` Daniel Barkalow
  1 sibling, 1 reply; 49+ messages in thread
From: Chuck Lever @ 2005-09-14 19:49 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 1352 bytes --]

Daniel Barkalow wrote:
> The really exciting thing to do would be to have different programs use 
> different implementations, by way of linker magic.

yes, i've been considering that, but i'm not sure it is really worth the 
effort.  see below -- the right data structure should be good for just 
about any git workload.

> My guess for the ideal is to have a linked list with a hashtable for 
> finding entries by looking up names, because we don't look things up by 
> index. This combination gives O(1) in-order iteration, O(1) lookup by 
> name, O(1) append, O(n) insert, and O(1) remove. This means that 
> git-update-cache --add would be slow, but everything else would be fast. 
> (Except, of course, for the overhead of actually reading and writing the 
> index file, rather than mmaping it.)

[ i'm not sure why you think insert would be O(n). ]

keeping the linked list for O(1) next/prev and delete, and augmenting it 
with a hash table to allow O(m/n) insert and find would be ideal.  with 
a fairly large hash table, we do better than a tree for any reasonably 
sized repository i can imagine.

and, i believe simply adding a hash table to my list implementation will 
be easy, and simpler overall than a tree implementation.  famous last words.

mmapping the index file is still OK.  i haven't changed the cache_entry 
structure at all.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763 4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668 1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 19:49         ` Chuck Lever
@ 2005-09-14 20:40           ` Daniel Barkalow
  2005-09-14 22:28             ` Chuck Lever
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-09-14 20:40 UTC (permalink / raw)
  To: Chuck Lever; +Cc: git

On Wed, 14 Sep 2005, Chuck Lever wrote:

> Daniel Barkalow wrote:
> > The really exciting thing to do would be to have different programs use
> > different implementations, by way of linker magic.
> 
> yes, i've been considering that, but i'm not sure it is really worth the
> effort.  see below -- the right data structure should be good for just about
> any git workload.
> 
> > My guess for the ideal is to have a linked list with a hashtable for finding
> > entries by looking up names, because we don't look things up by index. This
> > combination gives O(1) in-order iteration, O(1) lookup by name, O(1) append,
> > O(n) insert, and O(1) remove. This means that git-update-cache --add would
> > be slow, but everything else would be fast. (Except, of course, for the
> > overhead of actually reading and writing the index file, rather than mmaping
> > it.)
> 
> [ i'm not sure why you think insert would be O(n). ]

You need to find the correct location to insert in the sorted list, and 
the hash table won't help you, because it doesn't have the new name. 
Remember that the cursors need to go through the index in order, so the 
list has to stay sorted.

> keeping the linked list for O(1) next/prev and delete, and augmenting it with
> a hash table to allow O(m/n) insert and find would be ideal.  with a fairly
> large hash table, we do better than a tree for any reasonably sized repository
> i can imagine.
> 
> and, i believe simply adding a hash table to my list implementation will be
> easy, and simpler overall than a tree implementation.  famous last words.

I've written a nice hash table which should work well, if you want to make 
the coding style suitable.

> mmapping the index file is still OK.  i haven't changed the cache_entry
> structure at all.

Oh, right, I forgot that the orgnaizational structure isn't the array of 
structs, but an array of pointers.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 20:40           ` Daniel Barkalow
@ 2005-09-14 22:28             ` Chuck Lever
  2005-09-14 22:50               ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Chuck Lever @ 2005-09-14 22:28 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git

[-- Attachment #1: Type: text/plain, Size: 924 bytes --]

Daniel Barkalow wrote:
> On Wed, 14 Sep 2005, Chuck Lever wrote:
>>[ i'm not sure why you think insert would be O(n). ]
> 
> 
> You need to find the correct location to insert in the sorted list, and 
> the hash table won't help you, because it doesn't have the new name. 
> Remember that the cursors need to go through the index in order, so the 
> list has to stay sorted.

oh, i see.  the hash table won't help cache_find_name find an insertion 
point quickly if the name isn't already in the cache.

in fact, this will impact the other places that need an insertion point, 
such as ls-files and merge-index, as well as your new merge algorithm 
(which inserts all merged entries into the active cache one at a time 
via add_cache_entry).

considering that add_cache_entry can do a cache lookup several times, i 
think we need the "not found, returning insertion point" case to be fast 
too.

back to the drawring board.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763 4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668 1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 22:28             ` Chuck Lever
@ 2005-09-14 22:50               ` Linus Torvalds
  2005-09-14 23:23                 ` Daniel Barkalow
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2005-09-14 22:50 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Daniel Barkalow, git

On Wed, 14 Sep 2005, Chuck Lever wrote:
> 
> oh, i see.  the hash table won't help cache_find_name find an insertion 
> point quickly if the name isn't already in the cache.

Note that almost all insertion tends to happen linearly.

In particular, read-tree always inserts things in order.

So probably _most_ of the file finding could actually use even a stupid 
linear search, if they just had a place to start from. And 99% of the 
time, it would be very close to where they wanted to be.

Hmm?

		Linus

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 22:50               ` Linus Torvalds
@ 2005-09-14 23:23                 ` Daniel Barkalow
  2005-09-15 14:01                   ` Chuck Lever
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel Barkalow @ 2005-09-14 23:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Chuck Lever, git

On Wed, 14 Sep 2005, Linus Torvalds wrote:

> On Wed, 14 Sep 2005, Chuck Lever wrote:
> > 
> > oh, i see.  the hash table won't help cache_find_name find an insertion 
> > point quickly if the name isn't already in the cache.
> 
> Note that almost all insertion tends to happen linearly.
> 
> In particular, read-tree always inserts things in order.

read-tree (with Chuck's latest work) should actually only append entries 
to an initially-empty list, which is even easier. Dunno about the other 
stuff, but I'd guess inserting into a cursor would handle a lot of it.

	-Daniel
*This .sig left intentionally blank*

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 21/22] teach the merge algorithm about cache iterators
  2005-09-14 23:23                 ` Daniel Barkalow
@ 2005-09-15 14:01                   ` Chuck Lever
  0 siblings, 0 replies; 49+ messages in thread
From: Chuck Lever @ 2005-09-15 14:01 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: Linus Torvalds, git

[-- Attachment #1: Type: text/plain, Size: 852 bytes --]

Daniel Barkalow wrote:
> On Wed, 14 Sep 2005, Linus Torvalds wrote:
> 
> 
>>On Wed, 14 Sep 2005, Chuck Lever wrote:
>>
>>>oh, i see.  the hash table won't help cache_find_name find an insertion 
>>>point quickly if the name isn't already in the cache.
>>
>>Note that almost all insertion tends to happen linearly.
>>
>>In particular, read-tree always inserts things in order.
> 
> read-tree (with Chuck's latest work) should actually only append entries 
> to an initially-empty list, which is even easier. Dunno about the other 
> stuff, but I'd guess inserting into a cursor would handle a lot of it.

i'm implementing the splay tree now.

part of the insertion process is to splay the insertion point up to the 
root of the tree.  if what you and linus says is true, then the search 
for the next insertion point will be very fast most of the time.

[-- Attachment #2: cel.vcf --]
[-- Type: text/x-vcard, Size: 439 bytes --]

begin:vcard
fn:Chuck Lever
n:Lever;Charles
org:Network Appliance, Incorporated;Linux NFS Client Development
adr:535 West William Street, Suite 3100;;Center for Information Technology Integration;Ann Arbor;MI;48103-4943;USA
email;internet:cel@citi.umich.edu
title:Member of Technical Staff
tel;work:+1 734 763-4415
tel;fax:+1 734 763 4434
tel;home:+1 734 668-1089
x-mozilla-html:FALSE
url:http://www.monkey.org/~cel/
version:2.1
end:vcard


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2005-09-15 14:01 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2005-09-12 14:55 [PATCH 00/22] cache cursors: an introduction Chuck Lever
2005-09-12 14:55 ` [PATCH 01/22] introduce facility to walk through the active cache Chuck Lever
2005-09-12 14:55 ` [PATCH 02/22] use cache iterator in checkout-index.c Chuck Lever
2005-09-12 14:55 ` [PATCH 03/22] teach diff.c about cache iterators Chuck Lever
2005-09-12 14:55 ` [PATCH 04/22] teach diff-index.c " Chuck Lever
2005-09-12 14:55 ` [PATCH 05/22] teach diff-files.c " Chuck Lever
2005-09-12 14:55 ` [PATCH 06/22] teach diff-stages.c " Chuck Lever
2005-09-12 14:55 ` [PATCH 07/22] teach fsck-objects.c to use " Chuck Lever
2005-09-12 14:56 ` [PATCH 08/22] teach ls-files.c " Chuck Lever
2005-09-12 14:56 ` [PATCH 09/22] teach read-tree.c " Chuck Lever
2005-09-12 14:56 ` [PATCH 10/22] teach update-index.c about cache cursors Chuck Lever
2005-09-12 14:56 ` [PATCH 11/22] teach write-tree.c to use cache iterators Chuck Lever
2005-09-12 14:56 ` [PATCH 12/22] simplify write_cache() calling sequence Chuck Lever
2005-09-12 14:56 ` [PATCH 13/22] move purge_cache() to read-cache.c Chuck Lever
2005-09-12 14:56 ` [PATCH 14/22] move read_cache_unmerged into read-cache.c Chuck Lever
2005-09-12 14:56 ` [PATCH 15/22] replace cache_name_pos Chuck Lever
2005-09-12 14:56 ` [PATCH 16/22] teach apply.c to use cache_find_name() Chuck Lever
2005-09-12 14:56 ` [PATCH 17/22] teach checkout-index.c " Chuck Lever
2005-09-12 14:56 ` [PATCH 18/22] teach diff.c " Chuck Lever
2005-09-12 14:56 ` [PATCH 19/22] teach ls-files.c " Chuck Lever
2005-09-12 14:56 ` [PATCH 20/22] teach merge-index.c " Chuck Lever
2005-09-12 14:56 ` [PATCH 21/22] teach the merge algorithm about cache iterators Chuck Lever
2005-09-12 20:43   ` Daniel Barkalow
2005-09-13  0:02     ` Chuck Lever
2005-09-14 15:36     ` Chuck Lever
2005-09-14 16:41       ` Daniel Barkalow
2005-09-14 17:50         ` Junio C Hamano
2005-09-14 19:49         ` Chuck Lever
2005-09-14 20:40           ` Daniel Barkalow
2005-09-14 22:28             ` Chuck Lever
2005-09-14 22:50               ` Linus Torvalds
2005-09-14 23:23                 ` Daniel Barkalow
2005-09-15 14:01                   ` Chuck Lever
2005-09-12 14:56 ` [PATCH 22/22] teach read-cache.c to use cache_find_name() Chuck Lever
2005-09-12 15:38 ` [PATCH 00/22] cache cursors: an introduction A Large Angry SCM
2005-09-12 16:37   ` Chuck Lever
2005-09-12 20:26     ` Daniel Barkalow
2005-09-12 20:47     ` Junio C Hamano
2005-09-13 19:06     ` Tim Ottinger
2005-09-13 19:46       ` Junio C Hamano
2005-09-13 20:06         ` Tim Ottinger
2005-09-14  8:28       ` Catalin Marinas
2005-09-14 14:49         ` Chuck Lever
2005-09-12 19:53 ` Junio C Hamano
2005-09-12 20:22   ` Daniel Barkalow
2005-09-12 20:30     ` Junio C Hamano
2005-09-12 23:59   ` Chuck Lever
2005-09-13  0:14     ` Junio C Hamano
2005-09-13  0:17     ` Linus Torvalds

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).