CFT: merge-recursive in C

Git development
 help / color / mirror / Atom feed

* CFT: merge-recursive in C
@ 2006-06-26 23:38 Alex Riesen
  2006-06-26 23:42 ` CFT: merge-recursive in C (test updates) Alex Riesen
                   ` (6 more replies)
  0 siblings, 7 replies; 27+ messages in thread
From: Alex Riesen @ 2006-06-26 23:38 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Fredrik Kuivinen

Hi all.

I finally got pis^Witched enough by my platform at work and decided
to start the effort of converting Fredriks git-merge-recursive to C.
At the moment it is the only one annoyingly slow thing there.

The result, represented by the patch below, is really just a
conversion. I DO NOT propose it for inclusion yet (maybe someone else
looks at it and does it for me). It passes the test suite, but I don't
think it is mature enough to be used routinely. I am about to test it
extensively on my repository (which is kind of labirinth for any SCM)
at work, so it is not really tested as well.

It still uses some calls to git programs (git-update-index,
git-hash-object, git-diff-tree and git-write-tree), and merge(1) has
the labels (-L) missing - I was unsure how to tackle this on windows -
it has only argv[1].

And it needs to be cleaned up a bit (a lot) - Python is not exactly
what you'd call "C-conversion-friendly-language", and I am not an
expert (as in "git-merge-recursive.py is my first Python experience").

To my deep disappointment, it didn't work out as good as I hoped: one
program I see most often and for longest time in the process list
(git-diff-tree) is a too complex thing to be put directly into
merge-recursive.c, so any help in this direction will be greatly
appreciated.

Still, I have a dependency less, some spared calls to git-cat-file,
git-rev-parse, git-read-tree and git-ls-tree, so it should be
faster (my workflow on windows is seldom constrained by CPU, that
being maxed out as per OS manufacturer recommendations).

Two files, path_list.[hc] can be probably used somewhere else - its a
crude string list. The other two, graph.[hc] - an attempt to represent
a graph. The "struct node" (a graph node representation) looks
somewhat superflous to me (copied from Commit) and probably should be
replaced by "struct commit". "merge-recursive.c" is what left from
"git-merge-recursive.py".

Have fun.

---

diff --git a/Makefile b/Makefile
index cde619c..660f09b 100644
--- a/Makefile
+++ b/Makefile
@@ -163,7 +163,8 @@ PROGRAMS = \
 	git-upload-pack$X git-verify-pack$X \
 	git-symbolic-ref$X \
 	git-name-rev$X git-pack-redundant$X git-repo-config$X git-var$X \
-	git-describe$X git-merge-tree$X git-blame$X git-imap-send$X
+	git-describe$X git-merge-tree$X git-blame$X git-imap-send$X \
+	git-merge-recur$X
 
 BUILT_INS = git-log$X git-whatchanged$X git-show$X git-update-ref$X \
 	git-count-objects$X git-diff$X git-push$X git-mailsplit$X \
@@ -595,6 +596,10 @@ git-http-push$X: revision.o http.o http-
 	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) \
 		$(LIBS) $(CURL_LIBCURL) $(EXPAT_LIBEXPAT)
 
+git-merge-recur$X: merge-recursive.o graph.o path-list.o $(LIB_FILE)
+	$(CC) $(ALL_CFLAGS) -o $@ $(ALL_LDFLAGS) $(filter %.o,$^) \
+		$(LIBS)
+
 $(LIB_OBJS) $(BUILTIN_OBJS): $(LIB_H)
 $(patsubst git-%$X,%.o,$(PROGRAMS)): $(LIB_H) $(wildcard */*.h)
 $(DIFF_OBJS): diffcore.h
diff --git a/git-merge.sh b/git-merge.sh
index da5657e..aa93847 100755
--- a/git-merge.sh
+++ b/git-merge.sh
@@ -10,15 +10,15 @@ USAGE='[-n] [--no-commit] [-s <strategy>
 LF='
 '
 
-all_strategies='recursive octopus resolve stupid ours'
-default_twohead_strategies='recursive'
+all_strategies='recur recursive octopus resolve stupid ours'
+default_twohead_strategies='recur'
 default_octopus_strategies='octopus'
 no_trivial_merge_strategies='ours'
 use_strategies=
 
 index_merge=t
 if test "@@NO_PYTHON@@"; then
-	all_strategies='resolve octopus stupid ours'
+	all_strategies='recur resolve octopus stupid ours'
 	default_twohead_strategies='resolve'
 fi
 
diff --git a/graph.c b/graph.c
new file mode 100644
index 0000000..7458930
--- /dev/null
+++ b/graph.c
@@ -0,0 +1,346 @@
+#include <stdlib.h>
+#include <string.h>
+#include <sys/wait.h>
+#include "cache.h"
+#include "commit.h"
+#include "graph.h"
+
+// does not belong here
+struct tree *git_write_tree()
+{
+#if 0
+	fprintf(stderr, "GIT_INDEX_FILE='%s' git-write-tree\n",
+		getenv("GIT_INDEX_FILE"));
+#endif
+	FILE *fp = popen("git-write-tree 2>/dev/null", "r");
+	char buf[41];
+	unsigned char sha1[20];
+	int ch;
+	unsigned i = 0;
+	while ( (ch = fgetc(fp)) != EOF )
+		if ( i < sizeof(buf)-1 && ch >= '0' && ch <= 'f' )
+			buf[i++] = ch;
+		else
+			break;
+	int rc = pclose(fp);
+	if ( rc == -1 || WEXITSTATUS(rc) )
+		return NULL;
+	buf[i] = '\0';
+	if ( get_sha1(buf, sha1) != 0 )
+		return NULL;
+	return lookup_tree(sha1);
+}
+
+const char *node_title(struct node *node, int *len)
+{
+	const char *s = "(null commit)";
+	*len = strlen(s);
+
+	if ( node->virtual ) {
+		s = node->comment;
+		*len = strlen(s);
+	} else if ( node->commit ) {
+		if ( parse_commit(node->commit) != 0 ) {
+			s = "(bad commit)";
+			*len = strlen(s);
+		} else {
+			s = node->commit->buffer;
+			char prev = '\0';
+			while ( *s ) {
+				if ( '\n' == prev && '\n' == *s ) {
+					++s;
+					break;
+				}
+				prev = *s++;
+			}
+			*len = 0;
+			while ( s[*len] && '\n' != s[*len] )
+				++(*len);
+		}
+	}
+	return s;
+}
+
+const unsigned char *node_sha(const struct node *node)
+{
+	return node->commit->object.sha1;
+}
+
+const char *node_hex_sha1(const struct node *node)
+{
+	return node->commit ?  sha1_to_hex(node->commit->object.sha1):
+		node->virtual ? "virtual": "undefined";
+}
+
+static int sha_eq(const unsigned char *a, const unsigned char *b)
+{
+	if ( !a && !b )
+		return 2;
+	return a && b && memcmp(a, b, 20) == 0;
+}
+
+static struct node_list *node_list_pool = NULL;
+static unsigned node_list_pool_size = 0;
+
+unsigned node_list_count(const struct node_list *l)
+{
+	unsigned c = 0;
+	while ( l ) {
+		++c;
+		l = l->next;
+	}
+	return c;
+}
+
+struct node_list *node_list_insert(struct node *node, struct node_list **list)
+{
+	struct node_list *e;
+	if ( node_list_pool )
+		e = node_list_shift(&node_list_pool);
+	else
+		e = malloc(sizeof(struct node_list));
+	e->next = *list;
+	e->node = node;
+	*list = e;
+	return e;
+}
+
+struct node_list *node_list_free1(struct node_list *list, int free_nodes)
+{
+	struct node_list *next = list->next;
+	if ( free_nodes )
+		free(list->node);
+
+	if ( node_list_pool_size < 1000 ) {
+		list->node = NULL;
+		list->next = node_list_pool;
+		node_list_pool = list;
+	} else
+		free(list);
+	return next;
+}
+
+void node_list_free(struct node_list **list, int free_nodes)
+{
+	while ( *list )
+		node_list_free1(node_list_shift(list), free_nodes);
+}
+
+struct node *node_alloc(struct commit *commit)
+{
+	struct node *node = malloc(sizeof(struct node));
+	node->parents = NULL;
+	node->parents_count = 0;
+	node->children = NULL;
+	node->virtual = 0;
+	node->commit = commit;
+	if ( parse_commit(commit) == 0 )
+		node->tree = commit->tree;
+	else
+		die("failed to parse commit %s", sha1_to_hex(commit->object.sha1));
+
+	return node;
+}
+
+struct node *node_alloc_virtual(struct tree *tree, const char *comment)
+{
+	struct node *node = malloc(sizeof(struct node));
+	node->parents = NULL;
+	node->children = NULL;
+	node->commit = NULL;
+	node->tree = tree;
+	node->virtual = 1;
+	static unsigned virtual_id = 0;
+	node->virtual_id = virtual_id++;
+	node->comment = comment;
+	return node;
+}
+
+void node_set_parents(struct node *node, struct node_list *parents)
+{
+	struct node_list *p;
+	node->parents_count = 0;
+	for_each_node_list(p, node->parents = parents) {
+		node_list_insert(node, &p->node->children);
+		node->parents_count++;
+	}
+}
+
+static inline int node_eq(const struct node *a, const struct node *b)
+{
+	if ( a == b )
+		return 1;
+	if ( a->virtual != b->virtual )
+		return 0;
+	if ( a->virtual )
+		return a->virtual_id == b->virtual_id;
+	if ( a->commit == b->commit )
+		return 1;
+	return sha_eq(a->commit->object.sha1, b->commit->object.sha1);
+}
+
+struct node_list *node_list_find_node(const struct node *node,
+				      const struct node_list *list)
+{
+	const struct node_list *p;
+	for_each_node_list(p, list)
+		if ( node_eq(p->node, node) )
+			break;
+	return (struct node_list*)p;
+}
+
+static struct node_list *node_all_parents(struct node *node)
+{
+	struct node_list *parents = NULL, **set = &parents;
+	struct node_list *stack = NULL;
+	node_list_insert(node, &stack);
+	while ( stack ) {
+		struct node_list *el = node_list_shift(&stack);
+
+		node_list_insert(el->node, set);
+		set = &(*set)->next;
+
+		struct node_list *p = el->node->parents;
+		while ( p ) {
+			if ( !node_list_find_node(p->node, parents) )
+				node_list_insert(p->node, &stack);
+			p = p->next;
+		}
+		node_list_free1(el, 0);
+	}
+	return parents;
+}
+
+// a & b. a and are invalid after the call,
+// the result will contain all the common nodes
+struct node_list *node_list_intersect(struct node_list *a,
+				      struct node_list *b)
+{
+	struct node_list *result = NULL;
+	struct node_list *p = a;
+	while ( p ) {
+		struct node_list *next = p->next;
+		struct node_list *bn = node_list_find_node(p->node, b);
+		if ( bn ) {
+			p->next = result;
+			result = p;
+		} else
+			node_list_free1(p, 0);
+		p = next;
+	}
+	node_list_free(&b, 0);
+	return result;
+}
+
+struct node *graph_add_node(struct graph *graph, struct node *node)
+{
+	struct node_list **bucket;
+	if ( node->virtual )
+		// virtual nodes hashed by lowest byte of virtual_id
+		bucket = graph->commits + (node->virtual_id & 0xff);
+	else
+		bucket = graph->commits + node->commit->object.sha1[0];
+	node_list_insert(node, bucket);
+	return node;
+}
+
+struct node *graph_node_bysha(const struct graph *graph,
+			      const unsigned char *sha)
+{
+	const struct node_list *head = *(graph->commits + sha[0]);
+	while ( head ) {
+		if ( !head->node->virtual &&
+		     sha_eq(head->node->commit->object.sha1, sha) ) {
+			return head->node;
+		}
+		head = head->next;
+	}
+	return NULL;
+}
+
+// setup an empty tree in repository and returns it
+static struct tree *get_empty_tree()
+{
+	static struct tree *tree = NULL;
+	if ( !tree ) {
+		char *tmpindex = git_path("/merge-tmp-index");
+		setenv(INDEX_ENVIRONMENT, tmpindex, 1);
+		unlink(tmpindex);
+		tree = git_write_tree();
+		unlink(tmpindex);
+		if ( !tree )
+			die("failed to setup an empty tree");
+	}
+	return tree;
+}
+
+static struct node *addCommonRoot(struct graph *graph)
+{
+	struct node_list *roots = NULL;
+	struct node_list *p;
+	unsigned b;
+	for (b = 0; b < sizeof(graph->commits)/sizeof(graph->commits[0]); ++b) {
+		for_each_node_list(p, graph->commits[b])
+			if ( p->node->parents_count )
+				node_list_insert(p->node, &roots);
+	}
+	struct node *super = node_alloc_virtual(get_empty_tree(), "Root");
+	for_each_node_list(p, roots) {
+		struct node_list *list = NULL;
+		node_list_insert(super, &list);
+		node_set_parents(p->node, list);
+	}
+	graph_add_node(graph, super);
+	return super;
+}
+
+// getCommonAncestors: Find the common ancestors for commit1 and commit2
+struct node_list *graph_common_ancestors(struct graph *graph,
+					 struct node *commit1,
+					 struct node *commit2)
+{
+	struct node_list *shared;
+	shared = node_list_intersect(node_all_parents(commit1),
+				     node_all_parents(commit2));
+	if ( !shared )
+		node_list_insert(addCommonRoot(graph), &shared);
+
+	/*
+	   for s in shared:
+	   if len([c for c in s.children if c in shared]) == 0:
+	   res.add(s)
+	   return list(res)
+	   */
+	struct node_list *ca = NULL, **pca = &ca;
+	struct node_list *s;
+	for_each_node_list(s, shared) {
+		unsigned n = 0;
+		struct node_list *c;
+		for_each_node_list(c, s->node->children)
+			if ( node_list_find_node(c->node, shared) ) {
+				++n;
+				break;
+			}
+		if ( n == 0 ) {
+			node_list_insert(s->node, pca);
+			pca = &(*pca)->next;
+		}
+	}
+	node_list_free(&shared, 0);
+	return ca;
+}
+
+#if 0
+void node_list_print(const char *msg, const struct node_list *list)
+{
+    const struct node_list *p;
+    printf("%s\n", msg);
+    for_each_node_list(p, list) {
+	int len;
+	const char *msg = node_title(p->node, &len);
+	printf("\t%s %.*s\n", node_hex_sha1(p->node), len, msg);
+    }
+}
+#endif
+
+// vim: sw=8 noet
diff --git a/graph.h b/graph.h
new file mode 100644
index 0000000..2cdbd96
--- /dev/null
+++ b/graph.h
@@ -0,0 +1,85 @@
+#ifndef _GRAPH_H_
+#define _GRAPH_H_
+
+struct node;
+struct graph;
+struct commit;
+struct tree;
+struct commit_list;
+
+struct node_list
+{
+	struct node_list *next;
+	struct node *node;
+};
+
+struct node
+{
+	struct commit *commit;
+	struct tree *tree;
+
+	unsigned parents_count;
+	struct node_list *parents;
+	struct node_list *children;
+
+	unsigned virtual:1;
+	unsigned virtual_id;
+	const char *comment;
+};
+
+struct node_list *node_list_insert(struct node *node, struct node_list **list);
+void node_list_free(struct node_list **list, int free_nodes);
+struct node_list *node_list_free1(struct node_list *list, int free_node);
+
+static inline struct node_list *node_list_shift(struct node_list **list)
+{
+	struct node_list *head = *list;
+	*list = head->next;
+	return head;
+}
+
+static inline struct node *node_list_shift_node(struct node_list **list)
+{
+	struct node *node = (*list)->node;
+	*list = node_list_free1(*list, 0);
+	return node;
+}
+
+struct node *node_alloc(struct commit *);
+struct node *node_alloc_virtual(struct tree *, const char *comment);
+
+void node_set_parents(struct node *node, struct node_list *parents);
+unsigned node_list_count(const struct node_list *);
+
+struct node_list *node_list_find_node(const struct node *node,
+				      const struct node_list *list);
+
+
+// find the common ancestors for commit1 and commit2
+struct node_list *graph_common_ancestors(struct graph *graph,
+					 struct node *commit1,
+					 struct node *commit2);
+
+struct graph
+{
+	// hashed by first byte of SHA-1 or low byte of virtual_id
+	struct node_list *commits[256];
+};
+
+struct node *graph_add_node(struct graph *graph, struct node *node);
+struct node *graph_node_bysha(const struct graph *graph,
+			      const unsigned char *sha);
+
+#define for_each_node_list(p,list) \
+	for ( p = (list); p; p = p->next )
+
+const char *node_title(struct node *, int *len);
+const unsigned char *node_sha(const struct node *);
+const char *node_hex_sha1(const struct node *);
+void node_list_print(const char *msg, const struct node_list *list);
+
+struct tree *git_write_tree();
+
+#endif /* _GRAPH_H_ */
+
+// vim: sw=8 noet
diff --git a/merge-recursive.c b/merge-recursive.c
new file mode 100644
index 0000000..9a0181c
--- /dev/null
+++ b/merge-recursive.c
@@ -0,0 +1,1608 @@
+//
+// Recursive Merge algorithm stolen from git-merge-recursive.py by
+// Fredrik Kuivinen.
+//
+#include <stdarg.h>
+#include <string.h>
+#include <assert.h>
+#include <sys/wait.h>
+#include <sys/types.h>
+#include <sys/stat.h>
+#include "cache.h"
+#include "commit.h"
+#include "blob.h"
+#include "tree-walk.h"
+
+#include "graph.h"
+#include "path-list.h"
+
+#define for_each_commit(p,list) for ( p = (list); p; p = p->next )
+
+struct merge_result
+{
+	struct node *commit;
+	unsigned clean:1;
+};
+
+struct merge_tree_result
+{
+	struct tree *tree;
+	unsigned clean:1;
+};
+
+struct index_entry
+{
+	struct index_entry *next;
+	struct
+	{
+		unsigned mode;
+		unsigned char sha[20];
+	} stages[4];
+	unsigned processed:1;
+	char path[1];
+};
+
+struct index_entry *index_entry_alloc(const char *path)
+{
+	size_t n = strlen(path); // index_entry::path has room for \0
+	struct index_entry *p = xmalloc(sizeof(struct index_entry) + n);
+	if ( !p )
+		return NULL;
+	memcpy(p->path, path, n + 1);
+	p->next = NULL;
+	p->processed = 0;
+	return p;
+}
+
+#if 0
+static
+void print_index_entry(const char *text, const struct index_entry *e)
+{
+    printf("%s%s next: %p %s\n", text,
+	   e ? e->path: NULL,
+	   e ? e->next: NULL,
+	   e && e->processed ? "processed": "");
+    if ( e ) {
+	int i;
+	for ( i = 1; i < 4; ++i )
+	    printf("\tstage[%d]: %06o %s\n", i,
+		   e->stages[i].mode,
+		   sha1_to_hex(e->stages[i].sha));
+    }
+}
+#endif
+
+static struct path_list *currentFileSet;
+static struct path_list *currentDirectorySet;
+
+static int output_indent = 0;
+
+static void output(const char *fmt, ...)
+{
+	va_list args;
+	int i;
+	for ( i = output_indent; i--; )
+		fputs("  ", stdout);
+	va_start(args, fmt);
+	vfprintf(stdout, fmt, args);
+	va_end(args);
+	fputc('\n', stdout);
+}
+
+static const char *original_index_file;
+static const char *temporary_index_file;
+
+static void setup_index(int temp)
+{
+	const char *idx = temp ? temporary_index_file: original_index_file;
+	unlink(temporary_index_file);
+	setenv("GIT_INDEX_FILE", idx, 1);
+}
+
+// This is a global variable which is used in a number of places but
+// only written to in the 'merge' function.
+
+// index_only == 1    => Don't leave any non-stage 0 entries in the cache and
+//                       don't update the working directory.
+//               0    => Leave unmerged entries in the cache and update
+//                       the working directory.
+static int index_only = 0; // cacheOnly
+
+static int git_read_tree(const struct tree *tree)
+{
+#if 0
+	fprintf(stderr, "GIT_INDEX_FILE='%s' git-read-tree %s\n",
+		getenv("GIT_INDEX_FILE"),
+		sha1_to_hex(tree->object.sha1));
+#endif
+	char cmd[80];
+	sprintf(cmd, "git-read-tree %s", sha1_to_hex(tree->object.sha1));
+	int rc = system(cmd);
+	return rc == -1 ? -1: WEXITSTATUS(rc);
+}
+
+static int git_merge_trees(const char *update_arg,
+			   struct tree *common,
+			   struct tree *head,
+			   struct tree *merge)
+{
+#if 0
+	fprintf(stderr, "GIT_INDEX_FILE='%s' git-read-tree %s -m %s %s %s\n",
+		getenv("GIT_INDEX_FILE"),
+		update_arg,
+		sha1_to_hex(common->object.sha1),
+		sha1_to_hex(head->object.sha1),
+		sha1_to_hex(merge->object.sha1));
+#endif
+	char cmd[256];
+	sprintf(cmd, "git-read-tree %s -m %s %s %s",
+		update_arg,
+		sha1_to_hex(common->object.sha1),
+		sha1_to_hex(head->object.sha1),
+		sha1_to_hex(merge->object.sha1));
+	int rc = system(cmd);
+	return rc == -1 ? -1: WEXITSTATUS(rc);
+}
+
+struct merge_tree_result merge_trees(struct tree *head,
+				     struct tree *merge,
+				     struct tree *common,
+				     const char *branch1Name,
+				     const char *branch2Name);
+
+
+// The entry point to the merge code
+
+// Merge the commits h1 and h2, return the resulting virtual
+// commit object and a flag indicating the cleaness of the merge.
+static
+struct merge_result merge(struct node *h1,
+			  struct node *h2,
+			  const char *branch1Name,
+			  const char *branch2Name,
+			  struct graph *graph,
+			  int callDepth /* =0 */,
+			  struct node *ancestor /* =None */)
+{
+	struct merge_result result = { NULL, 0 };
+
+	const char *msg;
+	int msglen;
+	output("Merging:");
+	msg = node_title(h1, &msglen);
+	output("%s %.*s", node_hex_sha1(h1), msglen, msg);
+	msg = node_title(h2, &msglen);
+	output("%s %.*s", node_hex_sha1(h2), msglen, msg);
+	if ( !ancestor && !graph )
+		die("graph is not initialized");
+	struct node_list *ca = NULL;
+	if ( ancestor )
+		node_list_insert(ancestor, &ca);
+	else
+		ca = graph_common_ancestors(graph, h1, h2);
+
+	output("found %u common ancestor(s):", node_list_count(ca));
+	struct node_list *x;
+	for_each_node_list(x,ca) {
+		msg = node_title(x->node, &msglen);
+		output("%s %.*s", node_hex_sha1(x->node), msglen, msg);
+	}
+
+	struct node *mergedCA = node_list_shift_node(&ca);
+
+	struct node_list *h;
+	for_each_commit(h,ca) {
+		output_indent = callDepth + 1;
+		result = merge(mergedCA, h->node,
+			       "Temporary merge branch 1",
+			       "Temporary merge branch 2",
+			       graph,
+			       callDepth + 1,
+			       NULL);
+		mergedCA = result.commit;
+		output_indent = callDepth;
+
+		if ( !mergedCA )
+			die("merge returned no commit");
+	}
+
+	if ( callDepth == 0 ) {
+		setup_index(0);
+		index_only = 0;
+	} else {
+		setup_index(1);
+		git_read_tree(h1->tree);
+		index_only = 1;
+	}
+
+	struct merge_tree_result mtr;
+	mtr = merge_trees(h1->tree, h2->tree,
+			  mergedCA->tree, branch1Name, branch2Name);
+
+	if ( graph && (mtr.clean || index_only) ) {
+		result.commit = node_alloc_virtual(mtr.tree, "merged tree");
+		struct node_list *parents = NULL;
+		node_list_insert(h1, &parents);
+		node_list_insert(h2, &parents->next);
+		node_set_parents(result.commit, parents);
+		graph_add_node(graph, result.commit);
+	} else
+		result.commit = NULL;
+
+	result.clean = mtr.clean;
+	return result;
+}
+
+#define READ_TREE_FOUND 2
+typedef int (*read_tree_rt_fn_t)(const char *sha1,
+				 const char *path,
+				 unsigned mode,
+				 void *data);
+
+// git-ls-tree -r -t <tree>
+static int read_tree_rt(struct tree *tree,
+			const char *base,
+			int baselen,
+			read_tree_rt_fn_t fn,
+			void *data)
+{
+	struct tree_desc desc;
+	struct name_entry entry;
+
+	if (parse_tree(tree))
+		return -1;
+
+	desc.buf = tree->buffer;
+	desc.size = tree->size;
+
+	while ( tree_entry(&desc, &entry) ) {
+		int retval;
+		char *path = xmalloc(baselen + entry.pathlen + 2);
+		memcpy(path, base, baselen);
+		memcpy(path + baselen, entry.path, entry.pathlen);
+		path[baselen + entry.pathlen] = '\0';
+
+		switch ( retval = fn(entry.sha1, path, entry.mode, data) ) {
+		case READ_TREE_RECURSIVE:
+			break;
+		case 0:
+			free(path);
+			continue;
+		default:
+			free(path);
+			return retval;
+		}
+		if (S_ISDIR(entry.mode)) {
+			path[baselen + entry.pathlen] = '/';
+			path[baselen + entry.pathlen + 1] = '\0';
+			retval = read_tree_rt(lookup_tree(entry.sha1),
+					      path,
+					      baselen + entry.pathlen + 1,
+					      fn, data);
+			path[baselen + entry.pathlen] = '\0';
+			if (retval) {
+				free(path);
+				return retval;
+			}
+		}
+		free(path);
+	}
+	return 0;
+}
+
+struct files_and_dirs
+{
+	struct path_list **files;
+	struct path_list **dirs;
+};
+
+static int save_files_dirs(const char *sha1,
+			   const char *path_,
+			   unsigned mode,
+			   void *data_)
+{
+	struct files_and_dirs *data = data_;
+	char *path = strdup(path_);
+
+	if (S_ISDIR(mode)) {
+		path_list_insert(path, data->dirs);
+		data->dirs = &(*data->dirs)->next;
+	} else {
+		path_list_insert(path, data->files);
+		data->files = &(*data->files)->next;
+	}
+	return READ_TREE_RECURSIVE;
+}
+
+int getFilesAndDirs(struct tree *tree,
+		    struct path_list **files,
+		    struct path_list **dirs)
+{
+	struct files_and_dirs data;
+	path_list_clear(files, 1);
+	path_list_clear(dirs, 1);
+	data.files = files;
+	data.dirs = dirs;
+	if ( read_tree_rt(tree, "", 0, save_files_dirs, &data) != 0 )
+		return 0;
+	return path_list_count(*files) + path_list_count(*dirs);
+}
+
+struct index_entry *index_entry_find(struct index_entry *ents, const char *path)
+{
+	struct index_entry *e;
+	for ( e = ents; e; e = e->next )
+		if ( strcmp(e->path, path) == 0 )
+			break;
+	return e;
+}
+
+struct index_entry *index_entry_get(struct index_entry **ents, const char *path)
+{
+	struct index_entry *e, **tail = ents;
+	for ( e = *ents; e; e = e->next ) {
+		if ( strcmp(e->path, path) == 0 )
+			return e;
+		tail = &e->next;
+	}
+	e = index_entry_alloc(path);
+	memset(e->stages, 0, sizeof(e->stages));
+	return *tail = e;
+}
+
+struct find_entry
+{
+	const char *path;
+	unsigned char *sha;
+	unsigned *mode;
+};
+
+static int find_entry(const char *sha,
+		      const char *path,
+		      unsigned mode,
+		      void *data_)
+{
+	struct find_entry *data = data_;
+	if ( strcmp(path, data->path) == 0 ) {
+		memcpy(data->sha, sha, 20);
+		*data->mode = mode;
+		return READ_TREE_FOUND;
+	}
+	return READ_TREE_RECURSIVE;
+}
+
+// Returns a CacheEntry object which doesn't have to correspond to
+// a real cache entry in Git's index.
+struct index_entry *index_entry_from_db(const char *path,
+					struct tree *o,
+					struct tree *a,
+					struct tree *b)
+{
+	struct index_entry *e = index_entry_alloc(path);
+	struct find_entry data;
+	data.path = path;
+	data.sha = e->stages[1].sha;
+	data.mode = &e->stages[1].mode;
+	if ( read_tree_rt(o, "", 0, find_entry, &data) != READ_TREE_FOUND ) {
+		// fprintf(stderr, "1: %s:%s not found\n",
+		// 	sha1_to_hex(o->object.sha1), path);
+		memcpy(e->stages[1].sha, null_sha1, 20);
+		e->stages[1].mode = 0;
+	}
+	data.sha = e->stages[2].sha;
+	data.mode = &e->stages[2].mode;
+	if ( read_tree_rt(a, "", 0, find_entry, &data) != READ_TREE_FOUND ) {
+		// fprintf(stderr, "2: %s:%s not found\n",
+		// 	sha1_to_hex(a->object.sha1), path);
+		memcpy(e->stages[2].sha, null_sha1, 20);
+		e->stages[2].mode = 0;
+	}
+	data.sha = e->stages[3].sha;
+	data.mode = &e->stages[3].mode;
+	if ( read_tree_rt(b, "", 0, find_entry, &data) != READ_TREE_FOUND ) {
+		// fprintf(stderr, "3: %s:%s not found\n",
+		// 	sha1_to_hex(b->object.sha1), path);
+		memcpy(e->stages[3].sha, null_sha1, 20);
+		e->stages[3].mode = 0;
+	}
+	return e;
+}
+
+void free_index_entries(struct index_entry **ents)
+{
+	while ( *ents ) {
+		struct index_entry *next = (*ents)->next;
+		free(*ents);
+		*ents = next;
+	}
+}
+
+static int fget_mode(unsigned *mode, FILE *fp, int *ch)
+{
+	int p;
+	char buf[8];
+	for ( p = 0; (*ch = fgetc(fp)) != EOF && p < 6; ) {
+		if ( *ch == '\x20' || *ch == '\t' || *ch == '\n' || *ch == '\r' )
+			break;
+		if ( *ch < '0' || *ch > '7' )
+			return -1;
+		buf[p++] = *ch;
+	}
+	buf[p] = '\0';
+	*mode = strtoul(buf, 0, 8);
+	return 0;
+}
+
+static int fget_sha1(unsigned char *sha, FILE *fp, int *ch)
+{
+	char buf[40];
+	int p;
+	for ( p = 0; (*ch = fgetc(fp)) != EOF && p < 40; ) {
+		if ( ('0' <= *ch && *ch <= '9') ||
+		     ('a' <= *ch && *ch <= 'f') ||
+		     ('A' <= *ch && *ch <= 'F') )
+			buf[p++] = *ch;
+		else
+			return -1;
+	}
+	if ( p != 40 || get_sha1_hex(buf, sha) == -1 )
+		return -1;
+	return 0;
+}
+// Create a dictionary mapping file names to CacheEntry objects. The
+// dictionary contains one entry for every path with a non-zero stage entry.
+struct index_entry *unmergedCacheEntries()
+{
+	struct index_entry *unmerged = NULL;
+	FILE *fp = popen("git-ls-files -z --unmerged", "r");
+	if ( !fp )
+		return NULL;
+	int ch;
+	while ( !feof(fp) ) {
+		unsigned mode;
+		unsigned char sha[20];
+		char stage = '0';
+		char path[PATH_MAX];
+		int p;
+		// mode
+		if ( fget_mode(&mode, fp, &ch) )
+			goto wait_eol;
+		if ( '\x20' != ch )
+			goto wait_eol;
+		// SHA1
+		if ( fget_sha1(sha, fp, &ch) )
+			goto wait_eol;
+		if ( '\x20' != ch )
+			goto wait_eol;
+		// stage
+		if ( (ch = fgetc(fp)) != EOF ) {
+			stage = ch;
+			if ( ch < '1' || ch > '3' )
+				goto wait_eol;
+		}
+		if ( (ch = fgetc(fp)) == EOF || '\t' != ch )
+			goto wait_eol;
+		// path
+		for ( p = 0; (ch = fgetc(fp)) != EOF; ++p ) {
+			path[p] = ch;
+			if ( !ch )
+				break;
+			if ( p == sizeof(path) - 1 ) {
+				path[p] = '\0';
+				error("path too long: %s", path);
+				goto wait_eol;
+			}
+		}
+		if ( ch )
+			goto wait_eol;
+		// printf("unmerged %08o %s %c %s\n",mode,sha1_to_hex(sha),stage,path);
+		struct index_entry *e = index_entry_get(&unmerged, path);
+		e->stages[stage - '1' + 1].mode = mode;
+		memcpy(e->stages[stage - '1' + 1].sha, sha, 20);
+		continue;
+	wait_eol:
+		while ( (ch = fgetc(fp)) != EOF && ch );
+	}
+	pclose(fp);
+	return unmerged;
+}
+
+struct rename_entry
+{
+	struct rename_entry *next;
+
+	char *src;
+	unsigned char src_sha[20];
+	unsigned src_mode;
+	struct index_entry *src_entry;
+
+	char *dst;
+	unsigned char dst_sha[20];
+	unsigned dst_mode;
+	struct index_entry *dst_entry;
+
+	unsigned score;
+	unsigned processed:1;
+};
+
+struct rename_entry *find_rename_bysrc(struct rename_entry *e,
+				       const char *name)
+{
+	while ( e ) {
+		if ( strcmp(e->src, name) == 0 )
+			break;
+		e = e->next;
+	}
+	return e;
+}
+
+struct rename_entry *find_rename_bydst(struct rename_entry *e,
+				       const char *name)
+{
+	while ( e ) {
+		if ( strcmp(e->dst, name) == 0 )
+			break;
+		e = e->next;
+	}
+	return e;
+}
+
+void rename_entry_free(struct rename_entry *p)
+{
+	free(p->src);
+	free(p->dst);
+	free(p);
+}
+
+void free_rename_entries(struct rename_entry **list)
+{
+	while ( *list ) {
+		struct rename_entry *next = (*list)->next;
+		rename_entry_free(*list);
+		*list = next;
+	}
+}
+
+// Get information of all renames which occured between 'oTree' and
+// 'tree'. We need the three trees in the merge ('oTree', 'aTree' and
+// 'bTree') to be able to associate the correct cache entries with
+// the rename information. 'tree' is always equal to either aTree or bTree.
+struct rename_entry *getRenames(struct tree *tree,
+				struct tree *oTree,
+				struct tree *aTree,
+				struct tree *bTree,
+				struct index_entry **entries)
+{
+	struct rename_entry *renames = NULL;
+	struct rename_entry **rptr = &renames;
+	char cmd[PATH_MAX];
+	sprintf(cmd, "git-diff-tree -M --diff-filter=R -r -z %s %s",
+		sha1_to_hex(oTree->object.sha1),
+		sha1_to_hex(tree->object.sha1));
+	FILE *fp = popen(cmd, "r");
+	while ( !feof(fp) ) {
+		char buf[PATH_MAX+1];
+		int p, ch;
+		struct rename_entry *re = NULL;
+		if ( (ch = fgetc(fp)) == EOF )
+			break;
+		else if ( ':' != ch )
+			goto error;
+		re = xmalloc(sizeof(*re));
+		re->src = re->dst = NULL;
+		re->next = NULL;
+		re->processed = 0;
+		// mode1
+		if ( fget_mode(&re->src_mode, fp, &ch) )
+			goto error;
+		if ( '\x20' != ch )
+			goto error;
+		// mode2
+		if ( fget_mode(&re->dst_mode, fp, &ch) )
+			goto error;
+		if ( '\x20' != ch )
+			goto error;
+		// src sha1
+		if ( fget_sha1(re->src_sha, fp, &ch) )
+			goto error;
+		if ( '\x20' != ch )
+			goto error;
+		// dst sha1
+		if ( fget_sha1(re->dst_sha, fp, &ch) )
+			goto error;
+		if ( '\x20' != ch )
+			goto error;
+		// score
+		if ( (ch = fgetc(fp)) != EOF )
+			if ( ch != 'R' )
+				goto wait_3eol;
+		for ( p = 0; (ch = fgetc(fp)) != EOF; ++p ) {
+			buf[p] = ch;
+			if ( p == 3 ) {
+				buf[p] = '\0';
+				break;
+			}
+			if ( !ch )
+				break;
+			if ( ch < '0' || ch > '9' )
+				goto error;
+		}
+		if ( '\0' != ch && ((ch = fgetc(fp)) == EOF || '\0' != ch) )
+			goto error;
+		re->score = strtol(buf, 0, 10);
+		// name 1
+		for ( p = 0; (ch = fgetc(fp)) != EOF; ++p ) {
+			buf[p] = ch;
+			if ( !ch )
+				break;
+			if ( p == PATH_MAX ) {
+				buf[p] = '\0';
+				error("name1 too long: %s", buf);
+				goto error;
+			}
+		}
+		re->src = strdup(buf);
+		// name 2
+		for ( p = 0; (ch = fgetc(fp)) != EOF; ++p ) {
+			buf[p] = ch;
+			if ( !ch )
+				break;
+			if ( p == PATH_MAX ) {
+				buf[p] = '\0';
+				error("name2 too long: %s", buf);
+				goto error;
+			}
+		}
+		re->dst = strdup(buf);
+
+		re->src_entry = index_entry_find(*entries, re->src);
+		if ( !re->src_entry ) {
+			re->src_entry = index_entry_from_db(re->src, oTree, aTree, bTree);
+			re->src_entry->next = *entries;
+			*entries = re->src_entry;
+		}
+		re->dst_entry = index_entry_find(*entries, re->dst);
+		if ( !re->dst_entry ) {
+			re->dst_entry = index_entry_from_db(re->dst, oTree, aTree, bTree);
+			re->dst_entry->next = *entries;
+			*entries = re->dst_entry;
+		}
+		*rptr = re;
+		rptr = &re->next;
+		// printf("re %s:%s -> %s:%s R%d\n",
+		//        sha1_to_hex(re->src_sha), re->src,
+		//        sha1_to_hex(re->dst_sha), re->dst,
+		//        re->score);
+		// print_index_entry("re->src_entry ", re->src_entry);
+		// print_index_entry("re->dst_entry ", re->dst_entry);
+		continue;
+	wait_3eol:
+		while ( (ch = fgetc(fp)) != EOF && ch );
+		while ( (ch = fgetc(fp)) != EOF && ch );
+	error:
+		rename_entry_free(re);
+		while ( (ch = fgetc(fp)) != EOF && ch );
+	}
+	pclose(fp);
+	return renames;
+}
+
+int setIndexStages(const char *path,
+		   unsigned char *osha, unsigned omode,
+		   unsigned char *asha, unsigned amode,
+		   unsigned char *bsha, unsigned bmode,
+		   int clear /* =True */)
+{
+	FILE *fp = popen("git-update-index -z --index-info", "w");
+	if ( !fp )
+		return -1;
+	if ( clear ) {
+		fprintf(fp, "0 %s\t%s", sha1_to_hex(null_sha1), path);
+		fputc('\0', fp);
+	}
+	if ( omode ) {
+		fprintf(fp, "0%o %s 1\t%s", omode, sha1_to_hex(osha), path);
+		fputc('\0', fp);
+	}
+	if ( amode ) {
+		fprintf(fp, "0%o %s 2\t%s", amode, sha1_to_hex(asha), path);
+		fputc('\0', fp);
+	}
+	if ( bmode ) {
+		fprintf(fp, "0%o %s 3\t%s", bmode, sha1_to_hex(bsha), path);
+		fputc('\0', fp);
+	}
+	return pclose(fp);
+}
+
+static int remove_path(const char *name)
+{
+	int ret;
+	char *slash;
+
+	ret = unlink(name);
+	if ( ret )
+		return ret;
+	int len = strlen(name);
+	char *dirs = malloc(len+1);
+	memcpy(dirs, name, len);
+	dirs[len] = '\0';
+	while ( (slash = strrchr(name, '/')) ) {
+		*slash = '\0';
+		len = slash - name;
+		if ( rmdir(name) != 0 )
+			break;
+	}
+	free(dirs);
+	return ret;
+}
+
+int removeFile(int clean, const char *path)
+{
+	int updateCache = index_only || clean;
+	int updateWd = !index_only;
+
+	if ( updateCache ) {
+		FILE *fp = popen("git-update-index --force-remove -z --stdin", "w");
+		if ( !fp )
+			return -1;
+		fputs(path, fp);
+		fputc('\0', fp);
+		if ( pclose(fp) != 0 )
+			return -1;
+	}
+	if ( updateWd )
+	{
+		unlink(path);
+		if ( errno != ENOENT || errno != EISDIR )
+			return -1;
+		remove_path(path);
+	}
+	return 0;
+}
+
+char *uniquePath(const char *path, const char *branch)
+{
+	char *newpath = xmalloc(strlen(path) + 1 + strlen(branch) + 8 + 1);
+	strcpy(newpath, path);
+	strcat(newpath, "~");
+	char *p = newpath + strlen(newpath);
+	strcpy(p, branch);
+	for ( ; *p; ++p )
+		if ( '/' == *p )
+			*p = '_';
+	int suffix = 0;
+	struct stat st;
+	while ( path_list_has_path(currentFileSet, newpath) ||
+		path_list_has_path(currentDirectorySet, newpath) ||
+		lstat(newpath, &st) == 0 ) {
+		sprintf(p, "_%d", suffix++);
+	}
+	path_list_insert(newpath, &currentFileSet);
+	return newpath;
+}
+
+int mkdir_p(const char *path, unsigned long mode, int create_last)
+{
+	char *buf = strdup(path);
+	char *p;
+
+	for ( p = buf; *p; ++p ) {
+		if ( *p != '/' )
+			continue;
+		*p = '\0';
+		if (mkdir(buf, mode)) {
+			int e = errno;
+			if ( e == EEXIST ) {
+				struct stat st;
+				if ( !stat(buf, &st) && S_ISDIR(st.st_mode) )
+					goto next; /* ok */
+				errno = e;
+			}
+			free(buf);
+			return -1;
+		}
+	next:
+		*p = '/';
+	}
+	free(buf);
+	if ( create_last && mkdir(path, mode) )
+		return -1;
+	return 0;
+}
+
+/* stolen from builtin-cat-file.c */
+static void flush_buffer(int fd, const char *buf, unsigned long size)
+{
+	while (size > 0) {
+		long ret = xwrite(fd, buf, size);
+		if (ret < 0) {
+			/* Ignore epipe */
+			if (errno == EPIPE)
+				break;
+			die("git-cat-file: %s", strerror(errno));
+		} else if (!ret) {
+			die("git-cat-file: disk full?");
+		}
+		size -= ret;
+		buf += ret;
+	}
+}
+
+void updateFileExt(const unsigned char *sha,
+		   unsigned mode,
+		   const char *path,
+		   int updateCache,
+		   int updateWd)
+{
+	if ( index_only )
+		updateWd = 0;
+
+	if ( updateWd ) {
+		char type[20];
+		void *buf;
+		unsigned long size;
+
+		buf = read_sha1_file(sha, type, &size);
+		if (!buf)
+			die("cannot read object %s '%s'", sha1_to_hex(sha), path);
+		if ( strcmp(type, blob_type) != 0 )
+			die("blob expected for %s '%s'", sha1_to_hex(sha), path);
+
+		if ( S_ISREG(mode) ) {
+			if ( mkdir_p(path, 0777, 0 /* don't create last element */) )
+				die("failed to create path %s: %s", path, strerror(errno));
+			unlink(path);
+			if ( mode & 0100 )
+				mode = 0777;
+			else
+				mode = 0666;
+			int fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
+			if ( fd < 0 )
+				die("failed to open %s: %s", path, strerror(errno));
+			flush_buffer(fd, buf, size);
+			close(fd);
+		} else if ( S_ISLNK(mode) ) {
+			char *linkTarget = malloc(size + 1);
+			memcpy(linkTarget, buf, size);
+			linkTarget[size] = '\0';
+			mkdir_p(path, 0777, 0);
+			symlink(linkTarget, path);
+		} else
+			die("do not know what to do with %06o %s '%s'",
+			    mode, sha1_to_hex(sha), path);
+	}
+	if ( updateCache )
+	{
+		FILE *fp;
+		// XXX just always use "git update-index --index-info"?
+		if ( updateWd ) {
+			fp = popen("git-update-index --add -z --stdin", "w");
+			fputs(path, fp);
+			fputc('\0', fp);
+		} else {
+			fp = popen("git-update-index --add -z --index-info", "w");
+			fprintf(fp, "%06o %s\t%s", mode, sha1_to_hex(sha), path);
+			fputc('\0', fp);
+		}
+		pclose(fp);
+	}
+}
+
+void updateFile(int clean,
+		const unsigned char *sha,
+		unsigned mode,
+		const char *path)
+{
+	updateFileExt(sha, mode, path, index_only || clean, !index_only);
+}
+
+// Low level file merging, update and removal
+// ------------------------------------------
+struct merge_file_info
+{
+	unsigned char sha[20];
+	unsigned mode;
+	unsigned clean:1;
+	unsigned merge:1;
+};
+
+static char *git_unpack_file(unsigned char *sha1, char *path)
+{
+	void *buf;
+	char type[20];
+	unsigned long size;
+	int fd;
+
+	buf = read_sha1_file(sha1, type, &size);
+	if (!buf || strcmp(type, blob_type))
+		die("unable to read blob object %s", sha1_to_hex(sha1));
+
+	strcpy(path, ".merge_file_XXXXXX");
+	fd = mkstemp(path);
+	if (fd < 0)
+		die("unable to create temp-file");
+	if (write(fd, buf, size) != size)
+		die("unable to write temp-file");
+	close(fd);
+	return path;
+}
+
+struct merge_file_info
+mergeFile(const char *oPath, unsigned char *oSha, unsigned oMode,
+	  const char *aPath, unsigned char *aSha, unsigned aMode,
+	  const char *bPath, unsigned char *bSha, unsigned bMode,
+	  const char *branch1Name, const char *branch2Name)
+{
+	struct merge_file_info result;
+	result.merge = 0;
+	result.clean = 1;
+
+	if ( (S_IFMT & aMode) != (S_IFMT & bMode) ) {
+		result.clean = 0;
+		if ( S_ISREG(aMode) ) {
+			result.mode = aMode;
+			memcpy(result.sha, aSha, 20);
+		} else {
+			result.mode = bMode;
+			memcpy(result.sha, bSha, 20);
+		}
+	} else {
+		if ( memcmp(aSha, oSha, 20) != 0 && memcmp(bSha, oSha, 20) != 0 )
+			result.merge = 1;
+
+		result.mode = aMode == oMode ? bMode: aMode;
+
+		if ( memcmp(aSha, oSha, 20) == 0 )
+			memcpy(result.sha, bSha, 20);
+		else if ( memcmp(bSha, oSha, 20) == 0 )
+			memcpy(result.sha, aSha, 20);
+		else if ( S_ISREG(aMode) ) {
+
+			int code = 1;
+			char orig[PATH_MAX];
+			char src1[PATH_MAX];
+			char src2[PATH_MAX];
+
+			git_unpack_file(oSha, orig);
+			git_unpack_file(aSha, src1);
+			git_unpack_file(bSha, src2);
+
+#if 0
+			[out, code] = runProgram(["merge",
+				 "-L", branch1Name + "/" + aPath,
+				 "-L", "orig/" + oPath,
+				 "-L", branch2Name + "/" + bPath,
+				 src1, orig, src2],
+				 returnCode=True)
+#endif
+			char cmd[PATH_MAX];
+			// TODO labels
+			snprintf(cmd, sizeof(cmd), "merge %s %s %s", src1, orig, src2);
+			printf("%s\n", cmd);
+			code = system(cmd);
+			if ( code == -1 )
+				die("Failed to execute 'merge'. merge(1) is used as the "
+				    "file-level merge tool. Is 'merge' in your path?");
+			snprintf(cmd, sizeof(cmd), "git-hash-object -t blob -w %s", src1);
+			FILE *fp = popen(cmd, "r");
+			if ( !fp )
+				die("cannot run git-hash-object: %s", strerror(errno));
+			int ch;
+			if ( fget_sha1(result.sha, fp, &ch) )
+				die("invalid output from git-hash-object");
+			pclose(fp);
+
+			unlink(orig);
+			unlink(src1);
+			unlink(src2);
+
+			result.clean = WEXITSTATUS(code) == 0;
+		} else {
+			if ( !(S_ISLNK(aMode) || S_ISLNK(bMode)) )
+				die("cannot merge modes?");
+
+			memcpy(result.sha, aSha, 20);
+
+			if ( memcmp(aSha, bSha, 20) != 0 )
+				result.clean = 0;
+		}
+	}
+
+	return result;
+}
+
+static void memswp(void *p1, void *p2, unsigned n)
+{
+	unsigned char *a = p1, *b = p2;
+	while ( n-- ) {
+		*a ^= *b;
+		*b ^= *a;
+		*a ^= *b;
+		++a;
+		++b;
+	}
+}
+
+int processRenames(struct rename_entry *renamesA,
+		   struct rename_entry *renamesB,
+		   const char *branchNameA,
+		   const char *branchNameB)
+{
+	int cleanMerge = 1;
+	//    printf("process renames %s:%s -> %s:%s\n",
+	//	   branchNameA, renamesA ? renamesA->src: "(none)",
+	//	   branchNameB, renamesB ? renamesB->dst: "(none)");
+
+	struct path_list *srcNames = NULL;
+	const struct rename_entry *sre;
+
+	for (sre = renamesA; sre; sre = sre->next)
+		if (!path_list_has_path(srcNames, sre->src))
+			path_list_insert(sre->src, &srcNames);
+	for (sre = renamesB; sre; sre = sre->next)
+		if (!path_list_has_path(srcNames, sre->src))
+			path_list_insert(sre->src, &srcNames);
+
+	struct path_list *src;
+	for_each_path(src,srcNames) {
+		struct rename_entry *renames1, *renames2, *ren1, *ren2;
+		const char *branchName1, *branchName2;
+		ren1 = find_rename_bysrc(renamesA, src->path);
+		ren2 = find_rename_bysrc(renamesB, src->path);
+		if ( ren1 ) {
+			renames1 = renamesA;
+			renames2 = renamesB;
+			branchName1 = branchNameA;
+			branchName2 = branchNameB;
+		} else {
+			renames1 = renamesB;
+			renames2 = renamesA;
+			branchName1 = branchNameB;
+			branchName2 = branchNameA;
+			struct rename_entry *tmp = ren2;
+			ren2 = ren1;
+			ren1 = tmp;
+		}
+
+		ren1->dst_entry->processed = 1;
+		ren1->src_entry->processed = 1;
+
+		if ( ren1->processed )
+			continue;
+		ren1->processed = 1;
+
+		if ( ren2 ) {
+			// Renamed in 1 and renamed in 2
+			if ( strcmp(ren1->src, ren2->src) != 0 )
+				die("ren1.srcName != ren2.srcName");
+			ren2->dst_entry->processed = 1;
+			ren2->processed = 1;
+			if ( strcmp(ren1->dst, ren2->dst) != 0 ) {
+				output("CONFLICT (rename/rename): "
+				       "Rename %s->%s in branch %s "
+				       "rename %s->%s in %s",
+				       src->path, ren1->dst, branchName1,
+				       src->path, ren2->dst, branchName2);
+				cleanMerge = 0;
+				char *dstName1 = ren1->dst, *dstName2 = ren2->dst;
+				if ( path_list_has_path(currentDirectorySet, ren1->dst) ) {
+					dstName1 = uniquePath(ren1->dst, branchName1);
+					output("%s is a directory in %s adding as %s instead",
+					       ren1->dst, branchName2, dstName1);
+					removeFile(0, ren1->dst);
+				}
+				if ( path_list_has_path(currentDirectorySet, ren2->dst) ) {
+					dstName2 = uniquePath(ren2->dst, branchName2);
+					output("%s is a directory in %s adding as %s instead",
+					       ren2->dst, branchName1, dstName2);
+					removeFile(0, ren2->dst);
+				}
+				setIndexStages(dstName1,
+					       NULL, 0,
+					       ren1->dst_sha, ren1->dst_mode,
+					       NULL, 0,
+					       1 /* clear */);
+				setIndexStages(dstName2,
+					       NULL, 0,
+					       NULL, 0,
+					       ren2->dst_sha, ren2->dst_mode,
+					       1 /* clear */);
+			} else {
+				removeFile(1, ren1->src);
+				struct merge_file_info mfi;
+				mfi = mergeFile(ren1->src, ren1->src_sha, ren1->src_mode,
+						ren1->dst, ren1->dst_sha, ren1->dst_mode,
+						ren2->dst, ren2->dst_sha, ren2->dst_mode,
+						branchName1, branchName2);
+				if ( mfi.merge || !mfi.clean )
+					output("Renaming %s->%s", src->path, ren1->dst);
+
+				if ( mfi.merge )
+					output("Auto-merging %s", ren1->dst);
+
+				if ( !mfi.clean ) {
+					output("CONFLICT (content): merge conflict in %s",
+					       ren1->dst);
+					cleanMerge = 0;
+
+					if ( !index_only )
+						setIndexStages(ren1->dst,
+							       ren1->src_sha, ren1->src_mode,
+							       ren1->dst_sha, ren1->dst_mode,
+							       ren2->dst_sha, ren2->dst_mode,
+							       1 /* clear */);
+				}
+				updateFile(mfi.clean, mfi.sha, mfi.mode, ren1->dst);
+			}
+		} else {
+			// Renamed in 1, maybe changed in 2
+			removeFile(1, ren1->src);
+
+			unsigned char srcShaOtherBranch[20], dstShaOtherBranch[20];
+			unsigned srcModeOtherBranch, dstModeOtherBranch;
+
+			int stage = renamesA == renames1 ? 3: 2;
+
+			memcpy(srcShaOtherBranch, ren1->src_entry->stages[stage].sha, 20);
+			srcModeOtherBranch = ren1->src_entry->stages[stage].mode;
+
+			memcpy(dstShaOtherBranch, ren1->dst_entry->stages[stage].sha, 20);
+			dstModeOtherBranch = ren1->dst_entry->stages[stage].mode;
+
+			int tryMerge = 0;
+			char *newPath;
+			struct rename_entry *dst2;
+
+			if ( path_list_has_path(currentDirectorySet, ren1->dst) ) {
+				newPath = uniquePath(ren1->dst, branchName1);
+				output("CONFLICT (rename/directory): Rename %s->%s in %s "
+				       " directory %s added in %s",
+				       ren1->src, ren1->dst, branchName1,
+				       ren1->dst, branchName2);
+				output("Renaming %s to %s instead", ren1->src, newPath);
+				cleanMerge = 0;
+				removeFile(0, ren1->dst);
+				updateFile(0, ren1->dst_sha, ren1->dst_mode, newPath);
+			} else if ( memcmp(srcShaOtherBranch, null_sha1, 20) == 0 ) {
+				output("CONFLICT (rename/delete): Rename %s->%s in %s "
+				       "and deleted in %s",
+				       ren1->src, ren1->dst, branchName1,
+				       branchName2);
+				cleanMerge = 0;
+				updateFile(0, ren1->dst_sha, ren1->dst_mode, ren1->dst);
+			} else if ( memcmp(dstShaOtherBranch, null_sha1, 20) != 0 ) {
+				newPath = uniquePath(ren1->dst, branchName2);
+				output("CONFLICT (rename/add): Rename %s->%s in %s. "
+				       "%s added in %s",
+				       ren1->src, ren1->dst, branchName1,
+				       ren1->dst, branchName2);
+				output("Adding as %s instead", newPath);
+				updateFile(0, dstShaOtherBranch, dstModeOtherBranch, newPath);
+				cleanMerge = 0;
+				tryMerge = 1;
+			} else if ( (dst2 = find_rename_bydst(renames2, ren1->dst)) ) {
+				char *newPath1 = uniquePath(ren1->dst, branchName1);
+				char *newPath2 = uniquePath(dst2->dst, branchName2);
+				output("CONFLICT (rename/rename): Rename %s->%s in %s. "
+				       "Rename %s->%s in %s",
+				       ren1->src, ren1->dst, branchName1,
+				       dst2->src, dst2->dst, branchName2);
+				output("Renaming %s to %s and %s to %s instead",
+				       ren1->src, newPath1, dst2->src, newPath2);
+				removeFile(0, ren1->dst);
+				updateFile(0, ren1->dst_sha, ren1->dst_mode, newPath1);
+				updateFile(0, dst2->dst_sha, dst2->dst_mode, newPath2);
+				dst2->processed = 1;
+				cleanMerge = 0;
+			} else
+				tryMerge = 1;
+
+			if ( tryMerge ) {
+				char *oname = ren1->src;
+				char *aname = ren1->dst;
+				char *bname = ren1->src;
+				unsigned char osha[20], asha[20], bsha[20];
+				unsigned omode = ren1->src_mode;
+				unsigned amode = ren1->dst_mode;
+				unsigned bmode = srcModeOtherBranch;
+				memcpy(osha, ren1->src_sha, 20);
+				memcpy(asha, ren1->dst_sha, 20);
+				memcpy(bsha, srcShaOtherBranch, 20);
+				const char *aBranch = branchName1;
+				const char *bBranch = branchName2;
+
+				if ( renamesA != renames1 ) {
+					memswp(&aname, &bname, sizeof(aname));
+					memswp(asha, bsha, 20);
+					memswp(&aBranch, &bBranch, sizeof(aBranch));
+				}
+				struct merge_file_info mfi;
+				mfi = mergeFile(oname, osha, omode,
+						aname, asha, amode,
+						bname, bsha, bmode,
+						aBranch, bBranch);
+
+				if ( mfi.merge || !mfi.clean )
+					output("Renaming %s => %s", ren1->src, ren1->dst);
+				if ( mfi.merge )
+					output("Auto-merging %s", ren1->dst);
+				if ( !mfi.clean ) {
+					output("CONFLICT (rename/modify): Merge conflict in %s",
+					       ren1->dst);
+					cleanMerge = 0;
+
+					if ( !index_only )
+						setIndexStages(ren1->dst,
+							       osha, omode,
+							       asha, amode,
+							       bsha, bmode,
+							       1 /* clear */);
+				}
+				updateFile(mfi.clean, mfi.sha, mfi.mode, ren1->dst);
+			}
+		}
+	}
+	path_list_clear(&srcNames, 0);
+	return cleanMerge;
+}
+
+static unsigned char *has_sha(const unsigned char *sha)
+{
+	return memcmp(sha, null_sha1, 20) == 0 ? NULL: (unsigned char *)sha;
+}
+
+static int sha_eq(const unsigned char *a, const unsigned char *b)
+{
+	if ( !a && !b )
+		return 2;
+	return a && b && memcmp(a, b, 20) == 0;
+}
+
+// Per entry merge function
+// ------------------------
+// Merge one cache entry.
+int processEntry(struct index_entry *entry,
+		 const char *branch1Name,
+		 const char *branch2Name)
+{
+	//    printf("processing entry, clean cache: %s\n", index_only ? "yes": "no");
+	//    print_index_entry("\tpath: ", entry);
+	int cleanMerge = 1;
+	const char *path = entry->path;
+	unsigned char *oSha = has_sha(entry->stages[1].sha);
+	unsigned char *aSha = has_sha(entry->stages[2].sha);
+	unsigned char *bSha = has_sha(entry->stages[3].sha);
+	unsigned oMode = entry->stages[1].mode;
+	unsigned aMode = entry->stages[2].mode;
+	unsigned bMode = entry->stages[3].mode;
+
+	if ( oSha && (!aSha || !bSha) ) {
+		//
+		// Case A: Deleted in one
+		//
+		if ( (!aSha && !bSha) ||
+		     (sha_eq(aSha, oSha) && !bSha) ||
+		     (!aSha && sha_eq(bSha, oSha)) ) {
+			// Deleted in both or deleted in one and unchanged in the other
+			if ( aSha )
+				output("Removing %s", path);
+			removeFile(1, path);
+		} else {
+			// Deleted in one and changed in the other
+			cleanMerge = 0;
+			if ( !aSha ) {
+				output("CONFLICT (delete/modify): %s deleted in %s "
+				       "and modified in %s. Version %s of %s left in tree.",
+				       path, branch1Name,
+				       branch2Name, branch2Name, path);
+				updateFile(0, bSha, bMode, path);
+			} else {
+				output("CONFLICT (delete/modify): %s deleted in %s "
+				       "and modified in %s. Version %s of %s left in tree.",
+				       path, branch2Name,
+				       branch1Name, branch1Name, path);
+				updateFile(0, aSha, aMode, path);
+			}
+		}
+
+	} else if ( (!oSha && aSha && !bSha) ||
+		    (!oSha && !aSha && bSha) ) {
+		//
+		// Case B: Added in one.
+		//
+		const char *addBranch;
+		const char *otherBranch;
+		unsigned mode;
+		const unsigned char *sha;
+		const char *conf;
+
+		if ( aSha ) {
+			addBranch = branch1Name;
+			otherBranch = branch2Name;
+			mode = aMode;
+			sha = aSha;
+			conf = "file/directory";
+		} else {
+			addBranch = branch2Name;
+			otherBranch = branch1Name;
+			mode = bMode;
+			sha = bSha;
+			conf = "directory/file";
+		}
+		if ( path_list_has_path(currentDirectorySet, path) ) {
+			cleanMerge = 0;
+			const char *newPath = uniquePath(path, addBranch);
+			output("CONFLICT (%s): There is a directory with name %s in %s. "
+			       "Adding %s as %s",
+			       conf, path, otherBranch, path, newPath);
+			removeFile(0, path);
+			updateFile(0, sha, mode, newPath);
+		} else {
+			output("Adding %s", path);
+			updateFile(1, sha, mode, path);
+		}
+	} else if ( !oSha && aSha && bSha ) {
+		//
+		// Case C: Added in both (check for same permissions).
+		//
+		if ( sha_eq(aSha, bSha) ) {
+			if ( aMode != bMode ) {
+				cleanMerge = 0;
+				output("CONFLICT: File %s added identically in both branches, "
+				       "but permissions conflict %06o->%06o",
+				       path, aMode, bMode);
+				output("CONFLICT: adding with permission: %06o", aMode);
+				updateFile(0, aSha, aMode, path);
+			} else {
+				// This case is handled by git-read-tree
+				assert(0 && "This case must be handled by git-read-tree");
+			}
+		} else {
+			cleanMerge = 0;
+			const char *newPath1 = uniquePath(path, branch1Name);
+			const char *newPath2 = uniquePath(path, branch2Name);
+			output("CONFLICT (add/add): File %s added non-identically "
+			       "in both branches. Adding as %s and %s instead.",
+			       path, newPath1, newPath2);
+			removeFile(0, path);
+			updateFile(0, aSha, aMode, newPath1);
+			updateFile(0, bSha, bMode, newPath2);
+		}
+
+	} else if ( oSha && aSha && bSha ) {
+		//
+		// case D: Modified in both, but differently.
+		//
+		output("Auto-merging %s\n", path);
+		struct merge_file_info mfi;
+		mfi = mergeFile(path, oSha, oMode,
+				path, aSha, aMode,
+				path, bSha, bMode,
+				branch1Name, branch2Name);
+
+		if ( mfi.clean )
+			updateFile(1, mfi.sha, mfi.mode, path);
+		else {
+			cleanMerge = 0;
+			output("CONFLICT (content): Merge conflict in %s", path);
+
+			if ( index_only )
+				updateFile(0, mfi.sha, mfi.mode, path);
+			else
+				updateFileExt(mfi.sha, mfi.mode, path,
+					      0 /* updateCache */, 1 /* updateWd */);
+		}
+	} else
+		die("Fatal merge failure, shouldn't happen.");
+
+	return cleanMerge;
+}
+
+struct merge_tree_result merge_trees(struct tree *head,
+				     struct tree *merge,
+				     struct tree *common,
+				     const char *branch1Name,
+				     const char *branch2Name)
+{
+	int code;
+	struct merge_tree_result result = { NULL, 0 };
+	if ( !memcmp(common->object.sha1, merge->object.sha1, 20) ) {
+		output("Already uptodate!");
+		result.tree = head;
+		result.clean = 1;
+		return result;
+	}
+
+	code = git_merge_trees(index_only ? "-i": "-u", common, head, merge);
+
+	if ( code != 0 )
+		die("merging of trees %s and %s failed",
+		    sha1_to_hex(head->object.sha1),
+		    sha1_to_hex(merge->object.sha1));
+
+	result.tree = git_write_tree();
+
+	if ( !result.tree ) {
+		struct path_list *filesM = NULL, *dirsM = NULL;
+
+		getFilesAndDirs(head, &currentFileSet, &currentDirectorySet);
+		getFilesAndDirs(merge, &filesM, &dirsM);
+
+		path_list_union_update(&currentFileSet, filesM);
+		path_list_union_update(&currentDirectorySet, dirsM);
+
+		struct index_entry *entries = unmergedCacheEntries();
+		struct rename_entry *renamesHead, *renamesMerge;
+		renamesHead  = getRenames(head, common, head, merge, &entries);
+		renamesMerge = getRenames(merge, common, head, merge, &entries);
+		result.clean = processRenames(renamesHead, renamesMerge,
+					      branch1Name, branch2Name);
+		struct index_entry *e;
+		for ( e = entries; e; e = e->next ) {
+			if ( e->processed )
+				continue;
+			if ( !processEntry(e, branch1Name, branch2Name) )
+				result.clean = 0;
+			if ( result.clean || index_only )
+				result.tree = git_write_tree();
+			else
+				result.tree = NULL;
+		}
+		free_rename_entries(&renamesMerge);
+		free_rename_entries(&renamesHead);
+		free_index_entries(&entries);
+	} else {
+		result.clean = 1;
+		printf("merging of trees %s and %s resulted in %s\n",
+		       sha1_to_hex(head->object.sha1),
+		       sha1_to_hex(merge->object.sha1),
+		       sha1_to_hex(result.tree->object.sha1));
+	}
+
+	return result;
+}
+
+static void collect_nodes(struct node *node, struct node_list **res)
+{
+    node_list_insert(node, res);
+    struct node_list *p;
+    for ( p = node->parents; p; p = p->next )
+	collect_nodes(node, res);
+}
+
+struct node_list *reachable_nodes(struct node *n1, struct node *n2)
+{
+    struct node_list *res = NULL;
+    collect_nodes(n1, &res);
+    collect_nodes(n2, &res);
+    return res;
+}
+
+struct graph *graph_build(struct node_list *commits)
+{
+	struct graph *graph = malloc(sizeof(struct graph));
+	memset(graph->commits, 0, sizeof(graph->commits));
+
+	char cmd[256];
+	strcpy(cmd, "git-rev-list --parents");
+	struct node_list *cp;
+	for_each_node_list(cp,commits) {
+		graph_add_node(graph, cp->node);
+		strcat(cmd, " ");
+		strcat(cmd, node_hex_sha1(cp->node));
+	}
+	assert(strlen(cmd) < sizeof(cmd));
+
+	FILE *fp = popen(cmd, "r");
+	if (!fp)
+		die("%s failed: %s", cmd, strerror(errno));
+	while (!feof(fp)) {
+		unsigned char sha[20];
+		int ch;
+		if (fget_sha1(sha, fp, &ch))
+			break;
+		if (EOF == ch)
+			break;
+		// a commit
+		struct node *node = graph_node_bysha(graph, sha);
+		if (!node)
+		{
+			node = node_alloc(lookup_commit(sha));
+			graph_add_node(graph, node);
+		}
+		// ...and its parents. I assume a parent cannot be mentioned
+		// before the children.
+		struct node_list *parents = NULL;
+		while ('\n' != ch) {
+			if (fget_sha1(sha, fp, &ch)) {
+				die("invalid output from %s, "
+				    "sha1 (parents) expected",
+				    cmd);
+				break;
+			}
+			if (EOF == ch)
+				break;
+			struct node *pn = graph_node_bysha(graph, sha);
+			if (!pn) {
+				pn = node_alloc(lookup_commit(sha));
+				graph_add_node(graph, pn);
+			}
+			node_list_insert(pn, &parents);
+		}
+		node_set_parents(node, parents);
+	}
+	pclose(fp);
+	return graph;
+}
+
+static int get_sha1_0(const char *ref, unsigned char *sha)
+{
+	size_t n = strlen(ref);
+	char *t = xmalloc(n + 4);
+	memcpy(t, ref, n);
+	strcpy(t + n, "^0");
+	int rc = get_sha1(t, sha);
+	free(t);
+	return rc;
+}
+
+int main(int argc, char *argv[])
+{
+	static const char *bases[2];
+	static unsigned bases_count = 0;
+
+	original_index_file = getenv("GIT_INDEX_FILE");
+
+	if (!original_index_file)
+		original_index_file = strdup(git_path("index"));
+
+	temporary_index_file = strdup(git_path("mrg-rcrsv-tmp-idx"));
+
+	if (argc < 4)
+		die("Usage: %s <base>... -- <head> <remote> ...\n", argv[0]);
+
+	int i;
+	for (i = 1; i < argc; ++i) {
+		if (!strcmp(argv[i], "--"))
+			break;
+		if (bases_count < sizeof(bases)/sizeof(*bases))
+			bases[bases_count++] = argv[i];
+	}
+	if (argc - i != 3) /* "--" "<head>" "<remote>" */
+		die("Not handling anything other than two heads merge.");
+
+	unsigned char sha1[20], sha2[20];
+	const char *branch1, *branch2;
+
+	branch1 = argv[++i];
+	if (get_sha1_0(branch1, sha1) != 0)
+		die("invalid first branch %s", branch1);
+
+	branch2 = argv[++i];
+	if (get_sha1_0(branch2, sha2) != 0)
+		die("invalid second branch %s", branch2);
+
+	printf("Merging %s with %s\n", branch1, branch2);
+
+	struct merge_result result;
+	struct node *h1 = node_alloc(lookup_commit(sha1));
+	struct node *h2 = node_alloc(lookup_commit(sha2));
+
+	if (bases_count == 1) {
+		unsigned char shabase[20];
+		if (get_sha1_0(bases[0], shabase) != 0)
+			die("invalid base commit %s", bases[0]);
+		struct node *ancestor = node_alloc(lookup_commit(shabase));
+		result = merge(h1, h2, branch1, branch2, NULL, 0, ancestor);
+	} else {
+		struct node_list *commits = NULL;
+		node_list_insert(h1, &commits);
+		node_list_insert(h2, &commits->next);
+		struct graph *graph = graph_build(commits);
+		result = merge(h1, h2, branch1, branch2, graph, 0, NULL);
+	}
+	return result.clean ? 0: 1;
+}
+
+// vim: sw=8 noet
diff --git a/path-list.c b/path-list.c
new file mode 100644
index 0000000..f969116
--- /dev/null
+++ b/path-list.c
@@ -0,0 +1,68 @@
+#include <stdio.h>
+#include "cache.h"
+#include "path-list.h"
+
+static struct path_list *path_list_pool = NULL;
+
+struct path_list *path_list_insert(char *path, struct path_list **list)
+{
+    struct path_list *e;
+    if ( path_list_pool ) {
+        e = path_list_pool;
+        path_list_pool = path_list_pool->next;
+    } else
+        e = xmalloc(sizeof(struct path_list));
+    e->next = *list;
+    e->path = path;
+    *list = e;
+    return e;
+}
+
+void path_list_clear(struct path_list **list, int free_path)
+{
+    while ( *list ) {
+        struct path_list *next = (*list)->next;
+	if (free_path)
+	    free((*list)->path);
+        (*list)->path = NULL;
+        (*list)->next = path_list_pool;
+        path_list_pool = *list;
+        *list = next;
+    }
+}
+
+int path_list_count(const struct path_list *list)
+{
+    int n = 0;
+    for ( ; list; list = list->next )
+        ++n;
+    return n;
+}
+
+int path_list_has_path(const struct path_list *list, const char *path)
+{
+    for ( ; list; list = list->next )
+        if ( strcmp(path, list->path) == 0 )
+            return 1;
+    return 0;
+}
+
+// in place
+void path_list_union_update(struct path_list **dst, const struct path_list *src)
+{
+    const struct path_list *i = src;
+    while ( i ) {
+        if ( !path_list_has_path(*dst, i->path) )
+	    path_list_insert(i->path, dst);
+	i = i->next;
+    }
+}
+
+void print_path_list(const char *text, const struct path_list *p)
+{
+    if ( text )
+        printf("%s\n", text);
+    for ( ; p; p = p->next )
+        printf("%s\n", p->path);
+}
+
diff --git a/path-list.h b/path-list.h
new file mode 100644
index 0000000..7e79f17
--- /dev/null
+++ b/path-list.h
@@ -0,0 +1,31 @@
+#ifndef _PATH_LIST_H_
+#define _PATH_LIST_H_
+
+struct path_list
+{
+    struct path_list *next;
+    char *path;
+};
+
+#define for_each_path(p,list) for ( p = (list); p; p = p->next )
+
+void print_path_list(const char *text, const struct path_list *p);
+
+static inline struct path_list *path_list_shift(struct path_list **list)
+{
+    struct path_list *cur = *list;
+    *list = cur->next;
+    return cur;
+}
+
+int path_list_count(const struct path_list *list);
+int path_list_has_path(const struct path_list *list, const char *path);
+void path_list_union_update(struct path_list **dst, const struct path_list *src);
+void path_list_clear(struct path_list **list, int free_path);
+static inline void path_list_free(struct path_list *list)
+{
+    path_list_clear(&list, 1);
+}
+struct path_list *path_list_insert(char *path, struct path_list **list);
+
+#endif /* _PATH_LIST_H_ */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* CFT: merge-recursive in C (test updates)
  2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
@ 2006-06-26 23:42 ` Alex Riesen
  2006-06-26 23:54 ` CFT: merge-recursive in C Linus Torvalds
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-06-26 23:42 UTC (permalink / raw)
  To: git; +Cc: Junio C Hamano, Fredrik Kuivinen

I had to change the tests a little, to avoid skipping test on systems
without python and to force t3402-rebase-merge.sh test the converted
program (this one shouldn't be merged at all of course. I think we'd
still want have git-merge-recursive.py around and test it too. It is
reference implementation, to say the least).

---

diff --git a/t/t3402-rebase-merge.sh b/t/t3402-rebase-merge.sh
index d34c6cf..c13d154 100755
--- a/t/t3402-rebase-merge.sh
+++ b/t/t3402-rebase-merge.sh
@@ -7,12 +7,6 @@ test_description='git rebase --merge tes
 
 . ./test-lib.sh
 
-if test "$no_python"; then
-	echo "Skipping: no python => no recursive merge"
-	test_done
-	exit 0
-fi
-
 T="A quick brown fox
 jumps over the lazy dog."
 for i in 1 2 3 4 5 6 7 8 9 10
@@ -51,7 +45,7 @@ test_expect_success setup '
 '
 
 test_expect_success 'reference merge' '
-	git merge -s recursive "reference merge" HEAD master
+	git merge -s recur "reference merge" HEAD master
 '
 
 test_expect_success rebase '
diff --git a/t/t6021-merge-criss-cross.sh b/t/t6021-merge-criss-cross.sh
index 2623813..e8606c7 100755
--- a/t/t6021-merge-criss-cross.sh
+++ b/t/t6021-merge-criss-cross.sh
@@ -10,12 +10,6 @@ # nice decription of what this is about.
 test_description='Test criss-cross merge'
 . ./test-lib.sh
 
-if test "$no_python"; then
-	echo "Skipping: no python => no recursive merge"
-	test_done
-	exit 0
-fi
-
 test_expect_success 'prepare repository' \
 'echo "1
 2

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
  2006-06-26 23:42 ` CFT: merge-recursive in C (test updates) Alex Riesen
@ 2006-06-26 23:54 ` Linus Torvalds
  2006-06-27  0:07   ` Linus Torvalds
  2006-06-27  0:17   ` Alex Riesen
  2006-06-27  0:25 ` Junio C Hamano
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Linus Torvalds @ 2006-06-26 23:54 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen

On Tue, 27 Jun 2006, Alex Riesen wrote:
> 
> To my deep disappointment, it didn't work out as good as I hoped: one
> program I see most often and for longest time in the process list
> (git-diff-tree) is a too complex thing to be put directly into
> merge-recursive.c, so any help in this direction will be greatly
> appreciated.

Are you sure?

git-diff-tree is one of the simplest git operations. We've got absolutely 
_tons_ of infrastructure in place to do it efficiently, since it's done 
all over the map (a "git-rev-list" with path limiting will do a diff-tree 
against all the commits).

Some of the interfaces might be a bit non-obvious, but the diff stuff was 
some of the first ones to be libified exactly because they end up being so 
fundamental.

		Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:54 ` CFT: merge-recursive in C Linus Torvalds
@ 2006-06-27  0:07   ` Linus Torvalds
  2006-06-27  0:24     ` Alex Riesen
  2006-06-27  0:17   ` Alex Riesen
  1 sibling, 1 reply; 27+ messages in thread
From: Linus Torvalds @ 2006-06-27  0:07 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen



On Mon, 26 Jun 2006, Linus Torvalds wrote:
> 
> git-diff-tree is one of the simplest git operations. We've got absolutely 
> _tons_ of infrastructure in place to do it efficiently, since it's done 
> all over the map (a "git-rev-list" with path limiting will do a diff-tree 
> against all the commits).

Side note - I think merge-recursive could/should be rewritten to use 
"git-merge-tree" instead of "git-read-tree -u -m". I suspect that the 
git-merge-tree output is in fact a lot closer to what git-merge-recursive 
actually wants to have.

			Linus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:54 ` CFT: merge-recursive in C Linus Torvalds
  2006-06-27  0:07   ` Linus Torvalds
@ 2006-06-27  0:17   ` Alex Riesen
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-06-27  0:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Junio C Hamano, Fredrik Kuivinen

Linus Torvalds, Tue, Jun 27, 2006 01:54:18 +0200:
> > 
> > To my deep disappointment, it didn't work out as good as I hoped: one
> > program I see most often and for longest time in the process list
> > (git-diff-tree) is a too complex thing to be put directly into
> > merge-recursive.c, so any help in this direction will be greatly
> > appreciated.
> 
> Are you sure?
> 
> git-diff-tree is one of the simplest git operations. We've got absolutely 
> _tons_ of infrastructure in place to do it efficiently, since it's done 
> all over the map (a "git-rev-list" with path limiting will do a diff-tree 
> against all the commits).
> 
> Some of the interfaces might be a bit non-obvious, but the diff stuff was 
> some of the first ones to be libified exactly because they end up being so 
> fundamental.
> 

That (non-obvious) was actually the problem here. I needed a diff-tree
without any output on stdout, with "-M" (rename detection). The
precise command I gave up to implement was:

  git-diff-tree -M --diff-filter=R -r -z <tree1> <tree2>

I stopped somewhere around diff_tree, being confused by show_entry.
I took a look at it again, and it seem that show_entry does not
actually "show" anything but calls diff_options->add_remove, right?
So I could define my callback, setup the options (which I certanly can
find after looking closer and longer at builtin-diff-tree.c) and wrap
diff_options with my own struct (I need to pass arguments to the
callback reentrantly: it is a recursive algorithm).

Well, it wasn't that clear (unless I missed something by a mile) last
week... But thanks for you suspicions, they actually forced me to look
at diff-tree again. Will do ... unless (I hope) someone beats me to it.

Bye!

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27  0:07   ` Linus Torvalds
@ 2006-06-27  0:24     ` Alex Riesen
  2006-06-27  0:27       ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Riesen @ 2006-06-27  0:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, Junio C Hamano, Fredrik Kuivinen

Linus Torvalds, Tue, Jun 27, 2006 02:07:02 +0200:
> > 
> > git-diff-tree is one of the simplest git operations. We've got absolutely 
> > _tons_ of infrastructure in place to do it efficiently, since it's done 
> > all over the map (a "git-rev-list" with path limiting will do a diff-tree 
> > against all the commits).
> 
> Side note - I think merge-recursive could/should be rewritten to use 
> "git-merge-tree" instead of "git-read-tree -u -m". I suspect that the 
> git-merge-tree output is in fact a lot closer to what git-merge-recursive 
> actually wants to have.
> 

Yep. And does not touch the index, too. Cool...

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
  2006-06-26 23:42 ` CFT: merge-recursive in C (test updates) Alex Riesen
  2006-06-26 23:54 ` CFT: merge-recursive in C Linus Torvalds
@ 2006-06-27  0:25 ` Junio C Hamano
  2006-06-27  0:38   ` Alex Riesen
  2006-06-27  7:28   ` Notes on diffcore API Junio C Hamano
  2006-06-27  7:52 ` CFT: merge-recursive in C Johannes Schindelin
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 27+ messages in thread
From: Junio C Hamano @ 2006-06-27  0:25 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Fredrik Kuivinen

fork0@t-online.de (Alex Riesen) writes:

> To my deep disappointment, it didn't work out as good as I hoped: one
> program I see most often and for longest time in the process list
> (git-diff-tree) is a too complex thing to be put directly into
> merge-recursive.c, so any help in this direction will be greatly
> appreciated.

Actually, diff-tree is (and to similar degree the internal diff
machinery is) quite reusable as library piece, far more reusable
than other parts of the core git.  If you present what you want
to achieve nicely and ask politely I might even get conned into
helping you interface with the rest of your code ;-).

I am guessing that you want to find out how to do the diff-tree -M
used by the recursive merge without spitting out patch text nor
raw output.  That's quite doable and should be easy.  Most
likely you would use NO_OUTPUT option when you call diff_tree().

First look at builtin-diff.c::builtin_diff_tree() to see how you
can call the diff machinery given two tree object names.  diff_tree()
itself does not emit the diff, but leaves the result in "diff
queue".

After calling diff_tree(), inspect diff_queued_diff() and use
the result to do whatever sensible.  The queue is an array of
diff_filepair that records the (path, sha1, mode) among other
things from old tree and from new tree (the one from the old
tree is called "one", and the new tree is called "two").

So if you have one->path = "old-name.c" and two->path = "new-name.c"
then you see the old-name.c file was renamed to new-name.c

When you are done, do not forget to call diff_flush() to get rid
of queued_diff(); otherwise you would leak.

Have fun.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27  0:24     ` Alex Riesen
@ 2006-06-27  0:27       ` Junio C Hamano
  0 siblings, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2006-06-27  0:27 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git

fork0@t-online.de (Alex Riesen) writes:

> Linus Torvalds, Tue, Jun 27, 2006 02:07:02 +0200:
>> > 
>> > git-diff-tree is one of the simplest git operations. We've got absolutely 
>> > _tons_ of infrastructure in place to do it efficiently, since it's done 
>> > all over the map (a "git-rev-list" with path limiting will do a diff-tree 
>> > against all the commits).
>> 
>> Side note - I think merge-recursive could/should be rewritten to use 
>> "git-merge-tree" instead of "git-read-tree -u -m". I suspect that the 
>> git-merge-tree output is in fact a lot closer to what git-merge-recursive 
>> actually wants to have.
>> 
>
> Yep. And does not touch the index, too. Cool...

Actually "does not touch the index" part is a defect in
merge-tree.  It is fine at the recursive level, but at the top
level we need to make sure the local changes do not interfere
with the merge.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27  0:25 ` Junio C Hamano
@ 2006-06-27  0:38   ` Alex Riesen
  2006-06-27  7:28   ` Notes on diffcore API Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-06-27  0:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Fredrik Kuivinen

Junio C Hamano, Tue, Jun 27, 2006 02:25:46 +0200:
> 
> Have fun.
> 

Wow. Thats more of explanation that I hoped. Just let me sleep on it a bit :)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Notes on diffcore API
  2006-06-27  0:25 ` Junio C Hamano
  2006-06-27  0:38   ` Alex Riesen
@ 2006-06-27  7:28   ` Junio C Hamano
  2006-06-27  8:41     ` Alex Riesen
  1 sibling, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2006-06-27  7:28 UTC (permalink / raw)
  To: Alex Riesen, Timo Hirvonen; +Cc: git

I really should have done this six months ago, but I guess being
late is much better than never...

Somebody might want to do proper asciidoc and throw it in
Documentation/technical/.

-- >8 --
Notes on diffcore API
=====================

The diff generation mechanism in git is designed to be
self-contained and you should be able to call it as set of
library functions, unlike other parts of the system where "we
run once and let exit() to clean up afterwards" mentality is
dominant.  This is mainly because from early on "diff-tree
--stdin" needed to be able to process hundreds of parent-child
tree pairs without leaking.

NOTE: this note does not describe how combined diff works.  It
is quite a different animal.

The diffcore machinery works in 4 phases:

 1. setting up the machinery, including command line parsing.
 2. feeding input pairs to the machinery.
 3. letting it munge input pairs.
 4. flushing the output.

The first phase sets up the operational parameters (e.g. use of
rename detector, output format).  The second phase feeds pairs
of 'old' and 'new' files to the machinery as the front-end finds
them (e.g. diff-files compares each path found in the index and
in the working tree, and feeds the information from the index as
'old' and the information from the working tree as 'new';
diff-tree compares entries in two trees or entries in the tree
of the first parent commit and the tree of the commit).  In the
third phase, the pairs collected in the previous phase are
split, matched up, and filtered to form a different set of
pairs.  The last phase formats the resulting set of pairs for
the output.

Setting it up
-------------

The diffcore machinery takes one structure, `struct diff_options`,
to record the set of options that affects its behaviour.  These
options affect different parts of its operation, but can roughly
be classified into three groups: the ones that affects how the
input set of pairs are transformed in phase 3, and the ones that
affects how the resulting set of pairs are formatted in phase 4.

First call diff_setup() to initialize the diff_options
structure.  This gives the minimum default set of options:

 - output format defaults to --raw format;
 - output lines are LF terminated;
 - no diffcore transformation (phase 3) is used;

If you are writing a top-level diff command, you can then call
diff_opt_parse() to parse the common diff options and fill the
information in diff_options structure, but if your usage does
not require end-user customizability, you can set up the fields in
diff_options yourself without calling this function.

Then call diff_setup_done() -- this makes sure the set of
options are consistent and derives a reasonable default
(e.g. --find-copies-harder without -C does not make sense, patch
output is always recursive).

Feeding Input
-------------

Your main program feeds 'file pairs' to the diffcore machinery
by using these three functions: diff_addremove(), diff_change()
and diff_unmerge().  The first one records a path appears not in
'old' tree but in 'new' tree (or vice versa), the second one
records a path is different between 'old' and 'new', and the
third one says the comparison is meaningless for the path
because it is unmerged (this is only used by diff-index and
diff-files).

When you want to do something diff-tree does, which is quite
common, you can give two tree object names to diff_tree_sha1()
function and let it walk the trees and call these functions for
you.

To signal the end of input, call `diffcore_std()`.  This
starts the diffcore transformation described next.

Diffcore Transformation
-----------------------

The input file pairs recorded in the previous phase are
collected in diff_queued_diff (a global variable -- which means
that you cannot have two diffs running in parallel with the
current setup).  This is an expandable array of pointers to
`struct diff_filepair` structure.

The `struct diff_filepair` structure has (as the name suggests)
two pointers to `struct diff_filespec` to record the 'old' and
the 'new' file in this pair (the old one is called 'one', and
the new one 'two'), along with some information used by various
diffcore transformation.  `struct diff_filespec` records the
blob object name, pathname, size and mode among other things.
Two things to watch out for are:

 - a non-existent path is denoted by mode=0 (e.g. in a filepair
   for a deleted file, one->mode != 0 and two->mode == 0).

 - 0{40} SHA-1 is used when the filespec talks about the file in
   the working tree.

Documentation/diffcore.txt should be consulted for the details
of what each transformation does.  A short version:

 - diffcore-break breaks a filepair that modifies 'one' to 'two'
   into two filepairs that deletes 'one' and creates 'two' if
   'one' and 'two' are sufficiently dissimilar.

 - diffcore-rename matches up a filepair that deletes 'one' and
   another filepair that creates 'two' and makes them into one
   filepair.

 - diffcore-merge-broken picks up two filepairs that were
   originally one but broken by diffcore-break but did not get
   matched up by diffcore-rename.

 - diffcore-pickaxe filters out filepairs whose 'one' and 'two'
   have the same number of occurrences of the specified string.

 - diffcore-order reorders the resulting filepairs according to
   the given input.

 - after all of the above, --diff-filter is applied to remove
   the uninteresting classes of output (e.g. --diff-filter=A
   shows only additions).

Flushing output
---------------

After diffcore transformation runs, the result is still in the
same diff_queued_diff variable.  Before calling the standard
output routines, you can inspect each file pair in the queue to
see its status (e.g. what renames to what).

Especially interesting is the 'status' field of the filepair
structure.  At this point in the processing chain, each file
pair is marked with `M` (modified), `C` (copied), `R` (renamed),
etc.

Once you are done, calling diff_flush() to perform the output
and free the data structure.  If you ran the diff primarily
because you wanted to read the diff_queued_diff and you do not
want any output from this phase, set the putput_format to
DIFF_FORMAT_NO_OUTPUT before calling diff_flush() -- otherwise
you would leak memory the big way.

The raw, patch-text, diffstat and summary output all happens in
this final phase; recent work to make --patch, --stat, --raw
etc.  independent flags by Timo are primarily about phase 2 and
this phase.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
                   ` (2 preceding siblings ...)
  2006-06-27  0:25 ` Junio C Hamano
@ 2006-06-27  7:52 ` Johannes Schindelin
  2006-06-27  8:58   ` Alex Riesen
  2006-06-27 12:17 ` Alex Riesen
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2006-06-27  7:52 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen

Hi,

On Tue, 27 Jun 2006, Alex Riesen wrote:

> I finally got pis^Witched enough by my platform at work and decided
> to start the effort of converting Fredriks git-merge-recursive to C.

Darn. I was working on the same thing since a few days.

A few remarks:

- have you considered using run-command() instead of system()?
- in setup_index(), you set GIT_INDEX_FILE, but I do not think that the 
  rest of git picks up on it. environment.cc:get_index_file() checks if 
  the variable was set already, but not if it changed.
- You work with linked lists all the time. This is slow, especially for 
  the checks, if a file/directory is already there. Sorted lists would be
  way faster there. Since you encapsulated that, it is no problem to 
  change that later (before inclusion).
- is not "struct commit_list" more appropriate than "struct graph"?
- I always wondered why merge-recursive did not call merge-base, but did 
  its own thing. Hmm?

> It still uses some calls to git programs (git-update-index,
> git-hash-object, git-diff-tree and git-write-tree), and merge(1) has
> the labels (-L) missing - I was unsure how to tackle this on windows -
> it has only argv[1].

See above.

> And it needs to be cleaned up a bit (a lot) - Python is not exactly
> what you'd call "C-conversion-friendly-language", and I am not an
> expert (as in "git-merge-recursive.py is my first Python experience").

No problem. Since I already worked on it, that should be easy.

> To my deep disappointment, it didn't work out as good as I hoped: one
> program I see most often and for longest time in the process list
> (git-diff-tree) is a too complex thing to be put directly into
> merge-recursive.c, so any help in this direction will be greatly
> appreciated.

Maybe something like this (ripped from my fragment of merge-recursive.c):

static struct container *get_renames(struct tree *tree,
		struct tree *o, struct tree *a, struct tree *b,
		struct container *cache_entries)
{
	/*
	 * Get information of all renames which occured between 'oTree' and
	 * 'tree'. We need the three trees in the merge ('oTree', 'aTree' and
	 * 'bTree') to be able to associate the correct cache entries with the
	 * rename information. 'tree' is always equal to either aTree or bTree.
	 */
	int i;
	struct diff_options diff_opts;
	struct container *result
		= xcalloc(1, sizeof(struct container));

	diff_setup(&diff_opts);
	diff_opts.recursive = 1;
	diff_opts.detect_rename = DIFF_DETECT_RENAME;

	if (diff_setup_done(&diff_opts) < 0)
		die("diff_setup_done failed");

	diff_tree_sha1(o->object.sha1, tree->object.sha1,
			"", &diff_opts);
	diffcore_std(&diff_opts);

	for (i = 0; i < diff_queued_diff.nr; i++) {
		struct diff_filepair *p = diff_queued_diff.queue[i];

		if (p->status == 'R') {
			struct entry *entry = get_rename(result, p->one->path);
			struct rename *rename = entry->priv;

			rename->src.mode = p->one->mode;
			rename->src.sha1 = p->one->sha1;
			rename->src.data
				= get_cache_entry(cache_entries, p->one->path)->priv;
			fake_stage_data(rename->src.data, o, a, b);

			rename->dest.path = p->two->path;
			rename->dest.mode = p->two->mode;
			rename->dest.sha1 = p->two->sha1;
			rename->dest.data
				= get_cache_entry(cache_entries, p->two->path)->priv;
			fake_stage_data(rename->dest.data, o, a, b);

			rename->score = p->score;
		}
	}

	return result;
}

It is not tested, evidently, since I did not get the merge-base code 
integrated yet. But it should give you an idea.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Notes on diffcore API
  2006-06-27  7:28   ` Notes on diffcore API Junio C Hamano
@ 2006-06-27  8:41     ` Alex Riesen
  2006-06-27 23:33       ` Junio C Hamano
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Riesen @ 2006-06-27  8:41 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Timo Hirvonen, git

On 6/27/06, Junio C Hamano <junkio@cox.net> wrote:
> -- >8 --
> Notes on diffcore API
> =====================

Thanks!

> Diffcore Transformation
> -----------------------
>
> The input file pairs recorded in the previous phase are
> collected in diff_queued_diff (a global variable -- which means
> that you cannot have two diffs running in parallel with the
> current setup).  This is an expandable array of pointers to
> `struct diff_filepair` structure.
>

merge-recursive shouldn't have any problems with that, as the
renames are just read in the current implementation.
Still, it is somehow uncomfortable to see the amount of APIs
with the above restriction. Never know when it'll bite.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27  7:52 ` CFT: merge-recursive in C Johannes Schindelin
@ 2006-06-27  8:58   ` Alex Riesen
  2006-06-27 10:51     ` Johannes Schindelin
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Riesen @ 2006-06-27  8:58 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Fredrik Kuivinen

On 6/27/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> > I finally got pis^Witched enough by my platform at work and decided
> > to start the effort of converting Fredriks git-merge-recursive to C.
>
> Darn. I was working on the same thing since a few days.

I didn't know :)

> - have you considered using run-command() instead of system()?

No. What run-program?

> - in setup_index(), you set GIT_INDEX_FILE, but I do not think that the
>   rest of git picks up on it. environment.cc:get_index_file() checks if
>   the variable was set already, but not if it changed.

Not even sure it's needed. Leftover from conversion

> - You work with linked lists all the time. This is slow, especially for
>   the checks, if a file/directory is already there. Sorted lists would be
>   way faster there. Since you encapsulated that, it is no problem to
>   change that later (before inclusion).

Right, that's why it is mostly encapsulated.

> - is not "struct commit_list" more appropriate than "struct graph"?

Not even properly considered it yet. It probably is.

> - I always wondered why merge-recursive did not call merge-base, but did
>   its own thing. Hmm?

No idea yet.

> > To my deep disappointment, it didn't work out as good as I hoped: one
> > program I see most often and for longest time in the process list
> > (git-diff-tree) is a too complex thing to be put directly into
> > merge-recursive.c, so any help in this direction will be greatly
> > appreciated.
>
> Maybe something like this (ripped from my fragment of merge-recursive.c):
>
> static struct container *get_renames(struct tree *tree,
>                 struct tree *o, struct tree *a, struct tree *b,
>                 struct container *cache_entries)
...
> It is not tested, evidently, since I did not get the merge-base code
> integrated yet. But it should give you an idea.
>

Thanks! It was something I was getting at after Junio explained it.
Will have to wait until after work.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27  8:58   ` Alex Riesen
@ 2006-06-27 10:51     ` Johannes Schindelin
  2006-06-27 11:53       ` Alex Riesen
  2006-06-27 17:05       ` Junio C Hamano
  0 siblings, 2 replies; 27+ messages in thread
From: Johannes Schindelin @ 2006-06-27 10:51 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen

Hi,

On Tue, 27 Jun 2006, Alex Riesen wrote:

> On 6/27/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 
> > > I finally got pis^Witched enough by my platform at work and decided
> > > to start the effort of converting Fredriks git-merge-recursive to C.
> > 
> > Darn. I was working on the same thing since a few days.
> 
> I didn't know :)

Vice versa, I guess.

> > - have you considered using run-command() instead of system()?
> 
> No. What run-program?

run-command.c:run_command(). Call it like this:

int return_code = run_command("git-read-tree", sha1_to_hex(sha1), NULL);

> > - in setup_index(), you set GIT_INDEX_FILE, but I do not think that the
> >   rest of git picks up on it. environment.cc:get_index_file() checks if
> >   the variable was set already, but not if it changed.
> 
> Not even sure it's needed. Leftover from conversion

I think it _is_ needed, in order not to mess up the current index. Setting 
the environment variable works for exec()ed processes, but I think we need 
to add a set_index_file(const char *) to environment.c.

> > - I always wondered why merge-recursive did not call merge-base, but did
> >   its own thing. Hmm?
> 
> No idea yet.

Frederik?

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27 10:51     ` Johannes Schindelin
@ 2006-06-27 11:53       ` Alex Riesen
  2006-06-27 17:05       ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-06-27 11:53 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Fredrik Kuivinen

On 6/27/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > > - have you considered using run-command() instead of system()?
> >
> > No. What run-program?
>
> run-command.c:run_command(). Call it like this:
>
> int return_code = run_command("git-read-tree", sha1_to_hex(sha1), NULL);
>

Oh, I see. Will convert right away.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
                   ` (3 preceding siblings ...)
  2006-06-27  7:52 ` CFT: merge-recursive in C Johannes Schindelin
@ 2006-06-27 12:17 ` Alex Riesen
  2006-06-27 12:43   ` Johannes Schindelin
  2006-06-28  6:37 ` Uwe Zeisberger
  2006-06-28 15:06 ` Christopher Faylor
  6 siblings, 1 reply; 27+ messages in thread
From: Alex Riesen @ 2006-06-27 12:17 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen

On 6/27/06, Alex Riesen <fork0@t-online.de> wrote:
> Hi all.
>
> I finally got pis^Witched enough by my platform at work and decided
> to start the effort of converting Fredriks git-merge-recursive to C.
> At the moment it is the only one annoyingly slow thing there.
>

Just tested it on my project. It's still the slow thing (even a bit
slower, looks CPU bound).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27 12:17 ` Alex Riesen
@ 2006-06-27 12:43   ` Johannes Schindelin
  2006-06-27 14:09     ` Alex Riesen
  0 siblings, 1 reply; 27+ messages in thread
From: Johannes Schindelin @ 2006-06-27 12:43 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen

Hi,

On Tue, 27 Jun 2006, Alex Riesen wrote:

> On 6/27/06, Alex Riesen <fork0@t-online.de> wrote:
> > 
> > I finally got pis^Witched enough by my platform at work and decided
> > to start the effort of converting Fredriks git-merge-recursive to C.
> > At the moment it is the only one annoyingly slow thing there.
> 
> Just tested it on my project. It's still the slow thing (even a bit
> slower, looks CPU bound).

Should improve when using git-merge-tree, and a faster path_list.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27 12:43   ` Johannes Schindelin
@ 2006-06-27 14:09     ` Alex Riesen
  0 siblings, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-06-27 14:09 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Fredrik Kuivinen

On 6/27/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > >
> > > I finally got pis^Witched enough by my platform at work and decided
> > > to start the effort of converting Fredriks git-merge-recursive to C.
> > > At the moment it is the only one annoyingly slow thing there.
> >
> > Just tested it on my project. It's still the slow thing (even a bit
> > slower, looks CPU bound).
>
> Should improve when using git-merge-tree, and a faster path_list.
>

For the moment the most visible next offenders are calls to update-index.
I think about batching them (add, add, add, flush, rm, rm, flush,
stages, flush),
but maybe someone have a better idea?

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-27 10:51     ` Johannes Schindelin
  2006-06-27 11:53       ` Alex Riesen
@ 2006-06-27 17:05       ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2006-06-27 17:05 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: Alex Riesen, git, Fredrik Kuivinen

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > - I always wondered why merge-recursive did not call merge-base, but did
>> >   its own thing. Hmm?
>> 
>> No idea yet.

A somewhat related issue is that when one head is given I'd
strongly prefer that merge-recursive did not call merge-base nor
did its own thing (that is, for the top-level).  Otherwise we
cannot use it for historyless three tree merge that we need for
rebase/revert/cherry-pick.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Notes on diffcore API
  2006-06-27  8:41     ` Alex Riesen
@ 2006-06-27 23:33       ` Junio C Hamano
  2006-06-28  7:36         ` Johannes Schindelin
  0 siblings, 1 reply; 27+ messages in thread
From: Junio C Hamano @ 2006-06-27 23:33 UTC (permalink / raw)
  To: Alex Riesen; +Cc: Junio C Hamano, Timo Hirvonen, git

"Alex Riesen" <raa.lkml@gmail.com> writes:

> On 6/27/06, Junio C Hamano <junkio@cox.net> wrote:
>> -- >8 --
>> Notes on diffcore API
>> =====================
>
> Thanks!
>
>> Diffcore Transformation
>> -----------------------
>>
>> The input file pairs recorded in the previous phase are
>> collected in diff_queued_diff (a global variable -- which means
>> that you cannot have two diffs running in parallel with the
>> current setup).  This is an expandable array of pointers to
>> `struct diff_filepair` structure.
>>
>
> merge-recursive shouldn't have any problems with that, as the
> renames are just read in the current implementation.
> Still, it is somehow uncomfortable to see the amount of APIs
> with the above restriction. Never know when it'll bite.

I think it is simply the matter of moving diff_queued_diff a
field in diff_optionss structure and adding an extra parameter
to point at the current diff_options to handful functions if we
ever need to support it.  I haven't bothered doing that because
we haven't had the need to run more than one diff at once.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
                   ` (4 preceding siblings ...)
  2006-06-27 12:17 ` Alex Riesen
@ 2006-06-28  6:37 ` Uwe Zeisberger
  2006-06-28  7:32   ` Alex Riesen
  2006-06-28  9:22   ` Junio C Hamano
  2006-06-28 15:06 ` Christopher Faylor
  6 siblings, 2 replies; 27+ messages in thread
From: Uwe Zeisberger @ 2006-06-28  6:37 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git

Hello Alex,

> +// does not belong here
Some C compiler (e.g. Sun Forte) don't like C++-style comments.

(So the line could read:

  /* "//" does not belong here :-) */

)

Best regards
Uwe

-- 
Uwe Zeisberger

http://www.google.com/search?q=1+degree+celsius+in+kelvin

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-28  6:37 ` Uwe Zeisberger
@ 2006-06-28  7:32   ` Alex Riesen
  2006-06-28  9:22   ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Alex Riesen @ 2006-06-28  7:32 UTC (permalink / raw)
  To: Uwe Zeisberger, Alex Riesen, git

On 6/28/06, Uwe Zeisberger <zeisberg@informatik.uni-freiburg.de> wrote:
> > +// does not belong here
> Some C compiler (e.g. Sun Forte) don't like C++-style comments.

Yep, I'll change them

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: Notes on diffcore API
  2006-06-27 23:33       ` Junio C Hamano
@ 2006-06-28  7:36         ` Johannes Schindelin
  0 siblings, 0 replies; 27+ messages in thread
From: Johannes Schindelin @ 2006-06-28  7:36 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Alex Riesen, Timo Hirvonen, git

Hi,

On Tue, 27 Jun 2006, Junio C Hamano wrote:

> "Alex Riesen" <raa.lkml@gmail.com> writes:
> 
> > On 6/27/06, Junio C Hamano <junkio@cox.net> wrote:
> >> -- >8 --
> >> Notes on diffcore API
> >> =====================
> >
> > Thanks!
> >
> >> Diffcore Transformation
> >> -----------------------
> >>
> >> The input file pairs recorded in the previous phase are
> >> collected in diff_queued_diff (a global variable -- which means
> >> that you cannot have two diffs running in parallel with the
> >> current setup).  This is an expandable array of pointers to
> >> `struct diff_filepair` structure.
> >>
> >
> > merge-recursive shouldn't have any problems with that, as the
> > renames are just read in the current implementation.
> > Still, it is somehow uncomfortable to see the amount of APIs
> > with the above restriction. Never know when it'll bite.
> 
> I think it is simply the matter of moving diff_queued_diff a
> field in diff_optionss structure and adding an extra parameter
> to point at the current diff_options to handful functions if we
> ever need to support it.  I haven't bothered doing that because
> we haven't had the need to run more than one diff at once.

And we shouldn't bother until we need it. It has a small performance 
impact, and the code gets more ugly.

Ciao,
Dscho

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-28  6:37 ` Uwe Zeisberger
  2006-06-28  7:32   ` Alex Riesen
@ 2006-06-28  9:22   ` Junio C Hamano
  1 sibling, 0 replies; 27+ messages in thread
From: Junio C Hamano @ 2006-06-28  9:22 UTC (permalink / raw)
  To: Uwe Zeisberger; +Cc: Alex Riesen, git

Uwe Zeisberger <zeisberg@informatik.uni-freiburg.de> writes:

> Hello Alex,
>
>> +// does not belong here
> Some C compiler (e.g. Sun Forte) don't like C++-style comments.

Heh, I said something like that last year and was scolded by
Linus who responded "what century are you living in?" ;-).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
                   ` (5 preceding siblings ...)
  2006-06-28  6:37 ` Uwe Zeisberger
@ 2006-06-28 15:06 ` Christopher Faylor
  2006-06-29  0:38   ` Alex Riesen
  6 siblings, 1 reply; 27+ messages in thread
From: Christopher Faylor @ 2006-06-28 15:06 UTC (permalink / raw)
  To: Junio C Hamano, Fredrik Kuivinen, Alex Riesen, git

On Tue, Jun 27, 2006 at 01:38:38AM +0200, Alex Riesen wrote:
>It still uses some calls to git programs (git-update-index,
>git-hash-object, git-diff-tree and git-write-tree), and merge(1) has
>the labels (-L) missing - I was unsure how to tackle this on windows -
>it has only argv[1].

Actually, Windows should behave the same as Linux wrt argv handling.
You can use argv[1] ... argv[n] modulo any differences in command line
quoting.

On Windows the arguments are broken into individual components by the
runtime, e.g., MSVCRT.dll or Cygwin1.dll.

cgf

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-28 15:06 ` Christopher Faylor
@ 2006-06-29  0:38   ` Alex Riesen
  2006-06-29  0:49     ` Christopher Faylor
  0 siblings, 1 reply; 27+ messages in thread
From: Alex Riesen @ 2006-06-29  0:38 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: git

Christopher Faylor, Wed, Jun 28, 2006 17:06:47 +0200:
> >It still uses some calls to git programs (git-update-index,
> >git-hash-object, git-diff-tree and git-write-tree), and merge(1) has
> >the labels (-L) missing - I was unsure how to tackle this on windows -
> >it has only argv[1].
> 
> Actually, Windows should behave the same as Linux wrt argv handling.
> You can use argv[1] ... argv[n] modulo any differences in command line
> quoting.

which leaves us (without quoting) with exactly one argument. argv[1],
aka GetCommandLine.

> On Windows the arguments are broken into individual components by the
> runtime, e.g., MSVCRT.dll or Cygwin1.dll.

And the rules for quoting are the same for ms and cygwin? It's just
passing arguments between cygwin programs and windows natives never
works as one might them expect. Try passing "^" to a batch script (to
a perl script with cmd wrapper around it).

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: CFT: merge-recursive in C
  2006-06-29  0:38   ` Alex Riesen
@ 2006-06-29  0:49     ` Christopher Faylor
  0 siblings, 0 replies; 27+ messages in thread
From: Christopher Faylor @ 2006-06-29  0:49 UTC (permalink / raw)
  To: Alex Riesen, git

On Thu, Jun 29, 2006 at 02:38:37AM +0200, Alex Riesen wrote:
>Christopher Faylor, Wed, Jun 28, 2006 17:06:47 +0200:
>>>It still uses some calls to git programs (git-update-index,
>>>git-hash-object, git-diff-tree and git-write-tree), and merge(1) has
>>>the labels (-L) missing - I was unsure how to tackle this on windows -
>>>it has only argv[1].
>>
>>Actually, Windows should behave the same as Linux wrt argv handling.
>>You can use argv[1] ...  argv[n] modulo any differences in command line
>>quoting.
>
>which leaves us (without quoting) with exactly one argument.  argv[1],
>aka GetCommandLine.
>
>>On Windows the arguments are broken into individual components by the
>>runtime, e.g., MSVCRT.dll or Cygwin1.dll.
>
>And the rules for quoting are the same for ms and cygwin?

Probably not but I was the one who raised the issue of quoting, not you.
Quoting is irrelevant to the general assertion that there is only an
argv[1] on Windows.  If that is the behavior that you're seeing then
something is wrong in the way the program is being invoked.

Maybe I'm missing something here and you're talking about some specific
case in git.  It's hard to see how anyone could make this assertion
otherwise.

cgf

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2006-06-29  0:49 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-26 23:38 CFT: merge-recursive in C Alex Riesen
2006-06-26 23:42 ` CFT: merge-recursive in C (test updates) Alex Riesen
2006-06-26 23:54 ` CFT: merge-recursive in C Linus Torvalds
2006-06-27  0:07   ` Linus Torvalds
2006-06-27  0:24     ` Alex Riesen
2006-06-27  0:27       ` Junio C Hamano
2006-06-27  0:17   ` Alex Riesen
2006-06-27  0:25 ` Junio C Hamano
2006-06-27  0:38   ` Alex Riesen
2006-06-27  7:28   ` Notes on diffcore API Junio C Hamano
2006-06-27  8:41     ` Alex Riesen
2006-06-27 23:33       ` Junio C Hamano
2006-06-28  7:36         ` Johannes Schindelin
2006-06-27  7:52 ` CFT: merge-recursive in C Johannes Schindelin
2006-06-27  8:58   ` Alex Riesen
2006-06-27 10:51     ` Johannes Schindelin
2006-06-27 11:53       ` Alex Riesen
2006-06-27 17:05       ` Junio C Hamano
2006-06-27 12:17 ` Alex Riesen
2006-06-27 12:43   ` Johannes Schindelin
2006-06-27 14:09     ` Alex Riesen
2006-06-28  6:37 ` Uwe Zeisberger
2006-06-28  7:32   ` Alex Riesen
2006-06-28  9:22   ` Junio C Hamano
2006-06-28 15:06 ` Christopher Faylor
2006-06-29  0:38   ` Alex Riesen
2006-06-29  0:49     ` Christopher Faylor

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox