Git development
 help / color / mirror / Atom feed
* Re: Improved three-way blob merging code
From: Alex Riesen @ 2006-06-29  7:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Davide Libenzi
In-Reply-To: <Pine.LNX.4.64.0606282157210.12404@g5.osdl.org>

On 6/29/06, Linus Torvalds <torvalds@osdl.org> wrote:
> +static void *three_way_filemerge(mmfile_t *base, mmfile_t *our, mmfile_t *their, unsigned long *size)
> +{
...
> +       if (t1 && t2 && t3) {
> +               int code = run_command("merge", t2, t1, t3, NULL);

This does not use the labels of merge(1) and the merged file will contain
the names of temporary files at conflict places, which is very confusing if
you happen to loose context while doing a merge with lots of conflicts.

^ permalink raw reply

* Re: CFT: merge-recursive in C (updated)
From: Alex Riesen @ 2006-06-29  8:43 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, Johannes Schindelin, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <7vzmfwy97w.fsf@assigned-by-dhcp.cox.net>

[cc list restored, I'm lost in the maze of git update-index, all cache
changing functions looking almost the same]

On 6/29/06, Junio C Hamano <junkio@cox.net> wrote:
> > this broke t6022-merge-rename.sh (the second test). It produces an
> > index with this:
> >
> > .../t/trash$ git-diff-index white
> > :100644 100644 2d603156dc5bdf6295c789cac08e3c9942a0b82a 0000000000000000000000000000000000000000 M      B
> > :100644 100644 ba41fb96393979b22691106b06bf5231eab57b85 0000000000000000000000000000000000000000 M      N
> >
> > whereas git-merge-recursive (and the previous version, without pipe):
> >
> > .../t/trash$ git-diff-index white
> > :100644 100644 2d603156dc5bdf6295c789cac08e3c9942a0b82a 0000000000000000000000000000000000000000 M      B
> >
> > I can see that "git update-index --add" is somehow different from a
> > pipe to "git update-index --index-info", but not very clear. Does this
> > "zero-sha1" mean that the file "N" is not in the index?
>
> When diff-index and diff-files compare a tree entry or an index
> entry with a file in the working tree, they do not compute the
> blob hash value for the file in the working tree.  0{40} is used
> on the RHS in such a case.  When the working tree file matches
> the corresponding index entry, then we know RHS matches what is
> in the index, so both sides have the blob hash value.

Ok. Am I correct in the assumption, that if the file in working tree has
the same SHA1 as LHS, than the next "git-update-index --refresh" will
remove the entry from git-diff-index output?
This is what actually happens, if I do "git-update-index --refresh", so I
suspect that I have an SHA1 update gone missing somewhere.

^ permalink raw reply

* [PATCH] cogito: export user env in tutorial-script
From: Gerrit Pape @ 2006-06-29  8:58 UTC (permalink / raw)
  To: git

The GIT_* environment variables set in switch_user() must be exported to be
available to the programs run afterwards.

Signed-off-by: Gerrit Pape <pape@smarden.org>

---

 Documentation/tutorial-script/script.sh |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

f4d96a4ceda9b7ac7e2e29f4b017edf60360904c
diff --git a/Documentation/tutorial-script/script.sh b/Documentation/tutorial-script/script.sh
index b99e3c9..b0e49c8 100755
--- a/Documentation/tutorial-script/script.sh
+++ b/Documentation/tutorial-script/script.sh
@@ -52,10 +52,10 @@ switch_user () {
 		*) echo "I don't know you, $1"; exit 1;;
 	esac
 	HOME=$(pwd)
-	GIT_AUTHOR_NAME="$1"
-	GIT_AUTHOR_EMAIL="$1@example.com"
-	GIT_COMMITTER_NAME="$1"
-	GIT_COMMITTER_EMAIL="$1@example.com"
+	export GIT_AUTHOR_NAME="$1"
+	export GIT_AUTHOR_EMAIL="$1@example.com"
+	export GIT_COMMITTER_NAME="$1"
+	export GIT_COMMITTER_EMAIL="$1@example.com"
 }
 
 
-- 
1.3.3

^ permalink raw reply related

* [PATCH] cogito: selftests need bash
From: Gerrit Pape @ 2006-06-29  8:58 UTC (permalink / raw)
  To: git

Force bash for selftest scripts, as they fail with posix sh.

Signed-off-by: Gerrit Pape <pape@smarden.org>

---

 t/Makefile |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

0f3a68d543ebb55564bb4edca8a26722f6de8da8
diff --git a/t/Makefile b/t/Makefile
index 6882e23..b8e6eec 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -8,7 +8,7 @@ #GIT_TEST_OPTS=--verbose --debug
 T = $(wildcard t[0-9][0-9][0-9][0-9]-*.sh)
 
 all:
-	@$(foreach t,$T,echo "*** $t ***"; sh $t $(GIT_TEST_OPTS) || exit; )
+	@$(foreach t,$T,echo "*** $t ***"; bash $t $(GIT_TEST_OPTS) || exit; )
 	@rm -fr trash
 
 clean:
-- 
1.3.3

^ permalink raw reply related

* Re: [PATCH] Makefile: set USE_PIC on Linux x86_64 for linking with Git.pm
From: Sergey Vlasov @ 2006-06-29  9:04 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Marco Roeland, git, pasky
In-Reply-To: <7vbqsdynvu.fsf@assigned-by-dhcp.cox.net>

[-- Attachment #1: Type: text/plain, Size: 2668 bytes --]

On Wed, 28 Jun 2006 14:24:37 -0700 Junio C Hamano wrote:

> Marco Roeland <marco.roeland@xs4all.nl> writes:
> 
> > Even for Linux someone mentioned that probably i386 is the exception in
> > _not_ needing the -fPIC linkage. It might even be specific to the Perl
> > "xs" implementation specifics?

In general, -fPIC is required when building shared libraries.  On some
systems (e.g., Linux/i386) you can get away without -fPIC, but with a
penalty on memory use and load time: non-PIC code will need relocations,
therefore its pages will no longer be shared between different
processes, and relocations will be performed immediately after loading
the shared library.

> USE_PIC is for pleasing Perly git and nothing else right now.
> 
> > So I should have added "Works for me (TM)"! ;-)
> 
> That would have been more explicit way to tell me that this is a
> partial solution and I should solicit help from people on other
> platforms.
> 
> By the way, I had an impression that compiling things with -fPIC
> when not necessary was generally a bad idea from performance
> point of view.  If that is the case we might want to compile,
> under USE_PIC, everything with -fPIC in a separate area to
> compile and link with Git.xs, without affecting the C-only core
> code.

This is exactly what libtool does (if both static and shared libraries
are compiled, each file is compiled twice - once with -fPIC -DPIC, and
once without these options).

But I suspect that even libtool won't help with Perl anyway, unless we
create a proper libgit.so and then link our Perl extension with it.

> I suspect this would largely depend on the architecture.  I ran
> git-fsck-objects compiled with and without -fPIC (after "make
> clean" to rebuild everything) on a fully packed copy of the
> linux-2.6 repository on my x86_64 box, and did not see
> meaningful differences:
> 
> : gitster; /usr/bin/time ../git.junio/git-fsck-objects-no-pic --full
> 109.71user 5.01system 1:54.89elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (14major+1834967minor)pagefaults 0swaps
> : gitster; /usr/bin/time ../git.junio/git-fsck-objects-with-pic --full
> 109.05user 4.97system 1:54.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
> 0inputs+0outputs (0major+1834981minor)pagefaults 0swaps
> : gitster;

This is because most of time is spent inside SHA-1 and zlib routines,
which are the same in these cases.  Using mozilla-sha1 code might show
some difference.

And the effect of -fPIC on x86_64 is smaller than on i386, because
x86_64 has 2x more registers than i386, therefore loss of one register
is less noticeable.

[-- Attachment #2: Type: application/pgp-signature, Size: 190 bytes --]

^ permalink raw reply

* Re: [RFC] GIT user survey
From: Jakub Narebski @ 2006-06-29  9:49 UTC (permalink / raw)
  To: git
In-Reply-To: <20060624221523.GA4335@spinlock.ch>

Matthias Kestenholz <lists@spinlock.ch> wrote:
> Adrien Beau (adrienbeau@gmail.com) wrote:
>> 
>> The results of the Mercurial survey have been posted there:
>> http://www.selenic.com/mercurial/wiki/index.cgi/UserSurvey
>> 
>> An interesting read.
> 
> I find the answers to the question, what people most like to see
> improved interesting: The improvement which got mentioned most often
> was "merge across rename", something which git does already.
> 
> It seems, that partial checkouts and truncated history are the
> only things left to implement for git from this list.

I think that at least some of the infrastructure for partial checkouts
is in place due to preliminary work for subproject (gitlink or bind)
support in git: git-read-tree and git-write-tree --prefix=<prefix>/
option suppport. But I might be mistaken.

Truncating history "by hand" is possible even now, using graft file.
There is recurring talk about "shallow clones", lately about "lazy clones".
There were mentioned here also split-history idea and sparse clone idea.  

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply

* Re: [PATCH 1/2] rebase: check for errors from git-commit
From: Eric Wong @ 2006-06-28  9:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v1wt94oom.fsf@assigned-by-dhcp.cox.net>

Junio C Hamano <junkio@cox.net> wrote:
> Eric Wong <normalperson@yhbt.net> writes:
> 
> >  I've grown used to having 'set -e' at the beginning of my shell
> >  scripts.  IMHO it'd be a good idea to start moving towards this
> >  eventually (even though shell scripts seem to be getting phased-out
> >  somewhat).
> >
> >  git-rebase.sh |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/git-rebase.sh b/git-rebase.sh
> > index 9ad1c44..47c8e8f 100755
> > --- a/git-rebase.sh
> > +++ b/git-rebase.sh
> > @@ -59,8 +59,8 @@ continue_merge () {
> >  
> >  	if test -n "`git-diff-index HEAD`"
> >  	then
> > +		git-commit -C "`cat $dotest/current`" || die 'Commit failed'
> >  		printf "Committed: %0${prec}d" $msgnum
> > -		git-commit -C "`cat $dotest/current`"
> 
> Anticipating failure from "git-commit" is the right thing to do,
> but this is a "Now what?" situation.  What is the expected
> course of action to recover from this for the end user, and how
> can we phrase the error message to help that process?

I would expect git-commit to show the correct error message (or the
pre-commit hook), die "$RESOLVEMSG" might be a better option, though.

-- 
Eric Wong

^ permalink raw reply

* Re: CFT: merge-recursive in C (updated)
From: Alex Riesen @ 2006-06-28  9:34 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Fredrik Kuivinen,
	Linus Torvalds
In-Reply-To: <20060627223249.GA8177@steel.home>

[-- Attachment #1: Type: text/plain, Size: 455 bytes --]

On 6/28/06, Alex Riesen <fork0@t-online.de> wrote:
> Johannes Schindelin, Tue, Jun 27, 2006 18:42:51 +0200:
> > > - use a pipe to "git-update-index --index-info" instead of using command
> > > line

...and to take it a step further, a patch (0002) to avoid too many calls to
git-write-tree and to git-update-index. Brought merge times on my test
monster (~25k files) down to 2min 30sec (from something around 11 min).
The 0001 patch is just C++ comments.

[-- Attachment #2: 0001-update-comments-and-remove-C.txt --]
[-- Type: text/plain, Size: 13049 bytes --]

From 36147880afad3a2d53f6474bec069e29d82a74a8 Mon Sep 17 00:00:00 2001
From: Riesen <ARiesen@oekap852.hbi.ad.harman.com>
Date: Wed, 28 Jun 2006 09:46:47 +0200
Subject: update comments and remove C++ //...
---
 graph.c           |   14 ++++--
 merge-recursive.c |  131 +++++++++++++++++++++++++----------------------------
 path-list.c       |    2 -
 3 files changed, 72 insertions(+), 75 deletions(-)

diff --git a/graph.c b/graph.c
index fa2bfee..1a28302 100644
--- a/graph.c
+++ b/graph.c
@@ -5,7 +5,7 @@ #include "cache.h"
 #include "commit.h"
 #include "graph.h"
 
-// does not belong here
+/* does not belong here */
 struct tree *git_write_tree()
 {
 #if 0
@@ -189,8 +189,10 @@ struct node_list *node_list_find_node(co
 	return (struct node_list*)p;
 }
 
-// a & b. a and are invalid after the call,
-// the result will contain all the common nodes
+/*
+ * a & b. a and are invalid after the call,
+ * the result will contain all the common nodes
+ */
 struct node_list *node_list_intersect(struct node_list *a,
 				      struct node_list *b)
 {
@@ -214,7 +216,7 @@ struct node *graph_add_node(struct graph
 {
 	struct node_list **bucket;
 	if ( node->virtual )
-		// virtual nodes hashed by lowest byte of virtual_id
+		/* virtual nodes hashed by lowest byte of virtual_id */
 		bucket = graph->commits + (node->virtual_id & 0xff);
 	else
 		bucket = graph->commits + node->commit->object.sha1[0];
@@ -249,4 +251,6 @@ void node_list_print(const char *msg, co
 }
 #endif
 
-// vim: sw=8 noet
+/*
+vim: sw=8 noet
+*/
diff --git a/merge-recursive.c b/merge-recursive.c
index 9bbb426..ba5b024 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1,7 +1,7 @@
-//
-// Recursive Merge algorithm stolen from git-merge-recursive.py by
-// Fredrik Kuivinen.
-//
+/*
+ * Recursive Merge algorithm stolen from git-merge-recursive.py by
+ * Fredrik Kuivinen.
+ */
 #include <stdarg.h>
 #include <string.h>
 #include <assert.h>
@@ -47,7 +47,7 @@ struct index_entry
 
 struct index_entry *index_entry_alloc(const char *path)
 {
-	size_t n = strlen(path); // index_entry::path has room for \0
+	size_t n = strlen(path); /* index_entry::path has room for \0 */
 	struct index_entry *p = xmalloc(sizeof(struct index_entry) + n);
 	if ( !p )
 		return NULL;
@@ -102,14 +102,16 @@ static void setup_index(int temp)
 	setenv("GIT_INDEX_FILE", idx, 1);
 }
 
-// This is a global variable which is used in a number of places but
-// only written to in the 'merge' function.
-
-// index_only == 1    => Don't leave any non-stage 0 entries in the cache and
-//                       don't update the working directory.
-//               0    => Leave unmerged entries in the cache and update
-//                       the working directory.
-static int index_only = 0; // cacheOnly
+/*
+ * This is a global variable which is used in a number of places but
+ * only written to in the 'merge' function.
+ *
+ * index_only == 1    => Don't leave any non-stage 0 entries in the cache and
+ *                       don't update the working directory.
+ *               0    => Leave unmerged entries in the cache and update
+ *                       the working directory.
+ */
+static int index_only = 0;
 
 static int git_read_tree(const struct tree *tree)
 {
@@ -157,10 +159,10 @@ struct merge_tree_result merge_trees(str
 
 static int fget_sha1(unsigned char *sha, FILE *fp, int *ch);
 
-// The entry point to the merge code
-
-// Merge the commits h1 and h2, return the resulting virtual
-// commit object and a flag indicating the cleaness of the merge.
+/*
+ * Merge the commits h1 and h2, return the resulting virtual
+ * commit object and a flag indicating the cleaness of the merge.
+ */
 static
 struct merge_result merge(struct node *h1,
 			  struct node *h2,
@@ -262,7 +264,7 @@ typedef int (*read_tree_rt_fn_t)(const c
 				 unsigned mode,
 				 void *data);
 
-// git-ls-tree -r -t <tree>
+/* git-ls-tree -r -t <tree> */
 static int read_tree_rt(struct tree *tree,
 			const char *base,
 			int baselen,
@@ -391,8 +393,10 @@ static int find_entry(const char *sha,
 	return READ_TREE_RECURSIVE;
 }
 
-// Returns a CacheEntry object which doesn't have to correspond to
-// a real cache entry in Git's index.
+/*
+ * Returns a index_entry instance which doesn't have to correspond to
+ * a real cache entry in Git's index.
+ */
 struct index_entry *index_entry_from_db(const char *path,
 					struct tree *o,
 					struct tree *a,
@@ -404,24 +408,18 @@ struct index_entry *index_entry_from_db(
 	data.sha = e->stages[1].sha;
 	data.mode = &e->stages[1].mode;
 	if ( read_tree_rt(o, "", 0, find_entry, &data) != READ_TREE_FOUND ) {
-		// fprintf(stderr, "1: %s:%s not found\n",
-		// 	sha1_to_hex(o->object.sha1), path);
 		memcpy(e->stages[1].sha, null_sha1, 20);
 		e->stages[1].mode = 0;
 	}
 	data.sha = e->stages[2].sha;
 	data.mode = &e->stages[2].mode;
 	if ( read_tree_rt(a, "", 0, find_entry, &data) != READ_TREE_FOUND ) {
-		// fprintf(stderr, "2: %s:%s not found\n",
-		// 	sha1_to_hex(a->object.sha1), path);
 		memcpy(e->stages[2].sha, null_sha1, 20);
 		e->stages[2].mode = 0;
 	}
 	data.sha = e->stages[3].sha;
 	data.mode = &e->stages[3].mode;
 	if ( read_tree_rt(b, "", 0, find_entry, &data) != READ_TREE_FOUND ) {
-		// fprintf(stderr, "3: %s:%s not found\n",
-		// 	sha1_to_hex(b->object.sha1), path);
 		memcpy(e->stages[3].sha, null_sha1, 20);
 		e->stages[3].mode = 0;
 	}
@@ -469,8 +467,11 @@ static int fget_sha1(unsigned char *sha,
 		return -1;
 	return 0;
 }
-// Create a dictionary mapping file names to CacheEntry objects. The
-// dictionary contains one entry for every path with a non-zero stage entry.
+
+/*
+ * Create a dictionary mapping file names to CacheEntry objects. The
+ * dictionary contains one entry for every path with a non-zero stage entry.
+ */
 struct index_entry *unmergedCacheEntries()
 {
 	struct index_entry *unmerged = NULL;
@@ -484,17 +485,17 @@ struct index_entry *unmergedCacheEntries
 		char stage = '0';
 		char path[PATH_MAX];
 		int p;
-		// mode
+		/* mode */
 		if ( fget_mode(&mode, fp, &ch) )
 			goto wait_eol;
 		if ( '\x20' != ch )
 			goto wait_eol;
-		// SHA1
+		/* SHA1 */
 		if ( fget_sha1(sha, fp, &ch) )
 			goto wait_eol;
 		if ( '\x20' != ch )
 			goto wait_eol;
-		// stage
+		/* stage */
 		if ( (ch = fgetc(fp)) != EOF ) {
 			stage = ch;
 			if ( ch < '1' || ch > '3' )
@@ -502,7 +503,7 @@ struct index_entry *unmergedCacheEntries
 		}
 		if ( (ch = fgetc(fp)) == EOF || '\t' != ch )
 			goto wait_eol;
-		// path
+		/* path */
 		for ( p = 0; (ch = fgetc(fp)) != EOF; ++p ) {
 			path[p] = ch;
 			if ( !ch )
@@ -515,7 +516,6 @@ struct index_entry *unmergedCacheEntries
 		}
 		if ( ch )
 			goto wait_eol;
-		// printf("unmerged %08o %s %c %s\n",mode,sha1_to_hex(sha),stage,path);
 		struct index_entry *e = index_entry_get(&unmerged, path);
 		e->stages[stage - '1' + 1].mode = mode;
 		memcpy(e->stages[stage - '1' + 1].sha, sha, 20);
@@ -583,10 +583,12 @@ void free_rename_entries(struct rename_e
 	}
 }
 
-// Get information of all renames which occured between 'oTree' and
-// 'tree'. We need the three trees in the merge ('oTree', 'aTree' and
-// 'bTree') to be able to associate the correct cache entries with
-// the rename information. 'tree' is always equal to either aTree or bTree.
+/*
+ * Get information of all renames which occured between 'oTree' and
+ * 'tree'. We need the three trees in the merge ('oTree', 'aTree' and
+ * 'bTree') to be able to associate the correct cache entries with
+ * the rename information. 'tree' is always equal to either aTree or bTree.
+ */
 struct rename_entry *getRenames(struct tree *tree,
 				struct tree *oTree,
 				struct tree *aTree,
@@ -831,7 +833,6 @@ void updateFileExt(FILE *fp,
 	}
 	if ( updateCache )
 	{
-		// XXX just always use "git update-index --index-info"?
 		fprintf(fp, "%06o %s\t%s", mode, sha1_to_hex(sha), path);
 		fputc('\0', fp);
 	}
@@ -846,8 +847,8 @@ void updateFile(FILE *fp,
 	updateFileExt(fp, sha, mode, path, index_only || clean, !index_only);
 }
 
-// Low level file merging, update and removal
-// ------------------------------------------
+/* Low level file merging, update and removal */
+
 struct merge_file_info
 {
 	unsigned char sha[20];
@@ -991,9 +992,6 @@ int processRenames(struct rename_entry *
 		   const char *branchNameB)
 {
 	int cleanMerge = 1;
-	//    printf("process renames %s:%s -> %s:%s\n",
-	//	   branchNameA, renamesA ? renamesA->src: "(none)",
-	//	   branchNameB, renamesB ? renamesB->dst: "(none)");
 
 	struct path_list srcNames = {NULL, 0, 0};
 	const struct rename_entry *sre;
@@ -1033,7 +1031,7 @@ int processRenames(struct rename_entry *
 		ren1->processed = 1;
 
 		if ( ren2 ) {
-			// Renamed in 1 and renamed in 2
+			/* Renamed in 1 and renamed in 2 */
 			if ( strcmp(ren1->src, ren2->src) != 0 )
 				die("ren1.srcName != ren2.srcName");
 			ren2->dst_entry->processed = 1;
@@ -1097,7 +1095,7 @@ int processRenames(struct rename_entry *
 				updateFile(fp, mfi.clean, mfi.sha, mfi.mode, ren1->dst);
 			}
 		} else {
-			// Renamed in 1, maybe changed in 2
+			/* Renamed in 1, maybe changed in 2 */
 			removeFile(fp, 1, ren1->src);
 
 			unsigned char srcShaOtherBranch[20], dstShaOtherBranch[20];
@@ -1224,15 +1222,15 @@ static int sha_eq(const unsigned char *a
 	return a && b && memcmp(a, b, 20) == 0;
 }
 
-// Per entry merge function
-// ------------------------
-// Merge one cache entry.
+/* Per entry merge function */
 int processEntry(struct index_entry *entry,
 		 const char *branch1Name,
 		 const char *branch2Name)
 {
-	//    printf("processing entry, clean cache: %s\n", index_only ? "yes": "no");
-	//    print_index_entry("\tpath: ", entry);
+	/*
+	printf("processing entry, clean cache: %s\n", index_only ? "yes": "no");
+	print_index_entry("\tpath: ", entry);
+	*/
 	int cleanMerge = 1;
 	const char *path = entry->path;
 	unsigned char *oSha = has_sha(entry->stages[1].sha);
@@ -1244,18 +1242,17 @@ int processEntry(struct index_entry *ent
 	FILE *fp = git_update_index_pipe();
 
 	if ( oSha && (!aSha || !bSha) ) {
-		//
-		// Case A: Deleted in one
-		//
+		/* Case A: Deleted in one */
 		if ( (!aSha && !bSha) ||
 		     (sha_eq(aSha, oSha) && !bSha) ||
 		     (!aSha && sha_eq(bSha, oSha)) ) {
-			// Deleted in both or deleted in one and unchanged in the other
+			/* Deleted in both or deleted in one and
+			 * unchanged in the other */
 			if ( aSha )
 				output("Removing %s", path);
 			removeFile(fp, 1, path);
 		} else {
-			// Deleted in one and changed in the other
+			/* Deleted in one and changed in the other */
 			cleanMerge = 0;
 			if ( !aSha ) {
 				output("CONFLICT (delete/modify): %s deleted in %s "
@@ -1274,9 +1271,7 @@ int processEntry(struct index_entry *ent
 
 	} else if ( (!oSha && aSha && !bSha) ||
 		    (!oSha && !aSha && bSha) ) {
-		//
-		// Case B: Added in one.
-		//
+		/* Case B: Added in one. */
 		const char *addBranch;
 		const char *otherBranch;
 		unsigned mode;
@@ -1309,9 +1304,7 @@ int processEntry(struct index_entry *ent
 			updateFile(fp, 1, sha, mode, path);
 		}
 	} else if ( !oSha && aSha && bSha ) {
-		//
-		// Case C: Added in both (check for same permissions).
-		//
+		/* Case C: Added in both (check for same permissions). */
 		if ( sha_eq(aSha, bSha) ) {
 			if ( aMode != bMode ) {
 				cleanMerge = 0;
@@ -1321,7 +1314,7 @@ int processEntry(struct index_entry *ent
 				output("CONFLICT: adding with permission: %06o", aMode);
 				updateFile(fp, 0, aSha, aMode, path);
 			} else {
-				// This case is handled by git-read-tree
+				/* This case is handled by git-read-tree */
 				assert(0 && "This case must be handled by git-read-tree");
 			}
 		} else {
@@ -1337,9 +1330,7 @@ int processEntry(struct index_entry *ent
 		}
 
 	} else if ( oSha && aSha && bSha ) {
-		//
-		// case D: Modified in both, but differently.
-		//
+		/* case D: Modified in both, but differently. */
 		output("Auto-merging %s\n", path);
 		struct merge_file_info mfi;
 		mfi = mergeFile(path, oSha, oMode,
@@ -1472,15 +1463,15 @@ struct graph *graph_build(struct node_li
 			break;
 		if (EOF == ch)
 			break;
-		// a commit
+		/* a commit */
 		struct node *node = graph_node_bysha(graph, sha);
 		if (!node)
 		{
 			node = node_alloc(lookup_commit(sha));
 			graph_add_node(graph, node);
 		}
-		// ...and its parents. I assume a parent cannot be mentioned
-		// before the children.
+		/* ...and its parents. I assume a parent cannot be mentioned
+		   before the children. */
 		struct node_list *parents = NULL;
 		while ('\n' != ch) {
 			if (fget_sha1(sha, fp, &ch)) {
@@ -1573,4 +1564,6 @@ int main(int argc, char *argv[])
 	return result.clean ? 0: 1;
 }
 
-// vim: sw=8 noet
+/*
+vim: sw=8 noet
+*/
diff --git a/path-list.c b/path-list.c
index fbfc103..95395ab 100644
--- a/path-list.c
+++ b/path-list.c
@@ -67,7 +67,7 @@ int path_list_has_path(const struct path
 	return exact_match;
 }
 
-// in place
+/* in place */
 void path_list_union_update(struct path_list *dst, const struct path_list *src)
 {
 	char **new_paths;
-- 
1.4.1.rc1.g8b09-dirty


[-- Attachment #3: 0002-fix-processEntries-to-avoid-multiple-calls-to-write-tree-and-update-index.txt --]
[-- Type: text/plain, Size: 5295 bytes --]

From b2fbfc22526fb2a1f56ef56836b7412c53524e60 Mon Sep 17 00:00:00 2001
From: Riesen <ARiesen@oekap852.hbi.ad.harman.com>
Date: Wed, 28 Jun 2006 10:45:05 +0200
Subject: fix processEntries to avoid multiple calls to write-tree and update-index
---
 merge-recursive.c |   47 ++++++++++++++++++++++++++++++++++++-----------
 1 files changed, 36 insertions(+), 11 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index ba5b024..5b7ed51 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -19,6 +19,13 @@ #include "run-command.h"
 #include "graph.h"
 #include "path-list.h"
 
+#define DEBUG
+#ifdef DEBUG
+#define debug(args, ...) fprintf(stderr, args, ## __VA_ARGS__)
+#else
+#define debug(args, ...)
+#endif
+
 #define for_each_commit(p,list) for ( p = (list); p; p = p->next )
 
 struct merge_result
@@ -345,9 +352,14 @@ int getFilesAndDirs(struct tree *tree,
 	path_list_clear(dirs, 1);
 	data.files = files;
 	data.dirs = dirs;
-	if ( read_tree_rt(tree, "", 0, save_files_dirs, &data) != 0 )
+	debug("getFilesAndDirs ...\n");
+	if ( read_tree_rt(tree, "", 0, save_files_dirs, &data) != 0 ) {
+		debug("\tgetFilesAndDirs done (0)\n");
 		return 0;
-	return path_list_count(files) + path_list_count(dirs);
+	}
+	int n = path_list_count(files) + path_list_count(dirs);
+	debug("\tgetFilesAndDirs done (%d)\n", n);
+	return n;
 }
 
 struct index_entry *index_entry_find(struct index_entry *ents, const char *path)
@@ -478,6 +490,7 @@ struct index_entry *unmergedCacheEntries
 	FILE *fp = popen("git-ls-files -z --unmerged", "r");
 	if ( !fp )
 		return NULL;
+	debug("unmergedCacheEntries...\n");
 	int ch;
 	while ( !feof(fp) ) {
 		unsigned mode;
@@ -524,6 +537,7 @@ struct index_entry *unmergedCacheEntries
 		while ( (ch = fgetc(fp)) != EOF && ch );
 	}
 	pclose(fp);
+	debug("\tunmergedCacheEntries done\n");
 	return unmerged;
 }
 
@@ -595,6 +609,7 @@ struct rename_entry *getRenames(struct t
 				struct tree *bTree,
 				struct index_entry **entries)
 {
+	debug("getRenames ...\n");
 	struct rename_entry *renames = NULL;
 	struct rename_entry **rptr = &renames;
 	struct diff_options opts;
@@ -639,6 +654,7 @@ struct rename_entry *getRenames(struct t
 	}
 	opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	diff_flush(&opts);
+	debug("\tgetRenames done\n");
 	return renames;
 }
 
@@ -991,6 +1007,7 @@ int processRenames(struct rename_entry *
 		   const char *branchNameA,
 		   const char *branchNameB)
 {
+	debug("processRenames...\n");
 	int cleanMerge = 1;
 
 	struct path_list srcNames = {NULL, 0, 0};
@@ -1207,6 +1224,7 @@ int processRenames(struct rename_entry *
 	if (pclose(fp)) {
 		die("git update-index --index-info failed");
 	}
+	debug("\tprocessRenames done\n");
 	return cleanMerge;
 }
 
@@ -1223,7 +1241,8 @@ static int sha_eq(const unsigned char *a
 }
 
 /* Per entry merge function */
-int processEntry(struct index_entry *entry,
+int processEntry(FILE *fp,
+		 struct index_entry *entry,
 		 const char *branch1Name,
 		 const char *branch2Name)
 {
@@ -1239,7 +1258,6 @@ int processEntry(struct index_entry *ent
 	unsigned oMode = entry->stages[1].mode;
 	unsigned aMode = entry->stages[2].mode;
 	unsigned bMode = entry->stages[3].mode;
-	FILE *fp = git_update_index_pipe();
 
 	if ( oSha && (!aSha || !bSha) ) {
 		/* Case A: Deleted in one */
@@ -1353,8 +1371,6 @@ int processEntry(struct index_entry *ent
 	} else
 		die("Fatal merge failure, shouldn't happen.");
 
-	if (pclose(fp))
-		die("updating entry failed in git update-index");
 	return cleanMerge;
 }
 
@@ -1373,6 +1389,7 @@ struct merge_tree_result merge_trees(str
 		return result;
 	}
 
+	debug("merge_trees ...\n");
 	code = git_merge_trees(index_only ? "-i": "-u", common, head, merge);
 
 	if ( code != 0 )
@@ -1397,17 +1414,22 @@ struct merge_tree_result merge_trees(str
 		renamesMerge = getRenames(merge, common, head, merge, &entries);
 		result.clean = processRenames(renamesHead, renamesMerge,
 					      branch1Name, branch2Name);
+		debug("\tprocessing entries...\n");
+		FILE *fp = git_update_index_pipe();
 		struct index_entry *e;
 		for ( e = entries; e; e = e->next ) {
 			if ( e->processed )
 				continue;
-			if ( !processEntry(e, branch1Name, branch2Name) )
+			if ( !processEntry(fp, e, branch1Name, branch2Name) )
 				result.clean = 0;
-			if ( result.clean || index_only )
-				result.tree = git_write_tree();
-			else
-				result.tree = NULL;
 		}
+		if (pclose(fp))
+			die("updating entry failed in git update-index");
+		if ( result.clean || index_only )
+			result.tree = git_write_tree();
+		else
+			result.tree = NULL;
+		debug("\t\tprocessing entries done\n");
 		free_rename_entries(&renamesMerge);
 		free_rename_entries(&renamesHead);
 		free_index_entries(&entries);
@@ -1419,6 +1441,7 @@ struct merge_tree_result merge_trees(str
 		       sha1_to_hex(result.tree->object.sha1));
 	}
 
+	debug("\tmerge_trees done\n");
 	return result;
 }
 
@@ -1558,7 +1581,9 @@ int main(int argc, char *argv[])
 		struct node_list *commits = NULL;
 		node_list_insert(h1, &commits);
 		node_list_insert(h2, &commits->next);
+		debug("building graph...\n");
 		struct graph *graph = graph_build(commits);
+		debug("\tbuilding graph...\n");
 		result = merge(h1, h2, branch1, branch2, graph, 0, NULL);
 	}
 	return result.clean ? 0: 1;
-- 
1.4.1.rc1.g8b09-dirty


^ permalink raw reply related

* Re: [PATCH] Makefile: set USE_PIC on Linux x86_64 for linking with Git.pm
From: Josef Weidendorfer @ 2006-06-29  9:58 UTC (permalink / raw)
  To: Sergey Vlasov; +Cc: Junio C Hamano, Marco Roeland, git, pasky
In-Reply-To: <20060629130400.c280de67.vsu@altlinux.ru>

On Thursday 29 June 2006 11:04, Sergey Vlasov wrote:
> And the effect of -fPIC on x86_64 is smaller than on i386, because
> x86_64 has 2x more registers than i386, therefore loss of one register
> is less noticeable.

According to the x86_64 ABI, x86_64 has no explicit GOT pointer.
Meaning: no additional register needed at all, as x86_64 has IP-relative
addressing. Thus, compiling with -fPIC on x86_64 probably has no
negative implications at all (?).

Josef

^ permalink raw reply

* [PATCH] autoconf: Use autoconf to check for libraries: openssl/crypto, curl, expat
From: Jakub Narebski @ 2006-06-29 11:59 UTC (permalink / raw)
  To: git
In-Reply-To: <200606290301.51657.jnareb@gmail.com>

./configure script checks now if the following libraries are present:
 * -lcrypto (checks for SHA1_Init, sets NO_OPENSSL=YesPlease if not found)
 * -lcurl   (checks for curl_easy_setopt, sets NO_CURL=YesPlease if not found)
 * -lexpat  (checks for XML_ParserCreate, sets NO_EXPAT=YesPlease if not found)

Appropriate lines in config.mak are generated using MY_APPEND_LINE macro by adding
lines to temporary file config.mak.append

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---

Second patch in series introducing nonintrusive autoconf support to
git build process.

I'm not that sure about -lcrypto equals openssl.

 configure.ac |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/configure.ac b/configure.ac
index 4003ff6..55d7a9b 100644
--- a/configure.ac
+++ b/configure.ac
@@ -6,6 +6,21 @@ AC_INIT([git], [1.4.0], [git@vger.kernel
 
 AC_CONFIG_SRCDIR([git.c])
 
+# Definitions of macros
+# MY_APPEND_LINE(LINE)
+# --------------------
+# Append LINE to file config.mak.append
+AC_DEFUN([MY_APPEND_LINE],
+[[echo "$1" >> config.mak.append]])# AC_APPEND_LINE
+
+# Checks for libraries.
+AC_MSG_NOTICE(CHECKS for libraries)
+AC_CHECK_LIB([crypto], [SHA1_Init],,MY_APPEND_LINE(NO_OPENSSL=YesPlease))
+AC_CHECK_LIB([curl], [curl_easy_setopt],,MY_APPEND_LINE(NO_CURL=YesPlease))
+AC_CHECK_LIB([expat], [XML_ParserCreate],,MY_APPEND_LINE(NO_EXPAT=YesPlease))
+
 # Output files
-AC_CONFIG_FILES([config.mak])
+AC_CONFIG_FILES([config.mak:config.mak.in:config.mak.append], 
+[rm -f config.mak.append], 
+[echo >> config.mak.append])
 AC_OUTPUT
-- 
1.4.0

^ permalink raw reply related

* Re: [PATCH] autoconf: Use autoconf to write installation directories to config.mak
From: Matthias Lederhofer @ 2006-06-29 12:46 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <200606290301.51657.jnareb@gmail.com>

> This is beginning of patch series introducing installation configuration
> using autoconf (and no other autotools) to git. The idea is to generate
> config.mak using ./configure (generated from configure.ac) from
> config.mak.in, so one can use autoconf as an _alternative_ to ordinary
> Makefile, and creating one's own config.mak.

Are you sure this should be named config.mak? From INSTALL:
> You can place local settings in config.mak and the Makefile
> will include them.  Note that config.mak is not distributed;
> the name is reserved for local settings.

So with another filename either include it
- before config.mak: the user may override ./configure options with
  config.mak
- after config.mak: ./configure overrides config.mak
At least do not overwrite config.mak if it exists.

^ permalink raw reply

* Re: CFT: merge-recursive in C (updated)
From: Johannes Schindelin @ 2006-06-29 13:16 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <20060627223249.GA8177@steel.home>

Hi,

On Wed, 28 Jun 2006, Alex Riesen wrote:

> > > Path list optimization should be next (and I'll be glad if someone does 
> > > this before me).
> > 
> > See below.
> > 
> 
> Aah, thanks. Merged, tried, tested, left in patch.
> The reallocs can cause some undesirable heap fragmentation, don't you
> think?

I do not care too deeply about the heap fragmentation there: it is just 4 
bytes per file name. Besides, I chose an increment of 32 (probably I 
should use alloc_nr() instead).

> I tried to replace that code completely with a call to git-merge-base
> (it does not happen too often). So far it passed all tests.
> 
> > I have some commits in a private branch to split out get_merge_bases() 
> > from merge-base.c, so I'll try that next.
> 
> Thanks, that'd be nice.

Will reply with two patches.

> > BTW, before actually finishing this, we might want to do away with 
> > capitalizedFunctionNames and 4-space indent.
> 
> 4-space indent should be gone by now, the names pending (they were
> important in initial debugging after conversion).

Okay.

Ciao,
Dscho

^ permalink raw reply

* [PATCH 1/2] refactor merge_bases() as preparation to libify merge-base
From: Johannes Schindelin @ 2006-06-29 13:16 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <20060627223249.GA8177@steel.home>


---
 merge-base.c |   64 ++++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 46 insertions(+), 18 deletions(-)

diff --git a/merge-base.c b/merge-base.c
index 4856ca0..7d87c20 100644
--- a/merge-base.c
+++ b/merge-base.c
@@ -124,8 +124,6 @@ static struct commit *interesting(struct
  * to contaminate D and E.
  */
 
-static int show_all = 0;
-
 static void mark_reachable_commits(struct commit_list *result,
 				   struct commit_list *list)
 {
@@ -167,34 +165,33 @@ static void mark_reachable_commits(struc
 	}
 }
 
-static int merge_base(struct commit *rev1, struct commit *rev2)
+struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2)
 {
 	struct commit_list *list = NULL;
 	struct commit_list *result = NULL;
 	struct commit_list *tmp = NULL;
 
-	if (rev1 == rev2) {
-		printf("%s\n", sha1_to_hex(rev1->object.sha1));
-		return 0;
-	}
+	if (rev1 == rev2)
+		return commit_list_insert(rev1, &result);
 
 	parse_commit(rev1);
 	parse_commit(rev2);
 
-	rev1->object.flags |= 1;
-	rev2->object.flags |= 2;
+	rev1->object.flags |= PARENT1;
+	rev2->object.flags |= PARENT2;
 	insert_by_date(rev1, &list);
 	insert_by_date(rev2, &list);
 
 	while (interesting(list)) {
 		struct commit *commit = list->item;
 		struct commit_list *parents;
-		int flags = commit->object.flags & 7;
+		int flags = commit->object.flags
+			& (PARENT1 | PARENT2 | UNINTERESTING);
 
 		tmp = list;
 		list = list->next;
 		free(tmp);
-		if (flags == 3) {
+		if (flags == (PARENT1 | PARENT2)) {
 			insert_by_date(commit, &result);
 
 			/* Mark parents of a found merge uninteresting */
@@ -213,21 +210,52 @@ static int merge_base(struct commit *rev
 	}
 
 	if (!result)
-		return 1;
+		return NULL;
 
 	if (result->next && list)
 		mark_reachable_commits(result, list);
 
+	/* cull duplicates */
+	for (tmp = result, list = NULL; tmp; ) {
+		struct commit *commit = tmp->item;
+		struct commit_list *next = tmp->next;
+		if (commit->object.flags & UNINTERESTING) {
+			if (list != NULL)
+				list->next = next;
+			free(tmp);
+		} else {
+			if (list == NULL)
+				result = tmp;
+			list = tmp;
+			commit->object.flags |= UNINTERESTING;
+		}
+
+		tmp = next;
+	}
+
+	/* reset flags */
+	clear_commit_marks(rev1, PARENT1 | PARENT2 | UNINTERESTING);
+	clear_commit_marks(rev2, PARENT1 | PARENT2 | UNINTERESTING);
+
+	return result;
+}
+
+static int show_all = 0;
+
+static int merge_base(struct commit *rev1, struct commit *rev2)
+{
+	struct commit_list *result = get_merge_bases(rev1, rev2);
+
+	if (!result)
+		return 1;
+
 	while (result) {
-		struct commit *commit = result->item;
-		result = result->next;
-		if (commit->object.flags & UNINTERESTING)
-			continue;
-		printf("%s\n", sha1_to_hex(commit->object.sha1));
+		printf("%s\n", sha1_to_hex(result->item->object.sha1));
 		if (!show_all)
 			return 0;
-		commit->object.flags |= UNINTERESTING;
+		result = result->next;
 	}
+
 	return 0;
 }
 
-- 
1.4.1.rc1.g87c00-dirty

^ permalink raw reply related

* [PATCH] move get_merge_bases() to core lib; use it in merge-recursive
From: Johannes Schindelin @ 2006-06-29 13:17 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <20060627223249.GA8177@steel.home>


most of this patch is just a "sub-file rename", i.e. moving code
literally (sue me, SCO!) from merge-base.c to commit.c.

---
 commit.c          |  240 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 commit.h          |    2 
 merge-base.c      |  238 -----------------------------------------------------
 merge-recursive.c |   23 ++---
 4 files changed, 250 insertions(+), 253 deletions(-)

diff --git a/commit.c b/commit.c
index 946615d..891c6d0 100644
--- a/commit.c
+++ b/commit.c
@@ -842,3 +842,243 @@ void sort_in_topological_order_fn(struct
 	}
 	free(nodes);
 }
+
+/* merge-rebase stuff */
+
+#define PARENT1 1
+#define PARENT2 2
+#define UNINTERESTING 4
+
+static struct commit *interesting(struct commit_list *list)
+{
+	while (list) {
+		struct commit *commit = list->item;
+		list = list->next;
+		if (commit->object.flags & UNINTERESTING)
+			continue;
+		return commit;
+	}
+	return NULL;
+}
+
+/*
+ * A pathological example of how this thing works.
+ *
+ * Suppose we had this commit graph, where chronologically
+ * the timestamp on the commit are A <= B <= C <= D <= E <= F
+ * and we are trying to figure out the merge base for E and F
+ * commits.
+ *
+ *                  F
+ *                 / \
+ *            E   A   D
+ *             \ /   /  
+ *              B   /
+ *               \ /
+ *                C
+ *
+ * First we push E and F to list to be processed.  E gets bit 1
+ * and F gets bit 2.  The list becomes:
+ *
+ *     list=F(2) E(1), result=empty
+ *
+ * Then we pop F, the newest commit, from the list.  Its flag is 2.
+ * We scan its parents, mark them reachable from the side that F is
+ * reachable from, and push them to the list:
+ *
+ *     list=E(1) D(2) A(2), result=empty
+ *
+ * Next pop E and do the same.
+ *
+ *     list=D(2) B(1) A(2), result=empty
+ *
+ * Next pop D and do the same.
+ *
+ *     list=C(2) B(1) A(2), result=empty
+ *
+ * Next pop C and do the same.
+ *
+ *     list=B(1) A(2), result=empty
+ *
+ * Now it is B's turn.  We mark its parent, C, reachable from B's side,
+ * and push it to the list:
+ *
+ *     list=C(3) A(2), result=empty
+ *
+ * Now pop C and notice it has flags==3.  It is placed on the result list,
+ * and the list now contains:
+ *
+ *     list=A(2), result=C(3)
+ *
+ * We pop A and do the same.
+ * 
+ *     list=B(3), result=C(3)
+ *
+ * Next, we pop B and something very interesting happens.  It has flags==3
+ * so it is also placed on the result list, and its parents are marked
+ * uninteresting, retroactively, and placed back on the list:
+ *
+ *    list=C(7), result=C(7) B(3)
+ * 
+ * Now, list does not have any interesting commit.  So we find the newest
+ * commit from the result list that is not marked uninteresting.  Which is
+ * commit B.
+ *
+ *
+ * Another pathological example how this thing used to fail to mark an
+ * ancestor of a merge base as UNINTERESTING before we introduced the
+ * postprocessing phase (mark_reachable_commits).
+ *
+ *		  2
+ *		  H
+ *	    1    / \
+ *	    G   A   \
+ *	    |\ /     \ 
+ *	    | B       \
+ *	    |  \       \
+ *	     \  C       F
+ *	      \  \     / 
+ *	       \  D   /   
+ *		\ |  /
+ *		 \| /
+ *		  E
+ *
+ *	 list			A B C D E F G H
+ *	 G1 H2			- - - - - - 1 2
+ *	 H2 E1 B1		- 1 - - 1 - 1 2
+ *	 F2 E1 B1 A2		2 1 - - 1 2 1 2
+ *	 E3 B1 A2		2 1 - - 3 2 1 2
+ *	 B1 A2			2 1 - - 3 2 1 2
+ *	 C1 A2			2 1 1 - 3 2 1 2
+ *	 D1 A2			2 1 1 1 3 2 1 2
+ *	 A2			2 1 1 1 3 2 1 2
+ *	 B3			2 3 1 1 3 2 1 2
+ *	 C7			2 3 7 1 3 2 1 2
+ *
+ * At this point, unfortunately, everybody in the list is
+ * uninteresting, so we fail to complete the following two
+ * steps to fully marking uninteresting commits.
+ *
+ *	 D7			2 3 7 7 3 2 1 2
+ *	 E7			2 3 7 7 7 2 1 2
+ *
+ * and we ended up showing E as an interesting merge base.
+ * The postprocessing phase re-injects C and continues traversal
+ * to contaminate D and E.
+ */
+
+static void mark_reachable_commits(struct commit_list *result,
+				   struct commit_list *list)
+{
+	struct commit_list *tmp;
+
+	/*
+	 * Postprocess to fully contaminate the well.
+	 */
+	for (tmp = result; tmp; tmp = tmp->next) {
+		struct commit *c = tmp->item;
+		/* Reinject uninteresting ones to list,
+		 * so we can scan their parents.
+		 */
+		if (c->object.flags & UNINTERESTING)
+			commit_list_insert(c, &list);
+	}
+	while (list) {
+		struct commit *c = list->item;
+		struct commit_list *parents;
+
+		tmp = list;
+		list = list->next;
+		free(tmp);
+
+		/* Anything taken out of the list is uninteresting, so
+		 * mark all its parents uninteresting.  We do not
+		 * parse new ones (we already parsed all the relevant
+		 * ones).
+		 */
+		parents = c->parents;
+		while (parents) {
+			struct commit *p = parents->item;
+			parents = parents->next;
+			if (!(p->object.flags & UNINTERESTING)) {
+				p->object.flags |= UNINTERESTING;
+				commit_list_insert(p, &list);
+			}
+		}
+	}
+}
+
+struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2)
+{
+	struct commit_list *list = NULL;
+	struct commit_list *result = NULL;
+	struct commit_list *tmp = NULL;
+
+	if (rev1 == rev2)
+		return commit_list_insert(rev1, &result);
+
+	parse_commit(rev1);
+	parse_commit(rev2);
+
+	rev1->object.flags |= PARENT1;
+	rev2->object.flags |= PARENT2;
+	insert_by_date(rev1, &list);
+	insert_by_date(rev2, &list);
+
+	while (interesting(list)) {
+		struct commit *commit = list->item;
+		struct commit_list *parents;
+		int flags = commit->object.flags
+			& (PARENT1 | PARENT2 | UNINTERESTING);
+
+		tmp = list;
+		list = list->next;
+		free(tmp);
+		if (flags == (PARENT1 | PARENT2)) {
+			insert_by_date(commit, &result);
+
+			/* Mark parents of a found merge uninteresting */
+			flags |= UNINTERESTING;
+		}
+		parents = commit->parents;
+		while (parents) {
+			struct commit *p = parents->item;
+			parents = parents->next;
+			if ((p->object.flags & flags) == flags)
+				continue;
+			parse_commit(p);
+			p->object.flags |= flags;
+			insert_by_date(p, &list);
+		}
+	}
+
+	if (!result)
+		return NULL;
+
+	if (result->next && list)
+		mark_reachable_commits(result, list);
+
+	/* cull duplicates */
+	for (tmp = result, list = NULL; tmp; ) {
+		struct commit *commit = tmp->item;
+		struct commit_list *next = tmp->next;
+		if (commit->object.flags & UNINTERESTING) {
+			if (list != NULL)
+				list->next = next;
+			free(tmp);
+		} else {
+			if (list == NULL)
+				result = tmp;
+			list = tmp;
+			commit->object.flags |= UNINTERESTING;
+		}
+
+		tmp = next;
+	}
+
+	/* reset flags */
+	clear_commit_marks(rev1, PARENT1 | PARENT2 | UNINTERESTING);
+	clear_commit_marks(rev2, PARENT1 | PARENT2 | UNINTERESTING);
+
+	return result;
+}
diff --git a/commit.h b/commit.h
index 7c9ca3f..89b9dad 100644
--- a/commit.h
+++ b/commit.h
@@ -105,4 +105,6 @@ struct commit_graft *read_graft_line(cha
 int register_commit_graft(struct commit_graft *, int);
 int read_graft_file(const char *graft_file);
 
+extern struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2);
+
 #endif /* COMMIT_H */
diff --git a/merge-base.c b/merge-base.c
index 7d87c20..b41f76c 100644
--- a/merge-base.c
+++ b/merge-base.c
@@ -2,244 +2,6 @@ #include <stdlib.h>
 #include "cache.h"
 #include "commit.h"
 
-#define PARENT1 1
-#define PARENT2 2
-#define UNINTERESTING 4
-
-static struct commit *interesting(struct commit_list *list)
-{
-	while (list) {
-		struct commit *commit = list->item;
-		list = list->next;
-		if (commit->object.flags & UNINTERESTING)
-			continue;
-		return commit;
-	}
-	return NULL;
-}
-
-/*
- * A pathological example of how this thing works.
- *
- * Suppose we had this commit graph, where chronologically
- * the timestamp on the commit are A <= B <= C <= D <= E <= F
- * and we are trying to figure out the merge base for E and F
- * commits.
- *
- *                  F
- *                 / \
- *            E   A   D
- *             \ /   /  
- *              B   /
- *               \ /
- *                C
- *
- * First we push E and F to list to be processed.  E gets bit 1
- * and F gets bit 2.  The list becomes:
- *
- *     list=F(2) E(1), result=empty
- *
- * Then we pop F, the newest commit, from the list.  Its flag is 2.
- * We scan its parents, mark them reachable from the side that F is
- * reachable from, and push them to the list:
- *
- *     list=E(1) D(2) A(2), result=empty
- *
- * Next pop E and do the same.
- *
- *     list=D(2) B(1) A(2), result=empty
- *
- * Next pop D and do the same.
- *
- *     list=C(2) B(1) A(2), result=empty
- *
- * Next pop C and do the same.
- *
- *     list=B(1) A(2), result=empty
- *
- * Now it is B's turn.  We mark its parent, C, reachable from B's side,
- * and push it to the list:
- *
- *     list=C(3) A(2), result=empty
- *
- * Now pop C and notice it has flags==3.  It is placed on the result list,
- * and the list now contains:
- *
- *     list=A(2), result=C(3)
- *
- * We pop A and do the same.
- * 
- *     list=B(3), result=C(3)
- *
- * Next, we pop B and something very interesting happens.  It has flags==3
- * so it is also placed on the result list, and its parents are marked
- * uninteresting, retroactively, and placed back on the list:
- *
- *    list=C(7), result=C(7) B(3)
- * 
- * Now, list does not have any interesting commit.  So we find the newest
- * commit from the result list that is not marked uninteresting.  Which is
- * commit B.
- *
- *
- * Another pathological example how this thing used to fail to mark an
- * ancestor of a merge base as UNINTERESTING before we introduced the
- * postprocessing phase (mark_reachable_commits).
- *
- *		  2
- *		  H
- *	    1    / \
- *	    G   A   \
- *	    |\ /     \ 
- *	    | B       \
- *	    |  \       \
- *	     \  C       F
- *	      \  \     / 
- *	       \  D   /   
- *		\ |  /
- *		 \| /
- *		  E
- *
- *	 list			A B C D E F G H
- *	 G1 H2			- - - - - - 1 2
- *	 H2 E1 B1		- 1 - - 1 - 1 2
- *	 F2 E1 B1 A2		2 1 - - 1 2 1 2
- *	 E3 B1 A2		2 1 - - 3 2 1 2
- *	 B1 A2			2 1 - - 3 2 1 2
- *	 C1 A2			2 1 1 - 3 2 1 2
- *	 D1 A2			2 1 1 1 3 2 1 2
- *	 A2			2 1 1 1 3 2 1 2
- *	 B3			2 3 1 1 3 2 1 2
- *	 C7			2 3 7 1 3 2 1 2
- *
- * At this point, unfortunately, everybody in the list is
- * uninteresting, so we fail to complete the following two
- * steps to fully marking uninteresting commits.
- *
- *	 D7			2 3 7 7 3 2 1 2
- *	 E7			2 3 7 7 7 2 1 2
- *
- * and we ended up showing E as an interesting merge base.
- * The postprocessing phase re-injects C and continues traversal
- * to contaminate D and E.
- */
-
-static void mark_reachable_commits(struct commit_list *result,
-				   struct commit_list *list)
-{
-	struct commit_list *tmp;
-
-	/*
-	 * Postprocess to fully contaminate the well.
-	 */
-	for (tmp = result; tmp; tmp = tmp->next) {
-		struct commit *c = tmp->item;
-		/* Reinject uninteresting ones to list,
-		 * so we can scan their parents.
-		 */
-		if (c->object.flags & UNINTERESTING)
-			commit_list_insert(c, &list);
-	}
-	while (list) {
-		struct commit *c = list->item;
-		struct commit_list *parents;
-
-		tmp = list;
-		list = list->next;
-		free(tmp);
-
-		/* Anything taken out of the list is uninteresting, so
-		 * mark all its parents uninteresting.  We do not
-		 * parse new ones (we already parsed all the relevant
-		 * ones).
-		 */
-		parents = c->parents;
-		while (parents) {
-			struct commit *p = parents->item;
-			parents = parents->next;
-			if (!(p->object.flags & UNINTERESTING)) {
-				p->object.flags |= UNINTERESTING;
-				commit_list_insert(p, &list);
-			}
-		}
-	}
-}
-
-struct commit_list *get_merge_bases(struct commit *rev1, struct commit *rev2)
-{
-	struct commit_list *list = NULL;
-	struct commit_list *result = NULL;
-	struct commit_list *tmp = NULL;
-
-	if (rev1 == rev2)
-		return commit_list_insert(rev1, &result);
-
-	parse_commit(rev1);
-	parse_commit(rev2);
-
-	rev1->object.flags |= PARENT1;
-	rev2->object.flags |= PARENT2;
-	insert_by_date(rev1, &list);
-	insert_by_date(rev2, &list);
-
-	while (interesting(list)) {
-		struct commit *commit = list->item;
-		struct commit_list *parents;
-		int flags = commit->object.flags
-			& (PARENT1 | PARENT2 | UNINTERESTING);
-
-		tmp = list;
-		list = list->next;
-		free(tmp);
-		if (flags == (PARENT1 | PARENT2)) {
-			insert_by_date(commit, &result);
-
-			/* Mark parents of a found merge uninteresting */
-			flags |= UNINTERESTING;
-		}
-		parents = commit->parents;
-		while (parents) {
-			struct commit *p = parents->item;
-			parents = parents->next;
-			if ((p->object.flags & flags) == flags)
-				continue;
-			parse_commit(p);
-			p->object.flags |= flags;
-			insert_by_date(p, &list);
-		}
-	}
-
-	if (!result)
-		return NULL;
-
-	if (result->next && list)
-		mark_reachable_commits(result, list);
-
-	/* cull duplicates */
-	for (tmp = result, list = NULL; tmp; ) {
-		struct commit *commit = tmp->item;
-		struct commit_list *next = tmp->next;
-		if (commit->object.flags & UNINTERESTING) {
-			if (list != NULL)
-				list->next = next;
-			free(tmp);
-		} else {
-			if (list == NULL)
-				result = tmp;
-			list = tmp;
-			commit->object.flags |= UNINTERESTING;
-		}
-
-		tmp = next;
-	}
-
-	/* reset flags */
-	clear_commit_marks(rev1, PARENT1 | PARENT2 | UNINTERESTING);
-	clear_commit_marks(rev2, PARENT1 | PARENT2 | UNINTERESTING);
-
-	return result;
-}
-
 static int show_all = 0;
 
 static int merge_base(struct commit *rev1, struct commit *rev2)
diff --git a/merge-recursive.c b/merge-recursive.c
index 9bbb426..c0de38a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -186,22 +186,15 @@ struct merge_result merge(struct node *h
 		node_list_insert(ancestor, &ca);
 	else {
 		struct node_list **pca = &ca;
-		char cmd[100];
-		sprintf(cmd, "git-merge-base --all %s %s",
-			node_hex_sha1(h1),
-			node_hex_sha1(h2));
-		FILE *fp = popen(cmd, "r");
-		while (!feof(fp)) {
-			unsigned char sha1[20];
-			int ch;
-			if (fget_sha1(sha1, fp, &ch) == 0) {
-				struct node *n;
-				n = node_alloc(lookup_commit(sha1));
-				node_list_insert(n, pca);
-				pca = &(*pca)->next;
-			}
+		struct commit_list *iter, *merge_bases
+			= get_merge_bases(h1->commit, h2->commit);
+
+		for (iter = merge_bases; iter; iter = iter->next) {
+			struct node *n = node_alloc(iter->item);
+			node_list_insert(n, pca);
+			pca = &(*pca)->next;
 		}
-		pclose(fp);
+		free_commit_list(merge_bases);
 	}
 
 	output("found %u common ancestor(s):", node_list_count(ca));
-- 
1.4.1.rc1.g87c00-dirty

^ permalink raw reply related

* Re: [PATCH] move get_merge_bases() to core lib; use it in merge-recursive
From: Johannes Schindelin @ 2006-06-29 13:21 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <Pine.LNX.4.63.0606291517010.29667@wbgn013.biozentrum.uni-wuerzburg.de>

Hi,

Aargh! Of course this is [PATCH 2/2]. BTW, no signoff, since the whole 
merge-recursive is not mergeable yet (passes the tests, but we have a 
small way to go).

Ciao,
Dscho

^ permalink raw reply

* [RFC/PATCH] autoconf: Use autoconf to check for some types and library functions
From: Jakub Narebski @ 2006-06-29 13:36 UTC (permalink / raw)
  To: git
In-Reply-To: <200606291359.43640.jnareb@gmail.com>

./configure script checks now for existence of the following types
and structure members:
 * dirent.d_ino  in dirent.h (NO_D_INO_IN_DIRENT)
 * dirent.d_type in dirent.h (NO_D_TYPE_IN_DIRENT)
 * 'struct sockaddr_storage' in netinet/in.h (NO_SOCKADDR_STORAGE)

./configure script checks now for the following library functions:
 * strcasestr (NO_STRCASESTR)
 * strlcpy (NO_STRLCPY)
 * setenv (NO_SETENV)
in default C library and in libraries which have AC_CHECK_LIB done for
them (crypto, curl, expat).

NOTE: not all checks are implemented!

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---

This patch needs review by someone better versed in compiling git on
different platforms, namely AC_CHECK_MEMBER and AC_CHECK_TYPE needs
checking if all header files where git search for specified structure
member or specified type.

I don't know (yet) how to implement checking for NEEDS_SSL_WITH_CRYPTO
(probably also checlking for crypto library needs correction),
NEEDS_LIBICONV, NEEDS_SOCKET, NO_IPV6, NO_ICONV, Python < 2.3 and
Python == 2.3, if to check for NO_MMAP, and for NO_ACCURATE_DIFF.

This patch is to be considered preliminary!

 configure.ac |   28 ++++++++++++++++++++++++++--
 1 files changed, 26 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index 55d7a9b..fbd46e2 100644
--- a/configure.ac
+++ b/configure.ac
@@ -14,17 +14,41 @@ AC_CONFIG_SRCDIR([git.c])
 AC_DEFUN([MY_APPEND_LINE],
 [[echo "$1" >> config.mak.append]])# AC_APPEND_LINE
 
+
 # Checks for libraries.
-AC_MSG_NOTICE(CHECKS for libraries)
+AC_MSG_NOTICE([CHECKS for libraries])
 AC_CHECK_LIB([crypto], [SHA1_Init],,MY_APPEND_LINE(NO_OPENSSL=YesPlease))
 AC_CHECK_LIB([curl], [curl_easy_setopt],,MY_APPEND_LINE(NO_CURL=YesPlease))
 AC_CHECK_LIB([expat], [XML_ParserCreate],,MY_APPEND_LINE(NO_EXPAT=YesPlease))
 
+
+# Checks for typedefs, structures, and compiler characteristics.
+AC_MSG_NOTICE([CHECKS for typedefs, structures, and compiler characteristics])
+
+AC_CHECK_MEMBER(struct dirent.d_ino,,
+MY_APPEND_LINE(NO_D_INO_IN_DIRENT=YesPlease),
+[#include <dirent.h>])
+AC_CHECK_MEMBER(struct dirent.d_type,,
+MY_APPEND_LINE(NO_D_TYPE_IN_DIRENT=YesPlease),
+[#include <dirent.h>])
+
+AC_CHECK_TYPE(struct sockaddr_storage,,
+MY_APPEND_LINE(NO_SOCKADDR_STORAGE=YesPlease),
+[#include <netinet/in.h>])
+
+
+# Checks for library functions.
+AC_MSG_NOTICE([CHECKS for library functions])
+AC_CHECK_FUNC(strcasestr,,MY_APPEND_LINE(NO_STRCASESTR=YesPlease))
+AC_CHECK_FUNC(strlcpy,,MY_APPEND_LINE(NO_STRLCPY=YesPlease))
+AC_CHECK_FUNC(setenv,,MY_APPEND_LINE(NO_SETENV=YesPlease))
+
+
 # Output files
 AC_CONFIG_FILES([config.mak:config.mak.in:config.mak.append], 
 [rm -f config.mak.append], 
-- 
1.4.0

^ permalink raw reply related

* Re: [PATCH] autoconf: Use autoconf to write installation directories to config.mak
From: Jakub Narebski @ 2006-06-29 13:48 UTC (permalink / raw)
  To: git
In-Reply-To: <E1FvvuX-0002Lr-Nt@moooo.ath.cx>

Matthias Lederhofer wrote:

>> This is beginning of patch series introducing installation configuration
>> using autoconf (and no other autotools) to git. The idea is to generate
>> config.mak using ./configure (generated from configure.ac) from
>> config.mak.in, so one can use autoconf as an _alternative_ to ordinary
>> Makefile, and creating one's own config.mak.
> 
> Are you sure this should be named config.mak? From INSTALL:
>> You can place local settings in config.mak and the Makefile
>> will include them.  Note that config.mak is not distributed;
>> the name is reserved for local settings.
> 
> So with another filename either include it
> - before config.mak: the user may override ./configure options with
>   config.mak
> - after config.mak: ./configure overrides config.mak

The idea was to use ./configure to _generate_ config.mak, which the user can
then edit.

But perhaps using another filename for results of ./configure script 
(and including it in main Makefile) would be better idea.

> At least do not overwrite config.mak if it exists.

But one might want to run ./configure with different options, to finally
arrive at the set which is satisfactionary. So unless some magic to detect
if config.mak was generated from ./configure script, or generated by user
is used...

-- 
Jakub Narebski
Warsaw, Poland
ShadeHawk on #git

^ permalink raw reply

* Re: [PATCH] move get_merge_bases() to core lib; use it in merge-recursive
From: Alex Riesen @ 2006-06-29 14:12 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <Pine.LNX.4.63.0606291519440.29667@wbgn013.biozentrum.uni-wuerzburg.de>

On 6/29/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> Aargh! Of course this is [PATCH 2/2]. BTW, no signoff, since the whole
> merge-recursive is not mergeable yet (passes the tests, but we have a
> small way to go).

How did you get it to pass the tests? Maybe you still have git-merge-recursive
as default merge strategy?

^ permalink raw reply

* Re: [PATCH] move get_merge_bases() to core lib; use it in merge-recursive
From: Alex Riesen @ 2006-06-29 14:13 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <Pine.LNX.4.63.0606291517010.29667@wbgn013.biozentrum.uni-wuerzburg.de>

On 6/29/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> most of this patch is just a "sub-file rename", i.e. moving code
> literally (sue me, SCO!) from merge-base.c to commit.c.
>

Aah, thanks! Will try it later today.

^ permalink raw reply

* Re: [PATCH] move get_merge_bases() to core lib; use it in merge-recursive
From: Alex Riesen @ 2006-06-29 14:14 UTC (permalink / raw)
  To: Johannes Schindelin; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <Pine.LNX.4.63.0606291517010.29667@wbgn013.biozentrum.uni-wuerzburg.de>

On 6/29/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
>
> most of this patch is just a "sub-file rename", i.e. moving code
> literally (sue me, SCO!) from merge-base.c to commit.c.
>

BTW, you probably want to post merge-recursive.c changes separately.

^ permalink raw reply

* [PATCH] autoconf: Cleanup generation of config.mak.append by ./configure
From: Jakub Narebski @ 2006-06-29 15:04 UTC (permalink / raw)
  To: git
In-Reply-To: <200606291536.18667.jnareb@gmail.com>

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 configure.ac |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/configure.ac b/configure.ac
index fbd46e2..f48307c 100644
--- a/configure.ac
+++ b/configure.ac
@@ -6,12 +6,14 @@ AC_INIT([git], [1.4.0], [git@vger.kernel
 
 AC_CONFIG_SRCDIR([git.c])
 
+echo "# config.mak.append.  Generated by configure." >> config.mak.append
+
 # Definitions of macros
 # MY_APPEND_LINE(LINE)
 # --------------------
 # Append LINE to file config.mak.append
 AC_DEFUN([MY_APPEND_LINE],
-[[echo "$1" >> config.mak.append]])# AC_APPEND_LINE
+[[echo "$1" >> config.mak.append]])# MY_APPEND_LINE
 
 
 # Checks for libraries.
@@ -45,6 +47,5 @@ AC_CHECK_FUNC(setenv,,MY_APPEND_LINE(NO_
 
 # Output files
 AC_CONFIG_FILES([config.mak:config.mak.in:config.mak.append], 
-[rm -f config.mak.append], 
-[echo >> config.mak.append])
+[rm -f config.mak.append])
 AC_OUTPUT
-- 
1.4.0

^ permalink raw reply related

* Re: [RFC] Cache negative delta pairs
From: Nicolas Pitre @ 2006-06-29 15:42 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git
In-Reply-To: <7v4py4y7wo.fsf@assigned-by-dhcp.cox.net>

On Wed, 28 Jun 2006, Junio C Hamano wrote:

> Interesting idea.  I think this matters more because for a
> repository with many unrelated undeltifiable files, we do the
> computation for objects that results in _no_ delta.  For normal
> nearly fully packed repositories, once an object is deltified
> against something else, subsequent repacking of the same set of
> objects (or a superset thereof) will very likely reuse the delta
> without recomputation, so as long as each object _can_ be
> deltified with at least _one_ other object, you should not see
> improvement on them.
> 
> So I am curious where the speed-up comes from for "normal" repos
> in your experiments.

My GIT repo currently has 23622 objects and 7610 of them are currently 
undeltified.  Those objects are of course candidates for delta matching 
each time git-repack is run.

> If it turns out that in "normal" repos the
> objects that hit your negative cache are stored undeltified,
> then that suggests that it might be worthwhile to consider using
> a cache of "inherently undeltifiable objects", In other words, a
> negative cache of O(N) entries, instead of O(N^2) entries,

Actually... I'm not so sure.  Those objects are not "inherently 
undeltifiable".  They just happen to not have other objects to easily 
delta against in the given set of objects.  Think of a file with only 
one revision for example.  As soon as there is a second revision of that 
file added to the set of objects then the former revision would have a 
high probability of being deltifiable.

So the negative cache should not be O(N^2) either.  It just has to be 
O(N*window).

> Another interpretation of your result is that we may be using a
> delta window that is unnecessarily too deep, and your negative
> cache is collecting less optimum candidates that we attempt to
> deltify against "just in case".  Can you easily instrument your
> code to see where in the sorted delta candidate list the pairs
> that hit your the negative cache are?  That is, in find_deltas()
> function, we have "while (--j > 0)" loop that attempts to delta
> with the entry that is j (modulo window size) entries away from
> the current one, then j-1, j-2, ...; I am interested in the
> distribution of "j" value for the pair "n,m" that hits your
> negative cache for normal repositories, and I am speculating
> that the value would probably be small relative to the delta
> window size.

My past experiments showed that the best window size for compression is 
a bit larger than the current default of 10.  It was rather around 15 
for the kernel repository, with higher values than 15 not providing 
significant improvements anymore.  But that is clearly a function of the 
repository nature (the average number of revisions for each file).  But 
the window size is directly connected to the computational cost.

> Another idea is to have a cache of "paths at which inherently
> undeltifiable objects live in".  For example, we currently do
> not delta OpenOffice documents (*.odt, *.odp, etc) very well.
> If one has a repository that tracks the history of "file.odp",
> we know each revision of "file.odp" would not delta against any
> other version anyway, and could skip attempting to deltify them.

I'm afraid this could lead to bad behavior eventually.  Better to just 
attempt a delta once, and when an object has not found any delta 
base candidate then just write its sha1 and corresponding window to the 
cache.  This would imply an initial cost to create the cache the first 
time, but after that the created cache could be relied upon as hard 
information and not just as guess heuristics.

> >  - size. The cache is a packed sequence of binary sha1 pairs. I was
> >    concerned that it would grow too large (obviously for n blobs you can
> >    end up with n^2/2 entries), but it doesn't seem unreasonable for most
> >    repos (either you don't have a lot of files, or if you do, they delta
> >    reasonably well). My test repo's cache is only 144K. The git cache is
> >    about 2.7M. The linux-2.6 cache is 22M.

This is way suboptimal.  First there is no reason for the cache to ever 
grow to N^2.  At worst it should be N*10 where 10 is the current window 
size.

Next I think this can be made just N*2.  Consider that the criteria for 
skipping over delta matching for a given object is the fact 
that we already know that such object doesn't delta against 
none of the objects found in a given window.  Therefore we only have to 
compute a hash for the object names found in that window and store that 
in the cache.  So the cache entries would then be a pair of sha1: first 
the sha1 of the victim object, and the sha1 of all sha1 names for the 
objects against which the victim object was found not to delta well 
against.

And this can be pushed even further by just including the sha1 of the 
victim object inside the list of objects therefore computing a hash of 
all objects (the victim and the window) for which no delta results. The 
cache is therefore a list of hash values corresponding to bad 
victim+window combinations.

So given my GIT repository such a cache would be 7610 * 40 = 304400 
bytes if we stick to the full 40 bytes of sha1 to hash bad combinations.


Nicolas

^ permalink raw reply

* Re: [PATCH] move get_merge_bases() to core lib; use it in merge-recursive
From: Johannes Schindelin @ 2006-06-29 16:14 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <81b0412b0606290712h4960ee8et7ea219d4dd6428b4@mail.gmail.com>

Hi,

On Thu, 29 Jun 2006, Alex Riesen wrote:

> On 6/29/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > Aargh! Of course this is [PATCH 2/2]. BTW, no signoff, since the whole
> > merge-recursive is not mergeable yet (passes the tests, but we have a
> > small way to go).
> 
> How did you get it to pass the tests? Maybe you still have git-merge-recursive
> as default merge strategy?

Darn. Yes.

Sorry for the noise.

Ciao,
Dscho

^ permalink raw reply

* Re: [PATCH] move get_merge_bases() to core lib; use it in merge-recursive
From: Johannes Schindelin @ 2006-06-29 16:14 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git, Junio C Hamano, Fredrik Kuivinen, Linus Torvalds
In-Reply-To: <81b0412b0606290714v66a32976j531e2077ce6c1d77@mail.gmail.com>

Hi,

On Thu, 29 Jun 2006, Alex Riesen wrote:

> On 6/29/06, Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote:
> > 
> > most of this patch is just a "sub-file rename", i.e. moving code
> > literally (sue me, SCO!) from merge-base.c to commit.c.
> > 
> 
> BTW, you probably want to post merge-recursive.c changes separately.

My point being: it makes no sense to split off get_merge_bases() if nobody 
uses it except for git-merge-base.

Ciao,
Dscho

^ permalink raw reply

* Re: [RFC] Cache negative delta pairs
From: Nicolas Pitre @ 2006-06-29 16:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Jeff King, git
In-Reply-To: <Pine.LNX.4.64.0606291053280.1213@localhost.localdomain>

On Thu, 29 Jun 2006, Nicolas Pitre wrote:

> And this can be pushed even further by just including the sha1 of the 
> victim object inside the list of objects therefore computing a hash of 
> all objects (the victim and the window) for which no delta results. The 
> cache is therefore a list of hash values corresponding to bad 
> victim+window combinations.
> 
> So given my GIT repository such a cache would be 7610 * 40 = 304400 
> bytes if we stick to the full 40 bytes of sha1 to hash bad combinations.

Correction: the 40 bytes figure is for _ascii_ representation of sha1 
values.  The cache doesn't need ascii and therefore this number can be 
reduced by half.


Nicolas

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox