Git development

Git development
 help / color / mirror / Atom feed

* Re: Improved three-way blob merging code
From: Alex Riesen @ 2006-06-29  7:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Junio C Hamano, Git Mailing List, Davide Libenzi
In-Reply-To: <Pine.LNX.4.64.0606282157210.12404@g5.osdl.org>

On 6/29/06, Linus Torvalds <torvalds@osdl.org> wrote:
> +static void *three_way_filemerge(mmfile_t *base, mmfile_t *our, mmfile_t *their, unsigned long *size)
> +{
...
> +       if (t1 && t2 && t3) {
> +               int code = run_command("merge", t2, t1, t3, NULL);

This does not use the labels of merge(1) and the merged file will contain
the names of temporary files at conflict places, which is very confusing if
you happen to loose context while doing a merge with lots of conflicts.

^ permalink raw reply

* Re: [PATCH] autoconf: Use autoconf to write installation directories to config.mak
From: Uwe Zeisberger @ 2006-06-29  7:18 UTC (permalink / raw)
  To: Jakub Narebski; +Cc: git
In-Reply-To: <200606290301.51657.jnareb@gmail.com>

Jakub Narebski wrote:
> This is beginning of patch series introducing installation configuration
> using autoconf (and no other autotools) to git. The idea is to generate
> config.mak using ./configure (generated from configure.ac) from
> config.mak.in, so one can use autoconf as an _alternative_ to ordinary
> Makefile, and creating one's own config.mak.
> 
> This patch includes minimal configure.ac and config.mak.in, so one 
> can set installation directories using ./configure script
> e.g. ./configure --prefix=/usr

autoconf does to much things, even with that little configure.ac.  (But
I agree, it's much better than automake :-)

E.g.

	./configure --prefix=$HOME/usr --mandir=$HOME/usr/share/man

is supported by the configure script, but the manpages are installed in
$HOME/usr/man all the same.

BTW: Even if I specify mandir=... in config.mak, it is not respected,
because only the toplevel Makefile includes config.mak.  (I didn't test
it, but I think I could export mandir in config.mak ...)

Best regards
Uwe

-- 
Uwe Zeisberger

http://www.google.com/search?q=1+hertz+in+sec**-1

^ permalink raw reply

* Re: Problem with GITK
From: Uwe Zeisberger @ 2006-06-29  7:06 UTC (permalink / raw)
  To: Paolo Ciarrocchi; +Cc: Git Mailing List
In-Reply-To: <4d8e3fd30606281340n147821e2hcbd4ccaf9551173f@mail.gmail.com>

Hi Paolo,

Paolo Ciarrocchi wrote:
> there is a strange orange line in the following screenshot from gitk:
> http://paolo.ciarrocchi.googlepages.com/Screenshotgit.png
> 
> paolo@Italia:~/linux-2.6$ git version
> git version 1.4.1.rc1.g47e5

Maybe

	http://www.gelato.unsw.edu.au/archives/git/0604/19509.html

can help you!?

Best regards
Uwe

-- 
Uwe Zeisberger

exit vi, lesson IV:
Z Z

NB: may write current file

^ permalink raw reply

* What's in git.git
From: Junio C Hamano @ 2006-06-29  6:41 UTC (permalink / raw)
  To: git

Hopefully 'master' which I tagged and pushed out as v1.4.1-rc2
is very close to the final.  I would like to merge format-patch
fixes by Johannes, see nobody yells at me for a few days, and
call it 1.4.1 this weekend.

Thanks to Pavel Roskin and Marco Roeland, the Perly Git series
by Pasky should be in much better shape now.  Timo's diff output
format options updates is looking better and I am gaining
confidence in it, especially after I've done small testsuite and
fixed things here and there.  I would want a bit more tests to
cover wider combinations to avoid regression but basically it
looks ready.  These two series are still in "pu" and I'd like to
pull them into "next" after 1.4.1.

A late comer is Linus's merge-tree/merge-files work.  I'm
parking it on "pu" but haven't read it through yet.

----------------------------------------------------------------

* The 'master' branch has these since the last announcement.

   Andreas Ericsson:
      git wrapper: fix command name in an error message.

   Dennis Stosberg:
      Solaris needs inclusion of signal.h for signal()
      Fix pkt-line.h to compile with a non-GCC compiler
      Fix expr usage for FreeBSD

   Eric Wong:
      rebase: allow --merge option to handle patches merged upstream
      rebase: cleanup rebasing with --merge
      rebase: allow --skip to work with --merge
      git-svn: SVN 1.1.x library compatibility
      git-svn: several graft-branches improvements
      git-svn: add the commit-diff command
      git-svn: add --follow-parent and --no-metadata options to fetch
      git-svn: be verbose by default on fetch/commit, add -q/--quiet option
      rebase: get rid of outdated MRESOLVEMSG
      rebase: check for errors from git-commit

   Jeff King:
      quote.c: silence compiler warnings from EMIT macro

   Johannes Schindelin:
      Teach diff about -b and -w flags
      cvsimport: always set $ENV{GIT_INDEX_FILE} to $index{$branch}
      Save errno in handle_alias()

   Junio C Hamano:
      git-merge --squash
      diff --color: use $GIT_DIR/config
      combine-diff.c: type sanity
      connect.c: remove unused parameters from tcp_connect and proxy_connect
      connect.c: check the commit buffer boundary while parsing.
      t/README: start testing porcelainish
      checkout -m: fix read-tree invocation

   Martin Langhoff:
      cvsimport: setup indexes correctly for ancestors and incremental imports
      cvsimport - cleanup of the multi-indexes handling

   Matthias Lederhofer:
      correct documentation for git grep

   Timo Hirvonen:
      Make some strings const


* The 'next' branch, in addition, has these.

   Johannes Schindelin:
      add diff_flush_patch_id() to calculate the patch id
      format-patch: introduce "--ignore-if-in-upstream"
      t4014: fix for whitespace from "wc -l"
      format-patch: use clear_commit_marks() instead of some ad-hockery

   Junio C Hamano:
      Makefile: add framework to verify and bench sha1 implementations.
      test-sha1: test hashing large buffer
      git-repack: Be careful when updating the same pack as an existing one.
      t4013: add tests for diff/log family output options.
      t4014: add format-patch --ignore-if-in-upstream test
      t4013: add more tests around -c and --cc
      t4014: fix test commit labels.
      diff.c: fix get_patch_id()

   Linus Torvalds:
      xdiff: generate "anti-diffs" aka what is common to two files


* The 'pu' branch, in addition, has these.

   Dennis Stosberg:
      "test" in Solaris' /bin/sh does not support -e
      Makefile fix for Solaris
      Add possibility to pass CFLAGS and LDFLAGS specific to the perl subdir

   Junio C Hamano:
      Fix some more diff options changes.
      t4013 test updates for new output code.
      combine-diff.c: type sanity.
      Perl interface: add build-time configuration to allow building with -fPIC
      Perl interface: make testsuite work again.
      perl: fix make clean
      Git.pm: tentative fix to test the freshly built Git.pm
      Perly Git: arrange include path settings properly.
      Makefile: Set USE_PIC on x86-64

   Linus Torvalds:
      Prepare "git-merge-tree" for future work
      Improved three-way blob merging code

   Matthias Lederhofer:
      GIT_TRACE: show which built-in/external commands are executed

   Petr Baudis:
      Introduce Git.pm (v4)
      Git.pm: Implement Git::exec_path()
      Git.pm: Call external commands using execv_git_cmd()
      Git.pm: Implement Git::version()
      Add Error.pm to the distribution
      Git.pm: Better error handling
      Git.pm: Handle failed commands' output
      Git.pm: Enhance the command_pipe() mechanism
      Git.pm: Implement options for the command interface
      Git.pm: Add support for subdirectories inside of working copies
      Convert git-mv to use Git.pm
      Git.pm: assorted build related fixes.
      Git.pm: Try to support ActiveState output pipe
      Git.pm: Swap hash_object() parameters
      Git.pm: Fix Git->repository("/somewhere/totally/elsewhere")
      Git.pm: Support for perl/ being built by a different compiler

   Timo Hirvonen:
      Merge with_raw, with_stat and summary variables to output_format
      Make --raw option available for all diff commands
      Set default diff output format after parsing command line
      DIFF_FORMAT_RAW is not default anymore
      Add msg_sep to diff_options
      Don't xcalloc() struct diffstat_t
      whatchanged: Default to DIFF_FORMAT_RAW
      Print empty line between raw, stat, summary and patch
      diff-tree: Use ---\n as a message separator
      log --raw: Don't descend into subdirectories by default
      Fix diff-tree -s
      --name-only, --name-status, --check and -s are mutually exclusive
      Remove awkward compatibility warts
      Fix a mixed declarations and code warning

   Unknown:
      A better-scheduled PPC SHA-1 implementation.

^ permalink raw reply

* [ANNOUNCE] GIT 1.4.1-rc2
From: Junio C Hamano @ 2006-06-29  6:40 UTC (permalink / raw)
  To: git; +Cc: linux-kernel

The tip of "master" branch has been tagged as GIT 1.4.1-rc2.
Many fixes since v1.4.1-rc1 was done, but most notable are:

 - git-cvsimport to manage multiple branches are (hopefully)
   fixed now (Martin and Johannes).

 - git-rebase learned how to use 3-way merge backends by
   "git-rebase --merge" (Eric Wong).

 - git-svn updates (Eric Wong).

 - "git-commit -m" breakage was fixed (me).

Hopefully we can merge the format-patch fixes from Johannes,
which is still cooking in "next", and do the real 1.4.1 release
this weekend.

^ permalink raw reply

* Re: xdiff: generate "anti-diffs" aka what is common to two files
From: Junio C Hamano @ 2006-06-29  6:04 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.64.0606282149060.12404@g5.osdl.org>

Linus Torvalds <torvalds@osdl.org> writes:

> Junio: the patch that actually _uses_ this comes next. It's based on my 
> previous "throw-away" example patches, if you want me to base it on 
> something else, please holler.

Nah, I appreciate that you did it this way, because it gave me a
test case to "git-am" your two patches into a temporary branch
and "pull --squash" that into lt/merge-tree branch ;-).

	git am ./+LT.mbox
        ... two patches, applied with --whitespace=strip ...
	git tag -f CG
        git reset --hard HEAD^^
        git pull --squash . tag CG
        git commig -C CG
        git tag -d CG

^ permalink raw reply

* Improved three-way blob merging code
From: Linus Torvalds @ 2006-06-29  5:06 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List; +Cc: Davide Libenzi


This fleshes out the code that generates a three-way merge of a set of 
blobs.

It still actually does the three-way merge using an external executable 
(ie just calling "merge"), but the interfaces have been cleaned up a lot 
and are now fully based on the 'mmfile_t' interface, so if libxdiff were 
to ever grow a compatible three-way-merge, it could probably be directly
plugged in.

It also uses the previous XDL_EMIT_COMMON functionality extension to 
libxdiff to generate a made-up base file for the merge for the case where 
no base file previously existed. This should be equivalent to what we 
currently do in git-merge-one-file.sh:

	diff -u -La/$orig -Lb/$orig $orig $src2 | git-apply --no-add

except it should be much simpler and can be done using the direct libxdiff 
interfaces.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

It would be lovely if libxdiff did a 3-way merge on its own, but this 
basically approximates it within that three_way_filemerge() function 
using external functionality.

The caller is, of course, still expected to do all the other proper magic 
(ie do things like resolve renames, mode differences, and all the trivial 
cases), and the example merge-tree.c code I posted earlier is obviously 
_not_ careful enough to be a real merge routine.

But we all have to start somewhere.


diff --git a/merge-file.c b/merge-file.c
index 4fedd6b..f32c653 100644
--- a/merge-file.c
+++ b/merge-file.c
@@ -1,19 +1,19 @@
 #include "cache.h"
 #include "run-command.h"
+#include "xdiff-interface.h"
 #include "blob.h"
 
 static void rm_temp_file(const char *filename)
 {
 	unlink(filename);
+	free((void *)filename);
 }
 
-static const char *write_temp_file(struct blob *blob)
+static const char *write_temp_file(mmfile_t *f)
 {
 	int fd;
 	const char *tmp = getenv("TMPDIR");
-	unsigned long size;
-	char *filename, type[20];
-	void *buf;
+	char *filename;
 
 	if (!tmp)
 		tmp = "/tmp";
@@ -22,13 +22,9 @@ static const char *write_temp_file(struc
 	if (fd < 0)
 		return NULL;
 	filename = strdup(filename);
-	buf = read_sha1_file(blob->object.sha1, type, &size);
-	if (buf) {
-		if (size != xwrite(fd, buf, size)) {
-			rm_temp_file(filename);
-			return NULL;
-		}
-		free(buf);
+	if (f->size != xwrite(fd, f->ptr, f->size)) {
+		rm_temp_file(filename);
+		return NULL;
 	}
 	close(fd);
 	return filename;
@@ -53,55 +49,118 @@ static void *read_temp_file(const char *
 	return buf;
 }
 
-void *merge_file(struct blob *base, struct blob *our, struct blob *their, unsigned long *size)
+static int fill_mmfile_blob(mmfile_t *f, struct blob *obj)
 {
-	const char *t1, *t2, *t3;
+	void *buf;
+	unsigned long size;
 	char type[20];
 
-	if (base) {
-		int code;
-		void *res;
-
-		/*
-		 * Removed in either branch?
-		 *
-		 * NOTE! This depends on the caller having done the
-		 * proper warning about removing a file that got
-		 * modified in the other branch!
-		 */
-		if (!our || !their)
-			return NULL;
-		t1 = write_temp_file(base);
-		t2 = write_temp_file(our);
-		t3 = write_temp_file(their);
-		res = NULL;
-		if (t1 && t2 && t3) {
-			code = run_command("merge", t2, t1, t3, NULL);
-			if (!code || code == -1)
-				res = read_temp_file(t2, size);
-		}
-		rm_temp_file(t1);
-		rm_temp_file(t2);
-		rm_temp_file(t3);
-		return res;
+	buf = read_sha1_file(obj->object.sha1, type, &size);
+	if (!buf)
+		return -1;
+	if (strcmp(type, blob_type))
+		return -1;
+	f->ptr = buf;
+	f->size = size;
+	return 0;
+}
+
+static void free_mmfile(mmfile_t *f)
+{
+	free(f->ptr);
+}
+
+static void *three_way_filemerge(mmfile_t *base, mmfile_t *our, mmfile_t *their, unsigned long *size)
+{
+	void *res;
+	const char *t1, *t2, *t3;
+
+	t1 = write_temp_file(base);
+	t2 = write_temp_file(our);
+	t3 = write_temp_file(their);
+	res = NULL;
+	if (t1 && t2 && t3) {
+		int code = run_command("merge", t2, t1, t3, NULL);
+		if (!code || code == -1)
+			res = read_temp_file(t2, size);
 	}
+	rm_temp_file(t1);
+	rm_temp_file(t2);
+	rm_temp_file(t3);
+	return res;
+}
 
-	if (!our)
-		return read_sha1_file(their->object.sha1, type, size);
-	if (!their)
-		return read_sha1_file(our->object.sha1, type, size);
+static int common_outf(void *priv_, mmbuffer_t *mb, int nbuf)
+{
+	int i;
+	mmfile_t *dst = priv_;
+
+	for (i = 0; i < nbuf; i++) {
+		memcpy(dst->ptr + dst->size, mb[i].ptr, mb[i].size);
+		dst->size += mb[i].size;
+	}
+	return 0;
+}
+
+static int generate_common_file(mmfile_t *res, mmfile_t *f1, mmfile_t *f2)
+{
+	unsigned long size = f1->size < f2->size ? f1->size : f2->size;
+	void *ptr = xmalloc(size);
+	xpparam_t xpp;
+	xdemitconf_t xecfg;
+	xdemitcb_t ecb;
+
+	xpp.flags = XDF_NEED_MINIMAL;
+	xecfg.ctxlen = 3;
+	xecfg.flags = XDL_EMIT_COMMON;
+	ecb.outf = common_outf;
+
+	res->ptr = ptr;
+	res->size = 0;
+
+	ecb.priv = res;
+	return xdl_diff(f1, f2, &xpp, &xecfg, &ecb);
+}
+
+void *merge_file(struct blob *base, struct blob *our, struct blob *their, unsigned long *size)
+{
+	void *res = NULL;
+	mmfile_t f1, f2, common;
 
 	/*
-	 * Added in both!
-	 *
-	 * This is nasty.
-	 *
-	 * We should do something like 
+	 * Removed in either branch?
 	 *
-	 *	git diff our their | git-apply --no-add
-	 *
-	 * to generate a "minimal common file" to be used
-	 * as a base. Right now we punt.
+	 * NOTE! This depends on the caller having done the
+	 * proper warning about removing a file that got
+	 * modified in the other branch!
 	 */
-	return NULL;
+	if (!our || !their) {
+		char type[20];
+		if (base)
+			return NULL;
+		if (!our)
+			our = their;
+		return read_sha1_file(our->object.sha1, type, size);
+	}
+
+	if (fill_mmfile_blob(&f1, our) < 0)
+		goto out_no_mmfile;
+	if (fill_mmfile_blob(&f2, their) < 0)
+		goto out_free_f1;
+
+	if (base) {
+		if (fill_mmfile_blob(&common, base) < 0)
+			goto out_free_f2_f1;
+	} else {
+		if (generate_common_file(&common, &f1, &f2) < 0)
+			goto out_free_f2_f1;
+	}
+	res = three_way_filemerge(&common, &f1, &f2, size);
+	free_mmfile(&common);
+out_free_f2_f1:
+	free_mmfile(&f2);
+out_free_f1:
+	free_mmfile(&f1);
+out_no_mmfile:
+	return res;
 }

^ permalink raw reply related

* xdiff: generate "anti-diffs" aka what is common to two files
From: Linus Torvalds @ 2006-06-29  4:57 UTC (permalink / raw)
  To: Davide Libenzi, Junio C Hamano, Git Mailing List

This fairly trivial patch adds a new XDL_EMIT_xxx flag to tell libxdiff 
that we don't want to generate the _diff_ between two files, we want to 
see the lines that are _common_ to two files.

So when you set XDL_EMIT_COMMON, xdl_diff() will do everything exactly 
like it used to do, but the output records it generates just contain the 
lines that aren't part of the diff.

This is for doing things like generating the common base case for a file 
that was added in both branches.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
---

Davide, I say it's a "fairly trivial patch", but quite frankly, I'm 
winging it. I'm not that comfortable with the libxdiff internal workings, 
so while this seems to work for me and make sense, can you please take a 
quick look.

Junio: the patch that actually _uses_ this comes next. It's based on my 
previous "throw-away" example patches, if you want me to base it on 
something else, please holler.

diff --git a/xdiff/xdiff.h b/xdiff/xdiff.h
index 2ce10b4..c9f8178 100644
--- a/xdiff/xdiff.h
+++ b/xdiff/xdiff.h
@@ -39,6 +39,7 @@ #define XDL_PATCH_MODEMASK ((1 << 8) - 1
 #define XDL_PATCH_IGNOREBSPACE (1 << 8)

 #define XDL_EMIT_FUNCNAMES (1 << 0)
+#define XDL_EMIT_COMMON (1 << 1)

 #define XDL_MMB_READONLY (1 << 0)

diff --git a/xdiff/xemit.c b/xdiff/xemit.c
index ad5bfb1..3604ca1 100644
--- a/xdiff/xemit.c
+++ b/xdiff/xemit.c
@@ -100,6 +100,21 @@ static void xdl_find_func(xdfile_t *xf, 
 }

+int xdl_emit_common(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
+		    xdemitconf_t const *xecfg) {
+	xdfile_t *xdf = &xe->xdf1;
+	const char *rchg = xdf->rchg;
+	long ix;
+
+	for (ix = 0; ix < xdf->nrec; ix++) {
+		if (rchg[ix])
+			continue;
+		if (xdl_emit_record(xdf, ix, "", ecb))
+			return -1;
+	}
+	return 0;
+}		
+
 int xdl_emit_diff(xdfenv_t *xe, xdchange_t *xscr, xdemitcb_t *ecb,
 		  xdemitconf_t const *xecfg) {
 	long s1, s2, e1, e2, lctx;
@@ -107,6 +122,9 @@ int xdl_emit_diff(xdfenv_t *xe, xdchange
 	char funcbuf[40];
 	long funclen = 0;

+	if (xecfg->flags & XDL_EMIT_COMMON)
+		return xdl_emit_common(xe, xscr, ecb, xecfg);
+
 	for (xch = xche = xscr; xch; xch = xche->next) {
 		xche = xdl_get_hunk(xch, xecfg);

^ permalink raw reply related

* Re: [RFC] Cache negative delta pairs
From: Jeff King @ 2006-06-29  4:30 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <20060629035849.GA30749@coredump.intra.peff.net>

On Wed, Jun 28, 2006 at 11:58:49PM -0400, Jeff King wrote:

>    about 2.7M. The linux-2.6 cache is 22M.
>    [...]
>  - correctness. Currently the code uses the output of try_delta for
>    negative caching. Should the cache checking be moved inside try_delta
>    instead? This would give more control over which reasons to mark a

I tried moving the cache check/mark to wrap the call to create_delta in
try_delta. The speedup is the same (since all of the CPU time is spent
in create_delta anyway), and the linux-2.6 cache size dropped to 18M.
I also think this is probably more correct (it ignores every reason not
to delta EXCEPT create_delta failing).

The distribution of the window parameter 'j' is similarly uniform:
  44900 j=2
  44907 j=3
  44943 j=1
  45001 j=4
  45014 j=5
  45023 j=6
  45063 j=7
  45158 j=8
  45288 j=9
  45466 j=10

Patch on top of my other one is below.

-Peff

-- >8 --
pack-objects: move delta-cache check/mark closer to create_delta

Signed-off-by: Jeff King <peff@peff.net>
---
 pack-objects.c |   15 ++++++---------
 1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/pack-objects.c b/pack-objects.c
index 46b9775..83ccc8a 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -1014,9 +1014,13 @@ static int try_delta(struct unpacked *tr
 	if (sizediff >= max_size)
 		return 0;
 
+	if (delta_cache_check(src->entry->sha1, trg->entry->sha1))
+		return 0;
 	delta_buf = create_delta(src_index, trg->data, size, &delta_size, max_size);
-	if (!delta_buf)
+	if (!delta_buf) {
+		delta_cache_mark(src->entry->sha1, trg->entry->sha1);
 		return 0;
+	}
 
 	trg_entry->delta = src_entry;
 	trg_entry->delta_size = delta_size;
@@ -1084,21 +1088,14 @@ static void find_deltas(struct object_en
 		j = window;
 		while (--j > 0) {
 			unsigned int other_idx = idx + j;
-			int r;
 			struct unpacked *m;
 			if (other_idx >= window)
 				other_idx -= window;
 			m = array + other_idx;
 			if (!m->entry)
 				break;
-			if (delta_cache_check(n->entry->sha1, m->entry->sha1))
-				continue;
-			r = try_delta(n, m, m->index, depth);
-			if (r < 0)
+			if (try_delta(n, m, m->index, depth) < 0)
 				break;
-			if (r == 0)
-				delta_cache_mark(n->entry->sha1,
-						 m->entry->sha1);
 		}
 		/* if we made n a delta, and if n is already at max
 		 * depth, leaving it in the window is pointless.  we
-- 
1.4.1.rc1.g0458-dirty

^ permalink raw reply related

* Re: [RFC] Cache negative delta pairs
From: Jeff King @ 2006-06-29  4:09 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4py4y7wo.fsf@assigned-by-dhcp.cox.net>

On Wed, Jun 28, 2006 at 08:09:43PM -0700, Junio C Hamano wrote:

> that hit your the negative cache are?  That is, in find_deltas()
> function, we have "while (--j > 0)" loop that attempts to delta
> with the entry that is j (modulo window size) entries away from
> the current one, then j-1, j-2, ...; I am interested in the
> distribution of "j" value for the pair "n,m" that hits your
> negative cache for normal repositories, and I am speculating
> that the value would probably be small relative to the delta
> window size.

Just to make sure I am understanding you correctly, you're interested in
the distribution of 'j' each time we skip a delta for being negative
(or each time we mark a negative, but that should simply equal the
lookup times for the next run). The instrumentation I added simply
prints the value of j each time we skip a delta. The counts are
surprisingly uniform (this is for linux-2.6):
  57209 j=1
  57213 j=2
  57217 j=3
  57221 j=4
  57225 j=5
  57229 j=6
  57233 j=7
  57237 j=8
  57241 j=9
  57245 j=10
They're so uniform (and in order by j!) that I feel like I must have done
something wrong...

-Peff

^ permalink raw reply

* [RFC] Cache negative delta pairs
From: Jeff King @ 2006-06-29  3:58 UTC (permalink / raw)
  To: git
In-Reply-To: <7v4py4y7wo.fsf@assigned-by-dhcp.cox.net>

>From repack to repack, we end up trying to delta many of the same object
pairs, which is computationally expensive.  This patch makes a
persistent cache, $GIT_DIR/delta-cache, which contains pairs of sha1
hashes (the presence of a pair indicates that it's not worth trying to
delta those two objects).

Signed-off-by: Jeff King <peff@peff.net>
---

[This is a repost, since the original got caught by the list filter.]

I found this especially to be a problem with repos that consist of many
large, unrelated files (e.g., photos). For example, on my test repo
(about 300 unrelated 1-2M jpgs), a 'git-repack -a' takes about 10
minutes to complete. With the delta cache, subsequent repacks take only
13 seconds. Results are not quite as dramatic for "normal" repos, but
there is still some speedup. Repacking a fully packed linux-2.6 repo
went from 1m12s to 36s. Repacking the git repo goes from 5.6s to 3.0s.

Here are some of my thoughts:

 - speed. The implementation is quite fast. The sha1 pairs are stored
   sorted, and we mmap and binary search them. Certainly the extra time
   spent in lookup is justified by avoiding the delta attempts.

 - size. The cache is a packed sequence of binary sha1 pairs. I was
   concerned that it would grow too large (obviously for n blobs you can
   end up with n^2/2 entries), but it doesn't seem unreasonable for most
   repos (either you don't have a lot of files, or if you do, they delta
   reasonably well). My test repo's cache is only 144K. The git cache is
   about 2.7M. The linux-2.6 cache is 22M.

   Theoretically, I could bound the cache size and boot old entries.
   However, that means storing age information, which increases the
   required size. I think keeping it simple is best.

 - correctness. Currently the code uses the output of try_delta for
   negative caching. Should the cache checking be moved inside try_delta
   instead? This would give more control over which reasons to mark a
   delta negative (I believe right now hitting the depth limit will
   cause a negative mark; we should perhaps only do so if the content
   itself makes the delta unpalatable).
 
 - optionalness. Currently the delta-cache is always used. Since it is a
   space-time tradeoff, maybe it should be optional (it will have
   negligible performance and horrible size impact on a repo that
   consists of many very small but unrelated objects).  Possible methods
   include:
     - enable cache saves only if .git/delta-cache is present; turn it
       on initially with 'touch .git/delta-cache'
     - config variable

 Makefile       |    4 +-
 delta-cache.c  |  119 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 delta-cache.h  |   11 +++++
 pack-objects.c |   11 +++++
 4 files changed, 142 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index cde619c..39c4308 100644
--- a/Makefile
+++ b/Makefile
@@ -202,7 +202,7 @@ LIB_H = \
 	blob.h cache.h commit.h csum-file.h delta.h \
 	diff.h object.h pack.h pkt-line.h quote.h refs.h \
 	run-command.h strbuf.h tag.h tree.h git-compat-util.h revision.h \
-	tree-walk.h log-tree.h dir.h
+	tree-walk.h log-tree.h dir.h delta-cache.h
 
 DIFF_OBJS = \
 	diff.o diff-lib.o diffcore-break.o diffcore-order.o \
@@ -217,7 +217,7 @@ LIB_OBJS = \
 	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
 	tag.o tree.o usage.o config.o environment.o ctype.o copy.o \
 	fetch-clone.o revision.o pager.o tree-walk.o xdiff-interface.o \
-	alloc.o $(DIFF_OBJS)
+	alloc.o delta-cache.o $(DIFF_OBJS)
 
 BUILTIN_OBJS = \
 	builtin-log.o builtin-help.o builtin-count.o builtin-diff.o builtin-push.o \
diff --git a/delta-cache.c b/delta-cache.c
new file mode 100644
index 0000000..d132867
--- /dev/null
+++ b/delta-cache.c
@@ -0,0 +1,119 @@
+#include "delta-cache.h"
+#include "cache.h"
+
+static const unsigned char* disk_cache = 0;
+static unsigned disk_cache_len = 0;
+static unsigned char* mem_cache = 0;
+static unsigned mem_cache_len = 0;
+static unsigned mem_cache_alloc = 0;
+
+#define GETCACHE(c, n) (c+(40*n))
+
+static void disk_cache_init()
+{
+	static int done = 0;
+	int fd;
+	struct stat st;
+
+	if (done) return;
+	done = 1;
+
+	fd = open(git_path("delta-cache"), O_RDONLY);
+	if (fd < 0)
+		return;
+	if (fstat(fd, &st) || (st.st_size == 0)) {
+		close(fd);
+		return;
+	}
+	disk_cache = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
+	close(fd);
+	if (disk_cache != MAP_FAILED)
+		disk_cache_len = st.st_size / 40;
+}
+
+static int
+check_cache(const unsigned char* c, unsigned len, const unsigned char s[40])
+{
+	int left, right, mid, cmp;
+
+	left = 0;
+	right = len - 1;
+	while (left <= right) {
+		mid = left + (right - left) / 2;
+		cmp = memcmp(s, GETCACHE(c, mid), 40);
+		if(cmp == 0) return 1;
+		else if(cmp <  0) right = mid - 1;
+		else left = mid + 1;
+	}
+	return 0;
+}
+
+extern int
+delta_cache_check(const unsigned char a[20], const unsigned char b[20])
+{
+	unsigned char search[40];
+
+	disk_cache_init();
+	memcpy(search, a, 20);
+	memcpy(search+20, b, 20);
+	return check_cache(disk_cache, disk_cache_len, search);
+}
+
+extern void
+delta_cache_mark(const unsigned char a[20], const unsigned char b[20])
+{
+	if (mem_cache_len == mem_cache_alloc) {
+		mem_cache_alloc = mem_cache_alloc ? mem_cache_alloc * 2 : 16;
+		mem_cache = xrealloc(mem_cache, mem_cache_alloc * 40);
+	}
+	memcpy(GETCACHE(mem_cache, mem_cache_len), a, 20);
+	memcpy(GETCACHE(mem_cache, mem_cache_len)+20, b, 20);
+	mem_cache_len++;
+}
+
+static int
+compare_sha1pair(const void *a, const void *b)
+{
+	return memcmp(a, b, 40);
+}
+
+static int
+merge_write(int fd, const unsigned char* p1, unsigned n1,
+		    const unsigned char* p2, unsigned n2)
+{
+#define EMIT(p, x) do { \
+	if (xwrite(fd, GETCACHE(p, x), 40) < 0) return -1; x++; \
+	} while(0)
+
+	int i = 0, j = 0, cmp;
+	while (i < n1 && j < n2) {
+		cmp = memcmp(GETCACHE(p1, i), GETCACHE(p2, j), 40);
+		if (cmp < 0) EMIT(p1, i);
+		else EMIT(p2, j);
+	}
+	while (i < n1) EMIT(p1, i);
+	while (j < n2) EMIT(p2, j);
+#undef EMIT
+	return 0;
+}
+
+extern void
+delta_cache_save(void)
+{
+	int fd;
+	char tmpfile[PATH_MAX];
+
+	strcpy(tmpfile, git_path("delta-cache.%u", getpid()));
+	fd = open(tmpfile, O_WRONLY|O_EXCL|O_CREAT, 0666);
+	if (fd < 0)
+		return;
+
+	qsort(mem_cache, mem_cache_len, 40, compare_sha1pair);
+	if (merge_write(fd, mem_cache, mem_cache_len,
+			    disk_cache, disk_cache_len) < 0) {
+		close(fd);
+		return;
+	}
+
+	rename(tmpfile, git_path("delta-cache"));
+}
diff --git a/delta-cache.h b/delta-cache.h
new file mode 100644
index 0000000..19201be
--- /dev/null
+++ b/delta-cache.h
@@ -0,0 +1,11 @@
+#ifndef DELTA_CACHE_H
+#define DELTA_CACHE_H
+
+extern int
+delta_cache_check(const unsigned char a[20], const unsigned char b[20]);
+extern void
+delta_cache_mark(const unsigned char a[20], const unsigned char b[20]);
+extern void
+delta_cache_save(void);
+
+#endif /* DELTA_CACHE_H */
diff --git a/pack-objects.c b/pack-objects.c
index bed2497..46b9775 100644
--- a/pack-objects.c
+++ b/pack-objects.c
@@ -8,6 +8,7 @@ #include "delta.h"
 #include "pack.h"
 #include "csum-file.h"
 #include "tree-walk.h"
+#include "delta-cache.h"
 #include <sys/time.h>
 #include <signal.h>
 
@@ -1083,14 +1084,21 @@ static void find_deltas(struct object_en
 		j = window;
 		while (--j > 0) {
 			unsigned int other_idx = idx + j;
+			int r;
 			struct unpacked *m;
 			if (other_idx >= window)
 				other_idx -= window;
 			m = array + other_idx;
 			if (!m->entry)
 				break;
-			if (try_delta(n, m, m->index, depth) < 0)
+			if (delta_cache_check(n->entry->sha1, m->entry->sha1))
+				continue;
+			r = try_delta(n, m, m->index, depth);
+			if (r < 0)
 				break;
+			if (r == 0)
+				delta_cache_mark(n->entry->sha1,
+						 m->entry->sha1);
 		}
 		/* if we made n a delta, and if n is already at max
 		 * depth, leaving it in the window is pointless.  we
@@ -1342,5 +1350,6 @@ int main(int argc, char **argv)
 	if (progress)
 		fprintf(stderr, "Total %d, written %d (delta %d), reused %d (delta %d)\n",
 			nr_result, written, written_delta, reused, reused_delta);
+	delta_cache_save();
 	return 0;
 }
-- 
1.4.1.rc1.g3000

^ permalink raw reply related

* Re: [RFC] Cache negative delta pairs
From: Jeff King @ 2006-06-29  3:50 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7v4py4y7wo.fsf@assigned-by-dhcp.cox.net>

On Wed, Jun 28, 2006 at 08:09:43PM -0700, Junio C Hamano wrote:

> Interesting idea.  I think this matters more because for a
> repository with many unrelated undeltifiable files, we do the
> computation for objects that results in _no_ delta.  For normal

Yes, precisely.

> So I am curious where the speed-up comes from for "normal" repos
> in your experiments.  If it turns out that in "normal" repos the

I assumed they were either:
  - blobs of files with only a single revision
  - blobs with one "island" revision which is so different from previous
    revisions that it isn't worth a delta
You can get a (very rough) estimate on the former:
  $ cd linux-2.6 && git-rev-list --objects --all \
     | grep / | cut -d' ' -f2 | sort | uniq -u | wc -l
  8364

As I said, we may also be catching possible deltas at the edge of the
pack depth. I should maybe put the cache check closer to the
create_delta call.

> objects that hit your negative cache are stored undeltified,
> then that suggests that it might be worthwhile to consider using
> a cache of "inherently undeltifiable objects", In other words, a
> negative cache of O(N) entries, instead of O(N^2) entries,

That would certainly be preferable, though I'm not convinced there
aren't objects which are deltifiable, but not right now (e.g., a file
with only one revision, but which will later get a revision).

I'm not sure what would make a file inherently undeltifiable. But you
could put the O(N) cache in front of the other cache, and reduce the
size of N for the O(N^2) cache. 

> deltify against "just in case".  Can you easily instrument your
> code to see where in the sorted delta candidate list the pairs
> that hit your the negative cache are?  That is, in find_deltas()

I'll look into that...

> Another idea is to have a cache of "paths at which inherently
> undeltifiable objects live in".  For example, we currently do
> not delta OpenOffice documents (*.odt, *.odp, etc) very well.
> If one has a repository that tracks the history of "file.odp",
> we know each revision of "file.odp" would not delta against any
> other version anyway, and could skip attempting to deltify them.

I thought about that, but I was trying to avoid "clever" heuristics that
can often be wrong. Case in point: my repo consists mainly of jpegs,
most of which will not delta. But for a few, I edited the exif header
and they delta'd nicely against their previous revision.

> Your message contained string "*pt-in" in the commentary part
> (replace asterisk with lowercase o) and was discarded by vger
> mailing list software because that was a taboo word.  If you

Oops, I didn't know we were filtering. I'll resend in a moment.

> >    reasonably well). My test repo's cache is only 144K. The git cache is
> >    about 2.7M. The linux-2.6 cache is 22M.
> The fully-packed object database is 6.2M pack so you are talking
> about 40% bloat; the kernel is 115M so the overhead is 19%.

Yes, obviously if you're interested in maximal space saving, this is
stupid for the classic git repo case (though it makes sense for repos
with few but very large blobs). However, if you assume that:
  1. Packing is inherently space-saving and more-or-less required on
     large repos like linux-2.6 (which is 1.8G unpacked and 115M packed)
  2. It saves time (36 seconds per repack on linux-2.6)
then a more reasonable question is "do you want to spend 22M to save 30
seconds every time you repack -a?". Or, to really cook the numbers, you
can say that the cache is only wasting 1% versus an unpacked repo. :)

I think it's reasonable that this should be an optional feature. For
some repos, it turns them from almost unusable to very fast. For others,
it's a questionable space-time tradeoff (and for some pathological
cases, it would probably turn the repo from usable to unusable).

-Peff

^ permalink raw reply

* Re: [RFC] Cache negative delta pairs
From: Junio C Hamano @ 2006-06-29  3:09 UTC (permalink / raw)
  To: Jeff King; +Cc: git
In-Reply-To: <20060628223744.GA24421@coredump.intra.peff.net>

Jeff King <peff@peff.net> writes:

> From repack to repack, we end up trying to delta many of the same object
> pairs, which is computationally expensive.
>...
> I found this especially to be a problem with repos that consist of many
> large, unrelated files (e.g., photos). For example, on my test repo
> (about 300 unrelated 1-2M jpgs), a 'git-repack -a' takes about 10
> minutes to complete. With the delta cache, subsequent repacks take only
> 13 seconds. Results are not quite as dramatic for "normal" repos, but
> there is still some speedup. Repacking a fully packed linux-2.6 repo
> went from 1m12s to 36s. Repacking the git repo goes from 5.6s to 3.0s.

Interesting idea.  I think this matters more because for a
repository with many unrelated undeltifiable files, we do the
computation for objects that results in _no_ delta.  For normal
nearly fully packed repositories, once an object is deltified
against something else, subsequent repacking of the same set of
objects (or a superset thereof) will very likely reuse the delta
without recomputation, so as long as each object _can_ be
deltified with at least _one_ other object, you should not see
improvement on them.

So I am curious where the speed-up comes from for "normal" repos
in your experiments.  If it turns out that in "normal" repos the
objects that hit your negative cache are stored undeltified,
then that suggests that it might be worthwhile to consider using
a cache of "inherently undeltifiable objects", In other words, a
negative cache of O(N) entries, instead of O(N^2) entries,

Another interpretation of your result is that we may be using a
delta window that is unnecessarily too deep, and your negative
cache is collecting less optimum candidates that we attempt to
deltify against "just in case".  Can you easily instrument your
code to see where in the sorted delta candidate list the pairs
that hit your the negative cache are?  That is, in find_deltas()
function, we have "while (--j > 0)" loop that attempts to delta
with the entry that is j (modulo window size) entries away from
the current one, then j-1, j-2, ...; I am interested in the
distribution of "j" value for the pair "n,m" that hits your
negative cache for normal repositories, and I am speculating
that the value would probably be small relative to the delta
window size.

Another idea is to have a cache of "paths at which inherently
undeltifiable objects live in".  For example, we currently do
not delta OpenOffice documents (*.odt, *.odp, etc) very well.
If one has a repository that tracks the history of "file.odp",
we know each revision of "file.odp" would not delta against any
other version anyway, and could skip attempting to deltify them.

Your message contained string "*pt-in" in the commentary part
(replace asterisk with lowercase o) and was discarded by vger
mailing list software because that was a taboo word.  If you
would want to pursue this I would suggest to resend your
original patch after rephrasing that part.

>  - size. The cache is a packed sequence of binary sha1 pairs. I was
>    concerned that it would grow too large (obviously for n blobs you can
>    end up with n^2/2 entries), but it doesn't seem unreasonable for most
>    repos (either you don't have a lot of files, or if you do, they delta
>    reasonably well). My test repo's cache is only 144K. The git cache is
>    about 2.7M. The linux-2.6 cache is 22M.

The fully-packed object database is 6.2M pack so you are talking
about 40% bloat; the kernel is 115M so the overhead is 19%.

^ permalink raw reply

* Re: CFT: merge-recursive in C (updated)
From: Junio C Hamano @ 2006-06-29  2:41 UTC (permalink / raw)
  To: Alex Riesen; +Cc: git
In-Reply-To: <20060629002547.GA27507@steel.home>

fork0@t-online.de (Alex Riesen) writes:

> this broke t6022-merge-rename.sh (the second test). It produces an
> index with this:
>
> .../t/trash$ git-diff-index white
> :100644 100644 2d603156dc5bdf6295c789cac08e3c9942a0b82a 0000000000000000000000000000000000000000 M      B
> :100644 100644 ba41fb96393979b22691106b06bf5231eab57b85 0000000000000000000000000000000000000000 M      N
>
> whereas git-merge-recursive (and the previous version, without pipe):
>
> .../t/trash$ git-diff-index white
> :100644 100644 2d603156dc5bdf6295c789cac08e3c9942a0b82a 0000000000000000000000000000000000000000 M      B
>
> I can see that "git update-index --add" is somehow different from a
> pipe to "git update-index --index-info", but not very clear. Does this
> "zero-sha1" mean that the file "N" is not in the index?

When diff-index and diff-files compare a tree entry or an index
entry with a file in the working tree, they do not compute the
blob hash value for the file in the working tree.  0{40} is used
on the RHS in such a case.  When the working tree file matches
the corresponding index entry, then we know RHS matches what is
in the index, so both sides have the blob hash value.

^ permalink raw reply

* [PATCH] autoconf: Use autoconf to write installation directories to config.mak
From: Jakub Narebski @ 2006-06-29  1:01 UTC (permalink / raw)
  To: git

This is beginning of patch series introducing installation configuration
using autoconf (and no other autotools) to git. The idea is to generate
config.mak using ./configure (generated from configure.ac) from
config.mak.in, so one can use autoconf as an _alternative_ to ordinary
Makefile, and creating one's own config.mak.

This patch includes minimal configure.ac and config.mak.in, so one 
can set installation directories using ./configure script
e.g. ./configure --prefix=/usr

Ignoring files generated by running autoconf and ./configure

Signed-off-by: Jakub Narebski <jnareb@gmail.com>

---

One of the ideas is to use in .spec RPM macro %configure which takes
care of setting all installation directories correctly.

Thoughts?

 .gitignore    |    4 ++++
 config.mak.in |   12 ++++++++++++
 configure.ac  |   11 +++++++++++
 3 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/.gitignore b/.gitignore
index 7b954d5..b0dd54d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -136,3 +136,7 @@ git-core.spec
 *.py[co]
 config.mak
 git-blame
+autom4te.cache
+config.log
+config.status
+configure
diff --git a/config.mak.in b/config.mak.in
new file mode 100644
index 0000000..82d80e2
--- /dev/null
+++ b/config.mak.in
@@ -0,0 +1,12 @@
+# git Makefile configuration, included in main Makefile
+# @configure_input@
+
+prefix = @prefix@
+exec_prefix = @exec_prefix@
+bindir = @bindir@
+#gitexecdir = @libexecdir@/git-core/
+template_dir = @datadir@/git-core/templates/
+GIT_PYTHON_DIR = @datadir@/git-core/python
+
+srcdir = @srcdir@
+VPATH = @srcdir@
diff --git a/configure.ac b/configure.ac
new file mode 100644
index 0000000..4003ff6
--- /dev/null
+++ b/configure.ac
@@ -0,0 +1,11 @@
+#                                               -*- Autoconf -*-
+# Process this file with autoconf to produce a configure script.
+
+AC_PREREQ(2.59)
+AC_INIT([git], [1.4.0], [git@vger.kernel.org])
+
+AC_CONFIG_SRCDIR([git.c])
+
+# Output files
+AC_CONFIG_FILES([config.mak])
+AC_OUTPUT
-- 
1.4.0

^ permalink raw reply related

* Re: CFT: merge-recursive in C
From: Christopher Faylor @ 2006-06-29  0:49 UTC (permalink / raw)
  To: Alex Riesen, git
In-Reply-To: <20060629003837.GB27507@steel.home>

On Thu, Jun 29, 2006 at 02:38:37AM +0200, Alex Riesen wrote:
>Christopher Faylor, Wed, Jun 28, 2006 17:06:47 +0200:
>>>It still uses some calls to git programs (git-update-index,
>>>git-hash-object, git-diff-tree and git-write-tree), and merge(1) has
>>>the labels (-L) missing - I was unsure how to tackle this on windows -
>>>it has only argv[1].
>>
>>Actually, Windows should behave the same as Linux wrt argv handling.
>>You can use argv[1] ...  argv[n] modulo any differences in command line
>>quoting.
>
>which leaves us (without quoting) with exactly one argument.  argv[1],
>aka GetCommandLine.
>
>>On Windows the arguments are broken into individual components by the
>>runtime, e.g., MSVCRT.dll or Cygwin1.dll.
>
>And the rules for quoting are the same for ms and cygwin?

Probably not but I was the one who raised the issue of quoting, not you.
Quoting is irrelevant to the general assertion that there is only an
argv[1] on Windows.  If that is the behavior that you're seeing then
something is wrong in the way the program is being invoked.

Maybe I'm missing something here and you're talking about some specific
case in git.  It's hard to see how anyone could make this assertion
otherwise.

cgf

^ permalink raw reply

* Re: [patch] Make git-svn init accept a target dir
From: Luca Barbato @ 2006-06-29  0:41 UTC (permalink / raw)
  To: Eric Wong; +Cc: git
In-Reply-To: <20060629002852.GA29147@hand.yhbt.net>

[-- Attachment #1: Type: text/plain, Size: 199 bytes --]

Eric Wong wrote:

> 
> Sounds useful and I'll probably accept it, but I don't see the actual
> patch, though...
> 

ops...

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero


[-- Attachment #2: git-svn-init.patch --]
[-- Type: text/plain, Size: 1042 bytes --]

--- /usr/bin/git-svn	2006-06-29 00:57:26.000000000 +0200
+++ /usr/bin/git-svn.new	2006-06-28 03:40:11.000000000 +0200
@@ -5,7 +5,7 @@
 use strict;
 use vars qw/	$AUTHOR $VERSION
 		$SVN_URL $SVN_INFO $SVN_WC $SVN_UUID
-		$GIT_SVN_INDEX $GIT_SVN
+		$GIT_SVN_INDEX $GIT_SVN $REPO_PATH
 		$GIT_DIR $REV_DIR/;
 $AUTHOR = 'Eric Wong <normalperson@yhbt.net>';
 $VERSION = '1.1.0-pre';
@@ -98,6 +98,7 @@
 $GIT_SVN ||= $ENV{GIT_SVN_ID} || 'git-svn';
 $GIT_SVN_INDEX = "$GIT_DIR/$GIT_SVN/index";
 $SVN_URL = undef;
+$REPO_PATH = undef;
 $REV_DIR = "$GIT_DIR/$GIT_SVN/revs";
 $SVN_WC = "$GIT_DIR/$GIT_SVN/tree";
 
@@ -227,6 +228,16 @@
 sub init {
 	$SVN_URL = shift or die "SVN repository location required " .
 				"as a command-line argument\n";
+        $REPO_PATH = shift;
+        
+        if ($REPO_PATH) {
+            unless (-d $REPO_PATH) {
+                mkpath([$REPO_PATH]);
+            }
+            $GIT_DIR=$REPO_PATH . "/.git";
+            $ENV{GIT_DIR}=$GIT_DIR;
+        }
+
 	unless (-d $GIT_DIR) {
 		sys('git-init-db');
 	}

^ permalink raw reply

* Re: CFT: merge-recursive in C
From: Alex Riesen @ 2006-06-29  0:38 UTC (permalink / raw)
  To: Christopher Faylor; +Cc: git
In-Reply-To: <20060628150647.GA16935@trixie.casa.cgf.cx>

Christopher Faylor, Wed, Jun 28, 2006 17:06:47 +0200:
> >It still uses some calls to git programs (git-update-index,
> >git-hash-object, git-diff-tree and git-write-tree), and merge(1) has
> >the labels (-L) missing - I was unsure how to tackle this on windows -
> >it has only argv[1].
> 
> Actually, Windows should behave the same as Linux wrt argv handling.
> You can use argv[1] ... argv[n] modulo any differences in command line
> quoting.

which leaves us (without quoting) with exactly one argument. argv[1],
aka GetCommandLine.

> On Windows the arguments are broken into individual components by the
> runtime, e.g., MSVCRT.dll or Cygwin1.dll.

And the rules for quoting are the same for ms and cygwin? It's just
passing arguments between cygwin programs and windows natives never
works as one might them expect. Try passing "^" to a batch script (to
a perl script with cmd wrapper around it).

^ permalink raw reply

* Re: [patch] Make git-svn init accept a target dir
From: Eric Wong @ 2006-06-29  0:28 UTC (permalink / raw)
  To: Luca Barbato; +Cc: git
In-Reply-To: <44A30BAD.60907@gentoo.org>

Luca Barbato <lu_zero@gentoo.org> wrote:
> Since I'm lazy I just hacked a bit git-svn in order to create a target
> dir and init it if is passed as second parameter.
> 
> git-svn init url://to/the/repo local-repo
> 
> will create the local-repo dir if doesn't exist yet and populate it as
> expected.
> 
> Maybe someone else could find it useful

Sounds useful and I'll probably accept it, but I don't see the actual
patch, though...

-- 
Eric Wong

^ permalink raw reply

* Re: CFT: merge-recursive in C (updated)
From: Alex Riesen @ 2006-06-29  0:25 UTC (permalink / raw)
  To: git; +Cc: Johannes Schindelin, Junio C Hamano, Fredrik Kuivinen,
	Linus Torvalds
In-Reply-To: <81b0412b0606280234x7d07fbbck7887b5214d98bf91@mail.gmail.com>

Alex Riesen, Wed, Jun 28, 2006 11:34:23 +0200:
> >> > - use a pipe to "git-update-index --index-info" instead of
> >> > using command line
> 
> ...and to take it a step further, a patch (0002) to avoid too many calls to
> git-write-tree and to git-update-index. Brought merge times on my test
> monster (~25k files) down to 2min 30sec (from something around 11 min).

this broke t6022-merge-rename.sh (the second test). It produces an
index with this:

.../t/trash$ git-diff-index white
:100644 100644 2d603156dc5bdf6295c789cac08e3c9942a0b82a 0000000000000000000000000000000000000000 M      B
:100644 100644 ba41fb96393979b22691106b06bf5231eab57b85 0000000000000000000000000000000000000000 M      N

whereas git-merge-recursive (and the previous version, without pipe):

.../t/trash$ git-diff-index white
:100644 100644 2d603156dc5bdf6295c789cac08e3c9942a0b82a 0000000000000000000000000000000000000000 M      B

I can see that "git update-index --add" is somehow different from a
pipe to "git update-index --index-info", but not very clear. Does this
"zero-sha1" mean that the file "N" is not in the index?

^ permalink raw reply

* [patch] Make git-svn init accept a target dir
From: Luca Barbato @ 2006-06-28 23:07 UTC (permalink / raw)
  To: git

Since I'm lazy I just hacked a bit git-svn in order to create a target
dir and init it if is passed as second parameter.

git-svn init url://to/the/repo local-repo

will create the local-repo dir if doesn't exist yet and populate it as
expected.

Maybe someone else could find it useful

lu

-- 

Luca Barbato

Gentoo/linux Gentoo/PPC
http://dev.gentoo.org/~lu_zero

^ permalink raw reply

* Re: [PATCH] Makefile: set USE_PIC on Linux x86_64 for linking with Git.pm
From: Junio C Hamano @ 2006-06-28 21:24 UTC (permalink / raw)
  To: Marco Roeland; +Cc: git, pasky
In-Reply-To: <20060628190814.GC5713@fiberbit.xs4all.nl>

Marco Roeland <marco.roeland@xs4all.nl> writes:

> Even for Linux someone mentioned that probably i386 is the exception in
> _not_ needing the -fPIC linkage. It might even be specific to the Perl
> "xs" implementation specifics?

USE_PIC is for pleasing Perly git and nothing else right now.

> So I should have added "Works for me (TM)"! ;-)

That would have been more explicit way to tell me that this is a
partial solution and I should solicit help from people on other
platforms.

By the way, I had an impression that compiling things with -fPIC
when not necessary was generally a bad idea from performance
point of view.  If that is the case we might want to compile,
under USE_PIC, everything with -fPIC in a separate area to
compile and link with Git.xs, without affecting the C-only core
code.

I suspect this would largely depend on the architecture.  I ran
git-fsck-objects compiled with and without -fPIC (after "make
clean" to rebuild everything) on a fully packed copy of the
linux-2.6 repository on my x86_64 box, and did not see
meaningful differences:

: gitster; /usr/bin/time ../git.junio/git-fsck-objects-no-pic --full
109.71user 5.01system 1:54.89elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (14major+1834967minor)pagefaults 0swaps
: gitster; /usr/bin/time ../git.junio/git-fsck-objects-with-pic --full
109.05user 4.97system 1:54.08elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1834981minor)pagefaults 0swaps
: gitster;

^ permalink raw reply

* Re: Prepare "git-merge-tree" for future work
From: Linus Torvalds @ 2006-06-28 21:09 UTC (permalink / raw)
  To: Junio C Hamano, Git Mailing List
In-Reply-To: <Pine.LNX.4.64.0606281054470.3782@g5.osdl.org>



On Wed, 28 Jun 2006, Linus Torvalds wrote:
> 
> This changes how "git-merge-tree" works in two ways:

Ok, and here is a totally bogus (but hopefully instructive) patch on how 
to actually use this to prepare a diff.

I punted on trying to use the proper git diff interfaces (they are very 
tightly tied into the "diff_filespec" model - Junio, it might be nice if 
there was some way to use them in a setting where that isn't necessarily 
as natural). So the "diff" is actually just the raw output from xlib, but 
that wasn't really the point of this. 

It was more meant to do part of the the the "git-merge-one-file" logic, 
and use that logic together with the git-merge-tree logic to do a "merge" 
to diff against.

As an example, with this you can now do

	git-merge-tree $(git-merge-base HEAD next) HEAD next

to generate a "kind of" diff between the current HEAD and what would 
happen if it got merged with "next".

What (a properly done version of) this would be actually useful for is 
when you want to see what the merge results would be. For example, that's 
an integral part of sending a "please pull" request: telling the receiver 
what the diffstat should be after the pull (even though we don't _locally_ 
want to actually execute that merge - we expect the remote side to do so).

Anybody want to actually make this useful for something like that?

		Linus

----
diff --git a/Makefile b/Makefile
index cde619c..e880457 100644
--- a/Makefile
+++ b/Makefile
@@ -217,7 +217,7 @@ LIB_OBJS = \
 	server-info.o setup.o sha1_file.o sha1_name.o strbuf.o \
 	tag.o tree.o usage.o config.o environment.o ctype.o copy.o \
 	fetch-clone.o revision.o pager.o tree-walk.o xdiff-interface.o \
-	alloc.o $(DIFF_OBJS)
+	alloc.o merge-file.o $(DIFF_OBJS)
 
 BUILTIN_OBJS = \
 	builtin-log.o builtin-help.o builtin-count.o builtin-diff.o builtin-push.o \
diff --git a/merge-file.c b/merge-file.c
new file mode 100644
index 0000000..4fedd6b
--- /dev/null
+++ b/merge-file.c
@@ -0,0 +1,107 @@
+#include "cache.h"
+#include "run-command.h"
+#include "blob.h"
+
+static void rm_temp_file(const char *filename)
+{
+	unlink(filename);
+}
+
+static const char *write_temp_file(struct blob *blob)
+{
+	int fd;
+	const char *tmp = getenv("TMPDIR");
+	unsigned long size;
+	char *filename, type[20];
+	void *buf;
+
+	if (!tmp)
+		tmp = "/tmp";
+	filename = mkpath("%s/%s", tmp, "git-tmp-XXXXXX");
+	fd = mkstemp(filename);
+	if (fd < 0)
+		return NULL;
+	filename = strdup(filename);
+	buf = read_sha1_file(blob->object.sha1, type, &size);
+	if (buf) {
+		if (size != xwrite(fd, buf, size)) {
+			rm_temp_file(filename);
+			return NULL;
+		}
+		free(buf);
+	}
+	close(fd);
+	return filename;
+}
+
+static void *read_temp_file(const char *filename, unsigned long *size)
+{
+	struct stat st;
+	char *buf = NULL;
+	int fd = open(filename, O_RDONLY);
+	if (fd < 0)
+		return NULL;
+	if (!fstat(fd, &st)) {
+		*size = st.st_size;
+		buf = xmalloc(st.st_size);
+		if (st.st_size != xread(fd, buf, st.st_size)) {
+			free(buf);
+			buf = NULL;
+		}
+	}
+	close(fd);
+	return buf;
+}
+
+void *merge_file(struct blob *base, struct blob *our, struct blob *their, unsigned long *size)
+{
+	const char *t1, *t2, *t3;
+	char type[20];
+
+	if (base) {
+		int code;
+		void *res;
+
+		/*
+		 * Removed in either branch?
+		 *
+		 * NOTE! This depends on the caller having done the
+		 * proper warning about removing a file that got
+		 * modified in the other branch!
+		 */
+		if (!our || !their)
+			return NULL;
+		t1 = write_temp_file(base);
+		t2 = write_temp_file(our);
+		t3 = write_temp_file(their);
+		res = NULL;
+		if (t1 && t2 && t3) {
+			code = run_command("merge", t2, t1, t3, NULL);
+			if (!code || code == -1)
+				res = read_temp_file(t2, size);
+		}
+		rm_temp_file(t1);
+		rm_temp_file(t2);
+		rm_temp_file(t3);
+		return res;
+	}
+
+	if (!our)
+		return read_sha1_file(their->object.sha1, type, size);
+	if (!their)
+		return read_sha1_file(our->object.sha1, type, size);
+
+	/*
+	 * Added in both!
+	 *
+	 * This is nasty.
+	 *
+	 * We should do something like 
+	 *
+	 *	git diff our their | git-apply --no-add
+	 *
+	 * to generate a "minimal common file" to be used
+	 * as a base. Right now we punt.
+	 */
+	return NULL;
+}
diff --git a/merge-tree.c b/merge-tree.c
index fd0c211..7cf00be 100644
--- a/merge-tree.c
+++ b/merge-tree.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "tree-walk.h"
+#include "xdiff-interface.h"
 #include "blob.h"
 
 static const char merge_tree_usage[] = "git-merge-tree <base-tree> <branch1> <branch2>";
@@ -52,6 +53,77 @@ static const char *explanation(struct me
 	return "removed in remote";
 }
 
+extern void *merge_file(struct blob *, struct blob *, struct blob *, unsigned long *);
+
+static void *result(struct merge_list *entry, unsigned long *size)
+{
+	char type[20];
+	struct blob *base, *our, *their;
+
+	if (!entry->stage)
+		return read_sha1_file(entry->blob->object.sha1, type, size);
+	base = NULL;
+	if (entry->stage == 1) {
+		base = entry->blob;
+		entry = entry->link;
+	}
+	our = NULL;
+	if (entry && entry->stage == 2) {
+		our = entry->blob;
+		entry = entry->link;
+	}
+	their = NULL;
+	if (entry)
+		their = entry->blob;
+	return merge_file(base, our, their, size);
+}
+
+static void *origin(struct merge_list *entry, unsigned long *size)
+{
+	char type[20];
+	while (entry) {
+		if (entry->stage == 2)
+			return read_sha1_file(entry->blob->object.sha1, type, size);
+		entry = entry->link;
+	}
+	return NULL;
+}
+
+static int show_outf(void *priv_, mmbuffer_t *mb, int nbuf)
+{
+	int i;
+	for (i = 0; i < nbuf; i++)
+		printf("%.*s", (int) mb[i].size, mb[i].ptr);
+	return 0;
+}
+
+static void show_diff(struct merge_list *entry)
+{
+	unsigned long size;
+	mmfile_t src, dst;
+	xpparam_t xpp;
+	xdemitconf_t xecfg;
+	xdemitcb_t ecb;
+
+	xpp.flags = XDF_NEED_MINIMAL;
+	xecfg.ctxlen = 3;
+	xecfg.flags = 0;
+	ecb.outf = show_outf;
+	ecb.priv = NULL;
+
+	src.ptr = origin(entry, &size);
+	if (!src.ptr)
+		size = 0;
+	src.size = size;
+	dst.ptr = result(entry, &size);
+	if (!dst.ptr)
+		size = 0;
+	dst.size = size;
+	xdl_diff(&src, &dst, &xpp, &xecfg, &ecb);
+	free(src.ptr);
+	free(dst.ptr);
+}
+
 static void show_result_list(struct merge_list *entry)
 {
 	printf("%s\n", explanation(entry));
@@ -70,6 +142,7 @@ static void show_result(void)
 	walk = merge_result;
 	while (walk) {
 		show_result_list(walk);
+		show_diff(walk);
 		walk = walk->next;
 	}
 }

^ permalink raw reply related

* Re: [PATCH] Makefile: set USE_PIC on Linux x86_64 for linking with Git.pm
From: Pavel Roskin @ 2006-06-28 20:52 UTC (permalink / raw)
  To: Marco Roeland; +Cc: Junio C Hamano, git
In-Reply-To: <20060628192145.GD5713@fiberbit.xs4all.nl>

On Wed, 2006-06-28 at 21:21 +0200, Marco Roeland wrote:
> I certainly do not know cases outside Linux where this might break on
> x86-64. I just tried to limit it to the case I could test. But perhaps
> someone with an x86-64 BSD or Solaris might try it?
> 
> To paraphrase Dave Jones: I type 'make', it fails. Some 'git log' later
> I realise I have to manually define 'USE_PIC'. Hey, why doesn't it work
> automagically?

Automagically?  You should search the archives for "Autoconf".  When I
proposed using it, the hell broke loose.  Now let me indulge in
Schadenfreude :-)

I guess I'll need to argue with a working patch next time.

-- 
Regards,
Pavel Roskin

^ permalink raw reply

* Problem with GITK
From: Paolo Ciarrocchi @ 2006-06-28 20:40 UTC (permalink / raw)
  To: Git Mailing List

Hi all,
there is a strange orange line in the following screenshot from gitk:
http://paolo.ciarrocchi.googlepages.com/Screenshotgit.png

paolo@Italia:~/linux-2.6$ git version
git version 1.4.1.rc1.g47e5

Ciao,

-- 
Paolo
http://paolociarrocchi.googlepages.com
http://picasaweb.google.com/paolo.ciarrocchi

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox