Git development
 help / color / mirror / Atom feed
* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Petr Baudis @ 2005-05-18 21:33 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds
In-Reply-To: <7vzmutqz5f.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Wed, May 18, 2005 at 12:13:32AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> PB> But that won't work good enough for me. E.g. when committing in a
> PB> subdirectory, I want to commit only changes made in the subdirectory,
> PB> etc.
> 
> Assuming that you have something that lets you commit selected
> files when you are at the top level (say cg-commit), and further
> assuming that today it only works from the toplevel, that is:
> 
>     $ pwd
>     /usr/src/linux
>     $ cg-commit fs/ext?/Makefile
> 
> works today, what I am saying is:
> 
>     $ pwd
>     /usr/src/linux/fs
>     $ git-run-with-user-path cg-commit -- ext?/Makefile
> 
> would work.

Yes. But if you do just cg-commit in the subdirectory, it won't work.
You could pass the original directory in some environment variable or
whatever, but I think that's just not worth the trouble for Cogito -
it's much easier for it when you just stay in the directory you are in
and instead set the environment variables so that the git toolkit DTRT.
(I like this acronym. :-)

> BTW, I am wondering if your choice of cg-commit as an example
> (as opposed to something else like diff or add) is a flamebait
> or just an innocent random example ;-)?

It was completely innocent. :-) How would it be a flamebait?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 0/4] Pulling refs files
From: Petr Baudis @ 2005-05-18 21:35 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Linus Torvalds
In-Reply-To: <Pine.LNX.4.21.0505171802570.30848-100000@iabervon.org>

Dear diary, on Wed, May 18, 2005 at 12:20:40AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
> On Tue, 17 May 2005, Petr Baudis wrote:
> 
> > Dear diary, on Tue, May 17, 2005 at 11:20:54PM CEST, I got a letter
> > where Daniel Barkalow <barkalow@iabervon.org> told me that...
> > > Hmm... maybe the right thing is to make the implementation-provided
> > > transfer code handle arbitrary things in GIT_DIR, but have code for
> > > updating reference files atomically and using a reference file to start
> > > from use "refs/"? Certainly, there's nothing special about reference files
> > > in transit.
> > > 
> > > Certainly the things in the info/ directory shouldn't be treated a head
> > > that you're going to pull, so that has to be different above the protocol
> > > level anyway.
> > 
> > *confused* :) I'm sorry, I have trouble understanding this. Could you
> > rephrase, please?
> 
> If you want to get info/ignore, you want to get it and save it, not
> download a set of objects it refers to. So it's different from specifying
> that you want to use refs/heads/master as the starting point for a pull.

Obviously. I think you should need to "explicitly" tell pull to actually
save any files locally, since you (I mean Cogito) certainly does not
want the pull stuff to touch the local refs/heads/master - it wants it
in some other file.

> > > So the remote receiver should get an instruction: change X from OLD to NEW
> > > and pull NEW. It should:
> > > 
> > >  - lock the file against further updates
> > >  - check that the current value is the provided OLD
> > >  - pull the necessary objects
> > >  - write NEW to the file
> > - unlock the file ;-))
> 
> The way I'm actually doing things is to write NEW into the lock file at
> some arbitrary point, and "writing to the file" is actually renaming the
> lock file to the normal filename. So writing unlocks the file
> automatically.

Ah. Obviously. That makes sense. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: README rewrite
From: Petr Baudis @ 2005-05-18 21:42 UTC (permalink / raw)
  To: Zack Brown; +Cc: Wink Saville, git
In-Reply-To: <20050516151604.GF7391@tumblerings.org>

Dear diary, on Mon, May 16, 2005 at 05:16:04PM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> So a branch is just a name for a cloned tree somewhere, the same as a tag is
> just a name for a revision some time in the past?

Very much so.

> On Sun, May 15, 2005 at 07:28:03PM +0200, Petr Baudis wrote:
> > So the local branch is the "master" branch, the rest are "remote"
> > branches. Note that there is a theoretical support for multiple local
> > branches, but I decided not to make things even more confusing and there
> > is no Cogito interface for managing them now.
> 
> Is there anything about the repository that 'knows' which is the master branch,
> or is this just a matter of which person is in charge? So, if I have a project,
> and I have a Cogito repository, so far it's just me, and just one branch.
> 
> Then another person joins the project, and they clone my repository onto their
> local system, and give it their own branch name.
> 
> Now here is the question:
> 
> We decide that the other person is a better project leader, and we decide to use
> their branch as the master branch, and mine as just a remote branch.
> 
> Would that be normal Cogito behavior? i.e. there is nothing to distinguish a
> 'master' branch from any other except that it is the one everyone says is the
> master branch?

That's right. The "master" branch is just your local thing, as well as
naming of the remote branches. The "master" branch name means this is
the branch representing your working tree, not that it is the mainline
of the project or anything. If you fork Linus' tree, in your repository
your fork will be the "master" branch, and Linus' branch will be called
however you name it.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: README rewrite
From: Petr Baudis @ 2005-05-18 22:27 UTC (permalink / raw)
  To: Zack Brown; +Cc: git
In-Reply-To: <20050515044941.GB7391@tumblerings.org>

Dear diary, on Sun, May 15, 2005 at 06:49:41AM CEST, I got a letter
where Zack Brown <zbrown@tumblerings.org> told me that...
> Here's an updated patch with fixes, apply instead of the one I just sent:
> 
> Signed-off-by: Zack Brown <zbrown@tumblerings.org>

Thanks. I've used the first part of the rewrite, tweaked it
substantially (I have some reservations about the style and suspicions
regarding the grammar), and somewhat awkwardly merged in the current
stuff missing in the rewrite (cg-diff and such).

I'd prefer the reference documentation in separate Documentation/ files,
much in the style of the GIT documentation.

Thanks again,

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH cogito] "cg-whatsnew" command
From: Petr Baudis @ 2005-05-18 22:30 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: Matthias Urlichs, git
In-Reply-To: <tnxis1jk1sn.fsf@arm.com>

Dear diary, on Mon, May 16, 2005 at 10:33:44AM CEST, I got a letter
where Catalin Marinas <catalin.marinas@arm.com> told me that...
> Matthias Urlichs <smurf@smurf.noris.de> wrote:
> >> +	cg-diff		[-p] [-r FROM_ID[:TO_ID]] [-m [BNAME] [BNAME]] [FILE]...
> >
> > That should be
> >
> > [-m [BNAME [BNAME]]]
> 
> You are right.
> 
> > though I'd suggest something more mnemonic than two BNAMEs.
> 
> Another try, see attached.

Unfortunately I can't comment on it well when it's not either in the
body or as text/plain attachment.

I think the -m usage doesn't make much sense now. What about dropping
branch1 and instead using what the user passed as the -r argument?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-18 22:41 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, torvalds
In-Reply-To: <20050518213309.GD10358@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

>> $ pwd
>> /usr/src/linux/fs
>> $ git-run-with-user-path cg-commit -- ext?/Makefile
>> 
>> would work.

PB> Yes. But if you do just cg-commit in the subdirectory, it won't work.

The point of git-run-with-user-path is that it canonicalizes and
filters the paths, chdir(2)'s to GIT_PROJECT_TOP before running
cg-commit.  So when cg-commit starts in the above example,

    (1) its $cwd is /usr/src/linux and your .git subdirectory is
        right there in ./.git/
    (2) it gets fs/ext2/Makefile and fs/ext3/Makefile as arguments.

>> BTW, I am wondering if your choice of cg-commit as an example
>> (as opposed to something else like diff or add) is a flamebait
>> or just an innocent random example ;-)?

PB> It was completely innocent. :-) How would it be a flamebait?

<http://members.cox.net/junkio/per-file-commit.txt> ;-).


^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Petr Baudis @ 2005-05-18 23:24 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds
In-Reply-To: <7vekc4nom5.fsf@assigned-by-dhcp.cox.net>

Dear diary, on Thu, May 19, 2005 at 12:41:38AM CEST, I got a letter
where Junio C Hamano <junkio@cox.net> told me that...
> >>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:
> 
> >> $ pwd
> >> /usr/src/linux/fs
> >> $ git-run-with-user-path cg-commit -- ext?/Makefile
> >> 
> >> would work.
> 
> PB> Yes. But if you do just cg-commit in the subdirectory, it won't work.
> 
> The point of git-run-with-user-path is that it canonicalizes and
> filters the paths, chdir(2)'s to GIT_PROJECT_TOP before running
> cg-commit.  So when cg-commit starts in the above example,
> 
>     (1) its $cwd is /usr/src/linux and your .git subdirectory is
>         right there in ./.git/
>     (2) it gets fs/ext2/Makefile and fs/ext3/Makefile as arguments.

Yes. My point is that sometimes the Cogito commands have
directory-specific functionality even when called without any arguments.

$ pwd
/usr/src/linux
$ date >>README
$ cd fs
$ date >>Makefile
$ cg-commit

will commit only the fs/Makefile change.

> >> BTW, I am wondering if your choice of cg-commit as an example
> >> (as opposed to something else like diff or add) is a flamebait
> >> or just an innocent random example ;-)?
> 
> PB> It was completely innocent. :-) How would it be a flamebait?
> 
> <http://members.cox.net/junkio/per-file-commit.txt> ;-).

JIT's snapshotting makes up for it, I think. It has some beauty. :-)

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* Re: [PATCH cogito] "cg-whatsnew" command
From: Matthias Urlichs @ 2005-05-18 23:50 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Catalin Marinas, git
In-Reply-To: <20050518223034.GH10358@pasky.ji.cz>

Hi,

Petr Baudis:
> Unfortunately I can't comment on it well when it's not either in the
> body or as text/plain attachment.

It was a text/x-patch attachment. What's the problem?
If nothing else: save it, read it in, s/^/> /.  *shrug*

At least, this way there won't be any word-wrapping by
overzealous email programs ...

-- 
Matthias Urlichs   |   {M:U} IT Design @ m-u-it.de   |  smurf@smurf.noris.de

^ permalink raw reply

* Re: [PATCH 0/1] Diff-helper update
From: Junio C Hamano @ 2005-05-18 23:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git, pasky
In-Reply-To: <Pine.LNX.4.58.0505181338270.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Wed, 18 May 2005, Junio C Hamano wrote:
>> 
>> I suspect doing something like this might be saner instead,
>> assuming non raw-diffs come at the end.  

LT> It won't ever trigger, since we only exit the loop once we see EOF.

I was not talking about correctness, but the readability of the
code.  Breaking out from the loop to process raw-diff and
switching to straight copy would make our intent more explicit.


^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-18 23:56 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, torvalds
In-Reply-To: <20050518232408.GA18281@pasky.ji.cz>

>>>>> "PB" == Petr Baudis <pasky@ucw.cz> writes:

PB> Yes. My point is that sometimes the Cogito commands have
PB> directory-specific functionality even when called without any arguments.

PB> $ pwd
PB> /usr/src/linux
PB> $ date >>README
PB> $ cd fs
PB> $ date >>Makefile
PB> $ cg-commit

PB> will commit only the fs/Makefile change.

Ah, thanks.  That what I missed.




^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Linus Torvalds @ 2005-05-19  0:34 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Petr Baudis, git
In-Reply-To: <7v64xgnl55.fsf@assigned-by-dhcp.cox.net>



On Wed, 18 May 2005, Junio C Hamano wrote:
> 
> PB> Yes. My point is that sometimes the Cogito commands have
> PB> directory-specific functionality even when called without any arguments.
> 
> PB> $ pwd
> PB> /usr/src/linux
> PB> $ date >>README
> PB> $ cd fs
> PB> $ date >>Makefile
> PB> $ cg-commit
> 
> PB> will commit only the fs/Makefile change.
> 
> Ah, thanks.  That what I missed.

Note that if git-run-with-user-path just has some way to tell what the
relative pathname of the original program was (say $DEF_SUBDIRECTORY),
this could still fairly easily be handled: having the cg-Xcommit program
say "if there are no arguments, we default to $DEF_SUBDIRECTORY" rather
than "with no arguments, default to '.'".

I don't personally much care, since this is all porcelain, but basically I
don't think these things are in any way mutually incompatible, and I do
believe that git-run-with-user-path _could_ be a good way to abstract out
the "where the heck in the tree am I?" issues.

		Linus

^ permalink raw reply

* [preview] diff-helper rename detection.
From: Junio C Hamano @ 2005-05-19  2:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, git
In-Reply-To: <Pine.LNX.4.58.0505181110480.18337@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

>> About the built-in diff not doing the rename , I have a bit
>> longer term (knowing _my_ timescale I'd imagine you would
>> understand that is not that long ;-) plan to have -p option for
>> diff-* family to use the same rename detection logic that I
>> added to diff-helper in the patch you are commenting on.

LT> Goodie. I was hoping that was the case, but felt that the diff-helper 
LT> thing should be pretty easy to do.

This is not yet a request for inclusion but is a preview
requesting for testing and comments, especially from Linus (for
general usability comments and suggestions for cut-off-points in
heuristics) and Nico (in case I had blatantly misused the
diff-delta interface).

I stole diff-delta part from the delta support patch (but I
dropped the part that actually touches sha1_file part, hence
this does not introduce the deltified object store), and used it
as an "extent-of-damage estimator".  The rename detector logic
in the previous round was detecting exact renames only (and did
not even look at the filesystem so the raw-diff could not come
from diff-files or diff-cache without --cached), but this round
it actually checks the content.  The interim heuristics is:

  - If exact match in SHA1, or exact match comparing the files,
    then that is a rename (trivial);

  - If size changed more than 20% that cannot be a rename;

  - If delta produced by the deltification stuff between two is
    more than 20% of the original, that cannot be a rename;

  - Otherwise pick the deleted-created file pair that has the
    smallest delta between them.

I've only very lightly tested it but it seems to do the right
thing.  I fed soemthing like this when I had heavily modified
diff-helper.c (diff-helpee.c was taken from the git-ls-files
output for diff-helper.c from the cache) and lightly modified
diff.c (diffo.c was from the cache), and diffo.c -> diff.c were
reported as rename, while helpee -> helper was not.

-100644	blob	cde27275fad8103084d7ed2d08d246ba4ce6eb9c	Makefile
+100644	blob	cde27275fad8103084d7ed2d08d246ba4ce6eb9c	GNUMakefile
-100644	blob	2877ddc4df85179c03cdcc8c7a66831951ec5d97	diff-helpee.c
+100644	blob	0000000000000000000000000000000000000000	diff-helper.c
-100644	blob	74004e5a3f9fa491b30ab3d5f231826593e4eae4	diffo.c
+100644	blob	0000000000000000000000000000000000000000	diff.c

Not-signed-off-yet-by: Junio C Hamano <junkio@cox.net>
---
# - linus: merge-base: use the new lookup_commit_reference() helper function
# + 9: diff-helper detects renames with Nico's delta stuff.
diff -git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -36,7 +36,7 @@ install: $(PROG) $(SCRIPTS)
 	$(INSTALL) $(PROG) $(SCRIPTS) $(dest)$(bin)
 
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
-	 tag.o date.o
+	 tag.o date.o diff-delta.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h blob.h tree.h commit.h tag.h
 
diff -git a/delta.h b/delta.h
new file mode 100644
--- /dev/null
+++ b/delta.h
@@ -0,0 +1,6 @@
+extern void *diff_delta(void *from_buf, unsigned long from_size,
+			void *to_buf, unsigned long to_size,
+		        unsigned long *delta_size);
+extern void *patch_delta(void *src_buf, unsigned long src_size,
+			 void *delta_buf, unsigned long delta_size,
+			 unsigned long *dst_size);
diff -git a/diff-delta.c b/diff-delta.c
new file mode 100644
--- /dev/null
+++ b/diff-delta.c
@@ -0,0 +1,330 @@
+/*
+ * diff-delta.c: generate a delta between two buffers
+ *
+ *  Many parts of this file have been lifted from LibXDiff version 0.10.
+ *  http://www.xmailserver.org/xdiff-lib.html
+ *
+ *  LibXDiff was written by Davide Libenzi <davidel@xmailserver.org>
+ *  Copyright (C) 2003	Davide Libenzi
+ *
+ *  Many mods for GIT usage by Nicolas Pitre <nico@cam.org>, (C) 2005.
+ *
+ *  This file is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ */
+
+#include <stdlib.h>
+#include "delta.h"
+
+
+/* block size: min = 16, max = 64k, power of 2 */
+#define BLK_SIZE 16
+
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+#define GR_PRIME 0x9e370001
+#define HASH(v, b) (((unsigned int)(v) * GR_PRIME) >> (32 - (b)))
+	
+/* largest prime smaller than 65536 */
+#define BASE 65521
+
+/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
+#define NMAX 5552
+
+#define DO1(buf, i)  { s1 += buf[i]; s2 += s1; }
+#define DO2(buf, i)  DO1(buf, i); DO1(buf, i + 1);
+#define DO4(buf, i)  DO2(buf, i); DO2(buf, i + 2);
+#define DO8(buf, i)  DO4(buf, i); DO4(buf, i + 4);
+#define DO16(buf)    DO8(buf, 0); DO8(buf, 8);
+
+static unsigned int adler32(unsigned int adler, const unsigned char *buf, int len)
+{
+	int k;
+	unsigned int s1 = adler & 0xffff;
+	unsigned int s2 = adler >> 16;
+
+	while (len > 0) {
+		k = MIN(len, NMAX);
+		len -= k;
+		while (k >= 16) {
+			DO16(buf);
+			buf += 16;
+			k -= 16;
+		}
+		if (k != 0)
+			do {
+				s1 += *buf++;
+				s2 += s1;
+			} while (--k);
+		s1 %= BASE;
+		s2 %= BASE;
+	}
+
+	return (s2 << 16) | s1;
+}
+
+static unsigned int hashbits(unsigned int size)
+{
+	unsigned int val = 1, bits = 0;
+	while (val < size && bits < 32) {
+		val <<= 1;
+	       	bits++;
+	}
+	return bits ? bits: 1;
+}
+
+typedef struct s_chanode {
+	struct s_chanode *next;
+	int icurr;
+} chanode_t;
+
+typedef struct s_chastore {
+	chanode_t *head, *tail;
+	int isize, nsize;
+	chanode_t *ancur;
+	chanode_t *sncur;
+	int scurr;
+} chastore_t;
+
+static void cha_init(chastore_t *cha, int isize, int icount)
+{
+	cha->head = cha->tail = NULL;
+	cha->isize = isize;
+	cha->nsize = icount * isize;
+	cha->ancur = cha->sncur = NULL;
+	cha->scurr = 0;
+}
+
+static void *cha_alloc(chastore_t *cha)
+{
+	chanode_t *ancur;
+	void *data;
+
+	ancur = cha->ancur;
+	if (!ancur || ancur->icurr == cha->nsize) {
+		ancur = malloc(sizeof(chanode_t) + cha->nsize);
+		if (!ancur)
+			return NULL;
+		ancur->icurr = 0;
+		ancur->next = NULL;
+		if (cha->tail)
+			cha->tail->next = ancur;
+		if (!cha->head)
+			cha->head = ancur;
+		cha->tail = ancur;
+		cha->ancur = ancur;
+	}
+
+	data = (void *)ancur + sizeof(chanode_t) + ancur->icurr;
+	ancur->icurr += cha->isize;
+	return data;
+}
+
+static void cha_free(chastore_t *cha)
+{
+	chanode_t *cur = cha->head;
+	while (cur) {
+		chanode_t *tmp = cur;
+		cur = cur->next;
+		free(tmp);
+	}
+}
+
+typedef struct s_bdrecord {
+	struct s_bdrecord *next;
+	unsigned int fp;
+	const unsigned char *ptr;
+} bdrecord_t;
+
+typedef struct s_bdfile {
+	const unsigned char *data, *top;
+	chastore_t cha;
+	unsigned int fphbits;
+	bdrecord_t **fphash;
+} bdfile_t;
+
+static int delta_prepare(const unsigned char *buf, int bufsize, bdfile_t *bdf)
+{
+	unsigned int fphbits;
+	int i, hsize;
+	const unsigned char *base, *data, *top;
+	bdrecord_t *brec;
+	bdrecord_t **fphash;
+
+	fphbits = hashbits(bufsize / BLK_SIZE + 1);
+	hsize = 1 << fphbits;
+	fphash = malloc(hsize * sizeof(bdrecord_t *));
+	if (!fphash)
+		return -1;
+	for (i = 0; i < hsize; i++)
+		fphash[i] = NULL;
+	cha_init(&bdf->cha, sizeof(bdrecord_t), hsize / 4 + 1);
+
+	bdf->data = data = base = buf;
+	bdf->top = top = buf + bufsize;
+	data += (bufsize / BLK_SIZE) * BLK_SIZE;
+	if (data == top)
+		data -= BLK_SIZE;
+
+	for ( ; data >= base; data -= BLK_SIZE) {
+		brec = cha_alloc(&bdf->cha);
+		if (!brec) {
+			cha_free(&bdf->cha);
+			free(fphash);
+			return -1;
+		}
+		brec->fp = adler32(0, data, MIN(BLK_SIZE, top - data));
+		brec->ptr = data;
+		i = HASH(brec->fp, fphbits);
+		brec->next = fphash[i];
+		fphash[i] = brec;
+	}
+
+	bdf->fphbits = fphbits;
+	bdf->fphash = fphash;
+
+	return 0;
+}
+
+static void delta_cleanup(bdfile_t *bdf)
+{
+	free(bdf->fphash);
+	cha_free(&bdf->cha);
+}
+
+#define COPYOP_SIZE(o, s) \
+    (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
+     !!(s & 0xff) + !!(s & 0xff00) + 1)
+
+void *diff_delta(void *from_buf, unsigned long from_size,
+		 void *to_buf, unsigned long to_size,
+		 unsigned long *delta_size)
+{
+	int i, outpos, outsize, inscnt, csize, msize, moff;
+	unsigned int fp;
+	const unsigned char *data, *top, *ptr1, *ptr2;
+	unsigned char *out, *orig;
+	bdrecord_t *brec;
+	bdfile_t bdf;
+
+	if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf))
+		return NULL;
+	
+	outpos = 0;
+	outsize = 8192;
+	out = malloc(outsize);
+	if (!out) {
+		delta_cleanup(&bdf);
+		return NULL;
+	}
+
+	data = to_buf;
+	top = to_buf + to_size;
+
+	/* store reference buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (from_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = from_size;
+		}
+		i++;
+		from_size >>= 8;
+	} while (from_size);
+
+	/* store target buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (to_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = to_size;
+		}
+		i++;
+		to_size >>= 8;
+	} while (to_size);
+
+	inscnt = 0;
+	moff = 0;
+	while (data < top) {
+		msize = 0;
+		fp = adler32(0, data, MIN(top - data, BLK_SIZE));
+		i = HASH(fp, bdf.fphbits);
+		for (brec = bdf.fphash[i]; brec; brec = brec->next) {
+			if (brec->fp == fp) {
+				csize = bdf.top - brec->ptr;
+				if (csize > top - data)
+					csize = top - data;
+				for (ptr1 = brec->ptr, ptr2 = data; 
+				     csize && *ptr1 == *ptr2;
+				     csize--, ptr1++, ptr2++);
+
+				csize = ptr1 - brec->ptr;
+				if (csize > msize) {
+					moff = brec->ptr - bdf.data;
+					msize = csize;
+					if (msize >= 0x10000) {
+						msize = 0x10000;
+						break;
+					}
+				}
+			}
+		}
+
+		if (!msize || msize < COPYOP_SIZE(moff, msize)) {
+			if (!inscnt)
+				outpos++;
+			out[outpos++] = *data++;
+			inscnt++;
+			if (inscnt == 0x7f) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+		} else {
+			if (inscnt) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+
+			data += msize;
+			orig = out + outpos++;
+			i = 0x80;
+
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x02; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x04; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x08; }
+
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x10; }
+			msize >>= 8;
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }
+
+			*orig = i;
+		}
+
+		/* next time around the largest possible output is 1 + 4 + 3 */
+		if (outpos > outsize - 8) {
+			void *tmp = out;
+			outsize = outsize * 3 / 2;
+			out = realloc(out, outsize);
+			if (!out) {
+				free(tmp);
+				delta_cleanup(&bdf);
+				return NULL;
+			}
+		}
+	}
+
+	if (inscnt)
+		out[outpos - inscnt - 1] = inscnt;
+
+	delta_cleanup(&bdf);
+	*delta_size = outpos;
+	return out;
+}
diff -git a/diff-helper.c b/diff-helper.c
--- a/diff-helper.c
+++ b/diff-helper.c
@@ -5,6 +5,7 @@
 #include "cache.h"
 #include "strbuf.h"
 #include "diff.h"
+#include "delta.h"
 
 static int matches_pathspec(const char *name, const char **spec, int cnt)
 {
@@ -31,8 +32,14 @@ static int detect_rename = 0;
 
 static struct diff_spec_hold {
 	struct diff_spec_hold *next;
-	struct diff_spec_hold *matched;
-	struct diff_spec old, new;
+	struct diff_spec old;
+	struct diff_spec new;
+	unsigned long size;
+	int flags;
+#define MATCHED 1
+#define SHOULD_FREE 2
+#define SHOULD_MUNMAP 4
+	void *data;
 	char path[1];
 } *createdfile, *deletedfile;
 
@@ -47,101 +54,199 @@ static void hold_spec(const char *path,
 	*list = elem;
 	elem->old = *old;
 	elem->new = *new;
-	elem->matched = 0;
+	elem->size = 0;
+	elem->data = NULL;
+	elem->flags = 0;
+}
+
+/* diff_spec sha1_valid flag tells us if mode is to be trusted
+ * and if blob_sha1 is not null_sha1 then that is also to be
+ * trusted.  Here we need to know if we need to look at the file
+ * system.
+ */
+static int look_at_fs(struct diff_spec *s)
+{
+	static const unsigned char null_sha1[20] = { 0, };
+	return (!s->sha1_valid ||
+		!memcmp(null_sha1, s->blob_sha1, sizeof(null_sha1)));
+}
+
+static int populate_data(struct diff_spec_hold *s, int use_old)
+{
+	char type[20];
+	struct diff_spec *u = use_old ? &(s->old) : &(s->new);
+
+	if (!look_at_fs(u)) {
+		s->data = read_sha1_file(u->blob_sha1, type, &s->size);
+		s->flags |= SHOULD_FREE;
+	}
+	else {
+		struct stat st;
+		int fd;
+		fd = open(s->path, O_RDONLY);
+		if (fd < 0)
+			return -1;
+		if (fstat(fd, &st)) {
+			close(fd);
+			return -1;
+		}
+		s->size = st.st_size;
+		s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0);
+		close(fd);
+		if (!s->size)
+			s->data = "";
+		else
+			s->flags |= SHOULD_MUNMAP;
+	}
+	return 0;
+}
+
+static void free_data(struct diff_spec_hold *s)
+{
+	if (s->flags & SHOULD_FREE)
+		free(s->data);
+	else if (s->flags & SHOULD_MUNMAP)
+		munmap(s->data, s->size);
+	s->flags &= ~(SHOULD_FREE|SHOULD_MUNMAP);
+}
+
+static void flush_rest(struct diff_spec_hold *elem,
+		       const char **spec, int cnt, int reverse)
+{
+	for ( ; elem ; elem = elem->next) {
+		free_data(elem);
+		if (elem->flags & MATCHED)
+			continue;
+		if (!cnt ||
+		    matches_pathspec(elem->path, spec, cnt)) {
+			if (reverse)
+				run_external_diff(elem->path, NULL,
+						  &elem->new, &elem->old);
+			else
+				run_external_diff(elem->path, NULL,
+						  &elem->old, &elem->new);
+		}
+	}
 }
 
-#define MINIMUM_SCORE 7000
-int estimate_similarity(struct diff_spec *one, struct diff_spec *two)
+#define EXACT_MATCH  10000
+#define MINIMUM_SCORE 5000
+int estimate_similarity(struct diff_spec_hold *src,
+			struct diff_spec_hold *dst,
+			int expect)
 {
-	/* Return how similar they are, representing the score as an
-	 * integer between 0 and 10000.
+	/* src points at a deleted file (src->old) and dst points at
+	 * a created file (dst->new).  They may be quite similar in which
+	 * case we want to say src->old is renamed to dst->new.
 	 *
-	 * This version is very dumb and detects exact matches only.
-	 * Wnen Nico's delta stuff gets in, I'll use the delta
-	 * algorithm to estimate the similarity score in core.
+	 * Compare them and return how similar they are, representing
+	 * the score as an integer between 0 and 10000.  10000 is
+	 * reserved for the case where they match exactly.
 	 */
+	void *delta;
+	unsigned long delta_size;
 
-	if (one->sha1_valid && two->sha1_valid &&
-	    !memcmp(one->blob_sha1, two->blob_sha1, 20))
+	if (!look_at_fs(&(src->old)) && !look_at_fs(&(dst->new)) &&
+	    !memcmp(src->old.blob_sha1, dst->new.blob_sha1, 20))
 		return 10000;
-	return 0;
+	if (populate_data(src, 1) || populate_data(dst, 0))
+		/* this is an error but will be caught downstream */
+		return 0;
+	if (src->size == dst->size &&
+	    !memcmp(src->data, dst->data, src->size))
+		return 10000;
+
+	if (expect == EXACT_MATCH)
+		return 0;
+
+	delta_size = ((src->size < dst->size) ?
+		      (dst->size - src->size) : (src->size - dst->size));
+
+	/* We would not consider rename followed by more than
+	 * 20% edits; that is, delta_size must be smaller than
+	 * (src->size + dst->size)/2 * 0.2, which means...
+	 */
+	if ((src->size + dst->size) < delta_size * 10)
+		return 0;
+
+	delta = diff_delta(src->data, src->size,
+			   dst->data, dst->size,
+			   &delta_size);
+	free(delta);
+
+	/* This "delta" is really xdiff with adler32 and all the
+	 * overheads but it is a quick and dirty approximation.
+	 * Again, we say there should be less than 20% edit.
+	 */
+	if ((src->size + dst->size) < delta_size * 10)
+		return 0;
+
+	/* Now we will give some score to it.  Let's say 20% edit gets
+	 * 5000 points and 0% edit gets 9000 points.  That is, every
+	 * 1/20000 edit gets 1 point penalty.  The amount of penalty is:
+	 *
+	 * (delta_size * 2 / (src->size + dst->size)) * 20000
+	 *
+	 */
+	return 9000 - (40000 * delta_size / (src->size+dst->size));
 }
 
-static void flush_renames(const char **spec, int cnt, int reverse)
+static void flush_rename_pairs(int floor_score,
+			       const char **spec, int cnt, int reverse)
 {
-	struct diff_spec_hold *rename_src, *rename_dst, *elem;
-	struct diff_spec_hold *leftover = NULL;
+	struct diff_spec_hold *src, *dst, *elem;
 	int score, best_score;
 
-	while (createdfile) {
-		rename_dst = createdfile;
-		createdfile = rename_dst->next;
-		best_score = MINIMUM_SCORE;
-		rename_src = NULL;
-		for (elem = deletedfile;
-		     elem;
-		     elem = elem->next) {
-			if (elem->matched)
+	for (dst = createdfile; dst; dst = dst->next) {
+		if (dst->flags & MATCHED)
+			continue;
+
+		best_score = floor_score;
+		src = NULL;
+		for (elem = deletedfile; elem; elem = elem->next) {
+			if (elem->flags & MATCHED)
 				continue;
-			score = estimate_similarity(&elem->old,
-						    &rename_dst->new);
-			if (best_score < score) {
-				rename_src = elem;
+			score = estimate_similarity(elem, dst, best_score);
+			if (best_score <= score) {
+				src = elem;
 				best_score = score;
 			}
 		}
-		if (rename_src) {
-			rename_src->matched = rename_dst;
-			rename_dst->matched = rename_src;
+		if (src) {
+			src->flags |= MATCHED;
+			dst->flags |= MATCHED;
+			free_data(src);
+			free_data(dst);
 
 			if (!cnt ||
-			    matches_pathspec(rename_src->path, spec, cnt) ||
-			    matches_pathspec(rename_dst->path, spec, cnt)) {
+			    matches_pathspec(src->path, spec, cnt) ||
+			    matches_pathspec(dst->path, spec, cnt)) {
 				if (reverse)
-					run_external_diff(rename_dst->path,
-							  rename_src->path,
-							  &rename_dst->new,
-							  &rename_src->old);
+					run_external_diff(dst->path,
+							  src->path,
+							  &dst->new,
+							  &src->old);
 				else
-					run_external_diff(rename_src->path,
-							  rename_dst->path,
-							  &rename_src->old,
-							  &rename_dst->new);
+					run_external_diff(src->path,
+							  dst->path,
+							  &src->old,
+							  &dst->new);
 			}
 		}
-		else {
-			rename_dst->next = leftover;
-			leftover = rename_dst;
-		}
 	}
+}
 
-	/* unmatched deletes */
-	for (elem = deletedfile; elem; elem = elem->next) {
-		if (elem->matched)
-			continue;
-		if (!cnt ||
-		    matches_pathspec(elem->path, spec, cnt)) {
-			if (reverse)
-				run_external_diff(elem->path, NULL,
-						  &elem->new, &elem->old);
-			else
-				run_external_diff(elem->path, NULL,
-						  &elem->old, &elem->new);
-		}
-	}
+static void flush_renames(const char **spec, int cnt, int reverse)
+{
+	/* We do this in two steps; first we pick up the exact matches
+	 * only and then closest match
+	 */
 
-	/* unmatched creates */
-	for (elem = leftover; elem; elem = elem->next) {
-		if (!cnt ||
-		    matches_pathspec(elem->path, spec, cnt)) {
-			if (reverse)
-				run_external_diff(elem->path, NULL,
-						  &elem->new, &elem->old);
-			else
-				run_external_diff(elem->path, NULL,
-						  &elem->old, &elem->new);
-		}
-	}
+	flush_rename_pairs(EXACT_MATCH, spec, cnt, reverse);
+	flush_rename_pairs(MINIMUM_SCORE, spec, cnt, reverse);
+
+	flush_rest(createdfile, spec, cnt, reverse);
+	flush_rest(deletedfile, spec, cnt, reverse);
 }
 
 static int parse_oneside_change(const char *cp, struct diff_spec *one,
diff -git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -329,7 +329,7 @@ void run_external_diff(const char *name,
 		die("unable to fork");
 	if (!pid) {
 		const char *pgm = external_diff();
-		/* not passing rename patch to external ones */
+		/* NEEDSWORK: not passing rename patch to external ones */
 		if (!other && pgm) {
 			if (one && two)
 				execlp(pgm, pgm,


^ permalink raw reply

* Re: [preview] diff-helper rename detection.
From: Linus Torvalds @ 2005-05-19  3:01 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Nicolas Pitre, Git Mailing List
In-Reply-To: <7vr7g4m0lz.fsf_-_@assigned-by-dhcp.cox.net>



On Wed, 18 May 2005, Junio C Hamano wrote:
> 
> This is not yet a request for inclusion but is a preview
> requesting for testing and comments, especially from Linus (for
> general usability comments and suggestions for cut-off-points in
> heuristics) and Nico (in case I had blatantly misused the
> diff-delta interface).

I think this is great, although I also bet that we'll tweak the heuristics
later.

I suspect that one tweak may be to try to find the "best" rename first,
rather than look at files in the order they were discovered (ie you'd
create a _matrix_ of the delta scores, rather than walk through the newly
created files in order). That would then potentially allow for a more
relaxed definitions of "rename", without any possibility of losing sight
of a better rename due to finding a bad one first.

But I think the simple heurstic is probably fine as a first cut.

So my only nit so far is that you declare "patch_delta()", even though you
don't actually have it. Just remove it.

		Linus

^ permalink raw reply

* Re: [preview] diff-helper rename detection.
From: Junio C Hamano @ 2005-05-19  3:08 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Nicolas Pitre, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505181952260.2322@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> I suspect that one tweak may be to try to find the "best" rename first,
LT> rather than look at files in the order they were discovered (ie you'd
LT> create a _matrix_ of the delta scores, rather than walk through the newly
LT> created files in order). That would then potentially allow for a more
LT> relaxed definitions of "rename", without any possibility of losing sight
LT> of a better rename due to finding a bad one first.

I thought about that but I opted for simplicity of first picking
the exact matches.  I was unsure about the usefulness of relaxed
matching to begin with.  But I now realize that to see how useful
that would be we need to experiment it and a matrix approach
would become necessity for that.

LT> So my only nit so far is that you declare "patch_delta()",
LT> even though you don't actually have it. Just remove it.

I'd rather keep that file as is for later merge with Nico.  It
came from the delitification patch and I tried to keep the new
files as they are.


^ permalink raw reply

* Re: [PATCH 0/4] Pulling refs files
From: Daniel Barkalow @ 2005-05-19  3:19 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git, Linus Torvalds
In-Reply-To: <Pine.LNX.4.21.0505171802570.30848-100000@iabervon.org>

On Tue, 17 May 2005, Daniel Barkalow wrote:

> I think I'll get to implementing it Wednesday night. I might be able to
> get the first step done tonight (my previous patch, except with the
> transfer applying to arbitrary files).

Upon further consideration, I think there are three things that pull
implementation needs to handle:

 1) fetching object files by hash, validating them, and writing them to
    the local objects directory
 2) fetching reference files by name, and making them available to the
    local program without writing them to disk at all.
 3) fetching other files by name and writing them to either the
    corresponding filename or a provided replacement.

I had thought that (2) could be done as a special case of (3), but I think
that it has to be separate, because (2) just returns the value, while
(3) can't just return the contents, but has to write it somewhere, since
it isn't constrained to be exactly 20 bytes.

So I think I'd like to do essentially the original series, slightly
rearranged and with a few edits, and then add (3) afterwards; this should
be easy once the rpush/rpull changes to make the protocol extensible are
in place.

I'll also do additional (independant) patches to provide an expected
starting point and lock things appropriately.

	-Daniel
*This .sig left intentionally blank*


^ permalink raw reply

* [PATCH] Make rsh protocol extensible
From: Daniel Barkalow @ 2005-05-19  5:11 UTC (permalink / raw)
  To: Petr Baudis; +Cc: git

This changes the rsh protocol to allow reporting failure in getting an
object without breaking the connection, and to allow other types of
request than for objects to be made. It is a preliminary to any more
extensive pull operation.

Signed-off-by: Daniel Barkalow <barkalow@iabervon.org>
Index: rpull.c
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/rpull.c  (mode:100644 sha1:b48e63157c66c160b9751603a92831f77106044c)
+++ c70baff05489356575384857c7cc4d4c641c39e3/rpull.c  (mode:100644 sha1:7f5b390d8ed653bfb67363db2c61c576a4d3134d)
@@ -15,7 +15,16 @@
 int fetch(unsigned char *sha1)
 {
 	int ret;
-	write(fd_out, sha1, 20);
+	signed char remote;
+	char type = 'o';
+	if (has_sha1_file(sha1))
+		return 0;
+	write(fd_out, &type, 1);
+        write(fd_out, sha1, 20);
+	if (read(fd_in, &remote, 1) < 1)
+		return -1;
+	if (remote < 0)
+		return remote;
 	ret = write_sha1_from_fd(sha1, fd_in);
 	if (!ret)
 		pull_say("got %s\n", sha1_to_hex(sha1));
Index: rpush.c
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/rpush.c  (mode:100644 sha1:3f2c898c8f5cf5ba62d689a13c646936b8372ee7)
+++ c70baff05489356575384857c7cc4d4c641c39e3/rpush.c  (mode:100644 sha1:07a8461878ad7b1d8cea27e44003b2ed44632834)
@@ -3,46 +3,68 @@
 #include <sys/socket.h>
 #include <errno.h>
 
-void service(int fd_in, int fd_out) {
+int serve_object(int fd_in, int fd_out) {
 	ssize_t size;
-	int posn;
+	int posn = 0;
 	char unsigned sha1[20];
 	unsigned long objsize;
 	void *buf;
+	signed char remote;
 	do {
-		posn = 0;
-		do {
-			size = read(fd_in, sha1 + posn, 20 - posn);
-			if (size < 0) {
-				perror("rpush: read ");
-				return;
+		size = read(fd_in, sha1 + posn, 20 - posn);
+		if (size < 0) {
+			perror("rpush: read ");
+			return -1;
+		}
+		if (!size)
+			return -1;
+		posn += size;
+	} while (posn < 20);
+	
+	/* fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1)); */
+	remote = 0;
+	
+	buf = map_sha1_file(sha1, &objsize);
+	
+	if (!buf) {
+		fprintf(stderr, "rpush: could not find %s\n", 
+			sha1_to_hex(sha1));
+		remote = -1;
+	}
+	
+	write(fd_out, &remote, 1);
+	
+	if (remote < 0)
+		return 0;
+	
+	posn = 0;
+	do {
+		size = write(fd_out, buf + posn, objsize - posn);
+		if (size <= 0) {
+			if (!size) {
+				fprintf(stderr, "rpush: write closed");
+			} else {
+				perror("rpush: write ");
 			}
-			if (!size)
-				return;
-			posn += size;
-		} while (posn < 20);
-
-		/* fprintf(stderr, "Serving %s\n", sha1_to_hex(sha1)); */
+			return -1;
+		}
+		posn += size;
+	} while (posn < objsize);
+	return 0;
+}
 
-		buf = map_sha1_file(sha1, &objsize);
-		if (!buf) {
-			fprintf(stderr, "rpush: could not find %s\n", 
-				sha1_to_hex(sha1));
+void service(int fd_in, int fd_out) {
+	char type;
+	int retval;
+	do {
+		retval = read(fd_in, &type, 1);
+		if (retval < 1) {
+			if (retval < 0)
+				perror("rpush: read ");
 			return;
 		}
-		posn = 0;
-		do {
-			size = write(fd_out, buf + posn, objsize - posn);
-			if (size <= 0) {
-				if (!size) {
-					fprintf(stderr, "rpush: write closed");
-				} else {
-					perror("rpush: write ");
-				}
-				return;
-			}
-			posn += size;
-		} while (posn < objsize);
+		if (type == 'o' && serve_object(fd_in, fd_out))
+			return;
 	} while (1);
 }
 
@@ -53,6 +75,8 @@
         char *url;
 	int fd_in, fd_out;
 	while (arg < argc && argv[arg][0] == '-') {
+		if (argv[arg][1] == 'w')
+			arg++;
                 arg++;
         }
         if (argc < arg + 2) {
Index: rsh.c
===================================================================
--- 75b95bec390d6728b9b1b4572056af8cee34ea7d/rsh.c  (mode:100644 sha1:5d1cb9d578a8e679fc190a9d7d2c842ad811223f)
+++ c70baff05489356575384857c7cc4d4c641c39e3/rsh.c  (mode:100644 sha1:71afc1aa5c9fb3cfe8d49e73471c30e92df9e327)
@@ -36,8 +36,8 @@
 	*(path++) = '\0';
 	/* ssh <host> 'cd /<path>; stdio-pull <arg...> <commit-id>' */
 	snprintf(command, COMMAND_SIZE, 
-		 "cd /%s; %s=objects %s",
-		 path, DB_ENVIRONMENT, remote_prog);
+		 "GIT_DIR='%s' %s",
+		 path, remote_prog);
 	posn = command + strlen(command);
 	for (i = 0; i < rmt_argc; i++) {
 		*(posn++) = ' ';


^ permalink raw reply

* [PATCH] fix strbuf take #2
From: Junio C Hamano @ 2005-05-19  6:34 UTC (permalink / raw)
  To: torvalds; +Cc: git

I just remembered why I placed that bogus "sb->len ==0 implies
sb->eof" condition there.  We need at least something like this
to catch the normal EOF (that is, line termination immediately
followed by EOF) case.  "if (feof(fp))" fires when we have
already read the eof, not when we are about read it.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
git-diff-cache -r -z 0 | ./git-diff-helper -z
diff --git a/strbuf.c b/strbuf.c
--- a/strbuf.c
+++ b/strbuf.c
@@ -37,6 +37,8 @@ void read_line(struct strbuf *sb, FILE *
 			break;
 		strbuf_add(sb, ch);
 	}
+	if (ch == EOF && sb->len == 0)
+		sb->eof = 1;
 	strbuf_end(sb);
 }
 



^ permalink raw reply

* Re: [PATCH 0/4] Pulling refs files
From: Petr Baudis @ 2005-05-19  6:52 UTC (permalink / raw)
  To: Daniel Barkalow; +Cc: git, Linus Torvalds
In-Reply-To: <Pine.LNX.4.21.0505182259060.30848-100000@iabervon.org>

Dear diary, on Thu, May 19, 2005 at 05:19:01AM CEST, I got a letter
where Daniel Barkalow <barkalow@iabervon.org> told me that...
>  2) fetching reference files by name, and making them available to the
>     local program without writing them to disk at all.
>  3) fetching other files by name and writing them to either the
>     corresponding filename or a provided replacement.
> 
> I had thought that (2) could be done as a special case of (3), but I think
> that it has to be separate, because (2) just returns the value, while
> (3) can't just return the contents, but has to write it somewhere, since
> it isn't constrained to be exactly 20 bytes.

Huh. How would (2) be useful and why can't you just still write it e.g.
to some user-supplied temporary file? I think that'd be still actually
much less trouble for the scripts to handle.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor

^ permalink raw reply

* [PATCH] A test case addition for strbuf regression
From: Junio C Hamano @ 2005-05-19  6:55 UTC (permalink / raw)
  To: torvalds; +Cc: git

This test would have caught the strbuf eof condition gotcha,
hopefully fixed with my previous patch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---
# - HEAD: fix strbuf take #2
# + (working tree)
diff --git a/strbuf.c b/strbuf.c
diff --git a/t/t4000-diff-format.sh b/t/t4000-diff-format.sh
--- a/t/t4000-diff-format.sh
+++ b/t/t4000-diff-format.sh
@@ -50,4 +50,13 @@ test_expect_success \
     'validate git-diff-files -p output.' \
     'cmp -s current expected'
 
+test_expect_success \
+    'build same diff using git-diff-helper.' \
+    'git-diff-files -z | git-diff-helper -z >current'
+
+
+test_expect_success \
+    'validate git-diff-helper output.' \
+    'cmp -s current expected'
+
 test_done


^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Thomas Glanzmann @ 2005-05-19  7:40 UTC (permalink / raw)
  To: git
In-Reply-To: <20050518232408.GA18281@pasky.ji.cz>

Hello,

> > <http://members.cox.net/junkio/per-file-commit.txt> ;-).

> I think the workflow that involves per-file commit is
> fundamentally broken at two levels.

I disagree here. We at FAUmachine often have FLAGS to turn on specific
features or debugging output by tweaking a headerfile. However we commit
often and work in small steps (at the moment using CVS because every
developer has write access). And we definitely don't want to tweak the
header file every time we commit, so I often do a:

cvs diff > diff
vim diff
lsdiff diff
cvs commit <interesting files here>

	Thomas

^ permalink raw reply

* [PATCH] README: make help example more consistent
From: Martin Atukunda @ 2005-05-19  8:12 UTC (permalink / raw)
  To: git

in the README, the help section mentions how to get help on the command 
cg-update, yet in subsquent examples uses the command cg-pull to demostrate. 
this patch changes the reference to cg-update to cg-pull instead to be more 
consistent.

Signed-off-by: Martin Atukunda <matlads@ds.co.ug>


Index: README
===================================================================
--- ca5fef50fb68a3afbb35e1a48ac622f7a964f021/README  (mode:100644)
+++ uncommitted/README  (mode:100644)
@@ -223,7 +223,7 @@
        Getting Help

 Cogito commands come with their own helpful documentation. To get help on
-cg-update, for example, give this command:
+cg-pull, for example, give this command:

        $ cg-pull --help

-- 
- Martin -

^ permalink raw reply

* Re: [PATCH 1/2] Introduce git-run-with-user-path helper program.
From: Junio C Hamano @ 2005-05-19  8:23 UTC (permalink / raw)
  To: git
In-Reply-To: <20050519074007.GI4738@cip.informatik.uni-erlangen.de>

>>>>> "TG" == Thomas Glanzmann <sithglan@stud.uni-erlangen.de> writes:

TG> Hello,
>> > <http://members.cox.net/junkio/per-file-commit.txt> ;-).

>> I think the workflow that involves per-file commit is
>> fundamentally broken at two levels.

TG> I disagree here. We at FAUmachine often have FLAGS to turn on specific
TG> features or debugging output by tweaking a headerfile.

So what?  I would understand that in your workflow, the nature
of that never-committed headerfile (the fact it only has
debugging tweaks and contains nothing substantially risky)
practically minimizes the risk to the level everybody in the
group feels acceptable.

That does not, however, change what I stated in the document:
what you have in the repository is something that never existed
in a work tree as a consistent whole and tested.

You are only saying is that it does not practically matter in
your workflow, only because what is floating (not checked in)
are things you feel safe to drift.  I would not dare say that is
a wrong way to work.

However, I feel fairly strong about this after being burned many
times by careless coleagues who forgot to check in either newly
created files or locally modified files and finding problems
only after customer installation happened.


^ permalink raw reply

* Re: [PATCH cogito] "cg-whatsnew" command
From: Catalin Marinas @ 2005-05-19  8:24 UTC (permalink / raw)
  To: Petr Baudis; +Cc: Matthias Urlichs, git
In-Reply-To: <20050518223034.GH10358@pasky.ji.cz>

[-- Attachment #1: Type: text/plain, Size: 491 bytes --]

Petr Baudis <pasky@ucw.cz> wrote:
> Unfortunately I can't comment on it well when it's not either in the
> body or as text/plain attachment.

It was attached as text/x-patch.

> I think the -m usage doesn't make much sense now. What about dropping
> branch1 and instead using what the user passed as the -r argument?

See attached (text/plain this time).

It is also possible not to give any argument to -m and use the -r ones
entirely (not implemented in the attached patch).

-- 
Catalin


[-- Attachment #2: patch-m-option --]
[-- Type: text/plain, Size: 3814 bytes --]

"-m" option added to cg-diff, cg-log and cg-mkpatch

This option takes a branch name as an optional parameter and shows the
changes on this branch not yet merged to HEAD or to the first argument
passed to "-r".

Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

---
Index: cg-diff
===================================================================
--- ca5fef50fb68a3afbb35e1a48ac622f7a964f021/cg-diff  (mode:100755)
+++ uncommitted/cg-diff  (mode:100755)
@@ -14,6 +14,9 @@
 # -p instead of one ID denotes a parent commit to the specified ID
 # (which must not be a tree, obviously).
 #
+# -m [branch] shows the changes in branch (defaulting to origin)
+# not yet merged to HEAD (or the first argument passed to -r)
+#
 # Outputs a diff converting the first tree to the second one.
 
 . ${COGITO_LIB}cg-Xlib
@@ -49,6 +52,18 @@
 	id1=$(parent-id "$id2" | head -n 1)
 fi
 
+if [ "$1" = "-m" ]; then
+	[ "$id1" = " " ] && id1=HEAD
+	id2=origin
+	shift
+	if [ "$1" ]; then
+		id2=$1
+		shift
+	fi
+	id1=$(git-merge-base "$id1" "$id2")
+	[ "$id1" ] || die "Unable to determine the merge base"
+fi
+
 
 filter=
 if [ "$*" ]; then
Index: cg-help
===================================================================
--- ca5fef50fb68a3afbb35e1a48ac622f7a964f021/cg-help  (mode:100755)
+++ uncommitted/cg-help  (mode:100755)
@@ -26,14 +26,14 @@
 	cg-cancel
 	cg-clone	[-s] SOURCE_LOC [DESTDIR]
 	cg-commit	[-m"Commit message"]... [-e | -E] [FILE]... < log message
-	cg-diff		[-p] [-r FROM_ID[:TO_ID]] [FILE]...
+	cg-diff		[-p] [-r FROM_ID[:TO_ID]] [-m [BNAME]] [FILE]...
 	cg-export	DEST [TREE_ID]
 	cg-help		[COMMAND]
 	cg-init
-	cg-log		[-c] [-f] [-r FROM_ID[:TO_ID]] [FILE]...
+	cg-log		[-c] [-f] [-r FROM_ID[:TO_ID]] [-m [BNAME]] [FILE]...
 	cg-ls		[TREE_ID]
 	cg-merge	[-c] [-b BASE_ID] FROM_ID
-	cg-mkpatch	[-s] [-r FROM_ID[:TO_ID]]
+	cg-mkpatch	[-s] [-r FROM_ID[:TO_ID]] [-m [BNAME]]
 	cg-patch			< patch on stdin
 	cg-pull		[BNAME]
 	cg-restore	[FILE]...
Index: cg-log
===================================================================
--- ca5fef50fb68a3afbb35e1a48ac622f7a964f021/cg-log  (mode:100755)
+++ uncommitted/cg-log  (mode:100755)
@@ -22,6 +22,9 @@
 # (HEAD by default), or id1:id2 representing an (id1;id2] range
 # of commits to show.
 #
+# -m [branch] shows the changes in branch (defaulting to origin)
+# not yet merged to HEAD (or the first argument passed to -r)
+#
 # The rest of arguments are took as filenames; cg-log then displays
 # only changes in those files.
 
@@ -110,6 +113,18 @@
 	shift
 fi
 
+if [ "$1" = "-m" ]; then
+	[ "$log_start" ] || log_start=HEAD
+	log_end=origin
+	shift
+	if [ "$1" ]; then
+		log_end=$1
+		shift
+	fi
+	log_start=$(git-merge-base "$log_start" "$log_end")
+	[ "$log_start" ] || die "Unable to determine the merge base"
+fi
+
 if [ "$log_end" ]; then
 	id1="$(commit-id $log_start)" || exit 1
 	id2="$(commit-id $log_end)" || exit 1
Index: cg-mkpatch
===================================================================
--- ca5fef50fb68a3afbb35e1a48ac622f7a964f021/cg-mkpatch  (mode:100755)
+++ uncommitted/cg-mkpatch  (mode:100755)
@@ -9,6 +9,9 @@
 #
 # Takes an -r followed with ID defaulting to HEAD, or id1:id2, forming
 # a range (id1;id2]. (Use "id1:" to take just everything from id1 to HEAD.)
+#
+# -m [branch] shows the changes in branch (defaulting to origin)
+# not yet merged to HEAD (or the first argument passed to -r)
 
 . ${COGITO_LIB}cg-Xlib
 
@@ -80,6 +83,18 @@
 	shift
 fi
 
+if [ "$1" = "-m" ]; then
+	[ "$log_start" ] || log_start=HEAD
+	log_end=origin
+	shift
+	if [ "$1" ]; then
+		log_end=$1
+		shift
+	fi
+	log_start=$(git-merge-base "$log_start" "$log_end")
+	[ "$log_start" ] || die "Unable to determine the merge base"
+fi
+
 if [ "$log_end" ]; then
 	id1=$(commit-id $log_start) || exit 1
 	id2=$(commit-id $log_end) || exit 1

^ permalink raw reply

* [PATCH] Deltification library work by Nicolas Pitre.
From: Junio C Hamano @ 2005-05-19 10:28 UTC (permalink / raw)
  To: torvalds, Nicolas Pitre; +Cc: git

This is stolen from the deltification patch by Nicolas Pitre.
Although the deltification patch has not been submitted for the
inclusion, the library part here is useful for the rename
detection logic in the diff work I have been doing.  The next
patch will depend on this, so if Nico is OK with this one,
please consider inclusion of this patch.  This does not include
the main part of Nico's work, which is to delitify repository
objects.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Makefile     |    2 
delta.h      |    6 +
diff-delta.c |  330 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 337 insertions(+), 1 deletion(-)
delta.h (. --> 100644)
diff-delta.c (. --> 100644)

diff --git a/Makefile b/Makefile
--- a/Makefile
+++ b/Makefile
@@ -36,7 +36,7 @@
 	$(INSTALL) $(PROG) $(SCRIPTS) $(dest)$(bin)
 
 LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o \
-	 tag.o date.o
+	 tag.o date.o diff-delta.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h blob.h tree.h commit.h tag.h
 
diff --git a/delta.h b/delta.h
new file mode 100644
--- /dev/null
+++ b/delta.h
@@ -0,0 +1,6 @@
+/*
+ * Copyright (c) 2005 Nicolas Pitre <nico@cam.org>
+ */
+extern void *diff_delta(void *from_buf, unsigned long from_size,
+			void *to_buf, unsigned long to_size,
+		        unsigned long *delta_size);
diff --git a/diff-delta.c b/diff-delta.c
new file mode 100644
--- /dev/null
+++ b/diff-delta.c
@@ -0,0 +1,330 @@
+/*
+ * diff-delta.c: generate a delta between two buffers
+ *
+ *  Many parts of this file have been lifted from LibXDiff version 0.10.
+ *  http://www.xmailserver.org/xdiff-lib.html
+ *
+ *  LibXDiff was written by Davide Libenzi <davidel@xmailserver.org>
+ *  Copyright (C) 2003	Davide Libenzi
+ *
+ *  Many mods for GIT usage by Nicolas Pitre <nico@cam.org>, (C) 2005.
+ *
+ *  This file is free software; you can redistribute it and/or
+ *  modify it under the terms of the GNU Lesser General Public
+ *  License as published by the Free Software Foundation; either
+ *  version 2.1 of the License, or (at your option) any later version.
+ */
+
+#include <stdlib.h>
+#include "delta.h"
+
+
+/* block size: min = 16, max = 64k, power of 2 */
+#define BLK_SIZE 16
+
+#define MIN(a, b) ((a) < (b) ? (a) : (b))
+
+#define GR_PRIME 0x9e370001
+#define HASH(v, b) (((unsigned int)(v) * GR_PRIME) >> (32 - (b)))
+	
+/* largest prime smaller than 65536 */
+#define BASE 65521
+
+/* NMAX is the largest n such that 255n(n+1)/2 + (n+1)(BASE-1) <= 2^32-1 */
+#define NMAX 5552
+
+#define DO1(buf, i)  { s1 += buf[i]; s2 += s1; }
+#define DO2(buf, i)  DO1(buf, i); DO1(buf, i + 1);
+#define DO4(buf, i)  DO2(buf, i); DO2(buf, i + 2);
+#define DO8(buf, i)  DO4(buf, i); DO4(buf, i + 4);
+#define DO16(buf)    DO8(buf, 0); DO8(buf, 8);
+
+static unsigned int adler32(unsigned int adler, const unsigned char *buf, int len)
+{
+	int k;
+	unsigned int s1 = adler & 0xffff;
+	unsigned int s2 = adler >> 16;
+
+	while (len > 0) {
+		k = MIN(len, NMAX);
+		len -= k;
+		while (k >= 16) {
+			DO16(buf);
+			buf += 16;
+			k -= 16;
+		}
+		if (k != 0)
+			do {
+				s1 += *buf++;
+				s2 += s1;
+			} while (--k);
+		s1 %= BASE;
+		s2 %= BASE;
+	}
+
+	return (s2 << 16) | s1;
+}
+
+static unsigned int hashbits(unsigned int size)
+{
+	unsigned int val = 1, bits = 0;
+	while (val < size && bits < 32) {
+		val <<= 1;
+	       	bits++;
+	}
+	return bits ? bits: 1;
+}
+
+typedef struct s_chanode {
+	struct s_chanode *next;
+	int icurr;
+} chanode_t;
+
+typedef struct s_chastore {
+	chanode_t *head, *tail;
+	int isize, nsize;
+	chanode_t *ancur;
+	chanode_t *sncur;
+	int scurr;
+} chastore_t;
+
+static void cha_init(chastore_t *cha, int isize, int icount)
+{
+	cha->head = cha->tail = NULL;
+	cha->isize = isize;
+	cha->nsize = icount * isize;
+	cha->ancur = cha->sncur = NULL;
+	cha->scurr = 0;
+}
+
+static void *cha_alloc(chastore_t *cha)
+{
+	chanode_t *ancur;
+	void *data;
+
+	ancur = cha->ancur;
+	if (!ancur || ancur->icurr == cha->nsize) {
+		ancur = malloc(sizeof(chanode_t) + cha->nsize);
+		if (!ancur)
+			return NULL;
+		ancur->icurr = 0;
+		ancur->next = NULL;
+		if (cha->tail)
+			cha->tail->next = ancur;
+		if (!cha->head)
+			cha->head = ancur;
+		cha->tail = ancur;
+		cha->ancur = ancur;
+	}
+
+	data = (void *)ancur + sizeof(chanode_t) + ancur->icurr;
+	ancur->icurr += cha->isize;
+	return data;
+}
+
+static void cha_free(chastore_t *cha)
+{
+	chanode_t *cur = cha->head;
+	while (cur) {
+		chanode_t *tmp = cur;
+		cur = cur->next;
+		free(tmp);
+	}
+}
+
+typedef struct s_bdrecord {
+	struct s_bdrecord *next;
+	unsigned int fp;
+	const unsigned char *ptr;
+} bdrecord_t;
+
+typedef struct s_bdfile {
+	const unsigned char *data, *top;
+	chastore_t cha;
+	unsigned int fphbits;
+	bdrecord_t **fphash;
+} bdfile_t;
+
+static int delta_prepare(const unsigned char *buf, int bufsize, bdfile_t *bdf)
+{
+	unsigned int fphbits;
+	int i, hsize;
+	const unsigned char *base, *data, *top;
+	bdrecord_t *brec;
+	bdrecord_t **fphash;
+
+	fphbits = hashbits(bufsize / BLK_SIZE + 1);
+	hsize = 1 << fphbits;
+	fphash = malloc(hsize * sizeof(bdrecord_t *));
+	if (!fphash)
+		return -1;
+	for (i = 0; i < hsize; i++)
+		fphash[i] = NULL;
+	cha_init(&bdf->cha, sizeof(bdrecord_t), hsize / 4 + 1);
+
+	bdf->data = data = base = buf;
+	bdf->top = top = buf + bufsize;
+	data += (bufsize / BLK_SIZE) * BLK_SIZE;
+	if (data == top)
+		data -= BLK_SIZE;
+
+	for ( ; data >= base; data -= BLK_SIZE) {
+		brec = cha_alloc(&bdf->cha);
+		if (!brec) {
+			cha_free(&bdf->cha);
+			free(fphash);
+			return -1;
+		}
+		brec->fp = adler32(0, data, MIN(BLK_SIZE, top - data));
+		brec->ptr = data;
+		i = HASH(brec->fp, fphbits);
+		brec->next = fphash[i];
+		fphash[i] = brec;
+	}
+
+	bdf->fphbits = fphbits;
+	bdf->fphash = fphash;
+
+	return 0;
+}
+
+static void delta_cleanup(bdfile_t *bdf)
+{
+	free(bdf->fphash);
+	cha_free(&bdf->cha);
+}
+
+#define COPYOP_SIZE(o, s) \
+    (!!(o & 0xff) + !!(o & 0xff00) + !!(o & 0xff0000) + !!(o & 0xff000000) + \
+     !!(s & 0xff) + !!(s & 0xff00) + 1)
+
+void *diff_delta(void *from_buf, unsigned long from_size,
+		 void *to_buf, unsigned long to_size,
+		 unsigned long *delta_size)
+{
+	int i, outpos, outsize, inscnt, csize, msize, moff;
+	unsigned int fp;
+	const unsigned char *data, *top, *ptr1, *ptr2;
+	unsigned char *out, *orig;
+	bdrecord_t *brec;
+	bdfile_t bdf;
+
+	if (!from_size || !to_size || delta_prepare(from_buf, from_size, &bdf))
+		return NULL;
+	
+	outpos = 0;
+	outsize = 8192;
+	out = malloc(outsize);
+	if (!out) {
+		delta_cleanup(&bdf);
+		return NULL;
+	}
+
+	data = to_buf;
+	top = to_buf + to_size;
+
+	/* store reference buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (from_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = from_size;
+		}
+		i++;
+		from_size >>= 8;
+	} while (from_size);
+
+	/* store target buffer size */
+	orig = out + outpos++;
+	*orig = i = 0;
+	do {
+		if (to_size & 0xff) {
+			*orig |= (1 << i);
+			out[outpos++] = to_size;
+		}
+		i++;
+		to_size >>= 8;
+	} while (to_size);
+
+	inscnt = 0;
+	moff = 0;
+	while (data < top) {
+		msize = 0;
+		fp = adler32(0, data, MIN(top - data, BLK_SIZE));
+		i = HASH(fp, bdf.fphbits);
+		for (brec = bdf.fphash[i]; brec; brec = brec->next) {
+			if (brec->fp == fp) {
+				csize = bdf.top - brec->ptr;
+				if (csize > top - data)
+					csize = top - data;
+				for (ptr1 = brec->ptr, ptr2 = data; 
+				     csize && *ptr1 == *ptr2;
+				     csize--, ptr1++, ptr2++);
+
+				csize = ptr1 - brec->ptr;
+				if (csize > msize) {
+					moff = brec->ptr - bdf.data;
+					msize = csize;
+					if (msize >= 0x10000) {
+						msize = 0x10000;
+						break;
+					}
+				}
+			}
+		}
+
+		if (!msize || msize < COPYOP_SIZE(moff, msize)) {
+			if (!inscnt)
+				outpos++;
+			out[outpos++] = *data++;
+			inscnt++;
+			if (inscnt == 0x7f) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+		} else {
+			if (inscnt) {
+				out[outpos - inscnt - 1] = inscnt;
+				inscnt = 0;
+			}
+
+			data += msize;
+			orig = out + outpos++;
+			i = 0x80;
+
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x01; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x02; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x04; }
+			moff >>= 8;
+			if (moff & 0xff) { out[outpos++] = moff; i |= 0x08; }
+
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x10; }
+			msize >>= 8;
+			if (msize & 0xff) { out[outpos++] = msize; i |= 0x20; }
+
+			*orig = i;
+		}
+
+		/* next time around the largest possible output is 1 + 4 + 3 */
+		if (outpos > outsize - 8) {
+			void *tmp = out;
+			outsize = outsize * 3 / 2;
+			out = realloc(out, outsize);
+			if (!out) {
+				free(tmp);
+				delta_cleanup(&bdf);
+				return NULL;
+			}
+		}
+	}
+
+	if (inscnt)
+		out[outpos - inscnt - 1] = inscnt;
+
+	delta_cleanup(&bdf);
+	*delta_size = outpos;
+	return out;
+}
------------------------------------------------


^ permalink raw reply

* [PATCH] Detect renames in diff family.
From: Junio C Hamano @ 2005-05-19 10:32 UTC (permalink / raw)
  To: torvalds; +Cc: git

This patch rips out the rename detection engine from diff-helper
and moves it to the diff core, and updates the internal calling
convention used by diff-tree family into the diff core.  In
order to give the same option name to diff-tree family as well
as to diff-helper, I've changed the earlier diff-helper '-r'
option to '-M' (stands for Move; sorry but the natural
abbreviation 'r' for 'rename' is already taken for 'recursive').

Although I did a fair amount of test with the git-diff-tree with
existing rename commits in the core GIT repository, this should
still be considered beta (preview) release.  This patch depends
on the diff-delta patch I stole from Nico.  Please give it a
good beating.

Special request for Linus is to check if I did not screw up the
various calls into the diff core from diff-tree.  Essentially
the idea is to start one patchset session with diff_setup() and
close it with diff_flush() before you start another patchset
session.

This implements almost everything I wanted to see in this series
of patch, except a few minor cleanups in the calling convention
into diff core, but that will be a separate cleanup patch.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

Documentation/git-diff-cache.txt  |    5 
Documentation/git-diff-files.txt  |    5 
Documentation/git-diff-helper.txt |    4 
Documentation/git-diff-tree.txt   |    5 
diff-cache.c                      |   16 +
diff-files.c                      |   11 -
diff-helper.c                     |  209 ++-----------------
diff-tree.c                       |   25 +-
diff.c                            |  410 +++++++++++++++++++++++++++++++++++---
diff.h                            |   19 -
git-apply-patch-script            |    3 
t/t4001-diff-rename.sh            |   60 +++++
12 files changed, 533 insertions(+), 239 deletions(-)
t/t4001-diff-rename.sh (. --> 100644)

diff --git a/Documentation/git-diff-cache.txt b/Documentation/git-diff-cache.txt
--- a/Documentation/git-diff-cache.txt
+++ b/Documentation/git-diff-cache.txt
@@ -9,7 +9,7 @@
 
 SYNOPSIS
 --------
-'git-diff-cache' [-p] [-r] [-z] [-m] [--cached] <tree-ish>
+'git-diff-cache' [-p] [-r] [-z] [-m] [-M] [--cached] <tree-ish>
 
 DESCRIPTION
 -----------
@@ -33,6 +33,9 @@
 -z::
 	\0 line termination on output
 
+-M::
+	Detect renames; implies -p.
+
 --cached::
 	do not consider the on-disk file at all
 
diff --git a/Documentation/git-diff-files.txt b/Documentation/git-diff-files.txt
--- a/Documentation/git-diff-files.txt
+++ b/Documentation/git-diff-files.txt
@@ -9,7 +9,7 @@
 
 SYNOPSIS
 --------
-'git-diff-files' [-p] [-q] [-r] [-z] [<pattern>...]
+'git-diff-files' [-p] [-q] [-r] [-z] [-M] [<pattern>...]
 
 DESCRIPTION
 -----------
@@ -26,6 +26,9 @@
 -q::
 	Remain silent even on nonexisting files
 
+-M::
+	Detect renames; implies -p.
+
 -r::
 	This flag does not mean anything.  It is there only to match
 	git-diff-tree.  Unlike git-diff-tree, git-diff-files always looks
diff --git a/Documentation/git-diff-helper.txt b/Documentation/git-diff-helper.txt
--- a/Documentation/git-diff-helper.txt
+++ b/Documentation/git-diff-helper.txt
@@ -9,7 +9,7 @@
 
 SYNOPSIS
 --------
-'git-diff-helper' [-z] [-R] [-r]
+'git-diff-helper' [-z] [-R] [-M]
 
 DESCRIPTION
 -----------
@@ -31,7 +31,7 @@
 	would show a diff to bring the working file back to what
 	is in the <tree>.
 
--r::
+-M::
 	Detect renames.
 
 
diff --git a/Documentation/git-diff-tree.txt b/Documentation/git-diff-tree.txt
--- a/Documentation/git-diff-tree.txt
+++ b/Documentation/git-diff-tree.txt
@@ -9,7 +9,7 @@
 
 SYNOPSIS
 --------
-'git-diff-tree' [-p] [-r] [-z] [--stdin] [-m] [-s] [-v] <tree-ish> <tree-ish> [<pattern>]\*
+'git-diff-tree' [-p] [-r] [-z] [--stdin] [-M] [-m] [-s] [-v] <tree-ish> <tree-ish> [<pattern>]\*
 
 DESCRIPTION
 -----------
@@ -33,6 +33,9 @@
 	generate patch (see section on generating patches).  For
 	git-diff-tree, this flag implies '-r' as well.
 
+-M::
+	Detect renames; implies -p, in turn implying also '-r'.
+
 -r::
 	recurse
 
diff --git a/diff-cache.c b/diff-cache.c
--- a/diff-cache.c
+++ b/diff-cache.c
@@ -5,6 +5,7 @@
 static int generate_patch = 0;
 static int match_nonexisting = 0;
 static int line_termination = '\n';
+static int detect_rename = 0;
 
 /* A file entry went away or appeared */
 static void show_file(const char *prefix, struct cache_entry *ce, unsigned char *sha1, unsigned int mode)
@@ -165,13 +166,14 @@
 }
 
 static char *diff_cache_usage =
-"git-diff-cache [-p] [-r] [-z] [-m] [--cached] <tree sha1>";
+"git-diff-cache [-p] [-r] [-z] [-m] [-M] [--cached] <tree-ish>";
 
 int main(int argc, char **argv)
 {
 	unsigned char tree_sha1[20];
 	void *tree;
 	unsigned long size;
+	int ret;
 
 	read_cache();
 	while (argc > 2) {
@@ -186,6 +188,10 @@
 			generate_patch = 1;
 			continue;
 		}
+		if (!strcmp(arg, "-M")) {
+			generate_patch = detect_rename = 1;
+			continue;
+		}
 		if (!strcmp(arg, "-z")) {
 			line_termination = '\0';
 			continue;
@@ -204,6 +210,9 @@
 	if (argc != 2 || get_sha1(argv[1], tree_sha1))
 		usage(diff_cache_usage);
 
+	if (generate_patch)
+		diff_setup(detect_rename, 0, 0, 0, 0);
+
 	mark_merge_entries();
 
 	tree = read_object_with_reference(tree_sha1, "tree", &size, 0);
@@ -212,5 +221,8 @@
 	if (read_tree(tree, size, 1))
 		die("unable to read tree object %s", argv[1]);
 
-	return diff_cache(active_cache, active_nr);
+	ret = diff_cache(active_cache, active_nr);
+	if (generate_patch)
+		diff_flush();
+	return ret;
 }
diff --git a/diff-files.c b/diff-files.c
--- a/diff-files.c
+++ b/diff-files.c
@@ -7,10 +7,11 @@
 #include "diff.h"
 
 static const char *diff_files_usage =
-"diff-files [-p] [-q] [-r] [-z] [paths...]";
+"diff-files [-p] [-q] [-r] [-z] [-M] [paths...]";
 
 static int generate_patch = 0;
 static int line_termination = '\n';
+static int detect_rename = 0;
 static int silent = 0;
 
 static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
@@ -79,6 +80,9 @@
 			; /* no-op */
 		else if (!strcmp(argv[1], "-z"))
 			line_termination = 0;
+		else if (!strcmp(argv[1], "-M")) {
+			detect_rename = generate_patch = 1;
+		}
 		else
 			usage(diff_files_usage);
 		argv++; argc--;
@@ -92,6 +96,9 @@
 		exit(1);
 	}
 
+	if (generate_patch)
+		diff_setup(detect_rename, 0, 0, 0, 0);		
+
 	for (i = 0; i < entries; i++) {
 		struct stat st;
 		unsigned int oldmode, mode;
@@ -132,5 +139,7 @@
 		show_modified(oldmode, mode, ce->sha1, null_sha1,
 			      ce->name);
 	}
+	if (generate_patch)
+		diff_flush();
 	return 0;
 }
diff --git a/diff-helper.c b/diff-helper.c
--- a/diff-helper.c
+++ b/diff-helper.c
@@ -6,160 +6,23 @@
 #include "strbuf.h"
 #include "diff.h"
 
-static int matches_pathspec(const char *name, const char **spec, int cnt)
-{
-	int i;
-	int namelen = strlen(name);
-	for (i = 0; i < cnt; i++) {
-		int speclen = strlen(spec[i]);
-		if (! strncmp(spec[i], name, speclen) &&
-		    speclen <= namelen &&
-		    (name[speclen] == 0 ||
-		     name[speclen] == '/'))
-			return 1;
-	}
-	return 0;
-}
-
 static int detect_rename = 0;
 
-/*
- * We do not detect circular renames.  Just hold created and deleted
- * entries and later attempt to match them up.  If they do not match,
- * then spit them out as deletes or creates as original.
- */
-
-static struct diff_spec_hold {
-	struct diff_spec_hold *next;
-	struct diff_spec_hold *matched;
-	struct diff_spec old, new;
-	char path[1];
-} *createdfile, *deletedfile;
-
-static void hold_spec(const char *path,
-		      struct diff_spec *old, struct diff_spec *new)
-{
-	struct diff_spec_hold **list, *elem;
-	list = (! old->file_valid) ? &createdfile : &deletedfile;
-	elem = xmalloc(sizeof(*elem) + strlen(path));
-	strcpy(elem->path, path);
-	elem->next = *list;
-	*list = elem;
-	elem->old = *old;
-	elem->new = *new;
-	elem->matched = 0;
-}
-
-#define MINIMUM_SCORE 7000
-int estimate_similarity(struct diff_spec *one, struct diff_spec *two)
-{
-	/* Return how similar they are, representing the score as an
-	 * integer between 0 and 10000.
-	 *
-	 * This version is very dumb and detects exact matches only.
-	 * Wnen Nico's delta stuff gets in, I'll use the delta
-	 * algorithm to estimate the similarity score in core.
-	 */
-
-	if (one->sha1_valid && two->sha1_valid &&
-	    !memcmp(one->blob_sha1, two->blob_sha1, 20))
-		return 10000;
-	return 0;
-}
-
-static void flush_renames(const char **spec, int cnt, int reverse)
-{
-	struct diff_spec_hold *rename_src, *rename_dst, *elem;
-	struct diff_spec_hold *leftover = NULL;
-	int score, best_score;
-
-	while (createdfile) {
-		rename_dst = createdfile;
-		createdfile = rename_dst->next;
-		best_score = MINIMUM_SCORE;
-		rename_src = NULL;
-		for (elem = deletedfile;
-		     elem;
-		     elem = elem->next) {
-			if (elem->matched)
-				continue;
-			score = estimate_similarity(&elem->old,
-						    &rename_dst->new);
-			if (best_score < score) {
-				rename_src = elem;
-				best_score = score;
-			}
-		}
-		if (rename_src) {
-			rename_src->matched = rename_dst;
-			rename_dst->matched = rename_src;
-
-			if (!cnt ||
-			    matches_pathspec(rename_src->path, spec, cnt) ||
-			    matches_pathspec(rename_dst->path, spec, cnt)) {
-				if (reverse)
-					run_external_diff(rename_dst->path,
-							  rename_src->path,
-							  &rename_dst->new,
-							  &rename_src->old);
-				else
-					run_external_diff(rename_src->path,
-							  rename_dst->path,
-							  &rename_src->old,
-							  &rename_dst->new);
-			}
-		}
-		else {
-			rename_dst->next = leftover;
-			leftover = rename_dst;
-		}
-	}
-
-	/* unmatched deletes */
-	for (elem = deletedfile; elem; elem = elem->next) {
-		if (elem->matched)
-			continue;
-		if (!cnt ||
-		    matches_pathspec(elem->path, spec, cnt)) {
-			if (reverse)
-				run_external_diff(elem->path, NULL,
-						  &elem->new, &elem->old);
-			else
-				run_external_diff(elem->path, NULL,
-						  &elem->old, &elem->new);
-		}
-	}
-
-	/* unmatched creates */
-	for (elem = leftover; elem; elem = elem->next) {
-		if (!cnt ||
-		    matches_pathspec(elem->path, spec, cnt)) {
-			if (reverse)
-				run_external_diff(elem->path, NULL,
-						  &elem->new, &elem->old);
-			else
-				run_external_diff(elem->path, NULL,
-						  &elem->old, &elem->new);
-		}
-	}
-}
-
-static int parse_oneside_change(const char *cp, struct diff_spec *one,
-				char *path)
+static int parse_oneside_change(const char *cp, int *mode,
+				unsigned char *sha1, char *path)
 {
-	int ch;
+	int ch, m;
 
-	one->file_valid = one->sha1_valid = 1;
-	one->mode = 0;
+	m = 0;
 	while ((ch = *cp) && '0' <= ch && ch <= '7') {
-		one->mode = (one->mode << 3) | (ch - '0');
+		m = (m << 3) | (ch - '0');
 		cp++;
 	}
-
+	*mode = m;
 	if (strncmp(cp, "\tblob\t", 6))
 		return -1;
 	cp += 6;
-	if (get_sha1_hex(cp, one->blob_sha1))
+	if (get_sha1_hex(cp, sha1))
 		return -1;
 	cp += 40;
 	if (*cp++ != '\t')
@@ -168,79 +31,63 @@
 	return 0;
 }
 
-static int parse_diff_raw_output(const char *buf,
-				 const char **spec, int cnt, int reverse)
+static int parse_diff_raw_output(const char *buf)
 {
-	struct diff_spec old, new;
 	char path[PATH_MAX];
+	unsigned char old_sha1[20], new_sha1[20];
 	const char *cp = buf;
-	int ch;
+	int ch, old_mode, new_mode;
 
 	switch (*cp++) {
 	case 'U':
-		if (!cnt || matches_pathspec(cp + 1, spec, cnt))
-			diff_unmerge(cp + 1);
-		return 0;
+		diff_unmerge(cp + 1);
+		break;
 	case '+':
-		old.file_valid = 0;
-		parse_oneside_change(cp, &new, path);
+		parse_oneside_change(cp, &new_mode, new_sha1, path);
+		diff_addremove('+', new_mode, new_sha1, path, NULL);
 		break;
 	case '-':
-		new.file_valid = 0;
-		parse_oneside_change(cp, &old, path);
+		parse_oneside_change(cp, &old_mode, old_sha1, path);
+		diff_addremove('-', old_mode, old_sha1, path, NULL);
 		break;
 	case '*':
-		old.file_valid = old.sha1_valid =
-			new.file_valid = new.sha1_valid = 1;
-		old.mode = new.mode = 0;
+		old_mode = new_mode = 0;
 		while ((ch = *cp) && ('0' <= ch && ch <= '7')) {
-			old.mode = (old.mode << 3) | (ch - '0');
+			old_mode = (old_mode << 3) | (ch - '0');
 			cp++;
 		}
 		if (strncmp(cp, "->", 2))
 			return -1;
 		cp += 2;
 		while ((ch = *cp) && ('0' <= ch && ch <= '7')) {
-			new.mode = (new.mode << 3) | (ch - '0');
+			new_mode = (new_mode << 3) | (ch - '0');
 			cp++;
 		}
 		if (strncmp(cp, "\tblob\t", 6))
 			return -1;
 		cp += 6;
-		if (get_sha1_hex(cp, old.blob_sha1))
+		if (get_sha1_hex(cp, old_sha1))
 			return -1;
 		cp += 40;
 		if (strncmp(cp, "->", 2))
 			return -1;
 		cp += 2;
-		if (get_sha1_hex(cp, new.blob_sha1))
+		if (get_sha1_hex(cp, new_sha1))
 			return -1;
 		cp += 40;
 		if (*cp++ != '\t')
 			return -1;
 		strcpy(path, cp);
+		diff_change(old_mode, new_mode, old_sha1, new_sha1, path, 0);
 		break;
 	default:
 		return -1;
 	}
-
-	if (detect_rename && old.file_valid != new.file_valid) {
-		/* hold these */
-		hold_spec(path, &old, &new);
-		return 0;
-	}
-
-	if (!cnt || matches_pathspec(path, spec, cnt)) {
-		if (reverse)
-			run_external_diff(path, NULL, &new, &old);
-		else
-			run_external_diff(path, NULL, &old, &new);
-	}
 	return 0;
 }
 
 static const char *diff_helper_usage =
-	"git-diff-helper [-r] [-R] [-z] paths...";
+	"git-diff-helper [-z] [-R] [-M] paths...";
 
 int main(int ac, const char **av) {
 	struct strbuf sb;
@@ -254,7 +101,7 @@
 			reverse = 1;
 		else if (av[1][1] == 'z')
 			line_termination = 0;
-		else if (av[1][1] == 'r')
+		else if (av[1][1] == 'M')
 			detect_rename = 1;
 		else
 			usage(diff_helper_usage);
@@ -262,18 +109,20 @@
 	}
 	/* the remaining parameters are paths patterns */
 
+	diff_setup(detect_rename, 0, reverse, av+1, ac-1);
+
 	while (1) {
 		int status;
 		read_line(&sb, stdin, line_termination);
 		if (sb.eof)
 			break;
-		status = parse_diff_raw_output(sb.buf, av+1, ac-1, reverse);
+		status = parse_diff_raw_output(sb.buf);
 		if (status) {
-			flush_renames(av+1, ac-1, reverse);
+			diff_flush();
 			printf("%s%c", sb.buf, line_termination);
 		}
 	}
 
-	flush_renames(av+1, ac-1, reverse);
+	diff_flush();
 	return 0;
 }
diff --git a/diff-tree.c b/diff-tree.c
--- a/diff-tree.c
+++ b/diff-tree.c
@@ -9,6 +9,7 @@
 static int read_stdin = 0;
 static int line_termination = '\n';
 static int generate_patch = 0;
+static int detect_rename = 0;
 static const char *header = NULL;
 static const char *header_prefix = "";
 
@@ -281,6 +282,18 @@
 	return retval;
 }
 
+static int diff_tree_sha1_top(const unsigned char *old,
+			      const unsigned char *new, const char *base)
+{
+	int ret;
+	if (generate_patch)
+		diff_setup(detect_rename, 0, 0, 0, 0);
+	ret = diff_tree_sha1(old, new, base);
+	if (generate_patch)
+		diff_flush();
+	return ret;
+}
+
 static int get_one_line(const char *msg, unsigned long len)
 {
 	int ret = 0;
@@ -377,7 +390,7 @@
 		if (get_sha1_hex(buf + offset + 7, parent))
 			return -1;
 		header = generate_header(name, sha1_to_hex(parent), buf, size);
-		diff_tree_sha1(parent, commit, "");
+		diff_tree_sha1_top(parent, commit, "");
 		if (!header && verbose_header)
 			header_prefix = "\ndiff-tree ";
 		offset += 48;
@@ -401,14 +414,14 @@
 		line[81] = 0;
 		sprintf(this_header, "%s (from %s)\n", line, line+41);
 		header = this_header;
-		return diff_tree_sha1(parent, commit, "");
+		return diff_tree_sha1_top(parent, commit, "");
 	}
 	line[40] = 0;
 	return diff_tree_commit(commit, line);
 }
 
 static char *diff_tree_usage =
-"diff-tree [-p] [-r] [-z] [--stdin] [-m] [-s] [-v] <tree sha1> <tree sha1>";
+"diff-tree [-p] [-r] [-z] [--stdin] [-M] [-m] [-s] [-v] <tree-ish> <tree-ish>";
 
 int main(int argc, char **argv)
 {
@@ -447,6 +460,10 @@
 			recursive = generate_patch = 1;
 			continue;
 		}
+		if (!strcmp(arg, "-M")) {
+			detect_rename = recursive = generate_patch = 1;
+			continue;
+		}
 		if (!strcmp(arg, "-z")) {
 			line_termination = '\0';
 			continue;
@@ -490,7 +507,7 @@
 		diff_tree_commit(sha1[0], NULL);
 		break;
 	case 2:
-		diff_tree_sha1(sha1[0], sha1[1], "");
+		diff_tree_sha1_top(sha1[0], sha1[1], "");
 		break;
 	}
 
diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -7,8 +7,10 @@
 #include <limits.h>
 #include "cache.h"
 #include "diff.h"
+#include "delta.h"
 
 static const char *diff_opts = "-pu";
+static unsigned char null_sha1[20] = { 0, };
 
 static const char *external_diff(void)
 {
@@ -79,6 +81,18 @@
 	char tmp_path[50];
 } diff_temp[2];
 
+struct diff_spec {
+	unsigned char blob_sha1[20];
+	unsigned short mode;	 /* file mode */
+	unsigned sha1_valid : 1; /* if true, use blob_sha1 and trust mode;
+				  * however with a NULL SHA1, read them
+				  * from the file system.
+				  * if false, use the name and read mode from
+				  * the filesystem.
+				  */
+	unsigned file_valid : 1; /* if false the file does not even exist */
+};
+
 static void builtin_diff(const char *name_a,
 			 const char *name_b,
 			 struct diff_tempfile *temp)
@@ -160,7 +174,7 @@
 	struct cache_entry *ce;
 	struct stat st;
 	int pos, len;
-	
+
 	/* We do not read the cache ourselves here, because the
 	 * benchmark with my previous version that always reads cache
 	 * shows that it makes things worse for diff-tree comparing
@@ -214,9 +228,6 @@
 			      struct diff_tempfile *temp,
 			      struct diff_spec *one)
 {
-	static unsigned char null_sha1[20] = { 0, };
-	int use_work_tree = 0;
-
 	if (!one->file_valid) {
 	not_a_valid_file:
 		/* A '-' entry produces this for file-2, and
@@ -228,12 +239,8 @@
 		return;
 	}
 
-	if (one->sha1_valid &&
-	    (!memcmp(one->blob_sha1, null_sha1, sizeof(null_sha1)) ||
-	     work_tree_matches(name, one->blob_sha1)))
-		use_work_tree = 1;
-
-	if (!one->sha1_valid || use_work_tree) {
+	if (!one->sha1_valid ||
+	    work_tree_matches(name, one->blob_sha1)) {
 		struct stat st;
 		temp->name = name;
 		if (lstat(temp->name, &st) < 0) {
@@ -295,22 +302,59 @@
 	remove_tempfile();
 }
 
+static int detect_rename;
+static int reverse_diff;
+static const char **pathspec;
+static int speccnt;
+static int diff_rename_minimum_score;
+
+static int matches_pathspec(const char *name)
+{
+	int i;
+	int namelen;
+
+	if (speccnt == 0)
+		return 1;
+
+	namelen = strlen(name);
+	for (i = 0; i < speccnt; i++) {
+		int speclen = strlen(pathspec[i]);
+		if (! strncmp(pathspec[i], name, speclen) &&
+		    speclen <= namelen &&
+		    (name[speclen] == 0 || name[speclen] == '/'))
+			return 1;
+	}
+	return 0;
+}
+
 /* An external diff command takes:
  *
  * diff-cmd name infile1 infile1-sha1 infile1-mode \
- *               infile2 infile2-sha1 infile2-mode.
+ *               infile2 infile2-sha1 infile2-mode [ rename-to ]
  *
  */
-void run_external_diff(const char *name,
-		       const char *other,
-		       struct diff_spec *one,
-		       struct diff_spec *two)
+static void run_external_diff(const char *name,
+			      const char *other,
+			      struct diff_spec *one,
+			      struct diff_spec *two)
 {
 	struct diff_tempfile *temp = diff_temp;
 	pid_t pid;
 	int status;
 	static int atexit_asked = 0;
 
+	if (reverse_diff) {
+		struct diff_spec *tmp_spec;
+		tmp_spec = one; one = two; two = tmp_spec;
+		if (other) {
+			const char *tmp;
+			tmp = name; name = other; other = tmp;
+		}
+	}
+
+	if (!matches_pathspec(name) && (!other || !matches_pathspec(other)))
+		return;
+
 	if (one && two) {
 		prepare_temp_file(name, &temp[0], one);
 		prepare_temp_file(other ? : name, &temp[1], two);
@@ -329,14 +373,23 @@
 		die("unable to fork");
 	if (!pid) {
 		const char *pgm = external_diff();
-		/* not passing rename patch to external ones */
-		if (!other && pgm) {
-			if (one && two)
-				execlp(pgm, pgm,
-				       name,
-				       temp[0].name, temp[0].hex, temp[0].mode,
-				       temp[1].name, temp[1].hex, temp[1].mode,
-				       NULL);
+		if (pgm) {
+			if (one && two) {
+				const char *exec_arg[9];
+				const char **arg = &exec_arg[0];
+				*arg++ = pgm;
+				*arg++ = name;
+				*arg++ = temp[0].name;
+				*arg++ = temp[0].hex;
+				*arg++ = temp[0].mode;
+				*arg++ = temp[1].name;
+				*arg++ = temp[1].hex;
+				*arg++ = temp[1].mode;
+				if (other)
+					*arg++ = other;
+				*arg = 0;
+				execvp(pgm, (char *const*) exec_arg);
+			}
 			else
 				execlp(pgm, pgm, name, NULL);
 		}
@@ -367,6 +420,293 @@
 	remove_tempfile();
 }
 
+/*
+ * We do not detect circular renames.  Just hold created and deleted
+ * entries and later attempt to match them up.  If they do not match,
+ * then spit them out as deletes or creates as original.
+ */
+
+static struct diff_spec_hold {
+	struct diff_spec_hold *next;
+	struct diff_spec it;
+	unsigned long size;
+	int flags;
+#define MATCHED 1
+#define SHOULD_FREE 2
+#define SHOULD_MUNMAP 4
+	void *data;
+	char path[1];
+} *createdfile, *deletedfile;
+
+static void hold_diff(const char *name,
+		      struct diff_spec *one,
+		      struct diff_spec *two)
+{
+	struct diff_spec_hold **list, *elem;
+
+	if (one->file_valid && two->file_valid)
+		die("internal error");
+
+	if (!detect_rename) {
+		run_external_diff(name, NULL, one, two);
+		return;
+	}
+	elem = xmalloc(sizeof(*elem) + strlen(name));
+	strcpy(elem->path, name);
+	elem->size = 0;
+	elem->data = NULL;
+	elem->flags = 0;
+	if (one->file_valid) {
+		list = &deletedfile;
+		elem->it = *one;
+	}
+	else {
+		list = &createdfile;
+		elem->it = *two;
+	}
+	elem->next = *list;
+	*list = elem;
+}
+
+static int populate_data(struct diff_spec_hold *s)
+{
+	char type[20];
+
+	if (s->data)
+		return 0;
+	if (s->it.sha1_valid) {
+		s->data = read_sha1_file(s->it.blob_sha1, type, &s->size);
+		s->flags |= SHOULD_FREE;
+	}
+	else {
+		struct stat st;
+		int fd;
+		fd = open(s->path, O_RDONLY);
+		if (fd < 0)
+			return -1;
+		if (fstat(fd, &st)) {
+			close(fd);
+			return -1;
+		}
+		s->size = st.st_size;
+		s->data = mmap(NULL, s->size, PROT_READ, MAP_PRIVATE, fd, 0);
+		close(fd);
+		if (!s->size)
+			s->data = "";
+		else
+			s->flags |= SHOULD_MUNMAP;
+	}
+	return 0;
+}
+
+static void free_data(struct diff_spec_hold *s)
+{
+	if (s->flags & SHOULD_FREE)
+		free(s->data);
+	else if (s->flags & SHOULD_MUNMAP)
+		munmap(s->data, s->size);
+	s->flags &= ~(SHOULD_FREE|SHOULD_MUNMAP);
+}
+
+static void flush_remaining_diff(struct diff_spec_hold *elem,
+				 int on_created_list)
+{
+	static struct diff_spec null_file_spec;
+
+	null_file_spec.file_valid = 0;
+	for ( ; elem ; elem = elem->next) {
+		free_data(elem);
+		if (elem->flags & MATCHED)
+			continue;
+		if (on_created_list)
+			run_external_diff(elem->path, NULL,
+					  &null_file_spec, &elem->it);
+		else
+			run_external_diff(elem->path, NULL,
+					  &elem->it, &null_file_spec);
+	}
+}
+
+static int is_exact_match(struct diff_spec_hold *src,
+			  struct diff_spec_hold *dst)
+{
+	if (src->it.sha1_valid && dst->it.sha1_valid &&
+	    !memcmp(src->it.blob_sha1, dst->it.blob_sha1, 20))
+		return 1;
+	if (populate_data(src) || populate_data(dst))
+		/* this is an error but will be caught downstream */
+		return 0;
+	if (src->size == dst->size &&
+	    !memcmp(src->data, dst->data, src->size))
+		return 1;
+	return 0;
+}
+
+#define MINIMUM_SCORE 5000
+int estimate_similarity(struct diff_spec_hold *src, struct diff_spec_hold *dst)
+{
+	/* src points at a deleted file and dst points at a created
+	 * file.  They may be quite similar, in which case we want to
+	 * say src is renamed to dst.
+	 *
+	 * Compare them and return how similar they are, representing
+	 * the score as an integer between 0 and 10000.  10000 is
+	 * reserved for the case where they match exactly.
+	 */
+	void *delta;
+	unsigned long delta_size;
+
+	delta_size = ((src->size < dst->size) ?
+		      (dst->size - src->size) : (src->size - dst->size));
+
+	/* We would not consider rename followed by more than
+	 * 20% edits; that is, delta_size must be smaller than
+	 * (src->size + dst->size)/2 * 0.2, which means...
+	 */
+	if ((src->size + dst->size) < delta_size * 10)
+		return 0;
+
+	delta = diff_delta(src->data, src->size,
+			   dst->data, dst->size,
+			   &delta_size);
+	free(delta);
+
+	/* This "delta" is really xdiff with adler32 and all the
+	 * overheads but it is a quick and dirty approximation.
+	 *
+	 * Now we will give some score to it.  Let's say 20% edit gets
+	 * 5000 points and 0% edit gets 9000 points.  That is, every
+	 * 1/20000 edit gets 1 point penalty.  The amount of penalty is:
+	 *
+	 * (delta_size * 2 / (src->size + dst->size)) * 20000
+	 *
+	 */
+	return 9000 - (40000 * delta_size / (src->size+dst->size));
+}
+
+struct diff_score {
+	struct diff_spec_hold *src;
+	struct diff_spec_hold *dst;
+	int score;
+};
+
+static int score_compare(const void *a_, const void *b_)
+{
+	const struct diff_score *a = a_, *b = b_;
+	return b->score - a->score;
+}
+
+static void flush_rename_pair(struct diff_spec_hold *src,
+			      struct diff_spec_hold *dst)
+{
+	src->flags |= MATCHED;
+	dst->flags |= MATCHED;
+	free_data(src);
+	free_data(dst);
+	run_external_diff(src->path, dst->path,
+			  &src->it, &dst->it);
+}
+
+static void free_held_diff(struct diff_spec_hold *list)
+{
+	struct diff_spec_hold *h;
+	for (h = list; list; list = h) {
+		h = list->next;
+		free_data(list);
+		free(list);
+	}
+}
+
+void diff_flush(void)
+{
+	int num_create, num_delete, c, d;
+	struct diff_spec_hold *elem, *src, *dst;
+	struct diff_score *mx;
+
+	/* We really want to cull the candidates list early
+	 * with cheap tests in order to avoid doing deltas.
+	 */
+	for (dst = createdfile; dst; dst = dst->next) {
+		for (src = deletedfile; src; src = src->next) {
+			if (! is_exact_match(src, dst))
+				continue;
+			flush_rename_pair(src, dst);
+			break;
+		}
+	}
+
+	/* Count surviving candidates */
+	for (num_create = 0, elem = createdfile; elem; elem = elem->next)
+		if (!(elem->flags & MATCHED))
+			num_create++;
+
+	for (num_delete = 0, elem = deletedfile; elem; elem = elem->next)
+		if (!(elem->flags & MATCHED))
+			num_delete++;
+
+	if (num_create == 0 ||  num_delete == 0)
+		goto exit_path;
+
+	mx = xmalloc(sizeof(*mx) * num_create * num_delete);
+	for (c = 0, dst = createdfile; dst; dst = dst->next) {
+		int base = c * num_delete;
+		if (dst->flags & MATCHED)
+			continue;
+		for (d = 0, src = deletedfile; src; src = src->next) {
+			struct diff_score *m = &mx[base+d];
+			if (src->flags & MATCHED)
+				continue;
+			m->src = src;
+			m->dst = dst;
+			m->score = estimate_similarity(src, dst);
+			d++;
+		}
+		c++;
+	}
+	qsort(mx, num_create*num_delete, sizeof(*mx), score_compare); 
+
+ 	for (c = 0; c < num_create * num_delete; c++) {
+		src = mx[c].src;
+		dst = mx[c].dst;
+		if ((src->flags & MATCHED) || (dst->flags & MATCHED))
+			continue;
+		fprintf(stderr,
+			"**score ** %d %s %s\n",
+			mx[c].score, src->path, dst->path);
+	}
+
+ 	for (c = 0; c < num_create * num_delete; c++) {
+		src = mx[c].src;
+		dst = mx[c].dst;
+		if ((src->flags & MATCHED) || (dst->flags & MATCHED))
+			continue;
+		if (mx[c].score < diff_rename_minimum_score)
+			break;
+		flush_rename_pair(src, dst);
+	}
+
+ exit_path:
+	flush_remaining_diff(createdfile, 1);
+	flush_remaining_diff(deletedfile, 0);
+	free_held_diff(createdfile);
+	free_held_diff(deletedfile);
+	createdfile = deletedfile = NULL;
+}
+
+void diff_setup(int detect_rename_, int minimum_score_, int reverse_diff_,
+		const char **pathspec_, int speccnt_)
+{
+	free_held_diff(createdfile);
+	free_held_diff(deletedfile);
+	createdfile = deletedfile = NULL;
+
+	detect_rename = detect_rename_;
+	reverse_diff = reverse_diff_;
+	pathspec = pathspec_;
+	speccnt = speccnt_;
+	diff_rename_minimum_score = minimum_score_ ? : MINIMUM_SCORE;
+}
+
 void diff_addremove(int addremove, unsigned mode,
 		    const unsigned char *sha1,
 		    const char *base, const char *path)
@@ -376,7 +716,8 @@
 
 	memcpy(spec[0].blob_sha1, sha1, 20);
 	spec[0].mode = mode;
-	spec[0].sha1_valid = spec[0].file_valid = 1;
+	spec[0].sha1_valid = !!memcmp(sha1, null_sha1, 20);
+	spec[0].file_valid = 1;
 	spec[1].file_valid = 0;
 
 	if (addremove == '+') {
@@ -384,12 +725,12 @@
 	} else {
 		one = spec; two = one + 1;
 	}
-	
+
 	if (path) {
 		strcpy(concatpath, base);
 		strcat(concatpath, path);
 	}
-	run_external_diff(path ? concatpath : base, NULL, one, two);
+	hold_diff(path ? concatpath : base, one, two);
 }
 
 void diff_change(unsigned old_mode, unsigned new_mode,
@@ -399,17 +740,22 @@
 	char concatpath[PATH_MAX];
 	struct diff_spec spec[2];
 
+	if (path) {
+		strcpy(concatpath, base);
+		strcat(concatpath, path);
+	}
+
 	memcpy(spec[0].blob_sha1, old_sha1, 20);
 	spec[0].mode = old_mode;
 	memcpy(spec[1].blob_sha1, new_sha1, 20);
 	spec[1].mode = new_mode;
-	spec[0].sha1_valid = spec[0].file_valid = 1;
-	spec[1].sha1_valid = spec[1].file_valid = 1;
+	spec[0].sha1_valid = !!memcmp(old_sha1, null_sha1, 20);
+	spec[1].sha1_valid = !!memcmp(new_sha1, null_sha1, 20);
+	spec[1].file_valid = spec[0].file_valid = 1;
 
-	if (path) {
-		strcpy(concatpath, base);
-		strcat(concatpath, path);
-	}
+	/* We do not look at changed files as candidate for
+	 * rename detection ever.
+	 */
 	run_external_diff(path ? concatpath : base, NULL, &spec[0], &spec[1]);
 }
 
diff --git a/diff.h b/diff.h
--- a/diff.h
+++ b/diff.h
@@ -17,21 +17,10 @@
 
 extern void diff_unmerge(const char *path);
 
-/* These are for diff-helper */
+extern void diff_setup(int detect_rename, int minimum_score,
+		       int reverse,
+		       const char **spec, int cnt);
 
-struct diff_spec {
-	unsigned char blob_sha1[20];
-	unsigned short mode;	 /* file mode */
-	unsigned sha1_valid : 1; /* if true, use blob_sha1 and trust mode;
-				  * however with a NULL SHA1, read them
-				  * from the file system.
-				  * if false, use the name and read mode from
-				  * the filesystem.
-				  */
-	unsigned file_valid : 1; /* if false the file does not even exist */
-};
-
-extern void run_external_diff(const char *name, const char *other,
-			      struct diff_spec *, struct diff_spec *);
+extern void diff_flush(void);
 
 #endif /* DIFF_H */
diff --git a/git-apply-patch-script b/git-apply-patch-script
--- a/git-apply-patch-script
+++ b/git-apply-patch-script
@@ -11,6 +11,9 @@
 1)
     echo >&2 "cannot handle unmerged diff on path $1."
     exit 1 ;;
+8)
+    echo >&2 "cannot handle rename diff between $1 and $8 yet."
+    exit 1 ;;
 esac
 name="$1" tmp1="$2" hex1="$3" mode1="$4" tmp2="$5" hex2="$6" mode2="$7"
 
diff --git a/t/t4001-diff-rename.sh b/t/t4001-diff-rename.sh
new file mode 100644
--- /dev/null
+++ b/t/t4001-diff-rename.sh
@@ -0,0 +1,60 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='Test rename detection in diff engine.
+
+'
+. ./test-lib.sh
+
+echo >path0 'Line 1
+Line 2
+Line 3
+Line 4
+Line 5
+Line 6
+Line 7
+Line 8
+Line 9
+Line 10
+line 11
+Line 12
+Line 13
+Line 14
+Line 15
+'
+
+test_expect_success \
+    'update-cache --add a file.' \
+    'git-update-cache --add path0'
+
+test_expect_success \
+    'write that tree.' \
+    'tree=$(git-write-tree)'
+
+sed -e 's/line/Line/' <path0 >path1
+rm -f path0
+test_expect_success \
+    'renamed and edited the file.' \
+    'git-update-cache --add --remove path0 path1'
+
+test_expect_success \
+    'git-diff-cache -p -M after rename and editing.' \
+    'git-diff-cache -p -M $tree >current'
+cat >expected <<\EOF
+diff --git a/path0 b/path1
+rename old path0
+rename new path1
+--- a/path0
++++ b/path1
+@@ -8,7 +8,7 @@ Line 7
+ Line 8
+ Line 9
+ Line 10
+-line 11
++Line 11
+ Line 12
+ Line 13
+ Line 14
+EOF
------------------------------------------------


^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox