Git development
 help / color / mirror / Atom feed
* Re: [PATCH] Make sure diff-helper can tell rename/copy in the new diff-raw format.
From: Junio C Hamano @ 2005-05-23 18:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <Pine.LNX.4.58.0505230736180.2307@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> Btw, I still disagree with this notion that the order of the
LT> use of the names makes a difference.

Having slept over it, I think I tend to agree.  I do not mind
annotating diff-raw output with "is this copy or is this rename"
bit anymore.  While we are at it, I would also want to either
(1) add similarity index field to diff-raw output, or (2) drop
similarity index output from the built-in patch output.  I am
inclined to vote for the former right now (if only it is more
fun to watch), but I can easily be dissuaded.

The proposed diff-raw format, in its fully expended form, is this:

in-place edit  :100644 100644 bcd1234... 0123456... file0 file0 . 0
copy-edit      :100644 100644 abcd123... 1234567... file1 file2 C 68
rename-edit    :100644 100644 abcd123... 1234567... file1 file3 R 86
create         :000000 100644 0000000... 1234567... file4 file4 . 0
delete         :100644 000000 1234567... 0000000... file5 file5 . 0
unmerged       :000000 000000 0000000... 0000000... file6 file6 . 0

The two columns added are rename/copy bit and similarity index.
When one->path and two->path are the same, they do not mean
anything but for parser simplicity's sake I'd like to have 0 in
the similarity index field and a dot in copy/rename bit field.

Human readable form should omit two->path and later fields
altogether if one->path == two->path, so the above becomes:

in-place edit  :100644 100644 bcd1234... 0123456... file0
copy-edit      :100644 100644 abcd123... 1234567... file1 file2 C 68
rename-edit    :100644 100644 abcd123... 1234567... file1 file3 R 86
create         :000000 100644 0000000... 1234567... file4
delete         :100644 000000 1234567... 0000000... file5
unmerged       :000000 000000 0000000... 0000000... file6

This has a nice property that diff-helper, aside from its
diff-raw parsing part, can become quite simplified.  It should
lose rename/copy related flags (-M, -C) because they are already
detected by the tool in the upstream of the pipe; and because
rename-copy is an asymmetric operation, it should also lose the
-R flag.  I think it already does a wrong thing when you use
diff-tree brothers with -M or -C and feed diff-helper -R with
the output that contains already matched rename/copy.

The only thing diff-helper _will_ continue to do is to take a
diff-raw output prepared by diff-tree brothers, and generate
what the upstream tool would have generated if it were given
'-p' (and that should have been the case from the beginning).  

Although I think diffcore transformers other than rename/copy
may still be useful (like pickaxe) in diff-helper, that also can
be handled upstream.


^ permalink raw reply

* Re: git-diff-tree -z HEAD | git-diff-helper -z fails for me
From: Linus Torvalds @ 2005-05-23 18:21 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Glanzmann, GIT
In-Reply-To: <7vwtpp6goy.fsf@assigned-by-dhcp.cox.net>



On Mon, 23 May 2005, Junio C Hamano wrote:
> 
> Tg> 	(faui00u) [~/work/git/yagf] git-diff-tree -z HEAD | git-diff-helper -z
> Tg> 	03c70739ae572ea9832b97dfcf9ca2594702efe8 (from 30e756ea8569bb429b7073b304acd8a960f77e4b)
> Tg> 	:100755 100755 f2e04c3b45b2a5ab5cf53228025158902c9de5be b93fd0310e51eea4d48d42c6ad83399cff8ab56egitgit(faui00u) [~/work/git/yagf]
> 
> Tg> What I am doing wrong here?
> 
> Nothing.  I'm an idiot.
> 
> Linus, do you mind if I say with "-z" git-diff-tree should not
> do _any_ header thing?

How about instead making sure that any "extra" text be NUL-terminated and
never start with ':' after a NUL (which will automatically be true, since
it's either "diff-tree " + ascii for the verbose case, or just the tree
name).

That way git-diff-helper automatically does the right thing when it just 
passes lines (where "lines" is defined as being NUL-terminated) that it 
doesn't understand through unmodified.

		Linus

^ permalink raw reply

* Re: git-diff-tree -z HEAD | git-diff-helper -z fails for me
From: Junio C Hamano @ 2005-05-23 16:38 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: torvalds, GIT
In-Reply-To: <20050523090206.GJ23388@cip.informatik.uni-erlangen.de>

>>>>> "Tg" == Thomas Glanzmann <sithglan@stud.uni-erlangen.de> writes:

Tg> Hello,
Tg> this fails:

Tg> 	(faui00u) [~/work/git/yagf] git-diff-tree -z HEAD | git-diff-helper -z
Tg> 	03c70739ae572ea9832b97dfcf9ca2594702efe8 (from 30e756ea8569bb429b7073b304acd8a960f77e4b)
Tg> 	:100755 100755 f2e04c3b45b2a5ab5cf53228025158902c9de5be b93fd0310e51eea4d48d42c6ad83399cff8ab56egitgit(faui00u) [~/work/git/yagf]

Tg> What I am doing wrong here?

Nothing.  I'm an idiot.

Linus, do you mind if I say with "-z" git-diff-tree should not
do _any_ header thing?



^ permalink raw reply

* Re: cogito - how do I ???
From: Junio C Hamano @ 2005-05-23 16:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Thomas Glanzmann, Sean, Sam Ravnborg, git
In-Reply-To: <Pine.LNX.4.58.0505230731430.2307@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> ..., and we can (and probably should) order rev-tree output 
LT> with some topological sort based on the commit tree.

Seconded.


^ permalink raw reply

* Re: [PATCH] Make sure diff-helper can tell rename/copy in the new diff-raw format.
From: Linus Torvalds @ 2005-05-23 14:49 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git
In-Reply-To: <7vfywe769d.fsf@assigned-by-dhcp.cox.net>



On Mon, 23 May 2005, Junio C Hamano wrote:
>
> This adds tests to make sure that diff-helper can tell renames
> from copies using the same "everything but the last one are
> copies and the last one is either rename or stay" logic.

Btw, I still disagree with this notion that the order of the use of the 
names makes a difference.

I think that when we generate a diff, we should always have the _option_
to make sure that we generate it in a format where you can apply it
"incrementally". But I disagree with the notion that it's something
fundamental, and in fact, I even think that it's not necessarily a good
default.

For example, let's say that you have modified "fileA" _and_ you have 
created a "fileB" that is a copy of the original "fileA" with some _other_ 
slight modifications. We'll call the SHA1's involved "sha_A", "sha_A'" and 
"sha_B"

I think it's perfectly valid to say

	:100644 100644 <sha_A> <sha_A'> M	fileA	fileA
	:100644 100644 <sha_A> <sha_B> C89	fileA	fileB

which says "fileA" was modified from orig-A to new-A, and "fileB" is a 
copy based on orig-A.

(I've used a new syntax just to confuse the issue, with the extra field 
for "what happened", aka "M" for "modify", "C89" for "copy a 89% similar 
file").

Now, when the above is turned into a "diff", that diff is no longer
something you can apply "incrementally" - you have to apply it as if
you're applying all differences to the "original tree". But the thing is,
that's actually what I _want_, because I was planning on writing a tool
that applies patches that applies them all-or-nothing.

Also, it turns out that this kind of "non-incremental" diff is the kind
that I personally want to see as a _human_, because quite frankly, my
brain-capacity is that of a demented ocelot, and I can't _remember_ what
happened in other parts of the diff. I much prefer the stateless "oh, this
file X is in that relation Y to the previous version of file Z".

I do that partly because I actually routinely edit patches. If you have 
the incremental format, that's practically impossible, while the stateless 
version is fine.

See?

So I think all the clever "don't re-use files we have modified" etc is 
actually wrong. If you want to make a traditional diff that can be applied 
with normal "patch", you just don't use the -M or -C flags.

		Linus

^ permalink raw reply

* Re: cogito - how do I ???
From: Linus Torvalds @ 2005-05-23 14:35 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: Sean, Sam Ravnborg, git
In-Reply-To: <20050523071919.GG23388@cip.informatik.uni-erlangen.de>



On Mon, 23 May 2005, Thomas Glanzmann wrote:
> 
> > 	git-rev-tree MERGE_HEAD ^HEAD | git-diff-tree -v -m -s --stdin
> 
> This doesn't work for me:

Yeah, I'm an idiot.

> Bit this does:
> 
> (faui00u) [~/work/git/git] git-rev-tree 2cb45e95438c113871fbbea5b4f629f9463034e7 ^09d74b3b5ac634495e17b92b2b785fa996ffce97 | awk '{print $2'} | sed 's#:.*##' | git-diff-tree -v -m -s --stdin
> 
> was that a typo or is git-diff-tree supposed to handle the output of
> git-rev-tree as well and it is a bug?

It was me just forgetting about the time thing in rev-tree, forcing you to
have a second phase there (I usually use just "cut -d' ' -f2" - the input
_can_ have the ":n" flag thing that rev-tree outputs, and git-diff-tree
will just ignore it).

I actually suspect that whole time thing was a mistake, it seemed sensible 
back when we didn't have any other way of ordering the changesets well, 
but it's really a bad ordering anyway to do it by time (ie add a "sort 
-rn" in there), and we can (and probably should) order rev-tree output 
with some topological sort based on the commit tree.

		Linus

^ permalink raw reply

* git-diff-tree -z HEAD | git-diff-helper -z fails for me
From: Thomas Glanzmann @ 2005-05-23  9:02 UTC (permalink / raw)
  To: GIT

Hello,
this fails:

	(faui00u) [~/work/git/yagf] git-diff-tree -z HEAD | git-diff-helper -z
	03c70739ae572ea9832b97dfcf9ca2594702efe8 (from 30e756ea8569bb429b7073b304acd8a960f77e4b)
	:100755 100755 f2e04c3b45b2a5ab5cf53228025158902c9de5be b93fd0310e51eea4d48d42c6ad83399cff8ab56egitgit(faui00u) [~/work/git/yagf]

But:

	(faui00u) [~/work/git/yagf] git-diff-tree -z HEAD 30e756ea8569bb429b7073b304acd8a960f77e4b | git-diff-helper -z
	diff --git a/git b/git
	--- a/git
	+++ b/git
	@@ -540,11 +540,11 @@ changes

		if (defined ($options{'L'})) {
			pull('-o', shift);
	-               system("git-rev-tree HEAD '^REMOTE_HEAD' | sed -e 's/^[0-9]* //' | git-diff-tree --stdin -v $git_diff_tree_options");
	+               system("git-rev-tree HEAD ^REMOTE_HEAD | sed -e 's/^[0-9]* //' | git-diff-tree --stdin -v $git_diff_tree_options");

		} elsif (defined ($options{'R'})) {
			pull('-o', shift);
	-               system("git-rev-tree REMOTE_HEAD '^HEAD' | sed -e 's/^[0-9]* //' | git-diff-tree --stdin -v $git_diff_tree_options");
	+               system("git-rev-tree REMOTE_HEAD ^HEAD | sed -e 's/^[0-9]* //' | git-diff-tree --stdin -v $git_diff_tree_options");

		} else {
			system("git-rev-list HEAD | git-diff-tree --stdin -v $git_diff_tree_options");

works? What I am doing wrong here?

	Thomas

^ permalink raw reply

* [PATCH 1/1] bugfix for git-checkout-cache --prefix=/symlink/export_dir/ -a
From: David Greaves @ 2005-05-23  8:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

If there's a prefix then allow symlinks to directories in it.
This fixes a bug where
  git-checkout-cache --prefix=/symlink/export_dir/ -a
otherwise fails.

Signed-off-by: David Greaves <david@dgreaves.com>

---
commit 18006ac69ae9db97b7d96fd0bfb1bdb6893e318d
tree 37580a3d32262d2cee7d28b20474c0f2ceb6faaa
parent 2aef5bbae99aeba3551408eae13faea02bf55b67
author David Greaves <david@dgreaves.com> Mon, 23 May 2005 09:42:04 +0100
committer David Greaves <david@dgreaves.com> Mon, 23 May 2005 09:42:04 +0100

 checkout-cache.c |    8 ++++++--
 1 files changed, 6 insertions(+), 2 deletions(-)

Index: checkout-cache.c
===================================================================
--- 99282828d5af15b0af0d0eac13a5b1194e88342c/checkout-cache.c  (mode:100644)
+++ 37580a3d32262d2cee7d28b20474c0f2ceb6faaa/checkout-cache.c  (mode:100644)
@@ -37,20 +37,25 @@
 #include "cache.h"
 
 static int force = 0, quiet = 0, not_new = 0, refresh_cache = 0;
+const char *base_dir = "";
 
 static void create_directories(const char *path)
 {
 	int len = strlen(path);
 	char *buf = xmalloc(len + 1);
 	const char *slash = path;
+	int baselen = strlen(base_dir);
 
 	while ((slash = strchr(slash+1, '/')) != NULL) {
+		struct stat st;
 		len = slash - path;
 		memcpy(buf, path, len);
 		buf[len] = 0;
+		if (slash - path <= baselen &&
+		    !stat(buf, &st) && S_ISDIR(st.st_mode))
+			continue; /* allow symlinks only in --prefix */
 		if (mkdir(buf, 0755)) {
 			if (errno == EEXIST) {
-				struct stat st;
 				if (!lstat(buf, &st) && S_ISDIR(st.st_mode))
 					continue; /* ok */
 				if (force && !unlink(buf) && !mkdir(buf, 0755))
@@ -229,7 +234,6 @@
 int main(int argc, char **argv)
 {
 	int i, force_filename = 0;
-	const char *base_dir = "";
 	struct cache_file cache_file;
 	int newfd = -1;
 

^ permalink raw reply

* Re: cogito - how do I ???
From: Thomas Glanzmann @ 2005-05-23  8:02 UTC (permalink / raw)
  To: git
In-Reply-To: <7v64xa75l3.fsf@assigned-by-dhcp.cox.net>

Hello,

> It should not.  "diff-tree --stdin" expects the line to begin
> with an SHA1 and it either takes (1) one SHA1 followed by one
> space followed by another SHA1 potentially followed by garbage
> til the newline, or (2) one SHA1 potentially followed by garbage
> til the newline.  rev-tree has the timestamp at the beginning
> which does not match either of them.  What you are doing with
> awk should work, so should this:

thanks for the clarification.

>   git-rev-tree MH ^H | sed -e 's/^[0-9]* //' | git-diff-tree --stdin ...

This is very usefull with a 'pull only' mode.

Thanks,
	Thomas

^ permalink raw reply

* [cogito] Sync objects only when needed...
From: Michal Rokos @ 2005-05-23  7:48 UTC (permalink / raw)
  To: git

Hello,

during the day I'm usually connected via modem so syncing all the objects via rsync is painful.

So, is there any reason to sync them even if remote and and local origin head are the same?

 Michal

PS: I'm off the list, so please CC me.

Signed-off-by: Michal Rokos <michal@rokos.info>

Index: cg-pull
===================================================================
--- ca5fef50fb68a3afbb35e1a48ac622f7a964f021/cg-pull  (mode:100755)
+++ uncommitted/cg-pull  (mode:100755)
@@ -9,8 +9,14 @@
 
 . ${COGITO_LIB}cg-Xlib
 
-name=$1
+force=
 
+if [ "$1" == "-f" ]; then
+ force=1
+ shift
+fi
+
+name=$1
 
 [ "$name" ] || { [ -s $_git/refs/heads/origin ] && name=origin; }
 [ "$name" ] || die "where to pull from?"
@@ -199,6 +205,22 @@
 fi
 [ "$rsyncerr" ] && die "unable to get the head pointer of branch $rembranch"
 
+new_head=$(cat "$_git/refs/heads/$name")
+
+if [ ! "$orig_head" ]; then
+ echo "New branch: $new_head"
+
+elif [ "$orig_head" != "$new_head" ]; then
+ echo "Tree change: $orig_head:$new_head"
+
+elif [ -n "$force" ]; then
+ echo "Update forced..."
+
+else
+ echo "Up to date."
+ exit
+fi
+
 [ -d $_git_objects ] || mkdir -p $_git_objects
 $pull "$name" "$uri" || die "objects pull failed"
 
@@ -215,8 +237,6 @@
 [ "$rsyncerr" ] && echo "unable to get tags list (non-fatal)" >&2
 
 
-new_head=$(cat "$_git/refs/heads/$name")
-
 if [ ! "$orig_head" ]; then
  echo "New branch: $new_head"
 

^ permalink raw reply

* Re: cogito - how do I ???
From: Junio C Hamano @ 2005-05-23  7:40 UTC (permalink / raw)
  To: Thomas Glanzmann; +Cc: Linus Torvalds, git
In-Reply-To: <20050523071919.GG23388@cip.informatik.uni-erlangen.de>

>>>>> "TG" == Thomas Glanzmann <sithglan@stud.uni-erlangen.de> writes:

>> git-rev-tree MERGE_HEAD ^HEAD | git-diff-tree -v -m -s --stdin

TG> This doesn't work for me:

It should not.  "diff-tree --stdin" expects the line to begin
with an SHA1 and it either takes (1) one SHA1 followed by one
space followed by another SHA1 potentially followed by garbage
til the newline, or (2) one SHA1 potentially followed by garbage
til the newline.  rev-tree has the timestamp at the beginning
which does not match either of them.  What you are doing with
awk should work, so should this:

  git-rev-tree MH ^H | sed -e 's/^[0-9]* //' | git-diff-tree --stdin ...



^ permalink raw reply

* [PATCH] Make sure diff-helper can tell rename/copy in the new diff-raw format.
From: Junio C Hamano @ 2005-05-23  7:26 UTC (permalink / raw)
  To: torvalds; +Cc: git

This adds tests to make sure that diff-helper can tell renames
from copies using the same "everything but the last one are
copies and the last one is either rename or stay" logic.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

t/t4003-diff-rename-1.sh |    6 +--
t/t4005-diff-rename-2.sh |   93 +++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 93 insertions(+), 6 deletions(-)

diff --git a/t/t4003-diff-rename-1.sh b/t/t4003-diff-rename-1.sh
--- a/t/t4003-diff-rename-1.sh
+++ b/t/t4003-diff-rename-1.sh
@@ -58,7 +58,7 @@ rename new COPYING.#
 EOF
 
 test_expect_success \
-    'validate output from rename/copy detection' \
+    'validate output from rename/copy detection (#1)' \
     'diff -u current expected'
 
 test_expect_success \
@@ -98,7 +98,7 @@ diff --git a/COPYING b/COPYING
 EOF
 
 test_expect_success \
-    'validate output from rename/copy detection' \
+    'validate output from rename/copy detection (#2)' \
     'diff -u current expected'
 
 test_expect_success \
@@ -127,7 +127,7 @@ copy to COPYING.#
 EOF
 
 test_expect_success \
-    'validate output from rename/copy detection' \
+    'validate output from rename/copy detection (#3)' \
     'diff -u current expected'
 
 test_done
diff --git a/t/t4005-diff-rename-2.sh b/t/t4005-diff-rename-2.sh
--- a/t/t4005-diff-rename-2.sh
+++ b/t/t4005-diff-rename-2.sh
@@ -36,7 +36,42 @@ cat >expected <<\EOF
 EOF
 
 test_expect_success \
-    'validate output from rename/copy detection' \
+    'validate output from rename/copy detection (#1)' \
+    'diff -u current expected'
+
+# make sure diff-helper groks it correctly.
+mv expected raw-output
+GIT_DIFF_OPTS=--unified=0 git-diff-helper <raw-output |
+sed -e 's/\([0-9][0-9]*\)/#/g' >current &&
+cat >expected <<\EOF
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+copy from COPYING
+copy to COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ However, in order to allow a migration to GPLv# if that seems like
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+rename old COPYING
+rename new COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- Note that the only valid version of the GPL as far as this project
++ Note that the only valid version of the G.P.L as far as this project
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ HOWEVER, in order to allow a migration to G.P.Lv# if that seems like
+@@ -# +# @@
+-	This file is licensed under the GPL v#, or a later version
++	This file is licensed under the G.P.L v#, or a later version
+EOF
+
+test_expect_success \
+    'validate output from diff-helper (#1)' \
     'diff -u current expected'
 
 test_expect_success \
@@ -56,7 +91,39 @@ cat >expected <<\EOF
 EOF
 
 test_expect_success \
-    'validate output from rename/copy detection' \
+    'validate output from rename/copy detection (#2)' \
+    'diff -u current expected'
+
+# make sure diff-helper groks it correctly.
+mv expected raw-output
+GIT_DIFF_OPTS=--unified=0 git-diff-helper <raw-output |
+sed -e 's/\([0-9][0-9]*\)/#/g' >current
+cat >expected <<\EOF
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+copy from COPYING
+copy to COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ However, in order to allow a migration to GPLv# if that seems like
+diff --git a/COPYING b/COPYING
+--- a/COPYING
++++ b/COPYING
+@@ -# +# @@
+- Note that the only valid version of the GPL as far as this project
++ Note that the only valid version of the G.P.L as far as this project
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ HOWEVER, in order to allow a migration to G.P.Lv# if that seems like
+@@ -# +# @@
+-	This file is licensed under the GPL v#, or a later version
++	This file is licensed under the G.P.L v#, or a later version
+EOF
+
+test_expect_success \
+    'validate output from diff-helper (#2)' \
     'diff -u current expected'
 
 test_expect_success \
@@ -76,7 +143,27 @@ cat >expected <<\EOF
 EOF
 
 test_expect_success \
-    'validate output from rename/copy detection' \
+    'validate output from rename/copy detection (#3)' \
+    'diff -u current expected'
+
+# make sure diff-helper groks it correctly.
+mv expected raw-output
+GIT_DIFF_OPTS=--unified=0 git-diff-helper <raw-output |
+sed -e 's/\([0-9][0-9]*\)/#/g' >current
+cat >expected <<\EOF
+diff --git a/COPYING b/COPYING.#
+similarity index #%
+copy from COPYING
+copy to COPYING.#
+--- a/COPYING
++++ b/COPYING.#
+@@ -# +# @@
+- HOWEVER, in order to allow a migration to GPLv# if that seems like
++ However, in order to allow a migration to GPLv# if that seems like
+EOF
+
+test_expect_success \
+    'validate output from diff-helper (#3)' \
     'diff -u current expected'
 
 test_done
------------------------------------------------


^ permalink raw reply

* [PATCH] Performance fix for pickaxe.
From: Junio C Hamano @ 2005-05-23  7:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git

The pickaxe was expanding the blobs and searching in them even
when it should have already known that both sides are the same.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diffcore-pickaxe.c |    3 ++-
1 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -44,7 +44,8 @@ void diffcore_pickaxe(const char *needle
 			if (contains(p->one, needle, len))
 				diff_q(&outq, p);
 		}
-		else if (contains(p->one, needle, len) !=
+		else if (!diff_unmodified_pair(p) &&
+			 contains(p->one, needle, len) !=
 			 contains(p->two, needle, len))
 			diff_q(&outq, p);
 		if (onum == outq.nr)
------------------------------------------------


^ permalink raw reply

* git-switch-tree script
From: Jeff Garzik @ 2005-05-23  7:24 UTC (permalink / raw)
  To: david; +Cc: Git Mailing List

[-- Attachment #1: Type: text/plain, Size: 76 bytes --]

This is what I use to switch between branches, in my kernel trees.

	Jeff



[-- Attachment #2: git-switch-tree --]
[-- Type: text/plain, Size: 339 bytes --]

#!/bin/sh

if [ "x$1" != "x" ]
then
	if [ "$1" == "master" ]
	then
		( cd .git && rm -f HEAD && ln -s refs/heads/master HEAD )
	else
		if [ ! -f .git/refs/heads/$1 ]
		then
			echo Branch $1 not found.
			exit 1
		fi

		( cd .git && rm -f HEAD && ln -s refs/heads/$1 HEAD )
	fi
fi

git-read-tree -m HEAD && git-checkout-cache -q -f -u -a


^ permalink raw reply

* Re: cogito - how do I ???
From: Thomas Glanzmann @ 2005-05-23  7:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Sean, Sam Ravnborg, git
In-Reply-To: <Pine.LNX.4.58.0505211635440.2206@ppc970.osdl.org>

Hello Linus,

> 	git-rev-tree MERGE_HEAD ^HEAD | git-diff-tree -v -m -s --stdin

This doesn't work for me:

(faui00u) [~/work/git/git] git-rev-tree 2cb45e95438c113871fbbea5b4f629f9463034e7 ^09d74b3b5ac634495e17b92b2b785fa996ffce97
1116799695 2cb45e95438c113871fbbea5b4f629f9463034e7:1 09d74b3b5ac634495e17b92b2b785fa996ffce97:3
(faui00u) [~/work/git/git] git-rev-tree 2cb45e95438c113871fbbea5b4f629f9463034e7 ^09d74b3b5ac634495e17b92b2b785fa996ffce97 | git-diff-tree -v -m -s --stdin
(faui00u) [~/work/git/git]

Bit this does:

(faui00u) [~/work/git/git] git-rev-tree 2cb45e95438c113871fbbea5b4f629f9463034e7 ^09d74b3b5ac634495e17b92b2b785fa996ffce97 | awk '{print $2'} | sed 's#:.*##' | git-diff-tree -v -m -s --stdin

was that a typo or is git-diff-tree supposed to handle the output of
git-rev-tree as well and it is a bug?

	Thomas

^ permalink raw reply

* Now I think I am done with diff...
From: Junio C Hamano @ 2005-05-23  6:50 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7v4qcwihu1.fsf@assigned-by-dhcp.cox.net>

>>>>> "I" == Junio C Hamano <junkio@cox.net> said:

JCH> Now I think I am done with diff,...

I think I have fixed the diff-raw output routine enough to
address the rename and copy distinction issues (although I chose
not to give individual entries "this is a copy" or "this is a
rename" indicators).

And I thought I was really done with diff this time, but here
are some leftover issues I would like to further address, not
necessarily in this order, and not necessarily promising to do
all of them.

 - Make "diff-raw human readable" output quite different from
   "diff-raw machine readable" format.  I think human readable
   format should be more like "concise version of diff-patch
   format for human consumption", not the current "LF and TAB
   separated version of machine readable output", to enhance
   readability.  This includes:

   (1) not repeating paths when src and dst are the same
       (i.e. most of the time);

   (2) substituting src/dst with /dev/null when talking about
       creation or deletion;

   (3) prefixing the src and dst with '+' or '*' or '-'.

   Currently I do not intend to add _parsing_ of such output to
   diff-helper because you can always use -z on both ends of the
   pipe (and as yourself said "Sure. Although I doubt people use
   the raw diff output except to (a) feed to diff-helper or (b)
   check that it's non-empty."), but I have to sleep on this.

 - Add an option to cause diff-tree three-brothers not to call
   diffcore_prune() at all.  This will essentially produce
   something like a merge of two output from ls-files or
   ls-trees commands, but with renames/copies resolved.  I do
   not know how useful this would be, especially now the
   immediate problem of diff_flush() pruning all the
   "non-change" entries is (hopefully) fixed.

 - Give pickaxe an option to cause it to produce diffs for the
   entire commit, not just the found file.

 - Experiment with the patch reordering I talked about in the
   previous message.

 - Add the support of "copied from post-edit image" detection in
   rename/copy logic.  Currently I compare only with pre-edit
   image for modified files.

 - Read and understand the patch-delta.c, to make rename/copy
   similarity estimator more accurate.  As you pointed out, it
   does very bad things on big consecutive deletes.

 - Think about performance issues around "diff-tree -C" stuff.

 - Think more about the problem with the patch output between
   two symbolic links, and how to fix it (I've been quietly
   thinking about this for some time ;-)).  I think the current
   output of just comparing the two readlink results and
   generating a normal diff between them is broken and hazardous
   if such a thing is mistakenly fed to patch -p1.

 - Teach JIT about the diff-raw format changes.


^ permalink raw reply

* Comments on "Rename/copy detection fix."
From: Junio C Hamano @ 2005-05-23  4:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <7vd5ribmam.fsf_-_@assigned-by-dhcp.cox.net>

I talked about patch reordering in the commit log without giving
specifics, because I did not think it is good to cast specifics
of uncoded ideas into stone in a commit log message.

But to throw it out open in the public, here are two of the
obvious ones.

 (1) Give users a way to specify patches about which files
     should come before which other files.  Things like "*.h
     files before *.c files", or "README and Documentation first
     before anything else".

 (2) Currently when pickaxe is used, it shows only diffs about
     the files involved in the change we looked for.  The user
     may sometimes want to see such a diff in the context of
     whole patches and I plan to add an option for doing so.  It
     may be useful in such a case to show the patch about the
     file that triggered the commit to be selected by the
     pickaxe first before patches for all other files.

These may or may not be useful, but this kind of reordering
would make it pretty much pointless to record which is a rename
and which is a copy in the diff_filepair structure in the
diff_queue.  It all depends on which one comes first and which
one comes later in the final ordering, which is something
diffcore-rename cannot (and should not) know about.



^ permalink raw reply

* [PATCH] Rename/copy detection fix.
From: Junio C Hamano @ 2005-05-23  4:26 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Git Mailing List
In-Reply-To: <7vy8a7cavf.fsf@assigned-by-dhcp.cox.net>

The rename/copy detection logic in earlier round was only good
enough to show patch output and discussion on the mailing list
about the diff-raw format updates revealed many problems with
it.  This patch fixes all the ones known to me, without making
things I want to do later impossible, mostly related to patch
reordering.

 (1) Earlier rename/copy detector determined which one is rename
     and which one is copy too early, which made it impossible
     to later introduce diffcore transformers to reorder
     patches.  This patch fixes it by moving that logic to the
     very end of the processing.

 (2) Earlier output routine diff_flush() was pruning all the
     "no-change" entries indiscriminatingly.  This was done due
     to my false assumption that one of the requirements in the
     diff-raw output was not to show such an entry (which
     resulted in my incorrect comment about "diff-helper never
     being able to be equivalent to built-in diff driver").  My
     special thanks go to Linus for correcting me about this.
     When we produce diff-raw output, for the downstream to be
     able to tell renames from copies, sometimes it _is_
     necessary to output "no-change" entries, and this patch
     adds diffcore_prune() function for doing it.

 (3) Earlier diff_filepair structure was trying to be not too
     specific about rename/copy operations, but the purpose of
     the structure was to record one or two paths, which _was_
     indeed about rename/copy.  This patch discards xfrm_msg
     field which was trying to be generic for this wrong reason,
     and introduces a couple of fields (rename_score and
     rename_rank) that are explicitly specific to rename/copy
     logic.  One thing to note is that the information in a
     single diff_filepair structure _still_ does not distinguish
     renames from copies, and it is deliberately so.  This is to
     allow patches to be reordered in later stages.

 (4) This patch also adds some tests about diff-raw format
     output and makes sure that necessary "no-change" entries
     appear on the output.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diff.c                   |  145 +++++++++++++++++++++++++++++++----------------
diffcore-pathspec.c      |    2 
diffcore-pickaxe.c       |    2 
diffcore-rename.c        |  112 +++++++++++-------------------------
diffcore.h               |   13 +++-
t/t4003-diff-rename-1.sh |   26 ++++----
t/t4005-diff-rename-2.sh |   82 ++++++++++++++++++++++++++
7 files changed, 244 insertions(+), 138 deletions(-)
new file (100644): t/t4005-diff-rename-2.sh

diff --git a/diff.c b/diff.c
--- a/diff.c
+++ b/diff.c
@@ -283,12 +283,6 @@ int diff_populate_filespec(struct diff_f
 	return 0;
 }
 
-void diff_free_filepair(struct diff_filepair *p)
-{
-	free(p->xfrm_msg);
-	free(p);
-}
-
 void diff_free_filespec_data(struct diff_filespec *s)
 {
 	if (s->should_free)
@@ -501,9 +495,9 @@ struct diff_filepair *diff_queue(struct 
 	struct diff_filepair *dp = xmalloc(sizeof(*dp));
 	dp->one = one;
 	dp->two = two;
-	dp->xfrm_msg = NULL;
+	dp->score = 0;
 	dp->orig_order = queue->nr;
-	dp->xfrm_work = 0;
+	dp->rename_rank = 0;
 	diff_q(queue, dp);
 	return dp;
 }
@@ -522,23 +516,7 @@ static void diff_flush_raw(struct diff_f
 	       p->two->path, line_termination);
 }
 
-static void diff_flush_patch(struct diff_filepair *p)
-{
-	const char *name, *other;
-
-	name = p->one->path;
-	other = (strcmp(name, p->two->path) ? p->two->path : NULL);
-	if ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
-	    (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode)))
-		return; /* no tree diffs in patch format */ 
-
-	if (DIFF_PAIR_UNMERGED(p))
-		run_external_diff(name, NULL, NULL, NULL, NULL);
-	else
-		run_external_diff(name, other, p->one, p->two, p->xfrm_msg);
-}
-
-static int uninteresting(struct diff_filepair *p)
+int diff_unmodified_pair(struct diff_filepair *p)
 {
 	/* This function is written stricter than necessary to support
 	 * the currently implemented transformers, but the idea is to
@@ -570,12 +548,70 @@ static int uninteresting(struct diff_fil
 	return 0;
 }
 
+static void diff_flush_patch(struct diff_filepair *p, const char *msg)
+{
+	const char *name, *other;
+
+	/* diffcore_prune() keeps "stay" entries for diff-raw
+	 * copy/rename detection, but when we are generating
+	 * patches we do not need them.
+	 */
+	if (diff_unmodified_pair(p))
+		return;
+
+	name = p->one->path;
+	other = (strcmp(name, p->two->path) ? p->two->path : NULL);
+	if ((DIFF_FILE_VALID(p->one) && S_ISDIR(p->one->mode)) ||
+	    (DIFF_FILE_VALID(p->two) && S_ISDIR(p->two->mode)))
+		return; /* no tree diffs in patch format */ 
+
+	if (DIFF_PAIR_UNMERGED(p))
+		run_external_diff(name, NULL, NULL, NULL, NULL);
+	else
+		run_external_diff(name, other, p->one, p->two, msg);
+}
+
+int diff_needs_to_stay(struct diff_queue_struct *q, int i,
+		       struct diff_filespec *it)
+{
+	/* If it will be used in later entry (either stay or used
+	 * as the source of rename/copy), we need to copy, not rename.
+	 */
+	while (i < q->nr) {
+		struct diff_filepair *p = q->queue[i++];
+		if (!DIFF_FILE_VALID(p->two))
+			continue; /* removed is fine */
+		if (strcmp(p->one->path, it->path))
+			continue; /* not relevant */
+
+		/* p has its src set to *it and it is not a delete;
+		 * it will be used for in-place change, rename/copy,
+		 * or just stays there.  We cannot rename it out.
+		 */
+		return 1;
+	}
+	return 0;
+}
+
+static int diff_used_as_source(struct diff_queue_struct *q, int lim,
+			       struct diff_filespec *it)
+{
+	int i;
+	for (i = 0; i < lim; i++) {
+		struct diff_filepair *p = q->queue[i++];
+		if (!strcmp(p->one->path, it->path))
+			return 1;
+	}
+	return 0;
+}
+
 void diffcore_prune(void)
 {
 	/*
 	 * Although rename/copy detection wants to have "no-change"
 	 * entries fed into them, the downstream do not need to see
-	 * them.  This function removes such entries.
+	 * them, unless we had rename/copy for the same path earlier.
+	 * This function removes such entries.
 	 *
 	 * The applications that use rename/copy should:
 	 *
@@ -585,6 +621,7 @@ void diffcore_prune(void)
 	 * (3) call diffcore_prune
 	 * (4) call other diffcore_xxx that do not need to see
 	 *     "no-change" entries.
+	 * (5) call diff_flush().
 	 */
 	struct diff_queue_struct *q = &diff_queued_diff;
 	struct diff_queue_struct outq;
@@ -595,22 +632,21 @@ void diffcore_prune(void)
 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
-		if (!uninteresting(p))
+		if (!diff_unmodified_pair(p) ||
+		    diff_used_as_source(q, i, p->one))
 			diff_q(&outq, p);
 		else
-			diff_free_filepair(p);
+			free(p);
 	}
 	free(q->queue);
 	*q = outq;
 	return;
 }
 
-static void diff_flush_one(struct diff_filepair *p)
+static void diff_flush_one(struct diff_filepair *p, const char *msg)
 {
-	if (uninteresting(p))
-		return;
 	if (generate_patch)
-		diff_flush_patch(p);
+		diff_flush_patch(p, msg);
 	else
 		diff_flush_raw(p);
 }
@@ -618,14 +654,7 @@ static void diff_flush_one(struct diff_f
 int diff_queue_is_empty(void)
 {
 	struct diff_queue_struct *q = &diff_queued_diff;
-	int i;
-
-	for (i = 0; i < q->nr; i++) {
-		struct diff_filepair *p = q->queue[i];
-		if (!uninteresting(p))
-			return 0;
-	}
-	return 1;
+	return q->nr == 0;
 }
 
 void diff_flush(int diff_output_style)
@@ -646,13 +675,35 @@ void diff_flush(int diff_output_style)
 		generate_patch = 1;
 		break;
 	}
-	for (i = 0; i < q->nr; i++)
-		diff_flush_one(q->queue[i]);
+	for (i = 0; i < q->nr; i++) {
+		char msg_[PATH_MAX*2+200], *msg = NULL;
+		struct diff_filepair *p = q->queue[i];
+		if (strcmp(p->one->path, p->two->path)) {
+			/* This is rename or copy.  Which one is it? */
+			if (diff_needs_to_stay(q, i+1, p->one)) {
+				sprintf(msg_,
+					"similarity index %d%%\n"
+					"copy from %s\n"
+					"copy to %s\n",
+					(int)(0.5 + p->score * 100/MAX_SCORE),
+					p->one->path, p->two->path);
+			}
+			else
+				sprintf(msg_,
+					"similarity index %d%%\n"
+					"rename old %s\n"
+					"rename new %s\n",
+					(int)(0.5 + p->score * 100/MAX_SCORE),
+					p->one->path, p->two->path);
+			msg = msg_;
+		}
+		diff_flush_one(p, msg);
+	}
+
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *p = q->queue[i];
 		diff_free_filespec_data(p->one);
 		diff_free_filespec_data(p->two);
-		free(p->xfrm_msg);
 		free(p);
 	}
 	free(q->queue);
@@ -674,10 +725,10 @@ void diff_addremove(int addremove, unsig
 	 * entries to the diff-core.  They will be prefixed
 	 * with something like '=' or '*' (I haven't decided
 	 * which but should not make any difference).
-	 * Feeding the same new and old to diff_change() should
-	 * also have the same effect.  diff_flush() should
-	 * filter uninteresting ones out at the final output
-	 * stage.
+	 * Feeding the same new and old to diff_change() 
+	 * also has the same effect.  diffcore_prune() should
+	 * be used to filter uninteresting ones out before the
+	 * final output happens.
 	 */
 	if (reverse_diff)
 		addremove = (addremove == '+' ? '-' :
diff --git a/diffcore-pathspec.c b/diffcore-pathspec.c
--- a/diffcore-pathspec.c
+++ b/diffcore-pathspec.c
@@ -55,7 +55,7 @@ void diffcore_pathspec(const char **path
 		    matches_pathspec(p->two->path, spec, speccnt))
 			diff_q(&outq, p);
 		else
-			diff_free_filepair(p);
+			free(p);
 	}
 	free(q->queue);
 	*q = outq;
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -48,7 +48,7 @@ void diffcore_pickaxe(const char *needle
 			 contains(p->two, needle, len))
 			diff_q(&outq, p);
 		if (onum == outq.nr)
-			diff_free_filepair(p);
+			free(p);
 	}
 	free(q->queue);
 	*q = outq;
diff --git a/diffcore-rename.c b/diffcore-rename.c
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -119,27 +119,34 @@ static void record_rename_pair(struct di
 			       int rank,
 			       int score)
 {
-	/* The rank is used to sort the final output, because there
-	 * are certain dependencies.
+	/*
+	 * These ranks are used to sort the final output, because there
+	 * are certain dependencies:
 	 *
-	 *  - rank #0 depends on deleted ones.
-	 *  - rank #1 depends on kept files before they are modified.
-	 *  - rank #2 depends on kept files after they are modified;
-	 *    currently not used.
-	 *
-	 * Therefore, the final output order should be:
-	 *
-	 *  1. rank #0 rename/copy diffs.
+	 *  1. rename/copy that depends on deleted ones.
 	 *  2. deletions in the original.
-	 *  3. rank #1 rename/copy diffs.
-	 *  4. additions and modifications in the original.
-	 *  5. rank #2 rename/copy diffs; currently not used.
+	 *  3. rename/copy that depends on the pre-edit image of kept files.
+	 *  4. additions, modifications and no-modifications in the original.
+	 *  5. rename/copy that depends on the post-edit image of kept files
+	 *     (note that we currently do not detect such rename/copy).
 	 *
-	 * To achieve this sort order, we give xform_work the number
-	 * above.
+	 * The downstream diffcore transformers are free to reorder
+	 * the entries as long as they keep file pairs that has the
+	 * same p->one->path in earlier rename_rank to appear before
+	 * later ones.  This ordering is used by the diff_flush()
+	 * logic to tell renames from copies, and also used by the
+	 * diffcore_prune() logic to omit unnecessary
+	 * "no-modification" entries.
+	 *
+	 * To the final output routine, and in the diff-raw format
+	 * output, a rename/copy that is based on a path that has a
+	 * later entry that shares the same p->one->path and is not a
+	 * deletion is a copy.  Otherwise it is a rename.
 	 */
+
 	struct diff_filepair *dp = diff_queue(outq, src, dst);
-	dp->xfrm_work = (rank * 2 + 1) | (score<<RENAME_SCORE_SHIFT);
+	dp->rename_rank = rank * 2 + 1;
+	dp->score = score;
 	dst->xfrm_flags |= RENAME_DST_MATCHED;
 }
 
@@ -161,10 +168,8 @@ static void debug_filepair(const struct 
 {
 	debug_filespec(p->one, i, "one");
 	debug_filespec(p->two, i, "two");
-	fprintf(stderr, "pair flags %d, orig order %d, score %d\n",
-		(p->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1)),
-		p->orig_order,
-		(p->xfrm_work >> RENAME_SCORE_SHIFT));
+	fprintf(stderr, "pair rank %d, orig order %d, score %d\n",
+		p->rename_rank, p->orig_order, p->score);
 }
 
 static void debug_queue(const char *msg, struct diff_queue_struct *q)
@@ -191,8 +196,8 @@ static int rank_compare(const void *a_, 
 {
 	const struct diff_filepair *a = *(const struct diff_filepair **)a_;
 	const struct diff_filepair *b = *(const struct diff_filepair **)b_;
-	int a_rank = a->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1);
-	int b_rank = b->xfrm_work & ((1<<RENAME_SCORE_SHIFT) - 1);
+	int a_rank = a->rename_rank;
+	int b_rank = b->rename_rank;
 
 	if (a_rank != b_rank)
 		return a_rank - b_rank;
@@ -209,28 +214,6 @@ static int score_compare(const void *a_,
 	return b->score - a->score;
 }
 
-static int needs_to_stay(struct diff_queue_struct *q, int i,
-			 struct diff_filespec *it)
-{
-	/* If it will be used in later entry (either stay or used
-	 * as the source of rename/copy), we need to copy, not rename.
-	 */
-	while (i < q->nr) {
-		struct diff_filepair *p = q->queue[i++];
-		if (!DIFF_FILE_VALID(p->two))
-			continue; /* removed is fine */
-		if (strcmp(p->one->path, it->path))
-			continue; /* not relevant */
-
-		/* p has its src set to *it and it is not a delete;
-		 * it will be used for in-place change or rename/copy,
-		 * so we cannot rename it out.
-		 */
-		return 1;
-	}
-	return 0;
-}
-
 int diff_scoreopt_parse(const char *opt)
 {
 	int diglen, num, scale, i;
@@ -359,27 +342,24 @@ void diffcore_rename(int detect_rename, 
 	 * downstream, so we assign the sort keys in this loop.
 	 *
 	 * See comments at the top of record_rename_pair for numbers used
-	 * to assign xfrm_work.
-	 *
-	 * Note that we have not annotated the diff_filepair with any comment
-	 * so there is nothing other than p to free.
+	 * to assign rename_rank.
 	 */
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *dp, *p = q->queue[i];
 		if (!DIFF_FILE_VALID(p->one)) {
 			/* creation or unmerged entries */
 			dp = diff_queue(&outq, p->one, p->two);
-			dp->xfrm_work = 4;
+			dp->rename_rank = 4;
 		}
 		else if (!DIFF_FILE_VALID(p->two)) {
 			/* deletion */
 			dp = diff_queue(&outq, p->one, p->two);
-			dp->xfrm_work = 2;
+			dp->rename_rank = 2;
 		}
 		else {
 			/* modification, or stay as is */
 			dp = diff_queue(&outq, p->one, p->two);
-			dp->xfrm_work = 4;
+			dp->rename_rank = 4;
 		}
 		free(p);
 	}
@@ -415,39 +395,21 @@ void diffcore_rename(int detect_rename, 
 			/* rename or copy */
 			struct diff_filepair *dp =
 				diff_queue(q, p->one, p->two);
-			int msglen = (strlen(p->one->path) +
-				      strlen(p->two->path) + 100);
-			int score = (p->xfrm_work >> RENAME_SCORE_SHIFT);
-			dp->xfrm_msg = xmalloc(msglen);
+			dp->score = p->score;
 
 			/* if we have a later entry that is a rename/copy
 			 * that depends on p->one, then we copy here.
 			 * otherwise we rename it.
 			 */
-			if (needs_to_stay(&outq, i+1, p->one)) {
-				/* copy it */
-				sprintf(dp->xfrm_msg,
-					"similarity index %d%%\n"
-					"copy from %s\n"
-					"copy to %s\n",
-					(int)(0.5 + score * 100 / MAX_SCORE),
-					p->one->path, p->two->path);
-			}
-			else {
-				/* rename it, and mark it as gone. */
+			if (!diff_needs_to_stay(&outq, i+1, p->one))
+				/* this is the last one, so mark it as gone.
+				 */
 				p->one->xfrm_flags |= RENAME_SRC_GONE;
-				sprintf(dp->xfrm_msg,
-					"similarity index %d%%\n"
-					"rename old %s\n"
-					"rename new %s\n",
-					(int)(0.5 + score * 100 / MAX_SCORE),
-					p->one->path, p->two->path);
-			}
 		}
 		else
-			/* otherwise it is a modified (or stayed) entry */
+			/* otherwise it is a modified (or "stay") entry */
 			diff_queue(q, p->one, p->two);
-		diff_free_filepair(p);
+		free(p);
 	}
 
 	free(outq.queue);
diff --git a/diffcore.h b/diffcore.h
--- a/diffcore.h
+++ b/diffcore.h
@@ -41,13 +41,18 @@ extern void diff_free_filespec_data(stru
 struct diff_filepair {
 	struct diff_filespec *one;
 	struct diff_filespec *two;
-	char *xfrm_msg;
+	int score; /* only valid when one and two are different paths */
 	int orig_order; /* the original order of insertion into the queue */
-	int xfrm_work; /* for use by tramsformers, not by diffcore */
+	int rename_rank; /* rename/copy dependency needs to enforce
+			  * certain ordering of patches that later
+			  * diffcore transformations should not break.
+			  */
 };
 #define DIFF_PAIR_UNMERGED(p) \
 	(!DIFF_FILE_VALID((p)->one) && !DIFF_FILE_VALID((p)->two))
 
+extern int diff_unmodified_pair(struct diff_filepair *);
+
 struct diff_queue_struct {
 	struct diff_filepair **queue;
 	int alloc;
@@ -59,6 +64,8 @@ extern struct diff_filepair *diff_queue(
 					struct diff_filespec *,
 					struct diff_filespec *);
 extern void diff_q(struct diff_queue_struct *, struct diff_filepair *);
-extern void diff_free_filepair(struct diff_filepair *);
+
+extern int diff_needs_to_stay(struct diff_queue_struct *, int,
+			      struct diff_filespec *);
 
 #endif
diff --git a/t/t4003-diff-rename-1.sh b/t/t4003-diff-rename-1.sh
--- a/t/t4003-diff-rename-1.sh
+++ b/t/t4003-diff-rename-1.sh
@@ -11,7 +11,8 @@ test_description='More rename detection
 test_expect_success \
     'prepare reference tree' \
     'cat ../../COPYING >COPYING &&
-    git-update-cache --add COPYING &&
+     echo frotz >rezrov &&
+    git-update-cache --add COPYING rezrov &&
     tree=$(git-write-tree) &&
     echo $tree'
 
@@ -22,9 +23,10 @@ test_expect_success \
     rm -f COPYING &&
     git-update-cache --add --remove COPYING COPYING.?'
 
-# tree has COPYING.  work tree has COPYING.1 and COPYING.2,
-# both are slightly edited.  So we say you copy-and-edit one,
-# and rename-and-edit the other.
+# tree has COPYING and rezrov.  work tree has COPYING.1 and COPYING.2,
+# both are slightly edited, and unchanged rezrov.  So we say you
+# copy-and-edit one, and rename-and-edit the other.  We do not say
+# anything about rezrov.
 
 GIT_DIFF_OPTS=--unified=0 git-diff-cache -M -p $tree |
 sed -e 's/\([0-9][0-9]*\)/#/g' >current &&
@@ -64,9 +66,10 @@ test_expect_success \
     'mv COPYING.2 COPYING &&
      git-update-cache --add --remove COPYING COPYING.1 COPYING.2'
 
-# tree has COPYING.  work tree has COPYING and COPYING.1,
-# both are slightly edited.  So we say you edited one,
-# and copy-and-edit the other.
+# tree has COPYING and rezrov.  work tree has COPYING and COPYING.1,
+# both are slightly edited, and unchanged rezrov.  So we say you
+# edited one, and copy-and-edit the other.  We do not say
+# anything about rezrov.
 
 GIT_DIFF_OPTS=--unified=0 git-diff-cache -C -p $tree |
 sed -e 's/\([0-9][0-9]*\)/#/g' >current
@@ -103,10 +106,11 @@ test_expect_success \
     'cat ../../COPYING >COPYING &&
      git-update-cache --add --remove COPYING COPYING.1'
 
-# tree has COPYING.  work tree has the same COPYING and COPYING.1,
-# but COPYING is not edited.  We say you copy-and-edit COPYING.1;
-# this is only possible because -C mode now reports the unmodified
-# file to the diff-core.
+# tree has COPYING and rezrov.  work tree has COPYING and COPYING.1,
+# but COPYING is not edited.  We say you copy-and-edit COPYING.1; this
+# is only possible because -C mode now reports the unmodified file to
+# the diff-core.  Unchanged rezrov, although being fed to
+# git-diff-cache as well, should not be mentioned.
 
 GIT_DIFF_OPTS=--unified=0 git-diff-cache -C -p $tree |
 sed -e 's/\([0-9][0-9]*\)/#/g' >current
diff --git a/t/t4005-diff-rename-2.sh b/t/t4005-diff-rename-2.sh
new file mode 100644
--- /dev/null
+++ b/t/t4005-diff-rename-2.sh
@@ -0,0 +1,82 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='Same rename detection as t4003 but testing diff-raw.
+
+'
+. ./test-lib.sh
+
+test_expect_success \
+    'prepare reference tree' \
+    'cat ../../COPYING >COPYING &&
+     echo frotz >rezrov &&
+    git-update-cache --add COPYING rezrov &&
+    tree=$(git-write-tree) &&
+    echo $tree'
+
+test_expect_success \
+    'prepare work tree' \
+    'sed -e 's/HOWEVER/However/' <COPYING >COPYING.1 &&
+    sed -e 's/GPL/G.P.L/g' <COPYING >COPYING.2 &&
+    rm -f COPYING &&
+    git-update-cache --add --remove COPYING COPYING.?'
+
+# tree has COPYING and rezrov.  work tree has COPYING.1 and COPYING.2,
+# both are slightly edited, and unchanged rezrov.  We say COPYING.1
+# and COPYING.2 are based on COPYING, and do not say anything about
+# rezrov.
+
+git-diff-cache -M $tree >current
+
+cat >expected <<\EOF
+:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0603b3238a076dc6c8022aedc6648fa523a17178	COPYING	COPYING.1
+:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 06c67961bbaed34a127f76d261f4c0bf73eda471	COPYING	COPYING.2
+EOF
+
+test_expect_success \
+    'validate output from rename/copy detection' \
+    'diff -u current expected'
+
+test_expect_success \
+    'prepare work tree again' \
+    'mv COPYING.2 COPYING &&
+     git-update-cache --add --remove COPYING COPYING.1 COPYING.2'
+
+# tree has COPYING and rezrov.  work tree has COPYING and COPYING.1,
+# both are slightly edited, and unchanged rezrov.  We say COPYING.1
+# is based on COPYING and COPYING is still there, and do not say anything
+# about rezrov.
+
+git-diff-cache -C $tree >current
+cat >expected <<\EOF
+:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0603b3238a076dc6c8022aedc6648fa523a17178	COPYING	COPYING.1
+:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 06c67961bbaed34a127f76d261f4c0bf73eda471	COPYING	COPYING
+EOF
+
+test_expect_success \
+    'validate output from rename/copy detection' \
+    'diff -u current expected'
+
+test_expect_success \
+    'prepare work tree once again' \
+    'cat ../../COPYING >COPYING &&
+     git-update-cache --add --remove COPYING COPYING.1'
+
+# tree has COPYING and rezrov.  work tree has the same COPYING and
+# copy-edited COPYING.1, and unchanged rezrov.  We should see
+# unmodified COPYING in the output, so that downstream diff-helper can
+# notice.  We should not say anything about rezrov.
+
+git-diff-cache -C $tree >current
+cat >expected <<\EOF
+:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 0603b3238a076dc6c8022aedc6648fa523a17178	COPYING	COPYING.1
+:100644 100644 6ff87c4664981e4397625791c8ea3bbb5f2279a3 6ff87c4664981e4397625791c8ea3bbb5f2279a3	COPYING	COPYING
+EOF
+
+test_expect_success \
+    'validate output from rename/copy detection' \
+    'diff -u current expected'
+
+test_done
------------------------------------------------


^ permalink raw reply

* [PATCH] Be careful with symlinks when detecting renames and copies.
From: Junio C Hamano @ 2005-05-23  4:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: git
In-Reply-To: <7vekbzfaze.fsf@assigned-by-dhcp.cox.net>

Linus, this is *not* based on the fixed rename/copy fix I am
going to send out shortly, but comes before that due to patch
dependency (the order I happened to have done things, that is).

Please apply this one before the rename/copy fix.

------------
Earlier round was not treating symbolic links carefully enough,
and would have produced diff output that renamed/copied then
edited the contents of a symbolic link, which made no practical
sense.  Change it to detect only pure renames.

Signed-off-by: Junio C Hamano <junkio@cox.net>
---

diffcore-rename.c              |   24 ++++++++------
t/t4004-diff-rename-symlink.sh |   66 +++++++++++++++++++++++++++++++++++++++++
2 files changed, 80 insertions(+), 10 deletions(-)
new file (100755): t/t4004-diff-rename-symlink.sh

diff --git a/diffcore-rename.c b/diffcore-rename.c
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -20,7 +20,7 @@ static void diff_rename_pool_add(struct 
 				 struct diff_filespec *s)
 {
 	if (S_ISDIR(s->mode))
-		return;  /* rename/copy patch for tree does not make sense. */
+		return;  /* no trees, please */
 
 	if (pool->alloc <= pool->nr) {
 		pool->alloc = alloc_nr(pool->alloc);
@@ -71,6 +71,13 @@ static int estimate_similarity(struct di
 	unsigned long delta_size, base_size;
 	int score;
 
+	/* We deal only with regular files.  Symlink renames are handled
+	 * only when they are exact matches --- in other words, no edits
+	 * after renaming.
+	 */
+	if (!S_ISREG(src->mode) || !S_ISREG(dst->mode))
+		return 0;
+
 	delta_size = ((src->size < dst->size) ?
 		      (dst->size - src->size) : (src->size - dst->size));
 	base_size = ((src->size < dst->size) ? src->size : dst->size);
@@ -268,7 +275,7 @@ void diffcore_rename(int detect_rename, 
 		struct diff_filepair *p = q->queue[i];
 		if (!DIFF_FILE_VALID(p->one))
 			if (!DIFF_FILE_VALID(p->two))
-				continue; /* ignore nonsense */
+				continue; /* unmerged */
 			else
 				diff_rename_pool_add(&created, p->two);
 		else if (!DIFF_FILE_VALID(p->two))
@@ -360,12 +367,9 @@ void diffcore_rename(int detect_rename, 
 	for (i = 0; i < q->nr; i++) {
 		struct diff_filepair *dp, *p = q->queue[i];
 		if (!DIFF_FILE_VALID(p->one)) {
-			if (DIFF_FILE_VALID(p->two)) {
-				/* creation */
-				dp = diff_queue(&outq, p->one, p->two);
-				dp->xfrm_work = 4;
-			}
-			/* otherwise it is a nonsense; just ignore it */
+			/* creation or unmerged entries */
+			dp = diff_queue(&outq, p->one, p->two);
+			dp->xfrm_work = 4;
 		}
 		else if (!DIFF_FILE_VALID(p->two)) {
 			/* deletion */
@@ -394,7 +398,7 @@ void diffcore_rename(int detect_rename, 
 	for (i = 0; i < outq.nr; i++) {
 		struct diff_filepair *p = outq.queue[i];
 		if (!DIFF_FILE_VALID(p->one)) {
-			/* created */
+			/* created or unmerged */
 			if (p->two->xfrm_flags & RENAME_DST_MATCHED)
 				; /* rename/copy created it already */
 			else
@@ -443,7 +447,7 @@ void diffcore_rename(int detect_rename, 
 		else
 			/* otherwise it is a modified (or stayed) entry */
 			diff_queue(q, p->one, p->two);
-		free(p);
+		diff_free_filepair(p);
 	}
 
 	free(outq.queue);
diff --git a/t/t4004-diff-rename-symlink.sh b/t/t4004-diff-rename-symlink.sh
new file mode 100755
--- /dev/null
+++ b/t/t4004-diff-rename-symlink.sh
@@ -0,0 +1,66 @@
+#!/bin/sh
+#
+# Copyright (c) 2005 Junio C Hamano
+#
+
+test_description='More rename detection tests.
+
+The rename detection logic should be able to detect pure rename or
+copy of symbolic links, but should not produce rename/copy followed
+by an edit for them.
+'
+. ./test-lib.sh
+
+test_expect_success \
+    'prepare reference tree' \
+    'echo xyzzy | tr -d '\\\\'012 >yomin &&
+     ln -s xyzzy frotz &&
+    git-update-cache --add frotz yomin &&
+    tree=$(git-write-tree) &&
+    echo $tree'
+
+test_expect_success \
+    'prepare work tree' \
+    'mv frotz rezrov &&
+     rm -f yomin &&
+     ln -s xyzzy nitfol &&
+     ln -s xzzzy bozbar &&
+    git-update-cache --add --remove frotz rezrov nitfol bozbar yomin'
+
+# tree has frotz pointing at xyzzy, and yomin that contains xyzzy to
+# confuse things.  work tree has rezrov (xyzzy) nitfol (xyzzy) and
+# bozbar (xzzzy).
+# rezrov and nitfol are rename/copy of frotz and bozbar should be
+# a new creation.
+
+GIT_DIFF_OPTS=--unified=0 git-diff-cache -M -p $tree >current
+cat >expected <<\EOF
+diff --git a/frotz b/nitfol
+similarity index 100%
+copy from frotz
+copy to nitfol
+diff --git a/frotz b/rezrov
+similarity index 100%
+rename old frotz
+rename new rezrov
+diff --git a/yomin b/yomin
+deleted file mode 100644
+--- a/yomin
++++ /dev/null
+@@ -1 +0,0 @@
+-xyzzy
+\ No newline at end of file
+diff --git a/bozbar b/bozbar
+new file mode 120000
+--- /dev/null
++++ b/bozbar
+@@ -0,0 +1 @@
++xzzzy
+\ No newline at end of file
+EOF
+
+test_expect_success \
+    'validate diff output' \
+    'diff -u current expected'
+
+test_done
------------------------------------------------


^ permalink raw reply

* Re: [PATCH] The diff-raw format updates.
From: Junio C Hamano @ 2005-05-23  1:33 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Thomas Glanzmann, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505221751070.2307@ppc970.osdl.org>

Delaying the message generation to just before output is part of
the fix I've been working on, yes.  But there are some other
issues.  Give me some more time as I asked.


^ permalink raw reply

* Re: [PATCH] The diff-raw format updates.
From: Linus Torvalds @ 2005-05-23  1:07 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Glanzmann, Git Mailing List
In-Reply-To: <7vu0kubwyg.fsf@assigned-by-dhcp.cox.net>



On Sun, 22 May 2005, Junio C Hamano wrote:
> 
>  (2) diff_flush() currently prunes _all_ the unmodified entries
>      just before output happens.  This is a _bug_ as you said.
>      To fix this, first, separate that pruning part


I don't understand why this is so hard.

The built-in diff knows to do the right thing. That means that the 
information is there, in xfrm_msg. 

Instead of making a big deal out of this, why not admit that "xfrm_msg" 
was a mistake, and it should be replaced with just a flag about what that 
thing is. Make the flag be "diff", "copy" or "rename". Add another field 
to say how good a match it was, and then let the people who now use 
xfrm_msg generate the string instead. 

The fact is, we alreadt _have_ the information, but it's been corrupted
into a string too early, making it hard to get at. No?

		Linus

^ permalink raw reply

* Re: [PATCH] The diff-raw format updates.
From: Junio C Hamano @ 2005-05-23  0:35 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Thomas Glanzmann, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505221611020.2307@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> You seem to think that it's a feature that you can't get the same output 
LT> out of git-diff-helper, and I think that's not a feature, but a total bug.

I do agree that it is a real problem.

And I think you've filled some holes in the idea I originally
had while working on the following (the part you omitted from
the quote):

    ...  Unfortunately, the current diffcore interface does not
    let the callers (diff-tree family) to tell it to keep the
    "stay" entries in its output.  I've been working on that
    part this morning before this discussion started, so that
    electively they can tell the diffcore layer not to do the
    "stay" pruning ("stay pruning" will simply become another
    diffcore transformation).

The change I am making right now is essentially what the above
paragraph says, with your "hole-filling", to do something like
this:

 (1) diff-tree family feeds unmodified entries to the diffcore,
     in addition to changes.  diffcore-rename matches and merges
     them into rename/copy; diffcore-pickaxe filters; other
     diffcore transformations may later be written (e.g
     reordering the patch so that .h files come first before .c
     files).  The result of the sequence of these
     transformations is fed to the diff_flush() for output.
     Essentially this is the same architecture as we already
     have.

 (2) diff_flush() currently prunes _all_ the unmodified entries
     just before output happens.  This is a _bug_ as you said.
     To fix this, first, separate that pruning part (which is
     currently embedded in diff_flush_one as the first
     statement) out, so this pruning happens independently and
     before diff_flush() happens; and make the pruning more
     careful, not to prune entries that we _need_ to express
     rename/copy distinction in diff-raw output.  Namely, it
     should keep the unmodified entry for a path _if_ the path
     is to be used as the source of a rename or copy entry.
     Also diff-patch output needs to be taught to take notice
     and ignore such entries.

     Optionally, the callers may want to tell diffcore not to do
     _any_ pruning, making diff-raw output similar to a merge of
     ls-files --stage outputs of two trees.

LT> I'm arguing that we should consider it a _requirement_ that "raw diffs" 
LT> can be translated into te same thing the "-p" flag internally does.

To this, I 100% agree, and I think it is doable.  Give me a bit
more time for testing and tweaking.


^ permalink raw reply

* Re: [PATCH] The diff-raw format updates.
From: Linus Torvalds @ 2005-05-22 23:14 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Glanzmann, Git Mailing List
In-Reply-To: <7vis1adfvq.fsf@assigned-by-dhcp.cox.net>



On Sun, 22 May 2005, Junio C Hamano wrote:
> 
>     Everything but the last are copies.  If the last one have
>     different src and dst, then it is a rename.  Otherwise it
>     is a "stay".

If so, I disagree. Totally.

You seem to think that it's a feature that you can't get the same output 
out of git-diff-helper, and I think that's not a feature, but a total bug.

> LT> resulting in:
> 
> That's not a counter-example.  You are agreeing to what I said in
> this message:

No, I'm not agreeing at all. I'm saying that this is unacceptable, and if 
this was intentional, as you seem to be saying, then it was in my opinion 
a bad idea. We might as well go back to the original diff format, which 
had other problems, but they were no worse than the new one.

Basically, with the new format as-is, renames and copies cannot be 
described sanely. That was exactly the same problem as the old format had, 
except the old format was less verbose. So why do the new format at all?

I'm arguing that we should consider it a _requirement_ that "raw diffs" 
can be translated into te same thing the "-p" flag internally does.

		Linus

^ permalink raw reply

* Re: [PATCH] The diff-raw format updates.
From: Junio C Hamano @ 2005-05-22 23:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Thomas Glanzmann, Git Mailing List
In-Reply-To: <Pine.LNX.4.58.0505221259110.2307@ppc970.osdl.org>

>>>>> "LT" == Linus Torvalds <torvalds@osdl.org> writes:

LT> On Sun, 22 May 2005, Junio C Hamano wrote:
>> 
>> I deliberatly chose not to record rename/copy distinction in diff-raw
>> --- it is to be inferred from which entry touches the src _last_.  
>> Everything but the last one to touch the same path is copy, and the last
>> one is rename.

LT> My counter-example: there is no rename at all, just a copy.

My wording was wrong.  If you do not use the helper, you should
be able to get copy and in-place edit or no-modification (I
collectively call them "stay" in diffcore-rename.c).  If it does
not work then you have spotted a bug in the implementation but
not the design.  Anyway, I should have said:

    Everything but the last are copies.  If the last one have
    different src and dst, then it is a rename.  Otherwise it
    is a "stay".

LT> Try it. I added in the "&& detect_rename < 2" thing to diff-tree.c, and I 
LT> get:

LT> 	git-whatchanged -C --root | git-diff-helper | less -S

LT> resulting in:

That's not a counter-example.  You are agreeing to what I said in
this message:

    To: Linus Torvalds <torvalds@osdl.org>
    Cc: git@vger.kernel.org
    Subject: [PATCH] Teach diff-tree to report unmodified paths for -C option.
    Date: Sat, 21 May 2005 03:11:49 -0700
    Message-ID: <7vpsvkj3bu.fsf_-_@assigned-by-dhcp.cox.net>

    ...

    Another useless comment.  For obvious reasons, there is nothing
    we can do about the diff-helper to add "the other half of copy
    detection information", because what it can tell diff-core is
    limited to its input, which usually is just differences prepared
    by somebody else, and it cannot know anything about unchanged
    files.  When I started pushing '-p' flag to diff-tree family, I
    remember that your reaction was neutral to moderately negative
    ("I'd tolerate, although I think it is redundant and you are not
    even generating diff yourself anyway" as opposed to "That's just
    great").  I think now you would thank me for shoving the diff
    interface into them ;-).

If you want the diff-helper to be able to the full scale copy
detection, you must _feed_ the full information including "stay"
entries to it.  Unfortunately, the current diffcore interface
does not let the callers (diff-tree family) to tell it to keep
the "stay" entries in its output.  I've been working on that
part this morning before this discussion started, so that
electively they can tell the diffcore layer not to do the "stay"
pruning ("stay pruning" will simply become another diffcore
transformation).

I have to leave again now, but I promise you'll hear back from
me on this one later tonight (or tomorrow evening at the latest
if things do not work out).


^ permalink raw reply

* Re: Alternate Patch: [PATCH] Don't include device number in cache invalidation when running on NFS
From: Thomas Glanzmann @ 2005-05-22 22:07 UTC (permalink / raw)
  To: GIT
In-Reply-To: <Pine.LNX.4.58.0505221451590.2307@ppc970.osdl.org>

Hello,

> This is _really_ Linux-specific afaik. Which is ok for git, but at the
> same time it really makes me go "Ewww". It's testing that the major number 
> is 0, and it would be a lot more cleaner to use 

> 	if (!major(st.st_dev))

> but even that is very Linux-specific.

I see.

> I'll have to think about it. Maybe I should just remove the st_dev check. 
> I guess inode/size/mtime/ctime should be plenty safe enough in practice.

I think so. At least I kick this one out because it is just getting on
my nerves. :-)

	Thomas

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox