git.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Make --follow support --find-copies-harder.
@ 2010-04-20 11:27 Bo Yang
  2010-04-21  3:05 ` Junio C Hamano
  0 siblings, 1 reply; 5+ messages in thread
From: Bo Yang @ 2010-04-20 11:27 UTC (permalink / raw)
  To: git; +Cc: torvalds, trast

'git diff --follow <commit1> <commit2> <path>' give users
the content difference of <path> between the two commits.
It will detect file copies/moves of <path> if there is any.
But with '--find-copies-harder', it does not take the
unmodified files as copy/move source. And this patch fix
this bug.

Signed-off-by: Bo Yang <struggleyb.nku@gmail.com>
---
 t/t4042-find-copies-harder.sh |   45 +++++++++++++++++++++++++++++++++++++++++
 tree-diff.c                   |    2 +
 2 files changed, 47 insertions(+), 0 deletions(-)
 create mode 100755 t/t4042-find-copies-harder.sh

diff --git a/t/t4042-find-copies-harder.sh b/t/t4042-find-copies-harder.sh
new file mode 100755
index 0000000..40d2122
--- /dev/null
+++ b/t/t4042-find-copies-harder.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Bo Yang
+#
+
+test_description='Test copy detection with --find-copies-harder in diff engine.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+echo >path0 'Line 1
+Line 2
+Line 3
+Line 4
+Line 5
+Line 6
+'
+
+test_expect_success \
+    'add a file path0 and commit.' \
+    'git add path0 &&
+     git commit -m "Add path0"'
+
+cat <path0 >path1
+test_expect_success \
+    'copy path0 to path1.' \
+    'git add path1 &&
+     git commit -m "Copy path1 from path0"'
+
+test_expect_success \
+    'find the copy path0 -> path1 harder' \
+    'git diff --follow --find-copies-harder HEAD^ HEAD path1 > current'
+cat >expected <<\EOF
+diff --git a/path0 b/path1
+similarity index 100%
+copy from path0
+copy to path1
+EOF
+
+test_expect_success \
+    'validate the output.' \
+    'compare_diff_patch current expected'
+
+test_done
diff --git a/tree-diff.c b/tree-diff.c
index fe9f52c..0dea53e 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -346,6 +346,8 @@ static void try_to_follow_renames(struct tree_desc *t1, struct tree_desc *t2, co
 
 	diff_setup(&diff_opts);
 	DIFF_OPT_SET(&diff_opts, RECURSIVE);
+	if (DIFF_OPT_TST(opt, FIND_COPIES_HARDER))
+		DIFF_OPT_SET(&diff_opts, FIND_COPIES_HARDER);
 	diff_opts.detect_rename = DIFF_DETECT_RENAME;
 	diff_opts.output_format = DIFF_FORMAT_NO_OUTPUT;
 	diff_opts.single_follow = opt->paths[0];
-- 
1.7.0.2.273.gc2413.dirty

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make --follow support --find-copies-harder.
  2010-04-20 11:27 [PATCH] Make --follow support --find-copies-harder Bo Yang
@ 2010-04-21  3:05 ` Junio C Hamano
  2010-04-21  4:17   ` Bo Yang
  2010-04-21  9:02   ` Bo Yang
  0 siblings, 2 replies; 5+ messages in thread
From: Junio C Hamano @ 2010-04-21  3:05 UTC (permalink / raw)
  To: Bo Yang; +Cc: git, torvalds, trast

Bo Yang <struggleyb.nku@gmail.com> writes:

> 'git diff --follow <commit1> <commit2> <path>' give users
> the content difference of <path> between the two commits.
> It will detect file copies/moves of <path> if there is any.

Because the "--follow" hack was done primarily as a "checkbox" item, and
also because it is not an option for the "diff" family (it is an option
for the "log" family), I would personally think that it is actually a bug
that "git diff" accepts "--follow" and pretends as if it is doing useful
work, but does so only some of the time.

    $ git diff --follow --name-status maint master -- builtin/log.c
    R089	builtin-log.c	builtin/log.c
    $ git diff --follow --name-status -R maint master -- builtin/log.c
    D	builtin/log.c
    $ git diff --follow --name-status master maint -- builtin/log.c
    D	builtin/log.c

As we can see, it doesn't quite work, and it is not a fault of 750f7b6
(Finally implement "git log --follow", 2007-06-19) by Linus, exactly
because the feature wasn't designed to work with "diff" to begin with.

If we were to add a support of "--follow" to "diff" family, I suspect that
we need to

 (1) make sure we get only one path, just like "log" family does;

 (2) add a logic to notice the reverse situation as demonstrated above and
     deal with it in a sensible way, without any --find-copies option
     given by the user.

among other things.  Also we of course need to document it as a new "diff"
option when we are done.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make --follow support --find-copies-harder.
  2010-04-21  3:05 ` Junio C Hamano
@ 2010-04-21  4:17   ` Bo Yang
  2010-04-21  9:02   ` Bo Yang
  1 sibling, 0 replies; 5+ messages in thread
From: Bo Yang @ 2010-04-21  4:17 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds, trast

On Wed, Apr 21, 2010 at 11:05 AM, Junio C Hamano <gitster@pobox.com> wrote:
> Because the "--follow" hack was done primarily as a "checkbox" item, and
> also because it is not an option for the "diff" family (it is an option
> for the "log" family), I would personally think that it is actually a bug
> that "git diff" accepts "--follow" and pretends as if it is doing useful
> work, but does so only some of the time.

Ah, sorry for the confusion. I mean, I have found the bug when I use
git log. And take a look at:

git log --follow --find-copies-harder -p t/t4013/diff.show_--first-parent_master

This will report the file  t/t4013/diff.show_--first-parent_master as
a new file but it is copied from t/t4013/diff.show_master indeed.
'--find-copies-harder' should detect this, but it didn't. With this
patch it will find such copy and go on following
t/t4013/diff.show_master history.

And I locate the bug in the format of a diff test case and this cause
the confusion. What I really try to fix is,
1. --follow should support --find-copies-harder when using git-log
2. git-diff should support --find-copies-harder, I mean, diff should
find copies in unmodified files.

For 2, I find the --follow option works for git-diff, so I just take
consideration that it is the right way to support the
--find-copies-harder in git-diff. (and now I don't think so...) ;-)

>    $ git diff --follow --name-status maint master -- builtin/log.c
>    R089        builtin-log.c   builtin/log.c
>    $ git diff --follow --name-status -R maint master -- builtin/log.c
>    D   builtin/log.c
>    $ git diff --follow --name-status master maint -- builtin/log.c
>    D   builtin/log.c
>
> As we can see, it doesn't quite work, and it is not a fault of 750f7b6
> (Finally implement "git log --follow", 2007-06-19) by Linus, exactly
> because the feature wasn't designed to work with "diff" to begin with.

Hmm, really.

> If we were to add a support of "--follow" to "diff" family, I suspect that
> we need to
>
>  (1) make sure we get only one path, just like "log" family does;
>
>  (2) add a logic to notice the reverse situation as demonstrated above and
>     deal with it in a sensible way, without any --find-copies option
>     given by the user.

En, as above. I just want to teach git-diff to find copies among
unmodified files with '--find-copies-harder' option. Maybe, '--follow'
is the good choice to use for control whether git-diff will detect
file move/copy, and '--find-copies-harder' is the option to control
how hard we find the copies.

I will try to make this patch into two, one for fixing the git-log
--follow --find-copies-harder one, and the other try to make a sane
logic for '--follow' for git-diff.

Thanks for your advice!

Regards!
Bo
-- 
My blog: http://blog.morebits.org

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make --follow support --find-copies-harder.
  2010-04-21  3:05 ` Junio C Hamano
  2010-04-21  4:17   ` Bo Yang
@ 2010-04-21  9:02   ` Bo Yang
  2010-04-21  9:24     ` Junio C Hamano
  1 sibling, 1 reply; 5+ messages in thread
From: Bo Yang @ 2010-04-21  9:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: git, torvalds, trast

On Wed, Apr 21, 2010 at 11:05 AM, Junio C Hamano <gitster@pobox.com> wrote:
>
> Because the "--follow" hack was done primarily as a "checkbox" item, and
> also because it is not an option for the "diff" family (it is an option
> for the "log" family), I would personally think that it is actually a bug
> that "git diff" accepts "--follow" and pretends as if it is doing useful
> work, but does so only some of the time.
>
>    $ git diff --follow --name-status maint master -- builtin/log.c
>    R089        builtin-log.c   builtin/log.c
>    $ git diff --follow --name-status -R maint master -- builtin/log.c
>    D   builtin/log.c
>    $ git diff --follow --name-status master maint -- builtin/log.c
>    D   builtin/log.c

I am really wondering, when -R is used, how the file rename/copy
should defined? Now, I can make -R works with --follow, and it produce
something like:

byang@byang-laptop:~/git/git$ ./git diff --follow --name-status maint
master -- builtin/log.c
R089    builtin-log.c   builtin/log.c
byang@byang-laptop:~/git/git$ ./git diff --follow --name-status -R
maint master -- builtin/log.c
R089    builtin/log.c   builtin-log.c
byang@byang-laptop:~/git/git$ ./git diff --follow --name-status
master maint -- builtin/log.c
D       builtin/log.c
byang@byang-laptop:~/git/git$ ./git diff --follow --name-status -R
master maint -- builtin/log.c
A       builtin/log.c


The problem is whether it make sense to say 'builtin/log.c renamed to
builtin-log.c' when -R is given?

Regards!
Bo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] Make --follow support --find-copies-harder.
  2010-04-21  9:02   ` Bo Yang
@ 2010-04-21  9:24     ` Junio C Hamano
  0 siblings, 0 replies; 5+ messages in thread
From: Junio C Hamano @ 2010-04-21  9:24 UTC (permalink / raw)
  To: Bo Yang; +Cc: git, torvalds, trast

Bo Yang <struggleyb.nku@gmail.com> writes:

> I am really wondering, when -R is used, how the file rename/copy
> should defined? Now, I can make -R works with --follow, and it produce
> something like:
>
> byang@byang-laptop:~/git/git$ ./git diff --follow --name-status maint
> master -- builtin/log.c
> R089    builtin-log.c   builtin/log.c

As I already said, it is a bug that "diff" does not diagnose it an error
when you give it the "--follow" option.  It was designed to be used with
"log" family, and never with "diff" family.

When a command from the "log" family traverses the history, it internally
runs "diff-tree" between the commit C it is currently looking at, and its
parents C^$n.  When you give one path and --follow [*1*], it may notice
that C has the named path and C^$n doesn't.

At that point, it internally runs "diff -M C^$n C" to see if there is a
corresponding path in C^$n, and switch to follow the path it found in
the parent commit.

The logic only detects the case where "new" side has a path that "old"
side doesn't [*3*], and it is not even designed to be used with -R (where
it needs to be given a path that does not exist anymore on the "new" side
but used to exist in the "old" side).

Heck, it is not even designed to be used with "diff" as I already said
twice ;-).

Even in the context of "log", it is a hack.  It globally keeps one single
path that it follows, which obviously would not work in a history with
merges.


[Footnote]

*1* "log" command line parser enforces this "only one path" condition;
"diff" doesn't even bother catching it as an error to give "--follow", so
it lacks the logic to further catch it as an error to give more than one
paths.

*2* No, I don't think there is an interface to tweak the -M to -C or -C
-C; see tree-diff.c and look for try_to_follow_renames().  I think it is
probably Ok to make this tweakable, and I suspect that is what your patch
is about, but don't use "git diff" as an example nor in any of your tests.

*3* This is exactly why "diff --follow maint master -- builtin/log.c"
appears to do something remotely sensible (notice that the path is what
exists in the "new" side) but "maint master -- builtin-log.c" does not.
The logic doesn't even care if the named path does not appear in the "new"
side at all, because that is not useful at all in the way "log" internally
uses "diff-tree" logic.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-04-21  9:24 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-04-20 11:27 [PATCH] Make --follow support --find-copies-harder Bo Yang
2010-04-21  3:05 ` Junio C Hamano
2010-04-21  4:17   ` Bo Yang
2010-04-21  9:02   ` Bo Yang
2010-04-21  9:24     ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).