git.vger.kernel.org archive mirror
* bisect / history preserving on rename + update
@ 2007-08-14  8:38 Thomas Gleixner
  2007-08-14  9:33 ` Karl Hasselström
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Thomas Gleixner @ 2007-08-14  8:38 UTC (permalink / raw)
  To: git

Hi,

is there a built in way to handle the following situation:

file A is renamed to B
file A is created again and new content is added.

I found only two ways to do that, which both suck:

1)
	git-mv A B
	git-add A
	git commit

	results in a copy A to B and lost history of B

2)
	git-mv A B
	git commit
	git-add A
	git commit

	preserves the history of B, but breaks bisection because
	A is needed to compile

I have no real good idea how to solve this. After staring at the git
source for a while, I think that 1) is quite hard to solve. A sane
solution for 2) might be to add a flag to the second commit, which
bundles the two commits for bisection.

Any other solutions ?

	tglx


* Re: bisect / history preserving on rename + update
  2007-08-14  8:38 bisect / history preserving on rename + update Thomas Gleixner
@ 2007-08-14  9:33 ` Karl Hasselström
  2007-08-14 10:16   ` Thomas Gleixner
  2007-08-14 10:03 ` David Kastrup
  2007-08-14 16:14 ` Linus Torvalds
  2 siblings, 1 reply; 15+ messages in thread
From: Karl Hasselström @ 2007-08-14  9:33 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: git

On 2007-08-14 10:38:01 +0200, Thomas Gleixner wrote:

> is there a built in way to handle the following situation:
>
> file A is renamed to B
> file A is created again and new content is added.
>
> I found only two ways to do that, which both suck:
>
> 1)
>       git-mv A B
>       git-add A
>       git commit
>
>       results in a copy A to B and lost history of B

What exactly do you mean by "lost history of B"? You do know that git
doesn't record renames? So you could just as well do

  $ mv A B
  $ create a new A
  $ git add A B
  $ git commit

> 2)
>       git-mv A B
>       git commit
>       git-add A
>       git commit
>
>       preserves the history of B, but breaks bisection because A is
>       needed to compile

Yes. I wouldn't recommend this option for that reason.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle


* Re: bisect / history preserving on rename + update
  2007-08-14  8:38 bisect / history preserving on rename + update Thomas Gleixner
  2007-08-14  9:33 ` Karl Hasselström
@ 2007-08-14 10:03 ` David Kastrup
  2007-08-14 16:14 ` Linus Torvalds
  2 siblings, 0 replies; 15+ messages in thread
From: David Kastrup @ 2007-08-14 10:03 UTC (permalink / raw)
  To: git

Thomas Gleixner <tglx@linutronix.de> writes:

> is there a built in way to handle the following situation:
>
> file A is renamed to B
> file A is created again and new content is added.
>
> I found only two ways to do that, which both suck:
>
> 1)
> 	git-mv A B
> 	git-add A
> 	git commit
>
> 	results in a copy A to B and lost history of B
>
> 2)
> 	git-mv A B
> 	git commit
> 	git-add A
> 	git commit
>
> 	preserves the history of B, but breaks bisection because
> 	A is needed to compile
>
> I have no real good idea how to solve this. After staring at the git
> source for a while, I think that 1) is quite hard to solve. A sane
> solution for 2) might be to add a flag to the second commit, which
> bundles the two commits for bisection.
>
> Any other solutions ?

You are confused, probably because something like "git-mv" exists (it
is just syntactic sugar, and it might be less confusing to users to
actually remove it).  git does _not_ track file histories.  Not the
tiniest bit.

It _constructs_ them when you ask it nicely.  All commands that
display "tracking" information have options like -M -C -R and so on
that tell git just how much effort it should spend on keeping abreast
of copying/renaming/modification.

-- 
David Kastrup


* Re: bisect / history preserving on rename + update
  2007-08-14  9:33 ` Karl Hasselström
@ 2007-08-14 10:16   ` Thomas Gleixner
  2007-08-14 10:50     ` Karl Hasselström
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2007-08-14 10:16 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: git

On Tue, 2007-08-14 at 11:33 +0200, Karl Hasselström wrote:
> On 2007-08-14 10:38:01 +0200, Thomas Gleixner wrote:
> 
> > is there a built in way to handle the following situation:
> >
> > file A is renamed to B
> > file A is created again and new content is added.
> >
> > I found only two ways to do that, which both suck:
> >
> > 1)
> >       git-mv A B
> >       git-add A
> >       git commit
> >
> >       results in a copy A to B and lost history of B
> 
> What exactly do you mean by "lost history of B"? You do know that git
> doesn't record renames? So you could just as well do

Err.

git-mv A B
git commit
edit B
git commit
git blame B <- shows the full history of A & B

IMHO that's why we have git-mv

	tglx


* Re: bisect / history preserving on rename + update
  2007-08-14 10:16   ` Thomas Gleixner
@ 2007-08-14 10:50     ` Karl Hasselström
  2007-08-14 11:06       ` Thomas Gleixner
  0 siblings, 1 reply; 15+ messages in thread
From: Karl Hasselström @ 2007-08-14 10:50 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: git

On 2007-08-14 12:16:40 +0200, Thomas Gleixner wrote:

> On Tue, 2007-08-14 at 11:33 +0200, Karl Hasselström wrote:
>
> > What exactly do you mean by "lost history of B"? You do know that
> > git doesn't record renames? So you could just as well do
>
> Err.
>
> git-mv A B
> git commit
> edit B
> git commit
> git blame B <- shows the full history of A & B
>
> IMHO that's why we have git-mv

Try replacing

  $ git-mv A B

with

  $ mv A B
  $ git rm A
  $ git add B

The result is exactly the same. git-mv is just a convenience.
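
A minimal sketch of that equivalence, assuming a POSIX shell with git on the PATH. The repository names, file contents and demo identity are made up for illustration. Both routes should end in the same tree object, since a commit stores only the resulting snapshot and no rename record:

```shell
set -e
# Throwaway identity so the commits work on an unconfigured machine (hypothetical values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Two identical starting repositories, each holding a single file A
for repo in with-git-mv by-hand; do
    git init -q "$work/$repo"
    (
        cd "$work/$repo"
        printf 'line 1\nline 2\n' > A
        git add A
        git commit -q -m "add A"
    )
done

# Route 1: let git-mv do the rename
cd "$work/with-git-mv"
git mv A B
git commit -q -m "rename A to B"
tree_git_mv=$(git rev-parse 'HEAD^{tree}')

# Route 2: rename by hand, then tell git about both halves
cd "$work/by-hand"
mv A B
git rm -q A
git add B
git commit -q -m "rename A to B"
tree_by_hand=$(git rev-parse 'HEAD^{tree}')

echo "git-mv:  $tree_git_mv"
echo "by hand: $tree_by_hand"
```

The commit IDs may differ because of timestamps, which is why the sketch compares tree IDs instead.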

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle


* Re: bisect / history preserving on rename + update
  2007-08-14 10:50     ` Karl Hasselström
@ 2007-08-14 11:06       ` Thomas Gleixner
  2007-08-14 11:12         ` David Kastrup
  2007-08-14 11:18         ` Karl Hasselström
  0 siblings, 2 replies; 15+ messages in thread
From: Thomas Gleixner @ 2007-08-14 11:06 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: git

On Tue, 2007-08-14 at 12:50 +0200, Karl Hasselström wrote:
> > Err.
> >
> > git-mv A B
> > git commit
> > edit B
> > git commit
> > git blame B <- shows the full history of A & B
> >
> > IMHO that's why we have git-mv
> 
> Try replacing
> 
>   $ git-mv A B
> 
> with
> 
>   $ mv A B
>   $ git rm A
>   $ git add B
> 
> The result is exactly the same. git-mv is just a convenience.

Fair enough, but it still does not solve my initial problem of keeping
the history of B (former A) intact, while creating a new A which is
necessary to compile the tree, simply because I can not change #include
<A> to #include <B> for various reasons.

	tglx


* Re: bisect / history preserving on rename + update
  2007-08-14 11:06       ` Thomas Gleixner
@ 2007-08-14 11:12         ` David Kastrup
  2007-08-14 11:18         ` Karl Hasselström
  1 sibling, 0 replies; 15+ messages in thread
From: David Kastrup @ 2007-08-14 11:12 UTC (permalink / raw)
  To: git

Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, 2007-08-14 at 12:50 +0200, Karl Hasselström wrote:
>> > Err.
>> >
>> > git-mv A B
>> > git commit
>> > edit B
>> > git commit
>> > git blame B <- shows the full history of A & B
>> >
>> > IMHO that's why we have git-mv
>> 
>> Try replacing
>> 
>>   $ git-mv A B
>> 
>> with
>> 
>>   $ mv A B
>>   $ git rm A
>>   $ git add B
>> 
>> The result is exactly the same. git-mv is just a convenience.
>
> Fair enough, but it still does not solve my initial problem of keeping
> the history of B (former A) intact, while creating a new A which is
> necessary to compile the tree, simply because I can not change #include
> <A> to #include <B> for various reasons.

Sigh.  Please use the right options for calling your history viewing
commands.  It is not like I haven't told you that already.  For
example, take git-blame.  Its manual page clearly states:

	-M|<num>|
	   Detect moving lines in the file as well. When a commit
	   moves a block of lines in a file (e.g. the original file
	   has A and then B, and the commit changes it to B and then
	   A), traditional blame algorithm typically blames the
	   lines that were moved up (i.e. B) to the parent and
	   assigns blame to the lines that were moved down (i.e. A)
	   to the child commit. With this option, both groups of
	   lines are blamed on the parent.

	   <num> is optional but it is the lower bound on the number
	   of alphanumeric characters that git must detect as moving
	   within a file for it to associate those lines with the
	   parent commit.

	-C|<num>|
	   In addition to -M, detect lines copied from other files
	   that were modified in the same commit. This is useful
	   when you reorganize your program and move code around
	   across files. When this option is given twice, the
	   command looks for copies from all other files in the
	   parent for the commit that creates the file in addition.

	   <num> is optional but it is the lower bound on the number
	   of alphanumeric characters that git must detect as moving
	   between files for it to associate those lines with the
	   parent commit.


-- 
David Kastrup


* Re: bisect / history preserving on rename + update
  2007-08-14 11:06       ` Thomas Gleixner
  2007-08-14 11:12         ` David Kastrup
@ 2007-08-14 11:18         ` Karl Hasselström
  2007-08-14 14:19           ` Thomas Gleixner
  1 sibling, 1 reply; 15+ messages in thread
From: Karl Hasselström @ 2007-08-14 11:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: git

On 2007-08-14 13:06:59 +0200, Thomas Gleixner wrote:

> On Tue, 2007-08-14 at 12:50 +0200, Karl Hasselström wrote:
>
> > The result is exactly the same. git-mv is just a convenience.
>
> Fair enough, but it still does not solve my initial problem of
> keeping the history of B (former A) intact, while creating a new A
> which is necessary to compile the tree, simply because I can not
> change #include <A> to #include <B> for various reasons.

Have you tried running blame with -C, or -C -C? That will make it try
harder to identify lines originating from other files.

-- 
Karl Hasselström, kha@treskal.com
      www.treskal.com/kalle


* Re: bisect / history preserving on rename + update
  2007-08-14 11:18         ` Karl Hasselström
@ 2007-08-14 14:19           ` Thomas Gleixner
  2007-08-14 14:45             ` David Kastrup
  0 siblings, 1 reply; 15+ messages in thread
From: Thomas Gleixner @ 2007-08-14 14:19 UTC (permalink / raw)
  To: Karl Hasselström; +Cc: git

On Tue, 2007-08-14 at 13:18 +0200, Karl Hasselström wrote:
> On 2007-08-14 13:06:59 +0200, Thomas Gleixner wrote:
> 
> > On Tue, 2007-08-14 at 12:50 +0200, Karl Hasselström wrote:
> >
> > > The result is exactly the same. git-mv is just a convenience.
> >
> > Fair enough, but it still does not solve my initial problem of
> > keeping the history of B (former A) intact, while creating a new A
> > which is necessary to compile the tree, simply because I can not
> > change #include <A> to #include <B> for various reasons.
> 
> Have you tried running blame with -C, or -C -C? That will make it try
> harder to identify lines originating from other files.

Does not help. Strangely enough it results in

# git blame include/B

b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  1) #ifndef _A_H_
b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  2) #define _A_H_
b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  3) 
b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  4) #define TEST_1 1
f098c4ad include/B (Thomas Gleixner 2007-08-14 16:01:05 +0200  5) #define TEST_2 2
f098c4ad include/B (Thomas Gleixner 2007-08-14 16:01:05 +0200  6) 
f098c4ad include/B (Thomas Gleixner 2007-08-14 16:01:05 +0200  7) #endif

	tglx


* Re: bisect / history preserving on rename + update
  2007-08-14 14:19           ` Thomas Gleixner
@ 2007-08-14 14:45             ` David Kastrup
  0 siblings, 0 replies; 15+ messages in thread
From: David Kastrup @ 2007-08-14 14:45 UTC (permalink / raw)
  To: git

Thomas Gleixner <tglx@linutronix.de> writes:

> On Tue, 2007-08-14 at 13:18 +0200, Karl Hasselström wrote:
>> On 2007-08-14 13:06:59 +0200, Thomas Gleixner wrote:
>> 
>> > On Tue, 2007-08-14 at 12:50 +0200, Karl Hasselström wrote:
>> >
>> > > The result is exactly the same. git-mv is just a convenience.
>> >
>> > Fair enough, but it still does not solve my initial problem of
>> > keeping the history of B (former A) intact, while creating a new A
>> > which is necessary to compile the tree, simply because I can not
>> > change #include <A> to #include <B> for various reasons.
>> 
>> Have you tried running blame with -C, or -C -C? That will make it try
>> harder to identify lines originating from other files.
>
> Does not help. Strangely enough it results in
>
> # git blame include/B
>
> b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  1) #ifndef _A_H_
> b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  2) #define _A_H_
> b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  3) 
> b4062b16 include/A (Joe Hacker      2007-08-14 10:52:28 +0200  4) #define TEST_1 1
> f098c4ad include/B (Thomas Gleixner 2007-08-14 16:01:05 +0200  5) #define TEST_2 2
> f098c4ad include/B (Thomas Gleixner 2007-08-14 16:01:05 +0200  6) 
> f098c4ad include/B (Thomas Gleixner 2007-08-14 16:01:05 +0200  7) #endif

So it tells you commit and corresponding file that are responsible for
the lines in question.

How does this not help?

-- 
David Kastrup


* Re: bisect / history preserving on rename + update
  2007-08-14  8:38 bisect / history preserving on rename + update Thomas Gleixner
  2007-08-14  9:33 ` Karl Hasselström
  2007-08-14 10:03 ` David Kastrup
@ 2007-08-14 16:14 ` Linus Torvalds
  2007-08-25  4:59   ` Junio C Hamano
  2 siblings, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2007-08-14 16:14 UTC (permalink / raw)
  To: Thomas Gleixner, Junio C Hamano; +Cc: Git Mailing List



On Tue, 14 Aug 2007, Thomas Gleixner wrote:
>
> is there a built in way to handle the following situation:
> 
> file A is renamed to B
> file A is created again and new content is added.

That "should just work".

[ However, there does seem to be a bug in the "-B" logic, so it doesn't 
  actually work as well as it should! See below ]

BUT! By default, rename detection isn't on at all, mostly because it 
results in patches that non-git "patch" cannot apply, but partly also 
because it can slow certain things down.

So to get nice diffs, use

	git show -B -C

where the magic is:

 - "-B" means "break file associations when a file is *too* dissimilar" 

   Normally, git will assume that if a filename stays around, it's the 
   same file. However, with "-B", it does similarity analysis even for 
   files that keep the same name, and if the old and new contents are very 
   different, git will decide that maybe they weren't the same file after all!

 - "-C" is "find code movement and copying".

However, nobody ever actually uses "-B" (it's so rare as to effectively 
not exist, and it does slow things down a bit), so it seems to have 
bit-rotted (or maybe it had this bug even originally: as I said, I don't 
think anybody has ever really _used_ this functionality).

Junio, look at this:

	# create a repo in "testing"
	cd
	mkdir testing
	cd testing/
	git init

	# copy a file from the git repo
	cp ~/git/revision.c .
	git add revision.c
	git commit -a -m "Add file 'A'"

	# move it around, copy another file in its stead
	git mv revision.c old-revision.c
	cp ~/git/Makefile revision.c
	git add revision.c
	git commit -a -m "Move file 'A' to 'B', create new 'A'"
	git show -B -C

and notice how "-B" *did* actually work, and we get a nice:

	diff --git a/revision.c b/old-revision.c
	similarity index 100%
	rename from revision.c
	rename to old-revision.c

but then it breaks: instead of creating the new "revision.c", we get:

	diff --git a/revision.c b/revision.c
	dissimilarity index 98%
	index 038693c..4eb4637 100644
	--- a/revision.c
	+++ b/revision.c
	@@ -1,1572 +1,1117 @@
	-#include "cache.h"
	...

which uses "revision.c" as the base, even though it was already broken 
up! I think it *should* have looked like

	diff --git a/revision.c b/revision.c
	new file mode 100644
	index 0000000..4eb4637
	--- /dev/null
	+++ b/revision.c
	+# The default target of this Makefile is...
	...

so I think there is a bug there where the "-B" thing doesn't really 
"stick", and some part still uses the old file content even though it was 
dis-associated with the new content!

Hmm?

			Linus


* Re: bisect / history preserving on rename + update
  2007-08-14 16:14 ` Linus Torvalds
@ 2007-08-25  4:59   ` Junio C Hamano
  2007-08-25  7:35     ` David Kastrup
  2007-08-25 15:38     ` Linus Torvalds
  0 siblings, 2 replies; 15+ messages in thread
From: Junio C Hamano @ 2007-08-25  4:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Thomas Gleixner, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

> [ However, there does seem to be a bug in the "-B" logic, so it doesn't 
>   actually work as well as it should! See below ]

I finally had a bit of time to follow this through.  After
running your set-up using revision.c and Makefile to emulate the
situation, you can try running:

	$ git diff-tree -B -C --numstat --summary HEAD

or

	$ git diff-tree -B -M --numstat --summary HEAD

which would say:

        90028d007986de4db8c3af30a2d5e5c00e5a2c8b
        0       0       revision.c => old-revision.c
        1117    1579    revision.c
         rename revision.c => old-revision.c (100%)
         rewrite revision.c (98%)

The code is working as intended (it is a different discussion if
"as intended" is actually the desired behaviour).

We take the preimage tree as a whole, and express postimage in
terms of series of patches, _however_ we do not interpret the
series of patches as _incremental_.  IOW, when we talk about the
effect of the second patch that describes the postimage of
revision.c, we pretend as if nothing happened with the first
patch (which renamed away revision.c).  So "rewrite revision.c"
is what we say, not "create revision.c anew, because the first
one renamed it away".

This behaviour actually was a bit counterintuitive to me.  I did
not implement the very original rename/copy the way we currently
operate.  It was corrected into the current behaviour, following
the guiding principle described in this message:

	http://thread.gmane.org/gmane.comp.version-control.git/3807

which is reproduced below.

From: Linus Torvalds <torvalds@osdl.org>
Date: Mon, 23 May 2005 07:49:01 -0700 (PDT)
Subject: Re: [PATCH] Make sure diff-helper can tell rename/copy in the new
 diff-raw format.
Message-ID: <Pine.LNX.4.58.0505230736180.2307@ppc970.osdl.org>

    On Mon, 23 May 2005, Junio C Hamano wrote:
    >
    > This adds tests to make sure that diff-helper can tell renames
    > from copies using the same "everything but the last one are
    > copies and the last one is either rename or stay" logic.

    Btw, I still disagree...
    ...
    For example, let's say that you have modified "fileA" _and_ you have 
    created a "fileB" that is a copy of the original "fileA" with some _other_ 
    slight modifications. We'll call the SHA1's involved "sha_A", "sha_A'" and 
    "sha_B"

    I think it's perfectly valid to say

            :100644 100644 <sha_A> <sha_A'> M	fileA	fileA
            :100644 100644 <sha_A> <sha_B> C89	fileA	fileB

    which says "fileA" was modified from orig-A to new-A, and "fileB" is a 
    copy based on orig-A.

    Now, when the above is turned into a "diff", that diff is no longer
    something you can apply "incrementally" - you have to apply it as if
    you're applying all differences to the "original tree". But the thing is,
    that's actually what I _want_, because I was planning on writing a tool
    that applies patches all-or-nothing.

    Also, it turns out that this kind of "non-incremental" diff is the kind
    that I personally want to see as a _human_, because quite frankly, my
    brain-capacity is that of a demented ocelot, and I can't _remember_ what
    happened in other parts of the diff. I much prefer the stateless "oh, this
    file X is in that relation Y to the previous version of file Z".

    I do that partly because I actually routinely edit patches. If you have 
    the incremental format, that's practically impossible, while the stateless 
    version is fine.

    See?

    So I think all the clever "don't re-use files we have modified" etc is 
    actually wrong. If you want to make a traditional diff that can be applied 
    with normal "patch", you just don't use the -M or -C flags.

                    Linus


* Re: bisect / history preserving on rename + update
  2007-08-25  4:59   ` Junio C Hamano
@ 2007-08-25  7:35     ` David Kastrup
  2007-08-25 15:38     ` Linus Torvalds
  1 sibling, 0 replies; 15+ messages in thread
From: David Kastrup @ 2007-08-25  7:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Linus Torvalds, Thomas Gleixner, Git Mailing List

Junio C Hamano <gitster@pobox.com> writes:

> Linus Torvalds <torvalds@linux-foundation.org> writes:
>
>> [ However, there does seem to be a bug in the "-B" logic, so it doesn't 
>>   actually work as well as it should! See below ]
>
> I finally had a bit of time to follow this through.  After
> running your set-up using revision.c and Makefile to emulate the
> situation, you can try running:
>
> 	$ git diff-tree -B -C --numstat --summary HEAD
>
> or
>
> 	$ git diff-tree -B -M --numstat --summary HEAD
>
> which would say:
>
>         90028d007986de4db8c3af30a2d5e5c00e5a2c8b
>         0       0       revision.c => old-revision.c
>         1117    1579    revision.c
>          rename revision.c => old-revision.c (100%)
>          rewrite revision.c (98%)
>
> The code is working as intended (it is a different discussion if
> "as intended" is actually the desired behaviour).

From reading the argument of Linus, I would say that this "stateless,
not applicable by patch" behavior is desirable in some application.
And the "sequential, applicable by patch" behavior also is desirable
in a number of applications.

So there should be an option to select those behaviors.  This has the
added advantage that the manual page will explain that option, and so
the user gets to actively pick what he wants, and gets to _think_
about this choice.

This would be strictly better than "at some point of time, we figured
that this particular way suited Linus' personal workflow best, so we
obliterated all traces of other applications from code, documentation,
discussion and thought".

> This behaviour actually was a bit counterintuitive to me.  I did
> not implement the very original rename/copy the way we currently
> operate.  It was corrected into the current behaviour, following
> the guiding principle described in this message:
>
> 	http://thread.gmane.org/gmane.comp.version-control.git/3807
>
> which is reproduced below.

I think this is a case where restricting git's operation to a single
way of doing it is limiting the range of its applications.  And having
_neither_ an option _nor_ an explanation but rather pretending that
this is the only valid way one could want this feature to work is not
going to help even those users who would, in the end, decide to choose
that behavior after all.

-- 
David Kastrup, Kriemhildstr. 15, 44793 Bochum


* Re: bisect / history preserving on rename + update
  2007-08-25  4:59   ` Junio C Hamano
  2007-08-25  7:35     ` David Kastrup
@ 2007-08-25 15:38     ` Linus Torvalds
  2007-08-25 17:23       ` Junio C Hamano
  1 sibling, 1 reply; 15+ messages in thread
From: Linus Torvalds @ 2007-08-25 15:38 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Thomas Gleixner, Git Mailing List



On Fri, 24 Aug 2007, Junio C Hamano wrote:
> 
> I finally had a bit of time to follow this through.  After
> running your set-up using revision.c and Makefile to emulate the
> situation, you can try running:
> 
> 	$ git diff-tree -B -C --numstat --summary HEAD
> 
> or
> 
> 	$ git diff-tree -B -M --numstat --summary HEAD
> 
> which would say:
> 
>         90028d007986de4db8c3af30a2d5e5c00e5a2c8b
>         0       0       revision.c => old-revision.c
>         1117    1579    revision.c
>          rename revision.c => old-revision.c (100%)
>          rewrite revision.c (98%)

Yeah, in that format, git behaviour actually looks really nice.

> The code is working as intended (it is a different discussion if
> "as intended" is actually the desired behaviour).

I think it may be at times.

> We take the preimage tree as a whole, and express postimage in
> terms of series of patches, _however_ we do not interpret the
> series of patches as _incremental_.

IIRC, that's not strictly true. We do have logic to make sure that the 
difference between "copy" and "rename" is that the rename happens only 
once, ie I just tested this sequence:

	mkdir test-rename
	cd test-rename/
	cp /home/torvalds/git/revision.c .
	git init
	git add .
	git commit -m "add revision.c"
	cp revision.c rev1.c
	cp revision.c rev2.c
	rm revision.c
	em rev1.c
	em rev2.c
	git add .
	git commit -a -m "rename revision.c twice"

(the two "em" calls are just me in an editor, adding a line to the top 
of the file saying "This is rev[12].c")

After that, doing a "git show -C" shows:

	diff --git a/revision.c b/rev1.c
	similarity index 99%
	copy from revision.c
	copy to rev1.c
	...
	diff --git a/revision.c b/rev2.c
	similarity index 99%
	rename from revision.c
	rename to rev2.c
	...

so we do have a notion of "incremental" in that the first is a copy, the 
second is a rename, and that the rename is expected to remove the file.

(Doing a "--stat" doesn't show the difference between copy and rename, so 
we'll just see it as

	 revision.c => rev1.c |    1 +
	 revision.c => rev2.c |    1 +

which looks pretty).
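
A scripted version of the same experiment, as a minimal sketch: it assumes a POSIX shell with git and awk available, generates a stand-in revision.c instead of copying the real one, and replaces the editor step with a prepended line. The demo identity is hypothetical.

```shell
set -e
# Throwaway identity so the commits work on an unconfigured machine (hypothetical values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q "$work/test-rename"
cd "$work/test-rename"

# Stand-in for the real revision.c: 200 distinct lines
awk 'BEGIN { for (i = 1; i <= 200; i++) print "source line", i }' > revision.c
git add revision.c
git commit -q -m "add revision.c"

# Two near-identical descendants, each with one extra line on top
{ echo "This is rev1.c"; cat revision.c; } > rev1.c
{ echo "This is rev2.c"; cat revision.c; } > rev2.c
git rm -q revision.c
git add rev1.c rev2.c
git commit -q -m "rename revision.c twice"

# Ask for copy detection and keep only the summary lines
summary=$(git show -C --summary --pretty=format: HEAD)
echo "$summary"

# Count how many of the two new files are reported as copy and as rename
copies=$(echo "$summary" | grep -c '^ copy revision.c => rev[12].c' || true)
renames=$(echo "$summary" | grep -c '^ rename revision.c => rev[12].c' || true)
echo "copies=$copies renames=$renames"
```

One of the two files should come out as a copy and the other as a rename; the sketch counts them rather than relying on which is which.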

> IOW, when we talk about the effect of the second patch that describes 
> the postimage of revision.c, we pretend as if nothing happened with the 
> first patch (which renamed away revision.c).  So "rewrite revision.c" is 
> what we say, not "create revision.c anew, because the first one renamed 
> it away".

If that was consistent, then we'd have used "rename" in both cases above..

> It was corrected into the current behaviour, following the guiding 
> principle described in this message:
> 
> 	http://thread.gmane.org/gmane.comp.version-control.git/3807

Ahh, you're a wily one. Using my own words against me.

But that earlier Linus was obviously a fake impostor, since he was wrong 
(and could thus by definition not _possibly_ be the true Linus!). So your 
judo mindtrick fails.

That said, I actually think that the earlier Linus might actually be me, 
and he's right in the case he mentions: we should *not* break the 
association if it results in a good diff!

Ie, the true "guiding principle" should be the principle of minimizing the 
final diff - that's how diff is supposed to act within a single file, and 
I think it's how the rename/copy detection is supposed to act too.

So:

>     I think it's perfectly valid to say
> 
>             :100644 100644 <sha_A> <sha_A'> M	fileA	fileA
>             :100644 100644 <sha_A> <sha_B> C89	fileA	fileB
> 
>     which says "fileA" was modified from orig-A to new-A, and "fileB" is a 
>     copy based on orig-A.

This is 100% consistent with "how do I minimally show the differences 
between the original and the result": we decide that we can show it as a 
"copy" and a "modification" of the original file.

It makes sense to "copy and modify the original", but it does *not* 
make sense to "rename and modify the original". That is, after all, the 
*only* difference between copying and renaming. A copy will leave the 
original around (so that it can be modified), while a rename will not.

So, by the very definition of "rename", doing a "rename and modify the 
original" would appear to be somewhat senseless, no?

			Linus


* Re: bisect / history preserving on rename + update
  2007-08-25 15:38     ` Linus Torvalds
@ 2007-08-25 17:23       ` Junio C Hamano
  0 siblings, 0 replies; 15+ messages in thread
From: Junio C Hamano @ 2007-08-25 17:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Thomas Gleixner, Git Mailing List

Linus Torvalds <torvalds@linux-foundation.org> writes:

>> It was corrected into the current behaviour, following the guiding 
>> principle described in this message:
>> 
>> 	http://thread.gmane.org/gmane.comp.version-control.git/3807
>
> Ahh, you're a wily one. Using my own words against me.

I am not being wily.  I usually do not remember nor quote too
old histories, but May 2005 was somewhat special to me.  Those
two weeks of 18-hour-straight-doing-git-and-nothing-else,
working with git and with you in particular, were what taught me
how fun open source development and working with brilliant
others is.

> Ie, the true "guiding principle" should be the principle of minimizing the 
> final diff - that's how diff is supposed to act within a single file, and 
> I think it's how the rename/copy detection is supposed to act too.

Ok, I would agree with that in principle, but that would be
rather intrusive change that I am sure would have fallout to
git-apply side (and anybody who interprets "git diff" output,
especially gitweb), too.  I am not rejecting the idea, but I
won't be able to look into it myself for some time.

